June 23, 2021
The State of Enterprise NLP
Natural Language Processing (NLP) has become increasingly popular over the past few years. Businesses across a variety of industries have started to adopt it for many different applications. As it grows in popularity, it is important to understand its advantages, challenges, applications, and more. Recently, John Snow Labs issued the 2020 State of NLP Survey in order to analyze the use of NLP across industries. The survey was conducted worldwide and questioned nearly 600 organizations from more than 50 countries.
Below, I will discuss key trends found in the survey in order to gain a more comprehensive understanding of the state of enterprise NLP.
2020 State of NLP Highlights
During 2020, businesses were forced to reevaluate their IT budgets and priorities because of the COVID-19 pandemic. Managers had to decide what share of the overall corporate budget should go to IT, and which other priorities to fund, amid the uncertainty caused by the pandemic. According to a survey conducted by TechRepublic, that uncertainty played a role in many respondents’ IT budget plans: in 2019, 9 percent of respondents reported that they did not feel confident in their IT budgets, compared with 20 percent in 2020. Many businesses opted to decrease IT spending, and worldwide IT spending dropped from 3.81 trillion US dollars in 2019 to 3.75 trillion US dollars in 2020.
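To put the two worldwide spending figures in proportion, the relative change can be worked out directly. The short sketch below uses only the numbers quoted above; the function name `percent_change` is my own and is there purely for illustration.

```python
def percent_change(old: float, new: float) -> float:
    """Return the relative change from old to new, as a percentage."""
    return (new - old) / old * 100.0


# Worldwide IT spending figures quoted above, in trillions of US dollars.
spending_2019 = 3.81
spending_2020 = 3.75

change = percent_change(spending_2019, spending_2020)
print(f"Change in worldwide IT spending: {change:.1f}%")
```

The result is a decline of roughly 1.6 percent, which is the backdrop against which the NLP budget figures should be read.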
Despite the overall downturn in IT spending due to the pandemic, the 2020 State of NLP Survey revealed that spending on NLP did not decrease. It did the opposite: respondents from a variety of industries reported larger NLP budgets than the year before. Specifically, 53 percent of respondents reported that their NLP budget was at least 10 percent higher than in 2019, and 31 percent stated that it was at least 30 percent higher.
The same trend holds for large companies (firms with 5,000 or more employees). Among respondents from large companies, 39 percent reported that their NLP budget was at least 10 percent higher than in 2019, and 21 percent stated that it was at least 30 percent higher. These increases are notable because the 2020 State of NLP Survey was conducted during the peak of the COVID-19 pandemic, when IT spending was down and businesses were focused solely on critical technologies.
The 2020 State of NLP Survey revealed that users value accuracy. More than 40 percent of all respondents stated that accuracy was the main criterion they use when evaluating NLP libraries. In addition, 25 percent of respondents cited accuracy as the most important factor when evaluating NLP cloud services. Here, accuracy refers to how well pre-trained models perform. These models take text as input and return common outputs such as tokens, lemmas, part-of-speech (POS) tags, and recognized entities.
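The survey does not say how respondents measured accuracy, so the following is only a minimal sketch of one common approach: comparing a model's predicted tags against a hand-labeled reference, token by token. The sentence, the tags, and the function name `tag_accuracy` are all hypothetical and are not taken from the survey or from any particular library.

```python
from typing import Sequence


def tag_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of tokens whose predicted tag matches the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical example: POS tags for one short sentence.
tokens = ["John", "Snow", "Labs", "published", "the", "survey"]
gold_tags = ["PROPN", "PROPN", "PROPN", "VERB", "DET", "NOUN"]
# Imagined model output with one mistake ("Snow" tagged as a common noun).
predicted_tags = ["PROPN", "NOUN", "PROPN", "VERB", "DET", "NOUN"]

accuracy = tag_accuracy(gold_tags, predicted_tags)
print(f"Token-level accuracy: {accuracy:.2%}")
```

In practice, an evaluation like this would be run over a large labeled test set rather than a single sentence, and entity recognition is often scored with precision and recall instead of plain accuracy.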
Accuracy is not only important, it is also a challenge. Survey respondents cited accuracy as the main challenge they face when working with NLP libraries, and this is especially true for organizations in the early stages of NLP adoption. For example, 38 percent of organizations that are exploring NLP reported accuracy as their most frequent challenge, compared with 24 percent of organizations already using NLP. This suggests that accuracy becomes less of an obstacle as organizations gain experience with NLP.
NLP Cloud Services
NLP cloud services are becoming increasingly popular, especially among organizations in the NLP exploration stage. According to the survey, 77 percent of all respondents and 65 percent of NLP users stated that they use at least one of four NLP cloud services: Google, AWS, Azure, and IBM. Google Cloud was by far the most used service, cited by 41 percent of all respondents and 49 percent of organizations in the early stages of adopting NLP.
Despite their growing popularity, NLP cloud services do pose some challenges. Organizations exploring NLP cloud services and organizations already using them reported similar difficulties: respondents cited cost as the main challenge, alongside accuracy and customization.
The four most common applications of NLP are document classification, named entity recognition (NER), sentiment analysis, and knowledge graphs. Of these, document classification and NER are by far the most popular among survey respondents. Among organizations using NLP, 63 percent reported using it for document classification and 61 percent for NER. Among organizations exploring NLP, 43 percent reported using it for document classification and 27 percent for NER.
In addition to the top four applications, respondents from the healthcare industry cited de-identification, the process of removing personal information, as another common NLP use case. This indicates that what was once an extremely tedious and time-consuming process has become far less of a burden thanks to automated NLP.
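To make the idea of de-identification concrete, here is a deliberately simplistic sketch that masks a few kinds of personal information with regular expressions. It is an assumption-laden toy, not how the survey respondents or any specific product do it: the patterns, the placeholder labels, the function name `deidentify`, and the sample note are all invented for illustration.

```python
import re

# Illustrative patterns only; real records contain far more variety.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def deidentify(text: str) -> str:
    """Replace each match of the patterns above with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Hypothetical clinical note, invented for this example.
note = "Patient seen on 03/14/2020. Contact: jane.doe@example.com or 555-123-4567."
print(deidentify(note))
# prints: Patient seen on [DATE]. Contact: [EMAIL] or [PHONE].
```

Fixed patterns like these cannot catch patient names, addresses, or free-form identifiers, which is where a trained NER model would typically come in.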
The most common source of data for NLP systems reported by respondents was files (e.g., pdf, txt, docx) and databases, which together accounted for 51 percent of all responses. It is important to note that input documents are often stored as PDFs. Thus, legal contracts, news articles, reports, filings, and medical records would all fall under the files category.
Interestingly, organizations in the early stages of NLP reported using audio (speech-to-text) more frequently than organizations using NLP. For example, 29 percent of respondents in the exploration phase of NLP cited using audio data while only 22 percent of respondents further along the NLP curve cited using audio data.
Over half of respondents reported using one of the top two NLP libraries. Spark NLP was the most commonly used library, cited in a third of the responses, while a little over a quarter of all survey respondents reported using spaCy, and a little less than a quarter reported using AllenNLP. These results varied slightly by industry, with the most popular NLP library being Spark NLP in healthcare, spaCy in technology, and NLTK in finance.
A lot has changed since the beginning of COVID-19. As businesses learn to adapt, new trends will emerge. Although NLP does pose a few challenges, its applications and advantages far outweigh them. Based on NLP’s growth over the past year, it is evident that the technology will continue to grow in popularity throughout 2021. As this year comes to an end and the world begins to emerge from the pandemic, it will be interesting to see how NLP has evolved and impacted businesses across industries.