Data analytics has the potential to be the industry’s game-changer by providing the useful insights necessary to understand future business opportunities. Magesh Rajaram, AVP, analytics, Indium Software, discusses the potential of text analytics, a subset of data analytics, in different sectors and ways to use it more efficiently in a conversation with Ayushee Sharma.
Q. What are the main sectors where data analytics is useful?
A. Executive suite officers from all sectors demand a rationale and a forecast for the actions they have to take. Sectors like BFSI, retail, and telecom were some of the early adopters of analytics as they had structured data to begin with. In healthcare, artificial intelligence (AI) has long been used in research for genome sequencing and, more recently, in tumour detection. In logistics and supply chain management, data science problems such as the travelling salesman problem, resource allocation, and process scheduling are well known. Of late, e-commerce has stepped on the gas, using analytics in everything from merchandising and acquisition to cross-selling, delivery, and forecasting.
Q. Is it a good idea to convert unstructured data to structured?
A. Unstructured data is difficult to analyse compared to structured data, but that is where the treasure lies. Nuggets of insight lie embedded in it and are left unearthed. The difficulty lies in the multiple formats it comes in, like raw text, pages of documents, text with tables, images of text and tables, and others. Machines can only process patterned datasets, so unstructured data has to be converted to a structured form first. One of the problems in structuring unstructured data is separating the signal from the noise. Stemming, lemmatisation, and stop-word removal are some of the techniques that can be used.
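As a rough illustration of the noise-removal step described above, the following Python sketch drops stop words and strips a few common suffixes. It is a minimal sketch, not the method used by the interviewee: the stop-word list, the suffix list, and the function names (`tokenize`, `naive_stem`, `clean`) are all assumptions made for this example, and the toy stemmer is far cruder than established algorithms such as Porter stemming.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in", "are", "was", "were"}

# Suffixes removed by this toy stemmer, checked in this order (hypothetical rules).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip at most one suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [naive_stem(token) for token in tokenize(text) if token not in STOP_WORDS]


cleaned = clean("The customers were complaining about delayed deliveries")
print(cleaned)
```

Note that a crude stemmer produces non-words such as "deliveri"; lemmatisation, which maps a word to its dictionary form, avoids this at the cost of needing a vocabulary.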
Q. What is the importance of technologies like NLP and DL in text analytics?
A. Text analytics involves converting text to vectors and then processing them to derive insights by different methods, including synonym handling, named entity recognition (NER), and summarisation. As interpretation is highly subjective in text analytics, many prebuilt, open source methods are available in natural language processing (NLP) for these tasks.
While NLP helps in these areas, neural networks are used to generate outputs in classification scenarios. These include label recognition, image processing, identifying text portions, and others. In many areas like finding sentiment and extracting text from images, both NLP and deep learning (DL) are used.
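The "text to vectors" step mentioned in this answer can be sketched with a simple bag-of-words count. This is only one of many possible vectorisation schemes, chosen here for brevity; the sample sentences and the helper names (`build_vocabulary`, `vectorize`, `cosine`) are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({token for doc in documents for token in tokenize(doc)})
    return {token: index for index, token in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a count vector with one position per vocabulary token."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[token]] = count
    return vector


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["delivery was late", "late delivery again", "great product"]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]

print(vectors[0])                      # counts for the first sentence
print(cosine(vectors[0], vectors[1]))  # two complaints about late delivery
print(cosine(vectors[0], vectors[2]))  # unrelated sentences
```

Once text is in this numeric form, the two delivery complaints score as similar and the unrelated sentence does not, which is what makes downstream steps such as classification possible.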
Q. How does one ensure the quality of text-analytics software?
A. There are different cases in text analytics. Where there is labelled data, we can directly compare the predicted output with the original output for testing.
In cases of unsupervised models, say topic modelling, manual intervention is required to validate the topics. There are cases like sentiment analysis where there is no definitively correct labelled data.
There we establish agreeable benchmark scores, like 75 per cent accuracy, and try to achieve scores close to them. In cases of predicting trends, we should have training data with annotated trends. Preparing it is manual, time-consuming, and subjective, but this is the way to test.
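The comparison of predicted and original outputs against an agreed benchmark, as described in this answer, might look like the sketch below. The labels and predictions are made-up sample values, and the `accuracy` helper and `BENCHMARK` constant are names assumed for this example; only the 75 per cent figure comes from the interview.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the original (labelled) outputs."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty dataset")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Benchmark agreed with stakeholders (75 per cent, as in the interview).
BENCHMARK = 0.75

# Made-up sentiment labels for eight reviews, purely for illustration.
actual = ["pos", "neg", "neg", "pos", "neu", "pos", "neg", "pos"]
predicted = ["pos", "neg", "pos", "pos", "neu", "pos", "neg", "neg"]

score = accuracy(predicted, actual)
meets_benchmark = score >= BENCHMARK
print(f"accuracy = {score:.2f}, meets benchmark: {meets_benchmark}")
```

Accuracy is only one possible score; for imbalanced classes, measures such as precision and recall are often agreed as the benchmark instead.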
Q. How can the accuracy of training models be improved?
A. All supervised algorithms already know the expected output for each input, so predictions can be compared against it. These models can improve accuracy by performing multiple runs of trial and error.
In neural networks, the accuracy improvement is taken care of by a process called backward propagation (backpropagation), which passes the error between the predicted output and the original output back through the network. Say, for an image, the predicted probability of the label "bike" is 90 per cent when the image actually shows a bicycle. This error is propagated backward so that, over repeated updates, the probability of "bike" moves towards zero per cent for that image.
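The bike and bicycle example can be reproduced numerically with a deliberately tiny sketch. It shows only the output-layer part of backpropagation: the error signal for a softmax output with cross-entropy loss is the predicted probability minus the target, and here it is applied directly to the two output scores rather than being passed back through hidden layers and weights. The starting scores, learning rate, and step count are assumptions chosen so that the initial probability of "bike" is 90 per cent.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_step(logits, target_index, learning_rate):
    """One gradient-descent update of the scores under cross-entropy loss."""
    probs = softmax(logits)
    # For softmax with cross-entropy, d(loss)/d(logit_i) = p_i - y_i.
    grads = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
    return [z - learning_rate * g for z, g in zip(logits, grads)]


LABELS = ["bike", "bicycle"]
logits = [math.log(9.0), 0.0]  # softmax gives 0.9 for "bike", 0.1 for "bicycle"

# Track the probability of the wrong label across repeated updates.
history = [softmax(logits)[0]]
for _ in range(50):
    logits = gradient_step(logits, target_index=1, learning_rate=0.5)
    history.append(softmax(logits)[0])

print(f"P(bike) before training: {history[0]:.2f}")
print(f"P(bike) after 50 updates: {history[-1]:.3f}")
```

Each update lowers the score for "bike" and raises the score for "bicycle", so the probability of the wrong label falls steadily rather than jumping to zero in one step.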
Q. How has Covid-19 impacted the industry?
A. Covid-19 has changed the way we communicate and has pushed companies to look for automation, as manual resources are scarce. Some use cases that have become popular are: replacing telecallers with bots for customer support, classifying tickets and allocating them to the right departments, and identifying the text or objects in images, as in e-KYC, to avoid physical contact with walk-in customers.
Q. What is the scope of data analytics software in India?
A. The Indian data analytics market is valued at two billion dollars and is expected to grow to nearly 16 billion dollars by 2025. Depending on its stage of evolution, every company can be an analytics consumer. In the Indian business landscape, some companies are at the early stage of dashboarding, some use investigative descriptive analysis, some are at the advanced stage of performing predictive analytics, and some are at the highest level, where every action is driven by data science.