If you interact with customers directly, it's usually crystal clear when they are unhappy with your products or service.
Their faces turn purple? Check. They are unhappy.
They throw rotten eggs at your shop? Then, they're probably even more unhappy, and you might wish you were running a 100% online business instead.
Joking aside—
For many businesses, measuring customer emotions and sentiment isn't straightforward, especially at scale. Occasionally, you can use a point-based system like CSAT. But a satisfaction rating isn’t really a measure of sentiment. These methods and customer satisfaction metrics fail to capture the nuanced emotions and underlying reasons behind customer satisfaction (or dissatisfaction).
This is where AI-powered sentiment analysis becomes indispensable. Rather than having someone read through endless text, businesses can harness AI to transform data such as emails and reviews into actionable insights.
In this guide you’ll learn:
Finally, you’ll find out more about the benefits of using modern AI solutions over traditional lexicon-based frameworks and polarity scores.
Let’s start with the key concepts and methods used in sentiment analysis.
Sentiment analysis, sometimes referred to as opinion mining, is the process of detecting subjective attitudes, opinions, and feelings in text data using natural language processing (NLP), machine learning, and AI. It goes beyond just determining if a piece of writing is positive, negative or neutral. Advanced sentiment analysis can identify precise emotions like anger, sarcasm, confidence or frustration.
NLP sentiment analysis shouldn’t be confused with AI facial expression recognition. While both are used to detect emotions, they analyze different types of data and employ different technologies. Sentiment analysis uses text and foundation models while facial recognition uses visual data, such as videos, and computer vision models capable of landmark detection.
In the context of customer data, sentiment analysis allows companies to automatically analyze customer reviews, social media mentions, support conversations and other subjective feedback at scale. This provides quantified insights into how customers authentically perceive products, services and experiences.
Many companies are now using AI to monitor customer sentiment in real-time across various channels, allowing for immediate response and engagement. It's like having a bunch of digital mood trackers on standby. This technology is being integrated into chatbots and virtual assistants to enhance their ability to handle customer inquiries with empathy and effectiveness.
The rise of cloud computing and SaaS platforms has made advanced AI technologies available to businesses of all sizes without the need for extensive in-house infrastructure. You can copy and paste text fragments into your sentiment analysis tool or import data from CSV files or PDFs and export the results. Additionally, advanced solutions enable setting up automated workflows using APIs, webhooks, or platforms like Zapier to keep the data flowing smoothly. This allows you to perform real-time analysis and feed text to your AI models.
Alternatively, you can use GenAI models or Python packages for natural language processing to run your own analysis locally or in environments like Google Colab. This approach allows for more control and customization of your sentiment analysis tasks but may be harder to implement and potentially costly if you want to use the newest iterations of LLMs (Large Language Models) like Claude or ChatGPT. We’ll discuss different methods in more detail in our AI sentiment analysis tutorial below.
The core process for AI-based sentiment analysis involves feeding sample text data into natural language processing models that can understand the subjective meaning and context behind the words. The sentiment is usually classified as positive, negative, or neutral.
Now, “AI” can mean many different things, and there is a huge difference between “AI” that simply means complex hand-written algorithms and AI proper, powered by deep learning networks. For example, here is the RoBERTa-base sentiment detection model, trained on over 120M tweets, in action:
This model uses transformer neural networks that are very good at analyzing text data. It can classify sentences and assign sentiment scores.
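To make “assign sentiment scores” a bit more concrete: classifiers of this kind typically end in a layer that emits one raw number (a logit) per label, and a softmax function turns those numbers into scores that sum to 1. Here is a minimal sketch of that last step. The logits and the label order are made up for illustration and are not actual outputs of the RoBERTa model.

```python
import math

LABELS = ["negative", "neutral", "positive"]  # assumed label order


def softmax(logits):
    """Convert raw logits into scores that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (best_label, {label: score}) for one sentence's logits."""
    scores = dict(zip(LABELS, softmax(logits)))
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical logits for a clearly positive sentence.
label, scores = classify([-1.2, 0.3, 2.9])
print(label, {k: round(v, 3) for k, v in scores.items()})
```

The label with the highest score becomes the predicted sentiment, and the score itself can be read as the model's confidence in that label.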
On the surface level, the results look very similar to scores generated by rule-based and lexicon-based solutions. However, there is a huge difference regarding what happens under the hood of different solutions.
We will discuss the meaning of different sentiment scores further in the guide, but, for now, let’s take a look at some outputs of some popular sentiment analysis tools.
Notice how “I love the new design of your website!” analyzed with TextBlob has a lower positive sentiment score than “The product is okay, but could be better.”
TextBlob calculates the overall sentiment polarity by averaging the polarity scores of individual words, like verbs and adjectives, which carry sentiment information. The overall sentence polarity is to some extent also influenced by the context or word combinations.
In the lexicon that powers TextBlob sentiment analysis, words have two additional parameters: confidence and intensity. These scores can modify the strength of the sentiment, reflecting how strongly a word expresses an emotion.
While sounding quite complex and advanced, this method is still very limited. Observe how in the sequence “could be better,” the last word has been misinterpreted as an instance of positive sentiment, not as dissatisfaction.
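The averaging behavior described above can be sketched in a few lines. This is a toy version, not TextBlob's actual code: the lexicon and its polarity values are hypothetical, and intensity, confidence, and negation handling are left out.

```python
import re

# Hypothetical mini-lexicon: word -> polarity in [-1, 1].
# Values are illustrative only, not TextBlob's real lexicon.
TOY_LEXICON = {
    "love": 0.5,
    "new": 0.14,
    "okay": 0.5,
    "better": 0.5,
    "terrible": -1.0,
}


def average_polarity(text, lexicon=TOY_LEXICON):
    """Average the polarity of every word that appears in the lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


praise = average_polarity("I love the new design of your website!")
complaint = average_polarity("The product is okay, but could be better.")
print(round(praise, 2), round(complaint, 2))
```

With these toy values, the complaint ends up with the higher score, because “okay” and “better” are both counted as positive words while the mildly positive “new” drags the average of the genuinely enthusiastic sentence down.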
Sentiment analysis techniques have advanced significantly, from traditional tools such as NLTK, TextBlob, and VADER to more sophisticated approaches using transformer-based models like GPT-4.
Here is an example of setting up two instances of ChatGPT that extract information from online reviews:
If you want to replicate this setup, you can start here.
In the table below you can find an overview of popular sentiment analysis techniques.
Now, let's take a closer look at each of the methods.
Some of the frameworks presented below may be less accurate than using LLMs. On the other hand, they are established sentiment analysis techniques widely used both in business and scientific research.
NLTK (Natural Language Toolkit) is one of the oldest and most widely used libraries in the field of NLP. It provides tools for various text processing tasks, including tokenization, tagging, parsing, and, importantly, sentiment analysis. TextBlob, built on top of NLTK and the Pattern library, offers a simple API for common NLP operations, making sentiment analysis accessible with minimal code.
Here is an example Python script that analyzes a sentence with TextBlob:
As we mentioned before, TextBlob uses a lexicon-based approach for sentiment analysis, which involves assigning predefined sentiment scores to words. This method is straightforward and effective for many applications but can struggle with the nuances of human language, such as sarcasm or context-specific meanings.
VADER (Valence Aware Dictionary and sEntiment Reasoner) is another popular tool for sentiment analysis, particularly suited for social media texts. It is lexicon and rule-based, designed to handle the informal and often noisy nature of social media content. VADER computes a compound score that combines the valence of individual words into a single sentiment score. It also provides detailed scores for positive, negative, and neutral sentiment components.
VADER excels in scenarios where texts are short and contain slang, abbreviations, and emoticons. Its simplicity and efficiency make it a valuable tool for real-time sentiment analysis.
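To give a feel for how a compound score can be derived, here is a simplified sketch. As far as we can tell, the open-source VADER implementation squashes the summed word valence x into the range -1 to 1 with x / sqrt(x² + α), using α = 15 by default. Treat that detail as an assumption to verify against the library. The valence table below is a hypothetical stand-in for the real lexicon, and VADER's rules for negation, punctuation, and capitalization are skipped entirely.

```python
import math

# Hypothetical per-word valences on a roughly -4..4 scale.
TOY_VALENCES = {"great": 3.1, "love": 3.2, "bad": -2.5, "meh": -0.3}


def normalize(total_valence, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return total_valence / math.sqrt(total_valence ** 2 + alpha)


def compound_score(text, valences=TOY_VALENCES):
    """Sum the valence of known words, then normalize the total."""
    words = text.lower().split()
    total = sum(valences.get(w.strip(".,!?"), 0.0) for w in words)
    return normalize(total)


print(round(compound_score("Great phone, love it!"), 3))
print(round(compound_score("Bad battery."), 3))
```

Because of the normalization, piling up more positive words pushes the score closer to 1 without ever exceeding it.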
Naive Bayes is a probabilistic classifier based on Bayes' theorem, often used for text classification tasks, including sentiment analysis. It assumes that the features (words in the case of text classification) are independent of each other given the class label. Despite this 'naive' assumption, Naive Bayes classifiers perform surprisingly well in many NLP tasks.
For sentiment analysis, Naive Bayes requires a labeled dataset to train on, where texts are annotated with their respective sentiment labels (positive, negative, neutral). Once trained, the classifier can predict the sentiment of new, unseen texts. This approach allows for flexibility and customization based on the specific dataset but requires manual feature extraction and significant preprocessing.
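A minimal from-scratch sketch of this idea is shown below. It is a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, trained on a tiny dataset invented for illustration. The class name and the whitespace tokenization are our own simplifications, and a real project would more likely rely on an established library and a much larger labeled dataset.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of texts
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        # Words never seen during training are ignored.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_logp = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # log P(label) + sum of log P(word | label)
            logp = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                logp += math.log((self.word_counts[label][w] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Tiny hand-labeled training set, invented for illustration.
train_texts = [
    "great product love it",
    "love the fast delivery",
    "terrible support very slow",
    "slow and broken product",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("love the product"))
print(model.predict("slow support"))
```

Each word contributes to the score independently of its neighbors, which is exactly the “naive” assumption mentioned above.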
It is worth noting that dataset labeling is more relevant to computer vision. For example, DICOM labeling can help train better AI radiology models that detect diseases and lesions. If you want to improve the performance of your AI sentiment classifier tool, it may be better to use prompt engineering techniques or RLHF (reinforcement learning from human feedback).
The advent of transformer-based models has revolutionized the field of NLP, including sentiment analysis. Models like BERT (Bidirectional Encoder Representations from Transformers), GPT-3, and GPT-4 from OpenAI represent a significant leap in understanding human language and generating human-like responses.
GPT-4 and similar models are pre-trained on vast amounts of text data and fine-tuned for specific tasks, including sentiment analysis. These large language models, known as LLMs, leverage self-attention mechanisms to capture contextual relationships between words in a sentence, allowing them to understand nuances, handle long-range dependencies, and recognize context-specific meanings.
Consider the example below:
The prompt for our AI tool was:
Right off the bat, the numeric score was much better than the scores from other tools:
However, the real magic happens when we explicitly tell GPT to explain the reasoning behind the score and then to adjust it. In our example, the revised sentiment polarity score seems even more accurate. This technique can significantly improve the quality of our analysis and surface nuances that a single-pass score misses.
Additionally, if we want to get more structured responses that are going to be easier to parse, we can limit our LLM outputs to specific multi-select options. Here is an example:
Now the results for these three images are based on a fixed list.
As you can see, with AI we can get much better results and still make the outputs controllable and easy to format.
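If you call an LLM from your own code, the same fixed-list idea can be enforced on both sides: in the prompt and when parsing the reply. The sketch below is only an illustration. The option list, the prompt wording, and the function names are our own assumptions, and the actual model call is deliberately left out, so the replies are hard-coded strings.

```python
# Assumed list of allowed options; adjust to your own use case.
ALLOWED_OPTIONS = ("joy", "anger", "frustration", "confidence", "neutral")


def build_prompt(review):
    """Ask the model to answer only with options from the fixed list."""
    options = ", ".join(ALLOWED_OPTIONS)
    return (
        "Which emotions does the review below express?\n"
        f"Answer with a comma-separated list using only: {options}.\n\n"
        f"Review: {review}"
    )


def parse_labels(raw_reply):
    """Return the selected options, or None if anything is off-list."""
    parts = [p.strip().strip(".\"'").lower() for p in raw_reply.split(",")]
    parts = [p for p in parts if p]
    if not parts or any(p not in ALLOWED_OPTIONS for p in parts):
        return None
    return parts


# Hard-coded replies standing in for real model output.
print(parse_labels("Anger, Frustration."))
print(parse_labels("anger, disappointment"))
```

Rejecting any reply that contains an off-list value keeps downstream reports and spreadsheets clean, and a rejected reply can simply be retried.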
Advantages of using LLMs for sentiment recognition:
Now, the only downside is that the initial polarity “score” provided by LLMs, while more accurate, is not very rigid. It is not calculated in the same way as it is with rule-based sentiment detection tools. It is more of a numeric approximation that can vary from request to request. This means that in some cases it is going to be slightly higher or lower, while with traditional lexicon-based polarity scores we get predictable and repeatable results. Very often the traditional sentiment scores are completely wrong, but they are always wrong in exactly the same way. So, keep reading to find out more about different metrics for sentiment analysis.
Two popular metrics used in sentiment analysis are the polarity score and the compound sentiment score.
Sentiment polarity score
The polarity score measures the positivity or negativity of a text, indicating the sentiment. A high positive score means positive sentiment, while a negative score indicates negative sentiment. The subjectivity score, on the other hand, measures how subjective or objective a text is. A high subjectivity score suggests the text is based on personal opinions and feelings, while a low score indicates it is more objective and factual.
In the context of TextBlob, the polarity score ranges from -1 to 1, where -1 indicates a very negative sentiment, 1 indicates a very positive sentiment, and 0 is neutral. The subjectivity score ranges from 0 to 1.
Compound sentiment score
The compound sentiment score is a more comprehensive metric, often associated with the VADER sentiment analysis tool. This score also ranges from -1 to 1 but is calculated differently.
It considers the sentiment of individual words, the intensity of these sentiments, and the context in which they appear.
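In practice, a compound score is often turned back into a simple label. A commonly cited convention for VADER is to treat scores of 0.05 and above as positive and scores of -0.05 and below as negative. The sketch below assumes those cut-offs, and both the function name and the threshold parameter are our own, so tune them to your data.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for score in (0.62, 0.01, -0.37):
    print(score, label_from_compound(score))
```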
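VADER produces four key metrics: positive, neutral, and negative scores, which describe the proportion of the text that falls into each category, and the compound score described above.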
Visualization of sentiment analysis results is crucial for easy interpretation and actionable insights. Here are some common methods to visualize sentiment polarity scores:
Bar charts are effective for comparing the frequency of positive, negative, and neutral sentiments across different data points. For instance, you can visualize the number of positive, negative, and neutral reviews over time.
Word clouds highlight the most frequently occurring words in a dataset, with the size of each word representing its frequency. Color coding these words based on their sentiment can provide a quick visual overview of prevalent sentiments.
Pie charts or stacked bar charts can be used to show the distribution of sentiments within a dataset. This provides a clear picture of the overall sentiment landscape.
Plotting sentiment scores over time can reveal trends and patterns. This is particularly useful for tracking changes in customer sentiment following a new product launch or a significant event.
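Before you can plot a trend, the individual scores need to be grouped by time period. Here is a minimal sketch that averages polarity scores per month using only the standard library. The review dates and scores are hypothetical, and the resulting values could be passed to any charting tool.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical (date, polarity) pairs, e.g. one per customer review.
scored_reviews = [
    (date(2024, 1, 5), 0.6),
    (date(2024, 1, 20), 0.2),
    (date(2024, 2, 3), -0.4),
    (date(2024, 2, 18), -0.1),
    (date(2024, 3, 9), 0.5),
]


def monthly_average(rows):
    """Group scores by (year, month) and average each group."""
    buckets = defaultdict(list)
    for day, score in rows:
        buckets[(day.year, day.month)].append(score)
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


for (year, month), avg in monthly_average(scored_reviews).items():
    print(f"{year}-{month:02d}: {avg:+.2f}")
```

A dip like the one in the second month of this made-up data is the kind of pattern worth checking against product launches or other significant events.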
To integrate sentiment analysis into your business processes, follow these steps:
AI sentiment analysis has evolved from traditional lexicon and rule-based methods to advanced transformer models. While tools like NLTK, TextBlob, and VADER provide accessible and efficient solutions for many sentiment analysis tasks, modern models like GPT-4 offer superior performance by leveraging deep contextual understanding and nuanced language processing. Choosing the right method depends on the specific requirements, available resources, and desired accuracy of the sentiment analysis task at hand.
If you want to prepare a report or set up a sentiment analysis workflow powered by AI, V7 Go is a great app to start with. You can use it to orchestrate LLMs and extract critical insights from text, CSVs, or even screenshots.
Also, find out more about sentiment analysis in AI and other applications of LLMs in business here: