Data science is a flexible discipline, capable of providing insights into almost any subject that can be described with data. One of the most prominent subjects that data scientists study is human language. They use language analysis to investigate many different subjects and answer many different questions, drawing on a range of methodologies to do so. This article looks at one particular type of language analysis, known as sentiment analysis.
What is sentiment analysis?
Sentiment analysis provides data scientists with the ability to measure the sentiment contained within a particular text or group of texts (a “corpus”). By classifying the sentiment associated with the words used in a text, data scientists can create an empirical measure of the feelings and attitudes that the text contains. There are multiple different ways to code the sentiment found within a text, but all involve classifying individual words in terms of the tone or feeling that they convey.
What sort of texts does sentiment analysis measure?
In most human communication, the full meaning of a word is understood as a combination of both its common definition and the sentiment or feeling that the word conveys. Because of this, sentiment analysis can be used to analyse most types of human communication. Data scientists need only ensure that they pay attention to context, as the sentiment associated with a word can change depending on where it is used; for instance, “death” is a negative term in everyday contexts, but a neutral descriptor in clinical medicine.
The insights that sentiment analysis can provide vary depending on what’s being analysed. If applied to the collected song lyrics of a musician, a sentiment analysis can provide information about how that musician’s artistic mood changed during their career. If applied to the script of a Game of Thrones episode, a sentiment analysis can provide information about how the episode is structured dramatically. And if applied to social media—a very common use—a sentiment analysis can show how the public is reacting to a given event, or how it feels about a certain public figure, such as a politician.
This last example, the use of sentiment analysis to gauge public attitudes towards a given person, product, brand, organisation, or event, is the most common use of sentiment analysis as a discrete professional service. When used for this purpose, sentiment analysis can provide efficient access to massive amounts of opinion data without the financial and labour expense associated with opinion polling or the size limitations of a focus group.
How does sentiment analysis work?
The general formula for conducting a sentiment analysis is relatively simple, although the actual process for finding useful insights is more complex in practice.
To begin their analysis, a data scientist will use something called a “sentiment dictionary” to code all the words in the text that convey an identifiable sentiment. There is no standardised sentiment dictionary—of the three commonly used, one simply describes words as “positive” or “negative”, one uses a sliding scale from -5 to +5, and one describes words in terms of the specific emotion they convey (e.g., “disgust”, “joy”). A data scientist determines which dictionary to use based on its suitability for the type of analysis they want to perform.
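The coding step described above can be sketched in a few lines of Python. This is a minimal illustration rather than a real sentiment dictionary: the three tiny lexicons, their scores, and the helper names (`tokenize`, `code_words`) are all hypothetical, chosen only to mirror the three dictionary styles mentioned (positive/negative labels, a -5 to +5 scale, and named emotions).

```python
import re

# Hypothetical toy lexicons, invented for illustration only.
POLARITY = {"love": "positive", "good": "positive",
            "hate": "negative", "bad": "negative"}
SCALED = {"love": 3, "good": 2, "hate": -4, "bad": -3}  # -5 to +5 scale
EMOTION = {"love": "joy", "hate": "anger", "rotten": "disgust"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def code_words(text, lexicon):
    """Return (word, code) pairs for words found in the lexicon.

    Words missing from the lexicon carry no identifiable sentiment
    here and are skipped.
    """
    return [(word, lexicon[word]) for word in tokenize(text) if word in lexicon]


text = "I love the good parts, but I hate the bad ending."

polarity_codes = code_words(text, POLARITY)
scaled_total = sum(score for _, score in code_words(text, SCALED))
emotion_codes = code_words(text, EMOTION)

print(polarity_codes)
print(scaled_total)
print(emotion_codes)
```

The same text yields a different kind of measurement under each lexicon, which is why the choice of dictionary depends on the analysis being performed.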
After the initial coding process, the data scientist will analyse the resulting sentiment data through the use of one or more statistical models. The statistical modelling process is the more complex part of the sentiment analysis process, as it requires the data scientist to draw upon their expertise in order to make a subjective judgement about which statistical model will work best for their needs.
There is no single “correct” statistical model to use when conducting data analysis. As the eminent British statistician George Box once said, “all models are wrong, but some are useful”. The quality that makes a data scientist’s expertise valuable is their ability to assess which model will be the most useful for a given analysis. This typically requires exploring multiple potential models in order to determine which one produces the most useful information about the subject being analysed.
An example of sentiment analysis: sentiment trend for Game of Thrones Episode 1 of Season 7
The use of a statistical model to generate useful insights from a sentiment analysis can be seen in the graphic presented above, which represents an analysis of a Game of Thrones script. This analysis coded each word of spoken dialogue as either “positive” or “negative”, with neutral words omitted from the analysis. Without a statistical model applied, this graph would just be two lines of dots—the negative and positive words—providing very little information about the nature of the script. The choice of model is what makes the analysis useful.
In this case, the data has been modelled using a LOESS regression algorithm. This algorithm was not chosen because it’s inherently the “right” choice, but because it produces an easy-to-understand representation of how the sentiment presented in the script rises and falls throughout the episode, making it a useful way to visualise the episode’s dramatic structure.
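To give a sense of what a LOESS-style smoother does with word-level scores, here is a simplified pure-Python sketch. It is not the code behind the graphic, which the article does not describe: the function name `loess`, the `frac` parameter, and the example scores are assumptions made for illustration. It fits a weighted straight line around each point using tricube weights and omits the robustness iterations that library implementations (for example, the `lowess` function in statsmodels) usually offer.

```python
def tricube(u):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at distance 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess(xs, ys, frac=0.5):
    """Smooth ys over xs with locally weighted linear regression.

    For each x, a weighted straight line is fitted to the nearest
    `frac` share of the points and evaluated at x.
    """
    n = len(xs)
    k = max(2, int(round(frac * n)))  # neighbours considered per local fit
    smoothed = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # bandwidth: distance to the k-th nearest point
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw  # weighted mean of x
        my = sum(wi * y for wi, y in zip(w, ys)) / sw  # weighted mean of y
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Hypothetical coded dialogue: +1 for a positive word, -1 for a negative word,
# indexed by the word's position in the script.
positions = list(range(12))
scores = [1, 1, -1, 1, 1, 1, -1, -1, 1, -1, -1, -1]

trend = loess(positions, scores, frac=0.6)
print([round(value, 2) for value in trend])
```

A larger `frac` averages over more of the script and gives a smoother curve; a smaller one follows local swings more closely. Choosing that setting is part of the modelling judgement discussed above.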
Where is it used?
Sentiment analysis is commonly used by firms that want to understand customer/public attitudes towards their organisation or something associated with it—when used in this context, sentiment analysis is often referred to as “opinion mining”. This type of analysis can include both moment-in-time snapshots of reactions to a specific event, as well as studies of how sentiment related to a specific subject may change over time.
For example, if a company releases a new product, it can use sentiment analysis to gain a high-level view of the size and nature of the public’s response. After the initial sentiment analysis is completed, a data scientist can easily add new data in order to see how sentiment shifts over time. This capacity to review public feeling makes sentiment analysis an important strategic tool both for businesses that rely on a satisfied customer base and for politicians who rely on public support.
Sentiment analysis is also important for machine learning programs that handle natural language. Machine learning models typically have trouble interpreting the emotional complexity of human language; however, by leveraging sentiment analysis, data scientists can train machines to understand the role that sentiment plays in creating meaning within a given text.
Who uses it?
Along with other language analysis tools, sentiment analysis is a valuable means of finding common causes of complaint as well as positive feedback related to a product, person, or service. As many companies lack robust data science capabilities, these services are often purchased from third parties. Business intelligence firms that provide sentiment analysis services include SAS and CISION, as well as IBM, which uses sentiment analysis to help power the natural language understanding (NLU) capabilities of its Watson AI platform.
As previously mentioned, audience sentiment analysis also lends itself well to political strategy. For instance, during the 2012 U.S. election, Barack Obama’s campaign utilised sentiment analysis to gauge the response to various policies and messages that the campaign was pushing. Combined with demographic analysis (e.g., age, gender, region), this use of sentiment analysis allows political campaigns to measure how their messaging is being received by specific segments of a population.
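Combining sentiment scores with demographic attributes amounts to grouping the scores by segment and summarising each group. The sketch below assumes each post has already been given a sentiment score; the records, field layout, and the helper `mean_sentiment_by` are hypothetical and are not drawn from any real campaign data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (age_group, region, sentiment_score)
posts = [
    ("18-29", "north", 2),
    ("18-29", "south", -1),
    ("30-49", "north", -3),
    ("30-49", "north", 1),
    ("50+", "south", 4),
    ("18-29", "north", 2),
]


def mean_sentiment_by(records, key_index):
    """Average the sentiment score (last field) within each segment."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return {segment: mean(scores) for segment, scores in groups.items()}


by_age = mean_sentiment_by(posts, 0)
by_region = mean_sentiment_by(posts, 1)

print(by_age)
print(by_region)
```

With real data, each segment would need enough posts for its average to be meaningful, which is a judgement the analyst has to make.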
The use of sentiment analysis is also relevant to fields that don’t need customer/audience analysis services, as even highly standardised professions—such as medicine—tend to rely on a significant amount of sentiment-laden communication. For example, in clinical medicine, sentiment analysis can be used to provide insights into conditions, such as chronic pain, which rely heavily on patients’ subjective descriptions of their symptoms.
Sentiment analysis: a tool for the 21st century
One of the defining features of the 21st century is the remarkable acceleration of the rate at which humans produce new data, and language data is no exception. As ever more content is generated through both online and traditional media channels, sentiment analysis provides the means to understand how populations are being affected by the events occurring within their local and online environments.
A career in data science offers the opportunity to explore these impacts in depth, and the University of New South Wales’ data science programs provide an ideal path towards making that career a reality. UNSW’s flexible online programs include master’s, graduate diploma, and graduate certificate options, with a focus on the statistical and machine learning methodologies that are necessary to make sentiment analysis a truly valuable skill. UNSW is also globally recognised as a top school for mathematics, statistics, and computer science, the core subjects that data science is built on, so you can be confident that your education is top-tier.
Organisations of all stripes and purposes rely on data science to help them understand why people view them the way they do. If you aspire to a career that allows you to uncover new insights into how people are thinking and feeling about the world around them, UNSW data science programs provide the skills necessary to make your aspirations a reality. Find out more about studying a Master of Data Science.