Guest Blog: “In Defence of Sentiment Analysis: The Wrong Kind of Snow” by Diana Maynard
On the one hand, sentiment analysis is one of the hottest topics in text analytics right now. The rise of social media provides a gold mine for companies that want to know what their customers think about them, their products, and those of their rivals. Politics is another winner: find out not only what people think about your party and your policies, but how particular events might affect these opinions. And as for the stock market: find ways to predict performance based on what people are talking about on Twitter and how happy or sad they are, and you’re laughing all the way to the bank (as exemplified in Robert Harris’ novel “The Fear Index”, where the artificial intelligence based on human emotion is not as fictional as you might expect, and which, incidentally, is a cracking read).
On the other hand, sentiment analysis has come under a lot of stick recently, and it’s not hard to see why. There are dozens of tools and services which offer to analyse your data for you, but a quick demo of sites such as Sentiment140 soon shows their failings: many are based on matching sentiment-containing words, with a few rules or some machine learning thrown on top, but most can’t handle the subtleties of English, and many don’t consider the target of the opinion correctly. For example, being sad about a famous person’s death does not indicate a negative sentiment towards the actual person.
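To see why bare word-matching goes wrong, here is a toy sketch (not the method of any particular tool; the word lists are illustrative): the scorer counts sentiment-bearing words without ever asking what the sentiment is *about*, so a sympathetic tweet about a death comes out as negative.

```python
# Toy word-matching sentiment scorer (illustrative only; real tools
# are more sophisticated, but share this basic weakness).
POSITIVE = {"love", "great", "happy", "wonderful"}
NEGATIVE = {"sad", "terrible", "hate", "awful"}

def naive_sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# The tweet expresses sympathy towards the person, not dislike,
# yet word matching alone labels it negative.
print(naive_sentiment("so sad to hear about his death"))  # negative
```

The missing ingredient is the *target* of the opinion: the sadness is directed at the event, not the person, and nothing in a flat word count can tell the two apart.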
Sentiment analysis isn’t just about deciding whether a sentence, tweet, or review is positive or negative. It can also cover things like more fine-grained emotion detection, which is often not just more useful, but also more successful. Knowing people are concerned about the safety of your aeroplane, for example, is far more interesting than just knowing they’re unhappy. There’s a growing body of work in this area: mood has proved more useful than sentiment for things like stock market prediction (fluctuations are driven mainly by fear rather than by happiness or sadness).
The wrong kind of snow

People are wary of sentiment analysis tools because while they may claim to have accuracy around 80-90%, in practice, performance is often much lower when applied to one’s own data. As with most NLP tools, sentiment analysis systems need to be tailored to the domain and/or application, and this doesn’t just mean retraining on a new set of data. Sentiment lexicons are often domain-specific: “dry” could be positive when describing a waterproof jacket, but negative when describing a cake. Expecting tools to perform as well in different conditions is like expecting British trains to run on time during heavy snow: they’re simply not cut out for it, and even when snow is predicted, it may not be the right type.
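The “dry” example can be made concrete with a minimal sketch of a domain-conditioned lexicon (the domain names and polarity scores here are hypothetical, just to show the shape of the idea): the same word gets an opposite score depending on which lexicon is in play.

```python
# Hypothetical domain-specific lexicons: the same word ("dry")
# carries opposite polarity in different product domains.
DOMAIN_LEXICON = {
    "outdoor_clothing": {"dry": 1, "waterproof": 1, "leaky": -1},
    "baked_goods": {"dry": -1, "moist": 1, "stale": -1},
}

def score(text: str, domain: str) -> int:
    """Sum the polarity of known words under the given domain's lexicon."""
    lexicon = DOMAIN_LEXICON[domain]
    return sum(lexicon.get(w, 0) for w in text.lower().split())

print(score("the jacket kept me dry", "outdoor_clothing"))  # 1
print(score("the cake was dry", "baked_goods"))             # -1
```

A tool trained (or hand-tuned) on one domain carries its lexicon with it, which is exactly why accuracy drops when it is dropped into another domain unmodified.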
There are also a bunch of other factors to take into account.
First, if you’re studying market pulse, as opposed to customer feedback, aggregate sentiment is often sufficient. The fact that 20% of your data might be wrongly labelled doesn’t actually matter as long as the general trends are captured. In particular, opinion dynamics can show what events lead to changes in opinion, which can be more interesting than knowing whether Margaret from Scotland liked the pink one better than the blue one, or whether John from Southport was being sarcastic about gnomes. According to Brendan O’Connor from CMU, in his work on predicting US election outcomes from tweets, if you’re only trying to find aggregates then even basic techniques can be sufficient, even though they’re far from perfect: “Although the error rate can be high, with a fairly large number of measurements, these errors will cancel out relative to the quantity we are interested in, estimating aggregate public opinion”.
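The “errors cancel out” point can be checked with a little arithmetic (a simplified model, assuming the classifier flips labels symmetrically in both directions): with per-document error rate e, the expected measured positive rate is p·(1−e) + (1−p)·e, which is a monotonic function of the true rate p. So even at 20% error, a rise in true opinion still shows up as a rise in the estimate.

```python
# Simplified model: a classifier with symmetric error rate e.
# The measured positive rate is a linear, monotonic function of
# the true rate p, so aggregate trends survive the noise
# (and p is recoverable as (measured - e) / (1 - 2e) when e < 0.5).
def measured_rate(p: float, e: float) -> float:
    return p * (1 - e) + (1 - p) * e

# Hypothetical figures: true positive opinion rises from 40% to 55%
# between two weeks, observed through a 20%-error classifier.
week1, week2, error = 0.40, 0.55, 0.20
m1, m2 = measured_rate(week1, error), measured_rate(week2, error)
print(round(m1, 2), round(m2, 2))  # 0.44 0.53 -- the upward trend is preserved
```

The estimate is compressed towards 50%, but the direction and relative size of the shift are intact, which is all an aggregate trend analysis needs.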
Second, there are many other things to consider apart from whether your system is 100% accurate. Humans can typically only perform sentiment analysis accurately on the same texts around 80% of the time. Add to that the advantage of speed (humans could never process the sheer volume of data that machines can, and they’d soon go mad if you gave them such a task). Seth Grimes, a leading expert in text analytics, believes that “sentiment and other human-language analysis technologies, when carefully applied, can deliver super-human accuracy”. Admittedly, the technology needs to be improved to reach this level, but he’s not the only one who believes it’s entirely achievable. And the best sentiment analysis tool for your needs may not always be the one with the highest accuracy: business impact is typically more critical, and may not have a direct correlation with accuracy.
Third, most text analytics tasks are almost impossible for a machine to perform with 100% accuracy. Google Translate isn’t 100% accurate either, but that doesn’t mean it’s not useful. Named entity recognition is at the core of almost any text mining task: information extraction, summarisation, event detection, and so on, and yet it typically achieves only 90% accuracy at best, and often significantly lower on forms of social media such as tweets.
And on the subject of sarcasm: this is often cited as one of the main reasons why sentiment analysis will never be any good. Even humans struggle to detect sarcasm and irony, so what hope do computers possibly have? How on earth do we know if “I really love Fridays at work” is sarcastic or not? It turns out, however, that sarcasm isn’t actually that common in social media, and rarely frequent enough to tip the balance when you’re looking at aggregate data. It’s also used much more frequently in some situations and by some types of people than others: frequently in politics and general chat; rarely in book and film reviews or technology discussions. It also turns out that, precisely because humans are so bad at identifying sarcasm in written text, people who use it in social media tend to give some clues, such as hashtags or smileys (#sarcasm, #notreally, #onlyjoking etc.). And there are techniques which can be used to detect sarcasm, as demonstrated by the French company Spotter, whose state-of-the-art sarcasm detection tool is reputed to achieve 80% accuracy.
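The hashtag-clue observation above lends itself to a very simple first-pass filter (a minimal sketch; the marker list is illustrative, not exhaustive, and real sarcasm detectors go well beyond pattern matching):

```python
import re

# Explicit sarcasm markers of the kind mentioned above.
# The list is illustrative; a real system would use a much
# richer set of cues alongside these hashtags.
SARCASM_MARKERS = re.compile(r"#(sarcasm|notreally|onlyjoking)\b", re.IGNORECASE)

def has_sarcasm_marker(tweet: str) -> bool:
    """Flag tweets that self-label as sarcastic via a known hashtag."""
    return bool(SARCASM_MARKERS.search(tweet))

print(has_sarcasm_marker("I really love Fridays at work #sarcasm"))  # True
print(has_sarcasm_marker("I really love Fridays"))                   # False
```

Self-labelled examples like these are also exactly what researchers use as (noisy) training data for detecting the unmarked cases.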
In summary, it’s true that there are lots of linguistic and social quirks that fool sentiment analysis tools. But that doesn’t mean we shouldn’t use them, or that we shouldn’t be researching ways to improve them. What is critical, however, is to use the right tool for the right job. Caveat emptor.
About the Author:
Dr Diana Maynard is a Research Fellow in the Department of Natural Language Processing at the University of Sheffield, UK, where she is the lead computational linguist on the GATE team. She has a PhD in automatic term recognition from Manchester Metropolitan University, and has been involved in NLP research since 1994. Her main interests are in information extraction, opinion mining and sentiment analysis, social media, terminology and semantic web technologies. She has worked on GATE since 2000, leading the development of its multilingual Information Extraction tools, and has worked on a number of UK and EU projects. She is currently developing tools in GATE for social media analysis, including opinion mining and sarcasm detection.