Linguistics can sharpen your text analytics

We all use nouns to search on Twitter. But are pronouns more useful?

Last year, the New York Times reported a series of incidents where Hasidic Jewish men had refused to sit next to women on flights.

To write the article, they wanted to interview people who had been on the flights and witnessed the incidents, so they turned to Twitter to find them.


How would you go about this?

Perhaps you’d start with the words ‘Hasidic’ and ‘flight’. But that search returns a haystack of unrelated posts: “2-month-old Hasidic boy dies after brother drops him down flight of stairs.” “Hasidic man dies mid flight from Florida to New York.”

The problem is, no one has time to sift through every single comment that includes these two nouns.

Sure, you could filter it further with nouns like ‘plane’ and ‘Jewish’, but that still doesn’t filter out all the noise. And the more nouns you add, the more comments you’ll miss that don’t include those nouns.

So, if nouns by themselves are too blunt a search tool, how do we sharpen them?

Well, we’re looking for people’s personal accounts of what happened to them. And when things get personal, people use pronouns.

The smart folk at the New York Times realised this, and found witnesses by adding ‘me’ or ‘my’ to their searches.

“Wow, this Hasidic man on my flight is really refusing to sit next to me. #thisisafirst”

“The Chasidic Jew on my flight requested a no woman seat. I am watching this go down right now.”
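For readers who want to try the idea on their own data, here is a minimal Python sketch of a noun-plus-pronoun filter. It is not the New York Times' actual method, which the article does not describe beyond adding ‘me’ or ‘my’. The word lists, the function names and the inclusion of the ‘Chasidic’ spelling are assumptions made for illustration, and the sample posts are the four quoted above.

```python
import re

# Hypothetical word lists for illustration; tune them for your own search.
# Each noun group must be matched by at least one of its spellings.
TOPIC_NOUNS = [{"hasidic", "chasidic"}, {"flight"}]
PRONOUNS = {"me", "my"}


def tokens(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def matches_topic(text):
    """True if the text contains a word from every noun group."""
    words = tokens(text)
    return all(words & group for group in TOPIC_NOUNS)


def is_personal(text):
    """True if the text contains a first-person pronoun from PRONOUNS."""
    return bool(tokens(text) & PRONOUNS)


def find_witnesses(posts):
    """Keep posts that match the topic nouns AND sound personal."""
    return [p for p in posts if matches_topic(p) and is_personal(p)]


POSTS = [
    "2-month-old Hasidic boy dies after brother drops him down flight of stairs.",
    "Hasidic man dies mid flight from Florida to New York.",
    "Wow, this Hasidic man on my flight is really refusing to sit next to me. #thisisafirst",
    "The Chasidic Jew on my flight requested a no woman seat. I am watching this go down right now.",
]

noun_only = [p for p in POSTS if matches_topic(p)]
witnesses = find_witnesses(POSTS)
print(len(noun_only), "posts match the nouns;", len(witnesses), "also use 'me' or 'my'")
```

On these four posts, the nouns alone match everything, while the pronoun check keeps only the two first-hand accounts. Matching whole words rather than substrings matters here, since ‘me’ also appears inside words like ‘comment’.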

A simple observation, but a general lesson in text analytics: don’t ignore the little words. People use them for a reason.

 

We understand language like other people understand numbers. We use both to uncover patterns in text data such as customer surveys and online reviews, as well as social media. If you’re interested in hearing what people are saying about your brand, email Chris.