Corpus linguistics: how to justify your hunches when faced with a CEO

Today, we are delighted to have a guest post from Zsófia Demjén. Zsófia is a lecturer at the Open University, and has worked with us for a number of years.

Have you ever wondered how some words or phrases come to have negative associations while others have positive ones? I don’t mean words like crisis or smile, which have negative or positive meanings. I mean seemingly neutral words and phrases like manager or performance, which somehow have a negative aftertaste (if you want the technical term, it’s semantic prosody).

The answer, as is often the case, has to do not with dictionary definitions, but rather with usage: how and in what contexts the words or phrases get used. But how can you find out how a word gets used? As a competent language user, you probably have a hunch or a sense for these things, but sometimes (if you want to convince a client, for example) a more grounded method can be useful. This is one of the ways in which corpus analysis can help.

A corpus is a large, searchable collection of written or transcribed spoken language, organised in a systematic way and stored digitally. One of the best-known corpora (plural of corpus) is the British National Corpus (BNC) of 100 million words, but a newer and larger corpus is the Corpus of Global Web-based English (GloWbE). It covers 20 different varieties of English (including American and British) and contains 1.9 billion words. You can search for a particular word or phrase in these corpora and have hundreds of examples of how it is used at your fingertips in a few seconds. Here are 20 random uses of performance in the UKWaC corpus (another web corpus of British English)[1]:

[Image: concordance of 20 random uses of performance in the UKWaC corpus]

Just eyeballing these examples, you’ll see that performance is associated with the contexts of economics, politics and institutions, as well as the arts, and that it is mostly used when something is being evaluated or measured (e.g. employee performance reviews).

But corpus analysis can do more. A particularly powerful indicator of word usage is collocation – a statistical relationship revealing words or phrases that occur in close proximity to one another more frequently than would be expected by chance. You can also ask the software that holds a corpus to give you information about this. A collocation analysis shows that performance frequently co-occurs with comparatives (better, improved, higher), and more often with negative descriptions (poor) than overtly positive ones.

Although these comparatives are themselves positive, in this context they tend to be used when there is, or was, a need for improvement. Add this to the anxiety associated with being evaluated, and suddenly you can justify that uncomfortable feeling about the word performance. When you have that kind of information to hand, it’s hard not to be convincing!
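For the curious, the idea behind collocation ("more frequently than would be expected by chance") can also be sketched in Python. The example below counts which words fall within a few words of performance, then compares that count with what the overall word frequencies alone would predict, using a simple mutual-information-style score. It is a minimal sketch under assumptions: the function name collocates, the window and min_count parameters and the sample text are invented here, and real corpus software uses its own, more refined statistics on vastly more data.

```python
import math
import re
from collections import Counter


def collocates(text, node, window=2, min_count=2):
    """Rank words found within `window` tokens of `node` by an MI-style score."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    freq = Counter(tokens)

    # Count every word that appears near an occurrence of the node word.
    near = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start, end = max(0, i - window), min(total, i + window + 1)
        for j in range(start, end):
            if tokens[j] != node:
                near[tokens[j]] += 1

    scores = {}
    for word, observed in near.items():
        if observed < min_count:
            continue  # ignore one-off neighbours
        # Co-occurrences predicted from the two word frequencies alone.
        expected = freq[word] * freq[node] / total
        scores[word] = math.log2(observed / expected)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented sample text (an assumption for illustration, not corpus data).
sample = (
    "Poor performance worried the board. The board asked for better "
    "performance. Poor performance again worried the board. "
    "The weather was fine and the board met."
)

for word, score in collocates(sample, "performance"):
    print(f"{word}\t{score:.2f}")
```

A score above zero means the word turns up near performance more often than its overall frequency would predict. With only four sentences the numbers mean very little, which is exactly why large corpora matter.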

About the Author:


Zsófia Demjén is Lecturer in English Language and Applied Linguistics at The Open University. Her research centres on non-literary stylistics, discourse analysis, metaphor, corpus methods and health communication, and she is keen to explore the uses of linguistic analysis in commercial contexts. She has a BSc in Management from UMIST, and an MA in Language Studies and a PhD in Linguistics from Lancaster University. Zsófia works with Verbal Identity in the research phase of our development of Verbal Brand Guidelines.


[1] Concordance created with Sketch Engine.