Yarn Analytics in a nutshell
Yarn Analytics measures and compares vocabulary difficulty on English-language web pages. It's handy if you're a writer, editor or teacher hoping to reach readers with intermediate literacy levels.
Rare words are hard
The frequency with which a word occurs across a broad corpus of texts has been found to be a decent proxy for word difficulty: the less often a word is used, the more difficult it's likely to be.
From 850 million words…
We have built a 100,000-word difficulty index based on how often each word occurs across the 850 million words of the Corpus of Contemporary American English, the Corpus of Historical American English, the British National Corpus, and the Corpus of American Soap Operas (in other words, a large and diverse body of text).
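How the counts from the four corpora are combined isn't spelled out here. As a rough sketch only, assuming each corpus has already been tokenized into lowercase word-to-count mappings and that raw counts are simply summed, pooling them and keeping the most frequent 100,000 words could look like this (`build_frequency_index` is a hypothetical name, not part of Yarn Analytics):

```python
from collections import Counter
from typing import Iterable, Mapping


def build_frequency_index(
    corpus_counts: Iterable[Mapping[str, int]],
    size: int = 100_000,
) -> dict[str, int]:
    """Pool per-corpus word counts and keep the `size` most frequent words.

    Hypothetical helper: assumes each corpus is already tokenized into
    lowercase word -> count mappings and that raw counts are simply summed
    (no per-corpus normalization).
    """
    pooled = Counter()
    for counts in corpus_counts:
        # Counter.update adds counts rather than replacing them
        pooled.update(counts)
    # most_common returns (word, count) pairs, highest count first
    return dict(pooled.most_common(size))


# Usage: two tiny "corpora", keeping only the top two words
index = build_frequency_index(
    [{"the": 50, "cat": 3}, {"the": 40, "sat": 2, "cat": 1}], size=2
)
print(index)  # {'the': 90, 'cat': 4}
```

Summing raw counts lets the largest corpus dominate; normalizing each corpus to a per-million rate before pooling would be an equally plausible choice.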
Using a logarithmic mapping to smooth out the raw data a little, we give each word in the Yarn index a difficulty score on a scale of 1 to 100. We then take a weighted average of the difficulty scores of the words in the main content of each web page we index to arrive at the score you see.
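The exact mapping and weights aren't published here, so the following is only an illustrative sketch. It assumes the most frequent word scores 1 and the least frequent scores 100, with every other word placed linearly on a log-frequency axis. For the page score, it assumes a simple regex tokenizer, skips words missing from the index, and uses a caller-supplied `weight` function that defaults to 1 (which reduces to a plain per-token mean). `difficulty_scores` and `page_score` are hypothetical names.

```python
import math
import re
from typing import Callable, Mapping


def difficulty_scores(freqs: Mapping[str, int]) -> dict[str, float]:
    """Map word frequencies onto a 1-100 difficulty scale on a log axis.

    Assumption: the most frequent word scores 1, the least frequent scores
    100, and everything in between is placed linearly in log(frequency).
    """
    logs = {w: math.log(f) for w, f in freqs.items()}
    hi, lo = max(logs.values()), min(logs.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all counts are equal
    return {w: 1 + 99 * (hi - lf) / span for w, lf in logs.items()}


def page_score(
    text: str,
    scores: Mapping[str, float],
    weight: Callable[[str], float] = lambda w: 1.0,
) -> float:
    """Weighted average difficulty of the indexed words in `text`.

    Hypothetical: tokenization is a lowercase regex, words not in the index
    are skipped, and an empty result scores 0.0.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w in scores]
    if not words:
        return 0.0
    total = sum(weight(w) for w in words)
    return sum(weight(w) * scores[w] for w in words) / total


scores = difficulty_scores({"the": 1000, "cat": 10, "ubiquitous": 1})
# "the" -> 1.0, "cat" -> about 67, "ubiquitous" -> 100.0
print(page_score("The cat, the cat!", scores))  # about 34
```

The logarithm matters because word frequencies are extremely skewed: on a linear scale, almost every word would be squeezed into the "hard" end, whereas log frequency spreads common, mid-range and rare words more evenly across 1 to 100.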
A lexical curio
Yarn Analytics is not meant to be taken too seriously: after all, the difficulty of a text depends on a lot more than the difficulty of its individual words. We hope you find it thought-provoking and fun nonetheless.