Research behind the dictionary

Learn These Words First implements a layered monolingual dictionary.

The first layer (Lessons 1 and 2) consists of words representing 61 universal concepts expressed in all languages. This set of "semantic atoms" is based on the Natural Semantic Metalanguage (NSM), developed over the last three decades by Anna Wierzbicka and Cliff Goddard.

The 34 middle layers consist of 300 "semantic molecules" (Lessons 3 through 12). Words in each layer are defined using only the words from the previous layers. This sequence of layers is based on dependency-graph analysis of the non-circular NSM-LDOCE research dictionary.

Bullock, David. 2011. "NSM + LDOCE: A Non-Circular Dictionary of English." International Journal of Lexicography 24(2): 226–240. Oxford: Oxford University Press.
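
The layering rule (each word is defined only with words from earlier layers) can be checked mechanically. The following Python sketch is illustrative only. The data model, the function name `layering_violations` and the toy words are hypothetical and are not taken from the dictionary or the NSM-LDOCE study. `layers` is a list of sets of words, with the semantic atoms in layer 0, and `definitions` maps each defined word to the set of words its definition uses.

```python
def layering_violations(layers, definitions):
    """Return (word, used_word) pairs where a definition uses a word that
    is not in any earlier layer. Hypothetical toy data model."""
    known = set(layers[0])  # layer 0: semantic atoms, which have no definitions
    violations = []
    for layer in layers[1:]:
        for word in sorted(layer):
            for used in sorted(definitions[word]):
                if used not in known:
                    violations.append((word, used))
        # Words become available only after their whole layer is checked,
        # so a word may not use another word from its own layer.
        known |= layer
    return violations


# Toy example (made-up layers and definitions, for illustration only)
layers = [
    {"someone", "something", "do", "body", "part", "see"},
    {"eye"},
    {"cry"},
]
definitions = {
    "eye": {"part", "body", "someone", "see"},
    "cry": {"someone", "eye", "do", "something"},
}
print(layering_violations(layers, definitions))  # []
```

An empty list means every definition respects the layer order; a pair such as `("eye", "cry")` would flag a word that uses a later (or same-layer) word.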

The next layer in Learn These Words First is an alphabetical reference section containing definitions for the 2000 words in the Longman Defining Vocabulary, each defined using only the 360 "atoms" and "molecules" from the lessons.

(The Longman Dictionary of Contemporary English can be considered the final layer, since every word is defined using only the 2000-word defining vocabulary.)

Longman Defining Vocabulary

One way to reduce circularity in dictionary definitions is through the use of a controlled vocabulary. In the Longman Dictionary of Contemporary English (LDOCE), the definitions for over 80,000 words and phrases are written using only the central senses of around 2000 words in the dictionary's core defining vocabulary. This core vocabulary was developed from the General Service List of high-frequency words and their most common meanings (West, Michael. 1953. A General Service List of English Words. London: Longman).

In LDOCE definitions, each defining word is used only in its most frequent word classes and senses, and never idiomatically. A reader who understands the 2000 words of the core defining vocabulary can therefore understand the remaining 78,000 definitions in the LDOCE without encountering a circular reference.
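
Because the restriction is mechanical, a definition can be screened against the defining vocabulary by a program. The sketch below is a simplification under stated assumptions: the function name `words_outside_vocabulary` is hypothetical, the vocabulary is a tiny made-up stand-in for the real 2000-word list, and only word forms are compared. A real check would also need to handle inflected forms and confirm that each word is used in an allowed sense, which this code does not do.

```python
import re


def words_outside_vocabulary(definition, vocabulary):
    """Return the sorted words in `definition` that are not in `vocabulary`.

    Simplification: tokens are lower-cased runs of letters a-z. There is no
    handling of inflections ("animals" would not match "animal") and no
    check of word senses.
    """
    tokens = re.findall(r"[a-z]+", definition.lower())
    return sorted({t for t in tokens if t not in vocabulary})


# Made-up mini vocabulary, not the Longman Defining Vocabulary
vocabulary = {"a", "small", "animal", "with", "wings", "that", "can", "fly"}
print(words_outside_vocabulary(
    "A small animal with feathers and wings that can fly", vocabulary))
# ['and', 'feathers']
```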

Natural Semantic Metalanguage

An ideal dictionary definition explains the meaning of its headword using only words that are simpler and easier to understand than the headword being defined. If you repeat this process of "reductive paraphrase" for every headword in the dictionary, you will ultimately find a core subset of headwords that cannot be further reduced to simpler terms. These irreducible words are "semantic atoms" (also called "semantic primes").
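
Reductive paraphrase can be pictured as repeatedly replacing each word with the words in its definition until only undefined words remain. In this hypothetical Python sketch (the function `reduce_to_atoms` and the toy definitions are illustrative, not from the dictionary), any word without a definition is treated as irreducible, so the result is the set of atoms that a headword ultimately depends on.

```python
def reduce_to_atoms(word, definitions):
    """Return the set of undefined (irreducible) words that `word`
    ultimately depends on, following definitions transitively."""
    atoms = set()
    seen = set()
    stack = [word]
    while stack:
        w = stack.pop()
        if w in seen:
            continue  # already expanded; also prevents endless loops on cycles
        seen.add(w)
        if w not in definitions:
            atoms.add(w)  # no definition: cannot be reduced further
        else:
            stack.extend(definitions[w])  # replace w by the words defining it
    return atoms


# Toy definitions (made up for illustration)
definitions = {
    "eye": {"part", "body", "someone", "see"},
    "cry": {"someone", "eye", "do", "something"},
}
print(sorted(reduce_to_atoms("cry", definitions)))
# ['body', 'do', 'part', 'see', 'someone', 'something']
```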

By identifying and comparing the semantic atoms of many languages, linguist Anna Wierzbicka and colleagues have developed Natural Semantic Metalanguage (NSM), which identifies a common set of concepts that appear as semantic atoms in all languages. You can find more information about NSM at Griffith University's Natural Semantic Metalanguage Homepage.

NSM semantic atoms and reductive paraphrase are used by Learn These Words First to create a dictionary without circular definitions. Lessons 1 and 2 introduce the 61 NSM semantic atoms in English (the atoms identified as of 2002). These are used to explain 300 "semantic molecules" in Lessons 3 through 12. The rest of the words in the dictionary are defined using only the semantic atoms and molecules.

NSM-LDOCE Non-Circular Dictionary

Can every word in a dictionary be explained using Natural Semantic Metalanguage?

The NSM-LDOCE research dictionary was created to test the expressive power of Natural Semantic Metalanguage (NSM) and its tiny set of semantic primes. In this dictionary, NSM was used to paraphrase definitions for each word in the controlled defining vocabulary of the Longman Dictionary of Contemporary English (LDOCE). The definitions were written using mostly NSM primes, mixed with a few other words from the LDOCE defining vocabulary.

Chains of circular definitions were detected using a computer program. Most were resolved by rewording one of the definitions in the chain, but three were resolved by adding tentative semantic primes (colour, number and shape).
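
The source does not describe how that program worked. As a stand-in, the sketch below uses a standard depth-first search to find one chain of circular definitions. The function name `find_cycle` and the toy definitions are hypothetical; words with no entry of their own are treated as primes and cannot be part of a chain.

```python
def find_cycle(definitions):
    """Return one chain of circular definitions as a list of words (the
    first word repeated at the end), or None if the definitions are
    non-circular. `definitions` maps a headword to the words it uses."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {w: WHITE for w in definitions}
    path = []

    def visit(word):
        colour[word] = GREY
        path.append(word)
        for used in sorted(definitions[word]):
            if used not in definitions:
                continue  # undefined word (prime): cannot be part of a cycle
            if colour[used] == GREY:
                # `used` is already on the current path: we found a chain
                return path[path.index(used):] + [used]
            if colour[used] == WHITE:
                cycle = visit(used)
                if cycle:
                    return cycle
        path.pop()
        colour[word] = BLACK
        return None

    for word in sorted(definitions):
        if colour[word] == WHITE:
            cycle = visit(word)
            if cycle:
                return cycle
    return None


# Toy example: "big" and "large" are defined in terms of each other
circular = {"big": {"large"}, "large": {"big"}, "small": {"not", "big"}}
print(find_cycle(circular))  # ['big', 'large', 'big']
```

Once a chain is reported, rewording any one definition in it (or declaring one word a prime) breaks the cycle, and the search can be run again until it returns `None`.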

The resulting NSM-LDOCE dictionary is non-circular, and by extension provides non-circular definitions for all the words in the LDOCE.

The NSM-LDOCE research dictionary served as the basis for creating Learn These Words First. New non-circular definitions for colour, number and shape were written and tested, so these three tentative primes could be removed. Other definitions were improved to eliminate more than half of the 700 words used as "semantic molecules" in NSM-LDOCE.

Methodology: creating and testing the dictionary

To create Learn These Words First, the 2352 definitions in the NSM-LDOCE dictionary were sequenced into layers using the recursive-dependency statistics from the "Non-Circular Dictionary" study. Each definition was then edited for greater fluency and precision, using words available in the preceding layers.
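
The exact sequencing procedure is not spelled out here. One simple way to turn a non-circular dependency graph into layers, assumed for illustration, is to place each undefined word (prime) in layer 0 and each defined word one layer above the deepest word its definition uses (longest-path layering). The function name `assign_layers` and the toy data are hypothetical.

```python
def assign_layers(definitions):
    """Group words into layers so that each word's definition uses only
    words from earlier layers. Assumes the definitions are non-circular."""
    depth = {}

    def layer_of(word):
        if word not in definitions:
            return 0  # prime: no definition
        if word not in depth:
            # One layer above the deepest word used in the definition
            depth[word] = 1 + max((layer_of(u) for u in definitions[word]), default=0)
        return depth[word]

    # Collect every headword and every word used in a definition
    words = set(definitions)
    for used in definitions.values():
        words |= set(used)

    layers = {}
    for word in words:
        layers.setdefault(layer_of(word), set()).add(word)
    return [layers[i] for i in sorted(layers)]


# Toy definitions (made up for illustration)
definitions = {
    "eye": {"part", "body", "someone", "see"},
    "cry": {"someone", "eye", "do", "something"},
}
print([sorted(layer) for layer in assign_layers(definitions)])
# [['body', 'do', 'part', 'see', 'someone', 'something'], ['eye'], ['cry']]
```

Every word placed this way satisfies the layering rule, because each word it depends on sits in a strictly lower layer.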

Using computer-aided paraphrase, the number of "semantic molecules" was reduced to around 300 words. These words, preceded by the NSM vocabulary, were grouped into 12 lessons and expanded to use full-sentence definitions and examples.

Student participants performed headword-identification tasks to evaluate the quality of every definition in the Learn These Words First lessons. For fill-in-the-blank tasks (given definitions without headwords), students correctly identified the missing headword 95% of the time. For complete-the-word tasks (given definitions and only the first letter of each headword), students identified the headword 100% of the time.

Universal semantic molecules

The Learn These Words First lessons explain about 300 semantic molecules. These semantic molecules were identified by computer-aided analysis of paraphrased dictionary definitions.

Many of these same semantic molecules were independently identified by Cliff Goddard and Anna Wierzbicka. The following lists are adapted from their briefing paper for the "Global English, Minimal English" symposium (July 2015, ANU, Canberra).

Universal or near-universal semantic molecules

Defined in Learn These Words First lessons: animal (creature), around, back, bird, blood, bottom, burn (fire), centre (middle), child, day, drink, ear, eat, egg, eye, fish, flat, front, ground, grow, hair (fur), hand, hard, head, heavy, hold, laugh, leg, light, long, make, man, mouth, name (called), nose, play, quickly, round, sharp, sit, sky, sleep, smooth, straight, sun, sweet, top, tree, water, woman.

Not in the lessons: born, breast, dance, face, father, feather, finger, fingernail, husband, kill, lie, mother, night, on, sing, skin, slowly, soft, tail, tooth, wife, wing.

Semantic molecules found in many languages