WordNet is a lexical database for the English language, created at Princeton University, and it is available as one of the corpora that ship with NLTK.
You can use WordNet alongside the NLTK module to find the meanings of words, synonyms, antonyms, and more. Let's cover some examples.
First, you're going to need to import wordnet:
from nltk.corpus import wordnet
Then, we're going to use the term "program" to find synsets like so:
syns = wordnet.synsets("program")
An example of a synset:
print(syns[0].name())
plan.n.01
Just the word:
print(syns[0].lemmas()[0].name())
plan
Definition of that first synset:
print(syns[0].definition())
a series of steps to be carried out or goals to be accomplished
Examples of the word in use:
print(syns[0].examples())
['they drew up a six-step plan', 'they discussed plans for a new bond issue']
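The calls above only look at the first synset, but "program" has several senses. Here is a minimal sketch that summarizes every sense in a list. The helper names describe_senses and format_senses are hypothetical, and the code only assumes the name(), pos() and definition() methods of NLTK's Synset objects, so it imports nothing itself:

```python
def describe_senses(synsets):
    """Return (name, part of speech, definition) for each synset-like object.

    Assumes each item offers the name(), pos() and definition() methods
    that NLTK's Synset objects provide; NLTK itself is not imported here.
    """
    return [(s.name(), s.pos(), s.definition()) for s in synsets]


def format_senses(synsets):
    """Render one 'name (pos): definition' line per sense."""
    return "\n".join(
        f"{name} ({pos}): {definition}"
        for name, pos, definition in describe_senses(synsets)
    )
```

With the wordnet import from earlier in place, print(format_senses(wordnet.synsets("program"))) should print one line per sense of "program", starting with plan.n.01.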
Next, how might we discern synonyms and antonyms to a word? The lemmas will be synonyms, and then you can use .antonyms to find the antonyms to the lemmas. As such, we can populate some lists like:
synonyms = []
antonyms = []

for syn in wordnet.synsets("good"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))
As you can see, we got many more synonyms than antonyms, since many lemmas have no antonym at all and we only kept the first antonym of each lemma that has one. You could easily balance this by also running the exact same process for the term "bad."
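If you want every antonym rather than just the first one per lemma, the loop above can be wrapped in a small function. This is only a sketch: collect_synonyms_antonyms is a hypothetical name, and it takes the list returned by wordnet.synsets(...) as its argument, assuming nothing beyond the lemmas(), name() and antonyms() methods used above:

```python
def collect_synonyms_antonyms(synsets):
    """Gather lemma names and all antonym names from synset-like objects.

    Assumes the lemmas(), name() and antonyms() methods of NLTK's Synset
    and Lemma objects. Unlike antonyms()[0], every antonym is kept.
    """
    synonyms, antonyms = set(), set()
    for syn in synsets:
        for lemma in syn.lemmas():
            synonyms.add(lemma.name())
            # antonyms() returns an empty list when a lemma has none
            for ant in lemma.antonyms():
                antonyms.add(ant.name())
    return synonyms, antonyms
```

You would call it as synonyms, antonyms = collect_synonyms_antonyms(wordnet.synsets("good")). Sets are used from the start, so duplicates never accumulate.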
Next, we can also easily use WordNet to compare the similarity of two words, using the Wu and Palmer method for semantic relatedness. It scores two word senses by how deep they, and their closest shared ancestor, sit in WordNet's hierarchy.
Let's compare the nouns "ship" and "boat":
w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('boat.n.01')
print(w1.wup_similarity(w2))
0.9090909090909091
w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('car.n.01')
print(w1.wup_similarity(w2))
0.6956521739130435
w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('cat.n.01')
print(w1.wup_similarity(w2))
0.38095238095238093
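To get a feel for where those numbers come from, here is a simplified sketch of the Wu and Palmer idea on a tiny hand-made taxonomy. The PARENT table is hypothetical and is not WordNet data, every node has a single parent, and NLTK's own implementation handles more cases, so the scores will not match the ones above. The score is 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest ancestor the two nodes share:

```python
# Hypothetical toy taxonomy (not WordNet data): child -> parent.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "animal": "entity",
    "vessel": "vehicle",
    "car": "vehicle",
    "ship": "vessel",
    "boat": "vessel",
    "cat": "animal",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Return the deepest node that is an ancestor of (or equal to) both."""
    ancestors_of_a = set(path_to_root(a))
    # Walking upward from b, the first shared node is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share a root")


def wup_similarity(a, b):
    """Simplified Wu-Palmer score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


for other in ("boat", "car", "cat"):
    print("ship", other, round(wup_similarity("ship", other), 4))
```

In this toy tree "ship" and "boat" share the deep ancestor "vessel", while "ship" and "cat" only meet at the root, so the scores fall in the same order as the WordNet results above.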
How to get synonyms/antonyms from NLTK WordNet in Python?
WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
WordNet’s structure makes it a useful tool for computational linguistics and natural language processing.
WordNet superficially resembles a thesaurus, in that it groups words together based on their meanings. However, there are some important distinctions.
- First, WordNet interlinks not just word forms—strings of letters—but specific senses of words. As a result, words that are found in close proximity to one another in the network are semantically disambiguated.
- Second, WordNet labels the semantic relations among words, whereas the groupings of words in a thesaurus do not follow any explicit pattern other than meaning similarity.
# First, you're going to need to import wordnet:
from nltk.corpus import wordnet

# Then, we're going to use the term "program" to find synsets like so:
syns = wordnet.synsets("program")

# An example of a synset:
print(syns[0].name())

# Just the word:
print(syns[0].lemmas()[0].name())

# Definition of that first synset:
print(syns[0].definition())

# Examples of the word in use in sentences:
print(syns[0].examples())
The output will look like:
plan.n.01
plan
a series of steps to be carried out or goals to be accomplished
['they drew up a six-step plan', 'they discussed plans for a new bond issue']
Next, how might we discern synonyms and antonyms to a word? The lemmas will be synonyms, and then you can use .antonyms to find the antonyms to the lemmas. As such, we can populate some lists like:
from nltk.corpus import wordnet

synonyms = []
antonyms = []

for syn in wordnet.synsets("good"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))
The output will be two sets, first the synonyms and then the antonyms (sets are unordered, so the order of the elements may differ on your machine):
{'beneficial', 'just', 'upright', 'thoroughly', 'in_force', 'well', 'skilful', 'skillful', 'sound', 'unspoiled', 'expert', 'proficient', 'in_effect', 'honorable', 'adept', 'secure', 'commodity', 'estimable', 'soundly', 'right', 'respectable', 'good', 'serious', 'ripe', 'salutary', 'dear', 'practiced', 'goodness', 'safe', 'effective', 'unspoilt', 'dependable', 'undecomposed', 'honest', 'full', 'near', 'trade_good'}
{'evil', 'evilness', 'bad', 'badness', 'ill'}
Now, let's compare the similarity of two words:
from nltk.corpus import wordnet

# Let's compare the verbs "run" and "sprint":
w1 = wordnet.synset('run.v.01')  # v here denotes the verb tag
w2 = wordnet.synset('sprint.v.01')
print(w1.wup_similarity(w2))
Output:
0.857142857143
# Now the nouns "ship" and "boat":
w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('boat.n.01')  # n denotes noun
print(w1.wup_similarity(w2))
Output:
0.9090909090909091