How did WordNet come into being?

Posted on https://www.devze.com, 2023-04-13 05:30. Source: the web

I wonder how the hierarchical relationships between words in WordNet were obtained.

Was that done manually or via computer techniques?

If based on computer techniques, what are they?


From the FAQ:

q.1.2 Where do you get the definitions for WordNet? (short answer) Our lexicographers write them.

Where do you get the definitions for WordNet? (long answer) From the foreword to WordNet: An Electronic Lexical Database, pp. xviii-xix:

People sometimes ask, "Where did you get your words?" We began in 1985 with the words in Kučera and Francis's Standard Corpus of Present-Day Edited English (familiarly known as the Brown Corpus), principally because they provided frequencies for the different parts of speech. We were well launched into that list when Henry Kučera warned us that, although he and Francis owned the Brown Corpus, the syntactic tagging data had been sold to Houghton Mifflin. We therefore dropped our plan to use their frequency counts (in 1988 Richard Beckwith developed a polysemy index that we use instead). We also incorporated all the adjective pairs that Charles Osgood had used to develop the semantic differential. And since synonyms were critically important to us, we looked words up in various thesauruses: for example, Laurence Urdang's little "Basic Book of Synonyms and Antonyms" (1978), Urdang's revision of Rodale's "The Synonym Finder" (1978), and Robert Chapman's 4th edition of "Roget's International Thesaurus" (1977) -- in such works, one word quickly leads on to others. Late in 1986 we received a list of words compiled by Fred Chang at the Naval Personnel Research and Development Center, which we compared with our own list; we were dismayed to find only 15% overlap.

So Chang's list became input. And in 1993 we obtained the list of 39,143 words that Ralph Grishman and his colleagues at New York University included in their common lexicon, COMLEX; this time we were dismayed that WordNet contained only 74% of the COMLEX words. But that list, too, became input. In short, a variety of sources have contributed; we were not well disciplined in building our vocabulary. The fact is that the English lexicon is very large, and we were lucky that our sponsors were patient with us as we slowly crawled up the mountain.
