GC: n
CT: Implementing spelling correction: There are two basic principles underlying most spelling correction algorithms.
- Of various alternative correct spellings for a mis-spelled query, choose the “nearest” one. This demands that we have a notion of nearness or proximity between a pair of queries. We will develop these proximity measures in Section 3.3.3.
- When two correctly spelled queries are tied (or nearly tied), select the one that is more common. For instance, grunt and grant both seem equally plausible as corrections for grnt, so the algorithm should choose the more common of the two. The simplest notion of more common is the number of occurrences of the term in the collection; thus if grunt occurs more often than grant, it would be the chosen correction. A different notion of more common is employed in many search engines, especially on the web: use the correction that is most common among queries typed in by other users. If grunt is typed as a query more often than grant, then it is more likely that the user who typed grnt intended to type the query grunt.
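The two principles above can be sketched in a few lines of Python. This is only an illustration, not the source's algorithm: it assumes Levenshtein edit distance as the proximity measure, uses collection frequency as the notion of "more common", and breaks exact ties only (not near ties). The names `edit_distance`, `best_correction`, and the counts in `term_counts` are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_correction(query: str, term_counts: Counter) -> str:
    """Pick the nearest term; among equally near terms, the more frequent one."""
    return min(term_counts,
               key=lambda t: (edit_distance(query, t), -term_counts[t]))


# Hypothetical collection frequencies, invented for illustration.
term_counts = Counter({"grunt": 120, "grant": 80, "great": 300})
print(best_correction("grnt", term_counts))
```

With these made-up counts, grunt and grant are both one edit away from grnt, so the frequency tie-break decides between them. Swapping in counts taken from a query log instead of the document collection would give the second notion of "more common" described above.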
Beginning in Section 3.3.3 we describe notions of proximity between queries, as well as their efficient computation. Spelling correction algorithms build on these computations of proximity; their functionality is then exposed to users in one of several ways:
- On the query carot always retrieve documents containing carot as well as any “spell-corrected” version of carot, including carrot and tarot.
- As in the first option above, but only when the query term carot is not in the dictionary.
- As in the first option above, but only when the original query returned fewer than a preset number of documents (say fewer than five documents).
- When the original query returns fewer than a preset number of documents, the search interface presents a spelling suggestion to the end user: this suggestion consists of the spell-corrected query term(s). Thus, the search engine might respond to the user: “Did you mean carrot?”
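The four ways of exposing corrections listed above can be compared side by side in a small sketch. It is not taken from the cited source: the `Policy` enum, the `handle_query` function, the toy `index` (term to document IDs, whose keys stand in for the dictionary), and the precomputed `corrections` table are all assumptions made for illustration.

```python
from enum import Enum


class Policy(Enum):
    ALWAYS_EXPAND = 1                # first option in the list
    EXPAND_IF_NOT_IN_DICTIONARY = 2  # second option
    EXPAND_IF_FEW_RESULTS = 3        # third option
    SUGGEST_IF_FEW_RESULTS = 4       # fourth option ("Did you mean ...?")


def handle_query(term, policy, index, corrections, min_results=5):
    """Return (documents, suggestion) for a single-term query."""
    docs = set(index.get(term, set()))
    alternatives = corrections.get(term, [])
    few = len(docs) < min_results

    if policy is Policy.SUGGEST_IF_FEW_RESULTS:
        # Do not change the result set; only offer a suggestion.
        suggestion = alternatives[0] if few and alternatives else None
        return docs, suggestion

    expand = (
        policy is Policy.ALWAYS_EXPAND
        or (policy is Policy.EXPAND_IF_NOT_IN_DICTIONARY and term not in index)
        or (policy is Policy.EXPAND_IF_FEW_RESULTS and few)
    )
    if expand:
        for alt in alternatives:
            docs |= index.get(alt, set())
    return docs, None


# Toy data, invented for illustration.
index = {"carrot": {1, 2, 3}, "tarot": {4}}
corrections = {"carot": ["carrot", "tarot"]}

for policy in Policy:
    print(policy.name, handle_query("carot", policy, index, corrections))
```

In this toy setup carot is absent from the index, so the first three policies all return the documents for carrot and tarot, while the fourth returns no documents and suggests carrot. The policies diverge once the misspelled term itself appears in the collection.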
S: http://nlp.stanford.edu/IR-book/html/htmledition/implementing-spelling-correction-1.html (last access: 28 December 2014)
N: 1. spelling (n): mid-15c., “action of reading letter by letter,” verbal noun from spell (v.). Meaning “manner of forming words with letters” is from 1660s; meaning “a way a word has been spelled” is from 1731. Spelling bee is from 1878 (see bee; earlier spelling match, 1845; the act of winning such a schoolroom contest is described 1854 as to spell (someone) down).
suggestion (n): mid-14c., “a prompting to evil,” from Anglo-French and Old French suggestioun “hint, temptation,” from Latin suggestionem (nominative suggestio) “an addition, intimation, suggestion,” noun of action from suggestus, past participle of suggerere “bring up, bring under, lay beneath; furnish, afford, supply; prompt,” from sub “up” + gerere “bring, carry”. Sense evolution in Latin is from “heap up, build” to “bring forward an idea.” Meaning “proposal, statement, declaration” appeared by late 14c., but original English notion of “evil prompting” remains in suggestive. Hypnotism sense is from 1887.
2. Spelling Corrections and Suggestions:
Not sure how to spell something? Don’t worry, try gessing or speling any way you can. In just the first few months on the job, Google engineer Noam Shazeer developed a spelling correction (suggestion) system based on what other users have entered. The system automatically checks whether you are using the most common spelling of each word in your query.
(We used to suggest that you search Google for phonitick spewling. But so many Web pages added the same example that now — or, at least, when we last checked — Google no longer treats those “words” as incorrectly spelled! Google’s system doesn’t match words against an actual dictionary; it compares them to commonly-used words.)
S: 1. OED – http://www.etymonline.com/index.php?allowed_in_frame=0&search=spelling&searchmode=none; http://www.etymonline.com/index.php?allowed_in_frame=0&search=suggestion&searchmode=none (last access: 28 December 2014). 2. http://www.googleguide.com/spelling_corrections.html (last access: 28 December 2014).
SYN:
S:
CR: artificial intelligence, computer science, spellchecker.