It would be a safe assumption that the words used most frequently are
probably spelled correctly. True, "recieve" may be a very common
misspelling, but there are probably far more occurrences of "receive". The
flip side of this is that words that are used rarely are probably spelled
wrong. Now, we don't want to go off blindly replacing these words (mostly
because we wouldn't know what to replace them with), but they are good
candidates to look at for replacement.
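A minimal sketch of the proposal above, in Python (the discussion itself contains no code). It counts how often each word occurs across a set of page texts and lists the words seen at most once as candidates for a human to review, not for automatic replacement. The tokenizer regex, the lowercasing, and the function names `word_counts` and `rare_words` are illustrative assumptions. Real wiki text would also need its markup stripped first, which is not shown.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, optionally with one
# internal apostrophe (so "Moro's" stays a single token).
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def word_counts(pages):
    """Count every occurrence of every (lowercased) word across all pages."""
    counts = Counter()
    for text in pages:
        counts.update(w.lower() for w in WORD_RE.findall(text))
    return counts


def rare_words(pages, max_count=1):
    """Return, sorted, the words occurring at most max_count times in total."""
    counts = word_counts(pages)
    return sorted(w for w, n in counts.items() if n <= max_count)


pages = [
    "We receive the data and receive the reply.",
    "They recieve the data.",
]
print(rare_words(pages))  # ['and', 'recieve', 'reply', 'they', 'we']
```

With a corpus this small, most correctly spelled words are also flagged, which is the objection raised in the reply in miniature.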
I don't think this would work: I think there will be thousands, if not tens
of thousands, of words that are used exactly once. The great majority of these
will not be misspellings, but (parts of) proper names and geographical
names that happen to occur exactly once, words from other languages, and
reasonable neologisms.
As a test, here are the results from 20 pages obtained with 'Random page',
listing for each page the words I think might be unique:
Aratrum
  aujtovguon - Greek word used for lack of an English equivalent
  phktovn - idem
  Mounce - name, although not proper or geographical
Subtractive synthesis
  synthisizers - indeed a misspelling (*)
  highpass - jargon word
Shocking Blue
  no unique words
Conjugate base
  no unique words
Cenozoic
  Caenozoic - given as an alternative spelling of the subject
Aldo Moro
  Moro's - genitive form of a proper name (**)
Freedom of speech
  no unique words
Film genres
  no unique words
Alexandre Fleming
  actibacterial - actual typo (*)
Hydrogen cyanide
  no unique words
ISDN
  no unique words
20-GATE
  no unique words
Unspun
  unspun - name of a group (**)
  unspinning - logical neologism (**)
Apu Nahasapeempatilon
  octuplet - logical neologism or normal word
  punchcard - might actually be considered a misspelling (*)
Lua programming language
  no unique words
Eros
  no unique words
Scud
  Makeyev - name of an institute
  UDMH - jargon term (abbreviation)
  RFNA - jargon term (abbreviation)
Academy Award for Writing Adapted Screenplay
  Herczag - proper name
  Siliphant - proper name
  Hauben - proper name
  Peploe - proper name
  Zaillian - proper name
  Gaghan - proper name
Elie Ducommun
  no unique words
Masamune Shirow
  Deunan - proper name (fictional character)
(*): I corrected this misspelling when I came across it, so you'll have to look
at the previous version of the page to see it.
(**): Occurs several times, but only on one page.
Total: 16 correctly spelled unique words, 3 misspellings, and 3 cases (the ones
marked **) not counted.
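The (**) cases suggest counting pages as well as raw occurrences. A hedged sketch, again in Python and again assuming the same illustrative tokenizer: for each word it records the total number of occurrences and the number of distinct pages the word appears on. Words found on exactly one page but with a total above 1 correspond to the (**) cases. The names `occurrence_and_page_counts` and `single_page_words` are hypothetical.

```python
import re
from collections import Counter

# Assumption: same simple tokenizer as before (letters plus one optional
# internal apostrophe), with all words lowercased.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def occurrence_and_page_counts(pages):
    """Return (total occurrences, number of distinct pages) per word."""
    total = Counter()
    page_freq = Counter()
    for text in pages:
        words = [w.lower() for w in WORD_RE.findall(text)]
        total.update(words)
        page_freq.update(set(words))  # each page counts at most once per word
    return total, page_freq


def single_page_words(pages):
    """Map each word seen on exactly one page to its total occurrence count."""
    total, page_freq = occurrence_and_page_counts(pages)
    return {w: total[w] for w, p in page_freq.items() if p == 1}


pages = ["Unspun released an album. Unspun toured.", "The band toured."]
result = single_page_words(pages)
# Entries with a count above 1 are the "(**)" kind: repeated, but on one page.
print({w: n for w, n in result.items() if n > 1})  # {'unspun': 2}
```

Whether such single-page repeats should count as "unique" is a policy choice; the tally above simply left them out.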
On the other hand, I came across the following misspellings, which DID occur
more than once (also corrected):
  missles (5 times)
I have found that humans are much more accurate when the mundane, repetitive
portions of tasks like this are automated. If you have to go through 10 of the
same motions for every 1 that requires thought, you are less likely to put any
thought into that 1. But if the ratio is only 2:1, or even better 1:1, you will
put much more thought into it.
If my attempt above is any guide, the actual ratio will be more like
1:5 or 1:6 (one misspelling for every five or six correctly spelled rare words).
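The tally from the 20 random pages gives that figure directly: 22 words are listed in total, 3 are excluded as single-page repeats (**), and 3 are misspellings (*), so

$$
22 - 3 - 3 = 16, \qquad \frac{16}{3} \approx 5.3
$$

that is, a little over five correctly spelled candidates to check per real misspelling.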
Andre Engels