Re: [Wikipedia-l] Common typos (was: Tucci528)

14 Sep 2002

At 04:27 PM 09/13/2002 -0700, Ray Saintonge wrote:
...
 Would we use an American or British spell-checker?

We create our own spell checker, but then we make the corrections 
manually.  This is similar to Google's spell-checking feature.  We first 
parse out all the words in every article and make a table with each unique 
word and its number of occurrences.  This is typically a step in most 
indexed search engines, but MySQL is really fast even without it.
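The post does not say how the word table would be built.  As a minimal 
sketch, assuming the articles are already available as plain-text strings 
(a real version would pull them from the wiki database), the table of 
unique words and their counts could look like this.  The function name 
`word_frequencies` and the word-matching regex are illustrative choices, 
not part of the proposal.

```python
import re
from collections import Counter

# Illustrative tokenizer: runs of letters, optionally with one apostrophe
# part (e.g. "don't").  Real wiki text would need markup stripped first.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_frequencies(articles):
    """Return a Counter mapping each lowercased word to its total count
    across all the given article texts."""
    counts = Counter()
    for text in articles:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


if __name__ == "__main__":
    table = word_frequencies(["We receive mail.", "They receive it; I recieve none."])
    print(table["receive"], table["recieve"])  # prints: 2 1
```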

It would be a safe assumption that the words used most frequently are 
probably spelled correctly.  True, "recieve" may be a very common 
misspelling, but there are probably many more occurrences of "receive".  
The flip side is that words that are used rarely are more likely to be 
misspelled.  Now, we don't want to go off blindly replacing these words 
(mostly because we wouldn't know what to replace them with), but they are 
good candidates to review for replacement.
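Picking out the rarely used words from such a table is a simple filter.  
A minimal sketch, assuming the counts are a word-to-count mapping like the 
one above; the cutoff `max_count` is a hypothetical tuning knob, since the 
post does not suggest a threshold.  The output is only a list of candidates 
for a person to look at, because rare words also include legitimate names 
and technical terms.

```python
from collections import Counter


def rare_word_candidates(counts, max_count=1):
    """Return words seen at most max_count times, rarest first (ties broken
    alphabetically).  These are review candidates, not confirmed typos."""
    rare = [word for word, n in counts.items() if n <= max_count]
    return sorted(rare, key=lambda word: (counts[word], word))


if __name__ == "__main__":
    counts = Counter({"the": 5000, "receive": 120, "recieve": 3, "teh": 1})
    print(rare_word_candidates(counts, max_count=3))  # prints: ['teh', 'recieve']
```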

So if we had an automated script that took these "least frequently 
occurring words" and listed them for a human, that person could say "Ah, 
recieve should be receive; this is a misspelling."  Then they enter the 
correct spelling, and we use the same method mentioned in my previous 
e-mail to approve each individual change.
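The approval method from the earlier e-mail is not described here, so the 
following is only a sketch of the overall flow under stated assumptions: 
articles are a title-to-text mapping, `corrections` holds the spellings a 
human entered, and `approve` is a hypothetical callback standing in for 
whatever per-change approval step the wiki would really use.  Each 
occurrence is shown with a little surrounding context and accepted or 
rejected on its own.

```python
import re


def propose_and_apply(articles, corrections, approve):
    """Apply human-entered corrections one occurrence at a time.

    articles:    dict of title -> text (left unmodified).
    corrections: dict of misspelling -> correct spelling entered by a human.
    approve:     hypothetical callback approve(title, context, wrong, right)
                 returning True to accept that single change.
    Matching is whole-word and case-sensitive, so "recieve" does not touch
    "recieved".  Returns a new dict of title -> updated text.
    """
    updated = {}
    for title, text in articles.items():
        current = text
        for wrong, right in corrections.items():
            pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")

            def replace(match):
                # Show roughly 30 characters on each side for the reviewer.
                start, end = match.span()
                context = current[max(0, start - 30):end + 30]
                if approve(title, context, wrong, right):
                    return right
                return match.group(0)

            current = pattern.sub(replace, current)
        updated[title] = current
    return updated


if __name__ == "__main__":
    decisions = iter([True, False])  # accept the first change, reject the second
    result = propose_and_apply(
        {"Mail": "I recieve mail. You recieve none."},
        {"recieve": "receive"},
        lambda *args: next(decisions),
    )
    print(result["Mail"])  # prints: I receive mail. You recieve none.
```

Keeping the decision per occurrence, rather than per word, matches the 
idea of approving each individual change, at the cost of more prompts for 
very common misspellings.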

I have found that automating the mundane, repetitive portions of tasks 
like this makes humans much more accurate.  If you have to go through 10 
of the same motions for every 1 that requires thought, then you are more 
likely to not put any thought into that 1.  But if it is only a 2:1 or, 
even better, a 1:1 ratio, then you will put much more thought into it.

Again, I don't know if this is even remotely possible with the Wikipedia 
software.  I'd hate to do this offline, since it would be too easy to get 
out of sync.  Maybe we could add an alternative interface for these kinds 
of edits.
