All,
I wanted to share a heads-up. Last weekend I had an hour of spare time,
so I wrote wps-e, a semi-automated and reasonably[1] intelligent spell
checker for Wikipedia. After compiling an extensive dictionary from
several known-good sources, wps-e currently operates with a 668 thousand
word base and some 2.2 thousand "automatic correction" entries
(misspellings that almost certainly should be autocorrected, e.g.
leaderhip -> leadership, prominant -> prominent).
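To make the two-tier lookup above concrete, here is a rough sketch in Python. It is not wps-e's Perl code. The tiny word set, the AUTOCORRECT table and the names check_word and scan are all made up for illustration; the real tool works from the full dictionary and autocorrect list.

```python
import re

# Hypothetical stand-ins for the real data. wps-e uses a dictionary of
# roughly 668 thousand words and about 2.2 thousand autocorrect entries.
DICTIONARY = {"the", "showed", "strong", "leadership", "and", "was", "prominent"}
AUTOCORRECT = {"leaderhip": "leadership", "prominant": "prominent"}


def check_word(word):
    """Classify one word as 'ok', 'autocorrect' or 'unknown'."""
    lower = word.lower()
    if lower in DICTIONARY:
        return ("ok", word)
    if lower in AUTOCORRECT:
        # Known misspelling with a single, near-certain fix.
        return ("autocorrect", AUTOCORRECT[lower])
    # Not in either list: needs a proposal and a human decision.
    return ("unknown", word)


def scan(text):
    """Return (word, status, replacement) for every word that is not 'ok'."""
    findings = []
    for word in re.findall(r"[A-Za-z']+", text):
        status, replacement = check_word(word)
        if status != "ok":
            findings.append((word, status, replacement))
    return findings
```

Calling scan("The leaderhip was prominant and strng") flags the two known misspellings for autocorrection and leaves "strng" as an unknown word for review.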
It's written in Perl, and you can see some of the results of its runs
in the contributions of User:Ike on en. It's
semi-automated in the sense that - other than the autocorrect list - it
doesn't do more than *propose* correct spelling, and even then it never
autocommits back to Wikipedia without a user manually approving the
changes. Because the Princeton WordNet database is free, the software
also integrates meanings of its proposed spellings into the (colored,
but console-based) interface.
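The propose-then-approve flow described above might look something like the following Python sketch. Again, this is an illustration rather than the actual wps-e code: the word list, the function names propose and review, and the use of Python's difflib for finding near matches are all assumptions, and the WordNet meanings are left out.

```python
import difflib
import re

# Hypothetical, tiny stand-in for the real dictionary.
WORDS = ["leadership", "prominent", "dictionary", "article"]


def propose(word, words=WORDS, limit=3):
    """Return up to `limit` dictionary words that closely resemble `word`."""
    return difflib.get_close_matches(word.lower(), words, n=limit, cutoff=0.75)


def review(text, misspelled, approve):
    """Apply a fix only when `approve(word, candidates)` returns a choice.

    `approve` stands in for the human at the console: it returns the
    chosen replacement, or None to leave the word untouched.
    """
    for word in misspelled:
        candidates = propose(word)
        if not candidates:
            continue
        choice = approve(word, candidates)
        if choice is not None:
            pattern = r"\b" + re.escape(word) + r"\b"
            # A function replacement avoids backslash surprises in `choice`.
            text = re.sub(pattern, lambda match: choice, text)
    return text
```

The important design point is that review never changes anything by itself; every change passes through the approve callback, which in an interactive tool would be a prompt shown to the user.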
I am working the final kinks out of the software, and will be
releasing it soon. Sometime in the near future, I expect to release a
Windows-based GUI app that should make this type of copyediting
accessible to a much wider audience, which would mean cleaning up WP
pretty quickly (working alone, I was able to go through about 2000
articles with wps-e in under 3 hours).
If there are a few interested fellow Perl-masochists here (Erik, Timwi
maybe?), I'd be more than happy to send them the beta to try and break
or improve before it's released. Just drop me an e-mail. If someone with
proper permissions wanted to copy-edit directly on one of the WP
servers, they would have very low per-correction time (though this
effect can be achieved by normal users by pre-fetching articles, which
I'm looking to add to wps-e anyway).
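For the curious, pre-fetching could be sketched along these lines in Python. This is only one possible approach, not what wps-e does or will do: the name prefetched and its lookahead parameter are invented here, and fetch is a caller-supplied function, so nothing about how articles are actually retrieved is assumed.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetched(titles, fetch, lookahead=4):
    """Yield (title, text) in order while later fetches run in the background.

    `fetch` is a hypothetical caller-supplied function that takes a title
    and returns the article text. Up to `lookahead` fetches are in flight.
    """
    titles = list(titles)
    with ThreadPoolExecutor(max_workers=lookahead) as pool:
        pending = deque()
        next_index = 0
        for _ in titles:
            # Top up the queue so the next articles are already loading.
            while next_index < len(titles) and len(pending) < lookahead:
                title = titles[next_index]
                pending.append((title, pool.submit(fetch, title)))
                next_index += 1
            title, future = pending.popleft()
            # Blocks only if this article has not finished downloading yet.
            yield title, future.result()
```

While the user is busy reviewing one article, the next few are already being retrieved, so the wait between corrections shrinks toward zero.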
Cheers,
Ivan
[1] - it doesn't cook dinner for you, but I've had - other than last
names - very few false positives to deal with.