Hi Federico
Sorry for the late reply, I forgot to check the answers to this thread.
Yes, the spellchecking module should be able to load any hunspell dictionary; these are (I think) the same dictionaries used by Mozilla and LibreOffice, see also https://en.wikipedia.org/wiki/Hunspell. You can do this, for example, with this command:
$ python spellcheck_hunspell.py Wikipedia -dictionary:/usr/share/hunspell/de_DE
which will check the page "Wikipedia" against the given hunspell dictionary (note that there are 2 files that must exist for this to work: /usr/share/hunspell/de_DE.aff and /usr/share/hunspell/de_DE.dic).
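If you want to verify up front that both files are in place, a small check along these lines should do. This is only a sketch: the helper name missing_hunspell_files is made up for illustration and is not part of spellcheck_hunspell.py.

```python
from pathlib import Path


def missing_hunspell_files(prefix):
    """Return the required dictionary files that do not exist.

    `prefix` is the path without extension, e.g. /usr/share/hunspell/de_DE.
    Hypothetical helper, not part of the spellchecking script itself.
    """
    required = [prefix + ext for ext in (".aff", ".dic")]
    return [path for path in required if not Path(path).is_file()]


if __name__ == "__main__":
    missing = missing_hunspell_files("/usr/share/hunspell/de_DE")
    if missing:
        print("Missing dictionary files:", ", ".join(missing))
    else:
        print("Both dictionary files found.")
```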
The advantage compared to loading the text into your LibreOffice word processor or letting Mozilla do the spellchecking is that the Python script will attempt to recognize which sections of the text are actual prose and skip things like templates and references. This will hopefully reduce the number of false positives. On the German Wikipedia, for the page "Wikipedia", I get 97 hits of words that the hunspell checker does not know (out of a total of 8061 words), over 30 of which are names that appear after the "==Literatur==" section. Most of the rest are also names and English words, which I do not expect hunspell to know. However, all of them are correct, so on this article it wrongly flags about 1.2 % of all words (97 / 8061). That is probably much lower than what you would get if you checked *all* words, but it is still a rather high number of false positives.
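To give a rough idea of what skipping templates and references means, here is a deliberately crude sketch using regular expressions. It is not the code the script actually uses; the function names strip_wiki_markup and extract_words are invented for this example, and real wikitext has many more cases (tables, links, comments) that this ignores.

```python
import re

# Matches an innermost template, i.e. {{...}} with no braces inside.
INNERMOST_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")


def strip_wiki_markup(wikitext):
    """Remove <ref> tags and {{templates}} from wikitext (crude sketch)."""
    # Self-closing references such as <ref name="x" />
    text = re.sub(r"<ref[^>]*/>", " ", wikitext)
    # Paired references such as <ref>...</ref>, possibly spanning lines
    text = re.sub(r"<ref[^>]*>.*?</ref>", " ", text, flags=re.DOTALL)
    # Remove innermost templates repeatedly so nested ones disappear too
    while INNERMOST_TEMPLATE.search(text):
        text = INNERMOST_TEMPLATE.sub(" ", text)
    return text


def extract_words(text):
    """Return runs of letters (Unicode-aware, so umlauts are kept)."""
    return re.findall(r"[^\W\d_]+", text)


if __name__ == "__main__":
    sample = "Berlin {{Infobox|a={{b}}}} ist<ref>Quelle</ref> gross"
    print(extract_words(strip_wiki_markup(sample)))
```

Only the words that survive this kind of filtering would then be passed to the dictionary check.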
This is the reason I also provide an implementation using a list of "known false" words.
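Assuming "known false" here means a list of known misspellings, the idea can be sketched as follows: instead of flagging every word the dictionary does not know, only flag words that appear in that list. The function name and the example entry are made up for illustration and do not come from the actual implementation.

```python
import re


def find_known_misspellings(text, known_wrong):
    """Return (offset, word, suggestion) for each known misspelling in text.

    `known_wrong` maps a lowercase misspelling to its correction.
    Hypothetical helper for illustration only.
    """
    hits = []
    for match in re.finditer(r"[^\W\d_]+", text):
        word = match.group(0)
        suggestion = known_wrong.get(word.lower())
        if suggestion is not None:
            hits.append((match.start(), word, suggestion))
    return hits


if __name__ == "__main__":
    # Example entry only; a real list would be loaded from a file.
    known_wrong = {"standart": "Standard"}
    print(find_known_misspellings("Der Standart ist hoch", known_wrong))
```

With this approach, names and English words are never flagged unless someone has put them on the list, at the cost of missing misspellings that are not on it.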
Hannes
On 12 April 2014 10:35, Federico Leva (Nemo) nemowiki@gmail.com wrote:
I'm very interested in the spellchecking: does it allow to load Mozilla/LibreOffice dictionaries/spellcheckers in other languages too?
Nemo
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l