Hi Federico
Sorry for the late reply; I forgot to check for answers in this thread.
Yes, the spellchecking module should allow you to load any Hunspell
dictionary; these are (I think) the same dictionaries used by Mozilla
and LibreOffice (see also https://en.wikipedia.org/wiki/Hunspell). You
can do this, for example, with the following command:
$ python spellcheck_hunspell.py Wikipedia -dictionary:/usr/share/hunspell/de_DE
which will check the page "Wikipedia" against the given Hunspell
dictionary (note that two files must exist for this to work:
/usr/share/hunspell/de_DE.aff and /usr/share/hunspell/de_DE.dic).
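Since a Hunspell dictionary consists of these two files, it can help to check for both before running the script. A minimal Python sketch, assuming (as in the command above) that the -dictionary: value is the path without an extension; the helper names hunspell_files and missing_hunspell_files are hypothetical and not part of spellcheck_hunspell.py:

```python
import os


def hunspell_files(prefix):
    """Return the (.aff, .dic) paths for a dictionary path prefix.

    Hypothetical helper: assumes the prefix has no extension,
    e.g. "/usr/share/hunspell/de_DE".
    """
    return prefix + ".aff", prefix + ".dic"


def missing_hunspell_files(prefix):
    """List whichever of the two required files does not exist."""
    return [path for path in hunspell_files(prefix) if not os.path.isfile(path)]
```

For example, missing_hunspell_files("/usr/share/hunspell/de_DE") returns an empty list when both de_DE.aff and de_DE.dic are installed.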
The advantage compared to loading the dictionary into LibreOffice Writer
or letting Mozilla do the spellchecking is that the Python script will
attempt to recognize which sections of the text are actually prose and
skip things like templates and references. This will hopefully reduce
the number of false positives. On the German
Wikipedia, the page "Wikipedia" gives me 97 words that the Hunspell
checker does not know (out of a total of 8061 words), over 30 of which
are names that appear after the "==Literatur==" section. Most of the
rest are also names, or English words that I would not expect Hunspell
to know. However, all of them are spelled correctly, so on this article
the checker flags about 1.2 % of all words as misspelled. That is
probably far lower than what you would get if you parsed *all* words,
but it is still a rather high number of falsely flagged words.
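How spellcheck_hunspell.py decides what counts as prose is not spelled out here, so the following is only an illustration of the idea under simple assumptions: it removes <ref>...</ref> elements and (possibly nested) {{...}} templates with regular expressions, then collects the remaining alphabetic words for checking. The function names are hypothetical, and regexes like these will not handle every wikitext construct.

```python
import re

# Hypothetical patterns, not the script's actual rules.
# Self-closing <ref ... /> or a full <ref ...>...</ref> element.
REF_RE = re.compile(r"<ref[^>/]*/>|<ref[^>]*>.*?</ref>", re.DOTALL | re.IGNORECASE)
# A template with no other template inside it.
INNERMOST_TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")
# Runs of Unicode letters (so umlauts count as word characters).
WORD_RE = re.compile(r"[^\W\d_]+")


def strip_markup(wikitext):
    """Remove <ref> elements and (possibly nested) {{templates}}."""
    text = REF_RE.sub("", wikitext)
    # Strip innermost templates repeatedly until none remain,
    # so nested templates are removed from the inside out.
    while True:
        text, count = INNERMOST_TEMPLATE_RE.subn("", text)
        if count == 0:
            return text


def words_to_check(wikitext):
    """Return the alphabetic words left after stripping markup."""
    return WORD_RE.findall(strip_markup(wikitext))
```

Only the words returned by words_to_check would then be passed to the Hunspell dictionary, so template parameters and citation text never reach it.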
This high false-positive rate is the reason I also provide an
implementation that uses a list of "known false" words.
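A minimal sketch of the word-list approach, assuming "known false" words means a list of common misspellings mapped to their corrections: only words on the list are reported, so correctly spelled names and foreign words are never flagged. KNOWN_FALSE, its sample entries, and find_known_false are all hypothetical:

```python
import re

# Hypothetical sample list: common German misspellings -> corrections.
KNOWN_FALSE = {
    "Standart": "Standard",
    "seperat": "separat",
    "Addresse": "Adresse",
}

# Runs of Unicode letters.
WORD_RE = re.compile(r"[^\W\d_]+")


def find_known_false(text, known_false=KNOWN_FALSE):
    """Return (word, suggestion) pairs for listed misspellings in text."""
    return [(word, known_false[word])
            for word in WORD_RE.findall(text)
            if word in known_false]
```

The trade-off is recall: a misspelling that is not on the list is silently missed.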
Hannes
On 12 April 2014 10:35, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
I'm very interested in the spellchecking: does it allow to load
Mozilla/LibreOffice dictionaries/spellcheckers in other languages too?
Nemo
_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l