Re: [Pywikipedia-l] Diverse Coding Projects

30 Apr 2014


Hi Federico,
Sorry for the late reply; I forgot to check for answers to this thread.
Yes, the spellchecking module should be able to load any hunspell
dictionary. These are (I think) the same dictionaries used by Mozilla
and LibreOffice, see also https://en.wikipedia.org/wiki/Hunspell. You
can do this, for example, with this command:
$ python spellcheck_hunspell.py Wikipedia -dictionary:/usr/share/hunspell/de_DE
which will check the page "Wikipedia" against the given hunspell
dictionary (note that there are 2 files that must exist for this to
work: /usr/share/hunspell/de_DE.aff and
/usr/share/hunspell/de_DE.dic).
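If you want to verify up front that both files are present, a minimal
sketch could look like the following. The helper name
missing_hunspell_files is made up for illustration and is not part of
the script; it only assumes that the -dictionary: argument is the path
without the file extension, as in the command above.

```python
import os


def missing_hunspell_files(base_path):
    """Return the hunspell files that are missing for a dictionary base path.

    base_path is the path without extension, e.g. /usr/share/hunspell/de_DE.
    (Hypothetical helper, not part of spellcheck_hunspell.py.)
    """
    candidates = [base_path + ext for ext in (".aff", ".dic")]
    return [path for path in candidates if not os.path.isfile(path)]


if __name__ == "__main__":
    missing = missing_hunspell_files("/usr/share/hunspell/de_DE")
    if missing:
        print("Missing dictionary files:", ", ".join(missing))
    else:
        print("Both dictionary files found.")
```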
The advantage compared to loading the text into your LibreOffice
word processor or letting Mozilla do the spellchecking is that the Python
script will attempt to recognize which sections of the page are
actual prose and skip things like templates and references. This will
hopefully reduce the number of false positives. On the German
Wikipedia, the page "Wikipedia" gives me 97 words that the
hunspell checker does not know (out of a total of 8061 words), over 30
of which are names that appear after the "==Literatur==" section. Most
of the rest are also names and English words, which I do not expect
hunspell to know. However, all of them are spelled correctly, so on this
article the checker wrongly flags about 1.2 % of all words. That is
probably far lower than what you would get if you parsed *all*
words, but it is still a rather high number of falsely flagged words.
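To illustrate the idea of skipping templates and references before
checking, here is a rough sketch. It is not the code from the script:
the regular expressions are simplified assumptions, the names
strip_markup and unknown_words are made up, and is_known stands in for
whatever hunspell lookup is used.

```python
import re

# Simplified assumptions about wikitext; real pages have more cases.
REF_RE = re.compile(r"<ref[^>/]*>.*?</ref>|<ref[^>]*/>", re.DOTALL | re.IGNORECASE)
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")  # innermost {{...}} only
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, including umlauts


def strip_markup(wikitext):
    """Remove references and (possibly nested) templates from wikitext."""
    text = REF_RE.sub(" ", wikitext)
    previous = None
    # Remove innermost templates repeatedly until nothing changes.
    while previous != text:
        previous = text
        text = TEMPLATE_RE.sub(" ", text)
    return text


def unknown_words(wikitext, is_known):
    """Return (all words, words for which is_known(word) is false)."""
    words = WORD_RE.findall(strip_markup(wikitext))
    return words, [word for word in words if not is_known(word)]


if __name__ == "__main__":
    # The rate quoted above: 97 unknown words out of 8061.
    print(round(100 * 97 / 8061, 1), "% flagged")
```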
This is the reason I also provide an implementation using a list of
"known false" words.
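A minimal sketch of that second approach, assuming the list simply
contains known misspellings and only those words are reported (the
function name and the example word are invented for illustration):

```python
import re

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters


def find_known_false(text, known_false):
    """Return the words in text that appear in the known-false list.

    The comparison ignores case; known_false is any iterable of words.
    """
    lowered = {word.lower() for word in known_false}
    return [word for word in WORD_RE.findall(text) if word.lower() in lowered]


if __name__ == "__main__":
    print(find_known_false("Das ist ein Standart und kein Standard.", ["standart"]))
```

Because only listed words are reported, names and foreign words are
never flagged, at the cost of missing misspellings that are not on the
list.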
Hannes
On 12 April 2014 10:35, Federico Leva (Nemo) nemowiki@gmail.com wrote:
...
I'm very interested in the spellchecking: does it allow to load
Mozilla/LibreOffice dictionaries/spellcheckers in other languages too?
Nemo

Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
