Hello,
Here is what I would like to do: generate reports which give, for a
given language, a list of words that are used on the web, each with a
count of its occurrences, but that are not in a given Wiktionary.
How would you recommend implementing that within the Wikimedia
infrastructure?
Related: the French Wiktionary folks did that using a Wikisource dump
(I’ll agree that fr.wikisource is a tiny subset of « the web » ;)
See <http://tools.wmflabs.org/dicompte/>
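
For what it's worth, the core of such a report can be sketched in a few lines of Python. This is only a rough illustration of the idea (count the words in a corpus, then drop those that already have an entry), not a description of how the tool above works. The function name `missing_word_report`, its parameters, and the tokenizing regex are all made up for the example; the corpus lines and the list of existing entry titles are assumed to have been extracted from dumps beforehand.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Unicode letters, optionally joined by
# an apostrophe or hyphen (e.g. "aujourd'hui", "porte-monnaie").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def missing_word_report(corpus_lines, known_titles, min_count=1):
    """Return (word, count) pairs for corpus words absent from known_titles.

    corpus_lines: iterable of text lines (hypothetically, text extracted
                  from a Wikisource dump).
    known_titles: iterable of existing entry titles (hypothetically, page
                  titles taken from a Wiktionary dump).
    min_count:    ignore words seen fewer times than this.
    """
    # Case-insensitive comparison is a simplification for this sketch.
    known = {title.casefold() for title in known_titles}

    counts = Counter()
    for line in corpus_lines:
        for word in WORD_RE.findall(line):
            counts[word.casefold()] += 1

    # most_common() yields the words sorted by decreasing count.
    return [
        (word, n)
        for word, n in counts.most_common()
        if n >= min_count and word not in known
    ]


if __name__ == "__main__":
    corpus = ["Le chat mange le poisson.", "Le chien regarde le chat."]
    titles = ["le", "chat", "mange"]
    for word, n in missing_word_report(corpus, titles):
        print(f"{n}\t{word}")
```

A real report would need more care than this (inflected forms, proper nouns, OCR noise in the source texts), and folding case as done here would wrongly merge entries that differ only by capitalisation.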
Hope that helps,