Hello,
Here is what I would like to do: generate reports which give, for a
given language, a list of words that are used on the web but are missing from a given Wiktionary, each with its number of occurrences.
How would you recommend implementing that within the Wikimedia infrastructure?
Related: the French Wiktionary folks did that using a Wikisource dump (I’ll agree that fr.wikisource is a tiny subset of « the web » ;)
See http://tools.wmflabs.org/dicompte/
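For illustration only, the core of such a report could be sketched as below. This is not how the dicompte tool is implemented (I have not checked its code); it assumes the corpus text and the list of Wiktionary entry titles have already been extracted from dumps, and the names `missing_words`, `corpus_text`, `known_titles` and `min_count` are made up for the example.

```python
import re
from collections import Counter

# Runs of Unicode letters, optionally joined by hyphens (e.g. "peut-être").
# This tokenisation is a simplifying assumption, not a linguistic standard.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def missing_words(corpus_text, known_titles, min_count=1):
    """Return (word, count) pairs for corpus words absent from known_titles.

    Hypothetical helper: corpus_text is plain text taken from some corpus,
    known_titles is an iterable of dictionary entry titles.
    """
    # Count every lowercased word in the corpus.
    counts = Counter(w.lower() for w in WORD_RE.findall(corpus_text))
    # Lowercase the titles too, so the comparison is case-insensitive.
    known = {t.lower() for t in known_titles}
    missing = [
        (word, n)
        for word, n in counts.items()
        if word not in known and n >= min_count
    ]
    # Most frequent first, ties broken alphabetically.
    return sorted(missing, key=lambda pair: (-pair[1], pair[0]))


# Toy data standing in for a real corpus and a real title list.
corpus = "Le chat mange. Le chat dort, le chien dort."
titles = ["le", "chat", "dort"]
report = missing_words(corpus, titles)
for word, n in report:
    print(f"{word}\t{n}")
```

On real data you would probably want to stream the corpus rather than hold it in one string, and to raise `min_count` to filter out typos and OCR noise.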
Hope that helps,