Today I drafted some short reviews of papers about Wiktionary, from the
backlog of the Wikimedia Research Newsletter. Reviews of the reviews and
edits are welcome before the newsletter is published, in case I missed
or misunderstood something too technical for me. :)
https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-09-30/Recen…
I think Wiktionary users can be very proud: across its multiple language
editions, it is regularly shown to be an invaluable linguistic resource,
already better than several or all competitors for a wide range of
purposes.
1.2 GLAWI, a free XML-encoded Machine-Readable Dictionary built from the
French Wiktionary
1.3 IWNLP: Inverse Wiktionary for Natural Language Processing
1.4 knoWitiary: A Machine Readable Incarnation of Wiktionary
1.5 Zmorge: A German Morphological Lexicon Extracted from Wiktionary
1.6 Dbnary: Wiktionary as Linked Data for 12 Language Editions with
Enhanced Translation Relations
1.7 Observing Online Dictionary Users: Studies Using Wiktionary Log Files
1.8 Multilingual Open Relation Extraction Using Cross-lingual Projection
Nemo
Cross-list posting, as it's relevant to Wiktionary and Wikimedia as a whole.
First of all, please go to the Meta page "Names of Wikimedia languages"
[1] and do your best to proofread or translate items. That's a
strategically important set of lists for the movement. We have to know
the names of Wikimedia languages in Wikimedia languages.
This is the first mobilization for this kind of simple translation:
a few hundred terms, of which this list is the most complex, as it
requires an additional column, "in <this> language".
The next one will be about lexicographical and grammatical terms and
abbreviations. That one is of strategic importance for Wiktionary, as
it allows anyone to generate sane dictionary entries.
After those two lists we'll be able to start working on the
ornithological dictionary, with slightly fewer than 400 species.
And now about the number of tanks...
Let's say that there are 250 Wikimedia languages and that we have
three matrix sets: names of languages, 100 lexicographical and
grammatical abbreviations and terms, and 400 species from the
ornithological dictionary. And that we have those lists translated into
all (250) Wikimedia languages. The numbers are...
* The names of 250 languages *times* in 250 languages (=62,500 entries
per project) *times* on 250 projects (=15,625,000 entries on all
projects).
* 100 lexicographical and grammatical terms and abbreviations *times*
250 languages (=25,000 entries per project) *times* on 250 projects
(=6,250,000 entries on all projects).
* 400 bird species *times* 250 languages (=100,000 entries per project)
*times* on 250 projects (=25,000,000 entries on all projects).
OK. That calculation is too optimistic. I would be happy if we got
translations in 50 languages. The numbers would then be 125,000
entries for languages, 250,000 entries for lexicographical and
grammatical terms and abbreviations, and 1,000,000 for birds.
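The arithmetic above can be checked with a short sketch. The function name and the dictionary labels are made up for illustration. It also assumes that in the 50-language scenario the list of language names itself shrinks to 50, which is what the 125,000 figure implies.

```python
def matrix_entries(terms, languages, projects):
    """Return (entries per project, entries on all projects).

    terms:     number of items in the matrix set (e.g. 400 bird species)
    languages: number of languages each item is translated into
    projects:  number of Wiktionary editions that host the entries
    """
    per_project = terms * languages
    return per_project, per_project * projects


# Optimistic scenario: all 250 Wikimedia languages and 250 projects.
full = {
    "language names": matrix_entries(250, 250, 250),
    "lexicographical terms": matrix_entries(100, 250, 250),
    "bird species": matrix_entries(400, 250, 250),
}

# Modest scenario: 50 languages and 50 projects.
# Assumption: the language-name list has 50 items here, not 250.
modest = {
    "language names": matrix_entries(50, 50, 50),
    "lexicographical terms": matrix_entries(100, 50, 50),
    "bird species": matrix_entries(400, 50, 50),
}

for label, scenario in (("250 languages", full), ("50 languages", modest)):
    for name, (per_project, total) in scenario.items():
        print(f"{label}: {name}: {per_project:,} per project, {total:,} in total")
```

The second value of each pair reproduces the totals quoted above for both scenarios.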
Besides the obvious fact that traditional lexicography isn't that well
optimized (note that this is about traditional lexicography, not about
Wiktionary itself, and thus not that fixable) and that we need a somewhat
better method (OmegaWiki, Wikidata; we are developing a proof of
concept as well), there are two other consequences:
1) If we have a set of 400 words and we translate them into 50
languages, we get one million entries. We should be doing
that on a monthly basis. It's not hard at all!
2) In a somewhat more complex form, which requires more work per matrix
set and gives a smaller output ("just" the product of the first and third
numbers), this could be used for Wikipedia articles as well. (You need
much more information in an encyclopedic article for the German language
than in a dictionary entry. But it's quite possible to do it. And it's
especially important for languages with a small number of speakers.)
Please go to [1] and help with this translation! Having the names of
Wikimedia languages in Wikimedia languages *is* important, no matter
whether it's about Wiktionary or generating the content. We should know
the names of our languages in our languages.
[1] https://meta.wikimedia.org/wiki/Names_of_Wikimedia_languages