[Wiktionary-l] Wiktionary size, format, long tail of languages
Lars Aronsson
lars at aronsson.se
Thu Nov 22 04:43:00 UTC 2007
There's a list of Wiktionaries by raw size at
http://meta.wikimedia.org/wiki/Wiktionary#List_of_Wiktionaries
Do all Wiktionaries follow the same format, with one wiki article
per word, containing sections for language / part of speech /
aspects and then numbered lists for meanings? E.g.
[[Snow]]
==English==
===Noun===
# The frozen, crystalline state of water
# A shade of white
# Random electrical noise
====Derived terms====
====Translations====
===Verb===
# Weather when snow is falling
# Bluff draw in poker
====Derived terms====
====Translations====
Or is there any Wiktionary that breaks this pattern? Does this
pattern have a name? What do you call it when/if some Wiktionary
breaks this pattern?
How did we end up with disambiguation pages on Wikipedia, strictly
keeping one page per meaning of a word, but not on Wiktionary?
Is that because Wiktionary spun off before disambiguation pages
were invented on Wikipedia, and the news never spread to
Wiktionary? Or is it because the Oxford English Dictionary
differs from Encyclopaedia Britannica in this respect, and we want
to keep the best practice? Or why? One could say that all
meanings of "snow" are the same word (by etymology), and should
logically be on one page. But this is not true of "pen"
(etymologies 1--4), nor does it explain keeping foreign words of
similar spelling on the same page (Norwegian "pen" meaning "fine"). Has
there been a discussion about this, and where can that be found? I
found something from December 2002,
http://en.wiktionary.org/wiki/Wiktionary_talk:Entry_layout_explained/archive_2002
But the voice of reason, Imran, left the project a year later.
Another discussion took place in December 2005,
http://en.wiktionary.org/wiki/Wiktionary:Beer_parlour_archive/October-December_05#Basic_flaw_in_Wiktionary--What_is_a_.27word.27.3F.3F
(It appears to be a December issue, so I apologize for bringing it
up a few weeks early this year.)
In the English Wiktionary, what percentage of words are in
English? And is the "long tail" of foreign languages similar across
all Wiktionaries? Is there any major Wiktionary that has a higher
concentration of words in its own language?
If the above pattern holds, a simple count of all level-2 headings
in the database dump could give the answer. For example, in the
dump of the Swedish Wiktionary, which has 46500 articles and is
the 13th biggest, these level-2 headings appear most frequently:
2510  ==Svenska==        Swedish
1847  ==Tvärspråkligt==  Translingual
 625  ==Engelska==       English
 343  ==Historik==       Etymology
 267  ==Tyska==          German
 245  ==Danska==         Danish
 230  ==Norska==         Norwegian
 217  ==Spanska==        Spanish
 217  ==Franska==        French
 192  ==Italienska==     Italian
 184  ==Nederländska==   Dutch
 169  ==Finska==         Finnish
 152  ==Polska==         Polish
 135  ==Serbiska==       Serbian
 122  ==Rumänska==       Romanian
 116  ==Interlingua==    Interlingua
 109  ==Ungerska==       Hungarian
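
A count like this could be sketched in Python roughly as follows. This is
only an illustration, not the script used for the numbers above. It assumes
the dump is a MediaWiki XML export with the wikitext of each page inside a
<text> element; the function name count_level2_headings and the tiny SAMPLE
dump are made up for the example.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# A level-2 heading is a whole line of the form ==Name==.
# Lines such as ===Noun=== or ====Translations==== do not match,
# because the character right after the opening "==" must not be "=".
LEVEL2 = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)


def count_level2_headings(dump_file):
    """Count level-2 headings over all <text> elements of an XML dump.

    dump_file is a file name or a binary file object (assumption: the
    dump follows the usual MediaWiki export layout).
    """
    counts = Counter()
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        # Drop the XML namespace prefix, if any: "{...}text" -> "text".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "text" and elem.text:
            for name in LEVEL2.findall(elem.text):
                counts[name.strip()] += 1
        elif tag == "page":
            elem.clear()  # release the finished page to save memory
    return counts


# Hypothetical two-page dump, just to exercise the function.
SAMPLE = b"""<mediawiki>
  <page><title>snow</title><revision><text>==English==
===Noun===
# The frozen, crystalline state of water
====Translations====
</text></revision></page>
  <page><title>pen</title><revision><text>==English==
===Noun===
==Norwegian==
===Adjective===
</text></revision></page>
</mediawiki>"""

counts = count_level2_headings(io.BytesIO(SAMPLE))
for name, n in counts.most_common():
    print(n, "==" + name + "==")
```

The sketch parses the XML instead of scanning raw lines, because the first
line of a page's text shares a line with the opening <text> tag in the dump
and a plain line-by-line match would miss a heading there.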
--
Lars Aronsson (lars at aronsson.se)
Aronsson Datateknik - http://aronsson.se