There's a list of Wiktionaries by raw size at http://meta.wikimedia.org/wiki/Wiktionary#List_of_Wiktionaries
Do all Wiktionaries follow the same format, with one wiki article per word, containing sections for language / part of speech / aspects and then numbered lists for meanings? E.g.
[[Snow]]
==English==
===Noun===
# The frozen, crystalline state of water
# A shade of white
# Random electrical noise
====Derived terms====
====Translations====
===Verb===
# Weather when snow is falling
# Bluff draw in poker
====Derived terms====
====Translations====
Or is there any Wiktionary that breaks this pattern? Does this pattern have a name? What do you call it when/if some Wiktionary breaks this pattern?
How did we end up with disambiguation pages on Wikipedia, strictly keeping one page per meaning of a word, but not on Wiktionary? Is that because Wiktionary spun off before disambiguation pages were invented on Wikipedia, and the news never spread to Wiktionary? Or is it because the Oxford English Dictionary differs from Encyclopaedia Britannica in this respect, and we want to keep the best practice? Or is there some other reason?
One could say that all meanings of "snow" are the same word (by etymology), and should logically be on one page. But this is not true of "pen" (etymologies 1--4), nor of foreign words of similar spelling that are kept on the same page (Norwegian "pen" meaning "fine").
Has there been a discussion about this, and where can it be found? I found something from December 2002, http://en.wiktionary.org/wiki/Wiktionary_talk:Entry_layout_explained/archive... But the voice of reason, Imran, left the project a year later. Another discussion took place in December 2005, http://en.wiktionary.org/wiki/Wiktionary:Beer_parlour_archive/October-Decemb... (It appears to be a December issue, so I apologize for bringing it up a few weeks early this year.)
In the English Wiktionary, what percentage of words are in English? And is the "long tail" of foreign languages similar across all Wiktionaries? Is there any major Wiktionary that has a higher concentration of words in its own language?
If the above pattern holds, a simple count of all level-2 headings from the database dump could give the answer. For example, in the dump of the Swedish Wiktionary, having 46500 articles and being the 13th biggest, these level-2 headings appear most frequently:
2510  ==Svenska==        Swedish
1847  ==Tvärspråkligt==  Translingual
 625  ==Engelska==       English
 343  ==Historik==       Etymology
 267  ==Tyska==          German
 245  ==Danska==         Danish
 230  ==Norska==         Norwegian
 217  ==Spanska==        Spanish
 217  ==Franska==        French
 192  ==Italienska==     Italian
 184  ==Nederländska==   Dutch
 169  ==Finska==         Finnish
 152  ==Polska==         Polish
 135  ==Serbiska==       Serbian
 122  ==Rumänska==       Romanian
 116  ==Interlingua==    Interlingua
 109  ==Ungerska==       Hungarian
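
For anyone who wants to repeat the count on another Wiktionary, here is a minimal sketch of such a tally in Python. It is not the exact command used for the numbers above. It assumes that a level-2 heading sits alone on its line with exactly two equals signs on each side, and that in the XML dump the first line of a page may follow an opening `<text ...>` tag on the same line. The names `count_level2_headings`, `shares` and `sample` are made up for illustration.

```python
import re
from collections import Counter

# A level-2 heading has exactly two '=' on each side, e.g. "==Svenska==".
# "===Noun===" does not match, because the first character after "==" must not be '='.
LEVEL2 = re.compile(r"^==([^=]+)==\s*$")

# Assumption: in an XML dump the first wikitext line of a page comes right
# after the opening <text ...> tag, so strip that tag before matching.
TEXT_TAG = re.compile(r"^\s*<text[^>]*>")


def count_level2_headings(lines):
    """Tally level-2 headings over an iterable of lines (list or open file)."""
    counts = Counter()
    for line in lines:
        line = TEXT_TAG.sub("", line)
        match = LEVEL2.match(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


def shares(counts):
    """Fraction of all level-2 headings that each heading accounts for."""
    total = sum(counts.values())
    return {heading: n / total for heading, n in counts.items()} if total else {}


# Tiny hand-made sample in the entry layout described above (not real dump data).
sample = [
    '<text xml:space="preserve">==English==\n',
    "===Noun===\n",
    "# The frozen, crystalline state of water\n",
    "====Translations====\n",
    "==Swedish==\n",
    "===Noun===\n",
    "==English==\n",
]

counts = count_level2_headings(sample)
for heading, n in counts.most_common():
    print(n, heading)
```

On a real dump the same function could be fed an open file instead of the list, for example `open("dump.xml", encoding="utf-8")`, where the file name is only a placeholder. The `shares` helper turns the tally into fractions, which is one way to approach the percentage question, but only as far as the one-heading-per-language pattern actually holds on the Wiktionary in question.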