On Mon, 2 Sep 2019 at 10:43, James Heilman <jmh649(a)gmail.com> wrote:
One question, we have Croatian and Serbo-Croatian...
How significant is
Now that's a question for the Language committee, although it's one of the
most complicated ones :)
Before I begin and you run away from descriptions, let me say that the most
important parts of this email are not about Slavic linguistics, so bear with me.
So: As far as the languages themselves go, the differences are very minor
as far as I know. Many people say that they are smaller than the
differences between British and American English, but I cannot vouch for it.
In real life, I heard people from Serbia and Croatia talking to each other
in their languages (Wikipedians and non-Wikipedians), and they all said
that it's the same language with different names.
Closer to our world, I saw multiple people copying whole articles
from the Serbo-Croatian Wikipedia to the Croatian one, and back without
changing any text.
There are some differences in vocabulary: Croatian has a long tradition of
being more puristic in technical terminology, as well as in names of months
and some other things, but the text is nevertheless readable both ways.
Most of the vocabulary, grammar, and spelling rules are the same, however.
In addition to these two, there are also the Serbian Wikipedia and the
Bosnian Wikipedia. Bosnian, Croatian, and Serbo-Croatian are all written
only in the Latin alphabet, and they are all pretty much the same language,
with the differences being as described above. The Serbian Wikipedia is
written primarily in the Cyrillic alphabet, but automatic script conversion
is installed there.
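To give a feel for what script conversion involves, here is a minimal sketch of the Cyrillic-to-Latin direction for Serbian. It is only an illustration and not the converter that actually runs on the Serbian Wikipedia; the function name cyrillic_to_latin and the capitalization rule for digraphs are my own assumptions. The letter mapping itself is the standard one-to-one correspondence between the two Serbian alphabets.

```python
# Standard Serbian Cyrillic -> Latin letter correspondence (30 letters).
# Three Cyrillic letters map to Latin digraphs: љ -> lj, њ -> nj, џ -> dž.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text: str) -> str:
    """Hypothetical helper: transliterate Serbian Cyrillic text to Latin.

    Characters outside the mapping (Latin letters, digits, punctuation)
    are passed through unchanged.
    """
    out = []
    for i, ch in enumerate(text):
        lower = ch.lower()
        if lower not in CYR_TO_LAT:
            out.append(ch)
            continue
        lat = CYR_TO_LAT[lower]
        if ch != lower:  # the source letter was uppercase
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # Assumption: a digraph is fully uppercased only inside an
            # all-caps run (ЉУБАВ -> LJUBAV); otherwise Љубав -> Ljubav.
            if len(lat) > 1 and nxt.isupper():
                lat = lat.upper()
            else:
                lat = lat.capitalize()
        out.append(lat)
    return "".join(out)


if __name__ == "__main__":
    print(cyrillic_to_latin("Википедија"))
```

The opposite direction, Latin to Cyrillic, is harder because a Latin sequence like "nj" does not always stand for a single Cyrillic letter, so a real converter needs exception handling that this sketch does not attempt.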
The reasons for the split are complex, but primarily political: there were
many arguments about the name of the language even before the Balkan wars
in the 1990s, and then the wars and the breakup of Yugoslavia only
intensified them. Serbo-Croatian was an attempt to have a unified standard
language with allowance for local standardized varieties, and it could be
quite practical, but it somehow happened that we got four Wikipedias anyway.
To complicate things even further, there is a request to create a
Montenegrin Wikipedia, which would be yet another nearly-identical
variety. The Language committee is mostly opposed to doing it, although
there are also people who think that since it has a valid ISO 639 code, and
all the other varieties got their own domain, it's fair to give this one
its own domain, too.
In theory, they could all live in one Serbo-Croatian Wikipedia, with
automatic conversion of script and terminology, similar to how it's done
with Chinese and with the different locales of English, Portuguese, and
German. In practice, it's probably too difficult by now to unify them. It
definitely cannot be done by an order from the Board, the Stewards, or the
Language committee. The only scenario for this is to reach a very wide
consensus among all the active editors of the Croatian, Bosnian, and
Serbo-Croatian Wikipedias and to start gradually moving all the editing
activity to one of them, and making the other two read-only. The Serbian
Wikipedia should probably remain distinct because the Cyrillic alphabet is
common in Serbia. This scenario is *kind of* imaginable, but isn't likely
to actually happen.
To connect it all to the ongoing political issue in the Croatian Wikipedia:
One could ask at this point whether this issue is a demonstration of why
too much splitting of wikis by language and country is a problem. Perhaps
it is and perhaps it isn't; I am reluctant to make a strong judgment
here without knowing the languages in question well enough.
I do, however, want to add another point, which isn't frequently discussed,
even though it really should be, because it's probably the most important
one: Are these discussions about the editor community, or about the
readers? The sites are ultimately supposed to serve the readers and not
just the editors. We cannot know much about the readers because they are
usually quiet by their nature. We can, though, look at readership
statistics per country in the relevant countries.
Here is some super-basic data from the Turnilo tool for analyzing Wikimedia
traffic. (This tool is not public. I have access to it as a WMF staff
member, and I can publish some high-level non-private data.) I looked at
the traffic to Wikimedia projects per country. Here's the list of the most
popular editions of Wikipedia in each country for the first half of 2019:
Bosnia and Herzegovina: en, hr, sr, bs, sh, ru, de
Croatia: en, hr, bs, sh, de, ru, sr
Montenegro: sr, en, sh, hr, ru, bs, de
Serbia: sr, en, sh, ru, hr, hu, bs
What can we see here?
* Serbian is the most popular in Montenegro and Serbia. (And note that
Serbian is the most popular in Montenegro, even though the Latin alphabet
is much more popular there from what I've heard. However, I have no way to
know whether the readers get the Latin or the Cyrillic version.)
* English is the most popular in Bosnia and Croatia, and the second most
popular in Montenegro and Serbia.
* Croatian is second to English in both Croatia and Bosnia.
* Bosnian is less popular than Serbian in Bosnia. It is more popular in
Croatia than it is in Bosnia, where you'd expect it to be the most popular.
* This list item is an Easter egg in plain sight. If you've read this long
email this far, I appreciate your time and attention.
* Serbian has low popularity in Croatia, which is not so surprising given
that the Latin alphabet is the only standard there and Cyrillic is almost
* Serbian has a pretty high placement in Bosnia, which can probably be
explained by the significant ethnic Serbian population there
(read about Republika Srpska). Unfortunately we don't have data by
sub-national entities, but only by whole countries.
* Serbo-Croatian is more popular than Bosnian and Croatian in Montenegro
and Serbia. In fact, it has considerable presence everywhere, even though
some people say that it's the most redundant wiki of the four.
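If anyone wants to check observations like these against the lists rather than by eye, here is a minimal sketch. The dictionary just restates the four per-country lists given above (first half of 2019); the helper names rank and outranks are hypothetical, and since only ordering is available here, not traffic volume, the code can compare positions and nothing more.

```python
# Most popular Wikipedia editions per country, in order, as listed above.
TOP_EDITIONS = {
    "Bosnia and Herzegovina": ["en", "hr", "sr", "bs", "sh", "ru", "de"],
    "Croatia": ["en", "hr", "bs", "sh", "de", "ru", "sr"],
    "Montenegro": ["sr", "en", "sh", "hr", "ru", "bs", "de"],
    "Serbia": ["sr", "en", "sh", "ru", "hr", "hu", "bs"],
}


def rank(country, edition):
    """Return the 1-based position of an edition in a country's list,
    or None if the edition is not in that country's top seven."""
    editions = TOP_EDITIONS[country]
    return editions.index(edition) + 1 if edition in editions else None


def outranks(country, a, b):
    """True if edition `a` is listed above edition `b` in `country`.
    An edition missing from the list counts as ranked below any listed one."""
    ra, rb = rank(country, a), rank(country, b)
    return ra is not None and (rb is None or ra < rb)


if __name__ == "__main__":
    # Print where the four closely related editions land in each country.
    for country in TOP_EDITIONS:
        positions = {e: rank(country, e) for e in ("bs", "hr", "sh", "sr")}
        print(country, positions)
```

For example, outranks("Bosnia and Herzegovina", "sr", "bs") tests the claim that Serbian is ahead of Bosnian in Bosnia.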
Note also the following points, which apply to pretty much all countries:
* The vast majority of readers don't consciously select one edition or
another, but go to whatever Google gives them.
* English is popular not necessarily because people *want* to read in it,
but probably because they cannot find what they are looking for in their
language, either because an article wasn't written, or because Google
ranked the article in English more highly, and they couldn't find how to
switch to their language.
We could perhaps get better information about what the people are actually
interested in if we run a focused reader survey in these countries instead
of looking at dry traffic numbers or listening to anecdotal evidence from
editors (this includes myself). If we are really interested in such a
thing, this is something that can be done quite easily by WMF's Research
team, perhaps in collaboration with Wikimedia chapters and academics on the
ground in these countries.
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
“We're living in pieces,
I want to live in peace.” – T. Moore