On Mon, 2 Sep 2019 at 10:43, James Heilman <jmh649@gmail.com> wrote:
> One question, we have Croatian and Serbo-Croatian... How significant is the difference?

Now that's a question for the Language committee, although it's one of the most complicated ones :)

Before I begin and you run away from descriptions, let me say that the most important parts of this email are not about Slavic linguistics, so bear with me.

So: as far as I know, the differences between the languages themselves are very minor. Many people say that they are smaller than the differences between British and American English, but I cannot vouch for it myself.

In real life, I heard people from Serbia and Croatia talking to each other in their languages (Wikipedians and non-Wikipedians), and they all said that it's the same language with different names.

Closer to our world, I saw multiple people copying whole articles from the Serbo-Croatian Wikipedia to the Croatian one and back, without changing any text.

There are some differences in vocabulary: Croatian has a long tradition of being more puristic in technical terminology, as well as in names of months and some other things, but the text is nevertheless readable both ways. Most of the vocabulary, grammar, and spelling rules are the same, however.

In addition to these two, there are also the Serbian Wikipedia and the Bosnian Wikipedia. Bosnian, Croatian, and Serbo-Croatian are all written only in the Latin alphabet, and they are all pretty much the same language, with the differences being as described above. The Serbian Wikipedia is written primarily in the Cyrillic alphabet, but automatic script conversion is installed there.
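To give a rough idea of what script conversion involves, here is a minimal sketch in Python. It is only an illustration and is not the code that actually runs on the Serbian Wikipedia: the names `CYR_TO_LAT` and `cyrillic_to_latin` are made up for this example, and it only covers the plain letter-by-letter mapping from Serbian Cyrillic to Latin.

```python
# Hypothetical illustration only; not the converter used on the wiki.
# Standard one-way mapping from the 30 Serbian Cyrillic letters to Latin.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin, leaving other characters as they are."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower not in CYR_TO_LAT:
            # Not a Serbian Cyrillic letter: punctuation, digits, Latin text, etc.
            out.append(ch)
            continue
        latin = CYR_TO_LAT[lower]
        # An uppercase Cyrillic letter becomes a capitalized Latin letter or digraph.
        out.append(latin.capitalize() if ch != lower else latin)
    return "".join(out)


print(cyrillic_to_latin("Википедија"))
```

The Cyrillic-to-Latin direction is the easy one because every Cyrillic letter maps to exactly one Latin letter or digraph. This sketch writes a capital digraph as "Lj" even inside an all-caps word, and it does not attempt the reverse direction, where digraphs such as "nj" are ambiguous.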

The reasons for the split are complex, but primarily political: there were many arguments about the name of the language even before the Balkan wars in the 1990s, and then the wars and the breakup of Yugoslavia only intensified them. Serbo-Croatian was an attempt to have a unified standard language with allowance for local standardized varieties, and it could be quite practical, but it somehow happened that we got four Wikipedias anyway.

To complicate things even further, there is a request to create a Montenegrin Wikipedia, which would be yet another nearly identical variety. The Language committee is mostly opposed to doing it, although there are also people who think that since it has a valid ISO 639 code, and all the other varieties got their own domains, it's fair to give this one its own domain, too.

In theory, they could all live in one Serbo-Croatian Wikipedia, with automatic conversion of script and terminology, similar to how it's done with Chinese and with the different locales of English, Portuguese, and German. In practice, it's probably too difficult by now to unify them. It definitely cannot be done by an order from the Board, the Stewards, or the Language committee. The only scenario for this is to reach a very wide consensus among all the active editors of the Croatian, Bosnian, and Serbo-Croatian Wikipedias and to start gradually moving all the editing activity to one of them, and making the other two read-only. The Serbian Wikipedia should probably remain distinct because the Cyrillic alphabet is common in Serbia. This scenario is *kind of* imaginable, but isn't likely to actually happen.

To connect it all to the ongoing political issue in the Croatian Wikipedia: One could ask at this point whether this issue is a demonstration of why too much splitting of wikis by language and country is a problem. Perhaps it is and perhaps it isn't; I am reluctant to make a strong judgment here without knowing the languages in question well enough.

I do, however, want to add another point, which isn't frequently discussed, even though it really should be, because it's probably the most important one: Are these discussions about the editor community, or about the readers? The sites are ultimately supposed to serve the readers and not just the editors. We cannot know much about the readers because they are usually quiet by their nature. We can, though, look at readership statistics for the relevant countries.

Here is some super-basic data from the Turnilo tool for analyzing Wikimedia traffic. (This tool is not public. I have access to it as a WMF staff member, and I can publish some high-level non-private data.) I looked at the traffic to Wikimedia projects per country. Here's the list of the most popular editions of Wikipedia in each country for the first half of 2019:

Bosnia and Herzegovina: en, hr, sr, bs, sh, ru, de
Croatia: en, hr, bs, sh, de, ru, sr
Montenegro: sr, en, sh, hr, ru, bs, de
Serbia: sr, en, sh, ru, hr, hu, bs

What can we see here?

* Serbian is the most popular in Montenegro and Serbia. (And note that Serbian is the most popular in Montenegro, even though the Latin alphabet is much more popular there from what I've heard. However, I have no way to know whether the readers get the Latin or the Cyrillic version.)
* English is the most popular in Bosnia and Croatia, and the second most popular in Montenegro and Serbia.
* Croatian is second to English in both Croatia and Bosnia.
* Bosnian is less popular than Serbian in Bosnia. It ranks higher in Croatia than it does in Bosnia, where you'd expect it to be the most popular one!
* This list item is an Easter egg in plain sight. If you've read this long email this far, I appreciate your time and attention.
* Serbian has low popularity in Croatia, which is not so surprising given that the Latin alphabet is the only standard there and Cyrillic is almost non-existent.
* Serbian has a pretty high placement in Bosnia, which can probably be explained by the significant ethnic Serbian population there (read about Republika Srpska). Unfortunately we don't have data by sub-national entities, but only by whole countries.
* Serbo-Croatian is more popular than Bosnian and Croatian in Montenegro and Serbia. In fact, it has considerable presence everywhere, even though some people say that it's the most redundant wiki of the four.
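
For anyone who wants to check observations like these against the lists above, here is a minimal sketch in Python. It only re-encodes the four per-country rankings quoted in this email; the names `RANKINGS`, `rank`, and `countries_where_ahead` are made up for this example and have nothing to do with Turnilo itself.

```python
# Per-country rankings exactly as listed in this email (first half of 2019).
# The helper names below are hypothetical, for illustration only.
RANKINGS = {
    "Bosnia and Herzegovina": ["en", "hr", "sr", "bs", "sh", "ru", "de"],
    "Croatia": ["en", "hr", "bs", "sh", "de", "ru", "sr"],
    "Montenegro": ["sr", "en", "sh", "hr", "ru", "bs", "de"],
    "Serbia": ["sr", "en", "sh", "ru", "hr", "hu", "bs"],
}


def rank(country: str, wiki: str) -> int:
    """Return the 1-based position of a Wikipedia edition in a country's list."""
    return RANKINGS[country].index(wiki) + 1


def countries_where_ahead(wiki: str, others: list[str]) -> list[str]:
    """List the countries where `wiki` ranks above every edition in `others`."""
    return [
        country
        for country in RANKINGS
        if all(rank(country, wiki) < rank(country, other) for other in others)
    ]


# Print where each edition stands in each country.
for country in RANKINGS:
    positions = {wiki: rank(country, wiki) for wiki in ("bs", "hr", "sh", "sr")}
    print(country, positions)

# Print the countries where Serbo-Croatian outranks both Bosnian and Croatian.
print(countries_where_ahead("sh", ["bs", "hr"]))
```

These are positions in a top-seven list, not traffic volumes, so the sketch can only say which edition ranks above another, not by how much.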

Note also the following points, which apply to pretty much all countries:
* The vast majority of readers don't consciously select one edition or another, but go to whatever Google gives them.
* English is popular not necessarily because people *want* to read in it, but probably because they cannot find what they are looking for in their language, either because an article wasn't written, or because Google ranked the article in English more highly, and they couldn't find how to switch to their language.

We could perhaps get better information about what the people are actually interested in if we run a focused reader survey in these countries instead of looking at dry traffic numbers or listening to anecdotal evidence from editors (this includes myself). If we are really interested in such a thing, this is something that can be done quite easily by WMF's Research team, perhaps in collaboration with Wikimedia chapters and academics on the ground in these countries.

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬