Stephen Bain wrote:
Swedish Wikipedia is distributed in almost exactly the same way as English Wikipedia, with the difference being that its average size is less than half that of En's, at around 1900 bytes.
In Sweden we have this problem that people in our capital are so arrogant towards the rest of the country. This is of course nothing special about Sweden, it happens in every country. Americans just cannot stand New Yorkers (the big city) and policy makers in Washington DC. People in Russia probably hate people in Moscow. Europe is united in its dislike of the EU bureaucracy headquarters in Brussels. Talking to other Swedes of my age, it is easier to discuss individual streets in San Francisco than streets in a neighboring Swedish city where none of us have been. We know that the U.N. headquarters is on the east side of Manhattan and that Wall Street is within walking distance of Battery Park. We all tend to look upwards. Swedes look to Stockholm, Paris, London and New York, but more seldom to Latvia or Bangladesh or Malawi.
All Swedish wikipedians know about the Swedish and the English Wikipedia. Of course the English one is much larger. The fact that Ingmar Bergman's English article is three times longer (44K) than the Swedish one (16K) is taken for granted and not seen as an urgent crisis that needs to be addressed. The Swedish Wikipedia has articles about small Swedish places and people that don't have articles in the English Wikipedia, and so their sizes cannot be compared. It is also well known that the Swedish Wikipedia is slightly larger than those in our immediate neighboring languages Finnish, Norwegian and Danish.
The idea of comparing the Swedish Wikipedia to the one in Czech, which has the same number of speakers (10M), just doesn't occur, because nobody in Sweden speaks Czech and most would believe you need to know the language in order to understand anything. I guess it is mutual: Czechs speak their own language plus English and German and a bit of Polish, but nobody would care about Swedish.
It was only recently that the stubbiness of the Swedish Wikipedia, relative to other languages of Wikipedia, started to come into public awareness. Most regular wikipedians are now aware of this. But when we start to talk about improving quality, average Swedes who are occasional contributors still don't understand what this is about. They think we want to compete with traditional printed encyclopedias, and that we are losing our soul as a Net project. Why should any article be deleted, when storage is so cheap? Why require source citations and complete sentences and birth dates? Surely someone else is going to add that later.
It's easy to measure the average size of articles, but much harder to change it. Even if you add 50 bytes to each article, that would only move the average by 50 bytes and not from 2K to 4K. Just by translating from English, you can add 30K to the Swedish article on Ingmar Bergman, but that doesn't move the average size.
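To put rough numbers on this, here is a minimal sketch of the arithmetic. It assumes the figures mentioned in this thread (about 270K articles, an average of around 1900 bytes) and reads "30K" as 30,000 bytes; the function names are made up for illustration.

```python
def mean_after_uniform_growth(mean_bytes, added_per_article):
    """Adding the same number of bytes to every article shifts the mean by that amount."""
    return mean_bytes + added_per_article


def mean_after_single_expansion(mean_bytes, article_count, added_bytes):
    """Expanding one article spreads its added bytes over the whole article count."""
    return mean_bytes + added_bytes / article_count


ARTICLES = 270_000   # approximate number of real articles (assumed from the thread)
MEAN = 1900          # approximate average article size in bytes (assumed from the thread)

# 50 bytes added to every single article: the mean moves by 50 bytes.
print(mean_after_uniform_growth(MEAN, 50))

# 30,000 bytes added to one article: the mean moves by about a tenth of a byte.
print(round(mean_after_single_expansion(MEAN, ARTICLES, 30_000), 2))
```

Under these assumptions, a 30K expansion of a single article raises the average by roughly 0.11 bytes.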
What I did instead was to look at the very smallest articles. There's a page [[Special:Shortpages]] for this, but unfortunately it lists both short articles and disambiguation pages. From the database dumps, I could filter out the disambiguation pages, of which there are about 10K on the Swedish Wikipedia, leaving 270K real articles.
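The filtering step could look roughly like the sketch below. It is not the actual script: it assumes the dump has already been parsed into (title, wikitext) pairs, and the marker string `{{disambig}}` is a hypothetical placeholder for whatever template a given wiki uses to tag disambiguation pages.

```python
def article_sizes(pages, disambig_markers):
    """Return UTF-8 byte sizes of pages that carry none of the given markers.

    pages: iterable of (title, wikitext) pairs, assumed to come from a parsed dump.
    disambig_markers: template strings that tag disambiguation pages (hypothetical).
    """
    lowered_markers = [marker.lower() for marker in disambig_markers]
    sizes = []
    for _title, wikitext in pages:
        text = wikitext.lower()
        if any(marker in text for marker in lowered_markers):
            continue  # skip disambiguation pages
        sizes.append(len(wikitext.encode("utf-8")))
    return sizes


# Made-up sample data, for illustration only.
sample_pages = [
    ("A", "{{Disambig}}\n* A (island)\n* A (village)"),
    ("B", "Å is an island."),
]
print(article_sizes(sample_pages, ["{{disambig}}"]))
```

Sizes are counted in bytes rather than characters, so non-ASCII letters such as Å count as two bytes in UTF-8.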
I found that 0.1 percent (270 articles) were shorter than 90 bytes. This was better than the Arabic Wikipedia where 0.1 percent are shorter than 62 bytes, Latvian with 78 bytes and Estonian with 81 bytes. But it is far worse than Danish 130 bytes, Polish 160, Czech 213, German 280 and Russian 359 bytes. This was in April. By merging and improving the very shortest articles, the Swedish Wikipedia's 0.1 percent shortest articles now reach 145 bytes.
In the next step, I found that 1.0 percent of articles (2700 articles) were shorter than 126 bytes, which was far worse than any other language I looked at. But during April and May this has improved to 171 bytes, which is better than Arabic and Estonian. During all of April, the Swedish Wikipedia didn't grow at all, because stubs were being merged and removed faster than new articles were written. This is when the Russian Wikipedia, on May 19, took the 10th position at 283K articles.
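For anyone who wants to repeat the measurement on another language, the thresholds above can be computed roughly as follows. This is only a sketch of one way to do it, assuming you already have a list of article sizes in bytes; the function name and the per-mille parameter are my own choices for illustration.

```python
def size_threshold(sizes, per_mille):
    """Return the byte size that the shortest `per_mille`/1000 of articles fall below.

    per_mille=1 is the shortest 0.1 percent, 10 is 1 percent, 100 is 10 percent.
    Integer arithmetic is used to avoid floating-point rounding in the index.
    """
    ordered = sorted(sizes)
    if not ordered:
        raise ValueError("no article sizes given")
    k = len(ordered) * per_mille // 1000      # how many articles lie below the cut
    k = min(k, len(ordered) - 1)              # stay inside the list
    return ordered[k]


# Made-up sizes 0..999 bytes, in reverse order, for illustration only.
sizes = list(range(999, -1, -1))
print(size_threshold(sizes, 1))    # shortest 0.1 percent
print(size_threshold(sizes, 10))   # shortest 1 percent
print(size_threshold(sizes, 100))  # shortest 10 percent
```

With 270K articles, `per_mille=1` puts 270 articles below the cut and `per_mille=10` puts 2700 below it, matching the counts quoted above.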
By addressing the very shortest articles, we remove the easiest excuse for people to create new very short stubs. They can no longer point to other articles that are only 100 bytes long. After the lower limit is pushed from 120 to 140 bytes, it will continue to push upwards to 160 and 180 bytes.
Some might ask why we don't just remove everything shorter than 200 or 300 bytes. You are welcome to try to propose this, but so far every such proposal has met solid resistance. Instead I found 20 articles saying "X is an island in Y archipelago" and merged them into a nice article on "Y archipelago" with a map, so that everybody can agree that the new style is a real improvement. With that sort of aggregation, it becomes more obvious that island Z is missing or that island W doesn't really belong. Fact checking and quality assurance are improved by merging stubs. It's all about setting an example for the future. The average article size will move slowly over the next few years.
The next medium-term goal is to look at the 10 percent shortest articles (27K of them), the "lower decile". For the Swedish Wikipedia, this threshold has moved from 312 bytes in April to 349 bytes in June. For the Lithuanian, Estonian, Arabic and Danish Wikipedias it is around 400 bytes. Norwegian and Icelandic are around 500, Polish at 638. The French, Finnish and Hungarian are around 700. Latvian has 808, Czech 917, German 1081 and the Russian Wikipedia has 10 percent of its articles shorter than 1305 bytes.
If you believe the Polish 638 bytes is really bad, it's just because you haven't looked closely enough at Danish and Norwegian. In fact, it's not so much worse than the French 682 bytes. You are welcome to make fun of the Swedish stubs, because it is an area where we can show constant improvement.
This hunt for "substubs" is just one of several quality improvement initiatives currently running on the Swedish Wikipedia.