Hello Wikidatians,
I made a few visualizations of the distribution of language links in Wikidata Items. You can also use these stats to see which Items represent Wikipedia articles that are unique to a single language, and to compare the uniqueness of all languages. I also investigated all the Items with just two language links, to look at Wikipedia "pairs".
See the full analysis: http://notconfusing.com/the-most-unique-wikipedias-according-to-wikidata/
Some sample visualisations: [http://notconfusing.com/wp-content/uploads/2013/06/Composition_zoom-effect-1...] [http://notconfusing.com/wp-content/uploads/2013/06/LangLinks_log-1024x577.pn...] [http://notconfusing.com/wp-content/uploads/2013/06/Uniquenesses-1024x577.png]
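For readers who want to try similar counts themselves, here is a minimal sketch in Python. It is not the code behind the linked analysis. It assumes the Items have already been reduced to a mapping from Item id to the set of language codes that have a link, and it assumes "uniqueness" means the share of a language's linked Items that have no other language link. The function name `link_stats` and the sample data are hypothetical.

```python
from collections import Counter


def link_stats(items):
    """items: mapping of Item id -> set of language codes with a link (assumed input shape)."""
    links_per_item = Counter()  # number of Items having exactly k language links
    total = Counter()           # number of Items linked from each language
    unique = Counter()          # Items whose only language link is this language
    pairs = Counter()           # Items linked from exactly two languages, keyed by the pair
    for langs in items.values():
        links_per_item[len(langs)] += 1
        for lang in langs:
            total[lang] += 1
        if len(langs) == 1:
            unique[next(iter(langs))] += 1
        elif len(langs) == 2:
            pairs[tuple(sorted(langs))] += 1
    # Assumed definition: unique Items divided by all Items linked from that language.
    uniqueness = {lang: unique[lang] / total[lang] for lang in total}
    return links_per_item, uniqueness, pairs


# Hypothetical toy data, not real Wikidata content.
sample = {
    "Q1": {"en", "sv", "ceb"},
    "Q2": {"sv", "ceb"},
    "Q3": {"en"},
    "Q4": {"sv", "ceb"},
    "Q5": {"war"},
}
links_per_item, uniqueness, pairs = link_stats(sample)
```

On the toy data, `pairs` groups the two Items shared only by "ceb" and "sv" under one key, which is the kind of signal a "pairs" plot would pick up.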
Maximilian Klein Wikipedian in Residence, OCLC +17074787023
2013/6/12 Klein,Max kleinm@oclc.org
The Cebuano-Waray-Swedish cluster exists because Lsjbot (http://sv.wikipedia.org/wiki/Anv%C3%A4ndare:Lsjbot) has created a fair share of the articles on all three of them (and, yes, mainly about taxa).
//Johan Jönsson --
On 2013-06-12 22:22, Klein,Max wrote:
See the full analysis:
http://notconfusing.com/the-most-unique-wikipedias-according-to-wikidata/ [1]
Interesting! Could you also create that kind of visualisation by topic: how much uniqueness comes from biographies of local football people, compared with historical events or abstract concepts?
Also, on a completely unrelated topic, could you explain to me in private what you mean by "Create a communal house to live in", which is on your public todo list? It sounds interesting. :P
Max, nice work! It takes some study to read your graphs, but they are indeed fascinating. I am sure, though, that given time these will fluctuate a great deal.

My gut feeling on the small wiki side of things is that there are lots of broken interwiki links, because there are not enough people fixing them for small wikis. Thus, the amount of uniqueness on the small wiki side might be off by quite a bit.

Something else I noticed just from working on Wiki Loves Monuments is that most of the large Wikipedia projects have fairly dense coverage of topics via disambiguation pages, whereas the smaller projects have only the local version. For example, the article for "monument" may be split into many articles on the English Wikipedia (Monument, National monument, Monument historique, etc.), but in Slovenian there is just one, and so forth. It would be nice to see "article trees" somehow, using disambiguation pages as starting points.

Jane
2013/6/13, Mathieu Stumpf psychoslave@culture-libre.org:
-- Association Culture-Libre http://www.culture-libre.org/
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Cebuano and Waray are definitely outliers because they're bot Wikipedias: 70% and 95% of their articles, respectively, were created by bots. http://stats.wikimedia.org/EN/BotActivityMatrixCreates.htm sv should soon reach about 75% bot creations and nl is rather stable at around 50-60%, so that explains most of the weird clusters.

For your left-to-right ordering "by size", you should use "Usage" rather than number of articles, because when the two differ too much there's something wrong. http://stats.wikimedia.org/EN/Sitemap.htm For instance, of the Wikipedias ranked 11-20 by number of articles, only 2 are in the official www.wikipedia.org top 20 (which is by usage).
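The comparison between the two orderings can be checked mechanically. The following is a minimal sketch in Python, assuming you already have two ranked lists of wiki codes, one by article count and one by usage. The function name `rank_overlap` and the placeholder labels are hypothetical and do not reflect real rankings.

```python
def rank_overlap(by_articles, by_usage, lo, hi, top):
    """Return the wikis ranked lo..hi (1-based, inclusive) by article count
    that also appear in the top `top` entries of the usage ranking."""
    window = by_articles[lo - 1:hi]
    top_usage = set(by_usage[:top])
    return [wiki for wiki in window if wiki in top_usage]


# Placeholder labels only; substitute real ranked lists of wiki codes.
by_articles = ["A", "B", "C", "D", "E", "F"]
by_usage = ["A", "C", "B", "F", "G", "H"]
overlap = rank_overlap(by_articles, by_usage, lo=4, hi=6, top=4)
```

With real data, calling it with `lo=11, hi=20, top=20` would reproduce the kind of count mentioned above; a short result list suggests the article count is inflated relative to usage.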
Nemo