I have now given some thought to how to get a more substantial comparison between the different language versions, and would like your input on the feasibility of doing something like this:
1. Get articles to compare: take the X (1000?) most recently updated iw links from Wikidata. By doing this we find articles that are "alive". This can also easily be done by a bot (a rough sketch is below).
2. For each article, find the other language versions it exists in (if none, do not use the article).
2.1 For each article on each language version, check whether its length is above Y bytes (with/without templates) and whether it has at least one reference/source.
3. Count occurrences per version in the categories "very weak articles" (below 500 characters and/or no sources) and "acceptable" ones.
4. Calculate per language version:
4.1 coverage: the number of occurrences divided by the total number of articles looked at
4.2 quality: the proportion of very weak articles
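A minimal sketch of what such a bot could look like, assuming plain MediaWiki/Wikidata API calls via Python's requests library; the sample size X, the length threshold Y, and the reference check (simply counting <ref> tags) are placeholder assumptions, and API continuation and error handling are left out:

import requests
from collections import defaultdict

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
X_ITEMS = 1000   # step 1: how many recently updated items to sample (assumption)
Y_CHARS = 500    # steps 2.1/3: "very weak" length threshold (assumption)

def recently_updated_items(limit=X_ITEMS):
    """Step 1: titles (Q-ids) of the most recently changed Wikidata items."""
    params = {"action": "query", "list": "recentchanges", "rcnamespace": 0,
              "rcprop": "title", "rclimit": min(limit, 500), "format": "json"}
    # continuation is omitted for brevity, so at most 500 changes per request
    r = requests.get(WIKIDATA_API, params=params).json()
    return [rc["title"] for rc in r["query"]["recentchanges"]]

def sitelinks(qid):
    """Step 2: map language code -> article title for the item's Wikipedias."""
    params = {"action": "wbgetentities", "ids": qid,
              "props": "sitelinks", "format": "json"}
    r = requests.get(WIKIDATA_API, params=params).json()
    links = r["entities"].get(qid, {}).get("sitelinks", {})
    skip = {"commonswiki", "specieswiki", "metawiki", "mediawikiwiki"}  # incomplete exclusion list
    return {site[:-4]: data["title"] for site, data in links.items()
            if site.endswith("wiki") and site not in skip}

def assess(lang, title):
    """Step 2.1: wikitext length and a crude source check (number of <ref> tags)."""
    api = "https://%s.wikipedia.org/w/api.php" % lang
    params = {"action": "query", "titles": title, "prop": "revisions",
              "rvprop": "content", "rvslots": "main", "format": "json"}
    r = requests.get(api, params=params).json()
    page = next(iter(r["query"]["pages"].values()))
    text = ""
    if "revisions" in page:
        text = page["revisions"][0]["slots"]["main"]["*"]
    return len(text), text.count("<ref")

def main():
    counts = defaultdict(lambda: {"total": 0, "weak": 0})
    items = [q for q in dict.fromkeys(recently_updated_items()) if q.startswith("Q")]
    for qid in items:
        links = sitelinks(qid)
        if not links:
            continue  # step 2: no Wikipedia article anywhere, skip the item
        for lang, title in links.items():
            length, refs = assess(lang, title)
            counts[lang]["total"] += 1
            if length < Y_CHARS or refs == 0:   # step 3: "very weak"
                counts[lang]["weak"] += 1
    sampled = len(items)
    for lang, c in sorted(counts.items()):      # step 4: coverage and quality
        coverage = c["total"] / float(sampled)
        weak_share = c["weak"] / float(c["total"])
        print("%s  coverage=%.2f  weak=%.2f" % (lang, coverage, weak_share))

if __name__ == "__main__":
    main()

Running it would print, per language code, the share of sampled items that have an article there (coverage) and the share of those articles that fall into the "very weak" bucket.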
Anders
Anders Wennersten, 24/06/2014 08:06:
below 500 characters
I assume you know of https://stats.wikimedia.org/EN/TablesArticlesGt500Bytes.htm
In the meantime I'll repeat myself ad nauseam: any approach based on random selection of articles is useless for quality assessment, because it's not representative of what people actually read on the wiki.
Nemo
My suggestion was to start with new iw links on Wikidata, which is not a random selection but represents the most recently created articles on any version, which better reflects what is most asked for.
Anders