Forwarding to Analytics in case anyone there is interested. Please discuss on the Research list.

Thanks,

Pine


On Sun, Jul 6, 2014 at 6:21 AM, Anders Wennersten <mail@anderswennersten.se> wrote:
A standard for measuring quality levels of articles would be excellent and would enable much better comparisons between language versions.

I give some ideas for quality levels below, but I also want to stress that I believe quality is also related to coverage. En.wp has the most 100%-quality articles in many subject areas, like films and albums, but it has low coverage of poets whose work is not available in English - worse than de.wp, for example. How do we evaluate something like that?

My intuitive quality levels on articles are
-1 - Unacceptable quality
  Machine-translated articles, vandal-infested articles, severe POV content, articles shorter than 300 characters with no sources, etc. No bot should be allowed to generate such lousy articles. They ought all to be deleted, and I would expect there to be no articles at all of this inferior quality on the bigger versions.
0 - Missing articles that ought to exist
1 - Rudimentary articles
   Articles with proper sources, categories and infoboxes but short on substance, or articles with proper substance but missing appropriate sources. Most proper bot-generated articles fall into this level.
2 - OK articles
   Have both proper substance and sources, but are not complete and do not cover all aspects of the subject. Some few bot-generated articles fall into this level.
3 - Good articles
  Cover the subject

For each of these levels it should be possible to develop detailed criteria which would enable us to machine-read articles and classify them by quality level as above.
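To illustrate what such machine classification might look like (this is only a rough sketch, not a proposal for the actual criteria): assuming we already have per-article metadata such as character count, number of sources, categories and an infobox flag - all hypothetical field names and thresholds - a rule-based classifier along these lines could assign the levels above:

# Rough sketch of a rule-based classifier for the levels above (Python).
# The article fields and the thresholds are hypothetical placeholders;
# the real criteria would have to be worked out by the communities.

def quality_level(article):
    """Return a level from -1 to 3 for a dict of pre-computed article metadata."""
    # Level -1: clearly unacceptable content
    if (article.get("is_machine_translated")
            or article.get("has_severe_pov")
            or article.get("is_vandalized")
            or (article["char_count"] < 300 and article["num_sources"] == 0)):
        return -1

    has_sources = article["num_sources"] > 0
    has_structure = article.get("num_categories", 0) > 0 and article.get("has_infobox", False)

    # Level 3: good articles that cover the subject (coverage itself needs criteria)
    if has_sources and has_structure and article.get("covers_all_aspects", False):
        return 3
    # Level 2: proper substance and sources, but not complete
    if has_sources and has_structure and article["char_count"] >= 2000:
        return 2
    # Level 1: rudimentary - structure without substance, or substance without sources
    return 1

# Level 0 ("missing articles that ought to exist") cannot be computed from an
# existing article; it would come from comparing against a list of expected topics.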

Anders
 
Han-Teng Liao (OII) wrote, 2014-07-06 13:29:
We need overview quality-minded metrics for the different language versions of Wikipedia. Otherwise the current "number games" played by bots across certain language versions will continue to distort the direction and focus of editorial development. I therefore propose a "do-not-spread-oneself-too-thin" altmetric to counterbalance the situation.

(Sorry I was late in joining the conversation on "[Wiki-research-l] Quality on different language version". This is a follow-up reply and a suggestion for this discussion thread.)

For example, in the Chinese Wikipedia community there are ongoing discussions about the current ranking of Chinese Wikipedia in terms of the number of articles, and about how the *neighboring* versions (those with a similar number of articles) use bots to generate new articles.

# The stats report generated and used by the Chinese community to compare itself against neighboring language versions: 
#* Link 
# One current discussion: 
#* Link
# One recently archived discussion:
#* Link

To counterbalance such nonsensical comparison and competition, I personally think it is better to have an altmetric in place of the crude (and often distorting) measure of the number of articles.

One would expect a better encyclopedia to contain a set of core articles of human knowledge. 

Indeed, Meta has a list of 1000 articles that "every Wikipedia should have": http://meta.wikimedia.org/wiki/List_of_articles_every_Wikipedia_should_have

We can use this to generate a quantifiable metric of the development of the core articles in each language version, perhaps using the following numbers:

* number of references (total and per article)
* number of footnotes (total and per article)
* number of citations (total and per article)
* number of distinct wiki internal links to other articles
* number of good and featured articles (as judged by each language version's community)

Based on the above numbers, it is conceivable to come up with a metric that measures both the depth and breadth of the quality of the core articles. I admit that other measurements can and should be applied, but the above numbers still have the following merits:

* they reflect the nature of Wikipedia as dependent on other reliable secondary and primary information sources.
* they can be applied across languages automatically, without the need to analyze the texts themselves, which would require more tools and raise issues of comparability.
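
As a very rough sketch of how such counts could be collected automatically: the snippet below fetches an article's raw wikitext from the standard MediaWiki API and pattern-matches it. Counting <ref> tags and [[...]] links is only an approximation of "references", "footnotes" and "internal links" (citation templates such as {{sfn}} would be missed), so treat the numbers as illustrative.

# Sketch: fetch an article's wikitext via the MediaWiki API and count
# <ref> tags and [[...]] links as rough proxies for the numbers above.
import re
import requests

def article_counts(lang, title):
    api = "https://%s.wikipedia.org/w/api.php" % lang
    resp = requests.get(api, params={
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",
    })
    wikitext = resp.json()["parse"]["wikitext"]
    return {
        "ref_tags": len(re.findall(r"<ref[ >/]", wikitext)),   # footnotes/citations
        "internal_links": wikitext.count("[["),                # wiki links (incl. files/categories)
        "external_links": len(re.findall(r"https?://", wikitext)),
    }

# e.g. article_counts("zh", "地球") or article_counts("en", "Earth")

Summing these over the 1000 core articles of each language version would give the raw inputs for the metric below.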

For the sake of simplicity, let us say that one language version (possibly English or German) has the highest scores; that version can then serve as the baseline for comparison. Say this benchmark language version has:

# a quality-metric value QUAL (from the vital 1000)
# a quantity value, the total number of articles, QUAN (from the existing metric)

Then the "do-not-spread-oneself-too-thin" quality metric can be calculated as:

QUAL/QUAN

(It can be further discussed whether logarithmic scales should be applied here.)
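
As a toy illustration of how the ratio behaves (all numbers below are made up; QUAL here is just an aggregate of the per-article counts over the ~1000 core articles, QUAN the version's total article count):

# Toy illustration of the "do-not-spread-oneself-too-thin" ratio.
# The inputs are invented for the example, not real measurements.

def thinness_metric(qual, quan):
    """Higher values mean effort is concentrated on the core articles."""
    return float(qual) / quan

print(thinness_metric(50000, 1000000))  # 0.05
print(thinness_metric(20000, 200000))   # 0.10 - a smaller but "less watery" version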

The gist of this "quality metric" is to redirect the obsession with the number of articles towards the important core articles, in the hope of getting more references, footnotes, citations, internal links and good/featured articles for the core 1000. It should also indicate which language versions are too "watery", i.e. simply spreading themselves too thin with inconsequential short articles.
 
Let us have a discussion here on [Wiki-research-l] before we extend the conversation to [Wikimedia-l].

Best,
han-teng liao


--
han-teng liao

"[O]nce the Imperial Institute of France and the Royal Society of London begin to work together on a new encyclopaedia, it will take less than a year to achieve a lasting peace between France and England." - Henri Saint-Simon (1810)

"A common ideology based on this Permanent World Encyclopaedia is a possible means, to some it seems the only means, of dissolving human conflict into unity." - H.G. Wells (1937)


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
