Hi Pine,
Many thanks for this. I've had some trouble with wifi while traveling, so I
will probably only be able to send you the final responses once I'm back
in SF (I'm in Frankfurt en route to SF). Apologies for the delay!
Hope you're well,
Anasuya
On Jul 7, 2014 2:01 PM, <analytics-request@lists.wikimedia.org> wrote:
Send Analytics mailing list submissions to
analytics@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/analytics
or, via email, send a message with subject or body 'help' to
analytics-request@lists.wikimedia.org
You can reach the person managing the list at
analytics-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Analytics digest..."
Today's Topics:
1. Re: [Wiki-research-l] We need overview quality-minded metrics
for different language versions of Wikipedia. (Pine W)
----------------------------------------------------------------------
Message: 1
Date: Sun, 6 Jul 2014 10:57:03 -0700
From: Pine W <wiki.pine@gmail.com>
To: Research into Wikimedia content and communities
<wiki-research-l@lists.wikimedia.org>, "A mailing list for the
Analytics Team at WMF and everybody who has an interest in Wikipedia
and analytics." <analytics@lists.wikimedia.org>
Subject: Re: [Analytics] [Wiki-research-l] We need overview
quality-minded metrics for different language versions of
Wikipedia.
Message-ID:
<CAF=dyJiZNBO6sw+g-R7n77EkPatyd18oRPUmJDFH8z2dJ3AjhQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Forwarding to Analytics in case anyone there is interested. Please discuss
on the Research list.
Thanks,
Pine
On Sun, Jul 6, 2014 at 6:21 AM, Anders Wennersten
<mail@anderswennersten.se> wrote:
A standard for measuring quality levels of articles would be excellent
and would enable much better comparisons between language versions.
I give some ideas for quality levels below, but I also want to stress that
I believe quality is also related to coverage. En wp has the most
100%-quality articles in many subject areas, like films and albums, but it
has low coverage of poets whose work is not available in English, worse
than dewp for example. How do we evaluate something like that?
My intuitive quality levels for articles are:
-1 - Non-acceptable quality
Machine-translated articles, vandal-infested articles, severe POV
content, articles shorter than 300 characters with no sources, etc. No bot
should be allowed to generate such lousy articles. They ought all to be
deleted, and I would expect there to be no articles at all of this
inferior quality on the bigger versions.
0 - Missing articles that ought to exist
1 - Rudimentary articles
Articles with proper sources, categories, and infoboxes but short in
substance, or articles with proper substance but missing appropriate
sources. Most proper bot-generated articles fall into this level.
2 - OK articles
Have both proper substance and sources, but are not complete and do not
cover all aspects of the subject. A few bot-generated articles fall into
this level.
3 - Good articles
Cover the subject
For each of these levels it should be possible to develop detailed
criteria which would enable us to machine-read articles and classify them
by quality level as above.
Anders
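Anders's closing point, that each level could be given machine-checkable criteria, could be sketched as a simple rule-based classifier. Everything below is an illustrative assumption: the `Article` fields, the 2000-character cutoff for "short in substance", and the rule order are my inventions, not agreed criteria.

```python
# Sketch of the proposed quality levels as a rule-based classifier.
# Field names and thresholds are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class Article:
    text_length: int          # characters of prose
    has_sources: bool
    has_categories: bool
    has_infobox: bool
    covers_all_aspects: bool  # would itself need machine-checkable criteria
    is_machine_translated: bool
    is_vandalized: bool


def quality_level(a: Article) -> int:
    """Return the intuitive quality level (-1 to 3) described in the thread."""
    # Level -1: non-acceptable (machine-translated, vandalized, or
    # under 300 characters with no sources)
    if a.is_machine_translated or a.is_vandalized or (
            a.text_length < 300 and not a.has_sources):
        return -1
    # Level 1: rudimentary -- missing sources, or short in substance
    # (the 2000-character cutoff is a guess, not part of the proposal)
    if not a.has_sources or a.text_length < 2000:
        return 1
    # Level 3: good -- covers the subject
    if a.covers_all_aspects:
        return 3
    # Level 2: OK -- substance and sources, but incomplete
    return 2
```

Level 0 (missing articles) is omitted because it describes articles that do not exist, so there is nothing to classify.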
Han-Teng Liao (OII) wrote 2014-07-06 13:29:
We need overview quality-minded metrics for different language versions of
Wikipedia. Otherwise, the current "number games" played by bots across
certain language versions will continue to distort the direction and focus
of editorial development. I therefore propose an altmetric of
"do-not-spread-oneself-too-thin" to counterbalance the situation.
(Sorry I was late in engaging the conversation of "[Wiki-research-l]
Quality on different language version
<http://www.mail-archive.com/wiki-research-l@lists.wikimedia.org/msg03168.ht…>".
It is a follow-up reply and a suggestion to this discussion thread.)
For example, in the Chinese Wikipedia community, there are ongoing
discussions about the current ranking of Chinese Wikipedia in terms of
number of articles, and how the *neighboring* versions (those that have
similar numbers of articles) use bots to generate new articles.
# The stats report generated and used by the Chinese community to
compare itself against neighboring language versions:
#* Link <http://zh.wikipedia.org/wiki/Wikipedia:%E7%BB%9F%E8%AE%A1/%E4%B8%8E%E9%82%B…>
#* Google translated <https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=h…>
# One current discussion:
#* Link <http://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88…>
#* Google translated <https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=…>
# One recently archived discussion:
#* Link <http://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88…>
#* Google translated <https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=h…>
To counterbalance the situation of such nonsensical comparison and
competition, I personally think it is better to have an altmetric in
place of the crude (and often distorting) measure of the number of
articles.
One would expect a better encyclopedia to contain a set of core articles
of human knowledge.
Indeed, Meta has a list of 1000 articles that "every Wikipedia should
have":
http://meta.wikimedia.org/wiki/List_of_articles_every_Wikipedia_should_have
We can use this to generate a quantifiable metric of the development of
the core articles in each language version, perhaps using the following
numbers:
* number of references (total and per article)
* number of footnotes (total and per article)
* number of citations (total and per article)
* number of distinct wiki internal links to other articles
* number of good and featured articles (judged by each language version's
community)
Based on the above numbers, it is conceivable to come up with a metric
that measures both the depth and breadth of the quality of the core
articles. I admit that other measurements can and should be applied, but
still the above numbers have the following merits:
* they reflect the nature of Wikipedia as dependent on other reliable
secondary and primary information sources.
* they can be applied across languages automatically without the need to
analyze texts, which requires more tools and engenders issues of
comparability.
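The second merit, that these signals can be counted without analyzing text, could be sketched as simple pattern matching over raw wikitext. This is only an illustration of the language-independent idea; real articles have edge cases (nested templates, `<ref name=.../>` reuse, etc.) that a production counter would have to handle.

```python
# Sketch: count footnotes and distinct internal links from raw wikitext
# using pattern matching alone, with no language-specific text analysis.
import re


def count_signals(wikitext: str) -> dict:
    return {
        # <ref>...</ref> footnotes plus self-closing <ref .../> tags
        "footnotes": len(re.findall(r"<ref[^>/]*>", wikitext))
                     + len(re.findall(r"<ref[^>]*/>", wikitext)),
        # distinct internal wiki links: [[Target]] or [[Target|label]]
        "internal_links": len({
            m.split("|")[0].strip()
            for m in re.findall(r"\[\[([^\]]+)\]\]", wikitext)
        }),
    }


sample = "Foo [[Bar]] baz<ref>Smith 2010</ref> [[Bar|bars]] [[Qux]]"
print(count_signals(sample))  # {'footnotes': 1, 'internal_links': 2}
```

Because the patterns operate on wiki markup rather than prose, the same counter works unchanged on any language version.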
For the sake of simplicity, let us say that one language version
(possibly English or German) has the highest scores; that language
version can then serve as the baseline for comparison. Say this
benchmark language version has:
# the quality-metric number QUAL (from the vital 1000)
# the quantity number of total articles QUAN (from the existing metric)
Then the "do-not-spread-oneself-too-thin" quality metric can be
calculated as:
QUAL/QUAN
(It can be further discussed whether logarithmic scales should be
applied here.)
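A minimal sketch of how the proposed ratio might be computed, assuming QUAL is a plain sum of the per-article counts listed earlier (the 10-point bonus for good/featured status is my invention, and all numbers are made-up placeholders, not real statistics):

```python
# Sketch of the proposed "do-not-spread-oneself-too-thin" metric.
# QUAL aggregates countable signals over the ~1000 vital articles;
# QUAN is the language version's total article count.


def qual_score(vital_articles: list[dict]) -> int:
    """Sum the countable signals over the vital-article set (QUAL)."""
    return sum(
        a["references"] + a["footnotes"] + a["citations"]
        + a["internal_links"]
        + (10 if a["good_or_featured"] else 0)  # arbitrary bonus weight
        for a in vital_articles
    )


def thinness_metric(vital_articles: list[dict], total_articles: int) -> float:
    """QUAL / QUAN: higher means effort is concentrated on core articles."""
    return qual_score(vital_articles) / total_articles


# Invented placeholder data for two vital articles
vital = [
    {"references": 40, "footnotes": 25, "citations": 30,
     "internal_links": 120, "good_or_featured": True},
    {"references": 5, "footnotes": 2, "citations": 3,
     "internal_links": 15, "good_or_featured": False},
]
print(thinness_metric(vital, total_articles=100_000))
```

Whether the individual signals should be weighted differently, or the ratio log-scaled as the author suggests, is exactly the kind of detail the discussion would need to settle.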
The gist of this "quality metric" is to redirect the obsession with the
number of articles towards the important core articles, hoping to get
more references, footnotes, citations, internal links, and good/featured
articles for the core 1000. It will hopefully indicate which language
version is too "watery", or simply spreading itself too thin with
inconsequential short articles.
Let us have a discussion here [Wiki-research-l], before we extend the
conversation to [Wikimedia-i].
Best,
han-teng liao
--
han-teng liao
"[O]nce the Imperial Institute of France and the Royal Society of London
begin to work together on a new encyclopaedia, it will take less than a
year to achieve a lasting peace between France and England." - Henri
Saint-Simon (1810)
"A common ideology based on this Permanent World Encyclopaedia is a
possible means, to some it seems the only means, of dissolving human
conflict into unity." - H.G. Wells (1937)
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l