Abdel Samad, Rawia, 21/01/2015 09:47:
I work for a consulting firm called Strategy&. We
have been engaged by
Facebook on behalf of
Internet.org to conduct a study on assessing the
state of connectivity globally. One key area of focus is the
availability of relevant online content. We are using the availability
of encyclopedic knowledge in one’s primary language as a proxy for
relevant content. We define this as 100K+ Wikipedia articles in one’s
primary language.
Hello Rawia,
is there any update on this project? Have you contacted Google about
similar "content availability" and "content ingestion" activities they
conducted in the past, also related to machine translation
(https://meta.wikimedia.org/wiki/Machine_translation)?
We are very interested in this sort of initiative (see also
https://lists.wikimedia.org/pipermail/wiki-research-l/2015-March/004297.html
), but experience taught us that looking at the wrong things can have
terrible consequences.
Nemo
We have a few questions related to this analysis prior to publishing it:
·We are currently using the article count by language from the Wikimedia
Foundation's public page
http://meta.wikimedia.org/wiki/List_of_Wikipedias. Is this a reliable
source for article counts, and does it include stubs?
·Is it possible to get historical data for article counts? It would be
great to monitor the evolution of the metric we have defined over time.
·What are the biggest drivers you’ve seen for step changes in the number
of articles (e.g., number of active admins, machine translation, etc.)?
·We had to map Wikipedia language codes to ISO 639-3 language codes in
Ethnologue (the source we are using for primary-language data). The
two-letter code for a wiki in the “List of Wikipedias” sometimes, but
not always, matches the ISO 639-1 code. Is there an easy way to do the
mapping?
Many Thanks,
Rawia