Thanks, Jane, for the comments and suggestions.
Please correct me if I have misread them.
(1) Did you suggest measurements that are observable *inside*
Wikipedia/Wikimedia websites?
(2) If so, does it mean that your suggestion of measuring the current
state of a language version as "a combination of the state of its content
and community" describes only the *internal* state of that version?
(3) When you said "zero-state", did you mean the state where the number
of articles in a given language version is zero?
Your suggestions appear to me to deal with a measurement of the current state
of a language version. The use of "zero-state" suggests equal grounds
for any language version to develop on the Wikipedia platform.
However, my call for help focuses on the current state of affairs external
to the Wikipedia platform. In this context, the term *baseline* suggests
that some languages are already *more equal* than others because of
the availability of language users and content out there. Since Wikipedia
depends on reliable published secondary sources, some languages are
*expected* to be more developed than others. What I want to do is to
come up with such *expectation values* so that researchers and community
members can see which language versions perform better/worse than expected,
in comparison to other languages.
While I agree that, on the Wikipedia platform, every language may start on
an equal footing from zero, my contention is that some languages are
already *more equal* than others.
In other words, I want to construct sensible baselines *against which* the
development of language versions can be better understood. Such baselines
thus should capture external factors that are likely to condition the
development. Normalization of development metrics using such baselines can
then control these external factors to see which language versions
underperform even when the external availability of content and users is not
an issue. It can also help to see which language versions outperform even
when the external conditions are not that great.
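To make the normalization idea concrete, here is a minimal sketch, not a settled method. It assumes a single external indicator per language (say, estimated Internet users) and a log-log least-squares fit as the baseline; both are my assumptions for illustration. The function names and all numbers below are hypothetical placeholders, not real measurements.

```python
import math


def fit_loglog_baseline(external, observed):
    """Least-squares fit of log(observed) = a + b * log(external)."""
    xs = [math.log(v) for v in external]
    ys = [math.log(v) for v in observed]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def performance_ratio(external, observed, a, b):
    """Observed value divided by the value the baseline expects."""
    expected = math.exp(a + b * math.log(external))
    return observed / expected


# Hypothetical placeholder data: (external indicator, article count).
languages = {
    "lang_A": (1_000_000, 20_000),
    "lang_B": (10_000_000, 150_000),
    "lang_C": (50_000_000, 400_000),
    "lang_D": (200_000_000, 5_000_000),
}

a, b = fit_loglog_baseline(
    [ext for ext, _ in languages.values()],
    [obs for _, obs in languages.values()],
)

# A ratio above 1 means "more developed than expected", below 1 "less".
ratios = {
    name: performance_ratio(ext, obs, a, b)
    for name, (ext, obs) in languages.items()
}
for name, ratio in sorted(ratios.items()):
    print(f"{name}: {ratio:.2f}")
```

With several external indicators, the same idea would extend to a multiple regression; the single-indicator fit is only the simplest case.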
Hence, I really appreciate your suggestions as potential indicators of the
(internal) development state of a language version of Wikipedia, but they
do not appear to capture factors that are external to Wikipedia.
Best,
2014-07-08 10:09 GMT+01:00 Jane Darnell <jane023(a)gmail.com>:
Well, as I see it, the state of any language version is a combination of
the state of its content and community. Going back to the zero-state, in
order to have permission to start a language version, there must be a "list
of 10,000 important topics" that has to be registered somewhere (sorry, no
idea where). This list for the English wikipedia includes an entry for the
singer Michael Jackson, one of the many articles that gets lots and lots of
page hits daily. Perhaps this is the case for all other languages in the
world (I have no idea), but I would assume one measurement going forward
from the zero-state would be the number of changes over time involving this
list in the specific language, such as
1) The list itself (do these topics ever change?)
2) The average number of edits and page views of those pages in the
specific language
3) The average number of blue links per page on those pages in the
specific language
4) The average number of editors *ever* contributing per page on those
pages in the specific language
5) The average number of active editors contributing per page on those
pages in the specific language
...
Other important measurements could be the number of active editors over
all, the number of edits appearing in the recent changes list per
day/month/year, the number of pages created or deleted per day/month/year...
On Tue, Jul 8, 2014 at 9:27 AM, Han-Teng Liao (OII) <
han-teng.liao(a)oii.ox.ac.uk> wrote:
Dear all,
Your suggestions are needed on the ways in which one can construct
some sensible baselines, most likely based on data sets *external* to
Wikipedia projects, of *expected* Wikipedia language versions development.
Such baselines should ideally indicate that, given the availability of
language users and content (some numbers based on external data sets), a
certain language version should have an expected number of articles/active
users.
As previous research has suggested that Wikipedia activities need
mutually-reinforcing cycles of participation, content, and readership, it
is expected that the development of a Wikipedia language version is
conditioned by the availability of (digitally) literate users and (possibly
digitized) content/sources.
So the assumption is:
Wikipedia Activities = Some function of (available users and content)
For example, the major non-English writing languages in the world
such as Arabic, Chinese, Spanish, etc., may have different numbers of
Internet users and digital content. These numbers indicate the basis on
which a Wikipedia language version can develop.
One practical use of this baseline measurement is to better
categorize/curate activities across Wikipedia language versions. We can
then better come up with expected values of Wikipedia development, and thus
categorize language versions accordingly based on the *external conditions*
of available/potential users and content.
Another use of this baseline measurement is to better compare the
development of different language versions. It should help answer questions
such as whether the Korean language version is *underdeveloped* on
Wikipedia platforms when compared with a language version that enjoys a
similar number of available/potential users and content.
The closest existing external baseline data is probably the number
of language speakers. My hunch is that it does not adequately take into
account the available/potential users and content, especially those that
are digitally ready.
So I welcome you to add to the following list any external
indicators (and possibly data sources) that may help to construct such a
baseline.
==Indicators==
* Internet users for each language (probably an approximate measurement
based on CLDR Territory-Language information and ITU internet penetration
rates).
* Number of books published annually in different languages (suggested
data sources? Does ISBN have a database or stat report on published
languages?)
* Number of web pages returned by major search engines on the queries of
"Wikipedia" in different languages, excluding results from Wikimedia
projects.
* Number of scholarly publications across languages (suggested data
sources?)
* Number of major newspaper publications across languages (suggested
data sources?)
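As a rough illustration of the first indicator, the sketch below combines territory-level language shares with Internet penetration rates. It assumes that, within a territory, speakers of every language are equally likely to be online, which is a strong simplification. The record layout, field names, and all numbers are hypothetical placeholders; real inputs would have to be extracted from CLDR and ITU data.

```python
from collections import defaultdict


def internet_users_by_language(territories):
    """Sum population * penetration * language share over territories."""
    totals = defaultdict(float)
    for territory in territories:
        online = territory["population"] * territory["internet_penetration"]
        for lang, share in territory["language_shares"].items():
            totals[lang] += online * share
    return dict(totals)


# Hypothetical placeholder records, not real CLDR or ITU figures.
territories = [
    {
        "name": "Territory X",
        "population": 1_000_000,
        "internet_penetration": 0.5,
        "language_shares": {"lang_a": 0.8, "lang_b": 0.2},
    },
    {
        "name": "Territory Y",
        "population": 2_000_000,
        "internet_penetration": 0.25,
        "language_shares": {"lang_a": 0.1, "lang_b": 0.9},
    },
]

estimates = internet_users_by_language(territories)
for lang, users in sorted(estimates.items()):
    print(f"{lang}: about {users:,.0f} Internet users")
```

Language shares within a territory can add up to more than 1 when people speak several languages, so the per-language totals should not be summed into a world total.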
Please share your thoughts!
--
han-teng liao
"[O]nce the Imperial Institute of France and the Royal Society of London
begin to work together on a new encyclopaedia, it will take less than a
year to achieve a lasting peace between France and England." - Henri
Saint-Simon (1810)
"A common ideology based on this Permanent World Encyclopaedia is a
possible means, to some it seems the only means, of dissolving human
conflict into unity." - H.G. Wells (1937)
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l