On 18 May 2014 10:06, h <hanteng@gmail.com> wrote:

Dear Nemo,

While I wait for a more complete response: I am not sure I understand what your last "No" (as in "No, we definitely can't") means. To clarify, take the CLDR supplemental Language-Territory information, for example:

http://www.unicode.org/cldr/charts/latest/supplemental/language_territory_information.html

One can suggest additional data points by submitting sourced numbers for a geo-linguistic population, like this: http://unicode.org/cldr/trac/newticket?&description=%3Cterritory%2c%20speaker%20population%20in%20territory%2c%20and%20references%3E&summary=Add%20territory%20to%20Traditional%20Chinese%20(zh_Hant)

In Wikipedia articles and Wikidata pages, there are many attempts to provide more up-to-date and better-sourced data points. I see potential in exchanging such data and curating it in Wikidata as a more detailed and dynamic source than the CLDR.

These data points would also help in curating traffic data. For one, geo-linguistic population figures could be used to normalize traffic data for further analysis, such as geographic normalization. For another, they provide important reference data for the development strategies and policies of the Wikipedia projects.
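To make the normalization idea concrete, here is a minimal Python sketch. It is only an illustration, not an existing Wikimedia or CLDR tool: it assumes we already have pageview counts and speaker-population estimates (for example from the CLDR chart above) keyed by a (language, territory) pair, and the function name and key layout are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical key layout: (language code, territory code), e.g. ("zh_Hant", "TW").
Key = Tuple[str, str]


def normalize_by_population(
    pageviews: Dict[Key, int],
    speakers: Dict[Key, int],
    per: int = 1000,
) -> Dict[Key, float]:
    """Return pageviews per `per` speakers for each key present in both inputs.

    Pairs without a population figure, or with a non-positive one, are
    skipped rather than guessed.
    """
    rates: Dict[Key, float] = {}
    for key, views in pageviews.items():
        population = speakers.get(key)
        if population is None or population <= 0:
            continue  # no usable denominator for this language-territory pair
        rates[key] = views * per / population
    return rates
```

Dividing by the speaker population of a language in a territory, rather than by the territory's total population, is what the geo-linguistic data adds; skipping pairs with no sourced figure keeps gaps visible instead of filling them with invented numbers.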

Best,
han-teng liao

2014-05-18 16:23 GMT+08:00 Federico Leva (Nemo) <nemowiki@gmail.com>:

Thanks for your suggestions. Just some quick pointers below.

h, 18/05/2014 08:26:

(I-A). Tabulate the data points in absolute numbers first, not
percentage numbers [...]

(I-B). Include all language versions for the *editing traffic* report as
well. [...]

(I-C). Provide static data objects in more accessible format (i.e. csv
and/or json). [...]

(II-A). Put the viewing traffic and editing traffic reports on the same
page. [...]

(II-B). Organize and archive the traffic reports for historical
comparison. [...]

(I-C). Provide dynamic data objects in more accessible format (i.e. csv
and/or json).

At least the first four are "just" changes to the WikiStats report formatting; personally, I encourage you to submit patches: <https://git.wikimedia.org/summary/analytics%2Fwikistats.git> (it should be the "squids" directory, but there is some ongoing refactoring of the repos).

On archives and "history rewriting" (regeneration of reports), see also https://bugzilla.wikimedia.org/show_bug.cgi?id=46198

[...] (III-B). Smaller (i.e. more specific) geographic aggregate units.

The country (geographic) information is often based on geo-IP databases,
and sometimes provincial and city-level data would be available.

http://lists.wikimedia.org/pipermail/wikitech-l/2014-April/075964.html

[...]

(I know that the Unicode Common Locale Data Repository (CLDR Version 25
<http://cldr.unicode.org/index/downloads/cldr-25>) provides
“language-territory”
<http://www.unicode.org/cldr/charts/latest/supplemental/language_territory_information.html> or
“territory-language”
<http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html> unit-based
charts, but I believe that the Wikimedia projects can use and build a better
one.) [...]

No, we definitely can't, not alone. I've asked for help; please contribute: <https://www.mediawiki.org/wiki/Universal_Language_Selector/FAQ#How_does_Universal_Language_Selector_determine_which_languages_I_may_understand>.

Nemo

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Oliver Keyes
Research Analyst
Wikimedia Foundation