Read the rest :P

On Jun 13, 2015 02:43, "Asaf Bartov" <abartov@wikimedia.org> wrote:
(adding Analytics, as a relevant group for this discussion.)

I think this is next to meaningless, because the differing bot policies and
practices on different wikis skew the data into incoherence.

The (already existing) metric of active-editors-per-million-speakers is, it
seems to me, a far more robust metric.  Erik Z.'s stats.wikimedia.org is
offering that metric.
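For readers who want to reproduce a figure of this kind, here is a minimal sketch of the arithmetic. The function name and the sample numbers are hypothetical, and this is not necessarily how stats.wikimedia.org computes its own column; it only assumes you already have an active-editor count and a speaker estimate for one language.

```python
def editors_per_million(active_editors: int, speakers: int) -> float:
    """Active editors per one million speakers of a language.

    Both inputs are assumed to come from whatever sources you trust;
    nothing here fetches or validates real data.
    """
    if speakers <= 0:
        raise ValueError("speakers must be a positive number")
    # Multiply first so that round inputs give round results.
    return active_editors * 1_000_000 / speakers


# Hypothetical example: 50 active editors, 2 million speakers.
print(editors_per_million(50, 2_000_000))
```

Because the result is normalized per speaker, it does not depend on how many articles bots have created on a given wiki.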

   A.

On Sun, Jun 7, 2015 at 3:23 PM, Milos Rancic <millosh@gmail.com> wrote:

> Once you have data, at some point you start thinking about fairly
> fringe comparisons. But those can actually lead to useful
> conclusions, as they did this time [1].
>
> We did the following:
> * Used the number of primary speakers from Ethnologue. (Erik Zachte
> uses the approximate number of primary + secondary speakers; that
> could be useful for correcting this data.)
> * Categorized languages by the order of magnitude of their number of
> speakers: >=1k, >=10k, >=100k, >=1M, >=10M, >=100M.
> * Took the number of articles in the Wikipedia for each language and
> computed the ratio (number of articles / number of speakers).
> * The list consists only of languages with Ethnologue status 1
> (national), 2 (provincial) or 3 (wider communication). In fact, we
> have a lot of projects (more than 100) with a weaker language status;
> a number of them are actually threatened or even on the edge of
> extinction.
>
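The steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the sample rows are hypothetical (they are not taken from the spreadsheet), buckets are powers of ten, and everything from 100M speakers up is put in one top bucket.

```python
TOP_BUCKET = 100_000_000  # assumption: ">=100M" is the highest category


def speaker_bucket(speakers: int) -> int:
    """Lower bound of the power-of-ten bucket, e.g. 2_500_000 -> 1_000_000."""
    if speakers < 1:
        raise ValueError("speakers must be at least 1")
    # Counting digits avoids floating-point trouble with log10.
    bucket = 10 ** (len(str(speakers)) - 1)
    return min(bucket, TOP_BUCKET)


def best_per_bucket(rows):
    """rows: iterable of (language, articles, speakers).

    Returns {bucket: (language, ratio)} with the highest
    articles-per-speaker ratio found in each bucket.
    """
    best = {}
    for language, articles, speakers in rows:
        bucket = speaker_bucket(speakers)
        ratio = articles / speakers
        if bucket not in best or ratio > best[bucket][1]:
            best[bucket] = (language, ratio)
    return best


# Hypothetical rows, for illustration only.
rows = [("lang_a", 100, 1_000), ("lang_b", 500, 2_000), ("lang_c", 10, 50_000)]
for bucket, (language, ratio) in sorted(best_per_bucket(rows).items()):
    print(f">= {bucket}: {language}, ratio {ratio:.5f}")
```

Filtering by Ethnologue status would happen before calling `best_per_bucket`, by leaving the unwanted rows out of `rows`.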
> These are preliminary results, and I will definitely have to go
> through all the numbers. I manually fixed some serious errors, such
> as English Wikipedia itself being missing from the data :D
>
> Putting the languages into logarithmic categories proved to be
> useful, as we are now able to compare the Wikipedias according to
> their gross capacity (number of speakers). I suppose somebody well
> versed in statistics could even create a function that could be
> used to check how well a project is doing, regardless of those strict
> categories.
>
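One possible shape for such a function, offered only as a sketch and not as a method anyone in this thread has endorsed: fit a straight line to log10(articles) against log10(speakers) by ordinary least squares, then score each project by how far it sits above or below that line. The names `fit_loglog` and `score` and the sample rows are hypothetical.

```python
import math


def fit_loglog(rows):
    """Least-squares fit of log10(articles) = intercept + slope * log10(speakers).

    rows: list of (language, articles, speakers), all counts positive,
    with at least two different speaker counts.
    """
    xs = [math.log10(speakers) for _, _, speakers in rows]
    ys = [math.log10(articles) for _, articles, _ in rows]
    n = len(rows)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def score(articles, speakers, intercept, slope):
    """Residual in log10 units; positive means more articles than the trend."""
    expected = intercept + slope * math.log10(speakers)
    return math.log10(articles) - expected


# Hypothetical rows, for illustration only.
rows = [("lang_a", 100, 10_000), ("lang_b", 1_000, 1_000_000),
        ("lang_c", 10_000, 100_000_000)]
intercept, slope = fit_loglog(rows)
print(score(10_000, 1_000_000, intercept, slope))
```

A score of 1 would mean roughly ten times as many articles as the trend predicts for a language of that size, which removes the need for fixed category boundaries.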
> It's obvious that the more speakers a language has, the harder it is
> for the community to keep up the ratio.
>
> So, the winners per category are:
> 1) >= 1k: Hawaiian, ratio 0.96900
> 2) >= 10k: Mirandese, ratio 0.18073
> 3) >= 100k: Basque, ratio 0.38061
> 4) >= 1M: Swedish, ratio 0.21381
> 5) >= 10M: Dutch, ratio 0.08305
> 6) >= 100M: English, ratio 0.01447
>
> However, keep in mind that we removed languages whose Ethnologue
> status is not 1, 2 or 3. That affected the >=10k languages, as, for
> example, Upper Sorbian does much better than Mirandese (0.67). (Will
> fix it while creating the full report. Obviously, in this case the
> logarithmic categories of numbers of speakers are much more important
> than the status of the language.)
>
> It's obvious that we could draw a line from 1:1 for 1-10k
> speakers to 10:1 for >=100M speakers. But, again, I would like to get
> input from somebody more competent.
>
> One very important category is missing here: the level of
> development of the speakers. That could be added: GDP (PPP) per
> capita for the country or countries where the language is spoken
> would be a useful measure. And I suppose somebody with statistical
> knowledge would be able to give us a number that would mean
> "ability to create Wikipedia article".
>
> Completed in this way, we'd be able to measure the success of
> particular Wikimedia groups and organizations. OK, articles per
> speaker are not the only way to do so; we could use other
> parameters as well: number of new/active/very active editors etc. And
> we could put it on a time scale.
>
> I'll produce some more results. And a reminder: I'd like to have a
> formula to calculate "ability to create Wikipedia article" and then
> to produce "level of particular community success in creating
> Wikipedia articles". And, of course, to apply it to editors as well.
>
> [1]
> https://docs.google.com/spreadsheets/d/1TYyhETevEJ5MhfRheRn-aGc4cs_6k45Gwk_ic14TXY4/edit?usp=sharing
>
> _______________________________________________
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>




--
    Asaf Bartov
    Wikimedia Foundation <http://www.wikimediafoundation.org>

Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make it a reality!
https://donate.wikimedia.org