Hi Dan,
Thanks for running these!
I'm struck by the figure of 12.8m pages in ns0 - it looks like this
includes redirects (there are ~7.6m ns0 redirects on enwiki, and ~5.2m
articles). This will probably skew things a lot, as most of those will
have been edited once and never touched again, barring the target page
being moved. Given they're ~60% of the pages, this will add a lot of
extra weight to "articles with very few edits" and "articles that get
edited very infrequently".
It might be worth trying to filter out redirects - I suspect this
would have a noticeable effect on both the distribution and the mean
time between edits.
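
As a rough check of how much the redirects could matter, here is a
back-of-the-envelope sketch in Python. It assumes every redirect has
exactly one edit, which is a simplification (some get retargeted, and
some were once articles), and it uses the rounded counts above rather
than anything from the actual queries.

```python
# Back-of-the-envelope estimate: remove redirects from the ns0 totals.
# ASSUMPTION: each redirect has exactly one edit (its creation).
TOTAL_EDITS = 484_170_218   # edits on namespace 0, from Dan's results
TOTAL_PAGES = 12_756_342    # pages in namespace 0, from Dan's results
REDIRECTS = 7_600_000       # rough count of ns0 redirects on enwiki
ARTICLES = 5_200_000        # rough count of ns0 articles on enwiki
EDITS_PER_REDIRECT = 1      # assumed, not measured


def mean_edits_without_redirects(total_edits, redirects, articles, edits_per_redirect):
    """Average edits per article once redirect edits are subtracted."""
    article_edits = total_edits - redirects * edits_per_redirect
    return article_edits / articles


redirect_share = REDIRECTS / TOTAL_PAGES
adjusted_mean = mean_edits_without_redirects(
    TOTAL_EDITS, REDIRECTS, ARTICLES, EDITS_PER_REDIRECT
)
print(f"redirect share of ns0 pages: {redirect_share:.0%}")
print(f"mean edits per article, redirects removed: {adjusted_mean:.1f}")
```

Under that assumption the mean comes out at roughly 92 edits per
article, which is close to the ~92 from the stats.wikimedia.org tables
quoted further down the thread.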
Andrew.
On 14 September 2016 at 22:01, Dan Andreescu <dandreescu@wikimedia.org> wrote:
> Quick follow up 'cause I was curious. I calculated the average and standard
> deviation for edits per namespace 0 article on enwiki. I tried to do it on
> the research db replicas but it took forever so I did it on the hadoop
> cluster. Including archived pages isn't useful, as it hardly changes the
> results at all. Including pages outside namespace 0 increases the standard
> deviation and decreases the average. Here are the results:
>
> 484,170,218 edits on namespace 0
> 12,756,342 pages in namespace 0
>
> standard deviation for edits per page: 213.58
> average edits per page: 38.02
> average days between first and last edit per page: 1215.27
>
> So, considering the standard deviation is much larger than the mean, I'm
> pretty confident the answer is yes: I think the vast majority of articles in
> namespace 0 on enwiki get very few edits. The dataset we're working on
> releasing as part of wikistats 2.0 will allow these kinds of questions to be
> answered really easily and really quickly. Stay tuned over the next few
> quarters :)
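
To illustrate the reasoning about the standard deviation being much
larger than the mean, here is a small sketch on a made-up sample. The
edit counts are hypothetical and chosen only to show the shape of the
argument; they are not enwiki data. A large spread on its own does not
prove that most pages have few edits, so the sketch also prints the
median and the share of pages below the mean, which show it more
directly.

```python
# Illustration only: a made-up skewed sample, not real enwiki data.
from statistics import mean, median, pstdev

# HYPOTHETICAL edit counts: 90 pages with 1 edit, 10 pages with 400 edits.
edit_counts = [1] * 90 + [400] * 10

avg = mean(edit_counts)
spread = pstdev(edit_counts)   # population standard deviation
mid = median(edit_counts)
# Fraction of pages whose edit count is below the mean.
share_below_mean = sum(c < avg for c in edit_counts) / len(edit_counts)

print(f"mean={avg:.1f} stdev={spread:.1f} median={mid} below mean={share_below_mean:.0%}")
```

In this toy sample a handful of heavily edited pages pull the mean far
above what a typical page gets, and the standard deviation ends up
larger than the mean.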
>
> And the queries:
> https://gist.github.com/milimetric/8b5f447e3ef09b6fe4384e0f75cc0b34
>
> If you want to edit those queries to find something else out, I'm happy to
> run them one or two more times, but then I really have to get back to my
> real job :)
>
> On Wed, Sep 7, 2016 at 12:42 PM, Andrew Gray <andrew.gray@dunelm.org.uk>
> wrote:
>>
>> Hi Reem,
>>
>> Here are some rough estimates.
>>
>> English - https://stats.wikimedia.org/EN/TablesWikipediaEN.htm
>>
>> English has ~5.2 million articles, with an average of ~92 edits per
>> article, not counting deleted edits (or deleted articles). Note that 80% of
>> those articles are more than three years old, so they've had plenty of time
>> to build up the 92 edits.
>>
>> [The page does not explicitly say that only article edits are counted in
>> the tables, but this is easy to confirm -
>> https://en.wikipedia.org/wiki/Wikipedia:Statistics has 847m edits]
>>
>> Arabic - https://stats.wikimedia.org/EN/TablesWikipediaAR.htm
>>
>> Arabic has ~437k articles, ~31 edits/article - but only half of these are
>> more than three years old, so they're on average a lot younger than the
>> English ones.
>>
>> As of July there are 3.3m edits/month in English - this is equal to an
>> average of 0.63 edits/article/month - and 226k edits/month in Arabic, equal
>> to 0.52 edits/article/month. July was a slow month for Arabic, and March had
>> more than twice as many edits, 487k, across 415k articles.
>>
>> These are plain averages. The distribution is going to be very skewed, so
>> high-edit articles get most of the attention, and the other articles easily
>> go months without attention. If we assume an 80:20 distribution - which is a
>> wild guess but sounds plausible - then the "long tail" of 80% of articles
>> would get 20% of the edits. In this case, a plausible average would be:
>>
>> * English long tail, 4.16m articles and 660k edits/month = average of six
>> months between each edit
>> * Arabic (July) long tail, 350k articles and 45k edits/month = average of
>> seven or eight months between each edit
>> * Arabic (March) long tail, 332k articles and 97k edits/month = average of
>> three and a half months between each edit
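
The three long-tail figures above can be reproduced with a few lines of
arithmetic. This is only a sketch of that calculation: the 80:20 split
is the guess stated in the message, not a measured value, and the
function name is made up for the example.

```python
# Reproduces the long-tail estimates under the ASSUMED 80:20 split.
TAIL_ARTICLE_SHARE = 0.8   # assumed share of articles in the long tail
TAIL_EDIT_SHARE = 0.2      # assumed share of edits those articles receive


def months_between_edits(articles, edits_per_month):
    """Average months between edits for a long-tail article."""
    tail_articles = articles * TAIL_ARTICLE_SHARE
    tail_edits = edits_per_month * TAIL_EDIT_SHARE
    return tail_articles / tail_edits


# (articles, edits per month) as quoted in the message
cases = {
    "English (July)": (5_200_000, 3_300_000),
    "Arabic (July)": (437_000, 226_000),
    "Arabic (March)": (415_000, 487_000),
}

for name, (articles, edits) in cases.items():
    print(f"{name}: {months_between_edits(articles, edits):.1f} months")
```

Changing the two share constants shows how sensitive the result is to
the assumed split.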
>>
>> This is a broad range, but it feels more or less right for all those
>> unloved pages...
>>
>> Andrew.
>>
>>
>> On 7 September 2016 at 14:52, Reem Al-Kashif <reemalkashif@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I always hear people saying that most of the articles usually receive little
>> > to no edits (and that is used to encourage participants to make sure their
>> > articles are good enough). I would like to know if there are statistics that
>> > support this for the English and Arabic Wikipedia.
>> >
>> > Best,
>> > Reem
>> >
>> > --
>> > Kind regards,
>> > Reem Al-Kashif
>> >
>> > _______________________________________________
>> > Analytics mailing list
>> > Analytics@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/analytics
>> >
>>
>>
>>
>> --
>> - Andrew Gray
>> andrew.gray@dunelm.org.uk
>>
>>
>
>
>
--
- Andrew Gray
andrew.gray@dunelm.org.uk