Ah, point taken, @Aaron.
Though in my defense, this is how the title reads on my screen:
" Top 100 editors in Module namespace in English Wikipedia (in th"
I use MS Unicode Sans as my default browser font; it is a bit larger than most, which has this unexpected side effect. :)
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron Halfaker
Sent: Friday, June 02, 2017 17:47
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Top editors in a certain namespace across sites?
Hi Erik,
My query only looks at the last 30 days. See the "(in the last 30 days)" suffix on the title :) This explains the discrepancy in our counts.
-Aaron
On Thu, Jun 1, 2017 at 3:54 PM, Erik Zachte <ezachte@wikimedia.org> wrote:
Here is an experiment I did about two months ago with First Normal Form (yikes!) and GNU tools. I just hadn't posted it yet.
I collected all edits from all wikis from recent full-history stub dumps with a Perl script, which took some 30 hours.
The total file for all Wikimedia wikis is 240 GB uncompressed, 2.94 billion lines.
Each edit yields a record with wiki, timestamp, namespace, user name, article title and then some.
Querying that file with grep, cut, sort, uniq and wc is pretty straightforward, remarkably fast (30-40 min) and rather versatile.
A) Top editors in namespace 'Module' (=828) on English Wikipedia:
grep -P "^enwiki," EditsTimestampsTitlesAll.csv | cut -d ',' -f 6,9 | grep -P "^828," | sort | uniq -c | sort -rn | head -n 500 > top500_edits_enwiki_namespace_828.txt
The results [1] should be similar to those of Aaron's Quarry query [2], but in fact they are way higher.
E.g. according to Quarry, the top editor in namespace 'Module' (=828) is user 'Mehmedsons', with 234 edits; two months ago my count for that user was 1251, which is confirmed by [3].
[1] https://stats.wikimedia.org/archive/scan_edits/edits_namespace_828_enwiki.txt
[2] https://quarry.wmflabs.org/query/17556
[3] https://en.wikipedia.org/w/index.php?title=Special:Contributions&contribs=user&target=Mehmedsons&namespace=828&tagfilter
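Aaron's point above is that Quarry only counts the last 30 days, so one way to compare like with like is to restrict query A to a time window. A minimal sketch, not Erik's actual script: the helper name top_editors is hypothetical, and it assumes field 2 of EditsTimestampsTitlesAll.csv holds an ISO 8601 timestamp (e.g. 2017-05-03T12:00:00Z), which is a guess; only fields 1 (wiki), 6 (namespace) and 9 (user) are implied by query A.

```sh
# Hypothetical helper: top editors in one namespace on one wiki, optionally
# restricted to edits at or after a cutoff timestamp.
# ASSUMED layout: 1 = wiki, 2 = ISO 8601 timestamp (a guess),
# 6 = namespace number, 9 = user name (fields 1, 6, 9 as in query A).
top_editors() {
  file=$1; wiki=$2; ns=$3; since=$4; n=${5:-500}
  # ISO 8601 timestamps sort lexically, so a plain string comparison works.
  # An empty cutoff keeps every edit.
  awk -F',' -v wiki="$wiki" -v ns="$ns" -v since="$since" '
    $1 == wiki && $6 == ns && (since == "" || $2 >= since) { print $9 }
  ' "$file" |
    LC_ALL=C sort | uniq -c | sort -rn | head -n "$n"  # byte-wise sort, then rank by count
}
```

For example, with GNU date: top_editors EditsTimestampsTitlesAll.csv enwiki 828 "$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)" 100. Note that "30 days ago" is measured from today, so it only lines up with Quarry's window if the dump is recent.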
B) Similarly, but for all wikis and namespaces: total edits per wiki, per namespace, per user (filter as you like); results in [4]:
cut -d ',' -f 1,6,9 EditsTimestampsTitlesAll.csv | sort -t\, -k 1,1 -k 2,2n -k 3,3 | uniq -c > edits_per_wiki_namespace_user.txt
[4] https://stats.wikimedia.org/archive/scan_edits/edits_per_wiki_namespace_user.zip
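The thread's original question was top editors in a namespace across sites. Starting from the output of query B, a per-user total over all wikis might be sketched as follows. The helper name is hypothetical; it relies on the uniq -c output format (a right-aligned count, one space, then the wiki,namespace,user key), and it assumes that the same user name on different wikis belongs to the same person.

```sh
# Hypothetical helper: sum edits per user in one namespace across all wikis,
# reading the output of query B (lines like "     12 enwiki,828,Some User").
# ASSUMPTION: the same user name on different wikis is the same person.
top_editors_across_wikis() {
  file=$1; ns=$2; n=${3:-100}
  awk -v ns="$ns" '
    {
      count = $1
      key = $0
      sub(/^ *[0-9]+ /, "", key)          # drop the uniq -c count prefix
      i = index(key, ",")                 # comma after the wiki name
      rest = substr(key, i + 1)
      j = index(rest, ",")                # comma after the namespace number
      if (substr(rest, 1, j - 1) == ns) {
        total[substr(rest, j + 1)] += count  # rest of line is the user name
      }
    }
    END { for (user in total) printf "%d %s\n", total[user], user }
  ' "$file" | sort -rn | head -n "$n"
}
```

For example, top_editors_across_wikis edits_per_wiki_namespace_user.txt 828 would list the heaviest Module editors over all wikis. Taking everything after the second comma as the user name keeps names with spaces intact.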
C) Most edited articles across all wikis:
Here are the top 10; for the top 10,000 see [5].
1259058 enwiki,4,Wikipedia,Administrator intervention against vandalism
955842 enwiki,4,Wikipedia,Administrators' noticeboard/Incidents
788061 enwiki,2,User,Cyde/List of candidates for speedy deletion/Subpage
654559 enwiki,4,Wikipedia,Sandbox
578992 metawiki,2,User,COIBot/LinkReports
446429 dewiki,4,Wikipedia,Vandalismusmeldung
434556 enwiki,4,Wikipedia,Requests for page protection
433781 enwiki,4,Wikipedia,Reference desk/Science
390821 commonswiki,4,Commons,Quality images candidates/candidate list
369557 enwiki,4,Wikipedia,Help desk
[5] https://stats.wikimedia.org/archive/scan_edits/top_10000_most_edited_articles.txt
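The command behind list C is not shown. A plausible sketch in the same cut/sort/uniq style, assuming (hypothetically) that fields 7 and 8 hold the namespace name and the page title, as the wiki,namespace,namespace name,title lines above suggest. If titles can contain commas, cut would split them, so treat the field numbers as a guess to be checked against the real file.

```sh
# Hypothetical helper: most edited pages across all wikis.
# ASSUMED layout: 1 = wiki, 6 = namespace number, 7 = namespace name, 8 = title.
most_edited_pages() {
  file=$1; n=${2:-10}
  cut -d ',' -f 1,6,7,8 "$file" |
    LC_ALL=C sort | uniq -c | sort -rn | head -n "$n"  # count edits per page, rank
}
```

For example, most_edited_pages EditsTimestampsTitlesAll.csv 10000 would produce a list shaped like [5].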
-----Original Message-----
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Dan Andreescu
Sent: Friday, March 24, 2017 3:13
To: Andre Klapper; analytics@lists.wikimedia.org
Subject: Re: [Analytics] Top editors in a certain namespace across sites?
We are working really hard to make cross-site querying easy from Quarry, by pointing it to the new data we're working on. We hope to have that out as soon as the new Labs DB servers have data for all projects. A quick question on this topic: how far back do you all need to go? The whole history for most things, or would you get a lot of value out of one or two years, with anything further back just being nice to have?
Original Message
From: Andre Klapper
Sent: Thursday, March 23, 2017 15:55
To: analytics@lists.wikimedia.org
Reply To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Top editors in a certain namespace across sites?
On Wed, 2017-03-22 at 17:25 -0500, Aaron Halfaker wrote:
> https://quarry.wmflabs.org/query/17556
Thanks a lot everybody for your replies and explanations!
A welcome reminder that I have to learn more about Quarry (plus find a way to also query cross-site and maybe restrict to "recent edits").
andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics