Hi, I have a batch tool that collects article titles for any category and its subcategories (up to an arbitrary depth), then collects the page views for those articles for a given month and prints a sorted list. For best results, either prune the parsed category subtree manually (blacklisting off-topic subcategories) or keep the category depth modest.

 

Here's an example with top category 'WikiProject_Islands': http://ow.ly/QgahV

Tool is at http://ow.ly/QgbLO

But again, it's a batch tool. One would have to download a file with monthly pageview totals from https://dumps.wikimedia.org/other/pagecounts-ez/merged/, or ask me to run an occasional ad hoc query.
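A minimal sketch of the approach described above, not the actual tool: walk the category tree to a fixed depth, skip blacklisted subcategories, then rank the collected articles by monthly views. The function names and the in-memory dictionaries standing in for category data and pageview totals are assumptions made for illustration.

```python
from collections import deque


def collect_articles(subcats, members, root, max_depth, blacklist=frozenset()):
    """Breadth-first walk of a category tree, at most max_depth levels below root.

    subcats: category -> list of subcategories (hypothetical input shape)
    members: category -> list of article titles (hypothetical input shape)
    """
    seen = {root}
    queue = deque([(root, 0)])
    articles = set()
    while queue:
        cat, depth = queue.popleft()
        articles.update(members.get(cat, ()))
        if depth == max_depth:
            continue  # do not descend further than the depth limit
        for sub in subcats.get(cat, ()):
            if sub in blacklist or sub in seen:
                continue  # pruned by hand, or already visited
            seen.add(sub)
            queue.append((sub, depth + 1))
    return articles


def rank_by_views(articles, views):
    """Return (title, views) pairs, highest views first; unknown titles count as 0."""
    return sorted(((t, views.get(t, 0)) for t in articles),
                  key=lambda pair: (-pair[1], pair[0]))


# Toy data, invented for this example.
subcats = {
    "Islands": ["Islands of Europe", "Fictional islands"],
    "Islands of Europe": ["Islands of Greece"],
}
members = {
    "Islands": ["Island"],
    "Islands of Europe": ["Iceland"],
    "Islands of Greece": ["Crete"],
    "Fictional islands": ["Atlantis"],
}
views = {"Island": 500, "Iceland": 9000, "Crete": 3000, "Atlantis": 7000}

ranked = rank_by_views(
    collect_articles(subcats, members, "Islands", max_depth=1,
                     blacklist={"Fictional islands"}),
    views,
)
for title, count in ranked:
    print(f"{count:>8}  {title}")
```

Raising max_depth or emptying the blacklist widens the result set, which is where the manual pruning mentioned above becomes necessary.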

 

Erik

 

From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Raymond Leonard
Sent: Wednesday, July 29, 2015 21:00
To: mdg@uw.edu; A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Cc: Felipe Hoffa
Subject: Re: [Analytics] New to list; please direct me to the tool(s) that I can use to determine per-page views per category/WikiProject

 

Michael,

Thanks! Your solution solves a different problem than the one I was asking about (a list of page views of each article over time per WikiProject), but having a comprehensive list of the pages under WikiProject_Seattle will undoubtedly be useful as well. Heck, I am tempted to install ActivePerl on my PC (I used to have it on an earlier PC & also had access to a couple of Macs) to write a Perl script that converts the results into a .csv file to load into Excel or the LibreOffice equivalent.
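A rough sketch of such a conversion, in Python rather than Perl. The thread does not show the response format of the service, so this assumes the results arrive as a JSON array of flat objects; the field names in the sample data are hypothetical.

```python
import csv
import io
import json


def records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text.

    Assumption: the input is a list of dicts; the real response format
    of the service may differ.
    """
    records = json.loads(json_text)
    if not records:
        return ""
    # Use the union of all keys as columns so missing fields become empty cells.
    fieldnames = sorted({key for rec in records for key in rec})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Hypothetical sample; "title" and "namespace" are invented field names.
sample = ('[{"title": "Space_Needle", "namespace": 0}, '
          '{"title": "Bitter_Lake,_Seattle", "namespace": 0}]')
csv_text = records_to_csv(sample)
print(csv_text)
```

The csv module quotes titles that contain commas, so the output loads cleanly into a spreadsheet.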

Kudos to you & the UW iSchool.


Yours,
Peaceray
Cascadia Wikimedians User Group
peaceray@cascadia.wiki (redirects to)
raymond.f.leonard.jr@gmail.com

 

On Wed, Jul 29, 2015 at 11:11 AM, Michael Gilbert <mdg@uw.edu> wrote:

Peaceray,

Though it should be considered largely only research ready (i.e., potentially unstable, so not ready for production-level always-on tools), I've created a service that syncs and provides details on current WikiProjects.  For example:

https://alahele.ischool.uw.edu:8997/api/getProjects

Would return all current WikiProjects, defined as all pages in the Wikipedia namespace (4) starting with WikiProject_*, as well as pages in the Active_WikiProjects category in the Wikipedia namespace, so that both projects like WikiProject Seattle and projects like Department of Fun are recorded.

To get the pages under the scope of those projects, try:

https://alahele.ischool.uw.edu:8997/api/getProjectPages?project=WikiProject_Seattle

This currently returns 6,961 pages, including all pages under the project category as well as all sub-category pages to a depth of 2. It's possible something like this could be paired with stats.grok.se or the upcoming pageview API to get page view data on each of the Seattle-related articles (e.g., http://stats.grok.se/en/201507/Bitter_Lake,_Seattle). Or, if you're looking for activity information, you can get it from a separate request, below (which would return edits to the Space Needle page in the Article and Article Talk namespaces, grouped by page, editor, and week, after March 5th, 2014):

https://alahele.ischool.uw.edu:8997/api/getEdits?page=Space_Needle&namespace=0|1&group=page|user|date&sd=20140305
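As a small illustration, a request URL like the one above can be assembled programmatically. This sketch only builds the URL and sends nothing; the parameter names (page, namespace, group, sd) are taken from the example, while the helper name and its arguments are made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint as given in the example above.
BASE = "https://alahele.ischool.uw.edu:8997/api/getEdits"


def build_edits_url(page, namespaces, group_by, start_date):
    """Build a getEdits URL; multi-valued parameters are joined with '|'."""
    params = {
        "page": page,
        "namespace": "|".join(str(n) for n in namespaces),
        "group": "|".join(group_by),
        "sd": start_date,  # start date as YYYYMMDD, as in the example
    }
    # safe="|" keeps the pipe separators readable instead of encoding them as %7C.
    return BASE + "?" + urlencode(params, safe="|")


url = build_edits_url("Space_Needle", [0, 1], ["page", "user", "date"], "20140305")
print(url)
```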

(Rough) documentation and all the bits for the above are at: https://github.com/mdgilbert/wiki-tools (for instance, the syncProjects.py script is what collects the project and project-pages data).
(Also rough) documentation and the code for the node.js server which provides the data is at: https://github.com/mdgilbert/node-reflex

Any comments, suggestions, requests, etc. are always welcome. Cheers,

Michael Gilbert
Human Centered Design & Engineering, University of Washington

 

On 07/29/2015 09:59 AM, Raymond Leonard wrote:

However, for Category:WikiProject Seattle article, it only brings back the articles 10 Things I Hate About You through Ballard Carnegie Library, which is a little over 1100 articles, whereas there are 6,882 in the category alone (as of the time of this email), let alone subcategories. It may be that there is a limit as to the number of articles that the tool can pull monthly page views for, but it does not state that.

Do you know who developed TreeViews & how I can contact her/him/them?

 

On Wed, Jul 29, 2015 at 8:28 AM, Dan Andreescu <dandreescu@wikimedia.org> wrote:

Hi Raymond.  Currently we don't have any WMF-hosted tools that will let you get this information easily.  We have committed to deliver a Pageview API by the end of this quarter [1].  The first version will not have per-category totals, but it will have per-article totals.  Until then, there are community-built tools such as:

 

http://stats.grok.se (not updated for a while)

https://www.vitribyte.com/ (great dashboarding features but the future of the project is not determined yet)

 

Google BigQuery has also ingested our hourly pageview dumps. I've cc'ed Felipe Hoffa so he can provide details on that.

 

The main problem with the solutions above is that they're based on an outdated pageview definition that's been having more and more problems lately.  The Pageview API we are shipping at the end of this quarter will be based on higher-quality data that makes an effort to detect spiders and normalize page titles across different access methods (API requests from mobile apps, different accents, etc.).  Preliminary tests show that this data does not have the anomalies we've seen in the old data.

 

[1] if you're interested in following along or helping with this project, you can find it by searching for {slug} in our backlog and kanban task boards.

 

On Sun, Jul 26, 2015 at 3:21 PM, Raymond Leonard <raymond.f.leonard.jr@gmail.com> wrote:

Hello,

I am new to this list. I am looking to rejuvenate a semi-active WikiProject & am looking for a tool or tools that will list the frequency of individual per-page views for a given category/WikiProject. The time period could be preset to a period of time or specifiable --- my guess is that this may depend upon the particular tool(s).

We wish to use this as one of the inputs to determining the importance of an article to the WikiProject.

 

Please feel free to email me directly if you wish to avoid adding traffic to the mail list.

Yours,

Peaceray

Cascadia Wikimedians User Group

peaceray@cascadia.wiki (redirects to)

raymond.f.leonard.jr@gmail.com

 

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics

 

