Hi, I have a batch tool that collects article titles for any category and its
subcategories (up to an arbitrary depth), then collects the page views for those articles
for any given month and prints a sorted list. For optimal results, either the parsed category
subtree needs manual pruning (so that weird subcategories can be blacklisted) or the category
depth should be kept modest.
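The approach described above can be sketched roughly as follows in Python. This is not the tool's actual code: the in-memory tables SUBCATS, ARTICLES and VIEWS, the function names, and all sample numbers are made up for illustration. A real run would fill them from the category tree and a monthly pageview file.

```python
from collections import deque

# Hypothetical sample data (not real Wikipedia data).
SUBCATS = {
    "Islands": ["Islands_by_country", "Fictional_islands"],
    "Islands_by_country": ["Islands_of_Greece"],
    "Islands_of_Greece": ["Deep_subcategory"],
}
ARTICLES = {
    "Islands": ["Island"],
    "Islands_by_country": ["List_of_islands"],
    "Fictional_islands": ["Atlantis"],
    "Islands_of_Greece": ["Crete"],
    "Deep_subcategory": ["Too_deep"],
}
VIEWS = {"Island": 500, "List_of_islands": 300, "Atlantis": 900,
         "Crete": 700, "Too_deep": 50}


def collect_titles(root, max_depth, blacklist=frozenset()):
    """Breadth-first walk of the category tree, stopping at max_depth and
    skipping blacklisted subcategories (and everything below them)."""
    titles = set()
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        titles.update(ARTICLES.get(cat, []))
        if depth == max_depth:
            continue  # do not descend any further
        for sub in SUBCATS.get(cat, []):
            if sub not in seen and sub not in blacklist:
                seen.add(sub)
                queue.append((sub, depth + 1))
    return titles


def ranked_views(titles, views):
    """Return (title, views) pairs sorted by views, highest first."""
    pairs = ((t, views.get(t, 0)) for t in titles)
    return sorted(pairs, key=lambda p: (-p[1], p[0]))


if __name__ == "__main__":
    found = collect_titles("Islands", max_depth=2,
                           blacklist={"Fictional_islands"})
    for title, count in ranked_views(found, VIEWS):
        print(f"{count:>6}  {title}")
```

Blacklisting a subcategory drops its whole subtree, which is why pruning one odd branch near the top can matter more than lowering the depth.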
Here's an example with top category 'WikiProject_Islands':
http://ow.ly/QgahV
Tool is at
http://ow.ly/QgbLO
But again, it's a batch tool. One would have to download a file with monthly pageview
totals from
https://dumps.wikimedia.org/other/pagecounts-ez/merged/, or ask me to run an
occasional ad hoc query.
Erik
From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org]
On Behalf Of Raymond Leonard
Sent: Wednesday, July 29, 2015 21:00
To: mdg(a)uw.edu; A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics.
Cc: Felipe Hoffa
Subject: Re: [Analytics] New to list; please direct me to the tool(s) that I can use to
determine per-page views per category/WikiProject
Michael,
Thanks! Your solution solves a different problem than I was looking for (list of page
views of each article over time per a WikiProject), but having a comprehensive list of the
pages under WikiProject_Seattle will undoubtedly be useful as well. Heck, I am tempted to
install ActivePerl on my PC (used to have it on an earlier PC & also had access to a
couple of Macs) to write a Perl script to convert the results into a .csv file to load it
into Excel or the Libre Office equivalent.
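The conversion described above could be sketched in Python instead of Perl. The JSON shape used here (an array of objects with "title" and "namespace" keys) is purely an assumption for illustration; the service's actual response format is not shown in this thread, and the function name is made up.

```python
import csv
import io
import json

# Hypothetical response body; the real JSON shape is not shown in this thread.
SAMPLE = ('[{"title": "Space_Needle", "namespace": 0}, '
          '{"title": "Bitter_Lake,_Seattle", "namespace": 0}]')


def json_to_csv(text, fields=("title", "namespace")):
    """Convert a JSON array of objects into CSV text with a header row."""
    rows = json.loads(text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(fields),
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(json_to_csv(SAMPLE), end="")
```

The csv module quotes titles that contain commas (such as Bitter_Lake,_Seattle), so the file should load cleanly into Excel or LibreOffice Calc.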
Kudos to you & the UW iSchool.
Yours,
Peaceray <https://en.wikipedia.org/wiki/User:Peaceray>
Cascadia Wikimedians User Group <http://cascadia.wiki>
peaceray(a)cascadia.wiki (redirects to)
raymond.f.leonard.jr(a)gmail.com
On Wed, Jul 29, 2015 at 11:11 AM, Michael Gilbert <mdg(a)uw.edu> wrote:
Peaceray,
Though it should be considered largely only research ready (i.e., potentially unstable, so
not ready for production-level always-on tools), I've created a service that syncs and
provides details on current WikiProjects. For example:
https://alahele.ischool.uw.edu:8997/api/getProjects
This returns all current WikiProjects, defined as all pages in the Wikipedia namespace (4)
starting with WikiProject_*, as well as pages in the Active_WikiProjects category in the
Wikipedia namespace, so that both projects like WikiProject Seattle and ones like
Department of Fun are recorded.
To get the pages under the scope of those projects, try:
https://alahele.ischool.uw.edu:8997/api/getProjectPages?project=WikiProject…
This currently returns 6,961 pages, including all pages under the project category as well
as all sub-category pages to a depth of 2. It's possible something like this could be
paired with stats.grok.se or the upcoming pageview API to get page view data on each of
the Seattle-related articles (i.e.,
http://stats.grok.se/en/201507/Bitter_Lake,_Seattle),
or, if you're looking for activity information you can get it from a separate request,
below (which would return edits to the Space Needle page in the Article and Article Talk
namespace, grouped by page, editor, and week, after March 5th, 2014):
https://alahele.ischool.uw.edu:8997/api/getEdits?page=Space_Needle&namespace=0|1&group=page|user|date&sd=20140305
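As a rough illustration, the two request URLs mentioned above could be assembled in Python like this. The helper names and default arguments are made up. The parameter meanings are only inferred from the example in this message (for instance, that sd is a start date in YYYYMMDD form), not from the service's documentation.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://alahele.ischool.uw.edu:8997/api"  # host taken from this thread


def edits_url(page, namespaces=(0, 1), group=("page", "user", "date"),
              start="20140305"):
    """Build a getEdits request URL; multi-valued parameters are joined with '|'."""
    params = {
        "page": page,
        "namespace": "|".join(str(n) for n in namespaces),
        "group": "|".join(group),
        "sd": start,  # assumed to mean "start date", per the example above
    }
    # safe="|" keeps the pipe separators unescaped, matching the example URL.
    query = urlencode(params, safe="|")
    return f"{API_BASE}/getEdits?{query}"


def grok_url(title, yyyymm, lang="en"):
    """Build a stats.grok.se page URL following the pattern shown in the thread."""
    return f"http://stats.grok.se/{lang}/{yyyymm}/{quote(title, safe=',_')}"


if __name__ == "__main__":
    print(edits_url("Space_Needle"))
    print(grok_url("Bitter_Lake,_Seattle", "201507"))
```

Looping grok_url over the titles returned by getProjectPages would give one page-view URL per Seattle-related article.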
(Rough) documentation and all the bits for the above are at:
https://github.com/mdgilbert/wiki-tools (for instance, the syncProjects.py script is what
collects the project and project-pages data).
(Also rough) documentation and the code for the node.js server which provides the data is
at:
https://github.com/mdgilbert/node-reflex
Any comments, suggestions, requests, etc. are always welcome. Cheers,
Michael Gilbert
Human Centered Design & Engineering, University of Washington
On 07/29/2015 09:59 AM, Raymond Leonard wrote:
Hi Dan,
I did discover the TreeViews tool a couple of days ago on
tools.wmflabs.org:
http://tools.wmflabs.org/glamtools/treeviews/?q=%7B%22rows%22%3A%5b%7B%22title%22%3A%22WikiProject%20Seattle%20articles%22%7D%5d%7D
However, for Category:WikiProject Seattle articles, it only brings back the articles from 10
Things I Hate About You <http://en.wikipedia.org/wiki/10_Things_I_Hate_About_You>
through Ballard Carnegie Library
<http://en.wikipedia.org/wiki/Ballard_Carnegie_Library>, which is a little over
1,100 articles, whereas there are 6,882 in the category alone (as of the time of this
email), let alone subcategories. It may be that there is a limit on the number of
articles that the tool can pull monthly page views for, but it does not state that.
Do you know who developed TreeViews & how I can contact her/him/them?
Yours,
Peaceray <https://en.wikipedia.org/wiki/User:Peaceray>
Cascadia Wikimedians User Group <http://cascadia.wiki>
peaceray(a)cascadia.wiki (redirects to)
raymond.f.leonard.jr(a)gmail.com
On Wed, Jul 29, 2015 at 8:28 AM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
Hi Raymond. Currently we don't have any WMF-hosted tools that will let you get this
information easily. We have committed to deliver a Pageview API by the end of this
quarter [1]. The first version will not have per-category totals, but it will have
per-article totals. Until then, there are community-built tools such as:
http://stats.grok.se (not updated for a while)
https://www.vitribyte.com/ (great dashboarding features but the future of the project is
not determined yet)
Google BigQuery has also ingested our hourly pageview dumps; I've cc'ed Felipe Hoffa
so he can provide details on that.
The main problem with the solutions above is that they're based on an outdated
pageview definition that's been having more and more problems lately. The Pageview
API we are shipping at the end of this quarter will be based on higher quality data that
makes an effort to detect spiders and normalize page titles across different access
methods (API requests from mobile apps, different accents, etc). Preliminary tests show
that this data does not have the anomalies we've seen in the old data.
[1] if you're interested in following along or helping with this project, you can find
it by searching for {slug} in our backlog
<https://phabricator.wikimedia.org/tag/analytics-backlog/> and kanban
<https://phabricator.wikimedia.org/tag/analytics-kanban/> task boards.
On Sun, Jul 26, 2015 at 3:21 PM, Raymond Leonard <raymond.f.leonard.jr(a)gmail.com>
wrote:
Hello,
I am new to this list. I am looking to rejuvenate a semi-active WikiProject & am
looking for a tool or tools that will list the frequency of individual per-page views for
a given category/WikiProject. The time period could be preset to a period of time or
specifiable --- my guess is that this may depend upon the particular tool(s).
We wish to use this as one of the inputs to determining the importance of an article to
the WikiProject.
Please feel free to email me directly if you wish to avoid adding traffic to the mail
list.
Yours,
Peaceray <https://en.wikipedia.org/wiki/User:Peaceray>
Cascadia Wikimedians User Group <http://cascadia.wiki>
peaceray(a)cascadia.wiki (redirects to)
raymond.f.leonard.jr(a)gmail.com
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics