Hi Erik,
in principle, yes, that would be useful. However:
* I would mostly need "last month" on a continued basis, at the moment stretching back to September 2012, I believe
* As a flat file it's not seekable, which means I would have to run through the entire thing for each of my ~50 page sets, or keep all 50 in memory; neither of which is appealing
Maybe I could read such a file into a toolserver database? It would be a duplication of effort, and add load to the toolserver, but hey ;-)
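A rough sketch of the "keep all 50 in memory" option, in case it helps weigh it against the database route: one pass over the flat file, with every page set checked per line through a single lookup table. It assumes each line is whitespace-separated as `project title count`, which is a guess and not confirmed anywhere in this thread; `count_views` and the page-set names are made up for illustration.

```python
from collections import defaultdict


def count_views(lines, page_sets):
    """Sum view counts per page set in a single pass over the file.

    lines     -- iterable of text lines, ASSUMED to look like
                 "project title count" (hypothetical format)
    page_sets -- dict mapping a set name to a set of (project, title) pairs
    """
    # Invert the page sets once: (project, title) -> names of sets wanting it.
    wanted = defaultdict(list)
    for name, pages in page_sets.items():
        for key in pages:
            wanted[key].append(name)

    totals = {name: defaultdict(int) for name in page_sets}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed or truncated lines
        key = (parts[0], parts[1])
        if key not in wanted:
            continue  # page is not in any of the sets
        try:
            views = int(parts[2])
        except ValueError:
            continue  # count field is not a plain integer
        for name in wanted[key]:
            totals[name][key] += views

    # Return plain dicts so callers do not create keys by accident.
    return {name: dict(counts) for name, counts in totals.items()}
```

Only the wanted page keys are held in memory, not the file itself, so the file is read once regardless of how many page sets there are.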
Cheers, Magnus
On Mon, Dec 3, 2012 at 11:59 PM, Erik Zachte <ezachte@wikimedia.org> wrote:
I have code to aggregate Domas' hourly file into a daily file and later a monthly file and still retain full hourly resolution.
It has been an Xmas holiday pastime and is still a bit buggy, but I can up the priority to fix this.
Intro:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054590.html
Data:
http://dumps.wikimedia.org/other/pagecounts-ez/monthly/
(ge5 is a subset of only pages with 5+ views per month, which makes a big difference in file size)
Would this be useful for you, Magnus?
Erik
*From:* analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] *On Behalf Of* Magnus Manske
*Sent:* Monday, December 03, 2012 9:14 PM
*To:* A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
*Subject:* Re: [Analytics] Access to view stats
Hi Diederik,
in principle, all of the Wikimedia projects; currently, all listed at http://stats.grok.se/ which is the top 100 or so Wikipedias. Plus Commons, if possible.
As for the number of pages on those, that seems to fluctuate (probably just my scripts breaking on slow/missing data, occasionally); the largest number I can find is May 2012, with ~350K pages. But this only needs to run once a month. I could even give you the list if you like, and you can extract the data for me ;-)
But that would certainly not be a long-term, scalable solution. An SQL interface on the toolserver would be ideal; a speedy HTTP-based API would be good as well (maybe even better, as it would not require the toolserver ;-), especially if it can take chunks of data (e.g. POST requests with a 1K article list), so that I don't have to fire thousands of tiny queries.
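To illustrate the chunking idea, here is a minimal sketch that splits a long article list into batches of 1,000 and builds one request body per batch. No such API exists yet, so the JSON field names (`project`, `month`, `titles`) and the helper names are purely hypothetical; the sketch only prepares the payloads and does not contact any server.

```python
import json


def chunked(items, size=1000):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_payloads(project, month, titles, size=1000):
    """Build one JSON request body per batch of titles.

    The field names are made up for illustration; a real API would
    define its own request format.
    """
    return [
        json.dumps({"project": project, "month": month, "titles": batch})
        for batch in chunked(titles, size)
    ]
```

With batches of this size, a list of ~350K pages would need about 350 requests instead of one request per page.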
Cheers,
Magnus
On Mon, Dec 3, 2012 at 7:38 PM, Diederik van Liere <dvanliere@wikimedia.org> wrote:
Hi Magnus,
Can you send me the list of pages for the Wikimedia projects that you are interested in? Once I know how many pages there are, I can come up with a solution.
D
On Mon, Dec 3, 2012 at 2:33 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
+1
I have received an increasing number of external requests for something more efficient than the stats.grok.se JSON interface and more user-friendly than Domas' hourly raw data. I am also one of the interested consumers of this data.
Diederik, any chance we could prioritize this request? I guess per-article and per-project daily / monthly pv would be the most useful aggregation level.
On Dec 3, 2012, at 11:11 AM, Magnus Manske <magnusmanske@googlemail.com> wrote:
Hi all,
as you might know, I have a few GLAM-related tools on the toolserver. Some are updated once a month, some can be used live, but all are in high demand by GLAM institutions.
Now, the monthly updated stats have always been slow to run, but have almost ground to a halt recently. The on-demand tools have stalled completely.
All these tools get their data from stats.grok.se, which works well but is not really high-speed; my on-demand tools have apparently been shut out recently because too many people were using them, DDOSing the server :-(
I know you are working on page view numbers, and from what I gather it's up and running internally already. My requirements are simple: I have a list of pages on many Wikimedia projects; I need view counts for these pages for a specific month, per page.
Now, I know that there is no public API yet, but is there any way I can get to the data, at least for the monthly stats?
Cheers,
Magnus
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics