Hi Erik,
in principle, yes, that would be useful. However:
* I would mostly need "last month" on a continued basis, at the moment
stretching back to September 2012 I believe
* As a flat file it's not seek-able, which means I would have to run
through the entire thing for each of my ~50 page sets, or keep all 50 in
memory; neither of which is appealing
Maybe I could read such a file into a toolserver database? It would be a
duplication of effort, and add load to the toolserver, but hey ;-)
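The database idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the input is taken to be whitespace-separated "project title count" lines (the real pagecounts-ez layout may differ), and the table name `pageviews` and the helpers `load_counts` and `counts_for` are made up for this example. One pass loads the file into an indexed table, after which each page set can be looked up without rescanning the file.

```python
import sqlite3


def parse(lines):
    """Yield (project, title, views) from assumed 'project title count' lines."""
    for line in lines:
        parts = line.split()
        # Skip comments and malformed lines; fields after the third are ignored.
        if line.startswith("#") or len(parts) < 3 or not parts[2].isdigit():
            continue
        yield parts[0], parts[1], int(parts[2])


def load_counts(lines, conn):
    """Stream the flat file into an indexed table in a single pass."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pageviews ("
        "project TEXT, title TEXT, views INTEGER, "
        "PRIMARY KEY (project, title))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pageviews VALUES (?, ?, ?)", parse(lines)
    )
    conn.commit()


def counts_for(conn, project, titles):
    """Return {title: views} for one page set; pages without data are omitted."""
    result = {}
    for title in titles:
        row = conn.execute(
            "SELECT views FROM pageviews WHERE project = ? AND title = ?",
            (project, title),
        ).fetchone()
        if row is not None:
            result[title] = row[0]
    return result
```

With the primary key on (project, title), each of the ~50 page sets becomes a series of index lookups rather than a full scan of the monthly file.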
Cheers,
Magnus
On Mon, Dec 3, 2012 at 11:59 PM, Erik Zachte <ezachte(a)wikimedia.org> wrote:
I have code to aggregate Domas' hourly files into a daily file and later a
monthly file, while still retaining full hourly resolution.

It has been a Xmas holiday pastime and is still a bit buggy, but I can
up the priority to fix this.

Intro:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054590.html

Data:
http://dumps.wikimedia.org/other/pagecounts-ez/monthly/
(ge5 is a subset containing only pages with 5+ views per month, which makes
a big difference in file size)

Would this be useful for you, Magnus?

Erik
From: analytics-bounces(a)lists.wikimedia.org [mailto:
analytics-bounces(a)lists.wikimedia.org] On Behalf Of Magnus Manske
Sent: Monday, December 03, 2012 9:14 PM
To: A mailing list for the Analytics Team at WMF and everybody who has
an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Access to view stats
Hi Diederik,

in principle, all of the Wikimedia projects; currently, all those listed at
http://stats.grok.se/ which is the top 100 or so Wikipedias. Plus
Commons, if possible.

As for the number of pages on those, that seems to fluctuate (probably
just my scripts occasionally breaking on slow/missing data); the largest
number I can find is for May 2012, with ~350K pages. But this only needs to
run once a month. I could even give you the list if you like, and you can
extract the data for me ;-)

But that would certainly not be a long-term, scalable solution. An SQL
interface on the toolserver would be ideal; a speedy HTTP-based API would
be good as well (maybe even better, as it would not require the toolserver
;-), especially if it can take chunks of data (e.g. POST requests with a 1K
article list), so that I don't have to fire thousands of tiny queries.
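The chunked-request idea could be sketched as follows. No such API exists yet according to this thread, so the payload shape (the `project`, `month` and `titles` fields) and the function names are purely hypothetical; the sketch only shows how a large page list would be split into 1K batches, one per POST body, and deliberately sends nothing.

```python
import json


def chunks(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_payloads(project, month, titles, size=1000):
    """Build one JSON body per batch; the field names are assumptions."""
    return [
        json.dumps({"project": project, "month": month, "titles": batch})
        for batch in chunks(titles, size)
    ]
```

For a list of ~350K pages this would mean a few hundred requests per month instead of one request per page.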
Cheers,
Magnus
On Mon, Dec 3, 2012 at 7:38 PM, Diederik van Liere <
dvanliere(a)wikimedia.org> wrote:
Hi Magnus,

Can you send me the list of pages for the Wikimedia projects that you are
interested in? Once I know how many pages there are, I can come up with a
solution.
D
On Mon, Dec 3, 2012 at 2:33 PM, Dario Taraborelli <
dtaraborelli(a)wikimedia.org> wrote:
+1

I have received an increasing number of external requests for something
more efficient than the stats.grok.se JSON interface and more
user-friendly than Domas' hourly raw data. I am also one of the interested
consumers of this data.

Diederik, any chance we could prioritize this request? I guess per-article
and per-project daily / monthly page views would be the most useful
aggregation level.
On Dec 3, 2012, at 11:11 AM, Magnus Manske <magnusmanske(a)googlemail.com>
wrote:

Hi all,

as you might know, I have a few GLAM-related tools on the toolserver. Some
are updated once a month, some can be used live, but all are in high demand
by GLAM institutions.

Now, the monthly updated stats have always been slow to run, but they have
almost ground to a halt recently. The on-demand tools have stalled
completely.

All these tools get their data from stats.grok.se, which works well but
is not really high-speed; my on-demand tools have apparently been shut out
recently because too many people were using them, DDoSing the server :-(

I know you are working on page view numbers, and from what I gather it's
up and running internally already. My requirements are simple: I have a
list of pages on many Wikimedia projects; I need per-page view counts for
these pages for a specific month.

Now, I know that there is no public API yet, but is there any way I can
get to the data, at least for the monthly stats?

Cheers,
Magnus
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics