I'm not sure I want to throw the API open to the public (the grok.se
folks, among others, already run a fine service for casual
experimentation).
However, I am willing to share the data with interested researchers who
need to do some serious crunching (I have a Java API and could
distribute database credentials on a case-by-case basis).
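For a sense of what a lookup against the DB might look like, here is a
minimal JDBC sketch; the connection URL, credentials, and table/column
names (page_views, page_title, hour_utc, num_views) are hypothetical
stand-ins for illustration, not the actual schema:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PageViewLookup {
        public static void main(String[] args) throws Exception {
            // Hypothetical URL/credentials -- distributed per-case, as above
            Connection conn = DriverManager.getConnection(
                "jdbc:mysql://db.example.org/pageviews", "user", "pass");
            // Assumed layout: one row per (page title, UTC hour) pair
            PreparedStatement st = conn.prepareStatement(
                "SELECT hour_utc, num_views FROM page_views " +
                "WHERE page_title = ? AND hour_utc BETWEEN ? AND ?");
            st.setString(1, "Main_Page");
            st.setLong(2, 2010010100L); // hours encoded as YYYYMMDDHH
            st.setLong(3, 2010010123L);
            ResultSet rs = st.executeQuery();
            while (rs.next())
                System.out.println(rs.getLong(1) + ": "
                    + rs.getLong(2) + " views");
            conn.close();
        }
    }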
I'll note that I only parse English Wikipedia at this time. I've found
the data useful in my anti-vandalism research (e.g., "given that an edit
survived between time [w] and [x] on article [y], we estimate it
received [z] views"); a rough sketch of that computation follows.
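Concretely, the estimate is just a sum of the hourly counts for article
[y] over the interval [w, x]. A minimal sketch of the idea (the method
name and the treatment of partial hours are my own illustration, not
the actual code):

    import java.util.Map;
    import java.util.TreeMap;

    public class SurvivalViews {
        // Hourly view counts for one article, keyed by UTC hour
        // (YYYYMMDDHH); stands in for a query against the DB above.
        static long estimateViews(TreeMap<Long, Long> hourly, long w, long x) {
            long total = 0;
            // Sum every whole hour the edit was live, inclusive on
            // both ends; partial first/last hours could be pro-rated.
            for (Map.Entry<Long, Long> e :
                    hourly.subMap(w, true, x, true).entrySet())
                total += e.getValue();
            return total;
        }
    }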
views"). Thanks, -AW
On 04/06/2011 08:44 PM, MZMcBride wrote:
> Andrew G. West wrote:
>> I've parsed every one of these files (at hour granularity; grok.se
>> aggregates at day-level, I believe) since Jan. 2010 into a DB
>> structure indexed by page title. It takes up about 400GB of space,
>> at the moment.
>
> Is your database available to the public? The Toolserver folks have
> been talking about getting the page view stats into usable form for
> quite some time, but nothing's happened yet. If you have an API or
> something similar, that would be fantastic. (stats.grok.se has a
> rudimentary API that I don't imagine many people are aware of.)
>
> MZMcBride
--
Andrew G. West, Doctoral Student
Dept. of Computer and Information Science
University of Pennsylvania, Philadelphia PA
Website: http://www.cis.upenn.edu/~westand