Dear Toby,
I recently saw your comment on a blog post
<http://magnusmanske.de/wordpress/?p=173> by Magnus Manske regarding the
lack of Wikipedia page view data besides the oft-overloaded
http://stats.grok.se/. I was wondering whether there has been any progress
at WMF on building a more stable, central, and complete source for this
data.
I ask because I'm a data scientist at a small research non-profit called
Harmony Institute <http://harmony-institute.org/>, where we study the social
impact of media (primarily television and film). I'm currently building an
interactive web app <http://harmony-institute.org/work/impactspace/> that
visualizes the social impact of many documentary films on a variety of issues.
One indicator of interest is "information-seeking behavior," i.e., whether
audiences are seeking out information about a film or issue. Besides Google
search trends, an excellent proxy for this is Wikipedia page views for both
film pages, e.g. Escape Fire
<http://en.wikipedia.org/wiki/Escape_Fire:_The_Fight_to_Rescue_Ameri…re>,
and issue-related pages, e.g. Health care reform
<http://en.wikipedia.org/wiki/Health_care_reform>.
I'm currently trying to use stats.grok.se to grab raw data in JSON form;
unfortunately, the site almost always responds with "Server overloaded,
please throttle your requests," and no amount of throttling seems to
suffice. I'm aware that there are many TBs of raw data for the downloading,
but I don't have the resources to handle that much data, nor do I need more
than the tiniest fraction of it.
I would *love* to show Wikipedia page view statistics for film pages in our
app. If you have any updates on progress or suggestions on how I might do
this, I would be very appreciative.
Thanks very much for your and all of WMF's hard work -- I'm a proud donor to
the cause. :)
Best,
Burton DeWilde
--
Burton DeWilde
Data Scientist
Harmony Institute
harmony-institute.org
blog <http://harmony-institute.org/therippleeffect/> |
twitter <https://twitter.com/hinstitute> |
facebook <https://www.facebook.com/harmonyinstitute>