Dear Toby,
I recently saw your comment on a blog post (http://magnusmanske.de/wordpress/?p=173) by Magnus Manske regarding the lack of Wikipedia page view data besides the oft-overloaded http://stats.grok.se/. I was wondering if there's been any progress at WMF on building a more stable, central, and complete source for this data?
I ask because I'm a data scientist at a small research non-profit called Harmony Institute (http://harmony-institute.org/), where we study the social impact of media (primarily television and film). I'm currently building an interactive web app (http://harmony-institute.org/work/impactspace/) that visualizes the social impact of many documentary films across a variety of issues. One indicator of interest is "information-seeking behavior," i.e. whether audiences seek out information about a film or issue. Besides Google search trends, an excellent proxy for this is Wikipedia page views for both film pages, e.g. Escape Fire (http://en.wikipedia.org/wiki/Escape_Fire:_The_Fight_to_Rescue_American_Healthcare), and issue-related pages, e.g. Health care reform (http://en.wikipedia.org/wiki/Health_care_reform).
I'm currently trying to use stats.grok.se to grab raw data in JSON form; unfortunately, the site almost always responds with "Server overloaded, please throttle your requests," and no amount of throttling seems to suffice. I'm aware that there are many TBs of raw data available for download, but I don't have the resources to handle that much data, nor do I need more than the tiniest fraction of it.
I would *love* to show Wikipedia page view statistics for film pages in our app. If you have any updates on progress or suggestions on how I might do this, I would be very appreciative.
Thanks very much for your and all of WMF's hard work -- I'm a proud donor to the cause. :)
Best, Burton DeWilde
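For readers in the same position, here is a minimal sketch of polite client-side throttling with exponential backoff. Burton reports that throttling alone did not help, so this will not fix an overloaded server; it only spaces out retries and gives up cleanly. The URL layout in `stats_grok_url` (`/json/<lang>/<YYYYMM>/<Title>`) and the idea of detecting overload by the "Server overloaded" text in the body are assumptions, and `fetch` is injectable so the retry logic can be exercised without network access.

```python
import time
import urllib.request
from typing import Callable


class OverloadedError(Exception):
    """Raised when every attempt came back with an overload message."""


def stats_grok_url(lang: str, yyyymm: str, title: str) -> str:
    # Assumed URL layout of the stats.grok.se JSON interface (not verified here).
    return f"http://stats.grok.se/json/{lang}/{yyyymm}/{title}"


def http_get(url: str) -> str:
    # Plain GET using only the standard library.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def fetch_with_backoff(
    url: str,
    fetch: Callable[[str], str] = http_get,
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url); after each overload reply wait base_delay * 2**attempt seconds."""
    for attempt in range(max_attempts):
        body = fetch(url)
        if "Server overloaded" not in body:
            return body
        # Do not sleep after the final attempt; just fall through and raise.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise OverloadedError(f"gave up on {url} after {max_attempts} attempts")
```

A live call would look like `fetch_with_backoff(stats_grok_url("en", "201403", "Health_care_reform"))`. If the server signals overload with an HTTP error status rather than a message in the body, `urlopen` raises `HTTPError`, and the check would need to move into an `except` clause.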
Hi Burton,
We just opened a new site, www.wikipediatrends.com, that shows Wikipedia page view data. Our site is very similar to the existing http://tools.wmflabs.org/wikiviewstats/ and http://stats.grok.se/, but uses a slightly different approach to calculating and presenting data, and also allows comparison of different articles.
I hope it will serve your purpose. I am ready to discuss integration off the list.
Alex Druk
On Mon, Mar 24, 2014 at 11:40 PM, Burton DeWilde <burton@harmony-institute.org> wrote:
-- Burton DeWilde
Data Scientist, Harmony Institute, harmony-institute.org | blog http://harmony-institute.org/therippleeffect/ | twitter https://twitter.com/hinstitute | facebook https://www.facebook.com/harmonyinstitute
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Quick question: does this have en.wp data only, or can I query (as in CSV) other Wikipedias/projects? And can I limit the date range (not really necessary, but it would mean less data to transmit)?
On Tue, Mar 25, 2014 at 9:09 AM, Alex Druk alex.druk@gmail.com wrote:
-- Thank you.
Alex Druk alex.druk@gmail.com (775) 237-8550 Google voice
Hi Magnus,
Only en.wp for now. We will wait and see how popular it is before adding other projects. You cannot limit the date range in the CSV yet, but the response is usually under 10 KB.
By the way, many thanks for your great work!
Regards, Alex
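Since the CSV cannot yet be limited to a date range server-side, a client can trim it after download; with responses under 10 KB this costs little. The sketch below assumes, hypothetically, a header row with `date` (YYYY-MM-DD) and `views` columns; the actual wikipediatrends.com CSV layout may differ and the column names would need adjusting.

```python
import csv
import io
from datetime import date
from typing import List, Tuple


def filter_date_range(csv_text: str, start: date, end: date) -> List[Tuple[date, int]]:
    """Keep rows whose date falls in [start, end], inclusive.

    Assumes hypothetical 'date' (ISO YYYY-MM-DD) and 'views' columns.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(record["date"])
        if start <= day <= end:
            rows.append((day, int(record["views"])))
    return rows
```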
On Tue, Mar 25, 2014 at 10:53 AM, Magnus Manske <magnusmanske@googlemail.com> wrote:
@Alex: that's really awesome. Thanks for providing a stats.grok.se alternative. Really looking forward to other languages as well, and maybe throw Commons into the mix too?
-- Hay
On Tue, Mar 25, 2014 at 11:06 AM, Alex Druk alex.druk@gmail.com wrote:
@Hay: Thank you. The site is still at a very early stage of development, and we would like to get constructive criticism from the Wikipedians on this list first. Our resources are very limited, so we cannot include all the other wiki projects now. What languages would you like to see first? What projects? Commons is a good idea, but I am not sure many people use those stats.
On Tue, Mar 25, 2014 at 12:00 PM, Hay (Husky) huskyr@gmail.com wrote:
Just saw an update from Henrik: «hopefully within the next two or three weeks the capacity of stats.grok.se will be quadrupled». https://en.wikipedia.org/w/index.php?title=User_talk:Henrik&diff=600917917&oldid=600897425
Alex Druk, 25/03/2014 10:09:
We just opened a new site www.wikipediatrends.com that show Wikipedia page view data. Our site is very similar to existing http://tools.wmflabs.org/wikiviewstats/ and http://stats.grok.se/, but use slightly different approach to calculating and presenting data as well as allow comparison of different articles.
Nice, do you have an issue tracker? Apparently all titles with diacritics are broken, e.g. if I enter Ruisreikäleipä it's accepted but nothing happens, and if I try to autocomplete to something simpler like "Fabrizio De André" I get two "null" entries in the dropdown.
Nemo
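A common cause of this kind of breakage is passing titles to a URL or query without consistent Unicode normalization and UTF-8 percent-encoding. The sketch below shows one way to do that step; it is not wikipediatrends' actual code, and the function names are hypothetical. It normalizes to NFC (so a decomposed "a" plus combining diaeresis and a precomposed "ä" encode identically), applies the MediaWiki convention of underscores for spaces, and percent-encodes the UTF-8 bytes.

```python
import unicodedata
from urllib.parse import quote, unquote


def encode_title(title: str) -> str:
    # NFC normalization makes composed and decomposed accents encode the same way.
    nfc = unicodedata.normalize("NFC", title.strip())
    # MediaWiki stores spaces in titles as underscores.
    underscored = nfc.replace(" ", "_")
    # Percent-encode UTF-8 bytes; safe="" also escapes '/' and ':' for query strings.
    return quote(underscored, safe="")


def decode_title(encoded: str) -> str:
    # Inverse for display: percent-decode as UTF-8 and restore spaces.
    return unquote(encoded).replace("_", " ")
```

For example, `encode_title("Ruisreikäleipä")` gives `Ruisreik%C3%A4leip%C3%A4`, since "ä" is the two UTF-8 bytes C3 A4.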
@Nemo: Oops! It will be fixed soon. Yes, we have an issue tracker at https://github.com/sergeychernyshev/wikitrends/issues?direction=desc&lab... You can also submit issues at http://www.wikipediatrends.com/ContactUs.php
On Tue, Mar 25, 2014 at 12:30 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
On Tue, Mar 25, 2014 at 12:40 PM, Alex Druk alex.druk@gmail.com wrote:
@Nemo: Ooops! Will be fix soon. yes, we have issue tracker at https://github.com/sergeychernyshev/wikitrends/issues?direction=desc&lab... You can also submit any at http://www.wikipediatrends.com/ContactUs.php
The GitHub page gives a 404... maybe the repo is private?
Our resources are very limited and we cannot include all other wiki projects now. What languages would you like to see first? What projects?
Purely from my own point of view, the Dutch WP and Commons would be most useful for me. I guess you might get some useful information from this list:
https://meta.wikimedia.org/wiki/List_of_wikipedia#All_Wikipedias_ordered_by_...
-- Hay
Fascinating website, and I love the comparison option. I just compared page hits on Haarlem vs. Leiden, and I guess the spikes are due to tourist attractions.
2014-03-25 12:48 GMT+01:00, Hay (Husky) huskyr@gmail.com:
Hi Burton,
nicely done (and yay for using dygraphs) – with what frequency do you expect wikipediatrends to ingest new data from the raw pageview dumps? I assume it’s once a month?
Dario
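To make the ingestion question concrete, and to show why only a tiny fraction of the raw dumps is needed for a few articles, here is a minimal sketch that streams gzipped hourly pagecounts files and sums views for a handful of titles. It assumes the pagecounts-raw line format `project page_title count bytes` (e.g. a hypothetical line `en Health_care_reform 7 200`). Titles in those files appear roughly as requested, so some views may sit under percent-encoded or differently capitalized variants that this exact-match filter misses.

```python
import gzip
from collections import Counter
from typing import BinaryIO, Iterable, Set


def count_views(hourly_files: Iterable[BinaryIO], project: str, titles: Set[str]) -> Counter:
    """Sum view counts for selected titles across gzipped hourly pagecounts files.

    Lines are streamed one at a time, so memory stays small regardless of file size.
    """
    totals: Counter = Counter()
    prefix = (project + " ").encode()
    for fileobj in hourly_files:
        with gzip.open(fileobj, "rb") as lines:
            for raw in lines:
                if not raw.startswith(prefix):
                    continue  # cheap byte-level skip of other projects
                parts = raw.decode("utf-8", errors="replace").split(" ")
                if len(parts) != 4:
                    continue  # skip malformed lines
                _, title, count, _ = parts
                if title in titles:
                    totals[title] += int(count)
    return totals
```

Run once per day of hourly files, this gives daily totals; the byte-prefix check avoids decoding lines from other projects.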
apologies, s/Burton/Alex :)
one more question: is there any plan to add a JSON interface on top of the CSV download? Many people have relied on stats.grok.se JSON output for years and it would be fantastic to have wikipediatrends return data in the same format.
Dario
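A JSON interface compatible with existing stats.grok.se clients mostly means reproducing its response shape. The sketch below serializes one article-month of (date, views) rows into an object with `project`, `title`, `month`, and a `daily_views` map keyed by ISO date; that field set is an assumption about what stats.grok.se returned, not a specification, and `to_grok_style_json` is a hypothetical name.

```python
import json
from datetime import date
from typing import Iterable, Tuple


def to_grok_style_json(project: str, title: str, month: str,
                       rows: Iterable[Tuple[date, int]]) -> str:
    """Serialize (date, views) rows for one YYYYMM month as JSON.

    Rows outside `month` are dropped; field names imitate stats.grok.se (assumed).
    """
    daily = {d.isoformat(): views for d, views in rows if d.strftime("%Y%m") == month}
    payload = {"project": project, "title": title, "month": month, "daily_views": daily}
    return json.dumps(payload, sort_keys=True)
```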
On Mar 25, 2014, at 6:27 AM, Dario Taraborelli dario@wikimedia.org wrote:
@Dario: Thanks. Yes, we refresh the site once a month, usually around the 10th, because we depend on the dumps. And yes, we plan to introduce JSON.
Alex
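A client that caches this data can work out which month should be the newest available. The sketch below assumes, based only on the description above, that each refresh happens on about the 10th and adds the previous calendar month; the function name and the `refresh_day` parameter are hypothetical.

```python
from datetime import date
from typing import Tuple


def latest_available_month(today: date, refresh_day: int = 10) -> Tuple[int, int]:
    """Return (year, month) of the newest month a monthly refresh should cover.

    Before the refresh day the newest complete month is two months back;
    from the refresh day on it is last month.
    """
    months_back = 1 if today.day >= refresh_day else 2
    # Count months from year 0 so the subtraction wraps across year boundaries.
    index = today.year * 12 + (today.month - 1) - months_back
    return index // 12, index % 12 + 1
```

For example, on 25 March 2014 this gives (2014, 2), while on 3 January 2014 it gives (2013, 11).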
On Tue, Mar 25, 2014 at 2:32 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
apologies, s/Burton/Alex :)
one more question: is there any plan to add a JSON interface on top of the CSV download? Many people have relied on stats.grok.se JSON output for years and it would be fantastic to have wikipediatrends return data in the same format.
Dario
On Mar 25, 2014, at 6:27 AM, Dario Taraborelli dario@wikimedia.org wrote:
Hi Burton,
nicely done (and yay for using dygraphs) – with what frequenty do you expect wikipediatrends to ingest new data from the raw pageview dumps? I assume it’s once a month?
Dario
On Mar 25, 2014, at 2:09 AM, Alex Druk alex.druk@gmail.com wrote:
Hi Burton,
We just opened a new site www.wikipediatrends.com that show Wikipedia page view data. Our site is very similar to existing http://tools.wmflabs.org/wikiviewstats/ and http://stats.grok.se/, but use slightly different approach to calculating and presenting data as well as allow comparison of different articles.
I hope it will serve your purpose. I am ready to discuss integration out of the list.
Alex Druk
On Mon, Mar 24, 2014 at 11:40 PM, Burton DeWilde < burton@harmony-institute.org> wrote:
Dear Toby,
I recently saw your comment on a blog posthttp://magnusmanske.de/wordpress/?p=173by Magnus Manske regarding the lack of Wikipedia page view data besides the oft-overloaded http://stats.grok.se/. I was wondering if there's been any progress at WMF on building a more stable, central, and complete source for this data?
I ask because I'm a data scientist at a small research non-profit called Harmony Institute http://harmony-institute.org/, where we study the social impact of media (primarily television and film). I'm currently building an interactive web app http://harmony-institute.org/work/impactspace/that visualizes social impact on a variety of issues by many documentary films. One indicator of interest is "information-seeking behavior," i.e. are audiences seeking out information about a film or issue. Besides Google search trends, an excellent proxy for this is Wikipedia page views for both film pages, e.g. Escape Firehttp://en.wikipedia.org/wiki/Escape_Fire:_The_Fight_to_Rescue_American_Healthcare, and issue-related pages, e.g. Health care reformhttp://en.wikipedia.org/wiki/Health_care_reform .
I'm currently trying to use stats.grok.se to grab raw data in JSON form; unfortunately, the site almost always responds with "Server overloaded, please throttle your requests," and no amount of throttling seems to suffice. I'm aware that there are many TBs of raw data for the downloading, but I don't have the resources to handle that much data, nor do I need more than the tiniest fraction of it.
I would *love* to show Wikipedia page view statistics for film pages in our app. If you have any updates on progress or suggestions on how I might do this, I would be very appreciative.
Thanks very much for your and all of WMF's hard work — I'm a proud donor to the cause. :)
Best, Burton DeWilde
-- Burton DeWilde
Data Scientist Harmony Institute harmony-institute.org blog http://harmony-institute.org/therippleeffect/ | twitterhttps://twitter.com/hinstitute| facebook https://www.facebook.com/harmonyinstitute
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Thank you.
Alex Druk alex.druk@gmail.com (775) 237-8550 Google voice
On Mar 25, 2014, at 7:01 AM, Alex Druk alex.druk@gmail.com wrote:
@Dario: Thanks. Yes, we update the site once a month, usually around the 10th, because we depend on the dumps. And yes, we plan to introduce JSON.
awesome
also, I noticed some inconsistency in the heading/titles that you may want to fix: “Wikipedia Articles Trends”, “Wikipedia trends”, “Wikipedia pageview statistics”, “Wiki Trends”.
Dario
On Tue, Mar 25, 2014 at 2:32 PM, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
apologies, s/Burton/Alex :)
one more question: is there any plan to add a JSON interface on top of the CSV download? Many people have relied on stats.grok.se JSON output for years and it would be fantastic to have wikipediatrends return data in the same format.
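As an illustration of what a thin JSON layer over a CSV download could look like, here is a minimal sketch. The CSV column names ("date", "views"), the function name, and the output keys (including a per-date `daily_views` mapping intended to resemble the stats.grok.se output) are all assumptions, not wikipediatrends' actual format.

```python
import csv
import io
import json


def csv_to_daily_views_json(csv_text: str, title: str, project: str = "en") -> str:
    """Convert a "date,views" CSV export into a JSON document keyed by date.

    ASSUMPTIONS: the CSV has a header row naming "date" and "views" columns,
    and the output shape only approximates the stats.grok.se JSON.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    daily_views = {row["date"]: int(row["views"]) for row in reader}
    payload = {"title": title, "project": project, "daily_views": daily_views}
    return json.dumps(payload, sort_keys=True)
```

Keeping the conversion as a separate step on top of the existing CSV means the monthly ingestion from the dumps would not need to change.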
Dario
On Mar 25, 2014, at 6:27 AM, Dario Taraborelli dario@wikimedia.org wrote:
Hi Burton,
nicely done (and yay for using dygraphs) – how often do you expect wikipediatrends to ingest new data from the raw pageview dumps? I assume it’s once a month?
Dario
On Mar 25, 2014, at 2:09 AM, Alex Druk alex.druk@gmail.com wrote:
Hi Burton,
We just opened a new site, www.wikipediatrends.com, that shows Wikipedia page view data. Our site is very similar to the existing http://tools.wmflabs.org/wikiviewstats/ and http://stats.grok.se/, but it uses a slightly different approach to calculating and presenting the data, and it allows comparison of different articles.
I hope it will serve your purpose. I am ready to discuss integration off the list.
Alex Druk
I haven’t checked the raw logs to compare them with the visualization, but I think we should QA the data: the raw (unsmoothed) series for Eros [1] shows a spike on 2/14 (Valentine’s Day, predictably) with 6,920 pageviews, while stats.grok.se [2] reports 3,209 pageviews for the same date. I don’t think any interpolation for missing data occurred around that date.
[1] http://www.wikipediatrends.com/?query%5B%5D=Eros
[2] http://stats.grok.se/en/latest90/Eros
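A QA pass of this kind can be scripted by lining up the daily counts from the two sources and flagging dates where they diverge. This is a minimal sketch: the function name and the 10% tolerance are arbitrary choices, and only the 2/14 figures quoted above come from the thread.

```python
def flag_discrepancies(series_a: dict, series_b: dict, rel_tol: float = 0.1) -> dict:
    """Return {date: (a, b)} for dates present in both series whose counts
    differ by more than rel_tol, measured relative to the larger count."""
    flagged = {}
    for day in sorted(series_a.keys() & series_b.keys()):
        a, b = series_a[day], series_b[day]
        larger = max(a, b)
        # Skip days where both counts are zero to avoid dividing by zero.
        if larger and abs(a - b) / larger > rel_tol:
            flagged[day] = (a, b)
    return flagged


wikipediatrends = {"2014-02-14": 6920}
grok = {"2014-02-14": 3209}
print(flag_discrepancies(wikipediatrends, grok))  # {'2014-02-14': (6920, 3209)}
```

Measuring the gap relative to the larger count keeps the ratio between 0 and 1 regardless of which source reports more views.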
@Dario: I don't have time to check the original files now, but I believe the difference is because we show aggregated data (i.e., views of the article PLUS all its redirects). However, I will check the raw files as well.
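To make the aggregation concrete: summing an article's daily views with those of every redirect pointing at it can only give counts equal to or higher than a per-title tool such as stats.grok.se reports. This minimal sketch assumes a hypothetical `redirects` map (redirect title to target title) is already available and follows only one level of redirects; it does not by itself confirm that redirects explain the Eros discrepancy.

```python
from collections import defaultdict


def aggregate_with_redirects(daily_views: dict, redirects: dict) -> dict:
    """Fold each redirect's daily counts into its target article.

    daily_views: {title: {date: views}}
    redirects:   {redirect_title: target_title}  (hypothetical input)
    Titles absent from `redirects` are treated as articles in their own right.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for title, series in daily_views.items():
        target = redirects.get(title, title)
        for day, views in series.items():
            totals[target][day] += views
    return {title: dict(series) for title, series in totals.items()}
```

Comparing the output for one article against its unaggregated series from the same dumps would show how much of a gap redirects actually account for.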
Hi Burton --
Thanks for this. I'm glad the Wikipedia data is useful, even if it's difficult to access at this time.
As Nemo reported, we're currently working with Henrik to get him a better server, and it should be on its way to him now. We're hopeful that modern hardware and SSDs will really help scale the service.
We're also planning on working with Henrik to see if there are any optimizations in the app/database that will help. (We have one of our DBAs signed up to help here.)
It's also exciting to see other projects come up that address this issue -- we have some major tasks ahead of us in updating the page view definitions and making them available in a scalable way. While we haven't decided what format we want to use, integrating with existing page view APIs is something we want to be able to support.
You can take a look at the projects we are working on here (https://www.mediawiki.org/wiki/Analytics/Prioritization_Planning); we will be doing some prioritization next week for the new quarter, and I'll update this list with the results.
-Toby