Hey guys,
Good afternoon. I am working on a data-mining project on Wikipedia data. Our main question is whether the number of page views is correlated with the number of reverted edits. In order to have a fair comparison between different portals (People, Technology, Math, etc.), I would like to get the 250 most viewed pages for each portal (12 portals in total).
I noticed that on Wikipedia, there is a page where weekly page view data are aggregated and the 5000 most viewed pages of the week are listed.
However, in order to see the relationship between the number of page views and the number of reverted edits, I would need data for a longer duration (say, a month or longer).
I wonder if there is any way (hopefully an easy one) that I can query the data I need.
Your help is highly appreciated.
Happy Thanksgiving!
Anson
Heya,
We have a new RESTful API at https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/ that should let you get the data you want at a monthly level. If you're using R, we already have a client library at https://github.com/Ironholds/pageviews
Hope that helps!
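For anyone not using R, here is a rough Python sketch of the same idea. It assumes the per-article endpoint layout shown in the REST documentation (project / access / agent / article / granularity / start / end) and a response of the form {"items": [{"timestamp": "YYYYMMDDHH", "views": N, ...}]}. The helper names, the default parameter values, and the User-Agent string are made up for illustration, so please check them against the API docs before relying on them.

```python
import json
from collections import defaultdict
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed base path of the Pageviews API (see the REST docs linked above).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def per_article_url(article, start, end, project="en.wikipedia",
                    access="all-access", agent="user"):
    """Build a per-article URL for daily counts between start and end (YYYYMMDD)."""
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/per-article/{project}/{access}/{agent}/{title}/daily/{start}/{end}"


def monthly_totals(payload):
    """Sum daily view counts into per-month totals keyed by 'YYYYMM'."""
    totals = defaultdict(int)
    for item in payload.get("items", []):
        totals[item["timestamp"][:6]] += item["views"]
    return dict(totals)


def fetch_json(url):
    """Fetch and decode one API response (needs network access)."""
    # Hypothetical User-Agent; replace it with your own contact details.
    req = Request(url, headers={"User-Agent": "pageview-example/0.1 (you@example.org)"})
    with urlopen(req) as resp:
        return json.load(resp)
```

Calling monthly_totals(fetch_json(per_article_url("Albert Einstein", "20151001", "20151130"))) should then give one total per month for that article, provided the data for that range is available.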
On 25 November 2015 at 14:07, Ying Haw Lee yinghawl@gmail.com wrote:
Hey guys,
Good afternoon. I am working on a data-mining project on Wikipedia data. Our main question is whether the number of page views is correlated with the number of reverted edits. In order to have a fair comparison between different portals (People, Technology, Math, etc.), I would like to get the 250 most viewed pages for each portal (12 portals in total).
I noticed that on Wikipedia, there is a page where weekly page view data are aggregated and the 5000 most viewed pages of the week are listed.
However, in order to see the relationship between the number of page views and the number of reverted edits, I would need data for a longer duration (say, a month or longer).
I wonder if there is any way (hopefully an easy one) that I can query the data I need.
Your help is highly appreciated.
Happy Thanksgiving!
Anson
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Don't forget to *consider the redirect*[1] when associating pageviews with an actual page.
From [1]:
In a highly-cited article, Priedhorsky et al. combine Wikipedia view data with edit data and do not mention accounting for redirects[6]. Although not a central finding in their paper, the authors note with surprise that they find a weak correlation between edits and views. Because redirects are edited infrequently but "viewed" as often as millions of times per month each, redirects may be contributing to the surprisingly low correlation between edits and views noted by Priedhorsky et al. and others.
1. https://mako.cc/academic/hill_shaw-consider_the_redirect.pdf
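One simple way to act on this is to fold the views of each redirect into its target before correlating views with edits. The sketch below assumes you already have a redirect map (redirect title to target title) from some other source, such as a dump or the MediaWiki API. The function names and the dictionary layout are hypothetical, and this is not the procedure from the paper.

```python
from collections import defaultdict


def resolve(title, redirects):
    """Follow a redirect chain to its final target, stopping if a cycle appears."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title


def fold_redirect_views(views, redirects):
    """Add each redirect's view count to the page it ultimately points to."""
    folded = defaultdict(int)
    for title, count in views.items():
        folded[resolve(title, redirects)] += count
    return dict(folded)
```

With views = {"Barack_Obama": 100, "Obama": 40} and redirects = {"Obama": "Barack_Obama"}, all 140 views end up counted under "Barack_Obama".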
On Tue, Dec 1, 2015 at 12:35 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Heya,
We have a new RESTful API at https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/ that should let you get the data you want at a monthly level. If you're using R, we already have a client library at https://github.com/Ironholds/pageviews
Hope that helps!
-- Oliver Keyes Count Logula Wikimedia Foundation
Hi Anson,
The API should give top views for a month... but the data isn't available yet: https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-acc...
I believe we'll have October and November shortly - I'm going to find out when by tomorrow.
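Once the monthly data is published, something like the following Python sketch could pull the top list for a month. It assumes the top endpoint takes project / access / year / month / day, that "all-days" selects the whole month, and that the response looks like {"items": [{"articles": [{"article": ..., "views": ..., "rank": ...}]}]}. The function names and defaults are invented for illustration. Note that this list covers the whole project, so splitting it by portal would still need a separate list of pages per portal.

```python
# Assumed base path of the Pageviews API.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def top_url(year, month, day="all-days", project="en.wikipedia", access="all-access"):
    """Build a top-articles URL; day='all-days' is assumed to mean the whole month."""
    return f"{BASE}/top/{project}/{access}/{year:04d}/{month:02d}/{day}"


def top_articles(payload, n=250):
    """Return the n best-ranked (article, views) pairs from a decoded response."""
    items = payload.get("items", [])
    if not items:
        return []
    ranked = sorted(items[0]["articles"], key=lambda a: a["rank"])
    return [(a["article"], a["views"]) for a in ranked[:n]]
```

The payload would come from fetching top_url(2015, 10) and decoding the JSON body, for example with urllib.request and json.load.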