I would like to know the top Wikipedia articles in the music category for a given country (let's say India). Is there a way I can get access to this data? If it is not readily available, can I get per-article page view (PV) data for a given country, so that I can infer the result I want?
Thanks, Prasenjit
Have you seen the Page View Statistics data that is available?
http://dumps.wikimedia.org/other/pagecounts-raw/
The page counts are not broken out by category or country, but they do include the project and language. So in theory you can do something like this:
curl http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-2013... | zcat - | egrep '^hi '
That would show the Hindi Wikipedia page views for 2013-01-07 08:00. I say "in theory" because the download server seems rather slow at the moment (100K/s), so I didn't actually see it work :-)
//Ed
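Each pagecounts-raw file covers a single hour and has no category or country field, so a "top articles" list means summing counts over many hourly files and, for a category such as music, restricting to a list of titles obtained some other way. Below is a minimal sketch, assuming each line holds four space-separated fields (project, page title, request count, bytes). The `top_pages` helper and the `music_titles.txt` title list are hypothetical, and the titles must be written in the same underscore/percent-encoded form the dump uses.

```shell
# top_pages PROJECT N [TITLES_FILE]  (hypothetical helper)
# Reads pagecounts-raw lines on stdin, keeps rows for PROJECT (e.g. "hi"),
# optionally keeps only titles listed one per line in TITLES_FILE,
# sums request counts per title across all input (e.g. many hourly files)
# and prints the N largest totals as "count title".
top_pages() {
  awk -v p="$1" -v titles="${3:-}" '
    BEGIN {
      # Load the optional title list, one title per line, written in the
      # same underscore/percent-encoded form as the dump.
      if (titles != "")
        while ((getline line < titles) > 0) keep[line] = 1
    }
    # Field 1 = project code, field 2 = page title, field 3 = request count.
    $1 == p && (titles == "" || ($2 in keep)) { total[$2] += $3 }
    END { for (t in total) print total[t], t }
  ' | sort -k1,1nr -k2,2 | head -n "$2"
}
```

For example, after downloading a day's worth of hourly files: zcat pagecounts-20130107-*.gz | top_pages hi 20 music_titles.txt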
It would be nice to have the page counts split at least by language (it would also reduce the load on your machine). Please let me know if I can help with the code.
cheers, Diego
On Mon, Jan 7, 2013 at 9:55 AM, Ed Summers ehs@pobox.com wrote:
Have you seen the Page View Statistics data that is available?
http://dumps.wikimedia.org/other/pagecounts-raw/
The page counts are not broken out by category, or country, but they do include the project and language. So in theory you can do something like this:
curl http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130107-080000.gz | zcat - | egrep '^hi '
To see the Hindi Wikipedia page views for 2013-01-07 08:00. I say in theory because the download server seems to be somewhat slow at the moment (100K/s) so I didn't see it actually work :-)
//Ed
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Thanks for the pointer, it really helps. Is it also possible to get the stats by the requester's country (via IP geolocation)? Traffic could be coming from one country to view another country's Wikipedia pages.
Thanks, Prasenjit