Thanks Dan! We should try to keep this kind of information up to date in the
actual documentation; I just added your remarks to the page about
pagecounts-raw
<https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-raw>,
where the pagecounts-ez alternative had not been mentioned yet.
On Wed, Feb 21, 2018 at 11:26 AM, Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
Hi Lars,
You have a couple of options:
1. download the data in lossless compressed form,
https://dumps.wikimedia.org/other/pagecounts-ez/ The format is clever and doesn't lose
granularity, should be a lot quicker than pagecounts-raw (this is basically
what stats.grok.se did with the data as well, so downloading this way
should be equivalent)
2. work on Toolforge, a virtual cloud that's on the same network as the
data, so getting the data is a lot faster and you can use our compute
resources (free, of course):
https://wikitech.wikimedia.org/wiki/Portal:Toolforge
More specifically there is
https://wikitech.wikimedia.org/wiki/PAWS . (And
I assume "getting the data" meant transferring the files from
dumps.wikimedia.org, correct?)
If you decide to go with the second option, the IRC channel where they
support folks like you is #wikimedia-cloud and you can always find me there
as milimetric.
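
For the first option, once a file is downloaded, pulling out the counts for a handful of articles could look roughly like the sketch below. It assumes the space-separated "project title count bytes" row layout of the hourly pagecounts-raw files; the pagecounts-ez files use a more compact encoding, so the parsing step would need adapting to that format's documentation. The function names and the sample rows are made up for illustration.

```python
import gzip
from typing import Iterable, Iterator, Set, Tuple


def read_gz_lines(path: str) -> Iterator[str]:
    """Stream text lines from a .gz file without unpacking it to disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        yield from fh


def filter_pagecounts(
    lines: Iterable[str], project: str, titles: Set[str]
) -> Iterator[Tuple[str, int]]:
    """Yield (title, count) for rows matching one project and a set of titles.

    Assumed row layout (pagecounts-raw hourly files): project title count bytes
    """
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) < 3:
            continue  # skip malformed or truncated rows
        if parts[0] == project and parts[1] in titles:
            try:
                yield parts[1], int(parts[2])
            except ValueError:
                continue  # count field was not a number


# Made-up sample rows standing in for one downloaded file.
sample_rows = [
    "en Main_Page 242332 4737756101",
    "en Copenhagen 15 123456",
    "de Copenhagen 3 2222",
]
matches = list(filter_pagecounts(sample_rows, "en", {"Copenhagen"}))
print(matches)
```

With a real download, the sample list would be replaced by read_gz_lines(path) for each file, and the per-file counts summed per article.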
On Tue, Feb 20, 2018 at 12:51 PM, Lars Hillebrand <
larshillebrand(a)icloud.com> wrote:
Dear Analytics Team,
I am an M.Sc. student at Copenhagen Business School. For my Master's thesis
I would like to use page views data from certain Wikipedia articles. I
found out that in July 2015 a new API was created which delivers this data.
However, for my project I have to use data from before 2015.
In my further search I found out that the old page views data exists (
https://dumps.wikimedia.org/other/pagecounts-raw/) and until March 2017
it could be queried by using stats.grok.se. Unfortunately, this site
no longer exists, which is why I cannot filter and query the raw data
in .gz format on the webpage.
Are there any possibilities to get the page views data for certain
articles from before July 2015?
Thanks a lot and best regards,
Lars Hillebrand
PS: I am conducting my research in R and for the post 2015 data the
package “pageviews” works great.
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB