To demystify the dumps: pagecounts-raw was a very buggy dataset. It suffered data loss, didn't filter out hits from spiders, and so on. The pageviews dataset takes care of those problems and more, and it is the same data that backs the Pageview API.

On Fri, Jul 29, 2016 at 3:17 PM, Bruno Goncalves <bgoncalves@gmail.com> wrote:
Thank you for the heads-up, Taha and Federico. I'm not entirely sure redirects will be a big factor in what I'm playing with, but I'll definitely keep an eye out for them. From what I gather, there is no simple way to check whether a redirect page points to my page of interest?

I have about 400k pages I'm looking at across multiple editions so Pageviews_API might not be enough :) I'll just try my hand at implementing a PAGENAMEE encoding in Python unless there's some library out there that I'm missing.
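A minimal sketch of such an encoder, assuming the rule described on the Manual:PAGENAMEE_encoding page: spaces become underscores, the title is percent-encoded as UTF-8, and a small set of punctuation characters is left unescaped. The function name `pagenamee_encode` and the constant `_SAFE` are made up for illustration, and the list of unescaped characters is my reading of MediaWiki's behaviour, so it is worth checking against a few known titles before relying on it.

```python
from urllib.parse import quote

# Punctuation assumed to stay unescaped in PAGENAMEE output
# (an assumption based on the MediaWiki manual page, not verified here).
# quote() already leaves ASCII letters, digits and "_.-~" untouched.
_SAFE = ";@$!*(),/:"


def pagenamee_encode(title: str) -> str:
    """Approximate the {{PAGENAMEE}} encoding of a page title."""
    # Titles use underscores in place of spaces.
    title = title.strip().replace(" ", "_")
    # Percent-encode the UTF-8 bytes of everything outside the safe set.
    return quote(title, safe=_SAFE)


if __name__ == "__main__":
    for t in ["Albert Einstein", "Gonçalves", "AC/DC", "Who's Who?"]:
        print(t, "->", pagenamee_encode(t))
```

This does not capitalize the first letter, resolve redirects, or handle namespaces, so titles may still need normalizing before they are matched against lines in the dumps.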

Best,

B

*******************************************
Bruno Miguel Tavares Gonçalves, PhD
Homepage: www.bgoncalves.com
Email: bgoncalves@gmail.com
*******************************************

On Thu, Jul 28, 2016 at 4:48 PM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Definitely consider the redirect :) https://mako.cc/copyrighteous/consider-the-redirect

Bruno Goncalves, 28/07/2016 22:00:
I've been trying to match edit activity with pagecounts

The first question is how much data you need. If a few months are enough, https://wikitech.wikimedia.org/wiki/Pageviews_API may be easier.

Otherwise... https://www.mediawiki.org/wiki/Manual:PAGENAMEE_encoding

Nemo


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

