To de-mystify the dumps: pagecounts-raw was a very buggy dataset that had
data loss, didn't filter out hits from spiders, etc. The pageviews dataset
takes care of those problems and more, and is the same as the data behind
the pageview API.
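
For anyone who wants to compare the dumps against the API, here is a minimal sketch of how a per-article request URL could be put together. The endpoint layout reflects my understanding of the Wikimedia REST API. The function name and the defaults (all-access, user, daily) are illustrative assumptions, so check them against the Pageviews_API page on wikitech before relying on them.

```python
from urllib.parse import quote

# Base path of the per-article endpoint (assumed; verify on wikitech).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, title, start, end,
                  access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper).

    project: e.g. "en.wikipedia"
    title:   page title, with spaces or underscores
    start, end: dates as YYYYMMDD strings
    """
    # The title is a single path segment, so every reserved character,
    # including "/", is percent-encoded here.
    article = quote(title.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, article,
                     granularity, start, end])


print(pageviews_url("en.wikipedia", "Albert Einstein", "20160701", "20160728"))
```

This only builds the URL; fetching it and handling rate limits is left out on purpose.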
On Fri, Jul 29, 2016 at 3:17 PM, Bruno Goncalves <bgoncalves(a)gmail.com>
wrote:
Thank you for the heads up, Taha and Federico. I'm not entirely sure
redirects will be a big factor in what I'm playing with, but I'll definitely
keep an eye out for them. From what I gather, there is no simple way to check
whether there is a redirect page pointing to my page of interest?
I have about 400k pages I'm looking at across multiple editions so
Pageviews_API might not be enough :) I'll just try my hand at implementing
a PAGENAMEE encoding in Python unless there's some library out there that
I'm missing.
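
A rough sketch of what such an encoder might look like, using only the standard library. It assumes PAGENAMEE amounts to "spaces to underscores, then percent-encode everything except a small set of punctuation"; the safe-character list below is my reading of MediaWiki's URL encoding and should be checked against the Manual:PAGENAMEE_encoding page. The name pagenamee is just illustrative, and first-letter capitalisation (which depends on the wiki's configuration) is deliberately not handled.

```python
from urllib.parse import quote

# Punctuation assumed to be left unescaped by MediaWiki (an assumption to
# verify). Letters, digits and "-_.~" are never escaped by quote().
_MW_SAFE = ";@$!*(),/~:"


def pagenamee(title):
    """Approximate PAGENAMEE encoding of a page title (hypothetical helper)."""
    # Titles use underscores in place of spaces.
    db_key = title.strip().replace(" ", "_")
    # Everything else is percent-encoded as UTF-8 bytes.
    return quote(db_key, safe=_MW_SAFE)


for t in ["Albert Einstein", "AC/DC", "Gonçalves", "Who's Who?"]:
    print(t, "->", pagenamee(t))
```

Whether the titles in the dump files follow exactly this encoding is something to confirm on a sample before running it over 400k pages.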
Best,
B
*******************************************
Bruno Miguel Tavares Gonçalves, PhD
Homepage: www.bgoncalves.com
Email: bgoncalves(a)gmail.com
*******************************************
On Thu, Jul 28, 2016 at 4:48 PM, Federico Leva (Nemo) <nemowiki(a)gmail.com>
wrote:
Definitely consider the redirect :)
https://mako.cc/copyrighteous/consider-the-redirect
Bruno Goncalves, 28/07/2016 22:00:
I've been trying to match edit activity with pagecounts
The first question is how much data you need. If a few months are enough,
https://wikitech.wikimedia.org/wiki/Pageviews_API may be easier.
Otherwise...
https://www.mediawiki.org/wiki/Manual:PAGENAMEE_encoding
Nemo
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l