I believe all timestamps in the database dumps are GMT.
As for the timestamps in the pageview dump filenames, I have always assumed they are GMT as well, and what I take to be the source code of the old dumper seems to confirm this:
https://phabricator.wikimedia.org/diffusion/ANWC/browse/master/collector.c;8...
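If both really are GMT/UTC, the two formats from your question can be put on the same clock. Here is a minimal sketch in Python; the function names are made up for illustration, and the UTC reading of the pagecounts filename is exactly the assumption discussed above, not something the filename itself states.

```python
from datetime import datetime, timezone


def parse_revision_timestamp(text):
    """Parse a dump timestamp such as '2006-02-18T19:29:10Z'.

    The trailing 'Z' is the ISO 8601 marker for UTC.
    """
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def parse_pagecounts_filename(name):
    """Parse a filename such as 'pagecounts-20140725-070000.gz'.

    ASSUMPTION: the date and time embedded in the name are UTC.
    """
    prefix, suffix = "pagecounts-", ".gz"
    if not (name.startswith(prefix) and name.endswith(suffix)):
        raise ValueError("unexpected filename: %r" % name)
    stamp = name[len(prefix):-len(suffix)]
    parsed = datetime.strptime(stamp, "%Y%m%d-%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


revision_time = parse_revision_timestamp("2006-02-18T19:29:10Z")
file_time = parse_pagecounts_filename("pagecounts-20140725-070000.gz")
```

Since both results are timezone-aware, they can be compared or subtracted directly.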
G
Giovanni Luca Ciampaglia http://glciampaglia.com *∙* Assistant Research Scientist, Indiana University
On Sat, Jul 30, 2016 at 1:40 PM, Bruno Goncalves bgoncalves@gmail.com wrote:
Hi Dan and Giovanni,
Thank you for the pointers. I think I figured most of it out. The only remaining question is about the timezone consistency:
Also, is the timezone mentioned in the meta-history files:
<timestamp>2006-02-18T19:29:10Z</timestamp>
the same as the one used in the pagecount filenames:
pagecounts-20140725-070000.gz
Best,
B
Bruno Miguel Tavares Gonçalves, PhD Homepage: www.bgoncalves.com Email: bgoncalves@gmail.com
On Sat, Jul 30, 2016 at 10:25 AM, Giovanni Luca Ciampaglia <glciampagl@gmail.com> wrote:
Hi Bruno, look into the redirect table. It's dumped alongside stub-meta-history. You'll need to join it with the page table if you want a title -> title edge list.
I can share code if you need.
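In the meantime, here is a rough sketch of the join in Python. It assumes the rows have already been parsed out of the SQL dumps into tuples (the parsing step is not shown) and uses the MediaWiki column names page_id, page_namespace, page_title and rd_from, rd_namespace, rd_title. The function name and the toy rows are made up for illustration.

```python
def redirect_edges(page_rows, redirect_rows, namespace=0):
    """Return (source_title, target_title) pairs for redirects.

    page_rows:     iterable of (page_id, page_namespace, page_title)
    redirect_rows: iterable of (rd_from, rd_namespace, rd_title)

    Only redirects whose source and target are both in `namespace`
    are kept (0 is the main article namespace).
    """
    # Map page ids to titles, since rd_from is a page id, not a title.
    id_to_title = {
        page_id: title
        for page_id, page_ns, title in page_rows
        if page_ns == namespace
    }
    edges = []
    for rd_from, rd_ns, rd_title in redirect_rows:
        source = id_to_title.get(rd_from)
        if source is not None and rd_ns == namespace:
            edges.append((source, rd_title))
    return edges


# Toy rows, invented for this example only.
pages = [(1, 0, "USA"), (2, 0, "United_States"), (3, 1, "USA")]
redirects = [(1, 0, "United_States")]
edges = redirect_edges(pages, redirects)
```

Inverting the edge list (target -> list of sources) gives every redirect pointing at a page of interest.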
Cheers, Giovanni
-- Typed with my thumbs
On Jul 29, 2016 15:18, "Bruno Goncalves" bgoncalves@gmail.com wrote:
Thank you for the heads up, Taha and Federico. I'm not entirely sure redirects will be a big factor in what I'm playing with, but I'll definitely keep an eye out for them. From what I gather, there is no simple way to check whether a redirect page points to my page of interest?
I have about 400k pages I'm looking at across multiple editions so Pageviews_API might not be enough :) I'll just try my hand at implementing a PAGENAMEE encoding in Python unless there's some library out there that I'm missing.
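A first attempt might look like the sketch below. It is an approximation based on my reading of the Manual:PAGENAMEE_encoding page: spaces become underscores, the title is percent-encoded as UTF-8, and the characters ;@$!*(),/~: are left unescaped. The function name is made up, and the output should be checked against real pagecounts entries before relying on it.

```python
from urllib.parse import quote

# Characters assumed to stay unescaped in PAGENAMEE-style titles.
# ASSUMPTION: this list follows the MediaWiki manual page; verify it there.
UNESCAPED = ";@$!*(),/~:"


def pagenamee_encode(title):
    """Approximate PAGENAMEE encoding of a page title.

    Spaces are replaced with underscores, then everything except
    letters, digits, '_', '.', '-' and the UNESCAPED characters is
    percent-encoded as UTF-8.
    """
    return quote(title.replace(" ", "_"), safe=UNESCAPED)


encoded = pagenamee_encode("Bruno Gonçalves")
```

Titles that contain only ASCII letters, digits and underscores come out unchanged apart from the space replacement.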
Best,
B
Bruno Miguel Tavares Gonçalves, PhD Homepage: www.bgoncalves.com Email: bgoncalves@gmail.com
On Thu, Jul 28, 2016 at 4:48 PM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Definitely consider the redirect :) https://mako.cc/copyrighteous/consider-the-redirect
Bruno Goncalves, 28/07/2016 22:00:
I've been trying to match edit activity with pagecounts
The first question is how much data you need. If a few months are enough, https://wikitech.wikimedia.org/wiki/Pageviews_API may be easier.
Otherwise... https://www.mediawiki.org/wiki/Manual:PAGENAMEE_encoding
Nemo
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l