Hi George,
I don't really know about historical numbers :(
I am forwarding your message to the Analytics mailing list to get you some more help
:)
Cheers
Joseph
---------- Forwarded message ----------
From: George Gkotsis <gkotsis(a)gmail.com>
Date: Mon, Sep 14, 2015 at 2:36 PM
Subject: corrupted and missing log files
To: kleduc(a)wikimedia.org, aotto(a)wikimedia.org, mforns(a)wikimedia.org,
joal(a)wikimedia.org
Greetings Wikimedia Analytics team!
First, thanks for your amazing work! Your work has an amazing impact on
everyone, including researchers like me.
My name is George Gkotsis and I am a post-doctoral research fellow at
King's College London. I have recently finished downloading the massive
weblog files dataset and I am trying to "tame" the beast. As part of this
process, I am reading all .gz files that concern Wikimedia page visits
(downloaded from http://dumps.wikimedia.org/other/pagecounts-raw/*).
Unless I am mistaken, I have found cases of either missing or corrupt
archives. I paste a few examples I randomly sampled below:
*Missing:*
http://dumps.wikimedia.org/other/pagecounts-raw/2010/2010-07/pagecounts-201…
http://dumps.wikimedia.org/other/pagecounts-raw/2008/2008-10/pagecounts-200…
http://dumps.wikimedia.org/other/pagecounts-raw/2009/2009-09/pagecounts-200…
*Corrupted:*
pagecounts-20080304-030000.gz
pagecounts-20080304-140000.gz
pagecounts-20080304-150000.gz
pagecounts-20090921-160000.gz
(the list is quite long and I haven't finished processing it, but I can
give you a full log file)
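For anyone who wants to reproduce a check like this on their own copy of the dump, here is a minimal Python sketch. It is not necessarily the method used to produce the list above. It assumes that "corrupted" means the archive cannot be fully decompressed, and that the files sit in a local directory tree under names starting with pagecounts-. The function names check_gzip and scan_directory are hypothetical.

```python
import gzip
import zlib
from pathlib import Path


def check_gzip(path, chunk_size=1 << 20):
    """Return None if the archive decompresses cleanly, else an error string.

    The file is read to the end in chunks, so truncated archives and
    checksum mismatches are caught without loading the file into memory.
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        # gzip.BadGzipFile is a subclass of OSError; truncation raises EOFError.
        return f"{type(exc).__name__}: {exc}"
    return None


def scan_directory(root):
    """Map each unreadable pagecounts-*.gz file under root to its error."""
    problems = {}
    for path in sorted(Path(root).rglob("pagecounts-*.gz")):
        error = check_gzip(path)
        if error is not None:
            problems[str(path)] = error
    return problems
```

Calling scan_directory on the download folder returns a dictionary of failing files, which can be written out as the full log. Files that were never downloaded will not show up here; they would need a separate comparison against the expected file names.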
Could you provide some feedback concerning the above cases?
Best regards,
George
--
/g
--
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal
Hi,
Would it be possible to track the number of users changing language version
in each article? Like: on date X, Y users visited a.wikipedia.org and Z left
to go to b.wikipedia.org, T left for c.wikipedia.org etc.
If possible, is there interest (aka who do I have to bribe) to implement that
as a publicly-available dump/site?
I think for smaller wikis this would be an interesting way to know which
domains/articles to work on.
Thanks,
Strainu
Dear analytics team,
we (Wikimedia Italia) are starting to write a proposal for an EU project
(in the Horizon 2020 framework) and our partners asked us for "numbers
to quantify the readership of Wikipedia in the various languages
interested by the project".
They then asked for a breakdown of unique visitors by country (yes, we
explained that Wikipedia editions are by language not by country). To
the best of my knowledge, these data are not available.
I provided the number of unique visitors from Europe as per WMF
reportcard[1] and then provided *pageviews* by project (i.e. by
language) as available on stats.wikimedia.org, specifically from the
summaries (for example for en.wiki[2]).
I think that those numbers are more than enough for now, but I wanted
to know if there are some numbers that I am missing.
Thank you.
Cristian
[1] https://reportcard.wmflabs.org/
[2] https://stats.wikimedia.org/EN/SummaryEN.htm
A number of us are discussing the year-to-date editor population stats.
When can we anticipate seeing the August stats? It would be helpful to have
them published at least a week before the publication of the monthly
Recent Research report for September.
Thanks,
Pine
See https://phabricator.wikimedia.org/T85984
The user_daily_contribs table (and associated API) is sometimes used for
* JavaScript (e.g. CentralNotice) targeting users based on activity in a
certain timeframe,
* simplification of SQL queries (e.g. [1]),
* other?
If you use this data/feature or plan to use it, or if you replaced it
with something else, your comment on the task is particularly welcome to
assess whether to keep it.
Nemo
[1]
https://phabricator.wikimedia.org/diffusion/TLST/browse/master/scripts/user…