As you've probably heard, last week we deployed ulsfo in production,
reducing latency for Oceania, East/Southeast Asia & the US/Canada
Pacific/West Coast states. My estimate of the user base affected by
this is 360 million users (as in, Internet users, not Wikipedia users).
I was wondering if you have an easy way to measure and plot the impact
in page load time, perhaps using Navigation Timing data?
The operations team has spent a considerable amount of time and money to
deploy ulsfo and I believe it'd be useful for us and the organization at
large to be able to quantify this effort.
The exact dates of the rollout by country/region codes can be found in
operations/dns' git history:
(the commits should be self-explanatory, but I'd be happy to clarify if
Thanks a lot for the appreciation.
As Sajjad mentioned, we have already obtained an edit-per-location
dataset from Evan (Rosen) that has the following column structure:
*start* and *end* denote the beginning and ending date for counting the
number of edits, and *ts* is time stamp.
The *fraction*, however, gives a national ratio of edit activity; that
is, it gives the ratio of 'total edits from that city for that language
Wikipedia project' divided by 'total edits from that country for that
language Wikipedia project'. Hence, it cannot be used to understand
global edit contributions to a Wikipedia project (for a time period).
It seems that the original data (from which this dataset was extracted)
should also have the global fractions -- total edits from a city divided
by total edits from the whole world, for a project, for a time period.
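To make the distinction concrete, here is a minimal sketch of how a national fraction could be rescaled into a global one if per-country edit totals were available. The function name, the dictionary layout, and the sample city and country values are all hypothetical and are not taken from the dataset itself.

```python
def global_fractions(national_fraction, country_edits):
    """Rescale per-city national fractions into global fractions.

    national_fraction: {(country, city): city edits / country edits}
    country_edits:     {country: total edits from that country}
    Both inputs are assumed to cover the same project and time period.
    """
    world_total = sum(country_edits.values())
    return {
        (country, city): fraction * country_edits[country] / world_total
        for (country, city), fraction in national_fraction.items()
    }


# Hypothetical numbers, for illustration only.
national = {("IN", "Mumbai"): 0.5, ("IN", "Delhi"): 0.5, ("US", "New York"): 1.0}
totals = {"IN": 300, "US": 100}
result = global_fractions(national, totals)
```

The point of the sketch is that the national fraction alone is not enough: without the per-country totals there is no way to recover the global share.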
Would you know if the global fractions can also be derived from the XML
dumps? Or, even better, is the relevant raw data available in CSV form
On Wednesday 15 May 2013 12:32 AM, analytics-request(a)lists.wikimedia.org
> Date: Tue, 14 May 2013 19:40:00 +0200
> From: "Erik Zachte" <ezachte(a)wikimedia.org>
> To: "'A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics.'"
> Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
> Awesome work! I like the flexibility of the charts, easy to switch metrics
> and presentation mode.
> 1. WMF has never captured ip->geo data on city level, but afaik this is
> going to change with Kraken.
> 2. Total edits per article per year can be derived from the XML dumps. I may
> have some CSV data that could come in handy.
> For edit wars you need to track reverts on a per-article basis, right? That
> can also be derived from dumps.
> For a long history you need full archive dumps and need to calculate a checksum
> per revision text. (Stub dumps have checksums, but only for the last year or two.)
> Erik Zachte
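The checksum approach mentioned above can be sketched roughly as follows. This is only an illustration of checksum-based revert detection, assuming the revision texts of one article are already available in chronological order; the function name and the input format are hypothetical, and parsing the dumps themselves is left out.

```python
import hashlib


def find_identity_reverts(revisions):
    """Detect revisions whose text is identical to an earlier revision.

    revisions: list of (rev_id, text) for ONE article, oldest first.
    Returns a list of (reverting_rev_id, restored_rev_id) pairs.
    """
    seen = {}             # checksum -> most recent rev_id with that text
    reverts = []
    prev_checksum = None
    for rev_id, text in revisions:
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        # Same text as the immediately preceding revision is a null edit,
        # not a revert, so it is skipped.
        if checksum in seen and checksum != prev_checksum:
            reverts.append((rev_id, seen[checksum]))
        seen[checksum] = rev_id
        prev_checksum = checksum
    return reverts


history = [(1, "a"), (2, "b"), (3, "a"), (4, "a"), (5, "c")]
reverts = find_identity_reverts(history)
```

Counting the resulting pairs per article per time period would give one rough signal for edit wars; what threshold counts as a "war" is a separate question.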
Dear fellow Analytics Developer team members,
over the past few weeks, it seems we at least twice discussed that
maybe we want to adopt “If it didn't happen on the list, it didn't
happen”. It appeared to me that both times, we actually wanted to move
forward with that ... but it never got posted to the mailing list :-D
So I'm being bold:
Either slap me with a large trout by 2014-02-28 or agree with me
that the Analytics development team adopts “If it didn't happen on
the list, it didn't happen.”
(With “list” being analytics(a)lists.wikimedia.org for public things,
and “list” being analytics-internal(a)lists.wikimedia.org for not so
To state the obvious:
meetings in real life,
do not qualify as “list” :-)
/me hands a large trout to all of you to slap me with and ducks.
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Starting tomorrow (February 26), we will be broadcasting the monthly showcase of the Wikimedia Research and Data team.
The showcase is an opportunity to present and discuss recent work that researchers at the Foundation have been conducting. The showcase will start at 11:30 Pacific Time and we will post a link to the stream a few minutes before it starts. You can also join the conversation on the #wikimedia-office IRC channel on freenode (we’ll be sticking around after the end of the showcase to answer any questions).
This month, we’ll be talking about Wikipedia mobile readers and article creation trends:
Mobile session times
A prerequisite to many pieces of interesting reader research is being able to accurately identify the length of users' 'sessions'. I will explain one potential way of doing it, how I’ve applied it to mobile readers, and what research this opens up. (20 mins)
Wikipedia article creation research
I'll present research examining trends in newcomer article creation across 10 languages, with a focus on English and German Wikipedias. I'll show that, in wikis where anonymous users can create articles, their articles are less likely to be deleted than articles created by newly registered editors. I’ll also show the results of an in-depth analysis of Articles for Creation (AfC), which suggest that while AfC’s process seems to result in the publication of high-quality articles, it also dramatically reduces the rate at which good new articles are published. (30 mins)
Looking forward to seeing you all tomorrow!
@Tim: By "feature" I mean having values for the column user.user_registration filled in for the DB replicas accessible from Tool-Labs, if possible. As Oliver has suggested, I don't see any reason for this info not to be available, as it is already public from Special:ListUsers.
@Aaron: Thanks a lot. I believe that is a fairly decent approximation. In fact, I suspect that daily or weekly aggregates would be enough for time-series characterization. My actual goal is comparing trends between different languages, and eventually correlating them with other known activity metrics.
On Friday, 14 February 2014 16:00, Aaron Halfaker <aaron.halfaker(a)gmail.com> wrote:
I have a dataset containing estimated registration dates for editors who registered before Dec. 2005. My method assumes that user_id is monotonically increasing and sets the lowest upper-bound available.
>For example, let's assume the following rows:
> user_id  first_edit
> 12345    20040102030405
> 12344    NULL
> 12343    20040102050102
>Since an editor couldn't have saved a revision before registering their account, we can assume that user 12345 registered their account on or before 20040102030405. If user_id is monotonically increasing, we also know that user 12344 must have registered on or before 20040102030405, which lets us fill in a NULL. Similarly, we have a first_edit timestamp for user 12343, but that edit happened pretty late. We can actually just continue to propagate the 20040102030405 timestamp to this user too.
>After performing this approximation, we'd have the following rows:
> user_id  first_edit      user_registration_approx
> 12345    20040102030405  20040102030405
> 12344    NULL            20040102030405
> 12343    20040102050102  20040102030405
>In effect, this is similar to the approximation discussed in https://bugzilla.wikimedia.org/show_bug.cgi?id=18638, but I'm not trying to interpolate probable registration timings on users. In practice we're talking about a difference of seconds, so I haven't bothered with the extra work.
>I'm generating a datafile for English now that I should be able to share by the end of the day:
> * user_id
> * registration_type (see https://meta.wikimedia.org/wiki/Research:Attached_user and https://meta.wikimedia.org/wiki/Research:Newly_registered_user)
> * user_registration (from user table)
> * first_edit (lowest timestamp from "revision" and "archive" for user_id)
> * registration_approx (my approximation based on the method described above)
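The propagation described in Aaron's message can be sketched in a few lines. This is a minimal illustration, not the script used to generate the datafile: the function name and the in-memory row format are assumptions, and timestamps are assumed to be 14-digit MediaWiki strings, which compare correctly as plain strings.

```python
def approximate_registration(rows):
    """Assign each user the lowest first_edit seen at or above their user_id.

    rows: iterable of (user_id, first_edit), with first_edit either a
          14-digit timestamp string or None.
    Returns {user_id: approximate registration upper bound (or None)}.
    Assumes user_id is monotonically increasing with registration time.
    """
    approx = {}
    lowest = None  # lowest first_edit among users with an equal or higher id
    for user_id, first_edit in sorted(rows, key=lambda r: r[0], reverse=True):
        if first_edit is not None and (lowest is None or first_edit < lowest):
            lowest = first_edit
        approx[user_id] = lowest
    return approx


rows = [
    (12345, "20040102030405"),
    (12344, None),
    (12343, "20040102050102"),
]
approx = approximate_registration(rows)
```

Walking from the highest user_id downwards keeps the bound as tight as the data allows; users with an id above every known first_edit simply get None.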
>On Fri, Feb 14, 2014 at 6:06 AM, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
>Felipe Ortega, 14/02/2014 12:05:
>>Thanks a lot. Then, I look forward to the confirmation and
>>>implementation of this feature. In case it's better to open a new issue
>>>on bugzilla or any other action on my side (lend a hand with value
>>>reviewing/testing) just let me know.
You could help assess the correctness of and/or code the guesstimate method proposed in https://bugzilla.wikimedia.org/show_bug.cgi?id=18638 , for the script to fill further blanks.
With CSCW just concluded and conferences like CHI and WWW coming up, we have a good set of papers to review for the February issue of the Research Newsletter.
Please take a look at: https://etherpad.wikimedia.org/p/WRN201402 and add your name next to any paper you are interested in reviewing. As usual, short notes and one-paragraph reviews are most welcome.
Instead of contacting past contributors only, this month we’re experimenting with a public call for reviews cross-posted to analytics-l and wiki-research-l. If you have any questions about the format or process, feel free to get in touch off-list.
Dario Taraborelli and Tilman Bayer
---------- Forwarded message ----------
From: Strainu <strainu10(a)gmail.com>
Date: Sun, Feb 23, 2014 at 6:11 PM
Subject: [Wikitech-l] Exit stats?
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Does Wikipedia have any exit or click-through stats, like what links
are the visitors following from an article? If yes, are those public?
In addition to privacy issues, we want to give as little reason as possible for spammers to post links on Wikimedia sites and give them as little information as possible about how to design effective spam links. Providing data about the effectiveness of external links could benefit spammers.
There may be a way to make exceptions for providing aggregated data for a whitelist of GLAM links.
> Date: Sun, 23 Feb 2014 20:57:17 +0000
> From: Magnus Manske <magnusmanske(a)googlemail.com>
> To: "A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics."
> Cc: Strainu <strainu10(a)gmail.com>
> Subject: Re: [Analytics] [Wikitech-l] Exit stats?
> collecting anonymized clicks to external links on image pages, in an
> attempt to give some stats to the GLAM community. That would fire off basic
> data like time, page, and target URL to a toolserver logging script. Volume
> was low enough, but I had to turn it off due to privacy paranoia soon after.