Hi folks,
As you've probably heard, last week we deployed ulsfo in production,
reducing latency for Oceania, East/Southeast Asia, and the Pacific/West
Coast regions of the US and Canada. I estimate the affected user base
at 360 million people (that is, Internet users, not Wikipedia users).
I was wondering if you have an easy way to measure and plot the impact
in page load time, perhaps using Navigation Timing data?
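For example, something along these lines might work as a first pass. This is a rough sketch only: the table name NavigationTiming_XXXXX (schema revision elided), the event_loadEventEnd column, and the rollout timestamp are placeholders and assumptions that would need checking against the live EventLogging schema.

import pymysql  # assumed driver; any MySQL client would do

ROLLOUT_TS = '20140318000000'  # placeholder; take the real date from the dns git log

QUERY = """
SELECT timestamp < %s AS before_rollout,
       AVG(event_loadEventEnd) AS mean_load_ms,  -- hypothetical column name
       COUNT(*) AS n
FROM NavigationTiming_XXXXX
WHERE wiki = 'enwiki'
GROUP BY before_rollout;
"""

conn = pymysql.connect(host='analytics-store.eqiad.wmnet', db='log')
with conn.cursor() as cur:
    cur.execute(QUERY, (ROLLOUT_TS,))
    for before, mean_ms, n in cur.fetchall():
        print('%s: %.0f ms over %d events' %
              ('before' if before else 'after', mean_ms, n))

Restricting by country code (to match the regions that moved to ulsfo) would make the comparison much sharper, if we have geocoded data to join against.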
The operations team has spent a considerable amount of time and money to
deploy ulsfo and I believe it'd be useful for us and the organization at
large to be able to quantify this effort.
The exact dates of the rollout by country/region codes can be found in
operations/dns' git history:
https://git.wikimedia.org/summary/?r=operations/dns.git
(the commits should be self-explanatory, but I'd be happy to clarify if
needed)
Thanks!
Faidon
Hi All --
This email is a long time coming, but I wanted to confirm that the
Analytics team is going to take over maintenance/development of Event
Logging from Ori.
Nuria will be coordinating the details of the transition (based on the plan
we've all put together [1]). Please look for further emails from her about
it.
-Toby
[1] https://www.mediawiki.org/wiki/Analytics/EventLogging
Hi all,
after rolling out MediaViewer to the English and German Wikipedias, we have
gotten quite a few complaints; to understand how representative they are, I
have looked at the number of users who have opted out (there is a user
preference for that; it is linked from the MediaViewer interface, although
one of the recurring complaints is that it is still not trivial to find). I
would appreciate opinions on whether this is a good approach and whether I
did it the right way.
The queries I have run look like this:
select up_value, count(*)
from user
left join user_properties
  on user_id = up_user
 and up_property = 'multimediaviewer-enable'
where user_touched > '20140604000000'
  and user_editcount > 10000
group by up_value;
for various edit count limits (the timestamp is the time of deployment on
enwiki plus a few hours).
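For reference, this is roughly how one could sweep the thresholds in one go. A sketch only: the host and driver are my assumptions. Note that up_value is NULL for users who never touched the preference, since user_properties only stores non-default values.

import pymysql  # assumed driver

QUERY = """
SELECT up_value, COUNT(*)
FROM user
LEFT JOIN user_properties
  ON user_id = up_user
 AND up_property = 'multimediaviewer-enable'
WHERE user_touched > '20140604000000'
  AND user_editcount > %s
GROUP BY up_value;
"""

conn = pymysql.connect(host='analytics-store.eqiad.wmnet', db='enwiki')
with conn.cursor() as cur:
    for threshold in (100, 1000, 10000):  # arbitrary example limits
        cur.execute(QUERY, (threshold,))
        # rows look like (None, n_default), ('0', n_opted_out), ('1', n_enabled)
        for up_value, n in cur.fetchall():
            print('editcount > %d, up_value=%r: %d users' % (threshold, up_value, n))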
This is an interesting topic: RecentChanges and its many uses and variants. I'm copying the Analytics, EE, and Research lists because I hope that some of our colleagues from those lists will hop over to Wikimedia-l to participate in this discussion. [a]
In particular I would call my colleagues' attention to this section of Mingli's email:
"Content is only one aspect to observe, people are another:
* Who are the experts on some topics?
* Who are my buddies on some articles?
* Who did help me to improve an article originally I wrote?
In all, we may reshape our technical infrastructure in this direction for
new spaces of participation.
And finally, one open question for the system
designer:
* Towards better content and community, what is the most important things
we want our user to observe?"
I'll just note here some observability work on user contributions that has been done or is in progress.
1. User Analysis Tool [b], similar to the legacy tool by User:X!. Be sure to look at the "Future plans" tab.
2. Listen to Wikipedia [c], a visualization of recent changes, mostly aesthetic, though there may be ways to adapt some of its ideas or code for other interesting purposes.
3. Snuggle [d], a tool that helps identify good-faith and bad-faith new editors.
4. Finding a Collaborator [e], a current research project; see also a visualization example [f]. As part of this work the researchers seem to have formulated a way of quantifying an editor's impact, although I haven't seen the formula yet. As you probably know, the quality of edits and editors is a topic that gets discussed repeatedly.
5. WikiStats [g], which provides high-level statistics about Wikimedia projects.
6. WikiMetrics [h], a cohort analysis tool with a lot of potential for expanding its tool set.
7. For code and related technical contributions see [i].
8. A variety of tools are linked next to users' requests on English Wikipedia's Requests for Permissions page [j], such as WikiChecker [k] and automated edit logs [l].
This is a good discussion and I would be happy to have an office hour meeting for live chat.
Pine
[a] http://lists.wikimedia.org/pipermail/wikimedia-l/2014-June/072507.html
[b] https://tools.wmflabs.org/supercount/index.php
[c] http://listen.hatnote.com/
[d] https://en.wikipedia.org/wiki/Wikipedia:Snuggle
[e] https://meta.wikimedia.org/wiki/Research:Finding_a_Collaborator
[f] https://depts.washington.edu/reflex/
[g] https://stats.wikimedia.org
[h] https://meta.wikimedia.org/wiki/Wikimetrics
[i] http://korma.wmflabs.org/browser/
[j] https://en.wikipedia.org/wiki/Wikipedia:Requests_for_permissions
[k] http://en.wikichecker.com/user/?t=Jimbo%20Wales
[l] https://tools.wmflabs.org/xtools/autoedits/index.php?user=Jimbo%20Wales&lan…
I polled https://www.mediawiki.org/w/api.php?action=sitematrix&format=jsonfm
to get a list of wikis and some metadata, then pulled it into a table in
the new analytics-store DB.
The data should be complete as of the time I pulled it. It will be
relatively cheap to update, so we could set up a cron job to check against
sitematrix every night. See details below.
analytics-store.eqiad.wmnet [staging]> explain wiki_info;
+-----------------+----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+----------------+------+-----+---------+-------+
| wiki | varbinary(100) | NO | PRI | | |
| code | varbinary(100) | YES | | NULL | |
| sitename | varbinary(100) | YES | | NULL | |
| url | varbinary(255) | YES | | NULL | |
| lang_id | int(11) | YES | | NULL | |
| lang_code | varbinary(100) | YES | | NULL | |
| lang_name | varbinary(255) | YES | | NULL | |
| lang_local_name | varbinary(255) | YES | | NULL | |
+-----------------+----------------+------+-----+---------+-------+
8 rows in set (0.00 sec)
analytics-store.eqiad.wmnet [staging]> select * from wiki_info limit 3;
+--------------+------------+-----------+--------------------------+---------+-----------+-----------+-----------------+
| wiki         | code       | sitename  | url                      | lang_id | lang_code | lang_name | lang_local_name |
+--------------+------------+-----------+--------------------------+---------+-----------+-----------+-----------------+
| rnwiki       | wiki       | Wikipedia | http://rn.wikipedia.org  |     216 | rn        | Kirundi   | Rundi           |
| rnwiktionary | wiktionary | Wikipedia | http://rn.wiktionary.org |     216 | rn        | Kirundi   | Rundi           |
| rowiki       | wiki       | Wikipedia | http://ro.wikipedia.org  |     217 | ro        | română    | Romanian        |
+--------------+------------+-----------+--------------------------+---------+-----------+-----------+-----------------+
3 rows in set (0.02 sec)
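If we do go the cron route, here is a sketch of what the nightly job could look like. The connection details and the REPLACE strategy are my assumptions; sitematrix's "name" and "localname" fields appear to map to lang_name and lang_local_name above.

import json, urllib2  # Python 2 stdlib
import pymysql        # assumed driver

URL = 'https://www.mediawiki.org/w/api.php?action=sitematrix&format=json'
matrix = json.load(urllib2.urlopen(URL))['sitematrix']

rows = []
for key, group in matrix.items():
    if not key.isdigit():
        continue  # skip the 'count' and 'specials' entries for brevity
    for site in group.get('site', []):
        rows.append((site['dbname'], site['code'], site['sitename'],
                     site['url'], int(key), group['code'],
                     group['name'], group['localname']))

conn = pymysql.connect(host='analytics-store.eqiad.wmnet', db='staging')
with conn.cursor() as cur:
    cur.executemany(
        'REPLACE INTO wiki_info (wiki, code, sitename, url, lang_id,'
        ' lang_code, lang_name, lang_local_name)'
        ' VALUES (%s, %s, %s, %s, %s, %s, %s, %s)', rows)
conn.commit()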
-Aaron
Chase,
Ori mentioned that you might have plans to replace txstatsd as our StatsD
collector for graphite/carbon. Do you have such plans and can you elaborate?
The reason I'm asking is that we currently operate txstatsd with a
non-StatsD-compliant message processor, which has some side effects with
respect to counters that were unexpected, at least to me. On the other
hand, we cannot use the compliant processor because it uses a whole bunch
of unwanted prefixes.
The side effect is that I can't just use a vanilla StatsD client in my
code, and having looked at the internals of the non-compliant processor,
I'm worried we're abusing it and thus potentially misinterpreting its
results in other places (it uses mark() to store statistics, which doesn't
do aggregation and carries values across time points). On the other hand,
its implementation of timers is very friendly, in that it gives us
percentile breakdowns and such...
So, my current thinking is either to write another processor with better
counter semantics, or to replace txstatsd entirely with a vanilla StatsD
implementation (like Etsy's, which we might end up having to patch to get
the nice timer behavior). I suspect that writing another processor and
hosting a custom deb locally is probably the "easiest" option.
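For context, the vanilla StatsD wire protocol that a standard client emits is tiny, which is part of why I'd rather converge on it. A toy client to illustrate the counter/timer semantics at stake (host, port, and metric names are made up):

import socket, time

class StatsdClient(object):
    def __init__(self, host='localhost', port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, payload):
        self.sock.sendto(payload, self.addr)  # fire-and-forget UDP

    def incr(self, name, value=1):
        # counters ('|c') are summed per flush interval and then reset,
        # unlike mark()-style meters, which track a rate across time points
        self._send('%s:%d|c' % (name, value))

    def timing(self, name, ms):
        # timers ('|ms') get percentile breakdowns computed server-side
        self._send('%s:%d|ms' % (name, ms))

statsd = StatsdClient()
start = time.time()
# ... handle a request ...
statsd.incr('frontend.requests')
statsd.timing('frontend.request_time', int((time.time() - start) * 1000))

Those per-flush-interval counter semantics are exactly what the current processor's mark() approach doesn't give us.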
It's probably worth pointing out that txstatsd is currently unmaintained --
Sidnei left Canonical for Google sometime near the end of last year and
hasn't touched the code since.
~Matt Walker
Wikimedia Foundation
Fundraising Technology Team
Hi all --
We've received a request for a list of red links. I've been told we can get
this list from the links table (pagelinks).
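My rough understanding is that red links would be rows in pagelinks with no matching page row, so something like the sketch below might do it. The host, database, and LIMIT here are arbitrary examples.

import pymysql  # assumed driver

REDLINKS_SQL = """
SELECT pl_namespace, pl_title, COUNT(*) AS incoming
FROM pagelinks
LEFT JOIN page
  ON page_namespace = pl_namespace
 AND page_title = pl_title
WHERE page_id IS NULL
GROUP BY pl_namespace, pl_title
ORDER BY incoming DESC
LIMIT 100;
"""

conn = pymysql.connect(host='analytics-store.eqiad.wmnet', db='enwiki')
with conn.cursor() as cur:
    cur.execute(REDLINKS_SQL)
    for ns, title, n in cur.fetchall():
        print('%d:%s (%d links)' % (ns, title, n))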
Does anyone have any more information on this?
thanks,
-Toby