Hi all!
We will soon be migrating everything on stat1 over to a new server in eqiad: stat1003. For the most part, data, accounts and cronjobs will be copied over exactly as they are. However, stat1 has been around for a while, and there are quite a few accounts on there, many of which are probably not used. We’re doing a little audit to see which accounts we don’t need to migrate to the new server.
I’ve pasted a list of names below that we are not sure about. None of these users have logged in in the last few weeks at least.
If you see a name there and you know that it SHOULD DEFINITELY have an account on the new stat1003 server, please let me know via a reply by Tuesday April 1.
See also: https://rt.wikimedia.org/Ticket/Display.html?id=6789
Thanks!
-Andrew Otto
abartov - Asaf Bartov
akhanna - Ayush Khanna
awight - Adam Wight
brion - Brion Vibber
bsitu - Benny Situ
catrope - Roan Kattouw
ebernhardson - Erik Bernhardson
fflorin - Fabrice Florin
fschulenburg - Frank Schulenburg
haithams - Haitham Shammaa
handrade - Henrique Andrade
howief - Howie Fung
jeluf - Jens Frank
jforrester - James D. Forrester
kaldari - Ryan Kaldari
kate - River Tarnell
kwang - Kenan Wang
lwelling - Luke Welling
maryana - Maryana Pinchuk
mflaschen - Matthew Flaschen
mgrover - Michelle Grover
midom - Domas Mituzas
mlitn - Matthias Mullie
msyed - Moiz Syed
mwalker - Matt Walker
reedy - Sam Reed
rush - Chase Pettet
siebrand - Siebrand Mazeland
spage - S Page
tfinc - Tomasz Finc
yuvipanda - Yuvi Panda
Hi all,
We finished our sprint on Tuesday and made plans for the next one on
Thursday and I wanted to let you know the updates.
We didn't make any commitments this sprint, because we needed to work on a
time-specific, yet open-ended task: migrating our applications out of the
labs instance in the Tampa datacenter into the new instance in Virginia.
The team worked hard and we were able to migrate to the new labs (yay!)
Services migrated were:
- Limn instances (Report cards, Wikipedia Zero, others)
- Wikimetrics production and staging instances
We also added another dashboard for Wikipedia Zero
We fixed the following defects:
- Wikimetrics: Visiting /cohorts/upload exposes bug in
database.py<https://bugzilla.wikimedia.org/show_bug.cgi?id=62260>
- Wikistats: Traffic reports: fix region
Oceania<https://bugzilla.wikimedia.org/show_bug.cgi?id=46205>
- Wikistats: Chromium OS in Wikimedia traffic
reports<https://bugzilla.wikimedia.org/show_bug.cgi?id=55950>
- Wikistats: Wikistats fails to parse php message file correctly for ID
(Indonesian) <https://bugzilla.wikimedia.org/show_bug.cgi?id=49013>
- Wikistats: Italian Wikivoyage page count in Wikistats seems too
low<https://bugzilla.wikimedia.org/show_bug.cgi?id=55927>
- Wikistats: Missing stats for
zh.wikivoyage<https://bugzilla.wikimedia.org/show_bug.cgi?id=61420>
- W0 Dashboards: Some W0 Dashboards do not show data
We also worked on but did not finish some Wikimetrics features:
- Add scheduled reports with a permanent
address<https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/1378>
- Add publicly shareable reports
For the next sprint, we are focusing on Wikimetrics. We have lowered our
velocity because a new Product Manager is starting and some employees
are traveling to San Francisco; both events typically lower
productivity.
We have committed to finishing the following features:
- Add publicly shareable reports
We are also going to work on some Wikistats features. These aren't part of
the sprint; Erik will work on them as time permits. I'll report on them at
the same cadence as the sprint.
- Wikistats Portal: add search
box<https://bugzilla.wikimedia.org/show_bug.cgi?id=46216>
- Total edits on wikidata seems too
low<https://bugzilla.wikimedia.org/show_bug.cgi?id=62230>
- Dump stats: further analyze drop in editor counts on English
Wikipedia<https://bugzilla.wikimedia.org/show_bug.cgi?id=46199>
- Dump stats: switch to persistent stats rather than monthly regenerated
stats <https://bugzilla.wikimedia.org/show_bug.cgi?id=46198>
Please let me know if you have any questions.
-Toby
Hi,
since several people on this list are working with zero logs, please
be aware that it seems the X-Analytics header was changed behind our
back with
https://gerrit.wikimedia.org/r/#/c/119795/
. So while you were losing hardly anything up to now if your scripts
didn't handle the list nature of X-Analytics, this changed
considerably in the last few hours.
Please make sure your zero-handling scripts properly handle the
list nature of X-Analytics, or you will lose a considerable amount of
zero traffic.
The new fields in the X-Analytics column seem to be
* https=1 (if the request came through https)
* proxy=Opera (if the request came through an Opera proxy to us)
I hope the Wikipedia Zero team will soon update
https://wikitech.wikimedia.org/wiki/X-Analytics
with a more authoritative description.
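To illustrate what handling the list nature of X-Analytics can look like, here is a minimal Python sketch. It assumes the column holds `key=value` pairs separated by semicolons and that a missing header is logged as `-`; both are assumptions to check against the wikitech page. The function names and the sample value `zero=250-99` are made up for illustration.

```python
def parse_x_analytics(header_value):
    """Parse an X-Analytics value such as 'zero=250-99;https=1' into a dict.

    Assumes semicolon-separated key=value pairs (an assumption, see above).
    If a key appears more than once, the last value wins.
    """
    fields = {}
    for pair in header_value.split(";"):
        pair = pair.strip()
        if not pair or pair == "-":
            # Skip empty segments and the assumed '-' placeholder.
            continue
        key, _, value = pair.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def is_zero_request(header_value):
    """True if a zero tag is present anywhere in the list."""
    return "zero" in parse_x_analytics(header_value)


# Illustrative values only; the second one carries extra fields next to zero.
samples = ["zero=250-99", "zero=250-99;https=1;proxy=Opera", "https=1", "-"]
for sample in samples:
    print(sample, is_zero_request(sample))
```

A script that compares the whole column against a single `zero=...` string would miss the second sample, which is the kind of loss described above.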
Have fun,
Christian
P.S.: Note that (maybe related to this change), some zero tags are now
doubled:
https://bugzilla.wikimedia.org/show_bug.cgi?id=62922
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Beginning in 10 minutes :) public stream link:
https://www.youtube.com/watch?v=bozyc1z25aQ
---------- Forwarded message ----------
From: Dario Taraborelli <dtaraborelli(a)wikimedia.org>
Date: 18 March 2014 20:42
Subject: [Wmfall] Next research & data showcase: tomorrow at 11.30
To: "wmfall(a)lists.wikimedia.org Staff" <wmfall(a)lists.wikimedia.org>
The next Research & Data
showcase<https://www.mediawiki.org/wiki/Analytics/Research_and_Data/Showcase>
will be live-streamed tomorrow at 11.30 PT. The streaming link will be posted on
the list a few minutes before the showcase starts, and those of you who are in
the SF office can join us in Yongle. This month's program is below; we
look forward to seeing you.
Dario
*Metrics standardization *(Dario)
In this talk I'll present the most recent updates on our work
on participation metrics and discuss the goals of the Editor Engagement
Vital Signs project.
*Wikipedia's rise and decline *(Aaron)
In Halfaker et al. (2013) we present data that show that several changes
the Wikipedia community made to manage quality and consistency in the face
of a massive growth in participation have ironically crippled the very
growth they were designed to manage. Specifically, the restrictiveness of
the encyclopedia's primary quality control mechanism and the algorithmic
tools used to reject contributions are implicated as key causes of
decreased newcomer retention.
_______________________________________________
Wmfall mailing list
Wmfall(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wmfall
--
Oliver Keyes
Product Analyst
Wikimedia Foundation
The next Research & Data showcase will be live-streamed tomorrow Wed 3/19 at 11.30 PT.
The streaming link will be posted on the lists a few minutes before the showcase starts and you can join the conversation on IRC at #wikimedia-research. We look forward to seeing you!
Dario
Metrics standardization (Dario Taraborelli)
In this talk I'll present the most recent updates on our work on participation metrics and discuss the goals of the Editor Engagement Vital Signs project.
Wikipedia's rise and decline (Aaron Halfaker)
In Halfaker et al. (2013) we present data that show that several changes the Wikipedia community made to manage quality and consistency in the face of a massive growth in participation have ironically crippled the very growth they were designed to manage. Specifically, the restrictiveness of the encyclopedia's primary quality control mechanism and the algorithmic tools used to reject contributions are implicated as key causes of decreased newcomer retention.
Does anyone know the answer to this question on the extension talk
page? If so could you reply?
---------- Forwarded message ----------
From: MediaWiki Mail <wiki(a)wikimedia.org>
Date: Thu, Mar 13, 2014 at 9:35 AM
Subject: MediaWiki discussion - New thread: X-Analytics/mobile tracking?
To: Jdlrobson <jdlrobson(a)gmail.com>
Hi Jdlrobson,
this is a notification from MediaWiki that a new thread on Extension
talk:MobileFrontend, 'X-Analytics/mobile tracking?',
was created on 13 March 2014 at 16:35 by Kchurch05
You can see it at
<http://www.mediawiki.org/w/index.php?title=Extension_talk:MobileFrontend&of…>
The text is:
We're using Piwik and Google Analytics, and currently the only way we
can determine whether a visitor used our wiki on a mobile device is
via the browser model, since we don't have a mobile URL -- and even
then it's not 100% accurate, since some people prefer to use the
desktop version of the site instead of the mobile version.
I did a search and saw
[https://wikitech.wikimedia.org/wiki/X-Analytics this article] on
[[Analytics/Kraken/Data Formats|X-Analytics]]. I'm not a MW expert so
I'm not 100% clear on what X-A does, and from the code linked it seems
to only track whether a user is in alpha or beta mode.
We're currently running MW 1.21 so the X-Analytics doesn't seem to be
included in that MobileFrontend version. Hopefully we'll upgrade ASAP,
but I wanted to see if this was a way to get better data on mobile
usage of our wiki.
Thanks all!
--
Jon Robson
* http://jonrobson.me.uk
* https://www.facebook.com/jonrobson
* @rakugojon
> This is not our intention for the long term, we are in the middle
> of putting in place a sanitization strategy to get rid of any PII after
> 90 days.
> This discussion might make more sense in another thread though,
> kindly please do not hijack Sajjad's thread :)
So far, our internal discussion regarding sanitization has produced the
following:
For users that have ethical concerns about their data being gathered via
EventLogging we have thought we could provide an incognito mode. Incognito
mode will be "on" by default if you browse with cookies off. That is, if
your browser is set to not make use of cookies, no data will be sampled.
This is so far just an idea.
Regarding anonymization: after much discussion we believe that to properly
anonymize EventLogging data there is no other solution than aggregation and
for that we need to build infrastructure that will "consume" EventLogging
events. At this time EventLogging just samples discrete events, so data is
stored as discrete data points. That being said, IPs are always
anonymized in any EventLogging dataset. User Agents are not.
We shall be updating this wiki in the near future with more information:
https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
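For readers unfamiliar with the IP anonymization mentioned in the quoted thread below (an HMAC with a salt that rotates every 90 days or on service restart), here is a minimal Python sketch of the general technique. It is not the EventLogging code: the class name, the SHA-256 digest and the 32-byte salt are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os
import time

SALT_LIFETIME_SECONDS = 90 * 24 * 60 * 60  # 90 days, as stated in the thread


class IpHasher:
    """Hypothetical helper: keyed hash of IPs with an in-memory rotating salt."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._rotate()

    def _rotate(self):
        # The salt is kept only in memory, so a restart also discards it.
        self._salt = os.urandom(32)
        self._created = self._clock()

    def hash_ip(self, ip):
        # Replace the salt once it is older than its lifetime.
        if self._clock() - self._created >= SALT_LIFETIME_SECONDS:
            self._rotate()
        return hmac.new(self._salt, ip.encode("utf-8"), hashlib.sha256).hexdigest()


hasher = IpHasher()
first = hasher.hash_ip("192.0.2.1")
second = hasher.hash_ip("192.0.2.1")
print(first == second)  # equal while the same salt is in use
```

Because the salt is never stored, hashes from different salt periods cannot be linked to each other or reversed by recomputing them, while events within one period can still be grouped by the hashed value.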
On Thu, Mar 13, 2014 at 12:43 PM, Dan Andreescu <dandreescu(a)wikimedia.org>wrote:
> On Thu, Mar 13, 2014 at 9:32 AM, Federico Leva (Nemo) <nemowiki(a)gmail.com>wrote:
>>
>>> Andrew Gray, 13/03/2014 00:56:
>>>
>>> For that matter, surely this data won't exist anyway before 2013 or so?
>>>> I'm not sure how long we retain IP data for logged-in users, but I'd be
>>>> a bit startled if it was five years.
>>>>
>>>
>>> EventLogging can contain almost anything I think. Is there any purging?
>>> I don't think so. Is it aggregate and anonymised? No longer. <
>>> https://www.mediawiki.org/w/index.php?title=Extension:
>>> EventLogging&diff=prev&oldid=905171>
>>
>>
> On Thu, Mar 13, 2014 at 5:19 AM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
>
> Sorry but this is not correct:
>> IP addresses are anonymized in Event Logging and they always have been
>> so. We calculate a HMAC with a rotating salt that changes either every 90
>> days or with a service restart.
>>
>> Event Logging data has never been aggregated, it is a system to log
>> discrete events. There had not been any changes on this regard as of late.
>>
>>
> What Nuria said is correct, however, we do store some data, such as User
> Agents currently. This is not our intention for the long term, we are in
> the middle of putting in place a sanitization strategy to get rid of any
> PII after 90 days. This discussion might make more sense in another thread
> though, kindly please do not hijack Sajjad's thread :)
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
Erik,
Thanks a lot for the appreciation.
As Sajjad mentioned, we have already obtained an edit-per-location
dataset from Evan (Rosen) that has the following column structure:
*language,country,city,start,end,fraction,ts*
*start* and *end* denote the beginning and ending date for counting the
number of edits, and *ts* is time stamp.
The *fraction*, however, gives a national ratio of edit activity; that
is, it gives the ratio of 'total edits from that city for that language
Wikipedia project' divided by 'total edits from that country for that
language Wikipedia project'. Hence, it cannot be used to understand
global edit contributions to a Wikipedia project (for a time period).
It seems that the original data (from where this dataset is extracted)
should also have the global fractions -- total edits from a city divided
by total edits from the whole world, for a project, for a time period.
Would you know if the global fractions can also be derived from the XML
dumps? Or, even better, is the relevant raw data available in CSV form
somewhere else?
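To make the difference between the two ratios concrete, here is a small Python sketch using made-up edit counts for a single language project and time period. The counts, city names and function name are purely illustrative; the dataset described above contains only the national fraction, not raw counts.

```python
from collections import defaultdict

# Hypothetical edit counts keyed by (country, city); not real data.
edits = {
    ("IN", "Mumbai"): 60,
    ("IN", "Delhi"): 40,
    ("US", "New York"): 300,
}


def fractions(edit_counts):
    """Return {(country, city): (national_fraction, global_fraction)}."""
    country_totals = defaultdict(int)
    for (country, _city), count in edit_counts.items():
        country_totals[country] += count
    world_total = sum(edit_counts.values())

    result = {}
    for (country, city), count in edit_counts.items():
        national = count / country_totals[country]  # what the dataset provides
        worldwide = count / world_total             # what is being asked for
        result[(country, city)] = (national, worldwide)
    return result


for key, (national, worldwide) in fractions(edits).items():
    print(key, round(national, 3), round(worldwide, 3))
```

The global fraction equals the national fraction multiplied by the country's share of worldwide edits, so the country-level totals (or shares) are the missing piece needed to convert one into the other.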
Bests,
sumandro
-------------
sumandro
ajantriks.net
On Wednesday 15 May 2013 12:32 AM, analytics-request(a)lists.wikimedia.org
wrote:
> Date: Tue, 14 May 2013 19:40:00 +0200
> From: "Erik Zachte" <ezachte(a)wikimedia.org>
> To: "'A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics.'"
> <analytics(a)lists.wikimedia.org>
> Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
> Message-ID: <016f01ce50ca$0fe736b0$2fb5a410$(a)wikimedia.org>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Awesome work! I like the flexibility of the charts, easy to switch metrics
> and presentation mode.
>
>
>
> 1. WMF has never captured ip->geo data on city level, but afaik this is
> going to change with Kraken.
>
>
>
> 2. Total edits per article per year can be derived from the xml dumps. I may
> have some csv data that come in handy.
>
> For edit wars you need to track reverts on a per-article basis, right? That
> can also be derived from dumps.
>
> For long history you need full archive dumps and need to calc checksum per
> revision text. (stub dumps have checksum but only for last year or two)
>
>
>
> Erik Zachte
>
>
>
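The checksum-based revert tracking described in the quoted message can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Wikistats code: revision texts are assumed to be already extracted from a full-history dump in chronological order, SHA-1 is an assumed choice of checksum, and the function name is hypothetical. It only finds identity reverts, where a revision restores the exact text of an earlier one.

```python
import hashlib


def find_identity_reverts(revision_texts):
    """Return (reverting_index, restored_index) pairs for one article.

    A revision counts as a revert when its text checksum matches an earlier
    revision and differs from the revision immediately before it.
    """
    first_seen = {}  # checksum -> index of the earliest revision with that text
    reverts = []
    previous = None
    for index, text in enumerate(revision_texts):
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if checksum != previous and checksum in first_seen:
            reverts.append((index, first_seen[checksum]))
        first_seen.setdefault(checksum, index)
        previous = checksum
    return reverts


# Toy history: revision 3 restores the text of revision 1.
history = ["A", "A B", "vandalism", "A B", "A B C"]
print(find_identity_reverts(history))
```

Counting such pairs per article gives a rough revert tally; partial reverts, which change the text rather than restoring it exactly, are not detected this way.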
Hi everyone!
Has anyone tried to observe how different Wikipedias use
templates: how often, what the average depth of template calls is, etc.?
-----
Yury Katkov, WikiVote