Yet another installment of Aaron makes you a table. This time I present
you with local_user_info, a table that contains a record for every user in
every Wikimedia project wiki along with their centralauth globaluser ID if
applicable. Unlike the centralauth tables, this table contains a user's
local user_id, a stable identifier that persists across renames.
Notably, this table can be used along with editor_month (from my last
have-a-table email) to track user activity more easily cross-wiki.
mysql:research@analytics-store.eqiad.wmnet [staging]> select * from local_user_info limit 3;
+--------+---------+-------------------+---------------+----------------+-----------------+
| wiki   | user_id | user_registration | globaluser_id | user_attached  | attached_method |
+--------+---------+-------------------+---------------+----------------+-----------------+
| bgwiki |       1 | NULL              |           488 | 20080325163146 | password        |
| bgwiki |       2 | NULL              |             0 | NULL           |                 |
| bgwiki |       3 | NULL              |         30314 | 20080805113516 | primary         |
+--------+---------+-------------------+---------------+----------------+-----------------+
3 rows in set (0.00 sec)
mysql:research@analytics-store.eqiad.wmnet [staging]> explain local_user_info;
+-------------------+-----------------------------------------------------------------+------+-----+---------+-------+
| Field             | Type                                                            | Null | Key | Default | Extra |
+-------------------+-----------------------------------------------------------------+------+-----+---------+-------+
| wiki              | varbinary(50)                                                   | YES  | MUL | NULL    |       |
| user_id           | int(11)                                                         | YES  |     | NULL    |       |
| user_registration | varbinary(14)                                                   | YES  |     | NULL    |       |
| globaluser_id     | int(11)                                                         | YES  | MUL | NULL    |       |
| user_attached     | varbinary(14)                                                   | YES  |     | NULL    |       |
| attached_method   | enum('primary','empty','mail','password','admin','new','login') | YES  |     | NULL    |       |
+-------------------+-----------------------------------------------------------------+------+-----+---------+-------+
6 rows in set (0.01 sec)
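As a rough illustration of the cross-wiki use case, here is a minimal sketch that counts how many wikis each global account is attached on. It runs against an in-memory SQLite copy of the table rather than analytics-store, so the column types are simplified. The fourth row (enwiki) is made up for the example, and treating globaluser_id = 0 as "no global account" is an assumption based on the sample rows above, not something the schema states.

```python
import sqlite3

# In-memory stand-in for staging.local_user_info (types simplified for SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE local_user_info (
        wiki TEXT, user_id INTEGER, user_registration TEXT,
        globaluser_id INTEGER, user_attached TEXT, attached_method TEXT
    )
""")
rows = [
    # The three sample rows from the query output above.
    ("bgwiki", 1, None, 488, "20080325163146", "password"),
    ("bgwiki", 2, None, 0, None, None),
    ("bgwiki", 3, None, 30314, "20080805113516", "primary"),
    # Hypothetical row: the same global account attached on a second wiki.
    ("enwiki", 77, None, 488, "20080401000000", "login"),
]
conn.executemany("INSERT INTO local_user_info VALUES (?, ?, ?, ?, ?, ?)", rows)


def wikis_per_global_user(conn):
    """Return (globaluser_id, number of wikis) for attached accounts only."""
    query = """
        SELECT globaluser_id, COUNT(DISTINCT wiki) AS wikis
        FROM local_user_info
        WHERE globaluser_id > 0  -- assumption: 0 means no global account
        GROUP BY globaluser_id
        ORDER BY globaluser_id
    """
    return conn.execute(query).fetchall()


print(wikis_per_global_user(conn))
```

The same SELECT should carry over to MySQL unchanged; only the table setup is specific to this sketch.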
Have fun!
-Aaron
The next Research & Data showcase will be live-streamed this Wednesday 6/18 at 11:30 PT.
The streaming link will be posted on the lists a few minutes before the showcase starts and as usual you can join the conversation on IRC at #wikimedia-research.
We look forward to seeing you!
Dario
This month:
MoodBar -- lightweight socialization improves long-term editor retention
by Giovanni Luca Ciampaglia -- I will talk about MoodBar, an experimental feature deployed on the English Wikipedia from 2011 to 2013 to streamline the socialization of newcomers. I will present results from a natural experiment that measured the effect of MoodBar on the short-term engagement and long-term retention of newly registered users attempting to edit Wikipedia for the first time. Our results indicate that a mechanism to elicit lightweight feedback and to provide early mentoring to newcomers significantly improves their chances of becoming long-term contributors.
Active Editors' Survival Models
by Leila Zia -- I will talk about first results in building prediction models for active editors' survival. A sample of such prediction models, their performance, and the important variables in predicting survival will be presented.
In case I'm the first to notice it: I'm consistently getting 503s on the
paths Limn normally reads our stats TSVs from:
http://stat1001.wikimedia.org/public-datasets/all/multimedia/
The server seems to be serving 503s for any requested URL at the moment
anyway.
Yuvi, Aaron and I sat down to review the implementation of revtags for mobile edits and we came up with the following proposal:
Assumptions
• we want to be able to expose whether an edit is from a mobile app or mobile web in MediaWiki as patrollers are filtering edits based on the source
• we want to allow Product to have a handy way to select all mobile edits (regardless of whether they come from apps or mobile web)
• we want a solution that is backward compatible
• we don’t want to use tags to store additional metadata, such as platform or app version (which can be obtained from EventLogging instrumentation)
Proposal
Implement 3 distinct MediaWiki tags [1]
• mobile edit: generic tag for all mobile edits
• mobile web edit: tag specific to mobile web revisions
• mobile app edit: tag specific to mobile app revisions
Each mobile edit will get at least two tags.
• run a maintenance script to remove spurious tags for mobile app account creations from change_tag and tag_summary
• run a maintenance script looking for all instances of mobile edit without mobile web edit or mobile app edit and tag them as mobile web edit
This should be fully backward-compatible and allow flexible filtering by tag. Let us know if you have any questions or concerns.
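For illustration only, the tagging rule and the second maintenance step can be sketched as follows. This is not the patch in [1]: the function names and the "web"/"app" source labels are hypothetical, and only the three tag names come from the proposal.

```python
MOBILE = "mobile edit"
MOBILE_WEB = "mobile web edit"
MOBILE_APP = "mobile app edit"


def tags_for_edit(source):
    """Tags for a new mobile edit; `source` is a hypothetical 'web'/'app' label."""
    if source == "app":
        return [MOBILE, MOBILE_APP]
    if source == "web":
        return [MOBILE, MOBILE_WEB]
    raise ValueError("unknown mobile source: %r" % (source,))


def backfill(existing_tags):
    """Sketch of the backfill step: a revision tagged 'mobile edit' that has
    neither specific tag is treated as a mobile web edit."""
    tags = set(existing_tags)
    if MOBILE in tags and not tags & {MOBILE_WEB, MOBILE_APP}:
        tags.add(MOBILE_WEB)
    return tags


print(tags_for_edit("app"))
print(sorted(backfill({"mobile edit"})))
```

Selecting all mobile edits then only requires filtering on the generic tag, while patrollers can filter on either specific tag.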
Dario
[1] https://gerrit.wikimedia.org/r/#/c/139195/
Hi Sherry,
On Sun, Jun 15, 2014 at 01:17:55PM -0700, Whatamidoing (WMF)/Sherry Snyder wrote:
> Someone in ops should probably look this over:
>
> https://en.wikisource.org/w/index.php?title=Wikisource:Administrators%27_no…
>
> Some of the linked userpages at Wikisource are receiving more than a
> quarter million hits per month. (That's about two orders of magnitude more
> than Jimmy Wales' page at the English Wikipedia.)
Thanks for the heads-up. We haven't had any indications that a DDoS is
happening so far; if someone's attempting to do that, they're probably
not doing a very good job :)
As for the hits of those pages: a quick grep revealed hits by IPs
belonging to a Polish ISP. Probably some bot crawling our projects? I'm
not sure if we should care :)
(looping in analytics@ as this is probably significant for them)
Regards,
Faidon
Hello,
Just a brief note to let everyone know that the analytics team is hiring.
If you have an interest in analytics, Wikipedia and its sister
projects, we would love to hear from you.
Check our positions and apply:
https://www.mediawiki.org/wiki/Analytics/Research_and_Data#Open_positions
If you have ever used the ServerSideAccountCreation log to run queries on cross-wiki account registrations and relied on the event_userName field, please be aware of two issues we recently discovered.
• Non-ASCII characters in usernames are garbled and replaced with question marks (we have 25K account creation events with username “???” and 21K registrations with username “????” just to mention the most frequent examples). [1] Counting usernames will underreport the actual number of accounts created, specifically for projects with a large proportion of non-ASCII usernames.
• There’s a large number of new users registering with the same username on multiple projects, which seems to violate the principle that all new accounts are unified by default. These users don’t have a record in centralauth.globaluser and as a result they are treated as non-unified accounts. [2]
For these reasons, and until these issues are addressed, you should not assume that there’s a unique event per new registered user globally.
How to avoid this problem:
• Use event_userId whenever possible
• When querying across projects, JOIN globaluser so that you don’t count the same user multiple times. The new analytics-store allows you to do that for any MediaWiki DB or EventLogging log, which is pretty awesome.
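A minimal sketch of the deduplication idea, run against an in-memory SQLite database so it is self-contained. Everything about the schema here is an assumption: account_creations is a hypothetical stand-in for the ServerSideAccountCreation log (keeping only wiki and event_userId), local_user_info is a hypothetical lookup from (wiki, local user ID) to a global ID, and all rows are invented. Users without a global ID are counted once per wiki, so the duplicates described in [2] would still be overcounted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the ServerSideAccountCreation log.
    CREATE TABLE account_creations (wiki TEXT, event_userId INTEGER);
    -- Hypothetical lookup from a local account to its global ID (0 = none).
    CREATE TABLE local_user_info (wiki TEXT, user_id INTEGER, globaluser_id INTEGER);

    INSERT INTO account_creations VALUES
        ('enwiki', 10), ('dewiki', 20), ('frwiki', 30);
    INSERT INTO local_user_info VALUES
        ('enwiki', 10, 500), ('dewiki', 20, 500), ('frwiki', 30, 0);
""")


def count_unique_registrants(conn):
    """Count registrations once per global account, falling back to
    (wiki, local user ID) when no global account is known."""
    query = """
        SELECT COUNT(DISTINCT
            CASE WHEN COALESCE(l.globaluser_id, 0) > 0
                 THEN 'global:' || l.globaluser_id
                 ELSE 'local:' || e.wiki || ':' || e.event_userId
            END)
        FROM account_creations AS e
        LEFT JOIN local_user_info AS l
          ON l.wiki = e.wiki AND l.user_id = e.event_userId
    """
    return conn.execute(query).fetchone()[0]


print(count_unique_registrants(conn))
```

Keying on user IDs rather than event_userName sidesteps the garbled non-ASCII usernames described in [1].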
Dario
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=66123
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=66101
Today someone reminded me of WorkingWiki, “a free software extension for MediaWiki to facilitate collaborative, reproducible open research. It makes a wiki into a powerful environment for collaborating on software, data processing, and publication-quality manuscripts.”
It’s pretty awesome, check it out:
http://lalashan.mcmaster.ca/theobio/projects/index.php/WorkingWiki
Dario
Hi everyone -
I'm interested in seeing the active editors of Wikisource. It used to be
updated on http://stats.wikimedia.org/wikisource/EN/TablesWikipediaZZ.htm,
but it appears to be missing now.
Any knowledge on where those numbers went or how to get them?
Thanks!
Jessie
--
*Jessie Wild Sneller*
*Grantmaking Learning & Evaluation*
*Wikimedia Foundation*
Imagine a world in which every single human being can freely share in
the sum of all knowledge. Help us make it a reality!
Donate to Wikimedia <https://donate.wikimedia.org/>