Yet another installment of Aaron makes you a table. This time I present you with local_user_info, a table that contains a record for every user in every Wikimedia project wiki, along with their centralauth globaluser ID where applicable. Unlike the centralauth tables, this table contains a user's local user_id -- a stable identifier that persists across renames.
Notably, this table can be used together with editor_month (from my last have-a-table email) to track user activity across wikis more easily.
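For example, joining the two tables on (wiki, user_id) lets you roll local activity up to global accounts. A minimal sketch, assuming editor_month carries wiki, user_id, month, and editcount columns (check the actual schema from that earlier email before running):

SELECT lui.globaluser_id,
       em.month,
       SUM(em.editcount)       AS total_edits,
       COUNT(DISTINCT em.wiki) AS active_wikis
FROM staging.local_user_info lui
JOIN staging.editor_month em
  ON em.wiki = lui.wiki
 AND em.user_id = lui.user_id
WHERE lui.globaluser_id > 0  -- unattached accounts have globaluser_id = 0
GROUP BY lui.globaluser_id, em.month
LIMIT 10;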
mysql:research@analytics-store.eqiad.wmnet [staging]> select * from local_user_info limit 3;
+--------+---------+-------------------+---------------+----------------+-----------------+
| wiki   | user_id | user_registration | globaluser_id | user_attached  | attached_method |
+--------+---------+-------------------+---------------+----------------+-----------------+
| bgwiki |       1 | NULL              |           488 | 20080325163146 | password        |
| bgwiki |       2 | NULL              |             0 | NULL           |                 |
| bgwiki |       3 | NULL              |         30314 | 20080805113516 | primary         |
+--------+---------+-------------------+---------------+----------------+-----------------+
3 rows in set (0.00 sec)
mysql:research@analytics-store.eqiad.wmnet [staging]> explain local_user_info;
+-------------------+-----------------------------------------------------------------+------+-----+---------+-------+
| Field             | Type                                                            | Null | Key | Default | Extra |
+-------------------+-----------------------------------------------------------------+------+-----+---------+-------+
| wiki              | varbinary(50)                                                   | YES  | MUL | NULL    |       |
| user_id           | int(11)                                                         | YES  |     | NULL    |       |
| user_registration | varbinary(14)                                                   | YES  |     | NULL    |       |
| globaluser_id     | int(11)                                                         | YES  | MUL | NULL    |       |
| user_attached     | varbinary(14)                                                   | YES  |     | NULL    |       |
| attached_method   | enum('primary','empty','mail','password','admin','new','login') | YES  |     | NULL    |       |
+-------------------+-----------------------------------------------------------------+------+-----+---------+-------+
6 rows in set (0.01 sec)
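Because globaluser_id is indexed (Key = MUL above), pulling every local account attached to one global account is cheap. A sketch, reusing globaluser ID 488 from the sample rows above:

SELECT wiki, user_id, user_attached, attached_method
FROM staging.local_user_info
WHERE globaluser_id = 488
ORDER BY user_attached;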
Have fun! -Aaron
Aaron Halfaker, 19/06/2014 00:15:
> Yet another installment of Aaron makes you a table.
Shouldn't the full volume be copied over https://wikitech.wikimedia.org/wiki/Analytics-store or something? :)
Nemo
On Jun 19, 2014, at 10:24 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
> Aaron Halfaker, 19/06/2014 00:15:
>> Yet another installment of Aaron makes you a table.
> Shouldn't the full volume be copied over https://wikitech.wikimedia.org/wiki/Analytics-store or something? :)
The right course of action for the new tables is the following:
• move all datasets that we want to promote to “production-level analytics datasets” to a separate database on analytics-store (not “staging”, but “prod”); see the sketch after this list
• audit the data to make sure there’s no private data that needs to be stripped
• puppetize the scripts that generate/refresh the datasets
• set up replication of this data to labsdb
• document these datasets on Wikitech
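A minimal sketch of the first step, assuming we have rights to create databases on analytics-store (RENAME TABLE moves a table between databases on the same server without copying the data):

CREATE DATABASE IF NOT EXISTS prod;
RENAME TABLE staging.local_user_info TO prod.local_user_info;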
Dario
(awesome work Aaron aka the TableMaker)