Hi Maarten,
3) read/write access to a shared staging DB that can be used as scratch space for temporary
tables (similar to the staging DB on s1-analytics). If you create tables on staging, please
prefix them with your shell user id (e.g. dartar_foo).
You might want to start using the toolserver/toollabs convention that if you add a _p suffix
to a database name, it can be viewed by anyone. That way you can mark databases that don't
contain private information and might be opened up to more people in the future.
In fact, on s1-analytics we have two separate databases:
• “staging” is a sandbox for researchers to store all kinds of temporary datasets, many of
which are not meant to be permanently retained or documented
• “prod” is meant to host well-documented datasets that do not contain private information
and are kosher for publication
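For anyone scripting against staging, here is a minimal Python sketch of the two naming conventions mentioned above: prefixing scratch tables with your shell user id, and marking public databases with a _p suffix. The helper names (staging_table_name, is_public_database) and the example database name are made up for illustration. The sketch assumes the _p rule is a plain suffix check on the database name, and it does not connect to any actual database.

```python
# Hypothetical helpers illustrating the staging naming conventions.
# Assumptions: a scratch table is named "<shell user id>_<name>", and a
# database whose name ends in "_p" is treated as publicly viewable.


def staging_table_name(user_id: str, name: str) -> str:
    """Prefix a scratch table name with the owner's shell user id."""
    if not user_id or not name:
        raise ValueError("user_id and name must be non-empty")
    prefix = f"{user_id}_"
    # Avoid double-prefixing a name that already carries the user id.
    return name if name.startswith(prefix) else prefix + name


def is_public_database(db_name: str) -> bool:
    """Return True if the database follows the '_p' convention for public data."""
    return db_name.endswith("_p")


if __name__ == "__main__":
    print(staging_table_name("dartar", "foo"))      # dartar_foo
    print(is_public_database("dartar_reverts_p"))   # True (hypothetical name)
    print(is_public_database("staging"))            # False
```

Making the prefix check idempotent means a script can safely pass either a bare name or an already-prefixed one.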
We have several projects in the pipeline to generate datasets of analytics interest that we
would like to expose to labs. These include:
• a master dataset of total monthly contributions by user by namespace by project
https://trello.com/c/3ecjp9aM/237-master-monthly-editor-activity-data
• a curated dataset of historical user registration times
https://trello.com/c/NB1WO9fM/315-historical-user-registration-data
• a dataset with revert metadata
https://trello.com/c/FZd4UIcR/29-revert-tracking-and-revert-dump-generation
We also have specs for new server-side logs that will cleanly track page creations, page
moves and page deletions:
https://trello.com/c/aKzWq1e3/259-create-schemas-for-page-creation-moves-an…
Finally, we’re discussing how to expose to labs those existing EventLogging schemas that
contain non-private data suitable for public release. I don’t have a definite ETA for each
of these projects, but I’ll make sure we post announcements on the lists as soon as new
data becomes available.
Dario