3) read/write access to a shared staging DB that can be used as scratch space for temporary tables (similar to the staging DB on s1-analytics). If you create tables on staging, please prefix them with your shell user id (e.g. dartar_foo).

You might want to adopt the toolserver/toollabs convention that a database whose name ends in _p can be viewed by anyone. That way you can mark databases that don't contain private information and might be opened up to more people in the future.

In fact, on s1-analytics we have two separate databases:

• “staging” is a sandbox for researchers to store all kinds of temporary datasets, many of which are not meant to be permanently retained or documented

• “prod” is meant to host well-documented datasets that do not contain private information and are kosher for publication
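
For illustration, the two naming conventions mentioned above (the shell user id prefix for staging tables and the _p suffix for publicly viewable databases) could be checked with a couple of small helpers. This is only a sketch: the function names are made up for this example and are not part of any existing tool, and the identifier check is my own assumption about what a safe table name looks like.

```python
import re


def staging_table_name(shell_user: str, table: str) -> str:
    """Build a staging table name prefixed with the owner's shell user id.

    Hypothetical helper, not an existing tool.
    """
    name = f"{shell_user}_{table}"
    # Assumption: restrict names to letters, digits and underscores.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def is_public_database(db_name: str) -> bool:
    """Return True if the database name follows the _p (public) convention."""
    return db_name.endswith("_p")
```

With these, staging_table_name("dartar", "foo") gives dartar_foo, matching the example in the quoted text.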

We have several projects in the pipeline to generate datasets of analytics interest that we would like to expose to labs. These include:

• a master dataset of total monthly contributions by user by namespace by project https://trello.com/c/3ecjp9aM/237-master-monthly-editor-activity-data

• a curated dataset of historical user registration times https://trello.com/c/NB1WO9fM/315-historical-user-registration-data

• a dataset with revert metadata https://trello.com/c/FZd4UIcR/29-revert-tracking-and-revert-dump-generation

We also have specs for new server-side logs that will cleanly track page creations, page moves and page deletions:

https://trello.com/c/aKzWq1e3/259-create-schemas-for-page-creation-moves-and-deletions

Finally, we’re discussing how to expose to labs those existing EventLogging schemas that contain public data and should be made publicly available. I don’t have a definite ETA for each of these projects, but I’ll make sure we post announcements on the lists as soon as new data becomes available.

Dario