3) read/write access to a shared staging DB that can
be used as scratch space for temporary tables (similar to the
staging DB on s1-analytics). If you create tables on staging,
please prefix them with your shell user id (e.g. dartar_foo).
You might want to start using the toolserver/toollabs convention
that a database whose name ends in _p can be viewed by anyone. That way
you can mark databases that don't contain private information and
might be opened up to more people in the future.
In fact, on s1-analytics we have two separate databases:
• “staging” is a sandbox for researchers to store all kinds of temporary datasets, many of which are not meant to be permanently retained or documented
• “prod” is meant to host well-documented datasets that do not contain private information and are kosher for publication
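To make the two naming conventions above concrete, here is a minimal Python sketch. The helper names (staging_table_name, is_public_database), the example database name dartar_edits_p, and the restriction to letters, digits and underscores are my own assumptions for illustration, not part of any existing tooling.

```python
import getpass
import re


def staging_table_name(base, user=None):
    """Build a staging table name prefixed with the shell user id.

    Hypothetical helper: falls back to the current login name when
    no user is given.
    """
    user = user or getpass.getuser()
    name = f"{user}_{base}"
    # Assumption: keep names to letters, digits and underscores so they
    # never need quoting in SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def is_public_database(db_name):
    """Return True if the name follows the _p (publicly viewable) convention."""
    return db_name.endswith("_p")


print(staging_table_name("foo", user="dartar"))
print(is_public_database("dartar_edits_p"))
print(is_public_database("staging"))
```

A check like is_public_database could be run before putting anything into a database, as a reminder that whatever lands in a _p database should be free of private information.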
We have several projects in the pipeline to generate datasets of analytics interest that we would like to expose to labs. These include:
We also have specs for new server-side logs that will cleanly track page creations, page moves and page deletions:
Finally, we’re discussing how to expose to labs existing EventLogging schemas that include public data that should be made publicly available. I don’t have a definite ETA for each of these projects, but I’ll make sure we post announcements on the lists as soon as new data becomes available.
Dario