>This reminds me: Is there some kind of an open policy document about what is supposed to be sanitized? The general idea is "user's private information", but I'd love details and examples, especially non-trivial ones.

There is quite a bit of documentation about sanitization, and I am including some links below. FYI, we will not be working on this area in the near term, as our efforts are concentrated on scaling the pageview API and on edit history reconstruction. Please see:

https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly/Identity_reconstruction_analysis
https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly/Sanitization


On Sat, Jul 30, 2016 at 11:00 PM, Amir E. Aharoni <amir.aharoni@mail.huji.ac.il> wrote:


On 30 July 2016 at 06:22, "Dan Andreescu" <dandreescu@wikimedia.org> wrote:
Hi,

Welcome to the first of a series of semi-regular updates on our progress towards Wikistats 2.0.

Much appreciated. Updates about this are very interesting.

* Finding data on Wikistats is a bit hard for new users, so we're working on new ways to organize what's available and present it in a comprehensive way along with other data sources like dumps

I should mention that there are quite a lot of things in Wikistats that are NOT hard to find :)

And I hope it will remain that way. Basic metrics like active and very active users, and data for a language in relation to the number of its speakers, are very straightforward and should remain that way.
 
3. [        ] Sanitize pageview data with more dimensions for public consumption
6. [        ] Sanitize editing data for public consumption

This reminds me: Is there some kind of an open policy document about what is supposed to be sanitized? The general idea is "user's private information", but I'd love details and examples, especially non-trivial ones. For example, I sometimes hear that grand total numbers are usually OK to publish, but some wikis are so small that even the bare numbers may make it possible to guess some private information. It would be lovely to have a written policy about this.
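
To make the small-wiki concern concrete, here is a minimal sketch of one common approach: withholding any count that falls below a minimum threshold. The threshold value, the function name, and the sample numbers are all assumptions made up for illustration; none of them reflect an actual Wikimedia policy.

```python
from typing import Dict, Optional

# Hypothetical threshold, chosen only for illustration (not a policy value).
MIN_REPORTABLE = 10


def suppress_small_counts(
    counts: Dict[str, int], k: int = MIN_REPORTABLE
) -> Dict[str, Optional[int]]:
    """Return a copy of counts where any value below k is replaced by None."""
    return {key: (value if value >= k else None) for key, value in counts.items()}


# Made-up numbers: one large wiki and one very small wiki.
active_editors = {"large_wiki": 4200, "small_wiki": 3}
published = suppress_small_counts(active_editors)
print(published)
```

With a rule like this, the large wiki's total would be published as-is, while the small wiki's total would be withheld because three editors are few enough that the number alone might point to individuals.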
 
9. [        ] Officially Replace stats.wikipedia.org with (maybe) analytics.wikipedia.org

But please don't break existing links :)
 
* no easy way to look at data across wikis. If someone asks you to run a Quarry query to look at data from all Wikipedias, you have to run hundreds of separate queries, one for each database

 
If you're still reading, congratulations, sorry for the wall of text.

No problem at all, very useful!

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics