[Foundation-l] Wikipedia tracks user behaviour via third party companies #2

Tisza Gergő gtisza at gmail.com
Fri Jun 5 23:12:34 UTC 2009

Aryeh Gregor <Simetrical+wikilist at ...> writes:
> I believe the major problems with the script are
> 1) It sent data to a server not directly controlled by the Wikimedia
> Foundation.  No personally identifiable information should be sent in
> bulk to any non-Wikimedia server.  Operation of any server hosting
> significant amounts of sensitive information must be directly and
> immediately accountable to Wikimedia's normal chain of command.

I don't think that's reasonable. WikiMiniAtlas, for example, is hosted by WM-DE,
thus every time it is used, IP data is sent to a non-WMF server. (Users have to
click to load it, but it is linked from every page that has coordinates, so it
can be considered bulk. And when it gets replaced with OSM, static map snippets
will be loaded by default from a WM-DE-owned cache server, if I understand the
setup correctly.)

Of course, there should be *some* limit on what servers can receive data. As I
said, the obvious choice for me would be to tie it to chapters (maybe it could
even be included in the chapter agreement?). That, and maybe WMF staff should
have root access for emergencies?

> 2) This use of data was not specifically authorized by the Wikimedia
> Foundation, via either the Board or appropriate officers.  Peter may
> be a checkuser, but that gives him authorization only to use checkuser
> functions, not to collect or harvest other types of data.  As has been
> noted, the data collected includes much more than checkusers can
> access in the course of using their checkuser rights.

Agreed. So consider this as a request for authorization :)

> Last I heard, Erik Zachte is working on improved statistics for all
> Wikimedia projects.  These are running on Wikimedia servers and
> specifically approved by Wikimedia.  It seems like the best course of
> action would be for people to point out what they think is lacking in
> his statistics, and perhaps offer to help improve them.

Certainly, but that in itself is no reason not to have another system for the
time being. It is not unheard of for the development of new features to be
delayed by a few years :) We have a working system in place; I don't think it
should be removed just because there will be a better one at some indefinite
point in time. It can be removed at that time just as well.

As for statistics-related feature requests, I would have quite a few :) Unique
visits/visitors, referrer data, country/browser/OS distribution (I seem to
recall seeing something like this in Erik's stats, but I can't find it now),
breakdown by action and by user group, search term statistics (without the
wikistics.falsicon.de JS hack), gadget usage data. An API would also be nice (so
that for example a user script can query the data for all internal links on the
page, and show a colormap - it would be a nice tool for designing the layouts of

(It would be somewhat unfair to say Erik's stats are lacking these, since our
stats can't measure most of them either. What I would miss most would be visitor
counts and browser distribution. Also, I think stats.grok.se and
wikistics.falsicon.de give slightly incorrect page view results because they
don't take redirects and special pages into account.)
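
To illustrate the redirect point, here is a minimal sketch in Python of folding
views recorded under a redirect title into the redirect's target. The function
name, the page titles, the counts and the redirect map are all made up for
illustration; this is not how stats.grok.se or wikistics.falsicon.de actually
work, and special pages are simply passed through unchanged here.

```python
from collections import Counter

def fold_redirects(raw_views, redirects):
    """Add views recorded under a redirect title to the redirect's target.

    raw_views: dict mapping page title -> view count (hypothetical input)
    redirects: dict mapping redirect title -> target title (hypothetical input)
    """
    folded = Counter()
    for title, views in raw_views.items():
        target = title
        seen = set()
        # Follow redirect chains, stopping if a loop is detected.
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        folded[target] += views
    return dict(folded)

# Made-up numbers: views of the redirect are credited to the article itself.
raw_views = {"Budapest": 900, "Buda-Pest": 100, "Special:Search": 50}
redirects = {"Buda-Pest": "Budapest"}
print(fold_redirects(raw_views, redirects))
```

With counts folded this way, a page reached mostly through redirects would no
longer look less visited than it really is.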

