[Foundation-l] Release of squid log data

Gregory Maxwell gmaxwell at gmail.com
Fri Sep 14 19:37:21 UTC 2007


On 9/14/07, Tim Starling <tstarling at wikimedia.org> wrote:
[snip]
>They are asking if they can have the full data stream
>including IP addresses, and they are prepared to sign a confidentiality
>agreement to get it.
[snip]
> Currently we let toolserver users process
> similar data, assisted by Wikipedia administrators who put web bugs on
> the site. They use it to produce the WikiCharts report. Are we to tell
> prospective research groups to use the toolserver, rather than their own
> substantial hardware, for analysis of Wikipedia traffic patterns?
[snip]

This is simply not true.

The web bug used by Wikicharts requests a URL that is logged in a custom
format recording only the most basic data. Here is an example entry:

[14/Sep/2007:00:09:36 +0000] "GET
/xyz.png?ns=0&title=Honored%20Matres&factor=6000&wiki=enwiki HTTP/1.1"

That is the entirety of the logged data. With the exception of the
HTTP version nothing is gathered which is not strictly necessary to
produce the top viewed page data, and even that is gathered at a
sampling rate low enough to make the usefulness questionable.
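
To make concrete how little there is to work with, here is a rough
Python sketch of how top-viewed-page counts could be estimated from
entries in this format. It is not the actual Wikicharts code. It
assumes that "factor" is the sampling factor (about one view in
"factor" gets logged), and the names parse_entry and estimate_views are
made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Matches the timestamp and request line of the entry format quoted above.
# \s+ also tolerates the request being wrapped onto a second line.
ENTRY_RE = re.compile(
    r'\[(?P<time>[^\]]+)\]\s+"GET\s+(?P<url>\S+)\s+HTTP/[\d.]+"'
)


def parse_entry(line):
    """Return (wiki, ns, title, factor), or None if the line does not parse."""
    match = ENTRY_RE.search(line)
    if match is None:
        return None
    # parse_qs percent-decodes values, so %20 in the title becomes a space.
    query = parse_qs(urlsplit(match.group("url")).query)
    try:
        return (
            query["wiki"][0],
            int(query["ns"][0]),
            query["title"][0],
            int(query["factor"][0]),
        )
    except (KeyError, ValueError):
        return None


def estimate_views(lines):
    """Estimate views per (wiki, ns, title).

    ASSUMPTION: each logged hit stands for roughly `factor` real views.
    """
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # skip anything that is not a web-bug request
        wiki, ns, title, factor = entry
        counts[(wiki, ns, title)] += factor
    return counts


sample = (
    '[14/Sep/2007:00:09:36 +0000] "GET '
    '/xyz.png?ns=0&title=Honored%20Matres&factor=6000&wiki=enwiki HTTP/1.1"'
)
```

Note that nothing in an entry identifies the requester, so the same
property that keeps this log harmless also means repeated requests for
one title cannot be told apart from genuine traffic.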

Not that it isn't horribly silly that we're using a JS web bug and the
toolserver for this: we are already recording much better data, while
the Wikicharts approach is unreliable, low quality, and trivially
subject to manipulation. At the time Wikicharts was established there
was no Wikimedia logging, and because all of the Wikimedia logging data
is kept private even from most of our own 'inside people', Wikicharts
continues to use this method for its reporting.

The data we are providing to outsiders is substantially better than
the data available to people with @wikimedia.org addresses, including
myself.

For the moment I'm going to refrain from making further public comment
on this subject, because I've not yet read most of the messages and I
think consideration is deserved before issuing some harsh criticism.
But the comment about Wikicharts logging is a factual matter which
demanded correction.


