Re: [Foundation-l] UvA what they need in log data from the Wikimedia Foundation

16 Sep 2007

On 9/15/07, GerardM <gerard.meijssen(a)gmail.com> wrote:
...
 Hoi,
 The University of Amsterdam (UvA) is getting log information that is
 thoroughly anonymised to the point where it becomes not as useful as it
 should be. The UvA is working on what they call a "peer to peer Wikipedia".
 Their interest in the data is not in the specific IP address of a requester
 for information; their interest is in where a request is coming from. The
 point is that it is best, fastest and cheapest when information is available
 from a peer that is close by.

Would a simple breakdown of bytes and requests per autonomous system
number over a fairly wide time window (say, days) fit their needs?

Example data:

Collection Span               ASN   REQs  Bytes sent  hit-rate
20070801000000-20070801235959 14907 1000  10289000    .99987
20070802000000-20070802235959 14907 2000  20578013    .99916
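
A rough sketch of how rows like these could be produced, in Python. The
record layout (unix_time, asn, bytes_sent, cache_hit) and the function name
aggregate_daily are made up for illustration, and I'm assuming the IP-to-ASN
lookup has already happened upstream so that no address reaches this step.
This is not how our logging actually works, just the shape of the idea.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical pre-resolved records: (unix_time, asn, bytes_sent, cache_hit).
records = [
    (1185926400, 14907, 10289, True),   # 2007-08-01 00:00:00 UTC
    (1185930000, 14907, 20480, False),  # 2007-08-01 01:00:00 UTC
    (1186012800, 14907, 512, True),     # 2007-08-02 00:00:00 UTC
]


def aggregate_daily(records):
    """Sum requests, bytes and cache hits per (UTC day, ASN)."""
    totals = defaultdict(lambda: [0, 0, 0])  # [requests, bytes, hits]
    for ts, asn, nbytes, hit in records:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
        row = totals[(day, asn)]
        row[0] += 1
        row[1] += nbytes
        row[2] += int(hit)

    out = []
    for (day, asn), (reqs, nbytes, hits) in sorted(totals.items()):
        span = f"{day}000000-{day}235959"
        out.append((span, asn, reqs, nbytes, round(hits / reqs, 4)))
    return out


for row in aggregate_daily(records):
    print(*row)
```

Only the per-day, per-AS totals leave the function; the individual requests
are never written out.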

Or perhaps by hour and AS over some span:

Collection Span               HrGMT ASN   REQs  Bytes sent  hit-rate
20070801000000-20070814235959 00    14907   40  411560      .9688
20070801000000-20070814235959 01    14907   20  205780      .9832
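
The hour-of-day variant differs only in the grouping key: requests from
every day in the span are folded into the same 24 GMT hour buckets. Again a
sketch under assumptions: the record layout (unix_time, asn, bytes_sent,
cache_hit) and the name aggregate_by_hour are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical pre-resolved records: (unix_time, asn, bytes_sent, cache_hit).
sample = [
    (1185926400, 14907, 10289, True),   # 2007-08-01 00:00 GMT
    (1185930000, 14907, 20480, False),  # 2007-08-01 01:00 GMT
    (1186012800, 14907, 512, True),     # 2007-08-02 00:00 GMT
]


def aggregate_by_hour(records):
    """Fold all days in the span into (GMT hour, ASN) buckets."""
    reqs, sent, hits = Counter(), Counter(), Counter()
    for ts, asn, nbytes, hit in records:
        key = (datetime.fromtimestamp(ts, tz=timezone.utc).hour, asn)
        reqs[key] += 1
        sent[key] += nbytes
        hits[key] += int(hit)

    rows = []
    for key in sorted(reqs):
        hour, asn = key
        hit_rate = round(hits[key] / reqs[key], 4)
        rows.append((f"{hour:02d}", asn, reqs[key], sent[key], hit_rate))
    return rows


for row in aggregate_by_hour(sample):
    print(*row)
```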

I don't see any reason why we couldn't release aggregates like these.
We should be generating them for our own planning purposes in any
case.

If they wanted details about object locality and things like that, we
could anonymize the requested objects by replacing their names with
unique IDs, but doing that would require a lot more care.
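
One way the unique-ID idea might look is a keyed hash (HMAC) over the object
name, so that the same object always maps to the same opaque ID but the
mapping can't be recomputed without the key. This is my own illustration,
not something we have in place; make_pseudonymizer and the 16-character
truncation are arbitrary choices. It also doesn't remove the need for care:
request counts alone could let someone guess which ID is a very popular
page.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer(key: bytes):
    """Return a function mapping object names to stable opaque IDs."""
    def pseudonymize(object_name: str) -> str:
        digest = hmac.new(key, object_name.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]  # truncated for readability
    return pseudonymize


# The key stays private and would be discarded or rotated per release.
key = secrets.token_bytes(32)
pseudonymize = make_pseudonymizer(key)

print(pseudonymize("/wiki/Amsterdam"))
print(pseudonymize("/wiki/Amsterdam"))  # same ID as the line above
print(pseudonymize("/wiki/Rotterdam"))  # a different ID
```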
