[Foundation-l] UvA what they need in log data from the Wikimedia Foundation
GerardM
gerard.meijssen at gmail.com
Sat Sep 15 21:22:08 UTC 2007
Hoi,
What is critical is how geographical demand is distributed over time. When
a particular article becomes relevant in a certain area, you want to absorb
the flood of requests by distributing the article to the peers that are
close to where the demand is. In this peering system, a node may hold an
article, but no node needs to hold all articles.
As was mentioned before, work has been done on the data received from the
WMF. A paper based on that data will be published in the near future. When
it is available, I will post a link.
Thanks,
Gerard
On 9/15/07, Gregory Maxwell <gmaxwell at gmail.com> wrote:
>
> On 9/15/07, GerardM <gerard.meijssen at gmail.com> wrote:
> > Hoi,
> > The University of Amsterdam (UvA) is getting log information that is
> > anonymised so thoroughly that it is less useful than it should be. The
> > UvA is working on what they call a "peer to peer Wikipedia". Their
> > interest in the data is not in the specific IP number of a requester
> > for information; their interest is in where a request is coming from.
> > The point is that it is best, fastest and cheapest when information is
> > available from a peer that is close by.
>
> Would a simple breakdown of bytes and requests per autonomous system
> number over a fairly wide time window (say, days) fit their needs?
>
> Example data:
>
> Collection Span                ASN    REQs  Bytes sent  hit-rate
> 20070801000000-20070801235959  14907  1000  10289000    .99987
> 20070802000000-20070802235959  14907  2000  20578013    .99916
>
> Or perhaps by hour and AS over some span:
>
> Collection Span                HrGMT  ASN    REQs  Bytes sent  hit-rate
> 20070801000000-20070814235959  00     14907  40    411560      .9688
> 20070801000000-20070814235959  01     14907  20    205780      .9832
>
> I don't see any reason why we couldn't release aggregates like these.
> We should be generating them for our own planning purposes in any
> case.
>
> If they wanted details about object locality and things like that, we
> could anonymize the requested objects by replacing them with unique IDs,
> but doing that would require a lot more care.
>
> _______________________________________________
> foundation-l mailing list
> foundation-l at lists.wikimedia.org
> http://lists.wikimedia.org/mailman/listinfo/foundation-l
>
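For illustration, aggregates like the ones in the quoted message could be
produced roughly as follows. This is only a sketch under assumptions: the
Request record and its field names are hypothetical, and it assumes each
client IP has already been mapped to an AS number upstream, so that no IP
address reaches the aggregation step. It is not how the WMF actually
processes its logs.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Request:
    """One already-parsed log entry. All field names are hypothetical."""
    timestamp: datetime  # time of the request, in UTC
    asn: int             # AS number the client IP was mapped to upstream
    bytes_sent: int      # size of the response in bytes
    cache_hit: bool      # True if the response came from cache


def aggregate(requests, by_hour=False):
    """Sum requests, bytes and cache hits per (day, ASN) or (hour, ASN)."""
    totals = defaultdict(lambda: [0, 0, 0])  # key -> [reqs, bytes, hits]
    for r in requests:
        # Bucket by hour of day (0-23) or by calendar day (YYYYMMDD).
        bucket = r.timestamp.hour if by_hour else r.timestamp.strftime("%Y%m%d")
        row = totals[(bucket, r.asn)]
        row[0] += 1
        row[1] += r.bytes_sent
        row[2] += r.cache_hit  # bool counts as 0 or 1
    return {
        key: {"reqs": n, "bytes": b, "hit_rate": hits / n}
        for key, (n, b, hits) in totals.items()
    }


# Made-up sample data, only to show the shape of the result.
utc = timezone.utc
sample = [
    Request(datetime(2007, 8, 1, 0, 5, tzinfo=utc), 14907, 1000, True),
    Request(datetime(2007, 8, 1, 0, 40, tzinfo=utc), 14907, 3000, False),
    Request(datetime(2007, 8, 2, 1, 10, tzinfo=utc), 14907, 2000, True),
]
daily = aggregate(sample)
hourly = aggregate(sample, by_hour=True)
```

Here daily is keyed by (day, ASN) and hourly by (hour GMT, ASN), matching
the two table layouts in the quoted message. Only the sums leave the
function, which is what makes such aggregates reasonable to release.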