Re: [Wiki-research-l] [Wikimedia-l] wikipedia access traces ?

17 Sep 2014

Hello Giovanni,
on second thought, I think the Click dataset won't do either.
I've parsed the smaller sample [1], which is said to be extracted from the
bigger one.

In that dataset there are ~34k entries related to Wikipedia, but they look
like the following:

{"count": 1, "timestamp": 1257181201, "from": "en.wikipedia.org", "to": "ko.wikipedia.org"}

That is, the log only reports the host/domain accessed, but not the
specific URL being requested (to be clear, the one in the HTTP request
issued by the client).
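
A minimal sketch of this kind of check, assuming the sample is stored as one JSON object per line with the "from" and "to" fields shown above. The line-per-record layout and the helper names `is_wikipedia` and `count_wikipedia_entries` are assumptions for illustration and are not taken from the dataset documentation:

```python
import json


def is_wikipedia(host):
    """Return True for wikipedia.org and any of its subdomains."""
    host = host.strip().lower()
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def count_wikipedia_entries(lines):
    """Count records whose 'from' or 'to' host is a Wikipedia domain.

    Assumes one JSON object per line (hypothetical layout).
    """
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if is_wikipedia(record.get("from", "")) or is_wikipedia(record.get("to", "")):
            total += 1
    return total


# Two hand-written records in the format quoted above (the second is made up).
sample = [
    '{"count": 1, "timestamp": 1257181201, "from": "en.wikipedia.org", "to": "ko.wikipedia.org"}',
    '{"count": 2, "timestamp": 1257181202, "from": "example.org", "to": "example.com"}',
]
print(count_wikipedia_entries(sample))
```

Since each record carries only host names, a filter like this can say how many entries involve Wikipedia, but it cannot recover which page was requested.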

The requested URL is what mainly interests me.

Thanks for your interest anyway!
Valerio

1 - http://carl.cs.indiana.edu/data/#traffic-websci14

On Wed, Sep 17, 2014 at 4:24 PM, Valerio Schiavoni <
valerio.schiavoni(a)gmail.com> wrote:

...
  Hello Giovanni,
 thanks for the pointer to the Click datasets.
 I'd have to take a look at the complete dataset to see how many of those
 requests touch Wikipedia.

 Also, one of the requirements for accessing that data is:
 "The Click Dataset is large (~2.5 TB compressed), which requires that it
 be transferred on a physical hard drive. You will have to provide the drive
 as well as pre-paid return shipment. "

 I have to check whether this is possible, and how long it might take to ship
 a hard drive from Switzerland and get it back.
 I'll let you know!

 Best,
 Valerio

 On Wed, Sep 17, 2014 at 4:09 PM, Giovanni Luca Ciampaglia <
 gciampag(a)indiana.edu> wrote:

  Valerio,

 I didn't know such data existed. As an alternative, perhaps you could
 have a look at our click datasets, which contain requests to the Web at
 large (i.e., not just Wikipedia) generated from within the campus of
 Indiana University over a period of several months. HTH

 http://carl.cs.indiana.edu/data/#click

 Cheers

 G

 Giovanni Luca Ciampaglia

 ✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA
 ☞ http://www.glciampaglia.com/
 ✆ +1 812 855-7261
 ✉ gciampag(a)indiana.edu

 2014-09-17 9:53 GMT-04:00 Valerio Schiavoni <valerio.schiavoni(a)gmail.com>:

  Hello,
 just bumping my email from last week, since I have not received an answer
 so far.

 Should I consider that dataset to be somehow lost ?

 I've also contacted the researchers who partially released it, but
 making it publicly available is tricky for them because of its size (12 TB),
 although that volume might be routine for the operations that Wikipedia's
 servers handle daily.

 Thanks again,
 Valerio

 On Wed, Sep 10, 2014 at 4:15 AM, Valerio Schiavoni <
 valerio.schiavoni(a)gmail.com> wrote:

> Dear Wikimedia Foundation,
> in the context of an EU research project [1], we are interested in
> accessing Wikipedia access traces.
> In the past, such traces were given to other groups for research
> purposes [2].
> Unfortunately, only a small percentage (10%) of that trace has been
> made available.
> We are interested in accessing the totality of that same trace (or even
> better, a more recent one, but the same one will do).
>
> If this is not the correct mailing list for such requests, could anyone
> please redirect me to the correct one?
>
> Thanks again for your attention,
>
> Valerio Schiavoni
> Post-Doc Researcher
> University of Neuchatel, Switzerland
>
> 1 - http://www.leads-project.eu
> 2 - http://www.wikibench.eu/?page_id=60
>

 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
