Hello everybody,
My name is Tania and I am new to this list. I am interested in using Wikipedia access traces for my research work: I just need the timestamp and the URL accessed per request. I found some access traces that would be suitable, but the links are offline (http://www.wikibench.eu/?page_id=60/). Apart from this, I found some page count statistics in the Wikipedia Archive (http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive), but they do not fit my requirements. I would be grateful if someone could tell me how to obtain some access traces from Wikipedia.
Regards,
Tania L.B.
Hi Tania,
This is what we have used in our work: http://dumps.wikimedia.org/other/pagecounts-raw/
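For reference, here is a minimal Python sketch of reading one hourly file from pagecounts-raw. It assumes that each line has four space-separated fields (project code, page title, request count, bytes returned) and that the files are gzip-compressed. Check both assumptions against the dataset's own description. The names `PageCount`, `parse_pagecounts` and `top_pages` are hypothetical. Each line gives only an hourly total per page, not the time of individual requests.

```python
from __future__ import annotations

import gzip
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class PageCount(NamedTuple):
    project: str     # project code, e.g. "en" (assumed layout)
    title: str       # page title as written in the file
    requests: int    # requests for this page during the hour the file covers
    bytes_sent: int  # total bytes returned for those requests


def parse_pagecounts(lines: Iterable[str]) -> Iterator[PageCount]:
    """Parse 'project title requests bytes' lines (assumed layout)."""
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4:
            continue  # skip malformed lines instead of aborting the whole file
        project, title, requests, size = parts
        try:
            req, nbytes = int(requests), int(size)
        except ValueError:
            continue  # non-numeric count or size
        yield PageCount(project, title, req, nbytes)


def top_pages(path: str, project: str = "en", n: int = 10) -> list[tuple[str, int]]:
    """Return the n most-requested titles of one project in one hourly file."""
    counts: Counter[str] = Counter()
    # Open the gzip file in text mode; undecodable bytes are replaced, not fatal.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for rec in parse_pagecounts(f):
            if rec.project == project:
                counts[rec.title] += rec.requests
    return counts.most_common(n)
```

To use it, pass `top_pages` the path of a locally downloaded hourly file. Malformed lines are skipped rather than raising errors, because a single bad line in a large dump should not abort the whole run.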
Let me know if you need further information.
Cheers,
Taha Yasseri
In reply to cmasmas cmasmas <cmasmas10@gmail.com>, Mon, Dec 3, 2012 at 5:23 PM.
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi Taha,
Thank you for the suggestion. The problem is that the page count files contain the number of requests for each URL aggregated per hour, whereas I need the exact timestamp of each request. What I need is the format used in http://www.wikibench.eu/?page_id=60, which records for each request:
- The timestamp of the request in Unix notation with millisecond precision
- The requested URL
- A flag indicating whether the request resulted in a database update
It could look like:
20:00:01 wikipedia.org....
20:00:03 url2
20:00:04 url3
(...)
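Here is a minimal Python sketch of the per-request format described above. It assumes a line layout of `timestamp url flag` separated by whitespace, for example `1190146243.326 http://en.wikipedia.org/wiki/Main_Page -`, and it treats `-` as "no database update". The real wikibench files may be laid out differently. `hourly_counts` collapses such records into hourly per-URL totals. This shows what the page count files lose: the time of each individual request. All names here are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from decimal import Decimal, InvalidOperation
from typing import Iterable, Iterator, NamedTuple


class TraceRecord(NamedTuple):
    timestamp_ms: int  # Unix time of the request, in milliseconds
    url: str           # requested URL
    is_update: bool    # whether the request caused a database update (assumed flag)


def parse_trace(lines: Iterable[str]) -> Iterator[TraceRecord]:
    """Parse hypothetical 'timestamp url flag' lines; flag '-' means no update."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        ts, url, flag = parts
        try:
            # Decimal avoids float rounding when turning seconds.millis into ms.
            ms = int(Decimal(ts) * 1000)
        except (InvalidOperation, ValueError, OverflowError):
            continue  # unparsable, NaN or infinite timestamp
        yield TraceRecord(ms, url, flag != "-")


def hourly_counts(records: Iterable[TraceRecord]) -> Counter[tuple[int, str]]:
    """Aggregate records into (hour start in Unix seconds, url) -> request count."""
    counts: Counter[tuple[int, str]] = Counter()
    for rec in records:
        hour_start = rec.timestamp_ms // 3_600_000 * 3600
        counts[(hour_start, rec.url)] += 1
    return counts
```

Going from `parse_trace` output to `hourly_counts` is one-way. Once requests are bucketed by hour, their individual timestamps cannot be recovered.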
I hope it is clear now. Thank you very much in advance.
Regards,
Tania L.B.
In reply to Taha Yasseri <taha.yaseri@gmail.com>, 3 December 2012 18:40.
I know, Tania, but WP has around 12,000 million (12 billion) page views a month. I suggest you think twice about what you really need, and whether you will be able to handle it once you have it!
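Here is a back-of-the-envelope check of that figure. It assumes a 30-day month and a hypothetical 100 bytes per trace line. On those assumptions, 12,000 million views a month works out to roughly 4,600 requests per second on average, or about 1.2 TB of uncompressed trace per month.

```python
# Back-of-the-envelope sizing for a full per-request trace.
PAGE_VIEWS_PER_MONTH = 12_000_000_000  # "around 12,000 M page views in a month"
SECONDS_PER_MONTH = 30 * 24 * 3600     # assumes a 30-day month
BYTES_PER_RECORD = 100                 # hypothetical size of one "timestamp url flag" line


def requests_per_second(views: int = PAGE_VIEWS_PER_MONTH,
                        seconds: int = SECONDS_PER_MONTH) -> float:
    """Average request rate over the period."""
    return views / seconds


def raw_trace_bytes(views: int = PAGE_VIEWS_PER_MONTH,
                    bytes_per_record: int = BYTES_PER_RECORD) -> int:
    """Uncompressed size of a trace with one line per request."""
    return views * bytes_per_record


if __name__ == "__main__":
    print(f"~{requests_per_second():,.0f} requests/s on average")
    print(f"~{raw_trace_bytes() / 1e12:.1f} TB of raw trace per month")
```

These are averages. Both figures scale linearly with the assumed record size and with the length of the period covered.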
In reply to cmasmas cmasmas <cmasmas10@gmail.com>, Mon, Dec 3, 2012 at 5:49 PM.