Hi, is this information accessible for an academic research paper? Thank you
On 11/24/06, Antonio Gulli gulli@di.unipi.it wrote:
Hi, is this information accessible for an academic research paper? Thank you
Sorry, we do not currently store this data because of the performance impact. We do, however, have plenty of data on editing.
We also have some low quality sampled data on readership for some of our projects at:
Gregory Maxwell wrote:
On 11/24/06, Antonio Gulli gulli@di.unipi.it wrote:
Hi, is this information accessible for an academic research paper? Thank you
Is the wiki using the Apache web server or an equivalent? I was referring to the access.log file
Sorry, we do not currently store this data because of the performance impact. We do, however, have plenty of data on editing.
We also have some low quality sampled data on readership for some of our projects at:
On 11/24/06, Antonio Gulli gulli@di.unipi.it wrote:
Is the wiki using the Apache web server or an equivalent? I was referring to the access.log file
Although we use Apache, we do not store an access.log. We also use Squid, but have disabled logging there as well.
At peak we are serving over 20,000 requests per second. At this activity level, logging would present a non-negligible performance and administrative overhead.
Let's pretend for a moment that all requests hit Apache:
My local MediaWiki installation on Apache produces log entries of 232.13 bytes per hit on average. I would expect my log entries to be shorter than those we'd see in production.
Over a day we are receiving about 1,188,345,600 HTTP requests.
This would be 256.9 GiB/day in access logs.
At 7.8 terabytes of log data to simply preserve a month's history, keeping full access logs would be both unreasonable and wasteful.
If you have some especially interesting research ideas, and your research can be done on smaller amounts of data that we might be collecting (such as the wikicharts data) then I would be glad to discuss the possibilities. But it would be best to take that discussion off list...
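For anyone who wants to check the arithmetic above, here is a minimal Python sketch of the estimate. The per-hit size and the daily request count are the figures quoted in the message. The 31-day month and the binary units (GiB, TiB) are assumptions, chosen because they reproduce the quoted 256.9 GiB/day and roughly 7.8 terabytes per month. The function name is made up for illustration.

```python
# Back-of-envelope estimate of access-log volume.
# Figures quoted in the thread:
BYTES_PER_HIT = 232.13             # average log entry size on a local MediaWiki install
REQUESTS_PER_DAY = 1_188_345_600   # HTTP requests per day

# Assumptions (not stated in the thread): binary units and a 31-day month.
GIB = 2**30
TIB = 2**40
DAYS_PER_MONTH = 31
SECONDS_PER_DAY = 86_400


def daily_log_bytes(requests_per_day: float, bytes_per_hit: float) -> float:
    """Bytes of access log written per day if every request were logged."""
    return requests_per_day * bytes_per_hit


if __name__ == "__main__":
    per_day = daily_log_bytes(REQUESTS_PER_DAY, BYTES_PER_HIT)
    print(f"average rate: {REQUESTS_PER_DAY / SECONDS_PER_DAY:,.0f} requests/s")
    print(f"per day:      {per_day / GIB:.1f} GiB")
    print(f"per month:    {per_day * DAYS_PER_MONTH / TIB:.1f} TiB")
```

The daily request count works out to an average of about 13,754 requests per second, which is below the quoted peak of over 20,000.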
You could also try this link if you want general statistics on Wikipedia: http://stats.wikimedia.org/EN/Sitemap.htm
-- Hay Kranen / [[User:Husky]]
On Fri, Nov 24, 2006 at 5:36 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
Over a day we are receiving about 1,188,345,600 http requests.
This would be 256.9 GiB/day in access logs.
As I recall, logging stopped when the rate of incoming log data outpaced the rate at which the data could be written to disk :)
wikimedia-l@lists.wikimedia.org