You could also try this link if you want general statistics on Wikipedia: http://stats.wikimedia.org/EN/Sitemap.htm
-- Hay Kranen / [[User:Husky]]
On 11/24/06, Gregory Maxwell <gmaxwell@gmail.com> wrote:
On 11/24/06, Antonio Gulli <gulli@di.unipi.it> wrote:
Is the wiki using the Apache web server or an equivalent server? I was referring to the access.log file.
Although we use Apache, we do not store an access.log. We also use Squid, but have disabled logging there as well.
At peak we are serving over 20,000 requests per second. At this activity level, logging would impose a non-negligible performance and administrative overhead.
Let's pretend for a moment that all accesses hit Apache:
My local MediaWiki installation on Apache produces log entries of 232.13 bytes per hit on average. I would expect my log entries to be shorter than the entries we'd see in production.
Over a day we are receiving about 1,188,345,600 HTTP requests.
This would be 256.9 GiB/day in access logs.
At 7.8 terabytes of log data just to preserve a month's history, keeping full access logs would be both unreasonable and wasteful.
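The arithmetic above can be reproduced with a short script. This is only a sketch and not part of the original message. The 31-day month and the use of binary units (GiB, TiB) for the monthly total are assumptions, chosen because they match the quoted 256.9 and 7.8 figures. All variable names are illustrative.

```python
# Back-of-envelope estimate of access-log volume, using the figures quoted above.
BYTES_PER_HIT = 232.13            # average log entry size on the local test install
REQUESTS_PER_DAY = 1_188_345_600  # daily HTTP request count quoted above
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 31               # assumption: the message does not state the month length
GIB = 2**30
TIB = 2**40                       # assumption: "terabytes" is read as binary TiB here

# Average request rate implied by the daily total.
avg_rps = REQUESTS_PER_DAY / SECONDS_PER_DAY

# Log volume per day and per month, in bytes.
daily = REQUESTS_PER_DAY * BYTES_PER_HIT
monthly = daily * DAYS_PER_MONTH

print(f"average rate: {avg_rps:,.0f} requests/s")
print(f"per day:      {daily / GIB:.1f} GiB")
print(f"per month:    {monthly / TIB:.1f} TiB")
```

Note that the daily total corresponds to an average rate well below the quoted peak of over 20,000 requests per second, so logging at peak would be proportionally heavier.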
If you have some especially interesting research ideas, and your research can be done on smaller amounts of data that we might be collecting (such as the wikicharts data) then I would be glad to discuss the possibilities. But it would be best to take that discussion off list...
_______________________________________________
foundation-l mailing list
foundation-l@wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/foundation-l