Lars Aronsson wrote:
> When I asked in what way, if any, the Swedish chapter could help the toolserver project, the short answer I got was that a new server would cost 8000 euro and a year's salary for an admin would cost even more. If we had 8000 euro to spend (we don't) and nobody asked us what we use the money for, maybe we could get away with that. But when we ask people for donations, they like to know what it's good for. So what is the toolserver good for, really? I think I know this, but how can I explain it to others?
Find some tools people use a lot: WikiMiniAtlas, CatScan, CheckUsage, SulTool, etc.
> The annual report (Tätigkeitsbericht) for 2007 for Wikimedia Deutschland mentions that 60,000 euro was used to double squid capacity and 10,000 euro was used for the toolserver cluster. The report for 2008 should be due soon, I guess.
It's due soon, yes. I wrote it. It doesn't have many technical details - basically, it says that the investment in squids paid off and we won't need new squids until the end of the year. About the toolserver, it says that we now have one DB server per wiki cluster, extended web server capacity to match demand, and have moved batch jobs and bots to a separate server (which is nearly overloaded already, btw).
> I think it could be useful if the toolserver project could find a way to explain its current capacity, what kind of services this capacity is used for, and how much this capacity could be increased by a certain donation. We could make a little folder or poster that explains this, and distribute it among our fans.
Who would? This is a lot of work. It would sure be nice to have... and maybe it would also lead people to donate. But I don't know if dead wood is really that useful. Perhaps some of our tools could tell people more directly that they are using the toolserver.
> The easiest would perhaps be to count the HTTP requests. How many requests (or page views) does the toolserver receive per year, day, or second? How are they distributed over the various services? Is it possible to count how many unique visitors (IP addresses) each service has, broken down by Wikipedia language or referer language? Could the toolserver produce something similar to Domas' hourly visitor statistics for Wikipedia?
Access logs for the last 7 days are in /var/log/http on wolfsbane. Webalizer reports for this should be at http://toolserver.org/~daniel/stats/ but they appear to be broken - Webalizer seems to have problems with very large files. Stats for the old Apache server are at http://toolserver.org/~daniel/stats-old/. I only now remembered to change the stats to look at the ZWS logs - damn, I was sure I had done that months ago. Sorry.
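If someone wants to pull the basic numbers out of those logs themselves, something along these lines would do as a first pass. This is just a sketch, not an official script: it assumes the logs use the usual combined format, that the file under /var/log/http is called access.log (that name is a guess), and that a "tool" can be identified by the ~username part of the request path.

    import re
    from collections import defaultdict

    LOG_FILE = "/var/log/http/access.log"   # file name is a guess
    # combined log format: ip ident user [date] "METHOD path HTTP/x" status ...
    LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) ([^ "]+)')

    hits = defaultdict(int)
    visitors = defaultdict(set)

    with open(LOG_FILE, encoding="latin-1") as log:
        for line in log:
            m = LINE_RE.match(line)
            if not m:
                continue
            ip, path = m.groups()
            # tools live under /~username/; everything else goes into one bucket
            tool = path.split("/")[1] if path.startswith("/~") else "(other)"
            hits[tool] += 1
            visitors[tool].add(ip)

    for tool in sorted(hits, key=hits.get, reverse=True):
        print(f"{tool:20} {hits[tool]:>10} hits  {len(visitors[tool]):>8} unique IPs")

Counting requests per second or per day from that is then just a matter of dividing by the time span the log covers.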
> There's already http://toolserver.org/%7Einteriot/cgi-bin/tstoc where the rightmost column indicates the number of individual IP addresses in the last 7 days. Is there a log of these data? It must be wrong (surely?) that only 8 tools had any accesses this week. It could be correct that "21933 unique IP's visited the toolserver this week", but that number sounds rather low.
I suspect it's looking at the Apache logs instead of the ZWS logs. Or it's simply broken, who knows.
> The toolserver hopefully spends far more processor cycles on each HTTP request than a squid cache. But just how many more? Every server has an economic life of maybe 3 years, over which its purchase price is written off, and it serves a certain number of HTTP requests in that time. So what is the hardware (and hosting) cost of each request?
Hard to tell, depends on what you include. Do you include resources used for bots in the cost of web requests?
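Just to make the arithmetic concrete, here is a back-of-envelope version with made-up numbers. Only the 8000 euro price and the 3-year write-off come from this thread; the request rate is a pure placeholder, and hosting, power and admin time are left out entirely.

    SERVER_PRICE_EUR = 8000        # price of one new server, from this thread
    LIFETIME_YEARS = 3             # economic life assumed above
    REQUESTS_PER_SECOND = 20       # placeholder average load, not a measurement

    lifetime_requests = REQUESTS_PER_SECOND * 60 * 60 * 24 * 365 * LIFETIME_YEARS
    cost_per_request = SERVER_PRICE_EUR / lifetime_requests

    print(f"{lifetime_requests:,} requests over {LIFETIME_YEARS} years")
    print(f"hardware cost per request: {cost_per_request * 100:.5f} euro cents")

With numbers like these the hardware share comes out at a tiny fraction of a cent per request; how much that changes once hosting and admin time are included is exactly the open question.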
> Of course, HTTP requests are not everything. Is there a way we could measure other uses of the toolserver?
We could look at the number of bot jobs running on rosemary - http://ganglia.toolserver.org/ is intended to provide this information, but it's broken because of some technical nastiness.
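As a crude stopgap while ganglia is down, counting processes per user on rosemary would at least give a rough picture of bot activity. Again just a sketch, nothing official:

    import subprocess
    from collections import Counter

    # "ps -eo user=" prints one user name per running process, without a header
    output = subprocess.check_output(["ps", "-eo", "user="], text=True)
    per_user = Counter(output.split())

    for user, count in per_user.most_common(15):
        print(f"{user:15} {count:4} processes")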
> Getting back to my favorite: The geographic applications. Are the map tiles for the WikiMiniAtlas served from the toolserver?
They are served from the stable cache on willow, aka stable.toolserver.org.
> How much power does this currently use for the Swedish Wikipedia?
Hard to tell - how would we know whether something is loaded "for" the Swedish Wikipedia? But you can look at the Apache logs on stable, I suppose.
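If someone wants to dig through those logs, a rough filter could look like the sketch below. The log file name and the /~dschwen/ tile path prefix are guesses, so adjust them to whatever the WikiMiniAtlas actually uses; it simply counts tile requests whose Referer points at sv.wikipedia.org.

    import re

    LOG_FILE = "/var/log/http/access.log"    # path on willow/stable is a guess
    TILE_PREFIX = "/~dschwen/"               # wherever the WMA tiles live - a guess
    # combined log format: "METHOD path HTTP/x" status bytes "referer" "agent"
    REQ_RE = re.compile(r'"(?:GET|HEAD) ([^ "]+)[^"]*" \S+ \S+ "([^"]*)"')

    total = swedish = 0
    with open(LOG_FILE, encoding="latin-1") as log:
        for line in log:
            m = REQ_RE.search(line)
            if not m or not m.group(1).startswith(TILE_PREFIX):
                continue
            total += 1
            if "sv.wikipedia.org" in m.group(2):
                swedish += 1

    print(f"{total} tile requests, {swedish} referred from sv.wikipedia.org")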
> What if the maps were shown inline for every visitor, instead of as a pop-up that very few people click on? How much extra would that cost?
We can only guess. But note that just today, a cooperation with OpenStreetMap was launched to make dynamic, zoomable inline maps in Wikipedia happen. They will get extra hardware for that.
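To make "we can only guess" a little more concrete, a back-of-envelope sketch could look like this - every number in it is a placeholder, not a measurement:

    PAGE_VIEWS_PER_DAY = 1_000_000   # placeholder for sv.wikipedia page views/day
    SHARE_WITH_MAP = 0.10            # placeholder share of pages that show a map
    TILES_PER_MAP = 9                # e.g. a 3x3 grid of tiles per inline map
    KB_PER_TILE = 15                 # placeholder average tile size

    tile_requests = PAGE_VIEWS_PER_DAY * SHARE_WITH_MAP * TILES_PER_MAP
    traffic_gb = tile_requests * KB_PER_TILE / 1024 / 1024

    print(f"{tile_requests:,.0f} extra tile requests per day")
    print(f"roughly {traffic_gb:.1f} GB of tile traffic per day")

Swap in real page view and tile numbers once we have them and the guess turns into an estimate.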
> Could such a service still run on the toolserver, or would it need to be moved to the servers in Florida?
It doesn't really matter whether it's in Florida or Amsterdam. But it would need dedicated hardware. When this goes full scale, I expect to see the tiles stored in both places, probably on our big file servers that also hold all the other images.
-- daniel