Hi there,
on the stable toolserver project page, I read:
"There are a number of important, well liked, and widely used tools on the toolserver systems. Currently the toolserver serves well over 1 million http requests per day to over 65,000 distinct IPs per day and there are a number of non-http based tools as well. The 3 to 4 most popular tools account for roughly 3/4 of the http requests." (http://meta.wikimedia.org/wiki/Toolserver/Stable_server)
In short, it shows that the toolserver and the tools hosted by it *are* important to the community (and maybe to the readers out there, though I can't really tell).
I have a few questions about this:
1) Do we know which tools are the most used (and hence would actually make the most sense to migrate to a stable toolserver)?
2) Can we, from the above list, also say which tools are the most useful (ie. without which some projects would simply break; I am especially thinking of Commons here)?
3) Do I understand this right in saying that a "stable" toolserver would mean actually integrating those tools into our daily operational monitoring (ie. it *must* work, just like the websites must be up)?
4) Has anyone actually made any kind of budget covering what kind of machine we'd need, what the cost of maintenance would be, and whether we can make sure some sysadmin time is devoted to it (ie. a real full-cost analysis of this)?
My underlying idea is the following: there are many organisations out there, Wikimedia chapters and other friends of Wikimedia, who would definitely finance something like a stable toolserver, all the more so if it is proven to be essential to the Wikimedia projects [*].
So I would really urge those who are developing and maintaining tools that are useful and widely used to express their interest and help build this project, so that it is "sellable" to entities ready to help us make it happen.
Cheers,
Delphine
*For example, in all-POV fashion, I am not sure that an "edit counter" is *essential* to the Wikimedia projects, although it's a good tool and might be an important asset to the development of the community. On the other hand, I am convinced that "check usage" is essential for the Wikimedia projects on a wide scale.
-----
Delphine Ménard
Chapters coordinator
Wikimedia Foundation
dmenard[at]wikimedia[punto]org
Delphine Ménard wrote:
In short, it shows that the toolserver and the tools hosted by it *are* important to the community (and maybe to the readers out there, though I can't really tell).
things like Magnus' geohack are used by readers without them really knowing it's from the toolserver.
- do we know what tools are the most used (and hence would actually
make most sense to migrate to a stable toolserver)
http://tools.wikimedia.de/~daniel/stats/usage_200711.html#TOPURLS shows a few of the most used scripts.
- Can we, from the above list, also say which tools are the most
useful (ie. without which some projects would just break, I am especially thinking Commons here)
possibly someone could, but not me ;)
- Do I understand this right in saying that a "stable" toolserver
would mean a way of actually integrating for real those tools into our daily operational monitoring (ie. it *must* work, just like the websites must be up)?
not really; the stable toolserver is still part of the toolserver cluster. the difference is that it only runs tools which have been shown to be stable, and aren't going to break the whole thing.
i'd also like to keep the stable server load to a reasonable level, but i'm not sure the Verein could support several stable servers if we need more.
- Has anyone actually made any kind of a budget concerning what kind
of machine we'd need, what the cost of maintenance would be, if we want to make sure there is some sysadmin time devoted to it (ie. a real full-cost analysis of this).
not that i know of.
- river.
On Nov 22, 2007 9:23 PM, River Tarnell river@wikimedia.org wrote:
- Can we, from the above list, also say which tools are the most
useful (ie. without which some projects would just break, I am especially thinking Commons here)
possibly someone could, but not me ;)
Do we have a log of referring URLs, or of which hosted projects the requests originate from? (Sorry if I've understood the structure of the toolserver/Wikimedia server farm incorrectly.)
We could probably get 10,000 lines of the access log and do some basic analysis on it to work out what requests are being used for. Alternatively, just informally survey the people on each wiki familiar with the role of toolserver tools in their project.
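The kind of quick pass over a log sample suggested above could be sketched roughly as follows. This is only an illustration: the combined-log line format, the `/~user/tool` path layout, and all names in it are assumptions, not the toolserver's actual configuration.

```python
import re
from collections import Counter

# Assumed Apache combined-log line, e.g.:
# 1.2.3.4 - - [22/Nov/2007:10:00:00 +0000] "GET /~magnus/geohack.php?... HTTP/1.1" 200 512 "http://en.wikipedia.org/wiki/Berlin" "Mozilla/..."
LOG_RE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+" \d+ \S+ "(?P<referrer>[^"]*)"')

def summarize(lines):
    """Count requests per tool (first two path segments) and per referring host."""
    tools, referrers = Counter(), Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        # "/~daniel/stats/usage.html" -> "/~daniel/stats"
        segments = m.group('path').split('?')[0].split('/')
        tools['/'.join(segments[:3])] += 1
        ref = m.group('referrer')
        if ref.startswith('http'):
            referrers[ref.split('/')[2]] += 1  # host part, e.g. en.wikipedia.org
    return tools.most_common(10), referrers.most_common(10)
```

Counting referring hosts alongside request paths would at least show which wikis each tool's traffic comes from, which helps with the per-project question even if it says nothing about "most useful".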
draicone@gmail.com wrote:
> 2) Can we, from the above list, also say which tools are the most
> useful (ie. without which some projects would just break, I am
> especially thinking Commons here)
possibly someone could, but not me ;)
We could probably get 10,000 lines of the access log and do some basic analysis on it to work out what requests are being used for.
i don't think this would help with 'most useful' tools - that doesn't necessarily correlate with 'most used'.
Alternatively, just informally survey the people on each wiki familiar with the role of toolserver tools in their project.
that seems to make more sense.
- river.
On 11/22/07, Delphine Ménard notafishz@gmail.com wrote:
On the other hand, I am convinced that "check usage" is essential for the Wikimedia projects on a wide scale.
As long as anything has to share resources with untrusted and unreviewed programs, it will probably not be possible for it to work as well as the trusted and reviewed programs working off the main clusters' resources, because the untrusted and unreviewed programs will tend to be relatively numerous and inefficient. Look at the load on the toolserver, and the replication lag on its database. Of course you could budget more funds for more servers, but my suspicion is that people would start using them more and lagging them more regardless. You're certainly never going to have as much funding as the main servers.
On the flip side of that, you can't just stick something like check usage up on the main servers without review by the people who are collectively available 24/7 to maintain those servers and the software running on them. Toolserver stuff is not centrally reviewed for security or performance issues, and the core devs/sysadmins are probably not familiar with how it works. It also tends to be somewhat tacked-on, and would be better if integrated properly into MediaWiki.
So I think the distinction between "carefully-maintained, high-availability software" and "toolserver stuff" will remain to some extent, and the goal needs to be to move the most valuable stuff to extensions or even core and run it on the main servers. But those are just my thoughts.
Simetrical wrote:
On the flip side of that, you can't just stick something like check usage up on the main servers without review by the people who are collectively available 24/7 to maintain those servers and the software running on them. Toolserver stuff is not centrally reviewed for security or performance issues, and the core devs/sysadmins are probably not familiar with how it works. It also tends to be somewhat tacked-on, and would be better if integrated properly into MediaWiki.
So I think the distinction between "carefully-maintained, high-availability software" and "toolserver stuff" will remain to some extent, and the goal needs to be to move the most valuable stuff to extensions or even core and run it on the main servers. But those are just my thoughts.
Aye, but the "carefully maintained" stable toolserver tools should be efficient, too (where the task can be done efficiently, of course; the criteria for inclusion would be stricter).
In fact, i'm for rewriting some of these tools in the move. Why? Current tools are one man's work. They work, but can be a bit {{esoteric}}, with hacks added on backend changes, and only the author fully knows them. By rewriting them between all the maintainers, they all know the baby since birth. It's easier to know a check-usage when you have seen its functions grow from one SQL query than to start with a teenager.
Also, the peer reviewing of every check-in should push for more efficient and modular code. Plus, the new tools would have a clear license (toolserver scripts are required to have a free license, but it's not clear for all of them which one they're under; rewriting fixes that). That said, it doesn't mean you can't view or even copy code from the previous tool. In the end, it's up to the team.
Opinions?
On 11/22/07, Platonides platonides@gmail.com wrote:
Aye, but the "carefully maintained" stable toolserver tools should be efficient, too (where the task can be done efficiently, of course; the criteria for inclusion would be stricter).
In which case they should be moved to the main servers, and in particular have access to the same (guaranteed non-lagged) databases as everything else.
In fact, i'm for rewriting some of these tools in the move. Why? Current tools are one man's work. They work, but can be a bit {{esoteric}}, with hacks added on backend changes, and only the author fully knows them. By rewriting them between all the maintainers, they all know the baby since birth. It's easier to know a check-usage when you have seen its functions grow from one SQL query than to start with a teenager.
In that case, they should be rewritten as proper MediaWiki extensions (if appropriate), and proposed for enabling as such on the application servers.
Simetrical wrote:
On 11/22/07, Platonides wrote:
Aye, but the "carefully maintained" stable toolserver tools should be efficient, too (where the task can be done efficiently, of course; the criteria for inclusion would be stricter).
In which case they should be moved to the main servers, and in particular have access to the same (guaranteed non-lagged) databases as everything else.
The tools should be of high enough quality to qualify for being moved to the main servers ;) While these tools could get a server on the main cluster, i don't know whether that would slow down changing them, and the sysadmins would still want to review them (which is slow).
In fact, i'm for rewriting some of these tools in the move. Why? Current tools are one man's work. They work, but can be a bit {{esoteric}}, with hacks added on backend changes, and only the author fully knows them. By rewriting them between all the maintainers, they all know the baby since birth. It's easier to know a check-usage when you have seen its functions grow from one SQL query than to start with a teenager.
In that case, they should be rewritten as proper MediaWiki extensions (if appropriate), and proposed for enabling as such on the application servers.
EditCounter could be quite pluggable (in fact, a basic one was added to Special:Preferences), but how would you add big ones like CheckUsage? As a special page with hardcoded db's to check? Not happening soon.
On 11/22/07, Platonides platonides@gmail.com wrote:
EditCounter could be quite pluggable (in fact, a basic one was added to Special:Preferences) but how would you add big ones like CheckUsage? As a special page with hardcoded db's to check? Not happening soon.
There would be a variety of possible ways. One way would be having the imagelinks table centralized along with the image table (though probably not with separate local imagelinks tables too). You could have an extra column for the site, say a smallint for compactness. Needless to say, this would require having a central database that all sites can access and update, and it would take a considerable amount of effort to program and set up, but it's the correct way to do it regardless. Or at least one correct way. The current way is possibly slow (hundreds of queries) and unduly difficult for third parties to take advantage of.
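A minimal sketch of that centralized-imagelinks idea, using SQLite in place of the real cluster databases. Only the `il_from`/`il_to` column names follow MediaWiki's actual imagelinks table; the `sites` table, the `il_site` column, and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (
    site_id   INTEGER PRIMARY KEY,   -- a smallint in a real schema, for compactness
    site_name TEXT NOT NULL
);
CREATE TABLE imagelinks (
    il_site INTEGER NOT NULL REFERENCES sites(site_id),
    il_from INTEGER NOT NULL,        -- page id on the linking wiki
    il_to   TEXT NOT NULL            -- image name on the shared repository
);
CREATE INDEX il_to_site ON imagelinks (il_to, il_site);
""")
conn.executemany("INSERT INTO sites VALUES (?, ?)",
                 [(1, "enwiki"), (2, "dewiki")])
conn.executemany("INSERT INTO imagelinks VALUES (?, ?, ?)",
                 [(1, 100, "Example.jpg"), (2, 200, "Example.jpg")])

# A "check usage" lookup for one image becomes a single indexed query
# instead of one query per wiki database:
rows = conn.execute("""
    SELECT s.site_name, il.il_from
    FROM imagelinks il JOIN sites s ON s.site_id = il.il_site
    WHERE il.il_to = ?
    ORDER BY s.site_name
""", ("Example.jpg",)).fetchall()
print(rows)  # [('dewiki', 200), ('enwiki', 100)]
```

The design point is the `(il_to, il_site)` index: usage of one image across every wiki is then a single range scan rather than hundreds of per-wiki queries.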
toolserver-l@lists.wikimedia.org