Lately, I've tried to make some large improvements, primarily in the Swedish-language Wikipedia. Sometimes this has succeeded, sometimes it has failed or been delayed. Some of the delays have been due to failing server functionality, either at WMF or on the toolserver. We're victims of our own success. Provisional hacks tend to become so successful that people start to rely on them.
I really want to get better geo tagging going in the Swedish Wikipedia. To this end, WikiMiniAtlas was activated some weeks ago. The activation went just fine. But the underlying database is not up to date. This is because Stefan Kühn is rewriting the script that extracts geo coordinates from the database dumps, and he has many other things to do. I don't blame him. He's doing very good work when he finds some time for it. But even if he had a solution ready, current database dumps might not be available.
Suppose database dumps were ready and WikiMiniAtlas was up to date, and I had an idea for a contest among Swedish Wikipedians to add more coordinates to articles, with prizes awarded to the top ten contributors. That's the kind of project for which I could apply for money. That money could cover some costs for the toolserver or a ticket for somebody to go to the developer meeting. It would have many good side effects.
Of course, the WikiMiniAtlas is just one example of a very successful toolserver project that could generate such benefits.
Another example: Apparently the Dutch Wikipedia community has some tool (?) to facilitate image uploading, that we might want to adapt to Swedish. But that's not running on the toolserver today, so we would also have to find somewhere to host it. This can of course be solved, but what would the best solution be?
My question: when we see such opportunities, how can we make better use of them? The toolserver team has a list of "stable" services which are (at least) more stable than the rest. But how can we invest in making them even better and more stable, so other projects can build upon them?
One idea that I had was to transfer such projects to the central Wikimedia servers and make them part of the WMF infrastructure. This would hopefully make the services more stable, and free up resources for more experimentation on the toolserver. Lately, I've become more skeptical about that plan, since the WMF seems to have more work than it can really handle.
Another idea is to set up a separate toolserver for Scandinavia. I heard some rumours that Wikimedia Norge (Norway) has resources to do this. But perhaps it is wiser to invest in the existing one?
I'm very happy that the new hardware has arrived and is being installed. It's unfortunate that it took so long for this to happen. How can we make resource planning work smoother in the future? There are millions of users who expect us to perform better, and I think they will send money if we ask them.
What are we short of? Money? Then say so. People? What kind?
So, who is the idiot that writes this message? I'm user:LA2 and my full name is Lars Aronsson. I'm a C/UNIX programmer from Sweden. Back in 1992 I started to scan old Scandinavian books under the name Project Runeberg, http://runeberg.org/, and after having started to scan encyclopedias (not just poetry and novels), finding Wikipedia was a perfect match. I've scanned two minor encyclopedias for the German and English Wikisource. I've also been an active contributor to OpenStreetMap since 2005. I happened to visit Berlin in 2004 when Wikimedia Deutschland was formed. In October 2007 I helped to create Wikimedia Sverige (Sweden) and have been a board member since. In 2008 we organized Wikipedia Academy in Sweden. That's not the only idea we have copied from the German chapter. We're now in talks with libraries and archives to see if we can have some major content exchanges. I've been to Wikimania in Frankfurt (2005) and Alexandria (2008).
i've read your mail twice, but i don't really understand what sort of answers you're looking for. you seem to be more interested in the people side of the problem, e.g. how we can get more people to contribute; is that correct?
if you have specific technical questions/problems, i can answer those, or explain why things are as they are.
as far as resources go: yes, we're short of money. we have just about enough to keep the toolserver running most of the time, but with more money it could be a lot more reliable (and faster). we are also very short of admin time, in part because the WMF won't allow us to add any more admins until they've moved their private databases to a new cluster, which is taking a long time. (Werdna recently gained access to those databases by working for the WMF, which is why we were able to give him root access.)
i've started asking other chapters for funding; our ZWS license was purchased by wikimedia.fr, for example. but this is a slow process, and most chapters want to know exact (or at least roughly accurate) figures before they can decide whether they can help. this means we need to know what we want, and that means someone (which will probably end up being me) needs to sit down and work it out.
the main problem with having chapters buy stuff is most countries restrict how a non-profit organisation can transfer funds/assets, in most cases forbidding them to simply buy hardware and give it to the WMF or wikimedia.de. this means we could end up with 20 servers, each owned by a different chapter, which makes support/RMA/etc. a nightmare.
- river.
River Tarnell wrote:
i've read your mail twice, but i don't really understand what sort of answers you're looking for. you seem to be more interested in the people side of the problem, e.g. how we can get more people to contribute; is that correct?
My problem is a lack of speed. When I say "geotagging", I want geotagging to happen, but so far it is delayed by this and that. When I want to add interwiki links (using Multichill's suggester), the failed s3 replication delays a large portion of that work. I'm trying to find out what sides there are to this problem. I suspect there is a people side and a money side, and if we can solve these, the technical side will just be fun to solve. Maybe there is a legal side too.
if you have specific technical questions/problems, i can answer those, or explain why things are as they are.
I have tons of these, but I'll save them for another thread.
as far as resources go: yes, we're short of money. we have just about enough to keep the toolserver running most of the time, but with more money it could be a lot more reliable (and faster).
How does this money flow? Does the toolserver have a budget of its own? A balance sheet that shows what you've used the money for? Is the server wholly owned by Wikimedia Deutschland? They're hardly short of money, are they?
Unfortunately I don't sit on a lot of money, but I have contacts that can be used for finding funding.
we are also very short of admin time, in part because the WMF won't allow us to add any more admins until they've moved their private databases to a new cluster, which is taking a long time. (Werdna recently gained access to those databases by working for the WMF, which is why we were able to give him root access.)
Ah, always this WMF bottleneck. While we're waiting, could we send someone more along Werdna's path? Are the current admins doing some tasks that can be delegated to unprivileged users?
Are there some projects that only use database dumps, and no replicated data, that could run on a separate (low security) server with a more liberal admin policy?
i've started asking other chapters for funding; our ZWS license was purchased by wikimedia.fr, for example. but this is a slow process, and most chapters want to know exact (or at least roughly accurate) figures before they can decide whether they can help. this means we need to know what we want, and that means someone (which will probably end up being me) needs to sit down and work it out.
Yes, probably. Is there a budget, a balance sheet, a bank account? On the Toolserver Wiki, the donation page asks for donations to be sent to Wikimedia Deutschland or WMF. If both organizations have a problem to send money out of their countries, that seems suboptimal.
the main problem with having chapters buy stuff is most countries restrict how a non-profit organisation can transfer funds/assets, in most cases forbidding them to simply buy hardware and give it to the WMF or wikimedia.de. this means we could end up with 20 servers, each owned by a different chapter, which makes support/RMA/etc. a nightmare.
Fortunately, the Swedish chapter doesn't have any such restrictions. (There's no tax exemption in Sweden anyway.) So far, we have been more active in other areas than fundraising, but we still asked if we could send $2K of our surplus to WMF. That request was turned down, so we're now looking for some other goal. We'll probably support travel costs for the chapter meeting.
As a chapter, we need to present our achievements at some exhibitions and conferences in September-November. So if we can spend money now that makes a difference during spring/summer, that would be optimal. As a Swedish chapter, we need to present how our achievements have helped free knowledge in Sweden. A better infrastructure for WikiMiniAtlas, for example, would do just that. We have to support Sweden, but we're not forbidden to help others.
Our next chapter board meeting is on Monday evening, March 2. Do you have any suggestion I should bring forward?
Lars Aronsson:
My problem is a lack of speed. When I say "geotagging", I want geotagging to happen, but so far it is delayed by this and that.
can't really answer that without knowing what 'this and that' are (i understand most of the problems are caused by people having a lack of time, which isn't really something we (= admins) can fix).
When I want to add interwiki links (using Multichill's suggester), the failed s3 replication delays a large portion of that work.
yes, this was unfortunate, but i've spoken to the WMF admins about it, and i think it shouldn't happen in future. the chances would be further reduced if we could have redundant database replicas, but we don't have the ~EUR24'000 (+ power costs) needed to make that happen.
if you have specific technical questions/problems, i can answer those, or explain why things are as they are.
I have tons of these, but I'll save them for another thread.
i can hardly wait.
as far as resources go: yes, we're short of money. we have just about enough to keep the toolserver running most of the time, but with more money it could be a lot more reliable (and faster).
How does this money flow?
the Verein purchases hardware itself using donations. as far as i'm aware there has been no work on grants or additional funding. usually, we have a general idea of what we need (new database, new web server, ...), then discuss an exact specification, look at prices and decide on a final list of purchases. we send these to the Verein and they incorporate it into the budget.
Does the toolserver have a budget of its own?
yes. but it's decided at the start of the year based on the minimum amount we need to survive for the entire year, plus a small discretionary budget for small hardware costs, software, etc.
A balance sheet that shows what you've used the money for?
if there is one, it's not public. this year, we have spent ~EUR2'000 on a new web server and ~EUR8'000 on a new database for s1 (enwiki), which also helps s3; the Verein separately spent ~20'000 EUR on a server to replicate Wikimedia uploads in Amsterdam. this used up most of the budget; we might be able to afford one or two additional servers, if necessary.
Is the server wholly owned by Wikimedia Deutschland?
yes, except the servers that are owned by Wikimedia (hemlock and amaranth, because they were donated by Sun, and vandale and zedler, because they were donated by Kennisnet).
They're hardly short of money, are they?
i don't know how you define "short of money". they don't have enough money to support everything we want to do.
Ah, always this WMF bottleneck. While we're waiting, could we send someone more along Werdna's path?
i don't think we would have much luck trying to get the WMF to employ someone just so we can make them a TS admin.
Are the current admins doing some tasks that can be delegated to unprivileged users?
nothing that takes a significant amount of time.
Are there some projects that only use database dumps,
it's quite likely, but i don't know of any. perhaps someone could start a list.
and no replicated data, that could run on a separate (low security) server with a more liberal admin policy?
this server would need to be completely separate from the existing Toolserver infrastructure, which would significantly increase admin workload. it would make more sense for it to be run by someone else, unrelated to the Toolserver. (as if it had its own admins, all we would be doing is providing hosting and hardware.)
Is there a budget, a balance sheet, a bank account? On the Toolserver Wiki, the donation page asks for donations to be sent to Wikimedia Deutschland or WMF. If both organizations have a problem to send money out of their countries, that seems suboptimal.
why? the Verein buys the hardware; they don't have to send the money to anyone. having hardware owned by the WMF is also no problem, as we work closely with them anyway (and we already use some of their hardware). what i want to avoid is having 20 chapters owning different bits of the hardware, and having to chase down some new person every time we need a broken disk replaced.
Our next chapter board meeting is on Monday evening, March 2. Do you have any suggestion I should bring forward?
i don't know what projects would help Swedish users specifically. some examples of things we are currently unable to buy would be:
- 3 new databases to provide a redundant replica of each cluster. ~EUR8'000 each, EUR24'000 total.
- a ZWS license for stable, which would improve web hosting reliability by replacing our current software (switchboard). ~EUR1'300
- a paid admin. ~EUR15'000/year
- river.
the Verein separately spent ~20'000 EUR on a server to replicate Wikimedia uploads in Amsterdam.
Oh! Is that server accessible from the toolserver cluster? It would make a few applications quite a bit easier/faster (I have a bot that detects animated GIFs, and one that extracts GPS data from the EXIF block in files (no, it's not in the DB!))
Dschwen
Daniel Schwen:
Is that server accessible from the toolserver cluster?
it's not set up yet. it might be feasible to make it accessible if there's sufficient demand, and if the WMF agrees.
- river.
Daniel Schwen wrote:
the Verein separately spent ~20'000 EUR on a server to replicate Wikimedia uploads in Amsterdam.
Oh! Is that server accessible from the toolserver cluster? It would make a few applications quite a bit easier/faster (I have a bot that detects animated GIFs, and one that extracts GPS data from the EXIF block in files (no, it's not in the DB!))
Dschwen
Gmaxwell has a node with a copy of the images. Ask him for access.
I download all Commons uploads there; if you need to process new files, I could extract the data for you at the same time.
River Tarnell wrote:
- 3 new databases to provide a redundant replica of each cluster. ~EUR8'000 each, EUR24'000 total.
- a ZWS license for stable, which would improve web hosting reliability by replacing our current software (switchboard). ~EUR1'300
- a paid admin. ~EUR15'000/year
All of these are bigger than what Wikimedia Sverige can afford right now. We could have these numbers as goals for our fundraising during the year, and for a grant application in September, but that wouldn't solve the problem for 2009.
(i understand most of the problems are caused by people having a lack of time, which isn't really something we (= admins) can fix).
It's not a technical system administrator's job, but it's a project coordinator's job to recruit more volunteers as they are needed. It seems we might need more developers, not inventors for new projects, but support developers for existing "stable" projects. Even if they are volunteers, that might incur some travel costs. That's the level (tens or hundreds of euro, rather than thousands) where Wikimedia Sverige can help in the near term.
Besides being able to replicate databases from WMF (input), what is the needed server capacity from usage of the tools (output)? Will the toolserver need capacity upgrades soon? Do you know from statistics which tools require more capacity, and how can we economize with that? If we can postpone the next 8K euro investment by 6 months, that is worth some thinking.
Ah, always this WMF bottleneck. While we're waiting, could we send someone more along Werdna's path?
i don't think we would have much luck trying to get the WMF to employ someone just so we can make them a TS admin.
I have the impression that WMF has more money than they know how to spend. It seems they are slow or late to hire, and might need help in finding talented people. I could be wrong, though.
this server would need to be completely separate from the existing Toolserver infrastructure, which would significantly increase admin workload. it would make more sense for it to be run by someone else, unrelated to the Toolserver.
I agree in full. Perhaps the Norwegians could look at this.
On Sat, Feb 28, 2009 at 6:12 PM, River Tarnell <river@loreley.flyingparchment.org.uk> wrote:
and no replicated data, that could run on a separate (low security) server with a more liberal admin policy?
this server would need to be completely separate from the existing Toolserver infrastructure, which would significantly increase admin workload. it would make more sense for it to be run by someone else, unrelated to the Toolserver. (as if it had its own admins, all we would be doing is providing hosting and hardware.)
Wikiversity has some resources and projects which have something similar as a focus. It might do well to co-ordinate if this is something that could be considered seriously.
Gerald.
Hello, on Saturday 28 February 2009 22:58:23 Lars Aronsson wrote:
River Tarnell wrote:
i've read your mail twice, but i don't really understand what sort of answers you're looking for. you seem to be more interested in the people side of the problem, e.g. how we can get more people to contribute; is that correct?
My problem is a lack of speed. When I say "geotagging", I want geotagging to happen, but so far it is delayed by this and that. When I want to add interwiki links (using Multichill's suggester), the failed s3 replication delays a large portion of that work. I'm trying to find out what sides there are to this problem. I suspect there is a people side and a money side, and if we can solve these, the technical side will just be fun to solve. Maybe there is a legal side too.
there is a money side, a people side, a technical side and a time side, yes.
if you have specific technical questions/problems, i can answer those, or explain why things are as they are.
I have tons of these, but I'll save them for another thread.
as far as resources go: yes, we're short of money. we have just about enough to keep the toolserver running most of the time, but with more money it could be a lot more reliable (and faster).
How does this money flow? Does the toolserver have a budget of its own? A balance sheet that shows what you've used the money for? Is the server wholly owned by Wikimedia Deutschland? They're hardly short of money, are they?
Most of the toolserver cluster is owned by the German Verein (that's Wikimedia Deutschland). The Verein gave the toolserver people a budget for several things like new hardware, hosting, repairs etc. The budget has two problems in my eyes: it is too small to make the toolserver really stable (it's more or less enough to keep expanding slowly, as before), and the greatest part has to be spent in December (that has to do with the date of the fundraising).
Are the current admins doing some tasks that can be delegated to unprivileged users?
None that I know of.
Are there some projects that only use database dumps, and no replicated data, that could run on a separate (low security) server with a more liberal admin policy?
In theory, yes, but in practice you would need another server too.
Yes, probably. Is there a budget, a balance sheet, a bank account? On the Toolserver Wiki, the donation page asks for donations to be sent to Wikimedia Deutschland or WMF. If both organizations have a problem to send money out of their countries, that seems suboptimal.
No, it is not, because both organisations already own parts of the toolserver and we already have to keep that in mind (legal problems, for example: Germany has a different level of privacy protection than the US).
But it is problematic for us to get hardware from other chapters if the hardware has to remain in the ownership of that chapter (because then we would have to respect 3 different legal systems, or 4 or 5).
the main problem with having chapters buy stuff is most countries restrict how a non-profit organisation can transfer funds/assets, in most cases forbidding them to simply buy hardware and give it to the WMF or wikimedia.de. this means we could end up with 20 servers, each owned by a different chapter, which makes support/RMA/etc. a nightmare.
Like River says :).
Fortunately, the Swedish chapter doesn't have any such restrictions. (There's no tax exemption in Sweden anyway.) So far, we have been more active in other areas than fundraising, but we still asked if we could send $2K of our surplus to WMF. That request was turned down, so we're now looking for some other goal.
We'll probably support travel costs for the chapter meeting.
That's a nice idea (because you can't buy a server with $2K, of course).
Our next chapter board meeting is on Monday evening, March 2. Do you have any suggestion I should bring forward?
If you (your chapter) can collect money to help the toolserver at any level, that would be great. If the Swedish chapter can really send money or hardware to the Foundation or the German Verein without legal problems, that would be even greater.
But you have to know that servers are expensive: a single database server costs 8.000€ or more (and the hosting comes on top of that).
Sincerely, DaB.
Lars Aronsson wrote:
I really want to get better geo tagging going in the Swedish Wikipedia. To this end, WikiMiniAtlas was activated some weeks ago. The activation went just fine. But the underlying database is not up to date. This is because Stefan Kühn is rewriting the script that extracts geo coordinates from the database dumps, and he has many other things to do. I don't blame him. He's doing very good work when he finds some time for it. But even if he had a solution ready, current database dumps might not be available.
Suppose database dumps were ready and WikiMiniAtlas was up to date, and I had an idea for a contest among Swedish Wikipedians to add more coordinates to articles, with prizes awarded to the top ten contributors. That's the kind of project for which I could apply for money. That money could cover some costs for the toolserver or a ticket for somebody to go to the developer meeting. It would have many good side effects.
Of course, the WikiMiniAtlas is just one example of a very successful toolserver project that could generate such benefits.
Plan the contest to start by the time next dump is finished. Then run the contest until next dump. You don't even need the old dump to decide which geotags were added. But wikipedians will like a more-or-less up to date list of articles without geotags :)
Platonides wrote:
Plan the contest to start by the time next dump is finished. Then run the contest until next dump. You don't even need the old dump to decide which geotags were added. But wikipedians will like a more-or-less up to date list of articles without geotags :)
OK, let's go into detail on the maps.
The problem with adding geo coordinates now is that you're doing it in the blind. To verify that a coordinate is correct, you need to go to the right article, click on the coordinate, and then pick the map provider and see if the pin lands in the right place.
My very first look at WikiMiniAtlas, starting at the small town Grängesberg in central Sweden, revealed that (a) Grängesberg was not shown on that map, because the WikiMiniAtlas database is not up-to-date, and (b) next to it was shown a Norwegian town, which should be 6 degrees west but just happened to have the wrong coordinate -- in the Dutch Wikipedia. I would never have found and corrected that error without WikiMiniAtlas. And you can immediately see if the next town is missing from the map, and go to that article and add the coordinate. This sort of application is extremely useful. That's why I need it to work.
http://sv.wikipedia.org/wiki/Gr%C3%A4ngesberg
http://nl.wikipedia.org/w/index.php?title=Rollag&diff=15400179
Now, I fixed that coordinate in the Dutch Wikipedia a month ago, but the WikiMiniAtlas still shows Rollag next to where Grängesberg should be. That's not useful.
Stefan Kühn wrote that he digs out coordinates from database dumps. I know a lot about digging things out of database dumps and I doubt this is the best way for WikiMiniAtlas, since there are so many different ways that the coord template can be called. Just consider all the lat/lat_min parameters to infobox templates. Shouldn't you just dig out from the external links table, all the links to the stable.toolserver.org/geohack/ and parse the coordinates from those URLs?
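The external-links idea in the paragraph above could be sketched roughly as follows. This is only an illustration, not the script WikiMiniAtlas actually runs. It assumes that each geohack link carries its coordinate in a `params` query argument of the form `60_05_N_15_00_E_type:city` (degrees, optional minutes and seconds, then a hemisphere letter); the `geohack.php` path, the function names and the sample values are assumptions made for the example, and other coordinate notations are simply rejected.

```python
from urllib.parse import urlparse, parse_qs


def _read_angle(tokens, positive, negative):
    """Read degrees[, minutes[, seconds]] followed by a hemisphere letter.

    Returns (signed decimal degrees, remaining tokens), or (None, tokens)
    if the tokens do not match the assumed layout.
    """
    parts = []
    for i, tok in enumerate(tokens):
        if tok in (positive, negative):
            if not 1 <= len(parts) <= 3:
                return None, tokens
            # degrees + minutes/60 + seconds/3600
            value = sum(p / 60 ** k for k, p in enumerate(parts))
            sign = -1 if tok == negative else 1
            return sign * value, tokens[i + 1:]
        try:
            parts.append(float(tok))
        except ValueError:
            return None, tokens
    return None, tokens


def parse_geohack_url(url):
    """Return (lat, lon) in decimal degrees, or None if nothing usable is found."""
    params = parse_qs(urlparse(url).query).get("params")
    if not params:
        return None
    tokens = params[0].split("_")
    lat, rest = _read_angle(tokens, "N", "S")
    if lat is None:
        return None
    lon, _ = _read_angle(rest, "E", "W")
    if lon is None:
        return None
    return lat, lon


# Hypothetical link, shaped like the ones found in the external links table.
example = ("http://stable.toolserver.org/geohack/geohack.php"
           "?pagename=Gr%C3%A4ngesberg&params=60_05_N_15_00_E_type:city")
print(parse_geohack_url(example))
```

The point of going through the generated link rather than the wikitext is that every template variant (coord, infobox lat/lat_min parameters and so on) ends up producing the same kind of URL, so only one format has to be parsed.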
And when WikiMiniAtlas is invoked from an article, such as Grängesberg, the call to the toolserver (that fetches the map tiles) could perhaps be used for updating the coordinate for that article. We can be pretty sure that those who update a coordinate will pop up the WikiMiniAtlas to see they made no mistake.
and I doubt this is the best way for WikiMiniAtlas, since there are so many different ways that the coord template can be called. Just consider all the lat/lat_min parameters to infobox templates. Shouldn't you just dig out from the external links table, all the links to the stable.toolserver.org/geohack/ and parse the coordinates from those URLs?
Yes I should, and I thought I already told you that this is what I'm doing for the English Wikipedia and for Commons. And I also set up a page explaining why this is quite a lot of work, and how people can help me with this, if they want faster updates for their language. http://meta.wikimedia.org/wiki/WikiMiniAtlas/CoordinateProcessing I guess I'll have to make this more public....
And when WikiMiniAtlas is invoked from an article, such as Grängesberg, the call to the toolserver (that fetches the map tiles) could perhaps be used for updating the coordinate for that article. We can be pretty sure that those who update a coordinate will pop up the WikiMiniAtlas to see they made no mistake.
Yeah, that's not a bad idea. I even used to have a JavaScript gadget that sent the data to the toolserver whenever a coordinate in an article was updated. In any case, the red dot is generated on the client side and will always use the most up-to-date coordinates.
Dschwen
Daniel Schwen wrote:
Yes I should, and I thought I already told you that this is what I'm doing for the English Wikipedia and for Commons. And I also set up a page explaining why this is quite a lot of work, and how people can help me with this, if they want faster updates for their language. http://meta.wikimedia.org/wiki/WikiMiniAtlas/CoordinateProcessing I guess I'll have to make this more public....
After reading it I still don't know what you need or how to provide it :/
I have been poking WikiMiniAtlas and filed several related bugs, though.
And when WikiMiniAtlas is invoked from an article, such as Grängesberg, the call to the toolserver (that fetches the map tiles) could perhaps be used for updating the coordinate for that article. We can be pretty sure that those who update a coordinate will pop up the WikiMiniAtlas to see they made no mistake.
Yeah, that's not a bad idea. I even used to have a JavaScript gadget that sent the data to the toolserver whenever a coordinate in an article was updated. In any case, the red dot is generated on the client side and will always use the most up-to-date coordinates.
Dschwen
Doing that automatically on requests would allow anyone to surreptitiously change all coordinates. What you could do is make requests with a different coordinate trigger an update for that page.
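One way to read that suggestion, as a rough sketch only: the coordinate sent along with a map request is never written to the database. A mismatch merely puts the page on a queue, and a separate job would then re-read the coordinate from the wiki itself. The names (`note_request`, `stored`, `refresh_queue`), the tolerance and the sample values are all made up for the illustration.

```python
from collections import deque

TOLERANCE_DEG = 1e-4  # hypothetical threshold for "same coordinate"


def coords_differ(a, b, tol=TOLERANCE_DEG):
    """True if two (lat, lon) pairs differ by more than tol in either axis."""
    return abs(a[0] - b[0]) > tol or abs(a[1] - b[1]) > tol


def note_request(page, reported, stored, refresh_queue):
    """Queue `page` for re-extraction if the reported coordinate looks new.

    The reported value is untrusted: it is only compared against the stored
    one and is never saved. Returns True if a refresh is pending for the page.
    """
    known = stored.get(page)
    if known is None or coords_differ(known, reported):
        if page not in refresh_queue:
            refresh_queue.append(page)
        return True
    return False


# Illustrative values: the database still holds a stale coordinate for Rollag.
stored = {"Rollag": (60.0, 15.0)}
refresh_queue = deque()
note_request("Rollag", (60.0, 9.0), stored, refresh_queue)
print(list(refresh_queue), stored["Rollag"])
```

A forged request can then at worst cause an unnecessary re-check of one page; it cannot change what the map shows.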
toolserver-l@lists.wikimedia.org