Hello everyone,

Overall, running the Wikipedia-OpenStreetMap server on the toolserver cluster (in particular on the server Ptolemy) has worked reasonably well and has not needed much maintenance attention recently. It also seems to have handled serving map tiles to the OSM-gadget in various Wikipedias, including the German, Spanish and Russian Wikipedias amongst others, pretty well, and more recently also to WIWOSM.

However, there remain a number of issues on Ptolemy that have never been resolved entirely satisfactorily:

1) There is still a memory leak in tirex-master as well as a creep in CPU usage over time. This has to some degree been solved by simply restarting Tirex every 12 hours in a cron job, very much limiting the scope of the memory leak and to a lesser degree also the CPU usage creep. This however means that the request queues are dropped every time.

2) The socket between tirex and mod_tile / render_list always gets closed before the successful acknowledgment can be sent from tirex. This means the requester can't tell if the rendering was successful. In mod_tile this results in returning HTTP 404 errors for tiles that need rendering, instead of returning the tiles that were rendered on the fly. In render_list I got around the issue by simply always assuming the render request was successful and reconnecting to the socket for each request.

3) The socket between tirex and mod_tile / render_list refuses connection for a (random) subset of connection requests. This results in quite a number of rendering requests from mod_tile being dropped as it can't connect to the tirex socket. In render_list, I could again work around the problem by sleeping for n seconds and then retrying the connection until it eventually succeeds.

4) The performance of postgresql is still below what can be expected from a server with the specs of Ptolemy. While clustering the rendering tables by the geometry column several months ago brought the performance up to a level where it can now more or less keep up with the limited re-rendering load put on the server from Wikipedia, low-zoom rendering is still exceptionally slow. Also, it can barely keep up with replication from the OSM servers, and not infrequently falls behind during busy times. Other servers with much slower I/O performance, on the other hand, seem to have no problem keeping up with diff-imports.

None of these issues is directly critical, but they would be nice to solve.
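
For anyone who wants to picture the render_list workaround from point 3, here is a minimal sketch in Python. It is not the actual render_list code (which is not written in Python). The function name, the parameters and the assumption that the tirex socket is a Unix-domain stream socket are mine, purely for illustration. Unlike the workaround described above, which retries until it eventually succeeds, the sketch gives up after a fixed number of attempts so that it cannot loop forever.

```python
import socket
import time


def connect_with_retry(path, delay=1.0, max_attempts=10, sleep=time.sleep):
    """Connect to a Unix-domain stream socket, sleeping and retrying on failure.

    `path`, `delay` and `max_attempts` are hypothetical parameters chosen
    for this illustration; they are not taken from render_list or tirex.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock  # connected; the caller is responsible for closing it
        except OSError as err:
            # Covers "connection refused" as well as a missing socket file.
            sock.close()
            last_error = err
            if attempt < max_attempts:
                sleep(delay)  # wait n seconds before the next attempt
    raise ConnectionError(
        f"could not connect to {path} after {max_attempts} attempts"
    ) from last_error
```

Passing `sleep` as a parameter only serves to make the waiting easy to replace in a test; in normal use the default `time.sleep` applies.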



In addition to the OpenStreetMap server hosted by the toolserver, the Wikimedia Foundation is planning on hosting its own OpenStreetMap tile-server, at least for the mobile Wikipedia client, but presumably also for inclusion in the "standard" Wikipedias that are currently served by the toolserver. If I understand correctly, they have already purchased the hardware and are holding off on provisioning until someone can puppetize the OSM tileserver rendering stack.

Secondly, if I understand it correctly, the toolserver cluster is slowly moving back from Solaris to Debian.

Now, with the OpenStreetMap switch of license from CC-BY-SA to ODbL for the raw map data soon to be completed (the database has now mostly been purged of data for which the OSMF did not get permission to relicense), my understanding is that the OSMF will likely recommend that everyone do a fresh import of a new ODbL licensed planet for legal reasons, rather than apply the soon to be ODbL licensed diffs to a CC-BY-SA base database.

One question is, would this forced full re-import of the OpenStreetMap database, which last time took approximately 4 days, be a good opportunity to change things in the setup? For example, could this be used to migrate from Solaris to Debian? Some of the above mentioned issues might be solved by a change of OS. Furthermore, upgrading the OSM rendering stack should be easier and better tested on Debian. Unfortunately, key components in Debian Squeeze appear to be quite old, including postgres at version 8.4, mapnik at version 0.7 and the boost library (which would be necessary to compile mapnik 2.0), so they would likely need backports or self-compilation. (They are sufficiently recent in Debian Sid and in Ubuntu 12.04.)

However, an issue is that Ptolemy is in production use, serving the map tiles to several key Wikipedias. So a replacement would be necessary before being able to take it down for upgrades. At least for the tiles (I am not sure what the requirement is for WIWOSM), the main issue is serving tiles. (Re-)rendering would be suspended for several days anyway during an import. If I understand it correctly, the tiles are currently stored on the toolserver SAN. All that would be needed would be an Apache webserver with mod_tile installed.

So the main question is, would this be a good time to change things? Are there any other problems / issues that need fixing or improving? Or should we simply re-import a fresh ODbL planet into the existing setup, once the license change has completed?

Kai