Hello everyone,
Just after I wrote last week that everything was running stably on the OpenStreetMap tile rendering server, the PostgreSQL database appears to have become corrupted the very next day and is now only partially functional.
Due to the database corruption, replication (diff imports) is suspended for the moment, and most rendering of new map tiles is disabled as well. This will affect both WIWOSM and the OSM tiles shown in the osm_gadget. Any changes that occurred in the last 3 days or so, as well as future updates to the OSM database, won't show up in either until the toolserver-osm database has been fixed. Unfortunately, this is likely to take a few days, or possibly weeks if the corruption makes a full new import necessary.
More technically:
Initially, queries failed every couple of hours with the following error message:
"DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. The connection to the server was lost. Attempting reset: Failed."
Apart from that, things more or less "worked".
Now, however, diff imports fail quickly, and, for example, the query "SELECT * FROM planet_ways WHERE id = 67780465;" consistently fails with the error "unexpected chunk number 0 (expected 1) for toast value 28214399 in pg_toast_3406700", which sounds like definite database corruption.
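As a rough sketch of how the damaged rows might be located (this is an illustration, not what was actually run on the server), one could probe rows one at a time and record which ids raise an error. The names `parse_toast_error`, `find_damaged_ids` and `fetch_row` are hypothetical; `fetch_row` is assumed to be a caller-supplied function that runs something like the SELECT quoted above for a single id and raises on failure.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Matches the TOAST error text quoted in the message above.
TOAST_ERROR = re.compile(
    r"unexpected chunk number (\d+) \(expected (\d+)\) "
    r"for toast value (\d+) in (pg_toast_\d+)"
)


def parse_toast_error(message: str) -> Optional[Tuple[int, str]]:
    """Return (toast_value, toast_relation) if the message matches, else None."""
    match = TOAST_ERROR.search(message)
    if match is None:
        return None
    return int(match.group(3)), match.group(4)


def find_damaged_ids(
    ids: Iterable[int],
    fetch_row: Callable[[int], object],
) -> List[Tuple[int, str]]:
    """Call fetch_row for every id and collect (id, error text) for failures.

    fetch_row is a hypothetical callable supplied by the caller; it is
    assumed to read the full row for one id and raise an exception if the
    database reports an error.
    """
    damaged = []
    for way_id in ids:
        try:
            fetch_row(way_id)
        except Exception as exc:  # broad on purpose: any failure is of interest
            damaged.append((way_id, str(exc)))
    return damaged
```

If a failing query also takes down the backend process, as in the first error message, `fetch_row` would presumably need to reconnect before the next probe; that part is left out here.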
I have no idea what caused the issues, but they seem to have started on the evening of the 23rd.
The planet_ways table definitely has problems, but I don't know whether the rendering tables are corrupt as well. Processes that only use the other tables may therefore still work, although I'd treat their results with care, as those tables may well be corrupt too.
I will see whether reindexing or vacuuming the tables helps to repair the corruption.
If not, then a full reimport will presumably be necessary.
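For illustration only, the reindex and vacuum attempt mentioned above might boil down to a handful of statements like the ones generated below. This is a sketch under assumptions: the helper name `maintenance_statements` is made up, the order of the statements is a guess, and the TOAST relation name is simply the one from the error message. These commands can at best repair damaged indexes; if the TOAST data itself is damaged, they will most likely not help.

```python
import re
from typing import List

# Conservative check so that only plain identifiers end up in the SQL text.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def maintenance_statements(table: str, toast_relation: str) -> List[str]:
    """Build the SQL statements for a reindex-then-vacuum attempt.

    Both arguments are plain relation names, for example "planet_ways"
    and "pg_toast_3406700" (the latter taken from the error message).
    """
    for name in (table, toast_relation):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return [
        # Rebuild the index on the TOAST relation named in the error.
        f"REINDEX TABLE pg_toast.{toast_relation};",
        # Rebuild all indexes of the main table.
        f"REINDEX TABLE {table};",
        # Vacuum the table and refresh planner statistics.
        f"VACUUM ANALYZE {table};",
    ]


if __name__ == "__main__":
    for stmt in maintenance_statements("planet_ways", "pg_toast_3406700"):
        print(stmt)
```

The statements would then be run by hand in psql by a sufficiently privileged user, ideally after taking a copy of whatever data can still be read.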
This is a bit of an awkward time for this: due to the license change in OSM, there are no up-to-date planet files at the moment (the last one was 3 weeks ago), and their production will likely not resume until the license change is complete. When that will be is not yet entirely clear, so that might add another couple of weeks of delay before a new import can take place.
Hopefully this can be resolved as soon as possible, but it will likely be an inconvenience to WIWOSM and tile rendering.
Kai