On 19/11/10 11:58, River Tarnell wrote:
Hi,
As I understand it, our database/tile server (ptolemy) is higher spec than the equivalent hardware at OSM.org, yet it performs much worse (e.g. at rendering tiles). Is this correct?
Yes, ptolemy's specs are likely higher than yevaud's (the osm.org tile server), especially on disk performance ( http://wiki.openstreetmap.org/wiki/Servers/yevaud ). It was recently upgraded from 24 GB to 48 GB of RAM, which had a positive effect, as disk I/O appeared to have become a bottleneck for both tile serving and the database, but even with 24 GB it did fairly well.
In some sense, yes: ptolemy is performing much worse than yevaud, in that ptolemy manages to render only somewhere between 0 and 50 metatiles per minute, whereas yevaud achieves about 3-6 metatiles per second. The big question, however, is whether it is doing comparable work. I.e. is the problem that the OS / DB isn't tuned optimally, or is ptolemy simply doing something much harder?
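To put those rates on a common scale: assuming mod_tile-style 8x8 metatiles (64 tiles each, which is an assumption on my part, not something measured on either server), the gap looks roughly like this:

```python
# Back-of-the-envelope comparison, assuming mod_tile-style 8x8
# metatiles (64 tiles per metatile). Treat all numbers as rough.
TILES_PER_METATILE = 8 * 8

def tiles_per_second(metatiles, seconds):
    """Convert a metatile rate into an individual-tile rate."""
    return metatiles * TILES_PER_METATILE / seconds

# ptolemy: 0-50 metatiles per minute, taking the best case
ptolemy_best = tiles_per_second(50, 60)   # ~53 tiles/s at best

# yevaud: about 3-6 metatiles per second
yevaud_low = tiles_per_second(3, 1)       # 192 tiles/s
yevaud_high = tiles_per_second(6, 1)      # 384 tiles/s

print(f"ptolemy (best case): {ptolemy_best:.0f} tiles/s")
print(f"yevaud: {yevaud_low:.0f}-{yevaud_high:.0f} tiles/s")
```

So even at ptolemy's best, yevaud is doing roughly 4-7x the throughput, and usually far more, which is why the "is the work comparable?" question matters so much.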
Another comparison might be the OpenCycleMap server ( http://tile.opencyclemap.org/munin/ ), which, despite having SSDs for its database, also only achieves about 1 metatile/s and can't really keep up, potentially due to a more complex stylesheet.
If so, has anyone compared the indices on ptolemy's database to OSM's?
I don't know for sure, but I am reasonably certain that yevaud's database has no additional indices beyond what osm2pgsql creates. So I don't think indices are the problem, provided the workload is comparable.
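If someone wants to check, dumping the index definitions from the pg_indexes catalog view on both databases and diffing the output would settle it. The table-name pattern here assumes the default osm2pgsql planet_osm_* prefix:

```sql
-- List index definitions for the osm2pgsql tables
-- (sketch; assumes the default planet_osm_* table prefix)
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename LIKE 'planet_osm%'
ORDER BY tablename, indexname;
```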
One thing I do vaguely remember Jon mentioning is that he once experimented with CLUSTERing on the geometry index, which physically reorders the table data to match the index and thus attempts to reduce seeking on range queries like bounding-box requests. I also vaguely remember him saying it didn't help all that much, but I don't know any details or whether that is still the case.
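For reference, that experiment would look something like the following. This is just a sketch: the table and index names are the osm2pgsql defaults and may differ on ptolemy, and CLUSTER takes an exclusive lock and rewrites the whole table, so it is not a cheap thing to try on a live server:

```sql
-- Physically reorder the polygon table along its GiST geometry
-- index (names assume osm2pgsql defaults; CLUSTER locks and
-- rewrites the table, then ANALYZE refreshes planner statistics)
CLUSTER planet_osm_polygon USING planet_osm_polygon_index;
ANALYZE planet_osm_polygon;
```

Note also that CLUSTER is a one-off operation: new rows from diff imports are not kept in index order, so any benefit decays until it is re-run.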
The biggest impact, though, appears to come from the distribution of low-zoom versus high-zoom tiles, and from the style sheets used.
Whereas Z18 tiles render in about a second, Z7 tiles can take more than 10 minutes on ptolemy according to tirex status. I don't have the numbers for yevaud, but it seems to be about the same there (perhaps 20-30% faster at most), at least judging by issuing /dirty for a tile and watching when /status updates.
Judging from tirex status, ptolemy is rendering a lot more low-zoom tiles than yevaud. osm.org currently renders basically no tiles below Z11, perhaps only once every couple of months on a full new DB import, whereas ptolemy is occupied with low-zoom tiles much of the time. This could be because the expiry policy for low-zoom tiles is more aggressive on ptolemy, or simply because there are so many more style sheets, each of which needs its low-zoom tiles rendered.
Equally, different styles can make quite a big difference, as a single "carelessly" added feature or layer can slow down the DB queries a lot. So perhaps the various other style sheets rendered on ptolemy aren't as optimised as the main osm.org one?
Therefore I don't think it is obvious that ptolemy is actually performing worse at the DB level, although it may well be the case. However, I also don't really know how best to determine this. Perhaps suppressing low-zoom rendering altogether for a while, to see how much this helps?
It might also be useful to log all slow PostgreSQL queries (e.g. those taking more than 20 seconds to execute). This might point to a few optimisations in the style sheets, to get the low-zoom tiles out of the way faster.
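In postgresql.conf that would be something like the following (the 20-second threshold is just the example figure from above):

```
# postgresql.conf: log any statement running longer than 20 s
log_min_duration_statement = 20000   # value is in milliseconds
```

The offending queries then show up in the PostgreSQL log together with their duration, which usually makes it easy to trace them back to a specific style-sheet layer.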
Another question: is it actually necessary to have a minutely up-to-date database on ptolemy if it can't keep up with rendering anyway? Would daily diffs be sufficient for Wikipedia's purposes? This might reduce the load of the DB import process itself, as well as limit tile expiry and re-rendering to once a day.
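A daily import could be a cron job along these lines. This is only a sketch, not something I have tested here: $WORKDIR, the database name and the temp path are placeholders, and it assumes osmosis replication state is already set up in $WORKDIR:

```
# Fetch the accumulated changes since the last run, then apply
# them with osm2pgsql (sketch; placeholders as noted above)
osmosis --read-replication-interval workingDirectory=$WORKDIR \
        --simplify-change --write-xml-change /tmp/daily.osc.gz
osm2pgsql --append --slim -d gis /tmp/daily.osc.gz
```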
Kai
If that is not the problem, I would like to test performance without VxVM between the filesystem and the disks. While VxVM doesn't hurt performance with MySQL, I noticed during testing that it significantly reduced import performance with Postgres. I believe that was fixed by putting pg_xlog on a separate (non-VxVM) disk, but it may still be hurting read performance.
Testing this will require some downtime for conversion; based on the amount of data, I would estimate about 8 hours to copy the data off and back again.
- river.
Maps-l mailing list Maps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/maps-l