[Maps-l] ptolemy performance

Fri Nov 19 22:19:16 UTC 2010

On 19/11/10 11:58, River Tarnell wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> As I understand it, our database/tile server (ptolemy) is higher spec than the
> equivalent hardware at OSM.org, yet it performs much worse (e.g. at rendering
> tiles).  Is this correct?

Yes, ptolemy's specs are likely to be higher than those of yevaud (the 
osm.org tile server, http://wiki.openstreetmap.org/wiki/Servers/yevaud ), 
especially the disk performance. yevaud was recently upgraded from 24 GB 
to 48 GB of RAM, which did have a positive effect, as disk performance 
for both tile serving and the database appeared to have become a 
bottleneck, but even with 24 GB it did fairly well.

In some sense, yes, ptolemy is performing much worse than yevaud, in 
that ptolemy manages to render only somewhere between 0 and 50 or so 
metatiles per minute, whereas yevaud achieves about 3 - 6 metatiles per 
second. However, the big question is: is it doing something comparable? 
I.e. is the problem that the OS / DB isn't tuned optimally, or is it 
simply doing something much harder?
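
To put the two rates on the same scale, here is a quick back-of-the-envelope 
calculation. Only the figures quoted above come from the servers; the 
variable names are just for illustration.

```python
# Rates quoted above; ptolemy's is per minute, yevaud's per second.
PTOLEMY_PER_MIN = (0, 50)
YEVAUD_PER_SEC = (3, 6)

def per_minute(per_second):
    """Convert a metatiles-per-second rate to metatiles per minute."""
    return per_second * 60

yevaud_per_min = tuple(per_minute(r) for r in YEVAUD_PER_SEC)
# Smallest gap: ptolemy's best case against yevaud's worst case.
min_ratio = yevaud_per_min[0] / PTOLEMY_PER_MIN[1]

print("yevaud: %d - %d metatiles/min" % yevaud_per_min)
print("yevaud is at least %.1fx faster" % min_ratio)
```

So even in ptolemy's best minute the raw gap is a factor of 3.6, and 
usually far more, which is why the mix of work matters so much.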

Another comparison might be the OpenCycleMap server ( 
http://tile.opencyclemap.org/munin/ ), which despite having SSDs for its 
db, also only achieves about 1 metatile/s and can't really keep up, 
potentially due to a more complex stylesheet.

>
> If so, has anyone compared the indices on ptolemy's database to OSM's?

I don't know for sure, but I am reasonably certain that yevaud's 
database has no additional indices beyond what osm2pgsql creates. So I 
don't think indices are the problem, if the workload is comparable.

One thing I vaguely remember Jon mentioning is that he once 
experimented with CLUSTERing on the geometry index, which physically 
reorders the table data to match the index and thus attempts to reduce 
seeking on range queries like bounding box requests. I also vaguely 
remember him saying that it didn't help all that much, but I don't know 
any details or whether that is still the case.
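
For anyone who wants to try this, a minimal sketch that just prints the 
SQL one would run. The table and index names are my assumption of the 
osm2pgsql defaults (planet_osm_* tables with a <table>_index geometry 
index); check them with \d in psql before running anything. I don't 
know what exactly Jon did on yevaud.

```python
# Assumed osm2pgsql defaults: tables planet_osm_{point,line,polygon,roads},
# each with a GiST index on the geometry column named <table>_index.
PREFIX = "planet_osm"
TABLES = ("point", "line", "polygon", "roads")

def cluster_statements(prefix=PREFIX, tables=TABLES):
    """Return SQL that physically reorders each table by its geometry index."""
    statements = []
    for name in tables:
        table = "%s_%s" % (prefix, name)
        # CLUSTER takes an exclusive lock and rewrites the whole table.
        statements.append("CLUSTER %s USING %s_index;" % (table, table))
        # Refresh planner statistics after the rewrite.
        statements.append("ANALYZE %s;" % table)
    return statements

if __name__ == "__main__":
    print("\n".join(cluster_statements()))
```

Note that CLUSTER is a one-off reordering: rows changed by later diff 
imports are not kept in index order, so it would have to be repeated 
from time to time, and the table is locked while it runs.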

The biggest impact though appears to be the distribution of low zoom to 
high zoom tiles and the style sheet used.

Whereas Z18 tiles are rendered in about a second, Z7 tiles can take more 
than 10 minutes on ptolemy according to tirex status. I don't have the 
numbers for yevaud, but it seems to be about the same there too (perhaps 
20 - 30% faster at most), at least judging from marking a tile /dirty 
and seeing at what point its /status updates.
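
Part of that gap is simple geometry. A rough sketch, assuming 8x8 
metatiles (which I believe is the usual mod_tile / tirex setting, but 
check the local config) and the standard 2^z by 2^z tile grid:

```python
METATILE = 8  # assumed metatile edge length in tiles (8x8)

def metatile_world_fraction(zoom, metatile=METATILE):
    """Fraction of the whole map covered by one metatile at this zoom."""
    tiles_per_side = 2 ** zoom
    # At very low zooms the whole map is smaller than one metatile.
    side = min(metatile, tiles_per_side)
    return (side / tiles_per_side) ** 2

z7 = metatile_world_fraction(7)
z18 = metatile_world_fraction(18)
print("Z7 metatile covers 1/%d of the map" % round(1 / z7))
print("Z7 covers %d times the area of a Z18 metatile" % round(z7 / z18))
```

A Z7 metatile spans about four million times the area of a Z18 one. 
Styles filter out most features at low zoom, so render time doesn't 
grow by anything like that factor, but the db still has to search a 
much larger bounding box for each query.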

Judging from tirex status, ptolemy is rendering a lot more low zoom 
tiles than yevaud. Osm.org currently basically never renders tiles for 
zooms lower than Z11, perhaps only once every couple of months on a 
full new db import, whereas ptolemy is currently occupied with low zoom 
tiles a lot of the time. This can be either because the expiry policy 
for low zoom tiles is still more aggressive on ptolemy, or simply 
because there are so many more style sheets, each of which needs its 
low zoom tiles rendered.

Equally, different styles can make quite a big difference, as a single 
"carelessly" thrown in feature or layer can slow down the db queries a 
lot. So perhaps the various other style sheets rendered on ptolemy 
aren't as optimised as the main osm.org one?

Therefore I don't think it is necessarily obvious that ptolemy is 
actually performing worse on the db level, although it may well be the 
case. However, I also don't really know how best to determine this. 
Perhaps suppress low zoom rendering altogether for a while to see how 
much this helps?

It also might be useful to log all the slow PostgreSQL queries (e.g. 
those that take more than 20 seconds to execute). This might point to a 
few optimisations in the style sheets, to get the low zoom tiles out of 
the way faster.
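
Setting log_min_duration_statement = 20000 (the value is in 
milliseconds) in postgresql.conf makes PostgreSQL log every statement 
that runs for 20 seconds or longer. A minimal sketch for summarising 
such a log per table; the function names and the two sample lines are 
made up for illustration and are not real output from ptolemy, and the 
pattern assumes the default "duration: ... ms  statement: ..." wording.

```python
import re
from collections import Counter

# Matches the "duration: <ms> ms  statement: <sql>" part of a log line.
DURATION_RE = re.compile(
    r"duration: ([0-9.]+) ms\s+(?:statement|execute [^:]*): (.*)")

def slow_queries(lines, threshold_ms=20000.0):
    """Yield (duration_ms, sql) for logged statements at or above the threshold."""
    for line in lines:
        match = DURATION_RE.search(line)
        if match and float(match.group(1)) >= threshold_ms:
            yield float(match.group(1)), match.group(2).strip()

def total_time_by_table(lines, threshold_ms=20000.0):
    """Sum slow-query time per planet_osm_* table mentioned in the SQL."""
    totals = Counter()
    for duration, sql in slow_queries(lines, threshold_ms):
        for table in set(re.findall(r"planet_osm_\w+", sql)):
            totals[table] += duration
    return totals

# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "LOG:  duration: 45123.400 ms  statement: SELECT way FROM planet_osm_polygon WHERE ...",
    "LOG:  duration: 812.000 ms  statement: SELECT way FROM planet_osm_line WHERE ...",
]
print(total_time_by_table(sample))
```

Grouping by table is crude; matching the SQL back to the layer in each 
style sheet would be the real goal, but this should at least show where 
the time goes.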

Another question might be: is it actually necessary to have a minutely 
up-to-date database on ptolemy if it can't really keep up with rendering 
anyway? Would it perhaps be sufficient to use daily diffs for the 
purposes of Wikipedia? This might help reduce the load of the actual db 
import process, as well as potentially limit the tile expiry and 
rerendering to once a day.

Kai

>
> If that is not the problem, I would like to test performance without VxVM
> between the filesystem and the disk.  While Vx doesn't hurt performance with
> MySQL, I noticed during testing that it significantly reduced import
> performance with Postgres.  I believe that was fixed by putting pg_xlog on a
> separate (non-Vx) disk, but it may still be hurting read performance.
>
> Testing this will require some downtime for conversion; based on the amount of
> data, I would estimate about 8 hours to copy the data off and back again.
>
> 	- river.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.16 (FreeBSD)
>
> iEYEARECAAYFAkzmWEkACgkQIXd7fCuc5vLqAQCguzjEGzMXZTcRfQFKKISsw0hI
> 8ggAoMDmU+HOp4VPZBIp9SBuWrdY/Ua7
> =TpV4
> -----END PGP SIGNATURE-----