Hi,
As I understand it, our database/tile server (ptolemy) is higher spec than the equivalent hardware at OSM.org, yet it performs much worse (e.g. at rendering tiles). Is this correct?
If so, has anyone compared the indices on ptolemy's database to OSM's?
If that is not the problem, I would like to test performance without VxVM between the filesystem and the disk. While Vx doesn't hurt performance with MySQL, I noticed during testing that it significantly reduced import performance with Postgres. I believe that was fixed by putting pg_xlog on a separate (non-Vx) disk, but it may still be hurting read performance.
Testing this will require some downtime for conversion; based on the amount of data, I would estimate about 8 hours to copy the data off and back again.
- river.
Hello, if the downtime would also affect the delivery of existing tiles, we could switch off the map in German GeoHack for that period, because that seems to be the smallest loss for users and a lot of requests come from there.
If the downtime would only affect rendering, a downtime during the European night seems to be no big problem to me. We mostly render only dirty tiles and also have a fallback to OSM.org tiles.
I'm not sure about the comparison with osm.org, because it's difficult to compare the different renderd/tirex statistics.
Greetings Kolossos
Maps-l mailing list Maps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/maps-l
Tim Alder:
> if the downtime would also affect the delivery of existing tiles, we could switch off the map in German GeoHack for that period, because that seems to be the smallest loss for users and a lot of requests come from there.
Yes, everything would be offline during the maintenance. It would be done during early morning UTC.
- river.
I have now redirected the requests from German GeoHack to osm.org. We now get 10 requests/s, only a quarter of what we had before.
So you can start the maintenance. When you start, I will react and shut down the maps in Wikipedia, because without the Wikipedia overlay they don't make much sense, IMO.
Greetings Kolossos
Tim Alder:
> I have now redirected the requests from German GeoHack to osm.org. We now get 10 requests/s, only a quarter of what we had before.
There's no need to do this right now. I'm only discussing making a change, there's no plan to actually do anything.
- river.
Hello, my main interest was also to see the effect on the requests and the queue. We are now back to Toolserver tiles on GeoHack.
----
For the analysis I took the data from http://toolserver.org/~mazder/tirex-status/?short=0&extended=0&refre... and created some diagrams in OpenOffice Calc: http://toolserver.org/~kolossos/docs/tirex-stat02.xls
The diagrams show render time per layer, render time per zoom level and average render duration per metatile. I would say that we have no problem in z0-z10 and z14-z18, but in z11-z13 render time explodes. Half of the render time comes from the default style, and the biggest part of the other half comes from the next 10 most popular styles. The other 280 styles have nearly no influence at the moment, but of course we want localized maps at higher zoom levels in different Wikipedias, which does not seem possible with the technology we have today. It is also noticeable that special overlays like "lightning" render much faster than full styles (no wonder) and should be preferred.
Greetings Kolossos
Hi Tim,
> For the analysis I took the data from http://toolserver.org/~mazder/tirex-status/?short=0&extended=0&refre... and created some diagrams in OpenOffice Calc: http://toolserver.org/~kolossos/docs/tirex-stat02.xls
Thanks for those graphs!
> The diagrams show render time per layer, render time per zoom level and average render duration per metatile. I would say that we have no problem in z0-z10 and z14-z18, but in z11-z13 render time explodes. Half of the render time comes from the default style, and the biggest part of the other half comes from the next 10 most popular styles. The other 280 styles have nearly no influence at the moment, but of course we want localized maps at higher zoom levels in different Wikipedias, which does not seem possible with the technology we have today. It is also noticeable that special overlays like "lightning" render much faster than full styles (no wonder) and should be preferred.
On a side note: bw-noicons takes twice as long to render as osm-no-labels. Shouldn't they be more or less equal in CPU time?
[The following is only about 'full'-styles]
Looking at Zoom 12: http://www.openstreetmap.org/?lat=49.4592&lon=10.9644&zoom=12&la...
Do we really need a weekly update on that zoom? What you can see is:
- main streets (secondary and up) and trains
- forests and fields
- names of villages and cities
IMHO those things don't change often enough to let them jam the queue. Rendering them once a month would be ok.
Secondly: Is it correct that tiles never get deleted? So every tile gets rerendered according to the timetable, no matter whether a user requested it during that period or not? So maybe a deletion strategy would be handy.
Regards, Thomas
On 26.11.2010 at 09:32, Thomas Ineichen wrote:
> On a side note: bw-noicons takes twice as long to render as osm-no-labels. Shouldn't they be more or less equal in CPU time?
Yes, I also had the feeling that the bw styles take very long to render, but I have no idea why.
> [The following is only about 'full'-styles]
> Looking at Zoom 12: http://www.openstreetmap.org/?lat=49.4592&lon=10.9644&zoom=12&la...
> Do we really need a weekly update on that zoom? What you can see is:
> - main streets (secondary and up) and trains
> - forests and fields
> - names of villages and cities
> IMHO those things don't change often enough to let them jam the queue. Rendering them once a month would be ok.
The dirty plans (which zoom and when) were wrong, I see that now. I'm currently re-marking every tile in z0-12 as clean, because I want tirex to catch up again. Then we can discuss how to go on from there. Kai Krueger has found some issues with the dirty section in the load-next script as well. I don't have time to work on these things, but as osm is an MMP this should not be a problem ;)
> Secondly: Is it correct that tiles never get deleted? So every tile gets rerendered according to the timetable, no matter whether a user requested it during that period or not? So maybe a deletion strategy would be handy.
Deleting removes any fallback, so deleting a tile is never a solution. We don't render tiles according to the timetable (except z0-6) but mark them as dirty. They get rerendered the next time they are accessed via mod_tile (while mod_tile serves the old tile from disk).
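The mark-as-dirty behaviour described above can be sketched roughly as follows. This is only an illustration of the idea in Python, not mod_tile's or tirex's actual code; the `Tile` and `TileCache` names and the in-memory dictionary are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Tile:
    data: bytes
    dirty: bool = False


@dataclass
class TileCache:
    # Hypothetical in-memory stand-in for the tiles stored on disk.
    tiles: dict = field(default_factory=dict)          # (z, x, y) -> Tile
    render_queue: deque = field(default_factory=deque)

    def mark_dirty(self, key):
        """Expire a tile without deleting it, so the old image stays as fallback."""
        if key in self.tiles:
            self.tiles[key].dirty = True

    def serve(self, key):
        """Return whatever is stored; queue a re-render if it is dirty or missing."""
        tile = self.tiles.get(key)
        if (tile is None or tile.dirty) and key not in self.render_queue:
            self.render_queue.append(key)
        return tile.data if tile else None

    def finish_render(self, key, data):
        """Store the fresh tile and drop the job from the queue."""
        self.tiles[key] = Tile(data=data, dirty=False)
        if key in self.render_queue:
            self.render_queue.remove(key)
```

A dirty tile that nobody requests never enters the queue, which is why marking is cheaper than re-rendering on a timetable, while a deleted tile would leave nothing to serve in the meantime.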
Peter
There's another thing going on here.
master_rendering_timeout was set to 10 minutes, so rendering jobs that took 11 minutes were killed and restarted so they never finished. I increased this to 15 minutes now, restarted tirex and added the z0-6 tiles for all styles, just to see how long they'll take to render.
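Why such jobs never finish can be shown with a small simulation. It is a sketch under the assumption that a killed job restarts from scratch and takes the same time on every attempt; the function name and the `max_attempts` cut-off are made up for the example and are not tirex settings.

```python
def simulate_render(render_minutes, timeout_minutes, max_attempts=3):
    """Simulate a job that is killed and restarted whenever it exceeds the timeout."""
    wasted = 0
    for attempt in range(1, max_attempts + 1):
        if render_minutes <= timeout_minutes:
            return {"finished": True, "attempts": attempt, "wasted_minutes": wasted}
        # The job is killed at the timeout and all work done so far is lost.
        wasted += timeout_minutes
    return {"finished": False, "attempts": max_attempts, "wasted_minutes": wasted}


print(simulate_render(11, 10))  # never finishes, only burns render time
print(simulate_render(11, 15))  # finishes on the first attempt
```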
Peter
Hi Peter,
> master_rendering_timeout was set to 10 minutes, so rendering jobs that took 11 minutes were killed and restarted so they never finished. I increased this to 15 minutes now, restarted tirex and added the z0-6 tiles for all styles, just to see how long they'll take to render.
I see that right now it's rendering my qa- and qai-styles. I changed them some days ago to not render anything from z0-z9 (maybe I'll even put that up to z13 or something) - so if you reload the styles from ~ti/styles, the queue should be a lot shorter/faster.
Regards, Thomas
On 26.11.2010 at 10:14, Peter Körner osm-lists@mazdermind.de wrote:
> Yes, I also had the feeling that the bw styles take very long to render, but I have no idea why.
bw-mapnik is (assuming it is bug-free) 100% identical to the mapnik map except for the colour attributes, so I would expect an identical render time.
bw-noicons just omits some symbols, so for low zoom tiles without symbols it is also identical to mapnik.
Kay
Peter Körner wrote:
> I'm currently re-marking every tile in z0-12 as clean, because I want tirex to catch up again. Then we can discuss how to go on from there.
Our queue is now empty. Could we reactivate expire-on-change for z13-18 so that we don't get too many old tiles? I have requested membership in the OSM MMP, so perhaps I can help a little later on.
Greeting Kolossos
Hi Tim,
I have now activated expire-on-change for z13-18. I also added some improvements made by Kai Krueger. I'll comment on your JIRA ticket soon.
Peter
Hi all,
On 26.11.2010 at 09:32, Thomas Ineichen toolserver.mailinglist@t-i.ch wrote:
> On a side note: bw-noicons takes twice as long to render as osm-no-labels. Shouldn't they be more or less equal in CPU time?
The b/w styles do show labels. Both show at least street names; the osm-no-labels style does not.
However, I noticed a bug: bw-noicons also renders the text labels of shops, which should be omitted just as the icons are.
Regards, Kay
Thomas Ineichen wrote:
> Looking at Zoom 12: http://www.openstreetmap.org/?lat=49.4592&lon=10.9644&zoom=12&la...
> Do we really need a weekly update on that zoom? What you can see is:
> - main streets (secondary and up) and trains
> - forests and fields
> - names of villages and cities
> IMHO those things don't change often enough to let them jam the queue. Rendering them once a month would be ok.
I think in countries like Germany there will rarely be such large changes at this zoom level, but I believe in other areas it's still possible to be the first mapper of a motorway. We should keep these differences in mind, but I believe a monthly update should be enough for these zoom levels. We can adjust it so that the server has enough to do. As for the database, I'm glad that we have updates every minute, because we also use this database for other tools.
> Secondly: Is it correct that tiles never get deleted? So every tile gets rerendered according to the timetable, no matter whether a user requested it during that period or not? So maybe a deletion strategy would be handy.
It's true that we are not deleting tiles at the moment. That only seems necessary once the hard disks are nearly full (at the moment we use 68% of the /osm space[1]), so we have some months to think about a good strategy. If necessary, I would delete rarely used z16-18 tiles (reasons: they are cheap to render, there are a lot of them, and it is improbable that somebody else needs them).
Tim Alder:
> It's true that we are not deleting tiles at the moment. That only seems necessary once the hard disks are nearly full (at the moment we use 68% of the /osm space[1]), so we have some months to think about a good strategy.
There is still 600GB allocated to /sql, of which only 214GB is used. So there is ~424GB of free space which can be allocated to tiles or SQL as needed.
- river.
Tim Alder wrote:
> Hello, if the downtime would also affect the delivery of existing tiles, we could switch off the map in German GeoHack for that period, because that seems to be the smallest loss for users and a lot of requests come from there.
> If the downtime would only affect rendering, a downtime during the European night seems to be no big problem to me. We mostly render only dirty tiles and also have a fallback to OSM.org tiles.
> I'm not sure about the comparison with osm.org, because it's difficult to compare the different renderd/tirex statistics.
> Greetings Kolossos
Are you talking about JavaScript on Wikipedia? Note that scripts are cached, so changes aren't immediately noticed by all clients.
Platonides wrote:
> Are you talking about JavaScript on Wikipedia? Note that scripts are cached, so changes aren't immediately noticed by all clients.
I'm talking about: http://de.wikipedia.org/wiki/Hilfe:OpenStreetMap/en It's embedded with JavaScript in MediaWiki, but I can react with a PHP script behind the loaded iframe, so it would work immediately.
Greeting Kolossos
On 19/11/10 11:58, River Tarnell wrote:
> Hi,
> As I understand it, our database/tile server (ptolemy) is higher spec than the equivalent hardware at OSM.org, yet it performs much worse (e.g. at rendering tiles). Is this correct?
Yes, ptolemy's specs are likely to be higher than yevaud's (the osm.org tile server), especially the disk performance ( http://wiki.openstreetmap.org/wiki/Servers/yevaud ). Yevaud was recently upgraded from 24 GB to 48 GB of RAM, which did have a positive effect, as disk performance for both tile serving and the database appeared to have become a bottleneck, but even with 24 GB it did fairly well.
In some sense, yes, ptolemy is performing much worse than yevaud, in that ptolemy manages to render only somewhere between 0 and 50 or so metatiles per minute, whereas yevaud achieves about 3 - 6 metatiles per second. However, the big question is: is it doing something comparable? I.e. is the problem that the OS / DB isn't tuned optimally, or is ptolemy simply doing something much harder?
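To put the two figures on the same scale, here is a small back-of-the-envelope calculation in Python. It uses only the rough numbers quoted above (0 to 50 metatiles per minute versus 3 to 6 per second); the variable names are just for the example.

```python
SECONDS_PER_MINUTE = 60

ptolemy_per_minute = (0, 50)   # rough range read off tirex status
yevaud_per_second = (3, 6)     # rough range for the osm.org tile server

# Convert yevaud's rate to metatiles per minute so both use the same unit.
yevaud_per_minute = tuple(rate * SECONDS_PER_MINUTE for rate in yevaud_per_second)

# Compare against ptolemy's best case (50 per minute).
ratio_low = yevaud_per_minute[0] / ptolemy_per_minute[1]
ratio_high = yevaud_per_minute[1] / ptolemy_per_minute[1]

print(yevaud_per_minute)       # (180, 360)
print(ratio_low, ratio_high)   # 3.6 7.2
```

So even against ptolemy's best minute, yevaud's throughput is roughly 3.6 to 7.2 times higher, which says nothing yet about whether the two machines are rendering comparable tiles.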
Another comparison might be the OpenCycleMap server ( http://tile.opencyclemap.org/munin/ ), which despite having SSDs for its db, also only achieves about 1 metatile/s and can't really keep up, potentially due to a more complex stylesheet.
> If so, has anyone compared the indices on ptolemy's database to OSM's?
I don't know for sure, but I am reasonably certain that yevaud's database has no additional indices beyond what osm2pgsql creates. So I don't think indices are the problem, if the workload is comparable.
One thing I do vaguely remember Jon mentioning is that he once experimented with CLUSTERing on the geometry index, which physically moves data around to be aligned with the index and thus attempts to reduce seeking on range queries like bounding box requests. I also vaguely remember, though, that he said it didn't help all that much, but I don't know any details or whether that is still the case.
The biggest impact though appears to be the distribution of low zoom to high zoom tiles and the style sheet used.
Whereas Z18 tiles are rendered in about a second, Z7 tiles can take more than 10 minutes on ptolemy according to tirex status. I don't have the numbers for yevaud, but this seems about the same too (perhaps 20 - 30% faster at most), at least by judging from /dirty a tile and seeing at what point the /status updates.
Ptolemy is rendering a lot more low zoom tiles than yevaud it seems from looking at tirex status. Osm.org currently basically never renders tiles for zooms lower than Z11, perhaps only once every couple of months on a full new db import, whereas ptolemy is currently occupied with lowzoom tiles a lot of the time. This can either be because the expiry policy of lowzoom tiles is still more aggressive on ptolemy, or simply as there are so many more style sheets, which each need the low zoom tiles rendered.
Equally, different styles can make quite a big difference, as a single "carelessly" thrown in feature or layer can slow down the db queries a lot. So perhaps the various other style sheets rendered on ptolemy aren't as optimised as the main osm.org one?
Therefore I don't think it is necessarily obvious that ptolemy is actually performing worse on the db level, although it may well be the case. However, I also don't really know how best to determine this. Perhaps suppress low zoom rendering altogether for a while to see how much this helps?
It also might be useful to log all the slow PostgreSQL queries (e.g. those that take more than 20 seconds to execute). This might point to a few optimisations in the style sheets, to get the low zoom tiles out of the way faster.
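One possible way to do this, assuming PostgreSQL's `log_min_duration_statement` setting is enabled (it takes a threshold in milliseconds, so 20000 for 20 seconds), is to pull the slow statements out of the server log afterwards. The sketch below is in Python and assumes log lines containing `duration: ... ms  statement: ...`; the exact line prefix depends on the logging configuration, and the sample lines are made up for illustration.

```python
import re

# Matches the duration/statement part of a PostgreSQL log line.
DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms\s+statement: (.*)")


def slow_statements(log_lines, threshold_ms=20_000):
    """Return (duration_ms, statement) pairs at or above the threshold, slowest first."""
    hits = []
    for line in log_lines:
        match = DURATION_RE.search(line)
        if match and float(match.group(1)) >= threshold_ms:
            hits.append((float(match.group(1)), match.group(2).strip()))
    return sorted(hits, reverse=True)


# Hypothetical sample lines, not taken from ptolemy's logs.
sample = [
    "LOG:  duration: 25012.500 ms  statement: SELECT way FROM planet_osm_line",
    "LOG:  duration: 310.250 ms  statement: SELECT way FROM planet_osm_point",
    "LOG:  duration: 61000.000 ms  statement: SELECT way FROM planet_osm_polygon",
]
for duration_ms, statement in slow_statements(sample):
    print(f"{duration_ms / 1000:.1f} s  {statement}")
```

Grouping the hits by the style that issued them would be the next step, but that requires knowing how each style's queries can be told apart, which the log alone may not show.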
Another question might be: is it actually necessary to have a minutely up-to-date database on ptolemy if it can't really keep up with rendering anyway? Would it perhaps be sufficient to use daily diffs for the purposes of Wikipedia? This might help reduce the load of the actual db import process, as well as potentially limit the tile expiry and rerender to once a day.
Kai
Hi,
> In some sense, yes, ptolemy is performing much worse than yevaud, in that ptolemy manages to render only somewhere between 0 and 50 or so metatiles per minute, whereas yevaud achieves about 3 - 6 metatiles per second. However, the big question is: is it doing something comparable? I.e. is the problem that the OS / DB isn't tuned optimally, or is ptolemy simply doing something much harder?
Looking at the Tirex-statistics for ptolemy the following question comes to my mind:
Does Tirex really render meta-tiles?
It rather looks like every tile (256x256) is rendered on its own. According to Frederik, yevaud renders a metatile of 8x8 'normal' tiles per request. If so, that could explain a lot of the difference in rendering output.
Regards, Thomas
I'm sure that this is not the case. If you look at the x and y values in the tirex status, you can see that they are divisible by 8, so they are metatiles. Our problem is rather that we too often need 5-10 minutes per metatile.
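The divisible-by-8 check can be written down as a short Python sketch. It assumes 8x8 metatiles as mentioned earlier in the thread; the function names are invented for the example and are not part of tirex.

```python
METATILE_SIZE = 8  # tiles per metatile side, as discussed in this thread


def is_metatile_origin(x, y, size=METATILE_SIZE):
    """True if (x, y) is the top-left tile of a metatile."""
    return x % size == 0 and y % size == 0


def metatile_origin(x, y, size=METATILE_SIZE):
    """Top-left tile coordinates of the metatile that contains tile (x, y)."""
    return (x - x % size, y - y % size)


def tiles_in_metatile(z, x, y, size=METATILE_SIZE):
    """All tile coordinates covered by the metatile containing (x, y) at zoom z."""
    # At very low zooms the whole world is smaller than one metatile.
    side = min(size, 2 ** z)
    origin_x, origin_y = metatile_origin(x, y, size)
    return [(origin_x + dx, origin_y + dy) for dy in range(side) for dx in range(side)]


print(is_metatile_origin(2168, 1400))        # True
print(metatile_origin(2171, 1403))           # (2168, 1400)
print(len(tiles_in_metatile(12, 2171, 1403)))  # 64
```

If the status page only ever lists coordinates for which `is_metatile_origin` is true, each listed job covers up to 64 ordinary tiles rather than one.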
Greetings Kolossos
I'm sorry I haven't had any time for these and many other things, but I have a lot to do and it will stay that way for some weeks.
I disabled the tile expiry so that the server can catch up with the tile queues. Our expiry plan was as follows:
- re-render z0-6 on the 1st of each month
- expire z7-9 on the 1st of each month
- expire z10 on the 8th and 22nd of each month
- expire z11 each Sunday at 20:00
- expire z12 each Sunday at 20:00
- expire z13-18 during the database import as the data changes
I disabled everything now. All of this can be managed in the crontab of the osm project on ptolemy except the expire-on-change which is controlled by a section in /home/project/o/s/m/osm/tools/diff-import/load-next (look for "expiring tiles").
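For discussing alternatives, the time-based part of the old plan can be expressed as a small Python function. This is only a sketch of the rules listed above, not the actual crontab; it assumes the monthly rules apply to the whole day, since no time of day was given for them, and it leaves out z13-18 because those are expired by the diff import rather than by the calendar.

```python
from datetime import datetime


def scheduled_actions(now):
    """Return the zoom levels the old plan would re-render or expire at `now`."""
    actions = {"rerender": [], "expire": []}
    if now.day == 1:
        actions["rerender"] += list(range(0, 7))   # z0-6
        actions["expire"] += [7, 8, 9]             # z7-9
    if now.day in (8, 22):
        actions["expire"].append(10)               # z10
    # weekday() == 6 is Sunday; the plan named 20:00 for z11 and z12.
    if now.weekday() == 6 and now.hour == 20:
        actions["expire"] += [11, 12]
    return actions


print(scheduled_actions(datetime(2010, 11, 1, 3)))    # monthly re-render and expiry
print(scheduled_actions(datetime(2010, 11, 28, 20)))  # Sunday evening expiry
```

Changing the weekly rule for z11 and z12 to a monthly one, as suggested elsewhere in the thread, would only mean moving those two zoom levels into the `now.day == 1` branch.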
All these rules are just guesses and it seems they were not so good, so if anyone has a better plan, please post it.
Peter