Hello, the re-rendering has now reached the point z-curve=0.25, so we are now rendering the emptiness of northern Norway. But this has no visible effect on the rendering speed. The rendering performance correlates very well with the queue length. I have now found the parameter -num in tirex-batch to limit the queue length. I will restart the process for tiles with z>0.25 and limit the queue to a value of around 5,000.
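In case it helps to see the intent, here is a minimal sketch of such a limited batch submission, assuming tirex-batch accepts the -num option mentioned above together with key=value job filters; the map name and the zoom selection below are only placeholders and may differ from the real invocation:

    import subprocess

    # Hypothetical wrapper around tirex-batch. Only the -num option is taken
    # from the mail above; the map name and the z filter syntax are
    # illustrative placeholders, not the real values used on ptolemy.
    def submit_rerender(map_name="osm", zoom="0-18", max_queue=5000):
        subprocess.check_call([
            "tirex-batch",
            "-num", str(max_queue),   # do not grow the queue beyond this size
            "map=%s" % map_name,      # placeholder map/style name
            "z=%s" % zoom,            # placeholder zoom selection
        ])

    submit_rerender()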
"vacuum full" is now running more than 1 day. If it's not done tomorrow, I believe I will stop it. Ok?
Greetings Kolossos
Hi all,
On 08.09.2011, 17:18, Tim Alder tim.alder@s2002.tu-chemnitz.de wrote:
The rendering performance correlates very well with the queue length.
Do you mean this graph: http://munin.toolserver.org/OSM/ptolemy/tirex_status_queued_requests.html
I have now found the parameter -num in tirex-batch to limit the queue length. I will restart the process for tiles with z>0.25 and limit the queue to a value of around 5,000.
I fail to see the point (sorry if my tone is provocative, I really don't intend to be rude). Currently the only effect I see is that my changes are not being rendered. This is probably because the queue does not respect first-come-first-served order, otherwise the max tile age would not be so high. And because of this, it makes no difference to me whether there are a constant 5,000 entries in the queue or a growing 558,879 (as of now).
How about limiting it to *1* (or maybe 2) in order to give other tiles a chance, too?
Why do you fill the queue quicker than it can be rendered? Just adding a delay to slow it down would not harm, would it?
Sorry for whining, I just want *some* tiles rendered! :-))) (And no, it has nothing to do with the replag, which is also horrible: http://munin.toolserver.org/OSM/ptolemy/replication_delay2.html)
The real solution would of course be to fill a separate (background) queue with the automatic re-rendering and give missing and dirty tiles a higher priority. And manually dirty-marked tiles should have the same priority as missing ones, btw...
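To make the idea concrete, here is a rough sketch of such a scheme in Python (purely illustrative, not how tirex is actually configured): missing and manually dirtied tiles share the top priority, and the automatic background re-render sits behind everything else.

    import heapq
    import itertools

    # Illustrative priorities for the proposed scheme; the names and values
    # are invented for this sketch and are not tirex's real configuration.
    PRIO_MISSING      = 0
    PRIO_DIRTY_MANUAL = 0   # same priority as missing, as suggested above
    PRIO_DIRTY_AUTO   = 5
    PRIO_BACKGROUND   = 9   # systematic full re-render

    class RenderQueue:
        """Priority queue that keeps FIFO order within one priority level."""
        def __init__(self):
            self._heap = []
            self._counter = itertools.count()

        def push(self, tile, prio):
            heapq.heappush(self._heap, (prio, next(self._counter), tile))

        def pop(self):
            prio, _, tile = heapq.heappop(self._heap)
            return tile

    q = RenderQueue()
    q.push(("osm", 9, 264, 160), PRIO_BACKGROUND)    # background re-render job
    q.push(("osm", 16, 34567, 21345), PRIO_MISSING)  # tile a viewer is waiting for
    print(q.pop())  # -> the missing tile comes out first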
Greetings, Kay
Hello Kay, additionally take a look at the monthly render curve: http://munin.toolserver.org/OSM/ptolemy/tirex_status_requests_rendered.html
It's a continuously decreasing curve, like a perfect e-function. I also have no idea why the rendering performance should depend on the queue length, but that is the only correlation I see. Or why would the rendering of tiles in Alaska be much faster than in Norway?
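For what it's worth, the "perfect e-function" impression can be checked numerically: an exponential decay becomes a straight line in log space. The throughput samples below are made up; only the 120 and 30 tiles/min figures come from this thread.

    import math

    # Made-up daily throughput samples (tiles/min); only the 120 -> 30 range
    # is taken from the discussion. Exponential decay r(t) = r0*exp(-k*t)
    # means ln r(t) is linear in t, so a least-squares line in log space
    # gives the decay constant directly.
    days  = [0, 5, 10, 15, 20]
    rates = [120.0, 85.0, 60.0, 42.0, 30.0]

    n = len(days)
    ys = [math.log(r) for r in rates]
    x_mean = sum(days) / n
    y_mean = sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(days, ys)) \
            / sum((x - x_mean) ** 2 for x in days)
    print("decay constant: %.3f per day" % -slope)
    print("render-rate half-life: %.1f days" % (math.log(2) / -slope))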
Compared with last year's performance of 15 tiles/min, the current performance of 30 is not bad, but 120 tiles/min, like at the start of the process, would be much better. (Also for your tiles.)
120 tiles/min would meet the performance we should expect from a high-performance server like ptolemy.
As explained before, we are now doing a full re-rendering; after this we can re-render only expired tiles, which should be much faster (<1 day).
Greetings Kolossos
Hi Kolossos,
http://munin.toolserver.org/OSM/ptolemy/tirex_status_queued_requests.html
What is filling the "dirty" queue? The re-render should only fill the "systema" queue, shouldn't it? Currently the "dirty" queue is 11k entries.
Next question, what is filling the bulk queue (4k entries)? :-)
http://toolserver.org/~mazder/tirex-status/?short=1&extended=0&refre...
Kind regards, Kay
Hi,
http://toolserver.org/~mazder/tirex-status/?short=1&extended=0&refre...
how is the age being calculated in the queues?
Queue:
  Prio   Size    Maxsize   Age
  1          0       146
  2        497      5138   0:00-3:21
  4      11173     11173   1064:46-950:48
  10      4108      4135   38:50-1126:05
  all    15778     15836   0:00-1126:05
(i.e. how can it be that (prio 4) the newest tile is older than the oldest?) :)
Cheers, Kay
On 10.09.2011 01:44, Kay Drangmeister wrote:
Hi Kolossos,
http://munin.toolserver.org/OSM/ptolemy/tirex_status_queued_requests.html
What is filling the "dirty" queue? The re-render should only fill the "systema" queue, shouldn't it? Currently the "dirty" queue is 11k entries.
mod_tile is still running and filling the queue. I restarted tirex an hour ago and set the prio of these dirty files to 99. I had hoped that tirex would then ignore them, but that doesn't work.
We are now rendering high-density areas around Amsterdam and running into timeouts with many styles at zoom level 9 (x=264, y=160). I will think about removing zoom level 9 from my list. Other ideas?
What's not good is that at the moment we have 70 seq. scans and 12 index scans.
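For reference, a small sketch of how those counters can be read from PostgreSQL's statistics views (pg_stat_user_tables is available in 8.3); the connection string is a placeholder and would need adjusting for the database on ptolemy.

    import psycopg2

    # Read sequential- vs. index-scan counters from the statistics collector.
    # The DSN is a placeholder for the actual rendering database.
    conn = psycopg2.connect("dbname=osm_mapnik host=localhost")
    cur = conn.cursor()
    cur.execute("""
        SELECT relname, seq_scan, idx_scan
          FROM pg_stat_user_tables
         ORDER BY seq_scan DESC
    """)
    for relname, seq_scan, idx_scan in cur.fetchall():
        print("%-30s seq_scan=%-10s idx_scan=%s" % (relname, seq_scan, idx_scan))
    cur.close()
    conn.close()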
"vacuum full" was broken after 2.5 day. I hope somebody else could install the postgres_upgrade_fix.
Next question, what is filling the bulk queue (4k entries)? :-)
I don't know.
how is the age being calculated in the queues?
I don't know. Is this important?
Greetings Kolossos
On 9/11/11 3:49 AM, Tim Alder wrote:
What's not good is that at the moment we have 70 seq. scans and 12 index scans.
"vacuum full" broke after 2.5 days. I hope somebody else could install the postgres_upgrade_fix.
That's not so good either.
If I understood it correctly the postgres_upgrade_fix is only relevant if one updates postgresql. However, as far as I know, postgresql wasn't upgraded and it is still running 8.3. So I doubt this will fix anything.
So the question is how do we fix the database corruption? Is it necessary to do a full reimport?
In the meantime, until the corruption is fixed, can we turn off diff-imports? They are always erroring anyway and are adding load to the db that could probably be better spent on rendering.
Kai
I stopped load-import.
Tim
On 12.09.2011 18:29, Kai Krueger wrote:
In the meantime, until the corruption is fixed, can we turn off diff-imports? They are always erroring anyway and are adding load to the db that could probably be better spent on rendering.
On 12.09.2011 18:29, Kai Krueger wrote:
I hope somebody else could install the postgres_upgrade_fix.
That's not so good either. If I understood it correctly the postgres_upgrade_fix is only relevant if one updates postgresql. However, as far as I know, postgresql wasn't upgraded and it is still running 8.3. So I doubt this will fix anything.
I had asked before. So if no upgrade happened, this can't be the cause of the corruption. The script would also only prevent it from getting corrupted. If we don't have a backup, then it sounds like the db needs to be rebuilt.
BTW: PostgreSQL 9.1 was just released... and the osm2pgsql main development branch no longer supports intarray. That migration also required a re-import.
Is this a hint that we should do a big upgrade? We could wait for the rendering to complete, so we still have some tiles to deliver while the db rebuilds for a day or two.
Stephan
Couldn't we install 9.1 and the new osm2pgsql in parallel, do a fresh import there and then switch over? Or isn't there enough disk space?
Julian
If we keep only the four old tables we need for rendering, we should have enough space for the upgrade, but it would slow down the process. It would be nice if my db for Wikipedia POIs could keep running during the process.
A big upgrade would also have the benefit of giving us a fresh DB. I hear from various sides that running the update process over a long time leaves undefined rubbish in the DB and increases its size.
So I could live with an upgrade.
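If it helps with the disk-space question, here is a hedged sketch of how one could check what those tables occupy, assuming the four rendering tables are the usual osm2pgsql ones (planet_osm_point/line/roads/polygon); table names and the connection string are placeholders.

    import psycopg2

    # Size of the four osm2pgsql rendering tables (default names; they may be
    # called something else on ptolemy). The DSN is a placeholder.
    TABLES = ["planet_osm_point", "planet_osm_line",
              "planet_osm_roads", "planet_osm_polygon"]

    conn = psycopg2.connect("dbname=osm_mapnik host=localhost")
    cur = conn.cursor()
    for t in TABLES:
        cur.execute("SELECT pg_size_pretty(pg_total_relation_size(%s))", (t,))
        print("%-20s %s" % (t, cur.fetchone()[0]))
    cur.close()
    conn.close()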
Greetings Kolossos
On 09/08/2011 09:18 AM, Tim Alder wrote:
Hello, the re-rendering has now reached the point z-curve=0.25, so we are now rendering the emptiness of northern Norway. But this has no visible effect on the rendering speed. The rendering performance correlates very well with the queue length. I have now found the parameter -num in tirex-batch to limit the queue length. I will restart the process for tiles with z>0.25 and limit the queue to a value of around 5,000.
I am not sure that the length of the queue is really to blame for the slowdown. The good correlation is probably simply because both the growth of the queue length and the slowdown of rendering are exponential functions of time.
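A quick synthetic illustration of that point: two series that are both exponential in time show a strong (negative) correlation with each other even if neither causes the other. All numbers below are made up.

    import math

    # Synthetic series: queue length grows exponentially, render rate decays
    # exponentially; they correlate strongly simply because both are
    # exponential functions of time, not because one causes the other.
    t = range(30)                                    # days
    queue = [5000 * math.exp(0.08 * d) for d in t]   # growing backlog
    rate  = [120  * math.exp(-0.07 * d) for d in t]  # falling tiles/min

    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var_x = sum((x - mx) ** 2 for x in xs)
        var_y = sum((y - my) ** 2 for y in ys)
        return cov / math.sqrt(var_x * var_y)

    print("correlation(queue length, render rate) = %.2f" % pearson(queue, rate))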
Renderd (rather than tirex) also limits the queue length (in that case to 1000 metatiles). Its original reason for doing this was, as far as I know, that queueing and dequeueing were O(n) operations (a simple linear list), so with ever-growing lists this time could have become significant. By now it uses a better algorithm and this limitation should no longer be a problem. Although I don't know what data structure tirex actually uses for its queues, I suspect it isn't a simple linear list, so this shouldn't be a problem.
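To make the O(n) point concrete, here is a generic comparison (not tirex's or renderd's actual code): popping from the front of a plain Python list shifts every remaining element, while a deque or a heap does not.

    import collections
    import heapq
    import timeit

    N = 50000  # queued render requests in this toy example

    def drain_list():
        q = list(range(N))
        while q:
            q.pop(0)          # O(n): every remaining element is shifted

    def drain_deque():
        q = collections.deque(range(N))
        while q:
            q.popleft()       # O(1)

    def drain_heap():
        q = list(range(N))
        heapq.heapify(q)
        while q:
            heapq.heappop(q)  # O(log n), and yields a priority order for free

    for fn in (drain_list, drain_deque, drain_heap):
        print(fn.__name__, "%.2f s" % timeit.timeit(fn, number=1))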
The other reason why renderd limits its queue is that it uses a simple FIFO (first in, first out) rendering strategy. A lot of tiles are likely to get requested for re-rendering and then not be viewed again for a very long time, or ever. Other tiles, however, get viewed frequently. With an unlimited FIFO strategy, the render requests for frequently viewed tiles get stuck behind all those renderings that will never be viewed again. With a limit, entering the queue becomes probabilistic: if a tile gets viewed multiple times, its chance of entering the queue increases, giving somewhat of the effect of a priority queue for frequently viewed tiles. As tirex has a proper priority queue, the length of the queue doesn't matter as much.
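A small sketch of that bounded-FIFO effect (a generic illustration, not renderd's implementation): when the queue is full a request is simply dropped, but a tile that keeps being viewed keeps getting new chances to enter, so over time it behaves a bit like a high-priority tile, while most one-off churn is shed.

    import collections
    import random

    random.seed(1)
    LIMIT = 20                       # tiny queue limit, just for the illustration
    fifo, queued = collections.deque(), set()

    def request(tile):
        """Bounded FIFO: a request is dropped if the queue is already full."""
        if tile in queued or len(fifo) >= LIMIT:
            return False             # lost -- but a popular tile will be asked for again
        fifo.append(tile)
        queued.add(tile)
        return True

    popular = "popular-tile"
    popular_accepted = 0
    oneoff_accepted = oneoff_total = 0

    for step in range(5000):
        # per step: a burst of one-off expiry requests plus one view of the
        # popular tile, in random order, then exactly one tile gets rendered
        burst = ["oneoff-%d-%d" % (step, i) for i in range(4)] + [popular]
        random.shuffle(burst)
        for t in burst:
            ok = request(t)
            if t == popular:
                popular_accepted += ok
            else:
                oneoff_total += 1
                oneoff_accepted += ok
        if fifo:                                   # rendering is slower than requests
            queued.discard(fifo.popleft())

    print("one-off requests accepted: %.1f%%" % (100.0 * oneoff_accepted / oneoff_total))
    print("popular tile re-entered the queue %d times" % popular_accepted)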
In neither of the cases should this really affect rendering speed.
I still suspect it has more to do with the priority queue of tirex losing its systematic ordering over time, which slows things down. In that case limiting the queue would probably help, because you then inject the tiles fresh all the time, giving less opportunity to lose the systematic ordering.
"vacuum full" is now running more than 1 day. If it's not done tomorrow, I believe I will stop it. Ok?
If those error messages are indeed coming from database corruption, then we need to fix the database. Given that the alternative to a "vacuum full" (assuming it works, which it may not, given the potential corruption of the database) is basically a full reimport of the database, which last time took more than a few days, I think it would be good to give vacuum full a chance for a little longer.