Hi
On 27.07.2011 08:18, Kay Drangmeister wrote:
There have been quite a few performance tuning measures on ptolemy:
(1) the number of render processes has been reduced from 8/6 to 4
(2) Kolossos modified expire.rb to render low-zoom tiles with low probability
(3) indexes have been added to the DB for geometry, hstore and osm-id
(4) clustering
Is there a good way that we can monitor the results? Especially (1) should be carefully tracked.
This decision was made to test whether offloading the database would result in fewer render timeouts.
I can see no significant changes in IO throughput http://munin.toolserver.org/OSM/ptolemy/iostat.html or IO http://munin.toolserver.org/OSM/ptolemy/io_bytes_sd.html and not even in postgres connections http://munin.toolserver.org/OSM/ptolemy/postgres_connections_osm_mapnik.html The load and CPU usage have decreased a bit. My guess would be that more processes would result in better CPU utilization (and thus faster overall rendering).
To monitor this we need two figures: (a) average tile rendering time (per process) and (b) tiles rendered per second (by all processes). Can we set up munin to track them?
I don't think tirex allows capturing the tile throughput on a per-process basis; I guess it would need to be modified to allow that.
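To make the two figures concrete, here is a minimal sketch of how (a) and (b) could be computed once per-job data is available. The record layout (RenderRecord and its fields) is purely hypothetical; tirex does not necessarily expose data in this form.

```python
from dataclasses import dataclass


@dataclass
class RenderRecord:
    """One finished render job (hypothetical record layout)."""
    process_id: int      # which render process handled the job
    finished_at: float   # completion time, seconds since epoch
    duration: float      # wall-clock render time in seconds
    tiles: int           # tiles produced by the job (e.g. 64 for an 8x8 metatile)


def average_render_time(records):
    """(a) Mean render time per tile in seconds, keyed by process id."""
    totals = {}
    for r in records:
        seconds, tiles = totals.get(r.process_id, (0.0, 0))
        totals[r.process_id] = (seconds + r.duration, tiles + r.tiles)
    return {pid: s / t for pid, (s, t) in totals.items() if t > 0}


def tiles_per_second(records, window_start, window_end):
    """(b) Tiles rendered per second by all processes within a time window."""
    if window_end <= window_start:
        raise ValueError("window must have positive length")
    tiles = sum(r.tiles for r in records
                if window_start <= r.finished_at < window_end)
    return tiles / (window_end - window_start)
```

A munin plugin could print these values every five minutes, which would show directly whether fewer processes means fewer tiles per second overall.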
BTW: http://munin.toolserver.org/OSM/ptolemy/tirex_status_queued_requests.html has not been updated for 13 h now, how can that happen?
The whole tirex block has disappeared from the statistics. Munin is not listing the plugins anymore:
osm@ptolemy:~$ telnet localhost 4949
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
# munin node at ptolemy.esi.toolserver.org
list
apache_accesses apache_processes apache_volume cpu df if_e1000g0 io_busy_sd io_bytes_sd io_ops_sd iostat load mod_tile_fresh mod_tile_response mod_tile_zoom netstat ntp_kernel_err ntp_kernel_pll_freq ntp_kernel_pll_off ntp_offset ntp_states postfix_mailqueue postfix_mailstats postfix_mailvolume postgres_bgwriter postgres_cache_osm_mapnik postgres_checkpoints postgres_connections_db postgres_connections_osm_mapnik postgres_locks_osm_mapnik postgres_querylength_osm_mapnik postgres_scans_osm_mapnik postgres_size_osm_mapnik postgres_transactions_osm_mapnik postgres_tuples_osm_mapnik postgres_users postgres_xlog processes replication_delay2 uptime users
This seems like a munin misconfiguration. Sometimes munin-node just needs to be restarted.
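The manual telnet check could be automated roughly like this. It is only a sketch: it assumes the node sends a single banner line and answers "list" with a single line of plugin names, as in the session shown here, and the function names are made up for illustration.

```python
import socket


def fetch_plugin_list(host="localhost", port=4949, timeout=5.0):
    """Ask a munin-node for its plugin list, like typing 'list' in telnet."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="ascii")
        reader.readline()            # skip banner: "# munin node at <host>"
        sock.sendall(b"list\n")
        return reader.readline().split()


def has_plugin_group(plugins, prefix):
    """True if at least one plugin name starts with the given prefix."""
    return any(name.startswith(prefix) for name in plugins)


if __name__ == "__main__":
    plugins = fetch_plugin_list()
    if not has_plugin_group(plugins, "tirex_"):
        print("no tirex plugins reported by munin-node")
```

Run from cron, something like this would flag a disappeared plugin group much sooner than someone noticing a stale graph.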
And another question: earlier, two slots were reserved for prio 1 queue requests (i.e. missing tiles). Is a reserve currently available? Otherwise such requests would have to wait.
I just reduced the max. number of render processes by two. The configuration now looks like this:
osm@ptolemy:~$ less tirex/etc/tirex/tirex.conf
# Buckets for different priorities.
bucket name=missing minprio=1 maxproc=6 maxload=20
bucket name=dirty minprio=2 maxproc=4 maxload=8
bucket name=bulk minprio=10 maxproc=3 maxload=6
bucket name=background minprio=20 maxproc=3 maxload=4
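As a rough illustration of what this means for the prio 1 reserve, the sketch below parses the bucket lines and computes how many slots only the highest-priority bucket can use. It assumes that maxproc is a cap on the total number of running render processes below which a bucket may start new jobs; that reading is an assumption, and the helper names are hypothetical.

```python
CONF = """\
# Buckets for different priorities.
bucket name=missing minprio=1 maxproc=6 maxload=20
bucket name=dirty minprio=2 maxproc=4 maxload=8
bucket name=bulk minprio=10 maxproc=3 maxload=6
bucket name=background minprio=20 maxproc=3 maxload=4
"""


def parse_buckets(conf_text):
    """Collect 'bucket key=value ...' lines into dicts, sorted by minprio."""
    buckets = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line.startswith("bucket "):
            continue
        fields = dict(item.split("=", 1) for item in line.split()[1:])
        buckets.append({
            "name": fields["name"],
            "minprio": int(fields["minprio"]),
            "maxproc": int(fields["maxproc"]),
            "maxload": float(fields["maxload"]),
        })
    return sorted(buckets, key=lambda b: b["minprio"])


def reserved_slots(buckets):
    """Slots usable only by the highest-priority bucket (assumed semantics)."""
    if not buckets:
        return 0
    if len(buckets) == 1:
        return buckets[0]["maxproc"]
    top, rest = buckets[0], buckets[1:]
    return max(0, top["maxproc"] - max(b["maxproc"] for b in rest))


print(reserved_slots(parse_buckets(CONF)))
```

Under that assumption, the difference between the "missing" and "dirty" limits is what stays free for missing tiles.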
Peter