On 02/11/2010 09:07 AM, Peter Körner wrote:
Kai Krueger wrote:
Hi,
On 02/08/2010 11:34 AM, Peter Körner wrote:
River Tarnell wrote:
Peter Körner:
> I used 2GB of RAM as cache (we used 10GB on cassini) -- I hope this
> is OK on these public machines.
If you use more than 1GB, you will probably find that slayerd kills your
process.
I re-started the process with 1GB cache.
It looks like this is not entirely ideal.
With 1GB the process was killed, so I had to run it with 512 MB. ^^
Although I don't know what the optimal cache size is, 1GB seems far too
little. My guess would be more like 10-15 GB for the full planet. On the
OSM tile server an import takes about 8-15 hours if I am not mistaken,
whereas the current import on ptolemy has been running for nearly 3 days,
despite the fact that ptolemy is probably a faster db server than the OSM
tile server, which only has four disks.
But we are importing 556 additional tags (name:xx and wikipedia:xx for
each of the 278 Wikipedia languages). That's the main reason the import
takes so much longer. On cassini, where we only imported the name:xx
tags and had a 10GB cache, it took around 2 days.
Although I don't have much evidence to base this on, my guess would be
that the extra fields don't add much time, as simply writing database
fields to disk doesn't seem like the bottleneck. The problem, I think, is
building the geometry of ways and relations. As the OSM data only has
node references in the ways section, for each node in a way osm2pgsql
needs to query the database to retrieve the node's lat/lon pair in order
to build the linestring. This is where the osm2pgsql cache comes in: it
stores this information very efficiently and saves a lot of db access,
provided the hit ratio is reasonable. With the db on a different
machine, the extra latency won't help either. With 500 MB, the hit ratio
is probably fairly low, as even for much smaller extracts such as the UK
you need 1-2GB of cache to achieve reasonable performance.
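To make the cache argument concrete, here is a minimal Python sketch of the idea: nodes are cached by id while the nodes section is read, and each way's node references are then resolved through the cache, falling back to a (slow, possibly remote) database lookup on a miss. This is a hypothetical illustration, not osm2pgsql's actual cache, which is a different C++ data structure; the names `NodeCache`, `build_linestring` and `simulate`, the LRU eviction policy, and the toy data (1000 nodes, ways of 10 consecutive nodes) are all assumptions made up for the example, and capacity is counted in nodes rather than bytes.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Tuple

Coord = Tuple[float, float]  # (lon, lat)


class NodeCache:
    """Hypothetical bounded LRU cache of node id -> (lon, lat).

    Stand-in for illustration only; osm2pgsql's real node cache works
    differently. Capacity is a node count, not a memory size.
    """

    def __init__(self, capacity: int, db_lookup: Callable[[int], Coord]):
        self.capacity = capacity
        self.db_lookup = db_lookup  # slow path: one db query per miss
        self._store: "OrderedDict[int, Coord]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def put(self, node_id: int, coord: Coord) -> None:
        # Called while reading the nodes section, and after each miss.
        self._store[node_id] = coord
        self._store.move_to_end(node_id)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

    def get(self, node_id: int) -> Coord:
        if node_id in self._store:
            self.hits += 1
            self._store.move_to_end(node_id)
            return self._store[node_id]
        self.misses += 1
        coord = self.db_lookup(node_id)  # cache miss: ask the database
        self.put(node_id, coord)
        return coord

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def build_linestring(node_refs: List[int], cache: NodeCache) -> List[Coord]:
    # Ways only carry node ids, so every ref needs a coordinate lookup.
    return [cache.get(ref) for ref in node_refs]


def simulate(capacity: int) -> float:
    # Toy "database" of 1000 nodes (hypothetical data).
    db: Dict[int, Coord] = {i: (i * 0.001, i * 0.002) for i in range(1000)}
    cache = NodeCache(capacity, db.__getitem__)
    for node_id, coord in db.items():  # nodes section
        cache.put(node_id, coord)
    # Ways section: hypothetical ways of 10 consecutive nodes each.
    for start in range(0, 1000, 10):
        build_linestring(list(range(start, start + 10)), cache)
    return cache.hit_ratio()


if __name__ == "__main__":
    print(simulate(1000), simulate(100))  # prints "1.0 0.0"
```

With room for every node, all way lookups are hits; with a cache holding only 10% of the nodes, this sequential access pattern makes the LRU cache thrash and every lookup goes to the database. Real data is less extreme, but it shows why an undersized cache can turn the import into a stream of db round trips, which hurts even more when the db is on another machine.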
How far along is it? osm2pgsql should report the number of nodes, ways
and relations it has processed so far. If it is nearly done, then we
don't have to worry too much about this.
Kai
Peter