Hi Folks,
i started a planet import with the additional hstore column. It allows us to retrive and search in all tags used in osm. I used a combined hstore/column scheme, as our tests on the german devserver showed that hstore-value-conditions are much slower than the column-conditions of the classic database layout.
If the import is over, i'll come back with some sample queries.
The import is running on ptolemy using the following command:
$HOME/src/osm2pgsql/osm2pgsql \ --slim \ --cache 3000 \ --merc \ --prefix planet \ --style $HOME/src/osm2pgsql/default.style \ --host sql-mapnik \ --database osm_mapnik \ --hstore \ --extra-attributes \ $HOME/data/planet-100505.osm.bz2 2>&1 | \ tee $HOME/planet_osm_hstore_import.log
I'm planning to switch the ddiff update process to this database after the import and to drop the old, classic database.
Peter
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Peter Körner:
i started a planet import with the additional hstore column.
Do you expect to need more disk space for this?
ptolemy% df -h /sql Filesystem size used avail capacity Mounted on /dev/vx/dsk/disks/sql 300G 151G 140G 52% /sql
- river.
Am 09.05.2010 18:30, schrieb River Tarnell:
Do you expect to need more disk space for this?
It seems, the import takes more memory than expected: /dev/vx/dsk/disks/sql 300G 251G 47G 85% /sql
I stopped the import as I don't want to fill up the HD. I hope, that postgres will free up some space soon.
I think the hstore import is more useful then the current database scheme and it is also compatible with the default osm styles. Under this circumstaces I'd suggest dropping the current tables and do a plain-vanilla import with the hstore column. I'd like to collect some opinions before doing this.
Peter
On 9-5-2010 21:53, Peter Körner wrote:
I stopped the import as I don't want to fill up the HD. I hope, that postgres will free up some space soon.
I noticed you used the --extra-attributes argument. This causes osm2pgsql to inject osm_user,osm_version,etc tags to each object it sees in the input stream. This subsequently causes each and every object in the input stream to be added to the db, even ones that previously weren't added because they didn't match any of the rules in default.style.
Coupled with hstore support, it will consume disk space soon and fast.
Am 09.05.2010 22:02, schrieb Lennard:
I noticed you used the --extra-attributes argument. This causes osm2pgsql to inject osm_user,osm_version,etc tags to each object it sees in the input stream. This subsequently causes each and every object in the input stream to be added to the db, even ones that previously weren't added because they didn't match any of the rules in default.style.
This is true for the hstore import anyway. Or does --extra-attributes also add nodes without any tags?
Peter
On 9-5-2010 22:54, Peter Körner wrote:
This is true for the hstore import anyway. Or does --extra-attributes also add nodes without any tags?
Yes, it does.
As a test, I just imported a 6 node building, and wound up with 1 entry in the polygon table (good), and those 6 nodes in the point table (bad). Those building nodes are untagged in OSM.
The reason this happens is because the xml reader is injecting osm_uid,osm_user,osm_version,osm_timestamp for every element, and because those columns also appear in default.style (if you enable them, of course), these nodes are now 'interesting', and will be imported.
I think --extra-attributes needs to be modified so these osm_* tags are considered 'uninteresting' and are only actually added to objects which would otherwise already be imported.
On 9-5-2010 23:15, Lennard wrote:
This is true for the hstore import anyway. Or does --extra-attributes also add nodes without any tags?
Yes, it does.
And to add insult to injury, the osm_* tags are also added to the hstore column, leading to further duplication of information.
Also, was it ever considered to drop tags added as a real column from the hstore column? Now they end up in both, if the tag is present in default.style.
Lennard schrieb:
On 9-5-2010 23:15, Lennard wrote:
This is true for the hstore import anyway. Or does --extra-attributes also add nodes without any tags?
Yes, it does.
Uhh, so I'll skip --extra-attributes -- i didn't think about this before.
Also, was it ever considered to drop tags added as a real column from the hstore column? Now they end up in both, if the tag is present in default.style.
Yes we talked about thatand it's just a matter of coding it into osm2pgsql. The benefit from having the tags as columns is that value-lookups are much faster then they would be in the hstore, even with a gin-index on it. The benefit from having all tags in the hstore is easier statistical analysis, eg.
SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP BY key ORDER BY count DESC, key LIMIT 20;
gives you the top-20 used tags in osm. If some tags are only available as columns, it's not that easy to count them.
Peter
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Peter Körner:
I stopped the import as I don't want to fill up the HD. I hope, that postgres will free up some space soon.
I added 300GB to /sql. Presumably that should be enough for the import.
- river.
River Tarnell schrieb:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Peter Körner:
I stopped the import as I don't want to fill up the HD. I hope, that postgres will free up some space soon.
I added 300GB to /sql. Presumably that should be enough for the import.
Cool, I didn't know that's possible so easily. I re-started the import with out the --extra-attributes argument. The progress can be monitored from any login server using
tail -fc100 /home/project/o/s/m/osm/planet_osm_hstore_import.log
Peter
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Peter Körner:
I added 300GB to /sql. Presumably that should be enough for the import.
Cool, I didn't know that's possible so easily.
It is as long as there is space on the disk. Currently there is 819GB in total, of which 200GB is allocated to /osm and 600GB to /sql. I will shrink /sql again once the import is done and there is only one copy of the database.
- river.
River Tarnell schrieb:
I will shrink /sql again once the import is done and there is only one copy of the database.
That sounds workable. The import runs very fast, it's already writing the ways. We'll see how long it will take to complete it. Have you any idea of how many traffic there is on the current database and how many tools will break if we drop it?
Peter
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Peter Körner:
Have you any idea of how many traffic there is on the current database and how many tools will break if we drop it?
Is it not possible to just rename it over the old one?
- river.
River Tarnell schrieb:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Peter Körner:
Have you any idea of how many traffic there is on the current database and how many tools will break if we drop it?
Is it not possible to just rename it over the old one?
Yes, this would be possible, but the way of accessing none-standard tags is different. atm it is just
{{{ select "name:de" from planet_osm_point where "name:de" is not null; }}}
with the hstore it would be
{{{ select (tags->"name:de") from planet_osm_point where (tags ? "name:de"); }}}
so the tools will have to be adopted.
Pezte
River Tarnell schrieb:
It is as long as there is space on the disk. Currently there is 819GB in total, of which 200GB is allocated to /osm and 600GB to /sql. I will shrink /sql again once the import is done and there is only one copy of the database.
I think you can do this right now
osm@ptolemy:~$ df -h Filesystem Size Used Avail Use% Mounted on /dev/vx/dsk/disks/sql 600G 159G 414G 28% /sql
Peter