Planet-Import with hstore

List overview All Threads
Download

newer

older

Re: [Maps-l] [Toolserver-l] OSM on...

OSM replication broken? Why is...

Peter Körner

9 May 2010 9 May '10

9:03 a.m.

Hi Folks,

i started a planet import with the additional hstore column. It allows us to retrive and search in all tags used in osm. I used a combined hstore/column scheme, as our tests on the german devserver showed that hstore-value-conditions are much slower than the column-conditions of the classic database layout.

If the import is over, i'll come back with some sample queries.

The import is running on ptolemy using the following command:

$HOME/src/osm2pgsql/osm2pgsql \ --slim \ --cache 3000 \ --merc \ --prefix planet \ --style $HOME/src/osm2pgsql/default.style \ --host sql-mapnik \ --database osm_mapnik \ --hstore \ --extra-attributes \ $HOME/data/planet-100505.osm.bz2 2>&1 | \ tee $HOME/planet_osm_hstore_import.log

I'm planning to switch the ddiff update process to this database after the import and to drop the old, classic database.

Peter

Show replies by date

River Tarnell

9 May 9 May

9:30 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Peter Körner:

...

i started a planet import with the additional hstore column.

Do you expect to need more disk space for this?

ptolemy% df -h /sql Filesystem size used avail capacity Mounted on /dev/vx/dsk/disks/sql 300G 151G 140G 52% /sql

- river.

...PGP SIGNATURE...

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (HP-UX) iEYEARECAAYFAkvm4zEACgkQIXd7fCuc5vLAwwCffno0CNuiiKvySmNgY4Uoe3bR qDwAoIRZ5ab4eoyP0DzlirChTna4J5Fp =eZcD -----END PGP SIGNATURE-----

Peter Körner

12:53 p.m.

Am 09.05.2010 18:30, schrieb River Tarnell:

...

Do you expect to need more disk space for this?

It seems, the import takes more memory than expected: /dev/vx/dsk/disks/sql 300G 251G 47G 85% /sql

I stopped the import as I don't want to fill up the HD. I hope, that postgres will free up some space soon.

I think the hstore import is more useful then the current database scheme and it is also compatible with the default osm styles. Under this circumstaces I'd suggest dropping the current tables and do a plain-vanilla import with the hstore column. I'd like to collect some opinions before doing this.

Peter

Lennard

1:02 p.m.

On 9-5-2010 21:53, Peter Körner wrote:

...

I stopped the import as I don't want to fill up the HD. I hope, that postgres will free up some space soon.

I noticed you used the --extra-attributes argument. This causes osm2pgsql to inject osm_user,osm_version,etc tags to each object it sees in the input stream. This subsequently causes each and every object in the input stream to be added to the db, even ones that previously weren't added because they didn't match any of the rules in default.style.

Coupled with hstore support, it will consume disk space soon and fast.

-- Lennard

Peter Körner

1:54 p.m.

Am 09.05.2010 22:02, schrieb Lennard:

...

I noticed you used the --extra-attributes argument. This causes osm2pgsql to inject osm_user,osm_version,etc tags to each object it sees in the input stream. This subsequently causes each and every object in the input stream to be added to the db, even ones that previously weren't added because they didn't match any of the rules in default.style.

This is true for the hstore import anyway. Or does --extra-attributes also add nodes without any tags?

Peter

Lennard

2:15 p.m.

On 9-5-2010 22:54, Peter Körner wrote:

...

This is true for the hstore import anyway. Or does --extra-attributes also add nodes without any tags?

Yes, it does.

As a test, I just imported a 6 node building, and wound up with 1 entry in the polygon table (good), and those 6 nodes in the point table (bad). Those building nodes are untagged in OSM.

The reason this happens is because the xml reader is injecting osm_uid,osm_user,osm_version,osm_timestamp for every element, and because those columns also appear in default.style (if you enable them, of course), these nodes are now 'interesting', and will be imported.

I think --extra-attributes needs to be modified so these osm_* tags are considered 'uninteresting' and are only actually added to objects which would otherwise already be imported.

-- Lennard

Lennard

2:22 p.m.

On 9-5-2010 23:15, Lennard wrote:

...

...
This is true for the hstore import anyway. Or does --extra-attributes also add nodes without any tags?

Yes, it does.

And to add insult to injury, the osm_* tags are also added to the hstore column, leading to further duplication of information.

Also, was it ever considered to drop tags added as a real column from the hstore column? Now they end up in both, if the tag is present in default.style.

-- Lennard

Peter Körner

9:54 p.m.

Lennard schrieb:

...

On 9-5-2010 23:15, Lennard wrote:

...
...
This is true for the hstore import anyway. Or does --extra-attributes also add nodes without any tags?

Yes, it does.

Uhh, so I'll skip --extra-attributes -- i didn't think about this before.

...

Also, was it ever considered to drop tags added as a real column from the hstore column? Now they end up in both, if the tag is present in default.style.

Yes we talked about thatand it's just a matter of coding it into osm2pgsql. The benefit from having the tags as columns is that value-lookups are much faster then they would be in the hstore, even with a gin-index on it. The benefit from having all tags in the hstore is easier statistical analysis, eg.

SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP BY key ORDER BY count DESC, key LIMIT 20;

gives you the top-20 used tags in osm. If some tags are only available as columns, it's not that easy to count them.

Peter

River Tarnell

10:08 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Peter Körner:

...

I stopped the import as I don't want to fill up the HD. I hope, that postgres will free up some space soon.

I added 300GB to /sql. Presumably that should be enough for the import.

- river.

...PGP SIGNATURE...

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (HP-UX) iEYEARECAAYFAkvnlNgACgkQIXd7fCuc5vIuPgCePxso3YjUISVcq7coekLhS2gf togAn1VxOrpr/P3GwA0jt5zF5AZURF+H =XPSK -----END PGP SIGNATURE-----

Peter Körner

10:18 p.m.

River Tarnell schrieb:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Peter Körner:

...
I stopped the import as I don't want to fill up the HD. I hope, that postgres will free up some space soon.

I added 300GB to /sql. Presumably that should be enough for the import.

Cool, I didn't know that's possible so easily. I re-started the import with out the --extra-attributes argument. The progress can be monitored from any login server using

tail -fc100 /home/project/o/s/m/osm/planet_osm_hstore_import.log

Peter

River Tarnell

10 May 10 May

12:56 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Peter Körner:

...

...
I added 300GB to /sql. Presumably that should be enough for the import.

...

Cool, I didn't know that's possible so easily.

It is as long as there is space on the disk. Currently there is 819GB in total, of which 200GB is allocated to /osm and 600GB to /sql. I will shrink /sql again once the import is done and there is only one copy of the database.

- river.

...PGP SIGNATURE...

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (HP-UX) iEYEARECAAYFAkvnvBoACgkQIXd7fCuc5vKBeQCfXbwKOt73LhM0u3PYMrhEI4VI 32MAoMEng2f1YbmICaGtxfS3xL8rbq8a =tnyl -----END PGP SIGNATURE-----

Peter Körner

2:42 a.m.

River Tarnell schrieb:

...

I will shrink /sql again once the import is done and there is only one copy of the database.

That sounds workable. The import runs very fast, it's already writing the ways. We'll see how long it will take to complete it. Have you any idea of how many traffic there is on the current database and how many tools will break if we drop it?

Peter

River Tarnell

3:51 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Peter Körner:

...

Have you any idea of how many traffic there is on the current database and how many tools will break if we drop it?

Is it not possible to just rename it over the old one?

- river.

...PGP SIGNATURE...

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (HP-UX) iEYEARECAAYFAkvn5SkACgkQIXd7fCuc5vL4WwCfeNyARfbwoxHog5Sh21wJ/f33 d6sAoIoVJuq4b6jjbSlMk1E18Fl+/KyZ =9Rxm -----END PGP SIGNATURE-----

Peter Körner

4:01 a.m.

River Tarnell schrieb:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Peter Körner:

...
Have you any idea of how many traffic there is on the current database and how many tools will break if we drop it?

Is it not possible to just rename it over the old one?

Yes, this would be possible, but the way of accessing none-standard tags is different. atm it is just

{{{ select "name:de" from planet_osm_point where "name:de" is not null; }}}

with the hstore it would be

{{{ select (tags->"name:de") from planet_osm_point where (tags ? "name:de"); }}}

so the tools will have to be adopted.

Pezte

Peter Körner

15 Jun 15 Jun

6:48 a.m.

River Tarnell schrieb:

...

It is as long as there is space on the disk. Currently there is 819GB in total, of which 200GB is allocated to /osm and 600GB to /sql. I will shrink /sql again once the import is done and there is only one copy of the database.

I think you can do this right now

osm@ptolemy:~$ df -h Filesystem Size Used Avail Use% Mounted on /dev/vx/dsk/disks/sql 600G 159G 414G 28% /sql

Peter

5305

Age (days ago)

5342

Last active (days ago)

maps-l@lists.wikimedia.org

14 comments

3 participants

tags (0)

participants (3)

Lennard
Peter Körner
River Tarnell