Re: [discovery] [Ops] Index management for Maps - Discovery

22 Jun 2016

On Wed, Jun 22, 2016 at 10:15 AM, Alexandros Kosiaris
<akosiaris(a)wikimedia.org> wrote:
...
  From my understanding back then, those indices are
 kartotherian/tilerator specific. So bundling any schema changes that
 suite of software requires with it makes sense. Anything not
 kartotherian/tilerator specific we could/should try to upstream, of
 course.
I have the same understanding.

...
  On Wed, Jun 22, 2016 at 9:40 AM, Giuseppe Lavagetto
 <glavagetto(a)wikimedia.org> wrote:
> On Tue, Jun 21, 2016 at 9:02 PM, Guillaume Lederrey
> <glederrey(a)wikimedia.org> wrote:
>> Hello!
>>
>> I need some feedback from my fellow Ops on how to manage indexes on Maps.
>>
>> Context:
>>
>> Maps imports OpenStreetMap data into a PostgreSQL database. This import
>> is done with osm2pgsql, which takes care of creating the schema,
>> populating tables and creating a few indexes.
>>
>> Some additional indexes are required to support the specific
>> functionalities of Tilerator. So far, those indexes have been created
>> manually and have not been tracked.
>>
>> The enhancements to the schema created by osm2pgsql are minor (a few
>> indexes and functions), so we probably need a lightweight solution.
>>
>> Proposal:
>>
>> A few idempotent scripts are versioned in osm-bright.tm2source [1].
>> Those scripts are executed after review by Ops, at the request of the
>> project. We don't use a full schema migration process, because at this
>> point there isn't really a need for it on this project.
>>
>> Does this look reasonable to you? Feedback welcomed, shoot me down if
>> you have to...
>
> My very naive first level suggestion would be to run a wrapper around
> osm2pgsql and (re-)add the indexes via this wrapper once osm2pgsql has
> completed successfully. 
osm2pgsql creates the schema on initial data load. Creating
additional indexes / functions should be done at that time. We have a
script that takes care of this initial data load and some operations
that need to happen at the same time (work in progress [1]). But
after the initial data load, we might (and will) still need to evolve
the indexes and need to track those changes.
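
To make the "import first, then add indexes" ordering concrete, here is a
minimal sketch of such a wrapper. It is not the script from [1]: the function
name, the way the import command is passed in, and the post-import steps are
all assumptions made for illustration. The only point is that the extra steps
run only when the import command exits successfully.

```python
import subprocess
import sys


def import_then_index(import_cmd, post_import_steps):
    """Run the import command; run the post-import steps only if it exits 0.

    import_cmd is an argument list (hypothetically an osm2pgsql invocation),
    post_import_steps is a list of zero-argument callables, for example
    functions that create the additional indexes.
    """
    result = subprocess.run(import_cmd)
    if result.returncode != 0:
        # Import failed: leave the database alone and report the failure.
        return False
    for step in post_import_steps:
        step()
    return True


def demo():
    """Stand-in run: the Python interpreter plays the role of the importer."""
    done = []
    ok = import_then_index(
        [sys.executable, "-c", "raise SystemExit(0)"],
        [lambda: done.append("indexes created")],
    )
    return ok, done
```

This only covers the initial load; it says nothing about tracking later
changes to the indexes.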

On the MediaWiki side, we have a more elaborate solution, with a real
schema migration process. I think that in the case of Maps this is not
really needed as the changes to the initial schema are fairly trivial.
Idempotent scripts are sufficient IMHO. I'd like validation on this
specific point, but at least I did not see anyone jump at the mention
of idempotent scripts ...
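
To illustrate what "idempotent" means here, a minimal sketch follows. It is
not one of the actual scripts: the index names and column choices are
hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the
example is self-contained. The idea is simply that every statement uses
IF NOT EXISTS (supported by CREATE INDEX in PostgreSQL 9.5 and later), so
running the script a second time changes nothing.

```python
import sqlite3

# Hypothetical index definitions. The table names follow the planet_osm_*
# naming that osm2pgsql uses by default; the index names and columns are
# made up for this example and are not the real Tilerator indexes.
STATEMENTS = [
    "CREATE INDEX IF NOT EXISTS planet_osm_point_place_idx"
    " ON planet_osm_point (place)",
    "CREATE INDEX IF NOT EXISTS planet_osm_line_highway_idx"
    " ON planet_osm_line (highway)",
]


def apply_statements(conn, statements=STATEMENTS):
    """Run every statement; safe to call any number of times."""
    with conn:  # commit on success, roll back on error
        for statement in statements:
            conn.execute(statement)


def list_indexes(conn):
    """Return the names of all indexes, sorted (SQLite-specific catalog query)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    )
    return [row[0] for row in rows]


def demo():
    """Apply the statements twice and return the index list after each run."""
    conn = sqlite3.connect(":memory:")
    # Stand-in tables; in the real setup osm2pgsql creates the schema.
    conn.execute("CREATE TABLE planet_osm_point (osm_id INTEGER, place TEXT)")
    conn.execute("CREATE TABLE planet_osm_line (osm_id INTEGER, highway TEXT)")
    apply_statements(conn)
    first = list_indexes(conn)
    apply_statements(conn)  # second run is a no-op
    second = list_indexes(conn)
    return first, second
```

Calling demo() returns the same index list for both runs, which is the
property that lets such a script be re-executed after review without
first checking what is already in place.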

[1] https://gerrit.wikimedia.org/r/#/c/293105/

...
 Also, if such schema optimizations have general interest, it would be
 nice to try to get them integrated into osm2pgsql.

 But I may be missing some context here!

 Cheers
 Giuseppe
 --
 Giuseppe Lavagetto, Ph.d.
 Senior Technical Operations Engineer, Wikimedia Foundation


 --
 Alexandros Kosiaris <akosiaris(a)wikimedia.org>

-- 
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation