For some time I’ve been working as a contractor developing a new
style and vector tile schema for the Wikimedia Foundation. It has
been complete, but not deployed, for several months. As my contract
finishes this month and Wikimedia Foundation leadership has
decided not to deploy the new map styles, I’m writing up the
technical lessons I learned from working on the style. I’m
not going to discuss the organizational factors that led to
the decision, only how I’d code things differently if
starting over.
Overview
A complete map style consists of three parts: the database loading
rules, the feature selection rules, and the styling rules. For a
style written in the languages used by the WMF stack, these are
expressed in osm2pgsql instructions, a tm2source project with SQL,
and CartoCSS. The first tells you how to get the data into the
database, the second defines what data goes into the vector tiles,
and the third is how to draw features in the vector tiles. What
goes in the vector tiles is also known as the “schema” and can be
expressed in terms of what features appear and when, e.g.
secondary roads first appear on zoom 12. For increased confusion,
the database also has a “schema”, both of which are distinct from
a PostgreSQL “SCHEMA.”
In the current style, the parts are the osm2pgsql C transforms,
osm-bright.tm2source, and WMF’s fork of osm-bright.tm2. In the new
style, the parts are ClearTables + osm2pgsql, meddo, and brighmed.
The goal with the style changes was to improve the representation
of disputed borders, switch to a vector tile schema without a
legal cloud over it, and make some styling improvements. In this
it succeeded.
Database schema
The decision was made early on to go with ClearTables. This is an
alternative set of rules for osm2pgsql which loads the data into
many more tables for greater performance, easier style rules, and
a bigger layer of abstraction between raw OSM tags and the SQL you
need to write. I started it before my work at WMF, and only
a few features were added during that work.
ClearTables does what it is designed to do, yet it was a mistake
for this project. I still believe it is technically a better
solution, yet the advantages are not worth the costs of doing
something different.
The two most common database schemas are the built-in osm2pgsql “C
transforms” and the one used by OpenStreetMap Carto. They aren’t any
better as code (with its test suite, ClearTables probably has fewer
bugs), but there are many guides on how to set them up, and they
require fewer components.
Setting up the database isn’t an issue for WMF production servers,
but is considered one of the more difficult steps for potential
contributors to any style. Minimizing differences from other
setups here helps greatly. A second issue is that many potential
users of the style already have a database. I have heard from
multiple people who would like to run the style if it could be
used with their existing databases.
Static data
Map styles need some forms of “static” data loaded such as oceans,
low-zoom data, and borders. Normally this is done on an ad-hoc
basis with a long complicated shp2pgsql or ogr2ogr command, but I
wrote a python script that downloads the data and loads it with
ogr2ogr, as well as handling all the SQL needed to update the data
without a service interruption.
This script is useful enough that I have reused it for other
projects, which was made easy because I didn’t hard-code the files
used into the script, but used another file to define them.
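
As a rough illustration of that separation, here is a minimal Python sketch, not the actual script. The source names, URLs, the "loading" staging schema, and the "way" geometry column are all assumptions made up for the example, and the download step is left out. It only shows how dataset definitions kept outside the code can drive both the ogr2ogr call and the SQL that swaps a freshly loaded table into place inside one transaction.

```python
import shlex

# Hypothetical source definitions. In a real setup these would live in a
# separate file (YAML, JSON, ...) so the script never names a dataset itself.
SOURCES = {
    "ocean_polygons": {
        "url": "https://example.org/water-polygons.zip",  # placeholder URL
        "file": "water_polygons.shp",
    },
    "admin_boundaries": {
        "url": "https://example.org/boundaries.zip",  # placeholder URL
        "file": "boundaries.shp",
    },
}


def ogr2ogr_command(table, source, dbname="gis", staging="loading"):
    """Build the ogr2ogr call that loads one source into a staging schema."""
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",
        f"PG:dbname={dbname}",
        "-overwrite",
        "-nln", table,
        "-lco", f"SCHEMA={staging}",
        "-lco", "GEOMETRY_NAME=way",  # assumed column name
        source["file"],
    ]


def swap_sql(table, staging="loading", live="public"):
    """SQL that replaces the live table with the staged one atomically."""
    return "\n".join([
        "BEGIN;",
        f'DROP TABLE IF EXISTS "{live}"."{table}";',
        f'ALTER TABLE "{staging}"."{table}" SET SCHEMA "{live}";',
        "COMMIT;",
    ])


if __name__ == "__main__":
    # Print what would be run; nothing is downloaded or executed here.
    for name, src in SOURCES.items():
        print(shlex.join(ogr2ogr_command(name, src)))
        print(swap_sql(name))
```

Because the drop and the move happen in a single transaction, queries see either the old table or the new one, which is the property that avoids a service interruption. Reusing the sketch for another project would only mean changing SOURCES.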
Borders
One of the drivers of the work was to better display disputed
borders. To do this a pre-processing step was considered
necessary, and I wrote the program for it in C++ with libosmium.
This worked, but I should have made more of an effort to get it
packaged by Debian GIS and run on Jochen Topf’s
OpenStreetMapData.com servers, so that others could use the output
and more developers might be encouraged to participate in
maintenance. I should also have given pyosmium a more detailed look.
Vector tile schema
One of the reasons for switching to a new schema was legal threats
against people using the Mapbox Streets schema. This meant
osm2vectortiles also had to switch schemas at the same time. There
was an effort to work with them to use a common schema, but it
never happened because we had different needs. In retrospect, we
should have either gone with a common schema and tm2source
project, or done nothing in common. Either choice is valid, and
it’s a balance of coordination work against a common development
direction.
It was useful to have someone external to discuss ideas with, but
this wouldn’t have been needed had there been other people on the
team.
Style
The original plan was to largely stick with the cartography of
osm-bright. This changed once we got into implementation and we
realized how insane some parts of the osm-bright cartography were,
and efforts were made towards redoing the style.
The road colours selected were from ColorBrewer2 OrRd6, with
casing colours done by adjusting the Lch lightness and chroma. It
would have been better to pick endpoints and generate colours
using a script, similar to osm-carto. This would have allowed
easier changes and sped up development by reducing the number of
variables that need to be manually set.
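
To make the idea concrete, here is a small Python sketch of the approach under stated assumptions. It is not the osm-carto script and not what the style used. The road class names, the endpoint values, and the casing offsets are illustrative only. Two Lch endpoints are interpolated linearly, a casing colour is derived from each fill by shifting lightness and chroma, and the result is converted to sRGB hex with the standard CIE Lab (D65) formulas. Out-of-gamut values are simply clipped, which a real script might want to handle more carefully.

```python
import math

WHITE = (0.95047, 1.0, 1.08883)  # D65 reference white (X, Y, Z)

# Hypothetical class names, ordered from most to least important.
ROAD_CLASSES = ["motorway", "trunk", "primary", "secondary", "tertiary", "minor"]


def lch_to_hex(l, c, h):
    """Convert CIE LCh(ab) to an sRGB hex string, clipping out-of-gamut values."""
    a = c * math.cos(math.radians(h))
    b = c * math.sin(math.radians(h))
    fy = (l + 16) / 116
    fx, fz = fy + a / 500, fy - b / 200

    def finv(t):
        d = 6 / 29
        return t ** 3 if t > d else 3 * d * d * (t - 4 / 29)

    x, y, z = (w * finv(f) for w, f in zip(WHITE, (fx, fy, fz)))
    linear = (
        3.2406 * x - 1.5372 * y - 0.4986 * z,
        -0.9689 * x + 1.8758 * y + 0.0415 * z,
        0.0557 * x - 0.2040 * y + 1.0570 * z,
    )

    def encode(v):
        v = min(max(v, 0.0), 1.0)  # clip to the sRGB gamut
        v = 12.92 * v if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055
        return round(v * 255)

    return "#{:02x}{:02x}{:02x}".format(*(encode(v) for v in linear))


def ramp(start, end, n):
    """Return n LCh tuples evenly spaced from start to end (n >= 2)."""
    return [
        tuple(s + (e - s) * i / (n - 1) for s, e in zip(start, end))
        for i in range(n)
    ]


def casing(lch, dl=-20, dc=10):
    """Derive a casing colour by shifting lightness and chroma (assumed offsets)."""
    l, c, h = lch
    return (max(l + dl, 0), max(c + dc, 0), h)


def road_colours(start, end):
    """Map each road class to a fill and casing hex colour."""
    fills = ramp(start, end, len(ROAD_CLASSES))
    return {
        name: {"fill": lch_to_hex(*lch), "casing": lch_to_hex(*casing(lch))}
        for name, lch in zip(ROAD_CLASSES, fills)
    }


if __name__ == "__main__":
    # Illustrative endpoints only: a dark red-orange to a pale yellow-orange.
    for name, cols in road_colours((50, 70, 35), (92, 20, 85)).items():
        print(f"@{name}-fill: {cols['fill']}; @{name}-casing: {cols['casing']};")
```

With this shape, changing the whole road palette means editing two endpoint tuples and two offsets rather than a dozen hand-picked hex values.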
Overall
The style was completed successfully and on time, and none of the
changes would have significantly changed that. They would have
mainly made it easier to attract external contributors if an
effort had been put into that. As attracting external contributors
wasn’t a priority, they didn’t matter.