Hi Paul,
Thanks so much for all your hard work, dedication to the maps project, and
your devotion to documenting as your time with the Foundation is ending. It
was great to work with you to create and refine the new map style; I
sincerely hope we can get it deployed someday.
Cheers and see ya around the map,
Deb
On Tue, Jul 24, 2018 at 2:03 PM Paul Norman <paul(a)paulnorman.ca> wrote:
> The current WMF map rendering software is already vector tile based. The
> work was about disputed borders, changing the vector tile schema, and
> cartography improvements.
>
> MapTiler and openmaptiles aren't really that related. I went over
> openmaptiles in the section on vector schema lessons learned.
>
>
> On Jul 24, 2018 2:58 AM, Naveen Francis <naveenpf(a)wikimedia.in> wrote:
>
> Sad to hear that WMF is not deploying the vector map now.
> For WMF, 'internal politics' is the key for any projects they take up.
>
> Is this an implementation of the vector style?
> https://www.maptiler.com/cloud/#streets/kn/vector/7.18/77.534/19.724
>
> https://openmaptiles.org/schema/
> The vector tile schema has been developed by Klokan Technologies GmbH
> <https://www.klokantech.com/> and was initially modelled after the
> cartography of the Carto Basemap Positron
> <https://carto.com/location-data-services/basemaps/>. The vector tile
> schema has been refined and improved in cooperation with the Wikimedia
> Foundation <https://www.mediawiki.org/wiki/Maps> and is heavily
> influenced by Paul Norman's many years of experience creating maps from
> OpenStreetMap.
>
>
>
> Thanks,
> Naveen Francis
> <http://wikibooks.in>
>
> On Mon, Jul 23, 2018 at 4:56 AM, Paul Norman <paul(a)paulnorman.ca> wrote:
>
> For some time I’ve been working as a contractor developing a new style and
> vector tile schema for the Wikimedia Foundation. It’s been completed but
> not deployed for several months. As my contract finishes this month and
> Wikimedia Foundation leadership has decided not to deploy the new map
> styles, I’m writing up the technical lessons learned from my experience on
> the style. I’m not going to be discussing the organizational factors that
> led to the decision, but looking at how I’d code things differently if
> starting over.
>
> *Overview*
>
> A complete map style consists of three parts: The database loading rules,
> the feature selection rules, and the styling rules. For a style written in
> the languages used by the WMF stack, these are expressed in osm2pgsql
> instructions, a tm2source project with SQL, and CartoCSS. The first tells
> you how to get the data into the database, the second defines what data
> goes into the vector tiles, and the third defines how features in the vector
> tiles are drawn. What goes in the vector tiles is also known as the “schema”
> and can be expressed in terms of what features appear and when, e.g.
> secondary roads first appear on zoom 12. For increased confusion, the
> database also has a “schema”, both of which are distinct from a PostgreSQL
> “SCHEMA.”
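>
> As a purely conceptual illustration of how the three parts fit together (the
> names, commands, SQL, and CartoCSS below are made up for this sketch, not the
> actual WMF configuration):
>
>     style = {
>         # 1. database loading: osm2pgsql rules turn OSM tags into tables
>         "loading": "osm2pgsql --style example.style planet.osm.pbf",
>         # 2. feature selection: SQL in a tm2source project decides what goes
>         #    into the vector tiles and at which zoom (the "schema")
>         "roads_layer_sql": "SELECT way, highway FROM planet_osm_line "
>                            "WHERE highway = 'secondary'",  # from zoom 12 up
>         # 3. styling: CartoCSS decides how those features are drawn
>         "roads_carto": "#roads[highway='secondary'] { line-width: 1.2; }",
>     }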
>
> In the current style, the parts are the osm2pgsql C transforms,
> osm-bright.tm2source, and WMF’s fork of osm-bright.tm2. In the new style,
> the parts are ClearTables + osm2pgsql, meddo, and brighmed.
>
> The goal with the style changes was to improve the representation of
> disputed borders, switch to a vector tile schema without a legal cloud over
> it, and make some styling improvements. In this it succeeded.
>
> *Database schema*
>
> The decision was made early on to go with ClearTables. This is an
> alternative set of rules for osm2pgsql which loads the data into many more
> tables for greater performance, easier style rules, and a bigger layer of
> abstraction between raw OSM tags and the SQL you need to write. I started
> it before my work at WMF, and only a few features were added.
>
> ClearTables does what it is designed to do, yet it was a mistake for this
> project. I still believe it is technically a better solution, but the
> advantages are not worth the costs of doing something different.
>
> The two most common database schemas are the built-in osm2pgsql “C
> transforms” and the one used by OpenStreetMap Carto. They aren’t any better
> code - with ClearTables’ test suite, ClearTables probably has fewer bugs -
> but there are many guides on how to set them up, and they require fewer
> components.
>
> Setting up the database isn’t an issue for WMF production servers, but is
> considered one of the more difficult steps for potential contributors to
> any style. Minimizing differences from other setups here helps greatly. A
> second issue is that many potential users of the style already have a
> database. I have heard from multiple people who would like to run the style
> if it could be used with their existing databases.
>
> *Static data*
>
> Map styles need some forms of “static” data loaded, such as oceans,
> low-zoom data, and borders. Normally this is done on an ad-hoc basis with a
> long, complicated shp2pgsql or ogr2ogr command, but I wrote a Python script
> that downloads the data and loads it with ogr2ogr, as well as handling all
> the SQL needed to update the data without a service interruption.
>
> This script is useful enough that I have reused it for other projects,
> which was made easy because I didn’t hard-code the files used into the
> script, but used another file to define them.
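>
> Roughly, the shape of that approach looks like the sketch below. This is a
> minimal illustration, not the actual script; the config format, file names,
> table names, and ogr2ogr options are assumptions made here for clarity.
>
>     """Sketch of a static-data loader: download each file listed in a config,
>     load it into a temporary table with ogr2ogr, then swap it into place in
>     one transaction so rendering never sees a missing table."""
>     import json
>     import subprocess
>     import urllib.request
>
>     def load(dbname, source):
>         tmp = source["table"] + "_new"
>         path, _ = urllib.request.urlretrieve(source["url"])
>         # Load into a temporary table first.
>         subprocess.check_call([
>             "ogr2ogr", "-f", "PostgreSQL", "-nln", tmp,
>             "-nlt", "PROMOTE_TO_MULTI", "-overwrite",
>             "PG:dbname=" + dbname, path,
>         ])
>         # Swap the new table in without a service interruption.
>         swap = ("BEGIN; DROP TABLE IF EXISTS {t}; "
>                 "ALTER TABLE {tmp} RENAME TO {t}; COMMIT;"
>                 ).format(t=source["table"], tmp=tmp)
>         subprocess.check_call(["psql", "-d", dbname, "-c", swap])
>
>     if __name__ == "__main__":
>         # The files to load are defined in a separate config, not hard-coded.
>         with open("external-data.json") as f:
>             for src in json.load(f)["sources"]:
>                 load("gis", src)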
>
> *Borders*
>
> One of the drivers of the work was to better display disputed borders. To
> do this, a pre-processing step was considered necessary, and I wrote the
> necessary program in C++ with libosmium. This worked, but I should have
> made more of an effort to get it packaged by Debian GIS and run on Jochen
> Topf’s OpenStreetMapData.com servers so others could use the work, which
> would encourage more developers to participate in its maintenance. I should
> also have
> given pyosmium a more detailed look.
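>
> For reference, a pyosmium version of the filtering step could look roughly
> like the sketch below. It's purely illustrative: the tag values checked and
> the idea of just collecting relation IDs are simplifications, not what the
> C++ pre-processor actually does.
>
>     """Collect the IDs of administrative and disputed boundary relations so
>     a later step can build the border geometries from them."""
>     import sys
>     import osmium
>
>     class BoundaryCollector(osmium.SimpleHandler):
>         def __init__(self):
>             super().__init__()
>             self.ids = []
>
>         def relation(self, r):
>             # Tag values here are illustrative, not an exhaustive list.
>             if r.tags.get("boundary") in ("administrative", "disputed", "claim"):
>                 self.ids.append(r.id)
>
>     if __name__ == "__main__":
>         handler = BoundaryCollector()
>         handler.apply_file(sys.argv[1])  # e.g. a planet or extract .osm.pbf
>         for rel_id in handler.ids:
>             print(rel_id)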
>
> *Vector tile schema*
>
> One of the reasons for switching to a new schema was legal threats against
> people using the Mapbox Streets schema. This meant osm2vectortiles also had
> to switch schemas at the same time. There was an effort to work with them
> to use a common schema, but it never happened because we had different
> needs. In retrospect, we should have either gone with a common schema and
> tm2source project, or done nothing in common. Either choice is valid, and
> it’s a balance of coordination work against a common development direction.
>
> It was useful to have someone external to discuss ideas with, but this
> wouldn’t have been necessary if there had been other people on the team to
> discuss them with.
>
> *Style*
>
> The original plan was to largely stick with the cartography of osm-bright.
> This changed once we got into implementation and we realized how insane
> some parts of the osm-bright cartography were, and efforts were made
> towards redoing the style.
>
> The road colours selected were from ColorBrewer2 OrRd6, with casing
> colours done by adjusting the Lch lightness and chroma. It would have been
> better to pick endpoints and generate colours using a script, similar to
> osm-carto. This would have allowed easier changes and sped up development
> by reducing the number of variables that need to be manually set.
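>
> Something along these lines is what I mean. The endpoints and step count
> below are invented for illustration, not the values from the style; the
> colour-space conversion uses the standard D65 formulas.
>
>     """Generate a road colour ramp by interpolating between two endpoints in
>     Lch space and converting each step to an sRGB hex colour."""
>     import math
>
>     def lch_to_srgb_hex(L, c, h):
>         # Lch -> Lab
>         a = c * math.cos(math.radians(h))
>         b = c * math.sin(math.radians(h))
>         # Lab -> XYZ (D65 reference white)
>         def f_inv(t):
>             return t ** 3 if t ** 3 > 0.008856 else (t - 16.0 / 116.0) / 7.787
>         fy = (L + 16.0) / 116.0
>         x = 0.95047 * f_inv(fy + a / 500.0)
>         y = 1.00000 * f_inv(fy)
>         z = 1.08883 * f_inv(fy - b / 200.0)
>         # XYZ -> linear sRGB -> gamma-encoded sRGB
>         lin = (3.2406 * x - 1.5372 * y - 0.4986 * z,
>                -0.9689 * x + 1.8758 * y + 0.0415 * z,
>                0.0557 * x - 0.2040 * y + 1.0570 * z)
>         def encode(u):
>             u = min(max(u, 0.0), 1.0)
>             u = 12.92 * u if u <= 0.0031308 else 1.055 * u ** (1 / 2.4) - 0.055
>             return round(u * 255)
>         return "#{:02x}{:02x}{:02x}".format(*(encode(u) for u in lin))
>
>     # Hypothetical (L, c, h) endpoints: light orange to dark red, six steps.
>     start, end = (90.0, 30.0, 75.0), (40.0, 70.0, 35.0)
>     for i in range(6):
>         t = i / 5.0
>         L, c, h = (s + t * (e - s) for s, e in zip(start, end))
>         print(lch_to_srgb_hex(L, c, h))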
>
> *Overall*
>
> The style was completed successfully in time, and none of the changes
> would have significantly changed that. They would have mainly made it
> easier to attract external contributors if an effort were put into that. As
> attracting external contributors wasn’t a priority, they didn’t matter.
>
>
>
> _______________________________________________
> Maps-l mailing list
> Maps-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/maps-l
>
>
>
--
deb tankersley
Program Manager, Engineering
Wikimedia Foundation
Happy Friday the 13th,* everyone!
I realized that I've gotten more lax about defining terms in my write ups
over time, because I am always talking about types (pre- and post-analysis)
and tokens and monolithic and unpacked analyzers, etc, etc. So, I
reorganized the Search Glossary a bit into topical sections, and added a
big section on Language Analysis, which I will point to in my write ups.
Please review the new section if you have time and interest. If you know
this stuff, please correct any errors. If you don't know this stuff, please
ask questions about anything that's unclear so I can improve it. Thanks!
The new section of the glossary is under "Language Analysis
<https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search/Glossary#Language…>
".
Cheers,
—Trey
*Word of the day: friggatriskaidekaphobia
<https://en.wiktionary.org/wiki/friggatriskaidekaphobia>.
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
Hi everyone,
I've got an update on the NLP project selection. We've narrowed things down
to a handful of projects we could work on with a consultant, and a handful
we could work on internally.
David, Erik, and I reviewed a selection of the most promising-seeming
and/or most interesting projects and gave them a very rough cost estimate
based on how big a relative impact they would have, how technologically hard
they would be, and how difficult the UI aspect would be. The scores
are not definitive, but helped guide the discussion. You can see the list
of projects we looked at and more details of the scoring on MediaWiki
<https://www.mediawiki.org/w/index.php?title=User:TJones_(WMF)/Notes/Potenti…>
.
For the possibility of working with an outside consultant, we also
considered how easily separated each project would be from our overall
system (making it easier for someone new to get up to speed), how projects
feed into each other, how easily we could work on projects ourselves (like,
we know pretty much what to do, we just have to do it), etc.
Our current *recommendation for an outside consultant* would be to start
with (1) *spelling correction/did you mean improvements,* with an option to
extend the project to include either (2) *"more like" suggestion
improvements,* or (3) *query reformulation mining,* specifically for typo
corrections.
For spelling correction (#1), we are envisioning an approach that
integrates generic intra-word and inter-word statistical models, optional
language-specific features, and explicit weighted corrections. We believe
we could mine redirects flagged as typo correction for explicit
corrections, and the query reformulation mining (#3) would also provide
frequency-weighted explicit corrections. Our hope is that a system built
initially for English would be readily applicable to other alphabetic
languages, most probably other Indo-European languages, based on statistics
available from Elastic; and that some elements of the system could be
applied to non-alphabetic languages and languages that are
typologically <https://en.wikipedia.org/wiki/Morphological_typology> dissimilar
to Indo-European languages.
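
As a toy illustration of the layering we have in mind (all of the words,
weights, and data structures below are invented; this is a sketch of the idea,
not a design for the actual system), explicit weighted corrections can sit in
front of a simple statistical model:

    """Toy layered spelling correction: explicit weighted corrections (e.g.
    mined from typo-fix redirects) are consulted first, then a unigram
    frequency model scores edit-distance-1 candidates."""
    from collections import Counter

    # Stand-in for a corpus-derived unigram model.
    word_freq = Counter({"einstein": 12000, "stein": 900, "albert": 8000})

    # Explicit corrections with weights, e.g. mined from redirects flagged as
    # typo fixes or from query reformulation logs.
    explicit = {"einstien": [("einstein", 0.95)]}

    ALPHABET = "abcdefghijklmnopqrstuvwxyz"

    def edits1(word):
        """All strings at edit distance 1 from `word`."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [l + r[1:] for l, r in splits if r]
        transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
        replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
        inserts = [l + c + r for l, r in splits for c in ALPHABET]
        return set(deletes + transposes + replaces + inserts)

    def suggest(word):
        if word in word_freq:     # already a known word, leave it alone
            return word
        if word in explicit:      # explicit weighted corrections win
            return max(explicit[word], key=lambda wc: wc[1])[0]
        candidates = [w for w in edits1(word) if w in word_freq]
        return max(candidates, key=word_freq.get, default=word)

    print(suggest("einstien"))  # -> "einstein"
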
Looking at the rest of the list, (a) *wrong keyboard detection* seems like
something we should work on internally, since we already have a few good
ideas on how to approach it. (b) *Acronym support* is a pet peeve for
several members of the team, and seems to be straightforward to improve. (c)
*Automatic stemmer building* and (d) *automatic stop word* generation
aren't so much projects we should work on as things we should research to
see if there are already tools or lists out there we could use to make the
projects much easier.
Comments and questions here or on the talk page are welcome.
Cheers,
—Trey
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
On Tue, May 15, 2018 at 11:30 AM, Trey Jones <tjones(a)wikimedia.org> wrote:
> Hi everyone,
>
> I just finished putting together an annotated list of potential
> applications of natural language processing to on-wiki search
> <https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Potential_Applicatio…>.
> There are dozens and dozens of ideas there—including many that are
> interesting but probably not practical. If you have any additional ideas,
> questions, suggestions, recommendations, or preferences, please
> share!—either on the mailing list or on the talk page.
>
> The goal is to narrow it down to one or two things to pursue over the next
> two to four quarters, along with other projects we are working on.
>
> Thanks!
> —Trey
>
> Trey Jones
> Sr. Software Engineer, Search Platform
> Wikimedia Foundation
>
>
Hello again,
This is the weekly update from the Search Platform team for the weeks
starting 2018-07-02 and 2018-07-09.
Programming Note: With the Wikimania Hackathon, Wikimania proper, and
resulting travel for folks in coming weeks, the next update will be
for the week starting 2018-07-30.
As always, feedback and questions welcome.
== Discussions ==
=== Search ===
* David and Stas worked on fine-tuning the search configs in
mediawiki-config for Wikidata [0]
* Stas and Addshore helped to catch and clean up some bad lookups and
report them properly [1]
* A "Wrong document type" error was corrected by Erik by fixing
Sanitizer MetaStore integration [2]
* Erik worked on tracking queries that run on the Elasticsearch
clusters longer than both the server-side and client-side timeouts by
fixing some slow logging functionality [3]
* There was a Meta-wiki error where search suggests a non-existent title
due to a namespace/redirect mixup. Erik's note: "it's a bit awkward, but
typing Help:Glo into autocomplete on metawiki suggests 'Global
Account' from the main namespace, and selecting it takes you to
Help:Unified login" [4]
* In order to dispatch queries to a particular search setup (cirrus
defaults vs wikibase custom query builder), David created a flexible
way to classify queries, meant to replace the 'getSyntaxUsed' approach
currently in SearchContext. [5]
* Trey and a community volunteer, Athena, created a basic Mirandese
analysis chain. It was tested on RelForge and pushed into production
this week [6]. Trey kicked off, completed and tested the re-indexing
of the Mirandese Wikis [7].
* The Re-Re-Index of the Serbian Wikis after refactored plugins were
deployed has been completed [8] and the re-index of the Croatian,
Serbo-Croatian, and Bosnian Wikis was also done [9]
* We currently mix a tiny number of namespace documents into the
regular indices, which seems inefficient; so Erik built a unified
namespace index [10]
* Erik updated the 'OtherIndex' to operate on a cluster other than the
one holding the wiki [11]
* Trey updated a variety of things on the Analysis Tools with lots of
little fixes and improvements, and also fixed a few small errors in the
analysis code that conflated post-analysis types and pre-analysis
types [12]
== Did you know? ==
* The period of this status update includes Friday, July 13, 2018. The
fear of the number thirteen is called "triskaidekaphobia" [13]. There
are two words for fear of Friday the 13th: "paraskavedekatriaphobia"
[14] and "friggatriskaidekaphobia" [15]—the first maintains a
consistent etymology with the Greek word for Friday, "Paraskeví",
while the second invokes "Frigg", the Norse Goddess after whom Friday
is named in English.
[0] https://phabricator.wikimedia.org/T182717
[1] https://phabricator.wikimedia.org/T198091
[2] https://phabricator.wikimedia.org/T197446
[3] https://phabricator.wikimedia.org/T196180
[4] https://phabricator.wikimedia.org/T115756
[5] https://phabricator.wikimedia.org/T197774
[6] https://phabricator.wikimedia.org/T194941
[7] https://phabricator.wikimedia.org/T197890
[8] https://phabricator.wikimedia.org/T196404
[9] https://phabricator.wikimedia.org/T196658
[10] https://phabricator.wikimedia.org/T192699
[11] https://phabricator.wikimedia.org/T194678
[12] https://phabricator.wikimedia.org/T199273
[13] https://en.wiktionary.org/wiki/triskaidekaphobia
[14] https://en.wiktionary.org/wiki/paraskavedekatriaphobia
[15] https://en.wiktionary.org/wiki/friggatriskaidekaphobia
---
Subscribe to receive on-wiki (or opt-in email) notifications of the
Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" or
"Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner
Community Relations Specialist
Wikimedia Foundation
Cross-posting...!
--
deb tankersley
Program Manager, Engineering
Wikimedia Foundation
---------- Forwarded message ---------
From: Yaron Koren <yaron(a)wikiworks.com>
Date: Tue, Jul 10, 2018 at 11:01 AM
Subject: [MediaWiki-l] New episode of "Between the Brackets": Stas Malyshev
To: MediaWiki announcements and site admin list <
mediawiki-l(a)lists.wikimedia.org>
Hi,
A new episode of the MediaWiki podcast "Between the Brackets" has been
released, featuring an interview with Wikimedia Foundation developer Stas
Malyshev, who works on search in both MediaWiki and Wikidata. You can
listen to the interview here:
http://betweenthebrackets.libsyn.com/episode-12-stas-malyshev
-Yaron
_______________________________________________
MediaWiki-l mailing list
To unsubscribe, go to:
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l