For Wikimedia folks who are interested in possible collaborations with OSM,
now seems like a good time to start thinking about possible presentations.
Staff from the Wikimedia Foundation, and/or Wikimedia volunteers from
around the US outside of the Seattle area, may want to start thinking about
For Wikimedia volunteers outside of Cascadia Wikimedians territory, you
might consider applying for WMF Travel and Participation Support grants.
If you're inside of Cascadia Wikimedians territory and would like to attend
the conference, we may have funds in our budget that can support your
attendance. Contact me off-list for details.
---------- Forwarded message ----------
From: Clifford Snow <clifford(a)snowandsnow.us>
Date: Tue, Mar 1, 2016 at 5:18 PM
Subject: [opensource-107] Seattle to host the 2016 OpenStreetMap State of
the Map US Conference
I am excited to announce that Seattle was chosen to host the OpenStreetMap
2016 State of the Map US Conference. The conference will take place July
23-25 on SeattleU's campus. We chose SeattleU for their low cost, proximity
to Seattle and access to public transit. The food trucks nearby didn't
We are looking for help! Let us know if you want to help. A request for
presentation proposals should be coming fairly soon. Start thinking about
what you want to present or teach.
The formal announcement can be found at:
Here's an update of what the team has been up to:
*Stats are updated!*
- updated www.wikipedia.org stats,
- updated the sister portals with the latest version from Meta.
In order to improve the process Deborah created a recurring task for it
It's pretty cool to see what's changed :)
- sister portals
*Deploying the enhanced search box to production*
We had a list of improvements to make before pushing to production.
It took us longer than expected, especially because the language picker
implementation was not 100% production-ready:
- IE8 (and lower) users were ignored for the A/B test.
- We initially took the decision to not support IE8 for the A/B test
in order to save development time (in case the test results show no
- Mobile user experience was not optimal because of the custom dropdown.
- We took the decision to figure this out only once we got the test
results, to save some development time (in case the test results show no
for non-JS users.
We will learn from our first A/B test. The test showed a significant
improvement though, and we are now getting ready for a deployment to
production. Here's the latest update:
it with a native <select> element.
- Styling this native <select> element as well as we can (to match what
we had in the A/B test),
- but it may look a little odd in old browsers (there isn't
much we can do with <select> elements).
- Solves mobile user experience issues because the device's native
selector will be triggered and will be a lot easier than a
- Solves non-JS traffic (~ 7%
because it's a native <select> element (no JS required)
- minor detail: only the custom arrow is not clickable because we
need a little JS hack.
*Status:* The patch made it to code review today. We will release when it's
As of today, here is what you can expect:
We have an idea on how to make it even better for old and weird browsers,
but we want to move forward with this now :)
And the new typeahead is fantastic!!!
*Next A/B test: Use language detection to re-arrange the primary links to
suit the user better*
(Primary links = the 10 wiki links in the screenshot above).
For the test we read the user's preferred languages (from the browser) and
show the corresponding wikis in the topmost positions.
Let's see if people click on these links more than usual!
Please reach out to Deborah or us if you have any questions :)
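To make the idea behind this test concrete, here is a minimal sketch in Python. It is not the portal's actual code, which runs in the browser. It assumes the preferred languages arrive as an Accept-Language style string, and the function names and the list of ten wiki codes are hypothetical. It also assumes that only wikis already among the primary links get promoted.

```python
def parse_accept_language(header):
    """Return base language codes ordered by descending q-value."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        base = tag.strip().lower().split("-")[0]  # "fr-CH" -> "fr"
        if base and base != "*":
            weighted.append((-quality, position, base))
    ordered = []
    for _, _, base in sorted(weighted):
        if base not in ordered:  # keep the first (best) occurrence only
            ordered.append(base)
    return ordered


def reorder_primary_links(primary, preferred):
    """Move wikis matching the preferred languages to the front.

    Assumption: languages that are not already primary links are ignored.
    """
    promoted = [code for code in preferred if code in primary]
    rest = [code for code in primary if code not in promoted]
    return promoted + rest


# Hypothetical default order of the 10 primary links.
DEFAULT_LINKS = ["en", "ja", "es", "de", "ru", "fr", "it", "zh", "pt", "pl"]
preferred = parse_accept_language("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5")
print(reorder_primary_links(DEFAULT_LINKS, preferred))
```

Links that do not match any preferred language keep their original relative order, so the layout changes as little as possible for each user.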
*Status:* The patch made it to code review today.
We will merge it into its feature branch as soon as it's approved.
Then we will review the A/B test setup one more time and schedule a launch
We hope to get the new search box in production before we launch the A/B
Performance matters... it's a huge part of the user experience.
We only talk about it when it's bad, and we often forget it when it's good.
Performance improvements can definitely increase user engagement.
Take a look at what we've done since November:
For the Wikipedia.org portal team,
Greetings language nerds,
I've completed the creation of a 21-language balanced (i.e., 200 each)
corpus of relatively clean queries for use in evaluating language
identification models. The 21 languages were chosen based on query
volume across wikis in those languages. I've also evaluated our current
version of TextCat against this corpus, using the known 21 languages, and
all 59 languages I have models for.
The 21 languages have pretty good models, because they had lots of query
volume to be built on. The full set of 59 is a bit more dodgy, esp. Igbo,
which is known to have a lot of English in the training data.
Indonesian is the most unexpectedly poor performing of the bunch (most
other poor performance is across language or script families and so is
The best model size among those tested (500 to 10K) was the full 10,000!
However, performance at the 3,000 ngram model size (what we've been using
for A/B tests) was only a few percentage points worse.
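For readers unfamiliar with how n-gram model size comes into play, here is a rough sketch of rank-order ("out-of-place") n-gram language identification, the general approach TextCat-style identifiers are based on. It is not the code used for the evaluation above: the function names, the tiny training samples, and the choice of using the model size as the penalty for unseen n-grams are all assumptions made for illustration.

```python
from collections import Counter


def ngram_profile(text, max_n=5, size=3000):
    """Return up to `size` character n-grams (1..max_n), most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:size]]


def out_of_place(query_profile, model_profile, penalty):
    """Sum of rank differences; n-grams missing from the model cost `penalty`."""
    model_rank = {gram: rank for rank, gram in enumerate(model_profile)}
    return sum(
        abs(rank - model_rank[gram]) if gram in model_rank else penalty
        for rank, gram in enumerate(query_profile)
    )


def identify(query, models, size=3000):
    """Return the language whose (truncated) model is closest to the query."""
    query_profile = ngram_profile(query, size=size)
    return min(
        models,
        key=lambda lang: out_of_place(query_profile, models[lang][:size], penalty=size),
    )


# Toy training data (hypothetical); real models are built from large query logs.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog",
    "ru": "съешь ещё этих мягких французских булок да выпей чаю",
}
MODELS = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
print(identify("the lazy dog", MODELS), identify("выпей чаю", MODELS))
```

The `size` argument is the knob discussed above: a smaller value keeps only the most frequent n-grams of each model, which is cheaper but discards some of the evidence available for ranking.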
Full write up with lots more details here:
I'll commit models for the rest of these 21 languages after verifying that
they won't mess up our A/B tests.
Software Engineer, Discovery
I have updated the Discovery page
on mediawiki.org to convey who is working on what. If you're curious, take a
look. Please note the disclaimer: this is only intended to roughly convey
who is working on what, and there are no guarantees that this is accurate
to any particular level of detail.
Lead Product Manager, Discovery
I seem to have forgotten this when we last discussed it, but the week of March
22, when we plan to roll the feature out to prod, is the same week tech ops is
planning to test shifting all traffic to our failover data center in
Dallas. They have requested no deployments that week to limit the variance
for bugs that will invariably crop up. I think Discovery is mostly ready,
but we will need to move comms a bit faster?
We could also push back to the following week. I will be out mid-week,
traveling to Israel, but I trust dcausse and gehel can handle the staged
rollout regardless of my availability.
I am finishing the upgrade of elasticsearch to 1.7.5 for codfw (eqiad
still to do). For this, I used a small script, heavily inspired
(copied / stolen / ...) from bd808. The script is ugly, but it does
the job. It runs the deployment over a list of hosts, pausing for
manual steps / validations along the way. The script runs locally on
my workstation, so it is subject to loss of connectivity, local
crashes, difficulty handing over, ...
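For anyone who has not seen this kind of script, the general pattern looks roughly like the sketch below. This is not the actual script: the function name, the host names, and the step callables are hypothetical placeholders, and real steps would typically wrap SSH commands.

```python
def rolling_deploy(hosts, steps, confirm=input, log=print):
    """Run every step on one host at a time, pausing for manual validation.

    `steps` is a list of (name, action) pairs; each action takes a host name.
    Returns the hosts that were deployed AND confirmed by the operator.
    """
    validated = []
    for host in hosts:
        for name, action in steps:
            log(f"[{host}] {name}")
            action(host)
        answer = confirm(f"[{host}] done. Validate manually, then type 'yes' to continue: ")
        if answer.strip().lower() != "yes":
            log(f"Stopping after {host}; remaining hosts were not touched.")
            break
        validated.append(host)
    return validated


if __name__ == "__main__":
    # Hypothetical steps that only print; replace with real remote commands.
    demo_steps = [
        ("upgrade package", lambda host: print(f"  would upgrade on {host}")),
        ("restart service", lambda host: print(f"  would restart on {host}")),
    ]
    rolling_deploy(["host1.example", "host2.example"], demo_steps)
```

Passing `confirm` and `log` as parameters keeps the loop testable without a terminal, which also makes it easier to move such a script off a personal workstation later.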
Do we have a central place for these kinds of scripts? I'd like to
version it in a more obvious place than my personal GitHub repo. Do we
have examples of similar scripts? A specific tool for this? Rundeck
comes to mind. Note that I'm not a huge fan of Rundeck as it
brings far too much complexity for simple tasks, but the concept of
having a central place of reusable operational components is
We took some time with Brandon last Friday to have a presentation on
the Varnish / caching infrastructure. Brandon did not have a lot of
time to prepare this, so it ended up being more of a conversation,
mainly driven by our questions. Brandon did have some slides, but they
were just a very light support to our conversation.
Honestly, that's my preferred format. (Thank you, Brandon, for being
busy with more important stuff and still taking the time to answer our
If keeping this fairly unstructured format helps to have more regular
Ops Sessions, I'm all for it!
I also had two friends / former coworkers with me for this
session. They both work on the caching infrastructure of Nespresso,
and they found it really interesting. If we could open some of our Ops
Sessions, I think there could be quite a few people interested in
watching them. And in my understanding, this would be quite aligned
with our mission of disseminating the world's knowledge (and we do
have a sizeable body of technical knowledge to disseminate).
I can see a few constraints in opening those Ops Sessions to a wider
audience. We still need some private time to discuss some of the
things that are sensitive. And we need to find a way for this to not
add significant overhead to our busy schedules. Still, I think it
would be great if we could do that...
What do you think?
And again, big thanks to Brandon for his time!