Hello subscribers of wikimedia-search,
this list has been renamed to "discovery" as requested on
https://phabricator.wikimedia.org/T110256
This is to let you know, and at the same time to test that everything worked.
I am mailing the _old_ list address on purpose to confirm that mail sent to it
is also forwarded as intended. Please start using discovery@lists though.
You will see that the listinfo page
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search also
redirects to the new name.
All config options and subscribers with their passwords have been imported
from the old list to the new list. Archives have been regenerated from the
.mbox file.
As said above, the old email address of the list also still works. It has
been added as an "acceptable alias" in the list config.
Best regards,
Daniel
--
Daniel Zahn <dzahn@wikimedia.org>
Operations Engineer
Hi All,
Why do people use Google instead of Wikipedia search? Two obvious answers
come to mind: Google gives better results, and users are just used to using
Google 'cause it's useful.
So I set out to see how search on Wikipedia compares to Google for queries
we can recover from referrals from Google.
Disclaimers: we don't know what personalized results people got, whether
they liked the result, or what they intended to search for; all we have is
the wiki page they landed on. Also, results vary depending on which Google
you start from—which I didn't consider until after the experiments and
analysis were underway.
Summary: for about 60% of queries, Wikipedia search does fine. (And about a
quarter of all searches are exact matches for Wikipedia article titles.)
Trouble areas identified include: typos in the first two characters,
question marks, abbreviations and other ambiguous terms, quotes, questions,
formulaic queries, and non-Latin diacritics.
I have a list of about 20 suggestions for projects from small to enormous
that we could tackle to improve results (plus another plug for a Relevance
Lab!).
Best factoid: someone searched for *what is hummus* and ended up on the
wiki page for Hillary Clinton.
Full details here:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Why_People_Use_Searc…
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
Hi Discovery team,
the Gerrit Cleanup Day on Wed 23rd is approaching fast - only one week
left. More info: https://phabricator.wikimedia.org/T88531
Do you feel prepared for the day, and do all team members know what to do?
If not, what are you missing and how can we help?
Some Gerrit queries for each team are listed under "Gerrit queries per
team/area" in https://phabricator.wikimedia.org/T88531
Are they helpful and a good start? Or do they miss some areas (or do
you have existing Gerrit team queries to use instead or to "integrate",
e.g. for parts of MediaWiki core you might work on)?
Also, which person will be the main team contact for the day (and
available in #wikimedia-dev on IRC), helping to organize review work in
your areas so that other teams can easily reach out?
Some teams' plates are emptier than others, so those teams are wondering
where and how they could lend a helping hand (and would like to find out in
advance, due to timezones).
Thanks for your help in making the Gerrit Cleanup Day a success!
andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
The PHP engine used in production by the WMF, HHVM, has built-in support for
cooperative (non-preemptive) concurrency via the async/await keywords[1][2].
Over the weekend I spent some time converting the Elastica client library we
use to work asynchronously, which would essentially let us continue performing
other calculations in the web request while network requests are in flight.
I've only ported over the client library[3], not the CirrusSearch code. Also,
this is not a complete port: there are a couple of code paths that work, but
most of the test suite still fails.
The most obvious place we could see a benefit from this is when multiple
queries are issued to Elasticsearch from a single web request. If the
second query doesn't depend on the results of the first, it can be issued in
parallel. This is actually a somewhat common use case, for example doing a
full-text and a title search in the same request. I'm wary of making much
of a guess in terms of the actual latency reduction we could expect, but
maybe on the order of 50 to 100 ms in cases where we currently perform
requests serially and have enough other work to process. Really, it's hard
to say at this point.
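To make the idea concrete, here is a rough, untested sketch in Hack of issuing
two independent searches concurrently. It is not the actual ported Elastica
API; the function names are made up, and it assumes HHVM's HH\Asio\v() helper,
which awaits an array of awaitables.

  <?hh
  // Hypothetical sketch only; the real port lives in [3].
  async function fullTextSearch(string $term): Awaitable<array> {
    // ... asynchronous network call to Elasticsearch would go here ...
    return array();
  }

  async function titleSearch(string $term): Awaitable<array> {
    // ... second, independent network call ...
    return array();
  }

  async function searchBoth(string $term): Awaitable<array> {
    // Both awaitables are created up front and awaited together, so while
    // one request is waiting on the network the other can make progress.
    list($fullText, $titles) = await \HH\Asio\v(array(
      fullTextSearch($term),
      titleSearch($term),
    ));
    return array('fulltext' => $fullText, 'titles' => $titles);
  }

In the current, fully synchronous client the two calls would simply add up;
here their network waits can overlap.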
In addition to making some existing code faster, having the ability to do
multiple network operations in an async manner opens up other possibilities
when we are implementing things in the future. In closing, this currently
isn't going anywhere; it was just something interesting to toy with. I
think it could be quite interesting to investigate further.
[1] http://docs.hhvm.com/manual/en/hack.async.php
[2] https://phabricator.wikimedia.org/T99755
[3] https://github.com/ebernhardson/Elastica/tree/async
Cross posting to discovery
---------- Forwarded message ----------
From: Tomasz Finc <tfinc@wikimedia.org>
Date: Thu, Sep 17, 2015 at 12:26 PM
Subject: Announcing the launch of Maps
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Cc: Yuri Astrakhan <yastrakhan@wikimedia.org>, Max Semenik
<msemenik@wikimedia.org>
The Discovery Department has launched an experimental tile and static maps
service available at https://maps.wikimedia.org.
Using this service you can browse and embed map tiles into your own tools
using OpenStreetMap data. Currently, we handle traffic from *.wmflabs.org
and *.wikivoyage.org (the Referer header must be either missing or set to
these values), but we would like to open it up to Wikipedia traffic if we
see enough use. Our hope is that this service fits the needs of the
numerous maps developers and tool authors who have asked for a WMF-hosted
tile service, with an initial focus on Wikivoyage.
We'd love for you to try our new service, experiment with writing tools using
our tiles, and give us feedback <https://www.mediawiki.org/wiki/Talk:Maps>.
If you've built a tool using OpenStreetMap-based imagery, then using our
service is a simple drop-in replacement.
Getting started is as easy as
https://www.mediawiki.org/wiki/Maps#Getting_Started
How can you help?
* Adapt your labs tool to use this service - for example, use the Leaflet JS
library and point it to https://maps.wikimedia.org
* File bugs in Phabricator
<https://phabricator.wikimedia.org/tag/discovery-maps-sprint/>
* Provide us feedback to help guide future features
<https://www.mediawiki.org/wiki/Talk:Maps>
* Improve our map style <https://github.com/kartotherian/osm-bright.tm2>
* Improve our data extraction
<https://github.com/kartotherian/osm-bright.tm2source>
Based on usage and your feedback, the Discovery team
<https://www.mediawiki.org/wiki/Discovery> will decide how to proceed.
We could add more data sources (both vector and raster), work on additional
services such as static maps or geosearch, work on supporting all
languages, switch to client-side WebGL rendering, etc. Please help us
decide what is most important.
https://www.mediawiki.org/wiki/Maps has more about the project and related
Maps work.
== In Depth ==
Tiles are served from https://maps.wikimedia.org, but can only be accessed
from subdomains of *.wmflabs.org and *.wikivoyage.org. Kartotherian
can produce tiles as images (png) and as raw vector data (PBF Mapbox
format or json):
.../{source}/{zoom}/{x}/{y}[@{scale}x].{format}
Additionally, Kartotherian can produce snapshot (static) images of any
location, scaling, and zoom level with
.../{source},{zoom},{lat},{lon},{width}x{height}[@{scale}x].{format}.
For example, to get an image centered at 42,-3.14, at zoom level 4, size
800x600, use https://maps.wikimedia.org/img/osm-intl,4,42,-3.14,800x600.png
(copy/paste the link, or else it might not work due to referrer
restriction).
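As a minimal illustration (not part of the service itself), a short PHP
snippet run from a server or the CLI, where no Referer header is sent by
default, could fetch that same static image:

  <?php
  // Illustrative only: download the example static map and save it locally.
  $url = 'https://maps.wikimedia.org/img/osm-intl,4,42,-3.14,800x600.png';
  $png = file_get_contents($url);
  if ($png === false) {
      fwrite(STDERR, "Failed to fetch $url\n");
  } else {
      file_put_contents('map.png', $png);
      echo "Saved " . strlen($png) . " bytes to map.png\n";
  }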
Do note that the static feature is highly experimental right now.
We would like to thank WMF Ops (especially Alex Kosiaris, Brandon Black,
and Jaime Crespo), the Services team, the OSM community and engineers, and
the Mapnik and Mapbox teams. The project would not have come together so
fast without you.
Thank You
--tomasz
Recently, the Team Practices Group agreed to a set of norms around how that
team will use IRC[1].
Would it be helpful for Discovery to agree on its own IRC norms? They could
end up being quite different from what TPG decided on. But whatever we
decide on, it seems like it would be helpful to know that we're all on the
same page, especially as we bring on new team members.
Thoughts?
[1] https://www.mediawiki.org/wiki/Team_Practices_Group/Team_Norms/IRC_Norms
Kevin Smith
Agile Coach, Wikimedia Foundation
We've had a lot of ideas floating around over the past week or two about
what to do in the final weeks of the quarter towards tackling the zero
results rate problem. This morning the engineering team had a 25 minute
meeting to coalesce these ideas into a plan and sync up. We took notes in
this etherpad: https://etherpad.wikimedia.org/p/nextupforsearch
The short summary of the meeting is that we will try a test which relaxes the
AND operator for common terms in queries. This should improve
natural language queries by reducing how important words like "the", "a",
etc. are to the query, thus focusing on the essence of the query. This
also means that pages that don't contain these common terms, but do
contain the core terms, could now be returned in results.
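For a rough sense of what "relaxing AND for common terms" looks like at the
Elasticsearch level, here is a hypothetical query body written as a PHP array.
The field name and cutoff are placeholders, not the actual CirrusSearch
configuration:

  <?php
  // Hypothetical example of an Elasticsearch "common" terms query: rare terms
  // keep AND semantics, while very frequent terms ("the", "a", ...) only
  // influence scoring instead of being required to match.
  $query = array(
      'common' => array(
          'text' => array(  // 'text' is a placeholder field name
              'query'              => 'what is the tallest building in the world',
              'cutoff_frequency'   => 0.001,  // terms above this frequency count as "common"
              'low_freq_operator'  => 'and',  // rare terms must all match
              'high_freq_operator' => 'or',   // common terms are optional
          ),
      ),
  );

A page that contains the core terms but not every common word is no longer
excluded outright; the rare terms carry the query.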
This work is tracked in the following series of tasks, the structure of
which should now be very familiar to you all:
- T112178 <https://phabricator.wikimedia.org/T112178>: Relax 'AND'
operator with the common term query
- T112581 <https://phabricator.wikimedia.org/T112581>: Run A/B test on
relaxing AND operator for search (test starting on 2015-09-22)
- T112582 <https://phabricator.wikimedia.org/T112582>: Validate data for
AND operator A/B test (on or after 2015-09-23)
- T112583 <https://phabricator.wikimedia.org/T112583>: Analyse results
of AND operator A/B test (on or after 2015-09-29)
What this does mean is that we've probably got a bunch of tests lined up to
start at the same time. In principle this isn't a problem, but if the tests
overlap it can cause difficulties. This will be discussed in tomorrow's
analysis meeting.
As always, if there are any questions, let me know!
Thanks,
Dan
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
Hi Everyone,
I've done further analysis on the ~1400 zero-results non-DOI query corpus,
looking at the effects of perfect (or at least human-level) language
detection, and the effects of running all queries against many wikis.
In summary:
> More than 85% of failed queries to enwiki are in English, or are not in any
> particular language. Only about 35% of the non-English queries that are in
> some identifiable language (<4.5% of all zero-results queries) get any
> results, even if funneled to the right language wiki.
>
> The types of queries most likely to get results from the non-enwikis are
> names and queries in English. There are lots of English words in
> non-English wikis (enough that those wikis can do decent spelling
> correction!), and the idiosyncrasies of language processing on other wikis
> allow certain classes of typos in names and English words to match, or the
> typos happen to exist uncorrected in the non-enwiki.
>
> Perhaps a better approach to handling non-English queries is user-specified
> alternate languages.
More details:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Cross_Language_Wiki_…
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation