Hello subscribers of wikimedia-search,
this list has been renamed to "discovery" as requested on
https://phabricator.wikimedia.org/T110256
This is to let you know, and at the same time to test that everything worked.
I am mailing the _old_ list address on purpose to confirm that mail sent to it
is also forwarded as intended. Please start using discovery@lists though.
You will see that the listinfo page
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search also
redirects to the new name.
All config options and subscribers with their passwords have been imported
from the old list to the new list. Archives have been regenerated from the
.mbox file.
As said above, the old email address of the list also still works. It has
been added as an "acceptable alias" in the list config.
Best regards,
Daniel
--
Daniel Zahn <dzahn@wikimedia.org>
Operations Engineer
Hi All,
Why do people use Google instead of Wikipedia search? Two obvious answers
come to mind: Google gives better results, and users are just used to using
Google 'cause it's useful.
So I set out to see how search on Wikipedia compares to Google for queries
we can recover from referrals from Google.
Disclaimers: we don't know what personalized results people got, whether
they liked the result, or what they intended to search for; all we have is
the wiki page they landed on. Also, results vary depending on which Google
you start from—which I didn't consider until after the experiments and
analysis were underway.
Summary: for about 60% of queries, Wikipedia search does fine. (And about a
quarter of all searches are exact matches for Wikipedia article titles.)
Trouble areas identified include: typos in the first two characters,
question marks, abbreviations and other ambiguous terms, quotes, questions,
formulaic queries, and non-Latin diacritics.
I have a list of about 20 suggestions for projects from small to enormous
that we could tackle to improve results (plus another plug for a Relevance
Lab!).
Best factoid: someone searched for *what is hummus* and ended up on the
wiki page for Hillary Clinton.
Full details here:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Why_People_Use_Searc…
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
Hi Discovery team,
the Gerrit Cleanup Day on Wed 23rd is approaching fast - only one week
left. More info: https://phabricator.wikimedia.org/T88531
Do you feel prepared for the day, and do all team members know what to do?
If not, what are you missing and how can we help?
Some Gerrit queries for each team are listed under "Gerrit queries per
team/area" in https://phabricator.wikimedia.org/T88531
Are they helpful and a good start? Or do they miss some areas (or do
you have existing Gerrit team queries to use instead or to "integrate",
e.g. for parts of MediaWiki core you might work on)?
Also, which person will be the main team contact for the day (and
available in #wikimedia-dev on IRC), helping to organize review work in
your areas so that other teams can easily reach out?
Some teams' plates are emptier than others, so those teams are wondering
where and how they could lend a helping hand (and would like to find out in
advance, due to timezones).
Thanks for your help in making the Gerrit Cleanup Day a success!
andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
The PHP engine used in production by the WMF, HHVM, has built-in support for
cooperative (non-preemptive) concurrency via the async/await keywords[1][2].
Over the weekend I spent some time converting the Elastica client library we
use to work asynchronously, which would essentially let us continue performing
other calculations in the web request while network requests are in flight.
I've only ported over the client library[3], not the CirrusSearch code. Also,
this is not a complete port: there are a couple of code paths that work, but
most of the test suite still fails.
The most obvious place we could see a benefit from this is when multiple
queries are issued to Elasticsearch from a single web request. If the
second query doesn't depend on the results of the first, it can be issued in
parallel. This is actually a somewhat common use case, for example doing a
full-text and a title search in the same request. I'm wary of making much
of a guess in terms of the actual latency reduction we could expect, but
maybe on the order of 50 to 100 ms in cases where we currently perform
requests serially and have enough other work to process. Really, it's hard
to say at this point.
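To make the idea concrete, here is a rough, untested sketch in Hack of issuing
two independent searches concurrently. It is not the actual ported Elastica
API; the function names are made up, and it assumes HHVM's HH\Asio\v() helper,
which awaits an array of awaitables.

  <?hh
  // Hypothetical sketch only; the real port lives in [3].
  async function fullTextSearch(string $term): Awaitable<array> {
    // ... asynchronous network call to Elasticsearch would go here ...
    return array();
  }

  async function titleSearch(string $term): Awaitable<array> {
    // ... second, independent network call ...
    return array();
  }

  async function searchBoth(string $term): Awaitable<array> {
    // Both awaitables are created up front and awaited together, so while
    // one request is waiting on the network the other can make progress.
    list($fullText, $titles) = await \HH\Asio\v(array(
      fullTextSearch($term),
      titleSearch($term),
    ));
    return array('fulltext' => $fullText, 'titles' => $titles);
  }

In the current, fully synchronous client the two calls would simply add up;
here their network waits can overlap.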
In addition to making some existing code faster, having the ability to do
multiple network operations in an async manner opens up other possibilities
when we are implementing things in the future. In closing, this currently
isn't going anywhere; it was just something interesting to toy with. I
think it could be quite interesting to investigate further.
[1] http://docs.hhvm.com/manual/en/hack.async.php
[2] https://phabricator.wikimedia.org/T99755
[3] https://github.com/ebernhardson/Elastica/tree/async
Cross posting to discovery
---------- Forwarded message ----------
From: Tomasz Finc <tfinc@wikimedia.org>
Date: Thu, Sep 17, 2015 at 12:26 PM
Subject: Announcing the launch of Maps
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Cc: Yuri Astrakhan <yastrakhan@wikimedia.org>, Max Semenik
<msemenik@wikimedia.org>
The Discovery Department has launched an experimental tile and static maps
service available at https://maps.wikimedia.org.
Using this service you can browse and embed map tiles into your own tools
using OpenStreetMap data. Currently, we handle traffic from *.wmflabs.org
and *.wikivoyage.org (the Referer header must be either missing or set to
these values), but we would like to open it up to Wikipedia traffic if we
see enough use. Our hope is that this service fits the needs of the
numerous maps developers and tool authors who have asked for a WMF-hosted
tile service, with an initial focus on Wikivoyage.
We'd love for you to try our new service, experiment with writing tools using
our tiles, and give us feedback <https://www.mediawiki.org/wiki/Talk:Maps>.
If you've built a tool using OpenStreetMap-based imagery, then using our
service is a simple drop-in replacement.
Getting started is as easy as
https://www.mediawiki.org/wiki/Maps#Getting_Started
How can you help?
* Adapt your labs tool to use this service - for example, use the Leaflet JS
library and point it to https://maps.wikimedia.org
* File bugs in Phabricator
<https://phabricator.wikimedia.org/tag/discovery-maps-sprint/>
* Provide us feedback to help guide future features
<https://www.mediawiki.org/wiki/Talk:Maps>
* Improve our map style <https://github.com/kartotherian/osm-bright.tm2>
* Improve our data extraction
<https://github.com/kartotherian/osm-bright.tm2source>
Based on usage and your feedback, the Discovery team
<https://www.mediawiki.org/wiki/Discovery> will decide how to proceed.
We could add more data sources (both vector and raster), work on additional
services such as static maps or geosearch, work on supporting all
languages, switch to client-side WebGL rendering, etc. Please help us
decide what is most important.
https://www.mediawiki.org/wiki/Maps has more about the project and related
Maps work.
== In Depth ==
Tiles are served from https://maps.wikimedia.org, but can only be accessed
from subdomains of *.wmflabs.org and *.wikivoyage.org. Kartotherian
can produce tiles as images (png) and as raw vector data (PBF Mapbox
format or json):
.../{source}/{zoom}/{x}/{y}[@{scale}x].{format}
Additionally, Kartotherian can produce snapshot (static) images of any
location, scaling, and zoom level with
.../{source},{zoom},{lat},{lon},{width}x{height}[@{scale}x].{format}.
For example, to get an image centered at 42,-3.14, at zoom level 4, size
800x600, use https://maps.wikimedia.org/img/osm-intl,4,42,-3.14,800x600.png
(copy/paste the link, or else it might not work due to referrer
restriction).
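As a minimal illustration (not part of the service itself), a short PHP
snippet run from a server or the CLI, where no Referer header is sent by
default, could fetch that same static image:

  <?php
  // Illustrative only: download the example static map and save it locally.
  $url = 'https://maps.wikimedia.org/img/osm-intl,4,42,-3.14,800x600.png';
  $png = file_get_contents($url);
  if ($png === false) {
      fwrite(STDERR, "Failed to fetch $url\n");
  } else {
      file_put_contents('map.png', $png);
      echo "Saved " . strlen($png) . " bytes to map.png\n";
  }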
Do note that the static feature is highly experimental right now.
We would like to thank WMF Ops (especially Alex Kosiaris, Brandon Black,
and Jaime Crespo), the Services team, the OSM community and engineers, and
the Mapnik and Mapbox teams. The project would not have come together so
fast without you.
Thank You
--tomasz
Recently, the Team Practices Group agreed to a set of norms around how that
team will use IRC[1].
Would it be helpful for Discovery to agree on its own IRC norms? They could
end up being quite different from what TPG decided on. But whatever we
decide on, it seems like it would be helpful to know that we're all on the
same page, especially as we bring on new team members.
Thoughts?
[1] https://www.mediawiki.org/wiki/Team_Practices_Group/Team_Norms/IRC_Norms
Kevin Smith
Agile Coach, Wikimedia Foundation
We've had a lot of ideas floating around over the past week or two about
what to do in the final weeks of the quarter towards tackling the zero
results rate problem. This morning the engineering team had a 25 minute
meeting to coalesce these ideas into a plan and sync up. We took notes in
this etherpad: https://etherpad.wikimedia.org/p/nextupforsearch
The short summary of the meeting is that we will try a test which relaxes the
AND operator for common terms in queries. This should improve
natural language queries by reducing how important words like "the", "a",
etc. are to the query, thus focusing on the essence of the query. This
also means that pages that don't contain these common terms, but do
contain the core terms, could now be returned in results.
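For a rough sense of what "relaxing AND for common terms" looks like at the
Elasticsearch level, here is a hypothetical query body written as a PHP array.
The field name and cutoff are placeholders, not the actual CirrusSearch
configuration:

  <?php
  // Hypothetical example of an Elasticsearch "common" terms query: rare terms
  // keep AND semantics, while very frequent terms ("the", "a", ...) only
  // influence scoring instead of being required to match.
  $query = array(
      'common' => array(
          'text' => array(  // 'text' is a placeholder field name
              'query'              => 'what is the tallest building in the world',
              'cutoff_frequency'   => 0.001,  // terms above this frequency count as "common"
              'low_freq_operator'  => 'and',  // rare terms must all match
              'high_freq_operator' => 'or',   // common terms are optional
          ),
      ),
  );

A page that contains the core terms but not every common word is no longer
excluded outright; the rare terms carry the query.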
This work is tracked in the following series of tasks, the structure of
which should now be very familiar to you all:
- T112178 <https://phabricator.wikimedia.org/T112178>: Relax 'AND'
operator with the common term query
- T112581 <https://phabricator.wikimedia.org/T112581>: Run A/B test on
relaxing AND operator for search (test starting on 2015-09-22)
- T112582 <https://phabricator.wikimedia.org/T112582>: Validate data for
AND operator A/B test (on or after 2015-09-23)
- T112583 <https://phabricator.wikimedia.org/T112583>: Analyse results
of AND operator A/B test (on or after 2015-09-29)
What this does mean is that we've probably got a bunch of tests lined up to
start at the same time. In principle this isn't a problem, but if the tests
overlap it can cause difficulties. This will be discussed in tomorrow's
analysis meeting.
As always, if there are any questions, let me know!
Thanks,
Dan
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
Hi Everyone,
I've done further analysis on the ~1400 zero-results non-DOI query corpus,
looking at the effects of perfect (or at least human-level) language
detection, and the effects of running all queries against many wikis.
In summary:
> More than 85% of failed queries to enwiki are in English, or are not in any
> particular language. Only about 35% of the non-English queries that are in
> some identifiable language (<4.5% of all zero-results queries) get any
> results, even if funneled to the right language wiki.
>
> The types of queries most likely to get results from the non-enwikis are
> names and queries in English. There are lots of English words in
> non-English wikis (enough that those wikis can do decent spelling
> correction!), and the idiosyncrasies of language processing on other wikis
> allow certain classes of typos in names and English words to match, or the
> typos happen to exist uncorrected in the non-enwiki.
>
> Perhaps a better approach to handling non-English queries is user-specified
> alternate languages.
More details:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Cross_Language_Wiki_…
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation