Hi Discovery folks,
I'd love to make it easier for readers to discover related materials across
projects and formats (Wikipedia, Wiktionary, Wikivoyage, Commons,
Wikisource, maps, weather, etc.). Any ideas about how to make this happen?
Team and those who follow Discovery,
We're exploring a code freeze for December so that we don't conflict
with any fundraising changes, respect everybody's holiday time, and
get some significant heads-down time.
The reading and editing departments are code freezing for the whole
month, and we could do the same. We could also freeze for just the last
part of the month. Emergency changes could go through as long as Katie
and Greg are OK with them.
Eager to see what both the team and the public who work with us think.
Kasia, this sounds very interesting! Cross-posting to the Discovery mailing list.
On Nov 3, 2015 8:39 AM, "Kasia Odrozek" <kasia.odrozek(a)wikimedia.de> wrote:
> Hello all,
> I am happy to announce that the Software Development Team at WMDE (TCB)
> has developed the so-called “Deepcat” gadget, which is now ready to use
> not only on German wikis but also internationally.
> The possibility of intersection and subcategory search was one of the
> top wishes in the TOP 20 of the German Community Technical Wishlist.
> DeepCat acts as an interface between a graph database and MediaWiki's
> search engine. The wiki's category structure is stored via its page IDs
> in the graph database, while the gadget translates the search string,
> retrieves the information from the database, and sends it to the search
> engine. Tool developers interested in exploring possibilities for using
> the CatGraph database can find more information on the respective
> infopage or can approach us directly.
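The flow described above (expand a category through the graph, then rewrite the query for the search engine) can be sketched roughly as follows. This is a hypothetical illustration, not the gadget's actual code: the function names, the dict-based graph, and the `incategory:` alternation rewrite are all assumptions (the real gadget queries CatGraph and works with page IDs).

```python
# Hypothetical sketch of the DeepCat translation step: expand a
# "deepcat:" term into its subcategories, then rewrite the query
# for the search engine. Names and data shapes are illustrative.

def expand_category(graph, root, max_depth=15, max_total=70):
    """Breadth-first walk of the category graph, bounded like DeepCat
    (15 categories deep, 70 categories total)."""
    seen = [root]
    frontier = [root]
    for _ in range(max_depth):
        nxt = []
        for cat in frontier:
            for sub in graph.get(cat, []):
                if sub not in seen:
                    seen.append(sub)
                    if len(seen) >= max_total:
                        return seen
                    nxt.append(sub)
        frontier = nxt
    return seen

def translate(query, graph):
    """Rewrite each 'deepcat:X' term into an 'incategory:' alternation;
    leave all other terms untouched."""
    parts = []
    for term in query.split():
        if term.startswith("deepcat:"):
            cats = expand_category(graph, term[len("deepcat:"):])
            parts.append("incategory:" + "|".join(cats))
        else:
            parts.append(term)
    return " ".join(parts)

# Toy category tree for illustration.
graph = {"Art": ["Painting", "Sculpture"], "Painting": ["Frescoes"]}
print(translate("deepcat:Art castle", graph))
```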
> The Deepcat gadget lets you go deeper in category search, generating
> results not only for a given category but also for its subcategories.
> Furthermore, it supports intersection search (among other things,
> searching for articles or pictures that are in two different
> categories, e.g. “Art” and “Technology”). The gadget works on
> Wikipedia, on Wikimedia Commons, and on many other wikis. For
> performance and technical reasons there is a search limit of 15
> categories in depth and 70 categories in total that the gadget can
> search through (you will see a hint about this while using the gadget).
> The gadget doesn't load on mobile devices; however, once you switch to
> desktop view, it should work as usual.
> The gadget can be used by typing the keyword “deepcat:” into the regular
> search field. Instructions for installing Deepcat and a detailed
> description of its functionality can be found on its infopage. Bugs can
> be reported on Phabricator.
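As a concrete illustration of the query forms described above (the category names are examples, not a fixed list):

```
deepcat:Art                       pages in "Art" or any of its subcategories
deepcat:Art deepcat:Technology    intersection: pages in both category trees
```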
> We hope it will serve you well!
>  https://github.com/wmde/DeepCat-Gadget
>  https://wikitech.wikimedia.org/wiki/Nova_Resource:Catgraph
>  https://tools.wmflabs.org/cgstat
>  https://wikitech.wikimedia.org/wiki/Nova_Resource:Catgraph/Deepcat
> Kasia Odrozek
> Product Manager
> Software Development and Engineering
> Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Tel. +49 (030) 219 158 260
> Mobil: +49 151 46752534
> http://wikimedia.de <http://www.wikimedia.de/>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 B. Recognized as charitable by
> the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
In today's Language Search sprint planning meeting, we came up with the
following short-term plan (spanning roughly the next few days to a week):
The developers are making good progress implementing Trey's "Minimum Viable
Product" Relevance Lab, and if things go well, it should be functional
soon. It will allow us to feed sets of queries in, have them run through
two different search rules, and compare the results.
For now, those result comparisons will be able to objectively note Zero
Results Rates, and the rest of the comparison will be subjective human
"Were results A better than results B?" Such are the limitations of an MVP.
We have brainstormed a list of over a dozen possible changes that are
candidates for our next A/B tests to try to improve language searching.
Erik is going to coordinate rating each of them for suitability for
validation via the MVP relevance lab: basically, for each one, whether it
could be tested in the relevance lab and, if so, how much effort would be
required to get to that point.
Once we know which ideas can be easily tested by the relevance lab, we
should be able to run the best candidates through the lab, and use those
results to decide which one(s) to take forward and implement as live A/B
tests in production.
That's the strawdog plan, anyway.
Agile Coach, Wikimedia Foundation
I've been too busy to move this forward in the last few weeks but finally
found some time to deploy what we had been working on. This pipeline is
now up, running, and queryable from Hive. It's sampling 1:1000 right now,
as I didn't want a flood of errors if something went wrong, but based on
the success so far I'll be dropping the sampling so it captures everything
our old logs did. For the time being we will continue logging
CirrusSearchRequests and CirrusSearchUserTesting to fluorine (and rsyncing
to stat1002 for processing), but that can be turned off once we move any
existing data processing over. There are still a few minor things to
figure out; the first version of the table in Hive doesn't handle the
external partitioning right, but I'll fix that soon enough.
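For illustration, deterministic 1:N request sampling of the kind mentioned above can be sketched like this. The `request_id` field and the hash-based decision are assumptions for the sketch, not the pipeline's actual mechanism:

```python
# Hedged sketch of 1:1000 sampling: a stable hash of a request id
# decides inclusion, so the same request is always in or out.
import hashlib

def sampled(request_id, rate=1000):
    """Keep roughly 1 in `rate` requests, deterministically."""
    digest = hashlib.md5(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % rate == 0

# Over many requests the kept fraction approaches 1/rate.
kept = sum(sampled(f"req-{i}") for i in range(100000))
print(kept)  # expected to be near 100 for rate=1000
```

Dropping the sampling then amounts to setting the rate to 1 (keep everything), which matches the old unsampled logs.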