Hi Discovery folks,
I'd love to make it easier for readers to discover related materials across
projects and formats (Wikipedia, Wiktionary, Wikivoyage, Commons,
Wikisource, maps, weather, etc.). Any ideas about how to make this happen?
Team and those who follow Discovery,
We're exploring a code freeze for December so that we don't conflict
with any fundraising changes, respect everybody's holiday time, and
get some significant heads-down time.
The reading and editing departments are code freezing for the whole
month, and we could do the same. We could also freeze for just the last
part of the month. Emergency changes could go through as long as Katie
and Greg are OK with them.
Eager to see what both the team and the public who work with us think.
Kasia, this sounds very interesting! Cross-posting to the Discovery mailing list.
On Nov 3, 2015 8:39 AM, "Kasia Odrozek" <kasia.odrozek(a)wikimedia.de> wrote:
> Hello all,
> I am happy to announce that the Software Development Team at WMDE (TCB)
> has developed the so-called “Deepcat” gadget, which is now ready to use
> not only on German wikis but also internationally.
> The possibility of intersection and subcategory search was one of the
> top wishes in the TOP 20 of the German Community Technical Wishlist.
> DeepCat acts as an interface between a graph database and MediaWiki's
> search engine. The wiki's category structure is stored via its page IDs
> in the graph database, while the gadget translates the search string,
> retrieves the information from the database, and sends it to the search
> engine. Tool developers interested in exploring possibilities for using
> the CatGraph database can find more information on the respective
> infopage or can approach us directly.
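The flow described above (expand a category through the graph, then rewrite the query for the search engine) can be sketched roughly as follows. This is a hypothetical illustration, not the gadget's actual code: the function names, the dict-based graph, and the `incategory:` alternation rewrite are all assumptions (the real gadget queries CatGraph and works with page IDs).

```python
# Hypothetical sketch of the DeepCat translation step: expand a
# "deepcat:" term into its subcategories, then rewrite the query
# for the search engine. Names and data shapes are illustrative.

def expand_category(graph, root, max_depth=15, max_total=70):
    """Breadth-first walk of the category graph, bounded like DeepCat
    (15 categories deep, 70 categories total)."""
    seen = [root]
    frontier = [root]
    for _ in range(max_depth):
        nxt = []
        for cat in frontier:
            for sub in graph.get(cat, []):
                if sub not in seen:
                    seen.append(sub)
                    if len(seen) >= max_total:
                        return seen
                    nxt.append(sub)
        frontier = nxt
    return seen

def translate(query, graph):
    """Rewrite each 'deepcat:X' term into an 'incategory:' alternation;
    leave all other terms untouched."""
    parts = []
    for term in query.split():
        if term.startswith("deepcat:"):
            cats = expand_category(graph, term[len("deepcat:"):])
            parts.append("incategory:" + "|".join(cats))
        else:
            parts.append(term)
    return " ".join(parts)

# Toy category tree for illustration.
graph = {"Art": ["Painting", "Sculpture"], "Painting": ["Frescoes"]}
print(translate("deepcat:Art castle", graph))
```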
> The Deepcat gadget lets you go deeper in category search, generating
> results not only for a given category but also for its subcategories.
> Furthermore, it supports intersection search (among other things,
> searching for articles or pictures that are in two different
> categories, e.g. “Art” and “Technology”). The gadget works on
> Wikipedia, on Wikimedia Commons, and on many other wikis. For
> performance and technical reasons there is a search limit of 15
> categories in depth and 70 categories in total that the gadget can
> search through (you will see a hint about this while using the gadget).
> The gadget doesn't load on mobile devices; however, once you switch to
> desktop view, it should work as usual.
> The gadget can be used by typing the keyword “deepcat:” into the regular
> search field. Instructions for installing Deepcat and a detailed
> description of its functionality can be found on its infopage. Bugs can
> be reported on Phabricator.
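As a concrete illustration of the query forms described above (the category names are examples, not a fixed list):

```
deepcat:Art                       pages in "Art" or any of its subcategories
deepcat:Art deepcat:Technology    intersection: pages in both category trees
```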
> We hope it will serve you well!
>  https://github.com/wmde/DeepCat-Gadget
>  https://wikitech.wikimedia.org/wiki/Nova_Resource:Catgraph
>  https://tools.wmflabs.org/cgstat
>  https://wikitech.wikimedia.org/wiki/Nova_Resource:Catgraph/Deepcat
> Kasia Odrozek
> Product Manager
> Software Development and Engineering
> Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Tel. +49 (030) 219 158 260
> Mobil: +49 151 46752534
> http://wikimedia.de <http://www.wikimedia.de/>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 B. Recognized as charitable by
> the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
In today's Language Search sprint planning meeting, we came up with the
following short-term plan (spanning roughly the next few days to a week):
The developers are making good progress implementing Trey's "Minimum Viable
Product" Relevance Lab, and if things go well, it should be functional
soon. It will allow us to feed sets of queries in, have them run through
two different search rules, and compare the results.
For now, those result comparisons will be able to objectively note Zero
Results Rates, and the rest of the comparison will be subjective human
"Were results A better than results B?" Such are the limitations of an MVP.
We have brainstormed a list of over a dozen possible changes that are
candidates for our next A/B tests to try to improve language searching.
Erik is going to coordinate rating each of them for suitability for
validation via the MVP relevance lab: basically, for each one, whether it
could be tested in the relevance lab and, if so, how much effort would be
required to get to that point.
Once we know which ideas can be easily tested by the relevance lab, we
should be able to run the best candidates through the lab, and use those
results to decide which one(s) to take forward and implement as live A/B
tests in production.
That's the strawdog plan, anyway.
Agile Coach, Wikimedia Foundation
I've been too busy to move this forward in the last few weeks but finally
found some time to deploy what we had been working on. This pipeline is
now up, running, and queryable from Hive. It's sampling 1:1000 right now,
as I didn't want a flood of errors if something went wrong, but based on
the success so far I'll be dropping the sampling so it captures everything
our old logs did. For the time being we will continue logging
CirrusSearchRequests and CirrusSearchUserTesting to fluorine (and rsyncing
to stat1002 for processing), but that can be turned off once we move any
existing data processing over. There are still a few minor things to
figure out; the first version of the table in Hive doesn't handle the
external partitioning right, but I'll fix that soon enough.
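For illustration, deterministic 1:N request sampling of the kind mentioned above can be sketched like this. The `request_id` field and the hash-based decision are assumptions for the sketch, not the pipeline's actual mechanism:

```python
# Hedged sketch of 1:1000 sampling: a stable hash of a request id
# decides inclusion, so the same request is always in or out.
import hashlib

def sampled(request_id, rate=1000):
    """Keep roughly 1 in `rate` requests, deterministically."""
    digest = hashlib.md5(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % rate == 0

# Over many requests the kept fraction approaches 1/rate.
kept = sum(sampled(f"req-{i}") for i in range(100000))
print(kept)  # expected to be near 100 for rate=1000
```

Dropping the sampling then amounts to setting the rate to 1 (keep everything), which matches the old unsampled logs.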