Re: [discovery] Hackathon ideas

2 May 2018

Yes! The "Tell me why your search sucks" sign was a success last year, and
I'm looking forward to seeing/hearing all the cool questions folks will ask
this time! :)

I also just got a chance to look at Erik's slides (final version
<https://upload.wikimedia.org/wikipedia/commons/4/4c/From_Clicks_to_Models_The_Wikimedia_LTR_Pipeline.pdf>)
that he presented at Haystack and I think it might be cool to reprise that
presentation in a breakout session...if Erik is up for it. :)

Cheers,

Deb

--

deb tankersley

Program Manager, Engineering

Wikimedia Foundation

On Wed, May 2, 2018 at 1:06 PM, Trey Jones <tjones(a)wikimedia.org> wrote:

...
 Deb: We talked about some of these in our Wednesday meeting, but didn't do
 much deciding or prioritizing. After that, at the hackathon travel meeting,
 Rachel reminded us that the hackathon is "a community-focused event" and
 that we as WMF staff should be "supporting, connecting, and helping
 volunteer and affiliate developers." So, I think I'm going to update my
 hackathon participation info to include a link to the list of projects I
 want to work on, and hope that someone from outside the WMF contacts me
 about something. On the learning side, I've already gotten David to agree
 to help me with some of the technical bits I need for some of my
 proposed projects, either before or at the hackathon (yay!). I also hope
 that the "Tell me why your search sucks" sign will encourage people to stop
 and chat with us. I figure random people chatting with us about search and
 anyone who wants to work with us would take precedence over any other
 projects we might prefer to work on at the hackathon, though I plan to fall
 back to my list if I run out of other things to do or people to talk to.

 Justin—We can definitely talk about ways to keep improving the ML ranking
 (or other ML approaches for search). I don't know if there's time during
 the hackathon to pull something together—I guess it depends on how complex
 it is. More broadly (and Erik can speak more definitively about this), I'd say
 while there's always some ML-related stuff going on in the background, our Q4
 goals
 <https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2017-18_Q4#Program_1:_Make_knowledge_more_easily_discoverable>
 are less about Learn-to-Rank/ML, so there may not be much bandwidth for any
 complex projects in the short term. That said, I'm gathering ideas for NLP
 applications for search—which often overlaps with ML applications—so if you
 have any ideas (or if anyone else does!), please share them, whether here
 or off-list.

 —Trey

 Trey Jones
 Sr. Software Engineer, Search Platform
 Wikimedia Foundation

 On Wed, May 2, 2018 at 1:09 PM, Justin Ormont <justin.ormont(a)gmail.com>
 wrote:

  Greetings Deb/Trey/Erik,

 I'd enjoy joining the discussions on these hackathon topics also.

 Specifically, I'd like to see I can help improve MWF's search relevance
 using additional machine learning techniques/ML-packages.

 Thanks,
 --justin

 On Wed, May 2, 2018 at 8:53 AM, Deborah Tankersley <
 dtankersley(a)wikimedia.org> wrote:

  Nice stuff!

 Should we set up a meeting to talk more in depth about this, as we're
 about 2 weeks out from the Hackathon right now?

 Cheers,

 Deb

 --

 deb tankersley

 Program Manager, Engineering

 Wikimedia Foundation

 On Wed, May 2, 2018 at 8:39 AM, Trey Jones <tjones(a)wikimedia.org> wrote:

 I've got my own list of more language-focused, not-necessarily-great
 ideas, in order of my current desire to work on them:

    - Mirandese (mwl) analysis plugin built from Portuguese and French
    parts, plus a stop list provided by an mwl editor
    - plugin to merge high surrogates and low surrogates that get split
    up by the Chinese analyzer
    - plugin to do automatic homoglyph corrections
    - plugin to do transliteration for languages where it is relatively
    easy (Serbian was on the list, but it’s already done!—and for very simple
    mappings this is just a char map)
    - look into ways of automatically generating a stemmer from
    Wiktionary conjugation/declension data (maybe start with Estonian?)
    - compare the analyzers for the top 5-10 wiki languages by volume,
    and look for ways to increase consistency among them
    - develop a different statistical approach to detect wrong keyboard
    typing and build a search-only filter to generate alternative tokens—for
    Russian/English, Hebrew/English, OR one hand on wrong home row
    - update RelForge with some additional metrics I’ve been collecting
    - project Wordnet or other thesaurus/ontology onto short strings
    (e.g., Commons descriptions, Wikipedia titles, etc.) to determine useful
    thesaurus terms and prune the rest
    - recheck differences in unpacked vs monolithic analyzers
    (eliminating our automatic upgrades, which are 98% likely to have caused the
    diffs)
    - “Bollywood detector”—identify and map Bollywood movie names into
    multiple scripts
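
 As a rough illustration of the "just a char map" case mentioned in the
 transliteration item above, here is a minimal Python sketch. The table, the
 function names (build_table, transliterate), and the choice of Python are
 assumptions made for illustration only; this is not the code of any existing
 plugin, and a real implementation would sit in the search analysis chain
 rather than in a standalone script.

```python
# Hypothetical Serbian Cyrillic-to-Latin table (illustration only).
# Most letters map one-to-one; a few map to two Latin characters.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def build_table(mapping):
    """Build a str.translate table covering lower- and upper-case letters."""
    table = {}
    for src, dst in mapping.items():
        table[ord(src)] = dst
        # Upper-case source letters get a capitalized target (e.g. Љ -> Lj).
        table[ord(src.upper())] = dst.capitalize()
    return table


TABLE = build_table(CYR_TO_LAT)


def transliterate(text):
    """Replace each mapped character; unmapped characters pass through."""
    return text.translate(TABLE)


if __name__ == "__main__":
    print(transliterate("Београд"))
```

 A per-character table like this only works when the mapping does not depend
 on context, which is why it covers the "very simple mappings" case and
 nothing more.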

 I was planning to work on the Mirandese analysis plugin and maybe one
 of the next three on the list. But if anyone wants to collaborate on any of
 the others, I'm happy to do so.

 Trey Jones
 Sr. Software Engineer, Search Platform
 Wikimedia Foundation

 On Tue, May 1, 2018 at 6:14 PM, Erik Bernhardson <
 ebernhardson(a)wikimedia.org> wrote:

> With the hackathon coming up I thought we could ponder what could be
> done while there. I've been constructing a list of horrible ideas over the
> last couple weeks:
>
>
 _______________________________________________
 Discovery mailing list
 Discovery(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/discovery

