As we reach the last month of the quarter, it's a good opportunity to
reflect on where we want to go with the time we have left. On the one
hand, we're in quite a good place: we're just wrapping up our work on our
Q2 goal for search, which is excellent! On the other hand, the test showed
minimal impact, so our users still aren't seeing the benefit of our work.
Since we can continue running A/B tests for improving language support
relatively cheaply in terms of required engineering time, let's take a
look back at what we've done previously and see if we can choose something
high-impact to work on!
The completion suggester is a very promising avenue for us to invest in. As
noted in our analysis of the initial test
<https://phabricator.wikimedia.org/T111858>, using the completion suggester
instead of prefixsearch significantly reduced the zero results rate. We've
not had an impact on this through other efforts, so this is interesting! In
order to more thoroughly test the suggester, we can make it a Beta Feature
<https://phabricator.wikimedia.org/T119535>. This will allow editors to
opt-in to testing it, and will gather us valuable qualitative feedback
about what use cases the completion suggester could support better. The
caveat, of course, is that the feedback will be from a specific segment of
our user base (users who test beta features) which is more specialised than
the intended audience (everyone). That said, the feedback will still be
very helpful. There's quite a bit of work to do here; our initial test of
the suggester was very hacky, but now that it's proven itself, we can
invest in building it out properly.
The other avenue is using page views to influence result ranking. This is
at an earlier stage than the completion suggester, in that it's a
relatively unproven approach for us, but it's something that's logical and
that we've been interested in for a while; we've just repeatedly had to
deprioritise it for other work. If something is popular, it makes sense to
rank it higher in search results. Obviously, we don't want to be *too*
aggressive with this in case we create feedback loops (popular pages rank
higher, get clicked more, and so become more popular still), but I think
the potential benefits are quite clear if done correctly.
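To make "not *too* aggressive" concrete, here's a rough sketch of the
kind of dampened popularity boost I have in mind; it's Python for
illustration only, and the function, weight, and numbers are all made
up rather than a worked-out proposal:

    import math

    def boosted_score(relevance, monthly_pageviews, weight=0.2):
        """Blend text relevance with a dampened popularity signal.

        log1p keeps a page with a million views from drowning out a
        page with a thousand, which is one way to limit feedback loops.
        """
        popularity = math.log1p(monthly_pageviews)
        return relevance * (1.0 + weight * popularity)

    # A popular page gets a nudge, not a guaranteed top slot.
    print(boosted_score(relevance=2.0, monthly_pageviews=1000000))  # ~7.5
    print(boosted_score(relevance=3.0, monthly_pageviews=1000))     # ~7.1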
I covered much of this briefly in our last standup, but hopefully this
gives you all some guidance on where we're going.
Thanks, and as always, if there are any questions then please let me know.
Lead Product Manager, Discovery
So I was looking up information on peripheral neuritis and I
accidentally mistyped it as "peripheral neuriti". The good news: the
autocorrector worked out I'd done it wrong, corrected it, and sent me
automatically to the right results. Yay!
But looking at the results I see a really obvious improvement we could
make that would definitely improve the user experience in this
scenario. See, if you look at the first article on the list you'll see
it's "Peripheral neuropathy". Why? Because peripheral neuritis
redirects to that. But the article header appears in the search
results as "Peripheral neuropathy", since that's the real title.
But it's not what I searched for. What I searched for was neuritis. Is
neuritis the same as neuropathy? I dunno, I'm a random reader. Is this
a good search result to click on? No idea.
What I'd love for us to do is run an A/B test with two conditions:
1. Users who search for a term which redirects to an article get the
current experience (control)
2. Users who search for a term which redirects to an article see the
result labelled with the redirect title they actually matched (test)
I bet this would really improve the clickthrough rate for this class
of searches. It would definitely improve the UX.
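For what it's worth, here's a minimal sketch of how the bucketing could
work, with a hash of a session ID making assignment sticky; the names
(and the idea of keying on session IDs) are just illustrative:

    import hashlib

    def ab_bucket(session_id, experiment="redirect-title-display"):
        """Deterministically assign a session to control or test."""
        key = (experiment + ":" + session_id).encode()
        digest = hashlib.md5(key).hexdigest()
        return "test" if int(digest, 16) % 2 == 0 else "control"

    def result_title(article_title, redirect_title, bucket):
        # Control: show the article's real title (current experience).
        # Test: show the redirect the user actually searched for.
        if bucket == "test" and redirect_title:
            return redirect_title
        return article_title

    bucket = ab_bucket("session-123")
    print(result_title("Peripheral neuropathy", "Peripheral neuritis",
                       bucket))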
 I'm researching thalidomide. Long story.
The Discovery Analysis team is pleased to report we have released a
new dashboard, providing basic data about usage of the Wikipedia
portal (https://www.wikipedia.org). It can be found at
Oliver, on behalf of the Discovery Analysis team
Last weekend I attended an amazing Open GIS <http://gisconf.ru/> conference
in Moscow. Many good topics, great energy, lots of people wanting to help
us build the best maps on the planet. I gave two presentations, one about
the overall state of our maps initiative, and one on the tech we have built.
As part of the discussion, the GeoHack for the Russian Wikipedia was
updated to use our maps, so we had a sixfold increase
<http://searchdata.wmflabs.org/maps/> in the number of maps users! We'll
see how it changes during the week. As part of the KPIs, we should add
graphs <https://phabricator.wikimedia.org/T119448> (top 10 only).
Other results might take time: people learnt of our technology, and I
learnt of some projects we may benefit from, for example
GeoJSON+CSS <http://wiki.openstreetmap.org/wiki/Geojson_CSS>, which would
allow our editors to style custom objects they overlay on top of the map.
In short, it was a fun weekend )
After talking with Fundraising, we have agreed a code freeze for the week
commencing Monday 30th November to minimise disruption of the fundraiser.
As a reminder, there will be no train deployment that week, so basically
what this code freeze amounts to is "do not manually deploy things that
week". The train deployment will resume the following week as normal.
This code freeze replaces our previously documented two day code freeze on
Tuesday 1st and Wednesday 2nd December. This should not affect our previous
agreement with Fundraising/RelEng that it's okay to continue doing deploys
on the portal in that time.
Lead Product Manager, Discovery
This is a study (in French) I found in the list of papers that should be
reviewed for the next research newsletter: http://scoms.hypotheses.org/498
The purpose of the study is to model the social network of movie actors
of the 1920s and 1930s with Wikidata.
In a few words, it uses WDQS to export the dataset, applies some
conversion with R, and imports the graph into Gephi.
Oliver & Mikhail,
Could you guys review why the user satisfaction KPI continues to be
affected even after the recent changes?
CC'ing discovery@ so that others are aware of the issue.
This is fascinating; we started experimenting with the completion
suggester a few months ago.
The first goal was to increase recall by activating fuzzy lookups to
handle small typos.
It became clear that scoring was a critical part of this feature.
Some prefixes are already very ambiguous (mar, cha, list ...) and
enabling fuzziness does not help here.
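For context, here's roughly what such a fuzzy completion-suggest request
looks like; "titlesuggest" and "suggest" are placeholder index/field
names for illustration, not our real mapping:

    import requests

    def suggest(prefix, host="http://localhost:9200"):
        body = {
            "titles": {
                "text": prefix,
                "completion": {
                    "field": "suggest",
                    "size": 10,
                    # Tolerate a typo or two in the typed prefix.
                    "fuzzy": {"fuzziness": 1},
                },
            }
        }
        r = requests.post(host + "/titlesuggest/_suggest", json=body)
        r.raise_for_status()
        return [o["text"] for o in r.json()["titles"][0]["options"]]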
We tried to implement a score based on the data currently available
(size, incoming_links, templates...), but this score is kind of "bigger
is better".
This is why we were interested in pageviews, to add "popularity" to the
score. Thanks for sharing this tool; it is very helpful for getting a
quick look at how it would behave.
I still don't know if pageviews can be the only score component or if we
should combine them with other factors like "quality" and "authority".
My concerns with pageviews are:
- we certainly have outliers (caused by 3rd-party tools, bots, ...)
- what's the coverage of pageviews, i.e. in one month, how many pages get
at least one view?
Quality: we have a set of templates that are already used to flag
good/featured articles. Cirrus uses this on enwiki only; I'd really like
to extend it to other wikis. I'm also very interested in the tool
behind http://ores.wmflabs.org/scores/enwiki/wp10/?revids=686575075.
Authority/Centrality: Erik ran an experiment with a PageRank-like
algorithm and it showed very interesting results.
I'm wondering if this approach can work. I tend to think that by using
only one factor (pageviews) we can have both very long tails with 1 or 0
pageviews and big outliers caused by new bots/tools we failed to detect.
Using other factors not related to pageviews might help to mitigate
these problems.
So the question about normalization is also interesting when computing a
composite score from 3 different components.
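As a strawman, such a composite could look like this; the components are
the ones mentioned above, but the normalization and weights are made up:

    import math

    def normalize_log(value, max_value):
        """Log-scale then map to [0, 1]; tames pageview outliers a bit."""
        if max_value <= 0:
            return 0.0
        return math.log1p(value) / math.log1p(max_value)

    def composite_score(pageviews, quality, pagerank,
                        max_pageviews, max_pagerank,
                        weights=(0.5, 0.25, 0.25)):
        # quality is assumed to already be in [0, 1], e.g. an ORES
        # wp10 probability; the other two are normalized here.
        w_pop, w_quality, w_auth = weights
        return (w_pop * normalize_log(pageviews, max_pageviews)
                + w_quality * quality
                + w_auth * normalize_log(pagerank, max_pagerank))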
For the question about weighting over time, I think you detailed the
problem very well.
It really depends on what we want to do here; near-real-time windows (12h
or 24h) can lead to weird behaviors and will only work for very popular
wikis.
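If we look back further, one sketch would be an exponentially decayed
counter, so recent views count more without a hard cutoff; the half-life
is an arbitrary illustration:

    import math

    def decayed_count(daily_views, half_life_days=7.0):
        """daily_views[0] is today, daily_views[1] is yesterday, etc."""
        decay = math.log(2) / half_life_days
        return sum(views * math.exp(-decay * age)
                   for age, views in enumerate(daily_views))

    # Yesterday's spike still dominates, but fades over the week.
    print(decayed_count([100, 5000, 120, 110, 90, 80, 100]))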
Concerning your experiment, do you plan to activate fuzzy search?
On our side it was a bit difficult; the completion suggester is still
incomplete. Fuzzy results are not discounted, so we had to work around
this problem with client-side rescoring.
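Roughly, the workaround is to re-score the returned options ourselves,
penalising any option whose text doesn't start with what the user typed
(i.e. a fuzzy match); the penalty factor here is arbitrary:

    def rescore(prefix, options, fuzzy_penalty=0.5):
        """options: list of (text, score) pairs from the suggester."""
        rescored = []
        for text, score in options:
            if not text.lower().startswith(prefix.lower()):
                score *= fuzzy_penalty  # a fuzzy match: push it down
            rescored.append((text, score))
        return sorted(rescored, key=lambda p: p[1], reverse=True)

    print(rescore("mar", [("Mars", 10.0), ("Mary", 9.0), ("Car", 9.5)]))
    # Exact-prefix "Mars" and "Mary" now outrank the fuzzy "Car".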
On 14/11/2015 00:10, Greg Lindahl wrote:
> On Fri, Nov 13, 2015 at 01:45:57PM -0800, Erik Bernhardson wrote:
>> Have you put any thought into normalizing page view data?
> I haven't studied it, but I think you've got a good start: normalizing
> them by the # of pageviews of the community. So if someone types an
> entire French phrase into the English wikipedia, and you wanted to
> show both En and Fr options in the autocomplete, a simple
> normalization would be a good start for having something to sort
> by. Ditto for search.
> Your next question, about weighting over time, is really a question
> about how much data you have. It's nice to be able to push up current
> events, so that someone searching for Paris today could see (alas) the
> brand new article about today's attacks. But it's the amount of
> pageview data that really dictates how well you can do that. For the
> English wikipedia, there are so many pageviews that you probably have
> enough data over 24 hours to produce good, not-noisy counts. And for
> less than 24 hours, you'll probably end up magnifying Europe's
> favorites as America wakes up, and America's favorites as Asia wakes
> up. Probably not a good thing!
> For a less-used wiki, only 24 hours might produce pretty sparse and
> noisy counts. So you will need to look back farther, which reduces
> your ability to react to current events.
> If you'd like to experiment with exponential decay, you can look at the
> statistics to try to figure out if you're just magnifying noise, or
> whether Europe's preferences become popular when Americans wake up.
> (And if you're really interested in geography, you could divide the
> data up so that Europe, America, ANZ, Asia, etc have separate
> autocompletes... if you have enough pageview data.)
> -- greg