Hello!
I took some notes on the recent Wikidata Query Service instabilities
and created an incident report [1]. Some people might have additional
insight that I don't have. If that's the case, let me know and I'll
update that incident report.
Thanks all for your help and your understanding!
Guillaume
[1] https://wikitech.wikimedia.org/wiki/Incident_documentation/20160503-Wikidat…
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
The Discovery department's [1] work to improve search continues with a new
tool! We are asking volunteers to help choose, or discern, which search
results are the most relevant.
One way of improving search result relevance is to present results from
multiple search engines side by side for comparison. Participants pick the
best, most relevant results, and those judgments are then used to tune our
own search results. It's a way to improve search with human assistance, by
identifying the articles most relevant to each query. This approach is used
by many R&D departments and gives good results.
Discernatron is a tool developed by the Discovery department for just this
sort of work. Visitors are asked to pick the most relevant results across
four different search result sets. The data is then used to help improve
our relevancy model for search. A screenshot is available at [2].
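Judgments like these typically feed a graded relevance metric such as
nDCG. Here is a minimal illustration in Python (not Discernatron's actual
scoring code; the judgment values are made up):

    import math

    def dcg(relevances):
        # Discounted cumulative gain for a ranked list of graded judgments.
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

    def ndcg(relevances):
        # DCG normalized against the ideal (best possible) ordering.
        ideal = dcg(sorted(relevances, reverse=True))
        return dcg(relevances) / ideal if ideal else 0.0

    # Human judgments for one query, in the order an engine returned the
    # results (3 = highly relevant ... 0 = not relevant).
    print(ndcg([3, 0, 2, 1]))  # ~0.93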
If you are interested in helping, you can access Discernatron at
https://discernatron.wmflabs.org/ and authenticate with your unified user
account.
To learn more about the tool, visit MediaWiki.org [3].
[1] https://www.mediawiki.org/wiki/Discovery
[2] https://meta.wikimedia.org/wiki/File:Discernatron_screenshot.png
[3] https://www.mediawiki.org/wiki/Discernatron
--
Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation
Hello!
We now have a weekly deployment window for WDQS, on Mondays [1]. This
is great. I'd like to see how we can improve it a bit more.
I have been struggling to find out whether anything is planned for a given
deployment window, and what that content is. I could obviously just blindly
deploy the latest commit each week and trust that everything is fine. That
might actually be a good idea, but if that's the case, I should completely
automate the process and remove myself from the picture.
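If we go the automation route, the job could be as small as the following
sketch, run from cron during the window (the final deploy command is a
placeholder, not our actual tooling):

    #!/usr/bin/env python
    # Hypothetical "blindly deploy the latest commit" job.
    import subprocess

    def rev(ref):
        # Resolve a git ref to a commit hash.
        return subprocess.check_output(["git", "rev-parse", ref]).strip()

    subprocess.check_call(["git", "fetch", "origin"])
    if rev("HEAD") != rev("origin/master"):
        subprocess.check_call(["git", "merge", "--ff-only", "origin/master"])
        subprocess.check_call(["./deploy.sh"])  # placeholder deploy step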
Other deployment windows require the change initiator (in most cases the
developer) to add a description of the changes to be deployed to the
deployment page [1]. I was expecting the same here, but never clarified
this point with Stas (my bad, I should not assume things).
@Stas (or others): would you be OK with following this same process? Or
should we push automation a bit further and take me out of the picture
entirely?
Thanks for your ideas!
MrG
[1] https://wikitech.wikimedia.org/wiki/Deployments
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
CCing the Search and Discovery list.
On Sun, May 8, 2016 at 12:24 PM, Stan Zonov <stanzon(a)gmail.com> wrote:
> Hi!
>
> I have been trying to gauge the speed/efficiency of a database I have set
> up. In order to test it, I have filled it with a lot of Wikipedia articles
> from a specific category (for example, history). The database handles
> multi-word queries and returns the articles that best match the query. For
> example, if I search for "history in Italy in the past 100 years", then the
> best-matching articles should pop up.
>
> I was wondering if anyone has advice on how to form sample test queries
> that model realistic situations. I don't think it would be fair to use
> random phrases (such as "banana the string"); I want to model queries based
> on my data to test performance and correctness of output. Does anyone have
> any advice? Is this done at Wikipedia, and if so, how?
>
>
> I have looked here
> (http://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wikipedia…)
> but the data has been unavailable for a while.
>
> Cheers,
>
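One generic approach (a sketch only, not necessarily how this is done at
Wikipedia): derive test queries from the corpus itself, by sampling short
word spans from the indexed articles, so that queries share the corpus's
vocabulary and topic distribution:

    import random
    import re

    def sample_queries(articles, n=100, min_len=2, max_len=5, seed=0):
        # Draw multi-word queries by sampling short word spans from articles.
        rng = random.Random(seed)
        queries = []
        for _ in range(n):
            words = re.findall(r"\w+", rng.choice(articles).lower())
            if len(words) < min_len:
                continue  # article too short to yield a query
            k = rng.randint(min_len, min(max_len, len(words)))
            start = rng.randrange(len(words) - k + 1)
            queries.append(" ".join(words[start:start + k]))
        return queries  # may be fewer than n if some articles were too short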
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
Hi,
Sad news: the completion suggester v2 was reverted from the Elasticsearch
2.x branch [1] and is now delayed until Elasticsearch v5.
Unfortunately, I was planning to use important features from that version
to add real-time support.
I'll start thinking about it and see whether we can work around this and
still improve the current implementation.
I'm really sorry about that; I should have monitored the Elasticsearch
repo more closely...
[1] https://github.com/elastic/elasticsearch/pull/17120
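For context: in Elasticsearch 2.x the completion suggester is queried
through the _suggest endpoint. A minimal illustration (index and field
names are invented for the example):

    import json
    import requests

    body = {
        "title-suggest": {
            "text": "albert eins",
            "completion": {"field": "title.suggest", "size": 5},
        }
    }
    resp = requests.post("http://localhost:9200/enwiki_content/_suggest",
                         data=json.dumps(body),
                         headers={"Content-Type": "application/json"})
    # Suggestions come back under <suggestion name>[0]["options"].
    print(resp.json()["title-suggest"][0]["options"])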
Hey all,
Stas and I just met to discuss how he can assist with search moving
forwards. Stas works on a lot of projects, so we wanted to guide and plan
his efforts together.
Given Stas's role as a technical liaison between the Wikidata and Search
teams, we started there. I mentioned that structured file metadata for
Commons is important for content discovery, since access to structured data
would potentially let us do more with search on Commons. This was quite a
popular item
<https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Archive#Stru…>
on the community wishlist, where it was noted that the Wikidata Team would
be working on it. Discovery has some limited capacity to help with
architectural discussions, and Lydia and I have discussed helping in this
area before.
Stas is going to proceed with investigating T89733
<https://phabricator.wikimedia.org/T89733> "Allow ContentHandler to expose
structured data to the search engine", which is a huge task that needs
a bit more definition to be actionable. Stas is continuing to work on the
Wikidata Query Service and also (for this quarter) supporting the
Performance Team in some of their efforts.
If you have any questions, let me know!
Thanks,
Dan
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
There seems to be a data validation error in the Search schema. It's not super
pressing, but it should be fixed at some point. I created T134282
<https://phabricator.wikimedia.org/T134282> to track this.
Thanks,
Dan
---------- Forwarded message ----------
From: Marcel Ruiz Forns <mforns(a)wikimedia.org>
Date: 2 May 2016 at 04:48
Subject: Search schema validation errors
To: Dan Garry <dgarry(a)wikimedia.org>
Hi Dan,
We've observed that EventLogging's Search schema is receiving around 1400
events per hour that fail validation with the following error:
u'comp_suggest' is not one of ['prefix', 'fulltext',
'prefixmergedwithfulltext']
It seems the client is sending a value ('comp_suggest') that is not
registered in the schema for that field. 1400 events/hour is probably not a
big share of all Search events, but this is just a heads-up in case you want
to look into it.
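For reference, this is the kind of enum check that produces the error
above; a minimal reproduction with the Python jsonschema library (the
schema fragment and field name are invented for the example):

    from jsonschema import ValidationError, validate

    schema = {
        "type": "object",
        "properties": {
            "source": {
                "enum": ["prefix", "fulltext", "prefixmergedwithfulltext"]
            },
        },
    }

    try:
        validate({"source": "comp_suggest"}, schema)
    except ValidationError as e:
        # 'comp_suggest' is not one of ['prefix', 'fulltext',
        # 'prefixmergedwithfulltext']
        print(e.message)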
You can easily analyze the error logs here:
https://logstash.wikimedia.org/#/dashboard/elasticsearch/eventlogging-errors
Cheers!
--
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
Hi Everyone,
I've just finished my write-up for optimizing the languages that could
eventually be used for language detection on French Wikipedia. (Spanish,
Italian, and German are still to come.)
The full write-up
<https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_Optimization…>
gives details
on corpus creation and cleanup, performance stats, and more.
Briefly, about 15% of "low-performing" queries (those with fewer than 3
results) are easily filtered junk, and 65% of the remainder are not in an
identifiable language (e.g., names, acronyms, more junk).
Based on a sample of 682 poor-performing queries on frwiki that are in some
language, about 70% are in French, 10-15% are in English, about 7-12% are
in Arabic, fewer than 3% are in Portuguese, German, and Spanish, and there
are a handful of other languages present.
Because a relatively low percentage of low-performing queries are candidates
for this treatment, we will still need to run an A/B test before discussing
deploying this to frwiki. An A/B test on enwiki
<https://phabricator.wikimedia.org/T121542> is in the works at the moment.
The optimal settings for frwiki, based on these experiments, would be to
use the TextCat query-based models for French, English, Arabic, Russian,
Chinese, Armenian, Thai, Greek, Hebrew, Korean (fr, en, ar, ru, zh, th, el,
hy, he, ko), using the default 3000-ngram models.
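For anyone unfamiliar with how TextCat works: it ranks character n-grams
by frequency and compares rank profiles using an "out-of-place" distance
(Cavnar & Trenkle, 1994). A minimal sketch of the idea in Python (not the
actual TextCat code or models used here):

    from collections import Counter

    def ngram_profile(text, max_n=5, top_k=3000):
        # Rank character n-grams (1..max_n) by frequency, most common first.
        counts = Counter()
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
        return [g for g, _ in counts.most_common(top_k)]

    def out_of_place(query_profile, lang_profile):
        # Sum of absolute rank differences; n-grams missing from the
        # language model get a maximum penalty.
        ranks = {g: r for r, g in enumerate(lang_profile)}
        penalty = len(lang_profile)
        total = 0
        for qr, g in enumerate(query_profile):
            total += abs(qr - ranks[g]) if g in ranks else penalty
        return total

    def detect(query, models):
        # models: dict mapping language code -> precomputed rank profile.
        qp = ngram_profile(query)
        return min(models, key=lambda lang: out_of_place(qp, models[lang]))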
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation