Hello,
Here is the week's update from the Discovery department - enjoy the read and your weekend!
== Discussions ==
* Trey completed the analysis for optimizing language identification for the Dutch Wikipedia (nlwiki). The results were good (F0.5 = 82.3%) but not great. The small proportions of queries in the Romance languages and in German led to many more false positives than true positives and so they had to be excluded. Future work on improving confidence may help. [1]
* We could use help translating (via translatewiki) the relevant "showing results from" messages into Dutch. We'll need English, Chinese, Arabic, Korean, Greek, Hebrew, Japanese, and Russian translations. [2]
* The Analysis team had a discussion on how to use better wording for phrases like "users were 1.07 times more likely to do X" and decided on using phrases similar to "we can expect 2-9 more sessions to click on a search result when they have the new feature" [3]
* The Search team wrapped up research into the ElasticSearch instabilities on the eqiad search cluster that occurred on Aug 6, 2016; nothing conclusive was found. [4]
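For readers unfamiliar with the F0.5 figure in the first item: the F-beta score combines precision and recall, and beta = 0.5 weights precision more heavily than recall, which is why languages that produce more false positives than true positives drag the score down. The sketch below is only an illustration; the counts are made up and are not taken from Trey's analysis.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta score from raw counts; beta < 1 favours precision over recall."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one language: precise but with modest recall.
high_precision = f_beta(tp=90, fp=10, fn=60)
# Hypothetical rare language: more false positives than true positives.
rare_language = f_beta(tp=5, fp=20, fn=5)
print(round(high_precision, 3), round(rare_language, 3))
```

With these invented numbers the rare language scores far lower, which matches the reasoning for excluding such languages from identification.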
== Events and News ==
=== Interactive ===
* <maplink> has been enabled on all wikis (announced via email to wikitech-l) [5]
* Geoshapes data service is now integrated into all maps [6]
=== Search ===
* Turned off BM25 A/B test, awaiting analysis [7]
* Pushed into production a change that implemented ascii-folding for French [8]
* Improved balance of nodes across rows for ElasticSearch eqiad cluster [9]
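As a rough illustration of what ascii-folding means for French queries, the sketch below strips combining diacritics with Python's standard `unicodedata` module. It is an assumption-laden stand-in, not the production change: the search cluster relies on Elasticsearch's own `asciifolding` token filter, which covers more characters (this sketch leaves ligatures such as "œ" untouched).

```python
import unicodedata


def ascii_fold(text):
    """Drop combining accents so that 'élève' and 'eleve' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep base characters, discard the combining marks produced by NFKD.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for word in ["élève", "Noël", "français"]:
    print(word, "->", ascii_fold(word))
```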
=== Portal ===
* Currently blocked on this check-in to gerrit [10]
== Other Noteworthy Stuff ==
* Our elasticsearch clusters now have "row aware shard allocation". This means that we can theoretically lose one row of servers in our datacenter and still serve search traffic. [11]
* The Search team sent out a request for comment article that was posted to various Village Pumps asking for it to be translated. [12]
** This was in reference to the cross-wiki search results new functionality and design articles on MediaWiki. [13], [14]
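To see why row awareness lets the cluster survive the loss of a row, here is a toy model: if the copies of every shard (primary plus replicas) sit on different rows, no single row holds all copies of any shard. In Elasticsearch this behaviour is driven by the `cluster.routing.allocation.awareness.attributes` setting; the row names, shard counts, and round-robin placement below are assumptions for illustration and do not describe the actual eqiad layout or Elasticsearch's allocator.

```python
def allocate(num_shards, copies, rows):
    """Place each shard's copies round-robin across rows.

    The copies of a shard land on distinct rows as long as copies <= len(rows).
    """
    return {
        shard: [rows[(shard + i) % len(rows)] for i in range(copies)]
        for shard in range(num_shards)
    }


def survives_row_loss(placement, lost_row):
    """True if every shard still has at least one copy outside lost_row."""
    return all(
        any(row != lost_row for row in shard_rows)
        for shard_rows in placement.values()
    )


rows = ["A", "B", "C", "D"]  # hypothetical row names
placement = allocate(num_shards=6, copies=2, rows=rows)
print(all(survives_row_loss(placement, row) for row in rows))
```

With only one copy per shard, the same check fails for whichever row holds that copy, which is the situation row awareness is meant to avoid.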
== Did you know? ==
* A study came out yesterday showing that giraffes are actually four distinct species, rather than one (article and BBC report). [15], [16]
** Of course, the English and German Wikipedia pages on giraffes have already been updated! [17], [18]
[1] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_Optimization_...
[2] https://phabricator.wikimedia.org/T143354
[3] https://phabricator.wikimedia.org/T140187
[4] https://phabricator.wikimedia.org/T142506
[5] https://lists.wikimedia.org/pipermail/wikitech-l/2016-September/086490.html
[6] https://www.mediawiki.org/wiki/Help:Extension:Kartographer#GeoShapes_externa...
[7] https://phabricator.wikimedia.org/T143588
[8] https://phabricator.wikimedia.org/T144429
[9] https://phabricator.wikimedia.org/T143685
[10] https://gerrit.wikimedia.org/r/#/c/306241/
[11] https://phabricator.wikimedia.org/T143571
[12] https://meta.wikimedia.org/wiki/User:DTankersley_(WMF)/translation_request_f...
[13] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements
[14] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/Design
[15] http://www.cell.com/current-biology/fulltext/S0960-9822(16)30787-4
[16] http://www.bbc.com/news/science-environment-37311716
[17] https://en.wikipedia.org/wiki/Giraffe
[18] https://de.wikipedia.org/wiki/Giraffe
----
The full update, and an archive of past updates, can be found on MediaWiki.org: https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as Easy or volunteer needed in Phabricator:
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Cheers!
--
Deb Tankersley
Product Manager, Discovery
IRC: debt
Wikimedia Foundation
Thanks Deborah for the update.
Just to mention another interesting feature (implemented but not yet evaluated or activated): dcausse has implemented the ability to show search results based on DEFAULTSORT as well (T134978, https://phabricator.wikimedia.org/T134978). For example, when you search for Putin (and not Vladimir Putin), you will get a suggestion for Vladimir Putin even if there is no redirect from Putin, because its DEFAULTSORT key is "Putin, Vladimir". This feature may have a high impact once (and if) it is activated.
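The idea can be sketched as a prefix match against both the page title and its DEFAULTSORT key. This is a toy model under stated assumptions: the `pages` mapping and the `suggest` function are hypothetical and say nothing about how the feature in T134978 is actually implemented.

```python
def suggest(query, pages):
    """Return titles whose title or DEFAULTSORT key starts with the query."""
    q = query.casefold()
    hits = []
    for title, sort_key in pages.items():
        title_match = title.casefold().startswith(q)
        # Pages without a DEFAULTSORT key are stored with None.
        sort_match = sort_key is not None and sort_key.casefold().startswith(q)
        if title_match or sort_match:
            hits.append(title)
    return sorted(hits)


# Hypothetical mini index: title -> DEFAULTSORT key (None if unset).
pages = {
    "Vladimir Putin": "Putin, Vladimir",
    "Giraffe": None,
}
print(suggest("Putin", pages))
```

Here "Putin" matches no title prefix, but it does match the sort key "Putin, Vladimir", so the page is suggested without any redirect.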
On Sat, Sep 10, 2016 at 1:44 AM, Deborah Tankersley <dtankersley@wikimedia.org> wrote:
discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery
Thanks Eran,
I completely forgot to mention this feature in the status update.
I hope to be able to build a test index very soon so we can start evaluating this new feature.
David.
On 10/09/2016 at 11:23, Eran Rosenthal wrote:
Thanks to Eran for pointing this out and thanks to David for doing it! :)
I've moved the ticket https://phabricator.wikimedia.org/T134978 to our current work board and into the 'needs review' column.
Cheers,
Deb
--
Deb Tankersley
Product Manager, Discovery
IRC: debt
Wikimedia Foundation
On Mon, Sep 12, 2016 at 2:18 AM, David Causse <dcausse@wikimedia.org> wrote: