I thought that the comma (",") was being added to the Elasticsearch token filter as a stopword, so that it would be excluded from simple search by now. Or did I miss something?
Or was it decided not to add the comma (U+002C), so that we must use the Advanced Search on Wikidata or the API?
I noticed that typing the string "foot locker inc" does not show the entity in the dropdown; only "foot locker, inc." does. (I have since added the full legal name as an alias to improve searchability, but I would still like to know the stopword decision.)
Thad https://www.linkedin.com/in/thadguidry/ https://calendly.com/thadguidry/
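The dropdown behaviour described above is what one would expect if the typed text is compared against the raw label as a prefix. The following is a minimal sketch under that assumption; `normalize` and `prefix_matches` are purely illustrative names, not Wikidata or Elasticsearch code. It shows how dropping U+002C before the comparison would change the outcome.

```python
def normalize(text: str, strip_commas: bool = False) -> str:
    """Lowercase and collapse whitespace; optionally drop U+002C commas."""
    if strip_commas:
        text = text.replace(",", " ")
    return " ".join(text.lower().split())


def prefix_matches(query: str, label: str, strip_commas: bool = False) -> bool:
    """Return True if the normalized query is a prefix of the normalized label."""
    return normalize(label, strip_commas).startswith(normalize(query, strip_commas))


# Hypothetical label, modelled on the example in the message above.
label = "Foot Locker, Inc."

print(prefix_matches("foot locker inc", label))                     # comma kept
print(prefix_matches("foot locker, inc.", label))                   # comma typed
print(prefix_matches("foot locker inc", label, strip_commas=True))  # comma dropped
```

With the comma kept, only the query that includes the comma matches; with the comma dropped on both sides, the shorter query matches as well. Whether the real suggester behaves this way is exactly the open question in this thread.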
Hi Thad,
I think this suggestion box does not use Elasticsearch; it uses a simple prefix search on labels and aliases, run directly against the SQL database. Elasticsearch is only used when you go to Special:Search.
Best,
Antonin
On 15/08/2021 04:25, Thad Guidry wrote:
[...]
Wikidata-tech mailing list -- wikidata-tech@lists.wikimedia.org To unsubscribe send an email to wikidata-tech-leave@lists.wikimedia.org
Hi Antonin,
No, that's not correct; that was changed a few years ago. I've been keeping up (well, trying to!). Here's the post from Stas from 4 years ago: https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/thread/CPGZSC55XFU5HPSRJNZWDQOKLZDHKJSA/#S7J5EZG5TIBNFXRRWVHGCXC4X6RSZ65Q
The reason I asked about the comma token is that I seem to recall, perhaps from the last 3 years, that it would not affect the simple search (aka the wbsearchentities API, aka the suggestion dropdown box).
Also, I tried to find something about that on Phabricator, but quickly got into the weeds since I don't know the labels that the team uses. The closest ones I found were these: https://phabricator.wikimedia.org/search/query/64FG3L6BYauX/#R. However, I found no way in the Phabricator Advanced Search to restrict the results to only those after the date of Stas' post.
Thad https://www.linkedin.com/in/thadguidry/ https://calendly.com/thadguidry/
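For anyone who wants to compare the two spellings outside the dropdown, the wbsearchentities API mentioned above can be queried directly. The sketch below only builds the request URLs and sends nothing; the helper name `build_search_url` and the chosen `language`, `type`, and `limit` values are assumptions made for illustration.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_search_url(text: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for the given search text."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    # urlencode percent-encodes the comma (U+002C) as %2C and spaces as "+".
    return f"{API_ENDPOINT}?{urlencode(params)}"


for text in ("foot locker inc", "foot locker, inc."):
    print(build_search_url(text))
```

Opening the two printed URLs in a browser and comparing the `search` arrays in the JSON responses should show whether the comma changes which entities are returned.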
On Sun, Aug 15, 2021 at 1:47 AM Antonin Delpeuch (lists) < lists@antonin.delpeuch.eu> wrote:
[...]
Ah ok, that's good to know, thanks!
Antonin
On 15/08/2021 13:59, Thad Guidry wrote:
[...]
On Sun, Aug 15, 2021 at 4:26 AM Thad Guidry thadguidry@gmail.com wrote:
[...]
Hi Thad :)
I don't think we've made any changes to this in quite a while. And I don't think we ever removed commas from the entity search input. Could you file a ticket please if you'd like to see this changed? Thanks!
Cheers Lydia
DONE! Here's the new ticket I created: https://phabricator.wikimedia.org/T289428
Thad https://www.linkedin.com/in/thadguidry/ https://calendly.com/thadguidry/
On Fri, Aug 20, 2021 at 11:28 AM Lydia Pintscher < Lydia.Pintscher@wikimedia.de> wrote:
[...]
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.