Hi Team,
As usual for me, I love digging and pulling out weeds from the garden beds on the weekend. :-)
I searched the repo and did not see "apostrophe" or "contraction" mentioned at all:
https://github.com/google/abstracttext/search?q=apostrophe&unscoped_q=ap...
https://github.com/google/abstracttext/search?q=contraction&unscoped_q=c...
I was hoping to find an example conversion function for handling contractions (shortened forms of words in which omitted letters are replaced by an apostrophe or, sometimes, by other characters).
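A minimal sketch of such a conversion function, in Python, might look like the following. It assumes a small hand-written lookup table; the `CONTRACTIONS` table and the `expand_contractions` name are hypothetical and are not taken from the abstracttext repo or from any Wikidata data.

```python
import re

# Hypothetical, deliberately tiny table of English contractions.
# "it's" is mapped to "it is" here, although it can also mean "it has";
# a real table would need many more entries and a way to handle ambiguity.
CONTRACTIONS = {
    "we'll": "we will",
    "i'm": "i am",
    "it's": "it is",
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
}

# Match whole words made of letters joined by apostrophes, so "we'll" is one token.
_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def expand_contractions(text: str) -> str:
    """Replace each known contraction in `text` with its expanded form."""
    # Treat the typographic apostrophe (U+2019) like a straight one.
    text = text.replace("\u2019", "'")

    def replace(match: re.Match) -> str:
        word = match.group(0)
        expansion = CONTRACTIONS.get(word.lower())
        if expansion is None:
            return word  # not a known contraction: leave it unchanged
        # Keep a leading capital, e.g. "We'll" -> "We will".
        if word[0].isupper():
            expansion = expansion[0].upper() + expansion[1:]
        return expansion

    return _WORD.sub(replace, text)


print(expand_contractions("We'll cross that bridge when we come to it"))
# We will cross that bridge when we come to it
```

A plain lookup table cannot resolve genuinely ambiguous contractions: "it's" can mean "it is" or "it has", and "'d" can mean "would" or "had", so a real function would need context or extra lexeme data.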
My (future) use case is helping Abstract Wikipedia more easily search and match English idioms, including alternative variants that contain contracted forms of words. For example, https://www.wikidata.org/wiki/Lexeme:L311061: "We will cross that bridge when we come to it" vs. "We'll cross that bridge when we come to it".
English idioms are complex, and many have alternative variants with optional hyphens, apostrophes, etc. So I'm (<-- a contraction!) trying to understand some of the future ideas for improving searchability, perhaps by allowing hints in Wikidata Lexemes, and what a first practice (maybe not best practice yet!) would begin to look like.
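One possible first practice for matching idiom variants, such as the two L311061 phrasings, is to reduce every variant to a canonical key before comparing. The sketch below assumes three normalization rules that are illustrative assumptions, not an existing Abstract Wikipedia convention: apostrophe variants are unified, hyphens are treated as spaces, and known contractions are expanded from a hypothetical table. The names `normalize_idiom` and `same_idiom` are likewise hypothetical.

```python
import re

# Hypothetical, tiny contraction table used only for this illustration.
CONTRACTIONS = {"we'll": "we will", "it's": "it is", "i'm": "i am"}


def normalize_idiom(text: str) -> str:
    """Reduce an idiom variant to a canonical key for matching (a sketch)."""
    text = text.lower()
    # Unify apostrophe variants: right single quotation mark (U+2019) and
    # modifier letter apostrophe (U+02BC) both become a straight apostrophe.
    text = text.replace("\u2019", "'").replace("\u02bc", "'")
    # Treat optional hyphens as spaces, so "well-known" matches "well known".
    text = re.sub(r"[-\u2010\u2011]", " ", text)
    # Replace remaining punctuation (except apostrophes) with spaces.
    text = re.sub(r"[^\w\s']", " ", text)
    # Strip quote-like apostrophes at token edges, then expand contractions.
    tokens = [t.strip("'") for t in text.split()]
    return " ".join(CONTRACTIONS.get(t, t) for t in tokens if t)


def same_idiom(a: str, b: str) -> bool:
    """Return True if two variants normalize to the same key."""
    return normalize_idiom(a) == normalize_idiom(b)


print(same_idiom("We will cross that bridge when we come to it",
                 "We\u2019ll cross that bridge when we come to it."))
# True
```

Treating hyphens as spaces is deliberately blunt; some hyphenated and unhyphenated spellings are not interchangeable, so hints stored on the lexemes themselves would be a more precise source of variants.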
We might have parsing functions that already know that "we will" = "we'll". In fact, the lexeme for "we'll" already exists: https://www.wikidata.org/wiki/Lexeme:L269709. GREAT! But I suspect that lexeme and others lack the additional information that would make them really useful to our later conversion functions or renderers.
So... some of these questions are deep and forward-thinking and probably won't have the best answers right now, but I think it's (<-- another contraction!!) useful to ask them now:

1. Could we somehow indicate that https://www.wikidata.org/wiki/Lexeme:L269709 is a contraction and not only a phrase? I don't see how to do that currently, since the lexical category allows only one value. If you know, let me know.
2. How would a function (built from Z-objects) determine equivalence in the case of contractions? For example, would the mere fact that L269709 has two forms, -F1 and -F2, automatically make some function return a boolean True? Is that the best approach?
3. Given how ubiquitous contractions are, does that make them a good candidate for separate indexing in the future?
   a. Are L269709 and its -F1 and -F2 forms good enough for building very fast lookup or conversion functions for contractions on top of ElasticSearch indexes? For English, I guess this could be performant enough and stored entirely in memory?
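To make questions 2 and 3a more concrete, here is a sketch of an in-memory index from form representations to lexeme IDs, with an equivalence check that returns True when two strings are forms of the same lexeme. A plain Python dict stands in for an ElasticSearch index. The form strings are placeholders, not the actual contents of L269709-F1 and -F2, and `L_DEMO`, `build_form_index`, and `forms_equivalent` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical lexeme records: lexeme ID -> {form ID: representation}.
# The form strings are placeholders for illustration; they are NOT copied
# from Wikidata, and "L_DEMO" is a made-up ID.
LEXEMES = {
    "L269709": {"F1": "we'll", "F2": "we\u2019ll"},
    "L_DEMO": {"F1": "it's", "F2": "it\u2019s"},
}


def build_form_index(lexemes):
    """Map each form representation to the set of lexeme IDs that contain it.

    An in-memory dict stands in here for a search index such as ElasticSearch.
    """
    index = defaultdict(set)
    for lexeme_id, forms in lexemes.items():
        for representation in forms.values():
            index[representation].add(lexeme_id)
    return index


def forms_equivalent(index, a, b):
    """Return True if `a` and `b` are forms of at least one common lexeme."""
    return bool(index.get(a, set()) & index.get(b, set()))


INDEX = build_form_index(LEXEMES)
print(forms_equivalent(INDEX, "we'll", "we\u2019ll"))  # True
print(forms_equivalent(INDEX, "we'll", "it's"))        # False
```

Sharing a lexeme is a loose notion of equivalence: the forms of a verb lexeme (e.g. "run" and "ran") also share a lexeme without being interchangeable, so a stricter check would probably also compare the forms' grammatical features.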
I'm all ears (<-- an idiom!!) (meaning I am listening).

Thad
https://www.linkedin.com/in/thadguidry/