Hi Thad! "Il faut cultiver notre jardin."
[https://againstprofphil.org/2017/02/20/il-faut-cultiver-notre-jardin/]
I did add a contracted form to a lexeme a few weeks ago, just to test the
water
(https://www.wikidata.org/w/index.php?title=Lexeme:L1883&diff=1248168703).
Specifically, this was "'m" as a contraction of "am" and, therefore, a form
of *be*. But, as is common with contractions, the form is permitted only in
specific contexts (some languages, like French, *require* contractions in
particular contexts, so we certainly can't ignore them).
1. I'm not convinced that "we'll" is a phrase. It is certainly a
contraction (of either "we shall" or "we will"). My understanding is that a
*lexical phrase* would only be admitted as a lexeme if its definition
(including its inflection) is not fully implied by its constituents. And I
suggest that is not the case here. Rather, it is justified as a lexeme by
the fact that it is a contraction (personally, I don't entirely agree with
this justification in this particular case, because "'ll" is practically a
separate word in contemporary English, but perhaps that 'll be a topic for
another day).
2. From 1, "we'll" has three forms: "we'll", "we shall" and "we will". But
these are not inflections, so perhaps they aren't strictly *forms* at all.
Semantics aside, "we'll" is neither more nor less than "we" followed by
"'ll", with the orthographic convention of no intervening space where a
word begins with an apostrophe (unless 'tis an exception). Anyway...
What sort of function did you have in mind? My theory would be that only a
parsing function would start with "we'll" and its result is ambiguous
because "'ll" is ambiguous (being a form in two separate lexemes, *shall*
and *will*).
Since you mention searching, I'd guess that a search would want to find
results containing any one of the three forms, given "we'll", but only the
given form and "we'll", given either of the other forms.
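As a rough sketch of that search behaviour (Python; the lookup table and the function name are made up for illustration, not taken from Wikidata or the abstracttext repo):

```python
# Hypothetical table: each contraction maps to the full forms it may stand for.
EXPANSIONS = {
    "we'll": ["we shall", "we will"],
}


def search_variants(query: str) -> set:
    """Return the set of forms a search for `query` should match."""
    variants = {query}
    if query in EXPANSIONS:
        # Given the contraction, match every full form it may stand for.
        variants.update(EXPANSIONS[query])
    else:
        # Given a full form, match only itself and its contraction,
        # not the other full form.
        for contraction, full_forms in EXPANSIONS.items():
            if query in full_forms:
                variants.add(contraction)
    return variants


print(sorted(search_variants("we'll")))
print(sorted(search_variants("we shall")))
```

The asymmetry is deliberate: the contraction widens the search to all three forms, but a full form never pulls in the other full form.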
Since you also mention "equivalency", I would say that there is equivalence
between "we'll" and "we shall", and there is equivalence between "we'll"
and "we will", but there is no (implied) equivalence between "we shall" and
"we will". (For a clearer case, consider "we'd", which might mean "we had"
or "we should" or "we would". You might suspect equivalence between "we
should" and "we would" but not between either and "we had". So it is not
the common form of the contraction that implies the suspected equivalence,
it is the fading/faded distinction between *shall* and *will*, facilitated
by common contractions.)
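In other words, the relation is not transitive. A minimal sketch of an equivalence check along those lines (Python; the table and the function are hypothetical, and a real version would presumably read the forms from the lexemes):

```python
# Hypothetical table: each contraction and the full forms it may stand for.
CONTRACTION_OF = {
    "we'll": {"we shall", "we will"},
    "we'd": {"we had", "we should", "we would"},
}


def equivalent(a: str, b: str) -> bool:
    """True if a and b are identical, or one is a contraction of the other."""
    if a == b:
        return True
    # Only the direct contraction/full-form link counts; sharing a
    # contraction does not make two full forms equivalent.
    return b in CONTRACTION_OF.get(a, set()) or a in CONTRACTION_OF.get(b, set())


print(equivalent("we'll", "we shall"))
print(equivalent("we shall", "we will"))
print(equivalent("we had", "we would"))
```

Any suspected equivalence between "we should" and "we would" would have to be stated separately; it does not fall out of the shared contraction.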
Since you mention renderers, I would say not. But that's not a definitive
no. It's more a case of we'll build that bridge when we get to where it
isn't ;) A rendering function should be expected to leave the matter of
contraction 'til later (which, like tomorrow, never comes). According to
this theory, there might come a time when a renderer function has (in
effect) "we shall" and returns "we'll", but not the other way around. (And
according to my principle of losslessness, the result is actually both:
"we'll<!--optional contraction of: we shall-->", in essence.)
3. No idea! (Just saying...)
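Going back to the renderer point under 2, here is a minimal sketch of the "lossless" idea (Python; the table and function name are hypothetical, and the comment syntax is only the one I used in the example above):

```python
# Hypothetical table: full forms that may optionally be contracted.
CONTRACTIONS = {
    "we shall": "we'll",
    "we will": "we'll",
}


def contract_lossless(text: str) -> str:
    """Contract `text` if possible, keeping the original form in a comment."""
    contracted = CONTRACTIONS.get(text)
    if contracted is None:
        # Nothing to contract: return the input unchanged.
        return text
    # Keep the full form alongside the contraction so nothing is lost.
    return f"{contracted}<!--optional contraction of: {text}-->"


print(contract_lossless("we shall"))
print(contract_lossless("we had"))
```

The function only goes from the full form to the contraction; going the other way would hit the *shall*/*will* ambiguity.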
Just to backtrack to your use case, I'd be inclined to lemmatize the entire
phrase. My millennial Collins English Dictionary has sense 32 for *cross*:
"cross a bridge when one comes to it". My Oxford Dictionary of English has
the phrase "cross that bridge when one comes to it" under *bridge*. One
never says "one", of course! There seem to be five or six occurrences in
the British National Corpus. Five say "cross that bridge when"; three are
for "we" (one attributive modifying "attitude", one imperative: "let's
cross that bridge when we *get* to it", one: "we can cross..."), one each
for "he" and "she" "would" (in the same text but not close together), and
one, well... "we cross our bridges when we come to them and burn them
behind us..." (Why, thank you, Sir Tom!
https://en.wikipedia.org/wiki/Tom_Stoppard)
Thank you for "listening",
Al.
On Saturday, 29 August 2020, Thad Guidry <thadguidry(a)gmail.com> wrote:
Hi Team,
As usual for me, I love digging and pulling out weeds from the garden beds
on the weekend. :-)
I searched the repo and did not see "apostrophe" or "contraction"
mentioned at all.
https://github.com/google/abstracttext/search?q=apostrophe&unscoped_q=apostrophe
https://github.com/google/abstracttext/search?q=contraction&unscoped_q=contraction
I was hoping to see an example conversion function to help with
contractions (shortened forms of words where letters have been omitted and
replaced by apostrophes and sometimes other characters) ?
My use case (in the future) is to help Abstract Wikipedia to more easily
handle search & matching for English idioms and deal with alternative
variants that sometimes have contracted forms of words within them. For
example:
https://www.wikidata.org/wiki/Lexeme:L311061
"We will cross that bridge when we come to it"
"We'll cross that bridge when we come to it"
Idioms are so complex in English and have many alternative variants with
optional hyphens, apostrophes, etc.
So I'm (<-- a contraction!) trying to understand some of the future ideas
on how searchability might be improved by allowing hints somehow in
Wikidata Lexemes and what a first practice (maybe not best practice yet!)
would begin to look like.
We might have parsing functions that already know that "we will" =
"we'll". The lexeme for "we'll" is in fact already there
https://www.wikidata.org/wiki/Lexeme:L269709 GREAT!
But I think that lexeme and others are missing additional information to
make them really useful with our later conversion functions or renderers?
So... Some of these questions are deep, forward thinking, and probably
will not have the best answers right now, but it's (<--another
contraction!!) useful to ask them now I think:
1. Perhaps somehow mentioning that
https://www.wikidata.org/wiki/Lexeme:L269709 is a contraction and not only
a phrase? But I don't see how to do that currently. Lexical category allows
only 1. If you know, let me know.
2. How would a function determine equivalency with Z-objects in the case of
contractions? For example, would the mere fact that L269709 has 2 forms,
-F1 and -F2, automatically return a boolean True on some function? Is that
best?
3. Seeing as how ubiquitous contractions are... does that make them a good
candidate in the future for separate indexing?
a. Are L269709 and its -F1 and -F2 forms good enough for further building
very fast lookup or conversion functions for contractions that would use
ElasticSearch indexes? This could be performant enough and completely
stored in memory for the English language, I guess?
I'm all ears (<-- an idiom!!)(means I am listening)
Thad
https://www.linkedin.com/in/thadguidry/