First of all, welcome to Swork! I don't think you misunderstood anything,
but we're fine with that if you did. "Bobby hit the ball over the stadium
using the bat" sounds okay too.
I'll try to focus on your topic, Adam. It does seem as though we're moving
into a broader problem area, but we'll always get back to inflections
eventually.
So, yes, context is everything. Almost. So let's remember we're talking
primarily about Wikipedia. If we were talking about conversations, that
would introduce all sorts of additional and fascinating complications, like
how you add the thing you just remembered into the words you are in the
middle of speaking. But let's try to relate that sort of problem back to
our Wikipedic context.
We assume that we have a notable person aka "Bobby" and some record of an
event in this person's life (the infamous ball-hitting incident). This, I
think, corresponds to Adam's original "set-up" (to use your term, Swork).
Let's further assume that someone once wrote something in Wikipedia like
"Roberta, known to her friends as "Bobby", is the youngest female ever to
hit a ball over the <name> stadium. She used the bat her father had used in
1983, when..." And let's further assume that this event is represented in
some vaguely familiar way in Wikidata... (details avoided!) And now,
finally, we discover from Roberta's autobiography ("Call me Bobby") that
she was humming the very tune that her father used to hum... so we add that
to Wikidata (during tune Q864573216897 performed by Q<Roberta> as a hum,
reference Q<Call me Bobby>, p.132).
So, before the change, our process found "Bobby", "hit a ball" and "used
the bat". I've left the determiners (articles "a" and "the") in place
because they suggest it's any old ball but a particular bat. And Adam's
function didn't output "Bobby hit a ball used the bat" but (given the
set-up, which refers to the bat in an article about Bobby's father etc "his
daughter, ...") "...Bobby, using the bat, hit a ball..." ("over the
stadium", as it happens).
After the change in Wikidata (given the relative significance of these
well-sourced details), our selection process has transformed the recorded
fact about the tune to the action "hum" and ignored the memorable tune. It
has also, quite consistently, determined that the humming is incidental, so
"during tune" has become something equivalent to <while>. So we have
"Bobby", "hit a ball", "used the bat" plus <while>"Bobby", "hum". Given our
previous context, Adam's function will now produce "Bobby, humming, used
the bat to hit a ball..." (or something equivalent). The point is that it
is this particular context that pretty much requires "Bobby" to come first
and the "ball" to come last. In a different context, we might require
"Bobby" to come later: "Humming, while using the bat to hit a ball,
Bobby..." And notice this (it wasn't even deliberate), we ever so casually
shifted Bobby from being the subject of the verb "hit" to being the subject
of the verb "used" to being the subject of some other verb in the wider
context. Had the bat been more prominent, we might have ended up with that
as the subject: "[later] the bat was used by [his daughter] Bobby to hit a
ball..., while humming..."
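To make the ordering point concrete, here is a minimal Python sketch of how the same selected pieces might be arranged differently depending on what the context puts first. Everything in it is an assumption for illustration: the function name `realize`, its parameters, and the `lead` values are hypothetical and are not Adam's function or anything in Wikidata.

```python
def realize(agent, action, instrument, incidental=None, lead="agent"):
    """Order the selected pieces according to what the context wants first.

    Hypothetical sketch: `lead` stands in for whatever the surrounding
    article makes most prominent (the agent, the incidental action, or
    the instrument).
    """
    if lead == "agent":
        # Agent first, patient last; the incidental action becomes an aside.
        aside = f", {incidental}," if incidental else ""
        return f"{agent}{aside} used {instrument} to {action}"
    if lead == "incidental":
        if incidental is None:
            raise ValueError("no incidental action to lead with")
        # The agent comes last and becomes the subject of a later verb.
        return (f"{incidental.capitalize()}, while using {instrument} "
                f"to {action}, {agent}")
    if lead == "instrument":
        # Passive construction with the instrument as subject.
        tail = f", while {incidental}" if incidental else ""
        return f"{instrument} was used by {agent} to {action}{tail}"
    raise ValueError(f"unknown lead: {lead!r}")


print(realize("Bobby", "hit a ball", "the bat", "humming"))
print(realize("Bobby", "hit a ball", "the bat", "humming", lead="incidental"))
print(realize("Bobby", "hit a ball", "the bat", "humming", lead="instrument"))
```

The caller, not the function, decides the lead here, which is one way of saying that the context does the choosing.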
Backtracking a little, if we leave open the question of which verb is the
main verb, we could just as easily come up with "Bobby hummed, while using
the bat to hit a ball". And going back to Adam's original example, we might
have got "Bobby used the bat to hit a ball" or even "Bobby batted a ball".
Only with my invented Wikipedic context can I suggest that "Bobby used the
bat to hit a ball" sounds best to my English ears. (I know that wasn't the
question but it's good to share.)
Best regards,
Al.
On Saturday, 25 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Send Abstract-Wikipedia mailing list submissions to
> abstract-wikipedia(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
> or, via email, send a message with subject or body 'help' to
> abstract-wikipedia-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> abstract-wikipedia-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Abstract-Wikipedia digest..."
>
>
> Today's Topics:
>
> 1. Re: Conjugation and Declension Functions (Exocamp Exocamp)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 24 Jul 2020 23:06:45 -0500
> From: Exocamp Exocamp <scamgiver(a)gmail.com>
> To: abstract-wikipedia(a)lists.wikimedia.org
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> Message-ID:
> <CAPBFV5evw=80nP7rhnXqXOJgOYS3Jgrf2X_h79n-7eE+RhajJA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I would like to say first that I am new to this idea and project and not as
> familiar with coding or linguistics as some of you, so I apologize in
> advance if I am severely misunderstanding something in this thread.
>
> While any of these phrases would sound decent in a sentence, I believe that
> the context of the rest of the sentence would be needed to determine the
> correct thematic relation. Even if the output phrases all contain the same
> words, natural speech and language would not pick these phrases necessarily
> at random, because the three phrases have different "set-ups" as I would
> call them. I'll give examples so I become much more clear.
>
> Let us assume we have the agent-patient-instrument pair of Bobby, the ball,
> and the bat, and finish this clause with the directional relation "over the
> stadium". Thus our sentences would most likely be as follows:
>
> [1] Bobby hit the ball using the bat over the stadium.
> [2] Bobby, using the bat, hit the ball over the stadium.
> [3] Using the bat, Bobby hit the ball over the stadium.
>
> Despite all of these being technically correct sentences in English,
> sentence [1] in my opinion would definitely not look as natural to a native
> English speaker in text as [2] or [3] (though in my opinion [3] is best).
> Let's do another example with the phrase "while he hummed a song."
>
> [1] Bobby hit the ball using the bat while he hummed a song.
> [2] Bobby, using the bat, hit the ball while he hummed a song.
> [3] Using the bat, Bobby hit the ball while he hummed a song.
>
> Now with this new phrase, sentences [1] and [2] have become the most
> natural sentences (again, in my opinion), when in the previous sentence
> where the phrase "over the stadium" was used, these two were the most
> *un*natural. As such, I believe that the most accurate method to find the
> most natural phrasing would need to find a pattern in the context in order
> to apply the correct formation to.
>
> One last thought of mine has to do with spoken English and not written, but
> still may assist with our problem. From my thinking, the stressed word on a
> previous sentence may also affect the construction of the next. Let's treat
> our agent-patient-instrument as a complete sentence, and the sentence
> before this in spoken English as "Bobby had a ball and a bat" - introducing
> Bobby, ball, and bat beforehand. The stress could affect the construction
> like so, where * * indicates the stressed word in the sentence.
>
> [1] *Bobby* had a ball and a bat. Bobby hit the ball using the bat.
> [2] Bobby had a *ball* and a bat. Bobby, using the bat, hit the ball.
> [3] Bobby had a ball and a *bat.* Using the bat, Bobby hit the ball.
>
> Again, I am not a very experienced person in either linguistics or
> programming, so I may have missed the point of something, but I hope my
> thoughts help with this.
>
> Sincerely,
> SWork
>
>
> On Fri, Jul 24, 2020, 6:07 PM <
> abstract-wikipedia-request(a)lists.wikimedia.org> wrote:
>
> > Today's Topics:
> >
> > 1. Re: Conjugation and Declension Functions (Adam Sobieski)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Fri, 24 Jul 2020 23:07:03 +0000
> > From: Adam Sobieski <adamsobieski(a)hotmail.com>
> > To: "General public mailing list for the discussion of Abstract
> > Wikipedia (aka Wikilambda)" <
> > abstract-wikipedia(a)lists.wikimedia.org>
> > Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> > Message-ID:
> > <CH2PR12MB41841C5E1C4E820B9766AB1BC5770@CH2PR12MB4184.namprd12.prod.outlook.com>
> >
> > Content-Type: text/plain; charset="windows-1252"
> >
> > In addition to broaching the discovery, encoding and reuse of
> > language-specific patterns, which I view as a discussion topic when
> > comparing and contrasting approaches (see: the DRY principle [1] and the
> > abstraction principle [2]), and indicating how conjugation and declension
> > functions would facilitate the discovery, encoding and reuse of these
> > patterns, I would like to share some topics which arose as I explored
> > adding the thematic relation [3] of instrument to the agent-patient pair.
> >
> > When exploring how best to add the thematic relation of instrument to the
> > agent-patient pair (e.g. adding “using the bat” to “Bobby hit the ball”),
> > I observed that, for the same input grammatical arguments, there was a
> > set of possible output paraphrases:
> >
> >
> > 1. “Bobby hit the ball using the bat”
> > 2. “Bobby, using the bat, hit the ball”
> > 3. “Using the bat, Bobby hit the ball”
> >
> > I wondered: how might we be able to generate each?
> >
> > One possibility is indicated. There could be a type for each thematic
> > relation [3]. Then, using explicit type conversions to these types, we
> > could have different functions for different sequences of input
> > arguments.
> >
> > That is,
> >
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, (Agent)Bobby, (Patient)ball, (Instrument)bat)
> > --> “Bobby hit the ball using the bat”
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, (Agent)Bobby, (Instrument)bat, (Patient)ball)
> > --> “Bobby, using the bat, hit the ball”
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, (Instrument)bat, (Agent)Bobby, (Patient)ball)
> > --> “Using the bat, Bobby hit the ball”
> >
> > or
> >
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, Bobby as Agent, ball as Patient, bat as
> > Instrument) --> “Bobby hit the ball using the bat”
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, Bobby as Agent, bat as Instrument, ball as
> > Patient) --> “Bobby, using the bat, hit the ball”
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, bat as Instrument, Bobby as Agent, ball as
> > Patient) --> “Using the bat, Bobby hit the ball”
> >
> > These functions could each wrap the use of patterns (as indicated in
> > previous email) and attempt to realize output sentences utilizing the
> > arguments in the same sequence in which they were provided. This would,
> > however, mean that the callers of the functions would be responsible for
> > “shuffling” the input arguments to express the desired paraphrase ((1),
> > (2), (3)).
> >
> > There are, of course, other approaches and other possibilities to
> > consider with respect to outputting the example paraphrases.
> >
> > Any thoughts on these topics?
> >
> >
> > Best regards,
> > Adam
> >
> > [1] https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
> > [2] https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)
> > [3] https://en.wikipedia.org/wiki/Thematic_relation
> >
> > P.S.: We could also put the grammatical arguments on the context object
> > (context.voice, context.tense, context.aspect, context.mood, et cetera)
> > and then make use of generic parameters for other uses.
> >
> > From: Adam Sobieski<mailto:adamsobieski@hotmail.com>
> > Sent: Thursday, July 23, 2020 7:38 PM
> > To: General public mailing list for the discussion of Abstract Wikipedia
> > (aka Wikilambda)<mailto:abstract-wikipedia@lists.wikimedia.org>
> > Subject: RE: Conjugation and Declension Functions
> >
> >
> > I had an idea about discovering, encoding and reusing patterns in
> > languages that I would like to share with the group.
> >
> >
> >
> > The following is rough-draft pseudocode for a function with which to
> > generate sentences for the verb “to kick”. The function has generic
> > parameters for voice and tense and parameters for context, agent and
> > patient.
> >
> >
> >
> > Clause kick<V, T>(Context context, Noun agent, Noun patient)
> > {
> >   switch(V)
> >   {
> >     case voice::active:
> >       switch(T)
> >       {
> >         // past and present share one pattern; only the verb form differs
> >         case tense::past:
> >         case tense::present:
> >           return join(" ", [
> >             agent,
> >             conjugate_en("kick", V, T, …),
> >             "the",
> >             patient
> >           ]);
> >         case tense::future:
> >           return join(" ", [
> >             agent,
> >             "will",
> >             conjugate_en("kick", V, T, …),
> >             "the",
> >             patient
> >           ]);
> >         …
> >       }
> >     case voice::passive:
> >       switch(T)
> >       {
> >         case tense::past:
> >           return join(" ", [
> >             "the",
> >             patient,
> >             "was",
> >             conjugate_en("kick", V, T, …),
> >             "by",
> >             agent
> >           ]);
> >         case tense::present:
> >           return join(" ", [
> >             "the",
> >             patient,
> >             "is",
> >             "being",
> >             conjugate_en("kick", V, T, …),
> >             "by",
> >             agent
> >           ]);
> >         …
> >       }
> >   }
> > }
> >
> >
> >
> > and
> >
> >
> >
> > kick<voice::active, tense::past>(context, Bobby, ball) returns “Bobby
> > kicked the ball”.
> >
> >
> >
> > In English, most combinations of tense, aspect, mood and voice are
> > expressed periphrastically, using constructions with auxiliary verbs.
> > After implementing a number of these for similar verbs, per the above
> > example, we would notice a pattern. That pattern could be expressed as
> > something resembling:
> >
> >
> >
> > Clause pattern123<V, T>(Context context, Verb verb, Noun agent, Noun patient)
> > {
> >   switch(V)
> >   {
> >     case voice::active:
> >       switch(T)
> >       {
> >         // past and present share one pattern; only the verb form differs
> >         case tense::past:
> >         case tense::present:
> >           return join(" ", [
> >             agent,
> >             conjugate_en(verb, V, T, …),
> >             "the",
> >             patient
> >           ]);
> >         case tense::future:
> >           return join(" ", [
> >             agent,
> >             "will",
> >             conjugate_en(verb, V, T, …),
> >             "the",
> >             patient
> >           ]);
> >         …
> >       }
> >     case voice::passive:
> >       switch(T)
> >       {
> >         case tense::past:
> >           return join(" ", [
> >             "the",
> >             patient,
> >             "was",
> >             conjugate_en(verb, V, T, …),
> >             "by",
> >             agent
> >           ]);
> >         case tense::present:
> >           return join(" ", [
> >             "the",
> >             patient,
> >             "is",
> >             "being",
> >             conjugate_en(verb, V, T, …),
> >             "by",
> >             agent
> >           ]);
> >         …
> >       }
> >   }
> > }
> >
> >
> > We could then simply express that the verb “to kick” is an instance of
> > pattern pattern123 with something resembling:
> >
> >
> >
> > Clause kick<V, T>(Context context, Noun agent, Noun patient)
> > {
> >   return pattern123<V, T>(context, new Verb("kick", …), agent, patient);
> > }
> >
> > or
> >
> > Clause kick<V, T>(Context context, Noun agent, Noun patient)
> > {
> >   return pattern123<V, T>(context, getVerb("https://…#kick"), agent, patient);
> > }
> >
> > and this pattern could be reused for a large number of verbs.
> >
> > This use of patterns should extend to scenarios where there are more
> > parameters for other thematically-related nouns:
> >
> >
> >
> > Clause kick<V, T>(Context context, Noun agent, Noun patient,
> >                   Noun instrument)
> > {
> >   return pattern1234<V, T>(context, new Verb("kick", …), agent, patient,
> >                            instrument);
> > }
> >
> > For English, there might be more generic parameters than those for voice
> > and tense; we could add those for aspect and mood (e.g. pattern1234<V, T,
> > A, M>(…)).
> >
> > In conclusion, with generic programming and functions for conjugation and
> > declension, we have expressiveness with which to discover, encode and
> > reuse language-specific patterns.
> >
> >
> > Best regards,
> > Adam Sobieski
> >
> >
Hi all,
I am not aware of any details on the funding of Abstract Wikipedia: neither
the funding needs, nor the currently available funds, nor how the budget
(estimated, approved, requested, targeted etc.) develops over time.
Thanks for any pointers,
Daniel
Hello all,
Based on a suggestion of Chris Cooley on the wiki, let's pull the topic of
"Theoretical and practical underpinnings of the natural language
generation" earlier. I started a page (basically empty), and hope we'll
over time work together on the structure of the page.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Related_and_previous_wor…
Abstract Wikipedia will generate natural language text from an abstract
representation. This is not a novel idea, and it has been tried a number of
times before.
On this new page we aim to collect different existing approaches. It tries
to summarize the core ideas of the different approaches, their advantages
and disadvantages, and point to existing implementations. This page (by and
for the community) will help to choose which approach to focus on first
when we get to it.
We have some time - I hope that part of the project will kick off in 12-18
months, if all goes well. It would be great if we had a good and
understandable overview of the state of the art by then.
Thanks everyone,
Denny
Hello all,
I made a draft of possible function examples. I am not saying all of these
are useful, or we will have these, or that these are the right signatures
for the suggested functions, but this is more to illustrate the possible
scope of the project we are aiming for.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Early_function_examples
Feel free to add, discuss, improve.
Stay safe,
Denny
I like your "phrasebook" analogy, Charles. Working from the Wikidata
Statement, "Donald Trump has been President of the United States since
<start date>" inflects naturally (in Wikipedia, if the Wikidata Statement
changes) to "Donald Trump was President of the United States from <start
date> to <end date>, when he was succeeded by <successor>." Clearly it's
not just the verb that is inflected. In context, <successor> would be a
link if not previously linked, so we need "link inflection" too. I would be
inclined to return it as a link and let the receiver apply their particular
policy for duplicate links, but I've never seen them as a particular
problem anyway.
There is also a contingent form for death in office: "Donald Trump was
President of the United States from <start date> until his death on <date
of death>, when he was succeeded by <Vice President>." And maybe for the
exception to the rule: "Donald Trump was President of the United States
from <start date> until his death on <date of death> when, instead of being
succeeded by <Vice President>, who was Vice President at the time, he was
succeeded by <successor>, who was <role of successor> when the President
died. This is because <reason>."
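As a rough illustration of the "phrasebook" idea, the following Python sketch picks a sentence form according to which details are present in the statement. The function name `describe_term`, its parameters, and the pronoun defaults are all hypothetical, the angle-bracket strings are placeholders rather than real data, and the exception-to-the-rule form is deliberately left out.

```python
def describe_term(holder, office, start, end=None, successor=None,
                  death=None, subject="he", possessive="his"):
    """Choose a sentence form from the details the statement provides.

    Hypothetical sketch: no end date and no death means an ongoing term;
    a death date selects the contingent "until ... death" form.
    """
    if end is None and death is None:
        # Ongoing term: present perfect with "since".
        return f"{holder} has been {office} since {start}."
    if death is not None:
        sentence = (f"{holder} was {office} from {start} "
                    f"until {possessive} death on {death}")
    else:
        sentence = f"{holder} was {office} from {start} to {end}"
    if successor is not None:
        # The successor arrives as plain text here; a real system might
        # return a link and leave duplicate-link policy to the receiver.
        sentence += f", when {subject} was succeeded by {successor}"
    return sentence + "."


print(describe_term("Donald Trump", "President of the United States",
                    "<start date>"))
print(describe_term("Donald Trump", "President of the United States",
                    "<start date>", end="<end date>",
                    successor="<successor>"))
```

Note that adding an end date changes the verb ("has been" to "was"), the preposition ("since" to "from ... to") and the clause structure all at once, which is the sense in which more than the verb is inflected.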
Al.
On Thursday, 23 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
>
> Today's Topics:
>
> 1. Re: A few examples of functions (Denny Vrandečić)
> 2. Budget of Abstract Wikipedia? (Daniel Mietchen)
> 3. Re: Conjugation and Declension Functions (Adam Sobieski)
> (Charles Matthews)
> 4. Re: Conjugation and Declension Functions (Adam Sobieski)
> (Adam Sobieski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Jul 2020 12:10:40 -0700
> From: Denny Vrandečić <dvrandecic(a)wikimedia.org>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] A few examples of functions
> Message-ID:
> <CA+bik1dPE60Vc-MWXso3AFTA5OpXSCdm-3W=FQT9CBY-G9Fjtg(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Louis,
>
> yes, you are right, but I want to point out that plural was not defined
> as plural: string ➝ string but as plural: English noun ➝ string. So the
> question is, what is English noun - and if that in turn was a wrapper
> object around Wikidata Lexemes, say L3337, it would indeed be rather easy
> to ask for the plural and get back "men" as a string.
>
> Yes, one question is what does it mean to be side-effect free. If for
> example we define that calls to Wikidata can be considered side-effect
> free, as they are basically calls to a static(-ish) knowledge base, we got
> that fixed. That is also needed for many of the other example functions.
> The problem in that case is how much caching can we get away with.
>
> So if we assume that Wikidata is available as a knowledge base for the
> functions, then I think that all the functions offered in the examples
> should be fine.
>
> What do you think?
>
> Thanks,
> Denny
>
>
>
>
>
> On Wed, Jul 22, 2020 at 8:28 AM Louis Lecailliez <
> louis.lecailliez(a)outlook.fr> wrote:
>
> > Hi Denny,
> >
> > while a lot of functions make sense while we stay in the software realm,
> > I'll note again that the ones concerning language have way too simplified
> > signatures to fulfil their role.
> >
> > For example
> > * plural: English noun ➝ string
> > is gonna work as a pure function if and only if every irregular plural
> > form of English (stuff like man/men) is hardcoded into the function
> > itself; and I think we all agree that's not a good engineering practice.
> > * plural: English noun, Dictionary<string, string> ➝ string
> > would be a more appropriate signature here, conceptually at least. In
> > reality, data will be pulled from Wikidata (if I'm not mistaken), which
> > means the function has side effects and is not pure. It seems the
> > question of access to the data layer (whatever it is) is underspecified.
> >
> > I think it would be better to remove language generation signatures for
> > the time being to not give the impression the problem space is trivially
> > solvable by a function or two.
> >
> > Best regards,
> > Louis Lecailliez
> >
> > ------------------------------
> > *From:* Abstract-Wikipedia <abstract-wikipedia-bounces@lists.wikimedia.org>
> > on behalf of Denny Vrandečić <dvrandecic(a)wikimedia.org>
> > *Sent:* Wednesday, 22 July 2020 14:30
> > *To:* Abstract Wikipedia list <abstract-wikipedia(a)lists.wikimedia.org>
> > *Subject:* [Abstract-wikipedia] A few examples of functions
> >
> > Hello all,
> >
> > I made a draft of possible function examples. I am not saying all of
> > these are useful, or we will have these, or that these are the right
> > signatures for the suggested functions, but this is more to illustrate
> > the possible scope of the project we are aiming for.
> >
> > https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Early_function_examples
> >
> > Feel free to add, discuss, improve.
> >
> > Stay safe,
> > Denny
> >
> >
>
I'm loving all these emails but I find it hard to keep track of the
different topics. Someone find me a Wiki!
So, your question, Adam: "What do you think about conjugation and
declension functions?"
In general, I'm against them. (Well, I'm English. What would you expect?)
That said, I would expect correct inflections of natural language output
(and valid inferences from natural language input). I just wonder whether
we need specific functions for this and, if we do, whether they need to be
generic across languages. My view at the moment is that we need neither.
Instead, I see inflections as special cases of vocabulary and structure
selection. To refer to your example, Bobby and the ball can be related in
many ways, depending on the context. If we assume "Association Football"
(soccer), we might default to kicking. But if it's a foul, we should assume
he handled the ball. Otherwise, we can only assume (lawful) contact.
Staying on topic: "assume" is uninflected, so we might infer the present
tense; "kicking" might be a present participle but I think it's a gerund;
"it's" implies the present tense but "handled" implies the past (if it's
not passive). So, if I need to inflect the concept Bobby-kicking-the-ball,
I'm expecting to end up with "Bobby kicked the ball"... But maybe he
"didn't kick the ball". Maybe he "used to kick the ball" but now he's a
goalkeeper he "can handle the ball" without its being a foul.
So, if you like, we have some extra dimensions of inflection, to do with
negatives and interrogatives and modes and word function (should it be
"Bobby's handling the ball is lawful" or "Bobby handling the ball is
lawful" or "Bobby's handling OF the ball WAS lawful" or "handling the ball
was lawful for Bobby"). Considering only results, maybe these are just
minor tweaks of the near-final result. That's my "lazy evaluation"
perspective.
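A small Python sketch of that "minor tweaks" view follows, assuming past tense throughout. The function name `inflect` and its parameters are hypothetical, and the verb forms are passed in rather than computed, so this is not a conjugation function.

```python
def inflect(subject, base, past, obj, negated=False, auxiliary=None):
    """Tweak a near-final clause along extra "dimensions of inflection".

    Hypothetical sketch: `base` and `past` are forms that actually occur
    ("kick", "kicked"); `auxiliary` is an extra word such as "can" or
    "used to" that takes the base form.
    """
    if auxiliary is not None:
        if negated:
            # Negating auxiliaries ("can't", "didn't use to") is left out.
            raise NotImplementedError("negated auxiliaries not covered")
        return f"{subject} {auxiliary} {base} {obj}"
    if negated:
        # Negation moves the past marking onto "did" and restores the base.
        return f"{subject} didn't {base} {obj}"
    return f"{subject} {past} {obj}"


print(inflect("Bobby", "kick", "kicked", "the ball"))
print(inflect("Bobby", "kick", "kicked", "the ball", negated=True))
print(inflect("Bobby", "kick", "kicked", "the ball", auxiliary="used to"))
print(inflect("Bobby", "handle", "handled", "the ball", auxiliary="can"))
```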
Perhaps it's worth re-visiting forms of a word. Let's say English has a
handful of forms ("kick", "kicks", "kick's", "kicks'", "kicking",
"kickings", "kicking's", "kickings'", "kicked"... and you might just treat
"kicked" as a noun) but empirically there are just three (what?!): "kick",
"kicking" and "kicked" (each of which has more than one use). Then there
are rules about adding an s (with or without an apostrophe) that are more
or less universal (and, of course, ambiguous). So my lazy English rules
would be: decide which of the three is best; add an s (or not); (if
written) use an apostrophe (or not). Okay, so that's not quite right, but I
think it's nearer to the truth than is generally supposed. The point is, if
we start with forms that actually occur (and here I would extend forms to
include auxiliary words) and then consider what rules different forms in
and out (when we need to), we won't necessarily end up with traditional
patterns. We might end up with something more interesting and more useful
(or of greater interest and more use, to "inflect" differently).
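The lazy rules can be sketched in a few lines of Python. This is only an approximation under my own assumptions: the name `surface_form` and the `plural`/`possessive` flags are hypothetical, and treating "add an s" and "use an apostrophe" as plural and possessive is one reading of the rules, which (as admitted above) are not quite right.

```python
def surface_form(form, plural=False, possessive=False):
    """Apply the two near-universal tweaks to a form that actually occurs.

    Step 1 (choosing among "kick", "kicking", "kicked") is the caller's
    job; this function only adds an s and/or an apostrophe.
    """
    if not possessive:
        return form + ("s" if plural else "")
    # Possessive: the apostrophe follows the s for plurals ("kicks'")
    # and precedes it otherwise ("kick's").
    return form + ("s'" if plural else "'s")


for form in ("kick", "kicking"):
    print([surface_form(form, plural=p, possessive=q)
           for q in (False, True) for p in (False, True)])
```

The ambiguity mentioned above shows up immediately: "kicks" comes out the same whether the s marks a plural noun or a third-person verb, and this sketch makes no attempt to tell them apart.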
Best regards,
Alan [has general form "Al."]
On Wednesday, 22 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
>
> Today's Topics:
>
> 1. Conjugation and Declension Functions (Adam Sobieski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Jul 2020 07:01:08 +0000
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: [Abstract-wikipedia] Conjugation and Declension Functions
> Message-ID:
> <CH2PR12MB4184E6AA57F4490441D7B7F6C5790(a)CH2PR12MB4184.namprd12.prod.outlook.com>
>
> Content-Type: text/plain; charset="windows-1252"
>
> Louis indicated that we should consider modeling and approaches
> encompassing the features of all languages [1].
>
> Factors that affect verb conjugation are numerous and we can envision a
> verb conjugation function which gathers these factors together from across
> languages, a function resembling:
>
> conjugate(language, stem, person, number, gender, tense, aspect, mood,
> voice, formality, clusivity, transitivity, valency, evidentiality)
>
> The following example utilizes a Context parameter, as we haven’t yet
> discussed global variables, and uses a “::” syntax for enums. It shows
> that, with multilingual conjugation and declension functions, we could
> encapsulate multilingual conjugation and declension in language-specific
> convenience functions, perhaps resembling:
>
> Clause kick<T>(Context context, Noun agent, Noun patient)
> {
>   // the agent is a third-person subject ("Bobby kicked ...")
>   return join(" ", [
>     agent,
>     context.conjugate("en-US", "kick", person::third, agent.number,
>       agent.gender, T, aspect::perfective, mood::indicative, voice::active,
>       context.formality, clusivity::no, transitivity::transitive,
>       valency::transitive, context.evidentiality),
>     "the",
>     patient
>   ]);
> }
>
> and, as expected:
>
> kick<tense::past>(context, Bobby, ball) would produce: “Bobby kicked the
> ball”.
>
> It appears that there are a multitude of ways to approach implementing
> NLG-related Wikilambda functions. Conjugation and declension functions
> could be components of some approaches.
>
> What do you think about conjugation and declension functions?
>
>
> Best regards,
> Adam
>
> [1] https://wals.info/
>
>
Hi! As suggested by Louis and seconded by Chris and Tiago on this list, and
since we had the Wikidata workshop already upcoming at ISWC, we decided to
extend the topics of the workshop to also cover Abstract Wikipedia. This
would be a peer-reviewed venue, with publications of the papers, etc. It is
virtually "co-located" with ISWC 2020.
Here is the call for papers.
*The First Wikidata Workshop*
Co-located with the 19th International Conference on Semantic Web (ISWC
2020).
Date: October 29, 2020
The workshop will be held online, afternoon European time.
Website: https://wikidataworkshop.github.io/
== Important dates ==
Papers due: August 10, 2020
Notification of accepted papers: September 11, 2020
Camera-ready papers due: September 21, 2020
Workshop date: October 29, 2020
== Overview ==
Wikidata is an openly available knowledge base, hosted by the Wikimedia
Foundation. It can be accessed and edited by both humans and machines and
acts as a common structured-data repository for several Wikimedia projects,
including Wikipedia, Wiktionary, and Wikisource. It is used in a variety of
applications by researchers and practitioners alike.
In recent years, we have seen an increase in the number of publications
around Wikidata. While there are several dedicated venues for the broader
Wikidata community to meet, none of them focuses on publishing original,
peer-reviewed research. This workshop fills this gap - we hope to provide a
forum to build this fledgling scientific community and promote novel work
and resources that support it.
The workshop seeks original contributions that address the opportunities
and challenges of creating, contributing to, and using a global,
collaborative, open-domain, multilingual knowledge graph such as Wikidata.
We encourage a range of submissions, including novel research, opinion
pieces, and descriptions of systems and resources, which are naturally
linked to Wikidata and its ecosystem, or enabled by it. What we’re less
interested in are works which use Wikidata alongside or in lieu of other
resources to carry out some computational task - unless the work feeds back
into the Wikidata ecosystem, for instance by improving or commenting on
some Wikidata aspect, or suggesting new design features, tools and
practices.
We also encourage submissions on the topic of Abstract Wikipedia,
particularly around collaborative code management, natural language
generation by a community, the abstract representation of knowledge, and
the interaction between Abstract Wikipedia and Wikidata on the one hand, and
Abstract Wikipedia and the language Wikipedias on the other.
We welcome interdisciplinary work, as well as interesting applications
which shed light on the benefits of Wikidata and discuss areas of
improvement.
The workshop is planned as an interactive half-day event, in which most of
the time will be dedicated to discussions and exchange rather than frontal
presentations. For this reason, all accepted papers will be presented in
short talks and accompanied by a poster. We are considering online options
in response to ongoing challenges such as travel restrictions and the
recent Covid-19 pandemic.
== Topics ==
Topics of submissions include, but are not limited to:
- Data quality and vandalism detection in Wikidata
- Referencing in Wikidata
- Anomaly, bias, or novelty detection in Wikidata
- Algorithms for aligning Wikidata with other knowledge graphs
- The Semantic Web and Wikidata
- Community interaction in Wikidata
- Multilingual aspects in Wikidata
- Machine learning approaches to improve data quality in Wikidata
- Tools, bots and datasets for improving or evaluating Wikidata
- Participation, diversity and inclusivity aspects in the Wikidata ecosystem
- Human-bot interaction
- Managing knowledge evolution in Wikidata
- Abstract Wikipedia
== Submission guidelines ==
We welcome the following types of contributions:
- Full research paper: Novel research contributions (7-12 pages)
- Short research paper: Novel research contributions of smaller scope than
full papers (3-6 pages)
- Position paper: Well-argued ideas and opinion pieces, not yet in the
scope of a research contribution (6-8 pages)
- Resource paper: New dataset or other resource directly relevant to
Wikidata, including the publication of that resource (8-12 pages)
- Demo paper: New system critically enabled by Wikidata (6-8 pages)
Submissions must be in PDF or HTML, formatted in the Springer style for
Lecture Notes in Computer Science (LNCS). For details on the LNCS style,
see Springer’s Author Instructions.
The papers will be peer-reviewed by at least two researchers. Accepted
papers will be published as open access papers on CEUR (we only publish to
CEUR if the authors agree to have their papers published).
Papers have to be submitted through EasyChair:
https://easychair.org/conferences/?conf=wikidataworkshop2020
== Proceedings ==
The complete set of papers will be published with the CEUR Workshop
Proceedings (CEUR-WS.org).
== Organizing committee ==
- Lucie-Aimée Kaffee, University of Southampton
- Oana Tifrea-Marciuska, Bloomberg
- Elena Simperl, King’s College London
- Denny Vrandečić, Wikimedia Foundation
== Programme committee ==
- Dan Brickley, Google
- Andrew D. Gordon, Microsoft Research & University of Edinburgh
- Dennis Diefenbach, University Jean Monnet
- Aidan Hogan, Universidad de Chile
- Markus Krötzsch, Technische Universität Dresden
- Edgar Meij, Bloomberg
- Claudia Müller-Birn, FU Berlin
- Finn Årup Nielsen, Technical University of Denmark
- Thomas Pellissier Tanon, Télécom ParisTech
- Lydia Pintscher, Wikidata, Wikimedia Deutschland
- Alessandro Piscopo, BBC
- Marco Ponza, University of Pisa
- Simon Razniewski, Max Planck Institute for Informatics
- Miriam Redi, Wikimedia Foundation
- Cristina Sarasua, University of Zurich
- Maria-Esther Vidal, TIB Hannover
- Pavlos Vougiouklis, Huawei Technologies, Edinburgh
- Zainan Victor Zhou, Google
Hello all,
I put a few early mock-ups on the wiki. It might be helpful to understand
where we are planning to go. Also, I am not a designer, and we will have
real designers go through them and there will be millions of improvements
on the way, and probably also one or two complete rethinks.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Early_mockups
So this is all very preliminary. None of this is set in stone yet; it is
just slideware. Comments, inspiration, and discussion are very welcome.
I plan to add more, incorporating feedback, and also to show a few
workflows.
Stay safe,
Denny
Hi folks!
Since people have started talking about runtimes for the functions
defined in Wikilambda (see the "V8 Engine and/or WebAssembly" thread), I
figured I should mention the project I’ve been working on for a few
weeks: GraalEneyj.[1] It’s supposed to be(come) a GraalVM-based
(re)implementation of the eneyj language/runtime[2] that Denny wrote and
which is currently part of the AbstractText extension.
GraalVM,[3] for those who haven’t heard of it yet, is a Java-based
project to build high-performance language implementations. There are
already GraalVM-based implementations of JavaScript,[4] R,[5] Ruby,[6]
LLVM,[7] WebAssembly,[8] and more,[9] and code snippets in any of these
languages can run in the same (J)VM ((Java) Virtual Machine) and call
each other with no performance penalty. This is of course interesting if
we’re thinking about Wikilambda holding function implementations in
different languages – with GraalEneyj, it should be possible to
seamlessly compose functions implemented in any language for which a
GraalVM implementation is available. GraalVM is also supposed to deliver
highly optimized language implementations with relatively little
development effort, but so far GraalEneyj is not yet at a point where I
could benchmark it.
If that sounds interesting to you, please get in touch! You can also
explore the code that I have so far (there’s not a ton of documentation
yet, but hopefully the README[10] is at least enough to get a basic idea
and run the tests), or watch the GraalEneyj walkthrough[11] that I
recorded last month. (At some point I should record another one, since
the code keeps changing, but for now that’s all I have to offer.)
Cheers,
Lucas Werkmeister
PS: In my day job I’m a software developer at Wikimedia Deutschland
e. V., but all of this is private-time activity so far, not work related.
[1]: https://github.com/lucaswerkmeister/graaleneyj/
[2]: https://github.com/google/abstracttext/tree/master/eneyj
[3]: https://www.graalvm.org/
[4]: https://github.com/graalvm/graaljs
[5]: https://github.com/oracle/fastr/
[6]: https://github.com/oracle/truffleruby
[7]: https://github.com/oracle/graal/tree/master/sulong
[8]: https://github.com/oracle/graal/tree/master/wasm
[9]: https://github.com/oracle/graal/blob/master/truffle/docs/Languages.md
[10]: https://github.com/lucaswerkmeister/graaleneyj/#readme
[11]: https://www.youtube.com/watch?v=mQVpcBMjSaw
Introduction
A document abstraction layer (DAL) could provide an API for natural language generation output while encapsulating a configurable and mechanical transformation to specific wikitext implementations. Natural language generation software could, then, output through a DAL to wikitext and this would facilitate certain conveniences and features.
Feature 1: Provenance, Sources and Derivations
A DAL could include a means of annotating portions of natural language with provenance data, sources or derivations. It could provide an API for easily attaching such data to document elements (e.g. words, phrases, sentences, paragraphs, sections), and it would encapsulate the correct output of wikitext (e.g. <ref> and <references> elements) for generated articles [1][2][3][4].
That is, one could attach provenance data, sources or derivations, to a particular sentence in a DAL object model and the DAL’s configurable, mechanical transformation to wikitext would encapsulate a specific implementation of outputting a <ref> element at the end of the sentence and ensuring that the automatically-generated wikitext article contained a <references> element in a “References” section.
Furthermore, a DAL could encapsulate the naming of citation instances (<ref name="multiple">) [2] to ensure proper formatting of the references section when materials are cited multiple times.
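To make the provenance idea concrete, here is a minimal Python sketch, not a proposed implementation. The names Source, Sentence and render_wikitext are hypothetical and exist only for this illustration; the only real syntax assumed is the Cite extension's <ref name="...">, its self-closing form for repeated citations, and <references />.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A citable source; `key` is reused as the <ref name="..."> value (assumption)."""
    key: str
    citation: str


@dataclass
class Sentence:
    """One generated sentence plus the sources that support it."""
    text: str
    sources: list = field(default_factory=list)


def render_wikitext(sentences):
    """Render sentences to wikitext, writing each citation body only once."""
    seen = set()
    parts = []
    for sentence in sentences:
        refs = []
        for source in sentence.sources:
            if source.key in seen:
                # Cited before: emit the short named form only.
                refs.append(f'<ref name="{source.key}" />')
            else:
                seen.add(source.key)
                refs.append(f'<ref name="{source.key}">{source.citation}</ref>')
        parts.append(sentence.text + "".join(refs))
    body = " ".join(parts)
    if seen:
        # Add a References section only when something was cited.
        body += "\n\n== References ==\n<references />"
    return body
```

For example, citing the same book on two sentences writes the full citation once and a named back-reference the second time:

```python
book = Source("call-me-bobby", "Call me Bobby, p. 132")
print(render_wikitext([
    Sentence("Bobby hit the ball.", [book]),
    Sentence("She was humming a tune.", [book]),
]))
```

The natural language generation code only attaches sources; where the <ref> goes and how repeats are named stays inside the renderer.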
Feature 2: Computational Traces and Diagnostics
A DAL could include a means of attaching computational traces and diagnostic messages or information to specific portions of generated natural language output (e.g. words, phrases, sentences, paragraphs, sections).
While viewing articles which were automatically-generated in a “debug mode” or “developer mode”, developers could view and interact with content-specific computational traces or diagnostic messages or information in a number of ways (hoverboxes, margin notes, et cetera).
That is, a DAL could encapsulate outputting specific wikitext implementations for providing developers with a means of interacting with automatically-generated articles to view and to understand the computational processes involved in generating specific portions of natural language.
As citations and references are implemented by means of an extension [1], this set of features might require a new MediaWiki extension.
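As a rough Python sketch of the debug-mode idea, and only under the assumption that a plain HTML span with a title attribute is an acceptable stand-in for whatever a dedicated extension would provide: the names Fragment and render, and the CSS class dal-trace, are all hypothetical.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """A piece of generated text with optional diagnostic messages attached."""
    text: str
    traces: list = field(default_factory=list)


def render(fragments, debug=False):
    """Render fragments; in debug mode, expose traces as hover text."""
    out = []
    for fragment in fragments:
        if debug and fragment.traces:
            # Escape the trace so characters like < and " stay inside the attribute.
            tooltip = html.escape("; ".join(fragment.traces), quote=True)
            out.append(
                f'<span class="dal-trace" title="{tooltip}">{fragment.text}</span>'
            )
        else:
            # Normal mode: readers see only the text.
            out.append(fragment.text)
    return " ".join(out)
```

The same fragments render as plain text for readers and with hover text for developers, so the generation code does not need to know which mode is active.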
Feature 3: Versioning a Document Abstraction Layer
One could version a DAL, adding features to it, without impacting any existing natural language generation software code. One could also adjust specific output wikitext implementations for particular scenarios without impacting any natural language generation software code.
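One way to picture this, as a sketch only: keep the renderers in a registry keyed by a version label, so the calling code stays the same while the wikitext output changes. The registry, the version labels "1" and "2", and the function names below are hypothetical; {{reflist}} is used merely as an example of an alternative output.

```python
RENDERERS = {}


def renderer(version):
    """Register a wikitext renderer under a DAL version label."""
    def register(func):
        RENDERERS[version] = func
        return func
    return register


@renderer("1")
def render_v1(paragraphs):
    # First version: plain <references /> tag.
    return "\n\n".join(paragraphs) + "\n\n== References ==\n<references />"


@renderer("2")
def render_v2(paragraphs):
    # Hypothetical later version: emit a template call instead.
    return "\n\n".join(paragraphs) + "\n\n== References ==\n{{reflist}}"


def to_wikitext(paragraphs, version="1"):
    """Entry point used by generation code; only the version label varies."""
    return RENDERERS[version](paragraphs)
```

Generation code that calls to_wikitext(paragraphs) keeps working unchanged when version "2" is added, and a particular wiki could opt into it by passing version="2".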
Conclusion
Should natural language generation output directly to wikitext or instead make use of an abstraction layer? What do you think of these ideas?
Best regards,
Adam Sobieski
[1] https://www.mediawiki.org/wiki/Extension:Cite
[2] https://www.mediawiki.org/wiki/Help:Cite
[3] https://en.wikipedia.org/wiki/Help:Cite_link_labels
[4] https://github.com/SemanticMediaWiki/SemanticCite/blob/master/README.md