Al,
Thank you for sharing the publication. Tools resembling the Inference Inspector could be useful when editing rules on crowdsourced knowledge bases such as Wikidata. With such tools, users could better explore and understand the logical consequences of a hypothetical edit, or of a batch of simultaneous hypothetical edits.
Feedback for changes that impact a collection of automatically generated articles is an interesting topic. In this regard, I think of the automatic evaluation of natural language generation with tools like Coh-Metrix [1], which measures natural language on 108 indices, and of varieties of dashboards that could show Abstract Wikipedia developers measurements from across the entire collection of articles.
Best regards, Adam
From: Grounder UK <grounderuk@gmail.com>
Sent: Sunday, July 26, 2020 1:36 PM
To: abstract-wikipedia@lists.wikimedia.org
Subject: [Abstract-wikipedia] Reasoning over ontologies
Thanks, Adam.
I'm changing the subject just to get this "out there" because what you said about "intricate parse trees" reminded me of this paper, which I came across last night.
https://www.sciencedirect.com/science/article/pii/S1570826817300367
It occurs to me that our early stage of Content Selection might initially be realised or instantiated as a focused ontology (focused on some Wikidata Item of interest). The authors of this paper say:
"We developed the Inference Inspector, a novel Protégé plugin that alerts the author to the changes to key entailment sets that have occurred as a consequence of a modelling action."
If we replace "modelling action" with "NLG/human interaction", maybe we have ourselves an interesting feedback loop? Wikipedia contributors could interact with these extracted ontologies to see, in generated natural language, the consequences of changes. Here, the changes I have in mind are to the linguistic knowledge rather than the encyclopedic content, but they are not different in principle. The goal would be to evolve the Renderers (considered more as data than function, whilst acknowledging that the functions are also data). Continuously evolving Renderers are a deliverable from this project, aren't they?
I know Protégé has a pretty wide user base, but it doesn't include me. Does anyone have any practical experience they'd like to share?

Best regards, Al.

On Sunday, 26 July 2020, <abstract-wikipedia-request@lists.wikimedia.org> wrote:
Send Abstract-Wikipedia mailing list submissions to abstract-wikipedia@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia or, via email, send a message with subject or body 'help' to abstract-wikipedia-request@lists.wikimedia.org
You can reach the person managing the list at abstract-wikipedia-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Abstract-Wikipedia digest..."
Today's Topics:
1. Re: Conjugation and Declension Functions (Adam Sobieski) (Adam Sobieski)
----------------------------------------------------------------------
Message: 1
Date: Sun, 26 Jul 2020 15:22:07 +0000
From: Adam Sobieski <adamsobieski@hotmail.com>
To: "General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions (Adam Sobieski)
Message-ID: <CH2PR12MB4184F449E9BE14DF0C61D8F5C5750@CH2PR12MB4184.namprd12.prod.outlook.com>
Al,
Inverse parsing makes sense. The parse tree in your example is expressive. Your example also shows a means of describing nouns or noun phrases as Wikidata entities, which would facilitate producing hyperlinks in the resulting articles.
It could be that some of the natural language generation functions on Wikilambda will, instead of concatenating strings, work with trees or tree-based data. The classes for things like phrases, clauses, sentences, and so forth, could be tree-based or contain tree-based data. The matter could be one of producing intricate parse trees (or using an equivalent object model) from Wikidata data to then inverse parse into natural language sentences.
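As a minimal sketch of what tree-based data and a string-producing traversal might look like, consider the following; the `SyntaxNode` type and `realize` function are illustrative inventions of mine, not a proposed Wikilambda design:

```typescript
// Hypothetical tree-based representation of a clause: leaf nodes hold
// word strings; interior nodes group constituents such as phrases.
type SyntaxNode =
  | { kind: "word"; text: string }
  | { kind: "phrase"; label: string; children: SyntaxNode[] };

// Realize a tree into a sentence by traversing it in order and
// joining the leaves with spaces.
function realize(node: SyntaxNode): string {
  if (node.kind === "word") {
    return node.text;
  }
  return node.children.map(realize).join(" ");
}

const clause: SyntaxNode = {
  kind: "phrase",
  label: "Clause",
  children: [
    { kind: "phrase", label: "NP", children: [{ kind: "word", text: "Bobby" }] },
    {
      kind: "phrase",
      label: "VP",
      children: [
        { kind: "word", text: "hit" },
        {
          kind: "phrase",
          label: "NP",
          children: [{ kind: "word", text: "the" }, { kind: "word", text: "ball" }],
        },
      ],
    },
  ],
};

console.log(realize(clause)); // "Bobby hit the ball"
```

A real renderer would of course attach senses, entities and grammatical features to the nodes rather than bare strings, but the shape of the data would be similar.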
Best regards, Adam
From: Grounder UK <grounderuk@gmail.com>
Sent: Saturday, July 25, 2020 8:12 PM
To: abstract-wikipedia@lists.wikimedia.org
Subject: [Abstract-wikipedia] Conjugation and Declension Functions (Adam Sobieski)
You're welcome, Adam.
Far be it from me to disagree with Reiter and Dale (and I don't). I find it odd, however, that you put "hit<…>(Context, Array)". I wonder how, in our Wikipedic context, we get the idea of a verb? And why do we put it outside our array?
The reality is that Wikipedias are about things, in the main: people, places, objects, types and historical events. These things have properties and different types of involvement in events, but I guess more than half the time we have to invent (or select or find...) a suitable verb for the set of objects we find to be related (from the claims in Wikidata). If English didn't demand a finite verb, our Wikipedia would manage perfectly well without them, just as Wikidata does (although they "lurk" within Properties).
To my way of thinking, if there is any substance of note in the sense of the verb, it must be in the array of Senses (never mind their lexical form). Of course, you might reason that we start with an array of "involved senses" and infer the verb, which we can then inflect. It's part of Lexicalization rather than Realization, in other words. Maybe that is generally so, but I am not convinced that we should immediately assume that it must be so in our Wikipedic context.
For me, in any event, Realization in our context is just the final transformation of a near-final representation of our content, so I wouldn't expect it to change the sequence of ideas or change around syntactic roles. If you're deriving roles and sequences, then you're doing Lexicalization (or you're at a higher level) and you won't generally know what the verb is going to be at that stage (how could you?). It could be "strike", "hit" or "bat", for example. And the involvement of a bat in our array might be a good reason to choose the verb "bat" or a good reason to avoid it, depending on the particular context. If we choose "bat" as the verb, we will probably also drop it from the instrument role because that role is implied by the verb ("hit with a bat"). That "fact" will also keep us from choosing the verb "bat" if we need to refer to the bat as an object (to specify that it was her father's bat, for example). If the verb "bat" is going to subsume the instrument role, the instrument must be wholly indefinite ("hit with 'any old' bat" rather than "hit with one of the bats").
Returning to Reiter and Dale, I think it makes sense to view this problem in reverse, as inverse parsing. As they say, you can parse the three forms into a single "logical form" (which some might call "abstract content") and there must be some loss of information in this process, since we cannot guarantee that we will get back to the form we started with if we invert the parsing. It is this lost information that we need to retain when we have parsed (for future reference) and to have available before we proceed to Realize (or "render"). Our logical "manual of style" is what we might use to derive the missing information that we must have available, if we cannot infer it from the selected Wikidata content.
To begin again at the end, I might say we can parse "Bobby batted the ball" as something like this (reading Sentence = Subject + Verb Phrase, and Verb Phrase = Verb + Instrument + Direct Object):

Sentence<
    Subject<
        Noun Phrase<
            Noun< Alias<"Bobby">, entity<Q???> >>>
    + Verb Phrase<
        Verb< Sense<"hit">, Tense<simple past>, Voice<active> >
        + Instrument<
            Noun Phrase< Noun< Sense<"bat"> > + Determiner<indefinite> >>
        + Direct Object<
            Noun Phrase< Noun< Sense<"ball"> > + Determiner<definite> >>>>.
Assuming I've parsed that correctly (and it's been a few years since I had to), we can assert that this "logical form" shall be valid input into our Realization function. What do we get back? "Bobby hit the ball with a bat". I had to cheat, of course. The reason, I suppose, is that the subsumed noun phrase was prepositional and non-prepositional noun phrases, in English, tend to attach themselves to their verbs ("I gave him it" versus "I gave it to him", for example). So when I surfaced the subsumed instrumental, "with a bat" (which is more interpretation than parsing), the result was naturally in the wrong order, coming between the verb and the existing non-prepositional noun phrase, "the ball". The order shouldn't matter to the renderer, unless it's explicit. But I did say that I wouldn't expect Realization to change the sequence of ideas, and then it had to because "Bobby hit with a bat the ball" is un-English in any unfanciful context (although perfectly understandable). Full disclosure, as they say!
So, if interpretive parsing surfaces subsumed elements, some Lexicalization may be required to establish a "logical form" that is valid input into Realization. Alternatively, we might consider that to occur within the inverse of Referring Expression Generation, since it is loosely analogous to surfacing nouns from pronouns and might be considered to be a kind of anaphora (locally, at least, it is likely to be exophoric: "batted" implies an unspecified bat, as discussed above, but surfacing it renders it endophoric, if you'll forgive the pun, and thus anaphoric at that level).
By the way, Reiter claims to have written or admits to writing the main articles on natural language generation in the English Wikipedia. You can find the links in: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Related_and_previous_work....
Best regards, Al.
On Saturday, 25 July 2020, <abstract-wikipedia-request@lists.wikimedia.org> wrote:
Message: 1
Date: Sat, 25 Jul 2020 17:43:55 +0000
From: Adam Sobieski <adamsobieski@hotmail.com>
To: "General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
Message-ID: <CH2PR12MB41848A048AE721573F59A218C5740@CH2PR12MB4184.namprd12.prod.outlook.com>
Thank you all for the comments and feedback thus far.
I would like to indicate that we could also utilize the expressiveness of JavaScript/TypeScript for array literals with object literal elements. In this alternative approach, arguments could be placed into an array of objects which each attach a thematic role to an argument. “Bobby”, “ball” and “bat” could be objects of type Noun, and we could pass to a function an array literal of object literals to get at the desired concepts of: (1) sequence, (2) roles, (3) values. That is rather the matter: attaching thematic roles to elements of a sequence of arguments.
To the example pseudocode:
Noun Bobby;
Noun ball;
Noun bat;
hit<voice::active, tense::past, aspect::perfective, mood::indicative>(context, [ {role: 'agent', value: Bobby}, {role: 'patient', value: ball}, {role: 'instrument', value: bat} ]) --> "Bobby hit the ball with the bat"
hit<voice::active, tense::past, aspect::perfective, mood::indicative>(context, [ {role: 'agent', value: Bobby}, {role: 'instrument', value: bat}, {role: 'patient', value: ball} ]) --> "Bobby, with the bat, hit the ball"
hit<voice::active, tense::past, aspect::perfective, mood::indicative>(context, [ {role: 'instrument', value: bat}, {role: 'agent', value: Bobby}, {role: 'patient', value: ball} ]) --> "With the bat, Bobby hit the ball"
or, if the grammatical arguments (voice, tense, aspect, mood) are properties of the context object:
hit(context, [ {role: 'agent', value: Bobby}, {role: 'patient', value: ball}, {role: 'instrument', value: bat} ]) --> "Bobby hit the ball with the bat"
hit(context, [ {role: 'agent', value: Bobby}, {role: 'instrument', value: bat}, {role: 'patient', value: ball} ]) --> "Bobby, with the bat, hit the ball"
hit(context, [ {role: 'instrument', value: bat}, {role: 'agent', value: Bobby}, {role: 'patient', value: ball} ]) --> "With the bat, Bobby hit the ball"
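To make the signature concrete, here is a toy TypeScript sketch of this role-array approach; the string-valued nouns, the hand-wired orderings, and the fixed grammatical arguments (active voice, simple past) are all simplifications of mine, not a proposed API:

```typescript
// A plain string stands in for a richer Noun type carrying sense,
// Wikidata entity, and so forth.
type Role = "agent" | "patient" | "instrument";

interface RoleNounPair {
  role: Role;
  value: string;
}

// Toy realizer for "hit": the order in which the role/noun pairs are
// supplied selects one of the three paraphrases.
function hit(context: object, args: RoleNounPair[]): string {
  const byRole = (r: Role): string => args.find((a) => a.role === r)!.value;
  const agent = byRole("agent");
  const patient = byRole("patient");
  const instrument = byRole("instrument");

  switch (args.map((a) => a.role).join(",")) {
    case "agent,instrument,patient":
      return `${agent}, with the ${instrument}, hit the ${patient}`;
    case "instrument,agent,patient":
      return `With the ${instrument}, ${agent} hit the ${patient}`;
    default: // agent,patient,instrument, and any other order
      return `${agent} hit the ${patient} with the ${instrument}`;
  }
}

console.log(
  hit({}, [
    { role: "agent", value: "Bobby" },
    { role: "patient", value: "ball" },
    { role: "instrument", value: "bat" },
  ]),
); // "Bobby hit the ball with the bat"
```

A production renderer would derive the word-order templates from grammatical data rather than enumerating permutations, but the sketch shows how argument order alone can carry the paraphrase choice.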
A downside would be that the function signatures would resemble:
hit<…>(Context, Array)
or
hit<…>(Context, Array<RoleNounPair>)
An upside would be expressiveness with which to indicate desired output paraphrases.
These topics are discussed in Building Natural Language Generation Systems by Reiter and Dale [1], in chapter 6 on Surface Realization, in section 6.8 on Bidirectional Grammars, pages 194 – 195, where the authors indicate that while “a parser might produce the same logical form for the following three sentences:
1. Mary gave John a ball.
2. Mary gave a ball to John.
3. John was given a ball by Mary.
“The input to a realizer, however, should explicitly provide a means of choosing among these forms; the alternative is to choose at random.” (as noted, another alternative is to select using a manual of style).
The authors also indicate that Winograd [2] provided a list of paraphrases for consideration:
1. Jon bought a painting for Vina.
2. Jon bought Vina a painting.
3. Vina was bought a painting by Jon.
4. What Jon bought Vina was a painting.
5. What Vina was bought by Jon was a painting.
6. It was a painting that Jon bought for Vina.
7. It was Jon that bought Vina a painting.
8. It was Vina that Jon bought a painting for.
On the topic of “a” or “the”, definite or indefinite noun phrases, we can note a discussion in Reiter and Dale [1], section 5.4.2, on page 145. That is another important discussion topic.
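As a toy illustration of that determiner choice (my own sketch, not the algorithm Reiter and Dale present): a renderer might track which referents the text has already mentioned, using an indefinite article on first mention and a definite article thereafter.

```typescript
// Toy referring-expression helper: "a"/"an" on the first mention of a
// noun, "the" on subsequent mentions within the same discourse.
class DiscourseContext {
  private mentioned = new Set<string>();

  refer(noun: string): string {
    if (this.mentioned.has(noun)) {
      return `the ${noun}`;
    }
    this.mentioned.add(noun);
    // Crude "a" vs "an" rule based on spelling, not pronunciation.
    const article = /^[aeiou]/i.test(noun) ? "an" : "a";
    return `${article} ${noun}`;
  }
}

const discourse = new DiscourseContext();
console.log(discourse.refer("bat")); // "a bat"
console.log(discourse.refer("bat")); // "the bat"
```

Real referring-expression generation also has to handle pronouns, proper nouns and uniqueness ("the sun"), which this sketch ignores.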
In the set of approaches under discussion, by making use of grammatical arguments (e.g. voice, tense, aspect, mood) and by placing thematic roles on elements of a sequence of arguments, we have an expressiveness beyond that of predicate calculus with which to distinguish paraphrases.
Best regards,
Adam
[1] Reiter, Ehud, and Robert Dale. Building Natural Language Generation Systems. Cambridge University Press, 2000.
[2] Winograd, Terry. Language as a Cognitive Process. 1983.
From: Adam Sobieski <adamsobieski@hotmail.com>
Sent: Friday, July 24, 2020 7:07 PM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
In addition to broaching the discovery, encoding and reuse of language-specific patterns (a discussion topic when comparing and contrasting approaches; see the DRY principle [1] and the abstraction principle [2]), and indicating how conjugation and declension functions would facilitate the discovery, encoding and reuse of these patterns, I would like to share some topics which arose as I explored adding the thematic relation [3] of instrument to the agent-patient pair.
When exploring how best to add the thematic relation of instrument to the agent-patient pair (e.g. adding “using the bat” to “Bobby hit the ball”), I observed that, for the same input grammatical arguments, there was a set of possible output paraphrases:
1. "Bobby hit the ball using the bat"
2. "Bobby, using the bat, hit the ball"
3. "Using the bat, Bobby hit the ball"
I wondered: how might we be able to generate each?
One possibility is indicated. There could be a type for each thematic relation [3]. Then, using explicit type conversions to these types, we could have different functions for different sequences of input arguments.
That is,
hit<voice::active, tense::past, aspect::perfective, mood::indicative>(context, (Agent)Bobby, (Patient)ball, (Instrument)bat) --> "Bobby hit the ball using the bat"
hit<voice::active, tense::past, aspect::perfective, mood::indicative>(context, (Agent)Bobby, (Instrument)bat, (Patient)ball) --> "Bobby, using the bat, hit the ball"
hit<voice::active, tense::past, aspect::perfective, mood::indicative>(context, (Instrument)bat, (Agent)Bobby, (Patient)ball) --> "Using the bat, Bobby hit the ball"
or
hit<voice::active, tense::past, aspect::perfective, mood::indicative>(context, Bobby as Agent, ball as Patient, bat as Instrument) --> "Bobby hit the ball using the bat"
hit<voice::active, tense::past, aspect::perfective, mood::indicative>(context, Bobby as Agent, bat as Instrument, ball as Patient) --> "Bobby, using the bat, hit the ball"
hit<voice::active, tense::past, aspect::perfective, mood::indicative>(context, bat as Instrument, Bobby as Agent, ball as Patient) --> "Using the bat, Bobby hit the ball"
These functions could each wrap the use of patterns (as indicated in previous email) and attempt to realize output sentences utilizing the arguments in the same sequence in which they were provided. This would, however, mean that the callers of the functions would be responsible for “shuffling” the input arguments to express the desired paraphrase ((1), (2), (3)).
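A toy TypeScript sketch of this wrapper-type approach, with runtime classes standing in for the explicit casts (the class names mirror the thematic relations above, but everything else is an assumption of mine):

```typescript
// Hypothetical thematic-relation wrappers; the realizer keys its word
// order off the runtime class of each positional argument, mimicking
// the (Agent)/(Patient)/(Instrument) conversions in the pseudocode.
class Agent { constructor(public noun: string) {} }
class Patient { constructor(public noun: string) {} }
class Instrument { constructor(public noun: string) {} }

type ThematicArg = Agent | Patient | Instrument;

function hit(context: object, ...args: ThematicArg[]): string {
  const agent = (args.find((a) => a instanceof Agent) as Agent).noun;
  const patient = (args.find((a) => a instanceof Patient) as Patient).noun;
  const instrument = (args.find((a) => a instanceof Instrument) as Instrument).noun;

  // The position of the instrument argument selects the paraphrase.
  if (args[0] instanceof Instrument) {
    return `Using the ${instrument}, ${agent} hit the ${patient}`;
  }
  if (args[1] instanceof Instrument) {
    return `${agent}, using the ${instrument}, hit the ${patient}`;
  }
  return `${agent} hit the ${patient} using the ${instrument}`;
}

console.log(hit({}, new Instrument("bat"), new Agent("Bobby"), new Patient("ball")));
// "Using the bat, Bobby hit the ball"
```

This keeps a positional signature while still letting the caller's argument order express the desired paraphrase.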
There are, of course, other approaches to consider and other possibilities to consider with respect to addressing the matter of outputting the example paraphrases.
Any thoughts on these topics?
Best regards, Adam
[1] https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
[2] https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)
[3] https://en.wikipedia.org/wiki/Thematic_relation
P.S.: We could also put the grammatical arguments on the context object (context.voice, context.tense, context.aspect, context.mood, et cetera) and then free the generic parameters for other uses.
From: Adam Sobieski <adamsobieski@hotmail.com>
Sent: Thursday, July 23, 2020 7:38 PM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) <abstract-wikipedia@lists.wikimedia.org>
Subject: RE: Conjugation and Declension Functions
I had an idea about discovering, encoding and reusing patterns in languages that I would like to share with the group.
The following is rough-draft pseudocode for a function with which to generate sentences for the verb “to kick”. The function has generic parameters for voice and tense and parameters for context, agent and patient.
Clause kick<V, T>(Context context, Noun agent, Noun patient)
{
    switch (V)
    {
        case voice::active:
            switch (T)
            {
                case tense::past:
                case tense::present:
                    return join(" ", [
                        agent,
                        conjugate_en("kick", V, T, …),
                        "the",
                        patient
                    ]);
                case tense::future:
                    return join(" ", [
                        agent,
                        "will",
                        conjugate_en("kick", V, T, …),
                        "the",
                        patient
                    ]);
                …
            }
        case voice::passive:
            switch (T)
            {
                case tense::past:
                    return join(" ", [
                        "the",
                        patient,
                        "was",
                        conjugate_en("kick", V, T, …),
                        "by",
                        agent
                    ]);
                case tense::present:
                    return join(" ", [
                        "the",
                        patient,
                        "is",
                        "being",
                        conjugate_en("kick", V, T, …),
                        "by",
                        agent
                    ]);
                …
            }
    }
}
and
kick<voice::active, tense::past>(context, Bobby, ball) returns “Bobby kicked the ball”.
In English, most combinations of tense, aspect, mood and voice are expressed periphrastically, using constructions with auxiliary verbs. After implementing a number of these for similar verbs, per the above example, we would notice a pattern. That pattern could be expressed as something resembling:
Clause pattern123<V, T>(Context context, Verb verb, Noun agent, Noun patient)
{
    switch (V)
    {
        case voice::active:
            switch (T)
            {
                case tense::past:
                case tense::present:
                    return join(" ", [
                        agent,
                        conjugate_en(verb, V, T, …),
                        "the",
                        patient
                    ]);
                case tense::future:
                    return join(" ", [
                        agent,
                        "will",
                        conjugate_en(verb, V, T, …),
                        "the",
                        patient
                    ]);
                …
            }
        case voice::passive:
            switch (T)
            {
                case tense::past:
                    return join(" ", [
                        "the",
                        patient,
                        "was",
                        conjugate_en(verb, V, T, …),
                        "by",
                        agent
                    ]);
                case tense::present:
                    return join(" ", [
                        "the",
                        patient,
                        "is",
                        "being",
                        conjugate_en(verb, V, T, …),
                        "by",
                        agent
                    ]);
                …
            }
    }
}
We could then simply express that the verb "to kick" is an instance of pattern123 with something resembling:

Clause kick<V, T>(Context context, Noun agent, Noun patient)
{
    return pattern123<V, T>(context, new Verb("kick", …), agent, patient);
}

or

Clause kick<V, T>(Context context, Noun agent, Noun patient)
{
    return pattern123<V, T>(context, getVerb("https://…#kick"), agent, patient);
}
and this pattern could be reused for a large number of verbs.
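The conjugate_en function is left undefined above. As a placeholder only, a toy conjugator for regular English verbs might look like the following; a real implementation needs a lexicon of irregular forms, and the third-person-singular assumption for the present tense is mine:

```typescript
type Voice = "active" | "passive";
type Tense = "past" | "present" | "future";

// Toy conjugator for regular English verbs only. The regular past and
// past participle add "-ed" (just "-d" after a final "e"); irregular
// verbs ("hit", "give", …) are not handled.
function conjugateEn(verb: string, voice: Voice, tense: Tense): string {
  const participle = verb.endsWith("e") ? verb + "d" : verb + "ed";
  if (voice === "passive") {
    // Passive constructions pair an auxiliary ("was", "is being")
    // with the past participle, regardless of tense.
    return participle;
  }
  switch (tense) {
    case "past":
      return participle;
    case "present":
      // Assumes a third-person-singular subject, the usual case in an
      // encyclopedia article; ignores "-es" spellings ("watches").
      return verb + "s";
    case "future":
      return verb; // bare stem after the auxiliary "will"
  }
}

console.log(conjugateEn("kick", "active", "past")); // "kicked"
console.log(conjugateEn("kick", "active", "present")); // "kicks"
```

Plugged into pattern123, this would yield "Bobby kicked the ball" for the active past and "the ball was kicked by Bobby" for the passive past.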
This use of patterns should extend to scenarios where there are more parameters for other thematically-related nouns:
Clause kick<V, T>(Context context, Noun agent, Noun patient, Noun instrument)
{
    return pattern1234<V, T>(context, new Verb("kick", …), agent, patient, instrument);
}
For English, there might be more generic parameters than those for voice and tense; we could add those for aspect and mood (e.g. pattern1234<V, T, A, M>(…)).
In conclusion, with generic programming and functions for conjugation and declension, we have expressiveness with which to discover, encode and reuse language-specific patterns.
Best regards, Adam Sobieski