Hi, I am new to this abstract-wikipedia mailing list.
Should there be two layers of information?
First: a network of facts, like
France (Q142) is a country (Q6256).
<Q142> <rel_is> <Q6256> .
and all the detailed facts that should be understood by a computer.
Second: ordinary human-readable sentences, as in Wikipedia, except that they
will be language-independent, like UNL?
Or should these two approaches be combined?
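The first layer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Triple` type, the `LABELS` table and both helper functions are hypothetical names, and `rel_is` is simply the relation used in the example above. The facts hold identifiers only, and the labels live in a separate per-language table, so the same triple could be described in any language that has labels.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One machine-readable fact: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Labels are stored apart from the facts, keyed by language code (hypothetical table).
LABELS = {
    "Q142": {"en": "France"},
    "Q6256": {"en": "country"},
}

# The single fact from the example: <Q142> <rel_is> <Q6256> .
FACTS = [Triple("Q142", "rel_is", "Q6256")]


def instances_of(class_id, facts):
    """Return every subject that is related to class_id by 'rel_is'."""
    return [t.subject for t in facts if t.predicate == "rel_is" and t.obj == class_id]


def describe(triple, lang="en"):
    """Render a 'rel_is' fact as a very simple English-style sentence."""
    subject = LABELS[triple.subject][lang]
    obj = LABELS[triple.obj][lang]
    return f"{subject} is a {obj}."


print(instances_of("Q6256", FACTS))
print(describe(FACTS[0]))
```

The second layer (language-independent sentences) would replace the hard-coded template in `describe` with something much richer; the sketch only shows where the two layers meet.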
Hi, Andy! Welcome!
I do like your idea of being clear about basic "facts" and details. I think
it will be key in the selection of "statements" that go into an "article",
in whatever language is required. I don't think we can say how many levels
of information there might be, but we can already see something from how
Wikipedia pages are put into categories.
"France is a country in Europe" and "in western Europe" and "in the
European Union", just to mention three categories. The first is an
important fact of geography, but is the second more helpful? All countries
in western Europe are (1) a country and (2) in Europe and (3) to the west.
(3) feels more like a detail, but if we tell you France is in Europe, what
is the first question you might ask? It might be, "Is it in the European
Union?" or "How big is it?" or "Do many people live there?" So I would
expect us to give you those facts or details (FAQs) as well.
Facts about facts and statements about claims are a whole other topic, but
if a "fact" is disputed, we do need to know how to show this. If you look
at Wikidata, you will see that the United Kingdom has been a sovereign
state since 1927. This is untrue. But if 1927 is not the answer to the
question "How long has the UK been a country (or sovereign state)?", what
is? "Since 1707, 1801 or 1922", depending on the details. Luckily for you,
France has "always" been a country, despite now being the fifth republic
(since 1958).
So, sometimes the Property of an entity is not a simple value or
relationship. It might be better to think about it as a relationship to a
"disagreement" or debate. Then, a "fact" is an entity's relationship to an
absence of "disagreement", a "consensus", as Wikipedia would call it. Part
of this consensus is the meaning of an entity's label. For example, English
Wikipedia thinks "oxygen" is the chemical element ("O") and "its most
stable form" ("O<sub>2</sub>", "dioxygen"). French Wikipedia thinks
"oxygène" is just the element. Wikidata has statements (mostly) about the
element but the "Identifiers" (external authorities) are for the English
Wikipedia concept, not the French one. The point is, it is clear that there
might be some confusion! We have a separate item for dioxygen and for ozone
and (in theory) for atomic oxygen (and there are others) so we can give you
all of the oxygen facts, mostly grouped by form (allotrope and/or state).
Think of that as a disambiguation page enriched with detail... It's an
interesting use case (or test case), I think.
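One way to picture a Property as a relationship to a "disagreement" is the rough Python sketch below. The `Claim` class and its method names are assumptions made for illustration, not part of Wikidata or of any project design; the years are the ones mentioned above. A claim with exactly one candidate value is treated as a consensus, and anything else is reported as disputed.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A property of an entity, modelled as a set of competing values."""
    entity: str
    prop: str
    values: list = field(default_factory=list)

    def is_consensus(self) -> bool:
        # A "fact" is a claim with exactly one undisputed value.
        return len(self.values) == 1

    def summary(self) -> str:
        if self.is_consensus():
            return f"{self.entity}: {self.prop} = {self.values[0]}"
        options = ", ".join(str(v) for v in self.values)
        return f"{self.entity}: {self.prop} is disputed ({options})"


uk = Claim("United Kingdom", "sovereign state since", [1707, 1801, 1922])
france = Claim("France", "country since", ["always"])

print(uk.summary())
print(france.summary())
```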
Best regards,
Al.
On Tuesday, 28 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Send Abstract-Wikipedia mailing list submissions to
> abstract-wikipedia(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
> or, via email, send a message with subject or body 'help' to
> abstract-wikipedia-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> abstract-wikipedia-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Abstract-Wikipedia digest..."
>
>
> Today's Topics:
>
> 1. All work is preliminary (Denny Vrandečić)
> 2. Two different kinds of information? (Andy)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 27 Jul 2020 12:43:05 -0700
> From: Denny Vrandečić <dvrandecic(a)wikimedia.org>
> To: Abstract Wikipedia list <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: [Abstract-wikipedia] All work is preliminary
> Message-ID:
> <CA+bik1dNtpbA3H2_O=8H8iyNrBPMbpQeAaOb04EpEaoLxCWSZQ(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello all,
>
> one of the things we have been discussing in the team is that we want to do
> as much of our work as possible in the open. At the same time, we're a distributed team
> and starting to form a shared understanding of the task at hand. Due to the
> COVID situation, we didn't have the opportunity to have a project kick off,
> where we meet for a few days and make sure that we are fully aligned and
> use the same words and have the same thinking.
>
> That's both an opportunity and a risk, as it might lead to divergence
> in what we are saying and writing.
>
> We have two possible ways forward - either we vet documents and discussions
> internally every time, in order to present a more unified view on the
> project, or we just drop that and we publish our documents and plans in the
> open immediately, with the understanding that this is merely preliminary,
> that there might be inconsistencies. We might discuss and disagree with
> each other publicly in Phabricator tasks and on this mailing list and on
> the wiki pages - but in the end, this is also an opportunity to together
> with you build a common understanding and share the process of developing
> the project vision and implementation.
>
> So, in that light, we still have a small backlog of internal documents that
> we want to get out, and by the end of this week, most of the state of the
> work should be in the open, and we will move more and more of our
> discussions to the public, to eventually have them all in the open.
>
> Here is a document I have been working on for a while. It is the core model
> of how the evaluation and representation of data, functions, and function
> calls in Wikilambda may work. Again, there is no agreement on this yet. It
> differs from the AbstractText prototype implementation (there is a list
> of main differences at the end), and it does not have all the answers yet.
>
> Thanks particularly to Arthur P. Smith for many comments and rewriting of
> some of the sections, thanks to Lucas Werkmeister for his valuable input
> (and, even more important, for his work on GraalEneyj), thanks to Cyrus
> Omar for his advice and pointers, and thanks to Adam Baso, James Forrester,
> and Nick Wilson for their internal comments.
>
> https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Function_model
>
> Feedback on this would be extremely valuable, and you can see there are
> many open questions left.
>
> Stay safe,
> Denny
>
Thanks, Adam.
I'm changing the subject just to get this "out there" because what you said
about "intricate parse trees" reminded me of this paper, which I came
across last night.
https://www.sciencedirect.com/science/article/pii/S1570826817300367
It occurs to me that our early stage of Content Selection might initially
be realised or instantiated as a focused ontology (focused on some Wikidata
Item of interest). The authors of this paper say:
"We developed the Inference Inspector, a novel Protégé plugin that alerts
the author to the changes to key entailment sets that have occurred as a
consequence of a modelling action."
If we replace "modelling action" with "NLG/human interaction", maybe we
have ourselves an interesting feedback loop? Wikipedia contributors could
interact with these extracted ontologies to see, in generated natural
language, the consequences of changes. Here, the changes I have in mind are
to the linguistic knowledge rather than the encyclopedic content, but they
are not different in principle. The goal would be to evolve the Renderers
(considered more as data than function, whilst acknowledging that the
functions are also data). Continuously evolving Renderers are a deliverable
from this project, aren't they?
I know Protégé has a pretty wide user base, but it doesn't include me. Does
anyone have any practical experience they'd like to share?
Best regards,
Al.
On Sunday, 26 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Today's Topics:
>
> 1. Re: Conjugation and Declension Functions (Adam Sobieski)
> (Adam Sobieski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 26 Jul 2020 15:22:07 +0000
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> (Adam Sobieski)
> Message-ID:
> <CH2PR12MB4184F449E9BE14DF0C61D8F5C5750(a)CH2PR12MB4184.namprd12.prod.outlook.com>
>
> Content-Type: text/plain; charset="windows-1252"
>
> Al,
>
> Inverse parsing makes sense. The parse tree in your example is expressive.
> Your example also shows a means of describing nouns or noun phrases as
> being Wikidata entities which would facilitate producing hyperlinks in
> resultant articles.
>
> It could be that some of the natural language generation functions on
> Wikilambda will, instead of concatenating strings, work with trees or
> tree-based data. The classes for things like phrases, clauses, sentences,
> and so forth, could be tree-based or contain tree-based data. The matter
> could be one of producing intricate parse trees (or using an equivalent
> object model) from Wikidata data to then inverse parse into natural
> language sentences.
>
>
> Best regards,
> Adam
>
> From: Grounder UK<mailto:grounderuk@gmail.com>
> Sent: Saturday, July 25, 2020 8:12 PM
> To: abstract-wikipedia(a)lists.wikimedia.org
> Subject: [Abstract-wikipedia] Conjugation and Declension Functions (Adam
> Sobieski)
>
> You're welcome, Adam.
>
> Far be it from me to disagree with Reiter and Dale (and I don't). I find
> it odd, however, that you put "hit<…>(Context, Array)". I wonder how, in
> our Wikipedic context, we get the idea of a verb? And why do we put it
> outside our array?
>
> The reality is that Wikipedias are about things, in the main: people,
> places, objects, types and historical events. These things have properties
> and different types of involvement in events, but I guess more than half
> the time we have to invent (or select or find...) a suitable verb for the
> set of objects we find to be related (from the claims in Wikidata). If
> English didn't demand a finite verb, our Wikipedia would manage perfectly
> well without them, just as Wikidata does (although they "lurk" within
> Properties).
>
> To my way of thinking, if there is any substance of note in the sense of
> the verb, it must be in the array of Senses (never mind their lexical
> form). Of course, you might reason that we start with an array of "involved
> senses" and infer the verb, which we can then inflect. It's part of
> Lexicalization rather than Realization, in other words. Maybe that is
> generally so, but I am not convinced that we should immediately assume that
> it must be so in our Wikipedic context.
>
> For me, in any event, Realization in our context is just the final
> transformation of a near-final representation of our content, so I wouldn't
> expect it to change the sequence of ideas or change around syntactic roles.
> If you're deriving roles and sequences, then you're doing Lexicalization
> (or you're at a higher level) and you won't generally know what the verb is
> going to be at that stage (how could you?). It could be "strike", "hit" or
> "bat", for example. And the involvement of a bat in our array might be a
> good reason to choose the verb "bat" or a good reason to avoid it,
> depending on the particular context. If we choose "bat" as the verb, we
> will probably also drop it from the instrument role because that role is
> implied by the verb ("hit with a bat"). That "fact" will also keep us from
> choosing the verb "bat" if we need to refer to the bat as an object (to
> specify that it was her father's bat, for example). If the verb "bat" is
> going to subsume the instrument role, the instrument must be wholly
> indefinite ("hit with 'any old' bat" rather than "hit with one of the
> bats").
>
> Returning to Reiter and Dale, I think it makes sense to view this problem
> in reverse, as inverse parsing. As they say, you can parse the three forms
> into a single "logical form" (which some might call "abstract content") and
> there must be some loss of information in this process, since we cannot
> guarantee that we will get back to the form we started with if we invert
> the parsing. It is this lost information that we need to retain when we
> have parsed (for future reference) and to have available before we proceed
> to Realize (or "render"). Our logical "manual of style" is what we might
> use to derive the missing information that we must have available, if we
> cannot infer it from the selected Wikidata content.
>
> To begin again at the end, I might say we can parse "Bobby batted the
> ball" as something like:
>
> Sentence< {comment: here, Sentence = Subject + Verb Phrase}
>   Subject< Noun Phrase< Noun< Alias<"Bobby">, entity<Q???>>>>
>   + Verb Phrase< {comment: here, Verb Phrase = Verb + Instrument + Direct Object}
>     Verb< Sense<"hit">, Tense<simple past>, Voice<active>>
>     + Instrument< Noun Phrase< Noun< Sense<"bat">> + Determiner<indefinite>>>
>     + Direct Object< Noun Phrase< Noun< Sense<"ball">> + Determiner<definite>>>
>   >
> >.
>
> Assuming I've parsed that correctly (and it's been a few years since I had
> to), we can assert that this "logical form" shall be valid input into our
> Realization function. What do we get back? "Bobby hit the ball with a bat".
> I had to cheat, of course. The reason, I suppose, is that the subsumed noun
> phrase was prepositional, and non-prepositional noun phrases, in English,
> tend to attach themselves to their verbs ("I gave him it" versus "I gave it
> to him", for example). So when I surfaced the subsumed instrumental, "with
> a bat" (which is more interpretation than parsing), the result was
> naturally in the wrong order, coming between the verb and the existing
> non-prepositional noun phrase, "the ball". The order shouldn't matter to
> the renderer, unless it's explicit. But I did say that I wouldn't expect
> Realization to change the sequence of ideas, and then it had to because
> "Bobby hit with a bat the ball" is un-English in any unfanciful context
> (although perfectly understandable). Full disclosure, as they say!
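The reordering described above can be sketched in Python. Everything here is an assumption made for illustration: the `NounPhrase` class, the `realize` function and the two tiny lookup tables are hypothetical, and the only rule implemented is the one under discussion, namely that bare (non-prepositional) noun phrases stay next to the verb while prepositional ones follow. The input deliberately lists the surfaced instrument first, as in the logical form.

```python
from dataclasses import dataclass
from typing import Optional

ARTICLES = {"definite": "the", "indefinite": "a"}  # ignores "an"; toy table
SIMPLE_PAST = {"hit": "hit"}  # tiny hypothetical lexicon


@dataclass
class NounPhrase:
    head: str
    determiner: Optional[str] = None   # "definite", "indefinite", or None for names
    preposition: Optional[str] = None  # set only for prepositional phrases

    def render(self) -> str:
        words = []
        if self.preposition:
            words.append(self.preposition)
        if self.determiner:
            words.append(ARTICLES[self.determiner])
        words.append(self.head)
        return " ".join(words)


def realize(subject, verb_sense, complements):
    """Render subject + simple-past verb + complements, bare noun phrases first."""
    # sorted() is stable: bare phrases (False) keep their order and precede
    # prepositional phrases (True).
    ordered = sorted(complements, key=lambda np: np.preposition is not None)
    parts = [subject.render(), SIMPLE_PAST[verb_sense]]
    parts += [np.render() for np in ordered]
    return " ".join(parts)


sentence = realize(
    NounPhrase("Bobby"),
    "hit",
    [
        NounPhrase("bat", "indefinite", preposition="with"),  # surfaced instrument
        NounPhrase("ball", "definite"),                       # direct object
    ],
)
print(sentence)  # Bobby hit the ball with a bat
```

Whether this step belongs to Realization or to Lexicalization is exactly the open question; the sketch only shows that the rule itself is small.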
>
> So, if interpretive parsing surfaces subsumed elements, some
> Lexicalization may be required to establish a "logical form" that is valid
> input into Realization. Alternatively, we might consider that to occur
> within the inverse of Referring Expression Generation, since it is loosely
> analogous to surfacing nouns from pronouns and might be considered to be a
> kind of anaphora (locally, at least, it is likely to be exophoric: "batted"
> implies an unspecified bat, as discussed above, but surfacing it renders it
> endophoric, if you'll forgive the pun, and thus anaphoric at that level).
>
> By the way, Reiter claims to have written or admits to writing the main
> articles on natural language generation in the English Wikipedia. You can
> find the links in:
> https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Related_and_previous_work/Natural_language_generation
>
> Best regards,
> Al.
>
> On Saturday, 25 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org
> <mailto:abstract-wikipedia-request@lists.wikimedia.org>> wrote:
> Today's Topics:
>
> 1. Re: Conjugation and Declension Functions (Adam Sobieski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 25 Jul 2020 17:43:55 +0000
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> Message-ID:
> <CH2PR12MB41848A048AE721573F59A218C5740(a)CH2PR12MB4184.namprd12.prod.outlook.com>
>
> Content-Type: text/plain; charset="windows-1252"
>
> Thank you all for the comments and feedback thus far.
>
>
>
> I would like to indicate that we could also utilize the expressiveness of
> JavaScript/TypeScript for array literals with object literal elements. In
> this alternative approach, arguments could be placed into an array of
> objects which each attach a thematic role to an argument. “Bobby”, “ball”
> and “bat” could be objects of type Noun, and we could pass to a function an
> array literal of object literals to get at the desired concepts of: (1)
> sequence, (2) roles, (3) values. That is rather the matter: attaching
> thematic roles to elements of a sequence of arguments.
>
>
>
> To the example pseudocode:
>
>
>
> Noun Bobby;
>
> Noun ball;
>
> Noun bat;
>
>
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, [ {role: ‘agent’, value: Bobby}, {role:
> ‘patient’, value: ball}, {role: ‘instrument’, value: bat} ]) --> “Bobby hit
> the ball with the bat”
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, [ {role: ‘agent’, value: Bobby}, {role:
> ‘instrument’, value: bat}, {role: ‘patient’, value: ball} ]) --> “Bobby,
> with the bat, hit the ball”
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, [ {role: ‘instrument’, value: bat}, {role:
> ‘agent’, value: Bobby}, {role: ‘patient’, value: ball} ]) --> “With the
> bat, Bobby hit the ball”
>
>
>
> or, if the grammatical arguments (voice, tense, aspect, mood) are
> properties of the context object:
>
>
>
> hit(context, [ {role: ‘agent’, value: Bobby}, {role: ‘patient’, value:
> ball}, {role: ‘instrument’, value: bat} ]) --> “Bobby hit the ball with the
> bat”
>
> hit(context, [ {role: ‘agent’, value: Bobby}, {role: ‘instrument’, value:
> bat}, {role: ‘patient’, value: ball} ]) --> “Bobby, with the bat, hit the
> ball”
>
> hit(context, [ {role: ‘instrument’, value: bat}, {role: ‘agent’, value:
> Bobby}, {role: ‘patient’, value: ball} ]) --> “With the bat, Bobby hit the
> ball”
>
>
>
> A downside would be that the function signatures would resemble:
>
>
>
> hit<…>(Context, Array)
>
>
>
> or
>
>
>
> hit<…>(Context, Array<RoleNounPair>)
>
>
>
> An upside would be expressiveness with which to indicate desired output
> paraphrases.
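The role-tagged array can be tried out with a small Python sketch. It is an illustration only: the function body, the `VERB_FORMS` table and the comma rule are assumptions, and it covers just the three active-voice orderings listed above, with the grammatical arguments read from the context object. Arguments are realized in the order given, the verb is placed directly before the patient, and an instrument that is not final is set off with commas.

```python
VERB_FORMS = {"past": "hit", "present": "hits"}  # toy lexicon for "to hit"


def hit(context, args):
    """Realize role-tagged arguments in the order in which they were given."""
    phrases = []
    for position, arg in enumerate(args):
        role, value = arg["role"], arg["value"]
        if role == "agent":
            phrases.append(value)
        elif role == "patient":
            # The verb travels with the patient: "hit the ball".
            phrases.append(f"{VERB_FORMS[context['tense']]} the {value}")
        elif role == "instrument":
            phrase = f"with the {value}"
            if position < len(args) - 1:
                # A fronted or medial instrument is set off by commas.
                if phrases:
                    phrases[-1] += ","
                phrase += ","
            phrases.append(phrase)
        else:
            raise ValueError(f"unsupported role: {role}")
    sentence = " ".join(phrases)
    return sentence[0].upper() + sentence[1:]


context = {"voice": "active", "tense": "past"}
agent = {"role": "agent", "value": "Bobby"}
patient = {"role": "patient", "value": "ball"}
instrument = {"role": "instrument", "value": "bat"}

print(hit(context, [agent, patient, instrument]))
print(hit(context, [agent, instrument, patient]))
print(hit(context, [instrument, agent, patient]))
```

The caller still chooses the paraphrase by ordering the array, which is the trade-off noted above: the signature is less informative, but the sequence carries meaning.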
>
>
>
> These topics are discussed in Building Natural Language Generation Systems
> by Reiter and Dale [1], in chapter 6 on Surface Realization, in section 6.8
> on Bidirectional Grammars, pages 194 – 195, where the authors indicate that
> while “a parser might produce the same logical form for the following three
> sentences:
>
>
>
> 1. Mary gave John a ball.
> 2. Mary gave a ball to John.
> 3. John was given a ball by Mary.
>
>
>
> “The input to a realizer, however, should explicitly provide a means of
> choosing among these forms; the alternative is to choose at random.” (as
> noted, another alternative is to select using a manual of style).
>
>
>
> The authors also indicate that Winograd [2] provided a list of paraphrases
> for consideration:
>
>
>
> 1. Jon bought a painting for Vina.
> 2. Jon bought Vina a painting.
> 3. Vina was bought a painting by Jon.
> 4. What Jon bought Vina was a painting.
> 5. What Vina was bought by Jon was a painting.
> 6. It was a painting that Jon bought for Vina.
> 7. It was Jon that bought Vina a painting.
> 8. It was Vina that Jon bought a painting for.
>
>
>
> On the topic of “a” or “the”, definite or indefinite noun phrases, we can
> note a discussion in Reiter and Dale [1], section 5.4.2, on page 145. That
> is another important discussion topic.
>
>
>
> In the set of approaches under discussion, by making use of grammatical
> arguments (e.g. voice, tense, aspect, mood) and by placing thematic roles
> on elements of a sequence of arguments, we have an expressiveness beyond
> that of predicate calculus with which to distinguish paraphrases.
>
>
>
>
>
> Best regards,
>
> Adam
>
>
>
> [1] Reiter, Ehud, and Robert Dale. Building natural language generation
> systems. Cambridge university press, 2000.
>
> [2] Winograd, Terry. "Language as a cognitive process." (1983).
>
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> Sent: Friday, July 24, 2020 7:07 PM
> To: General public mailing list for the discussion of Abstract Wikipedia
> (aka Wikilambda) <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
>
> In addition to broaching the discovery, encoding and reuse of
> language-specific patterns, which I view as a discussion topic when
> comparing and contrasting approaches (see: the DRY principle [1] and the
> abstraction principle [2]), and indicating how conjugation and declension
> functions would facilitate the discovery, encoding and reuse of these
> patterns, I would like to share some topics which arose as I explored
> adding the thematic relation [3] of instrument to the agent-patient pair.
>
> When exploring how best to add the thematic relation of instrument to the
> agent-patient pair (e.g. adding “using the bat” to “Bobby hit the ball”), I
> observed that, for the same input grammatical arguments, there was a set of
> possible output paraphrases:
>
>
> 1. “Bobby hit the ball using the bat”
> 2. “Bobby, using the bat, hit the ball”
> 3. “Using the bat, Bobby hit the ball”
>
> I wondered: how might we be able to generate each?
>
> One possibility is indicated. There could be a type for each thematic
> relation [3]. Then, using explicit type conversions to these types, we
> could have different functions for different sequences of input arguments.
>
> That is,
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Agent)Bobby, (Patient)ball, (Instrument)bat)
> --> “Bobby hit the ball using the bat”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Agent)Bobby, (Instrument)bat, (Patient)ball)
> --> “Bobby, using the bat, hit the ball”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Instrument)bat, (Agent)Bobby, (Patient)ball)
> --> “Using the bat, Bobby hit the ball”
>
> or
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, Bobby as Agent, ball as Patient, bat as
> Instrument) --> “Bobby hit the ball using the bat”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, Bobby as Agent, bat as Instrument, ball as
> Patient) --> “Bobby, using the bat, hit the ball”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, bat as Instrument, Bobby as Agent, ball as
> Patient) --> “Using the bat, Bobby hit the ball”
>
> These functions could each wrap the use of patterns (as indicated in
> previous email) and attempt to realize output sentences utilizing the
> arguments in the same sequence in which they were provided. This would,
> however, mean that the callers of the functions would be responsible for
> “shuffling” the input arguments to express the desired paraphrase ((1),
> (2), (3)).
>
> There are, of course, other approaches to consider and other possibilities
> to consider with respect to addressing the matter of outputting the example
> paraphrases.
>
> Any thoughts on these topics?
>
>
> Best regards,
> Adam
>
> [1] https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
> [2] https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)
> [3] https://en.wikipedia.org/wiki/Thematic_relation
>
> P.S.: We could also put the grammatical arguments on the context object
> (context.voice, context.tense, context.aspect, context.mood, et cetera) and
> then make use of generic parameters for other uses.
>
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> Sent: Thursday, July 23, 2020 7:38 PM
> To: General public mailing list for the discussion of Abstract Wikipedia
> (aka Wikilambda) <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: RE: Conjugation and Declension Functions
>
>
> I had an idea about discovering, encoding and reusing patterns in
> languages that I would like to share with the group.
>
>
>
> The following is rough-draft pseudocode for a function with which to
> generate sentences for the verb “to kick”. The function has generic
> parameters for voice and tense and parameters for context, agent and
> patient.
>
>
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient)
> {
>   switch(V)
>   {
>     case voice::active:
>       switch(T)
>       {
>         // past and present share one template; only the verb form differs
>         case tense::past:
>         case tense::present:
>           return join(" ", [
>             agent,
>             conjugate_en("kick", V, T, …),
>             "the",
>             patient
>           ]);
>         case tense::future:
>           return join(" ", [
>             agent,
>             "will",
>             conjugate_en("kick", V, T, …),
>             "the",
>             patient
>           ]);
>         …
>       }
>     case voice::passive:
>       switch(T)
>       {
>         case tense::past:
>           return join(" ", [
>             "the",
>             patient,
>             "was",
>             conjugate_en("kick", V, T, …),
>             "by",
>             agent
>           ]);
>         case tense::present:
>           return join(" ", [
>             "the",
>             patient,
>             "is",
>             "being",
>             conjugate_en("kick", V, T, …),
>             "by",
>             agent
>           ]);
>         …
>       }
>   }
> }
>
>
>
> and
>
>
>
> kick<voice::active, tense::past>(context, Bobby, ball) returns “Bobby
> kicked the ball”.
>
>
>
> In English, most combinations of tense, aspect, mood and voice are
> expressed periphrastically, using constructions with auxiliary verbs. After
> implementing a number of these for similar verbs, per the above example, we
> would notice a pattern. That pattern could be expressed as something
> resembling:
>
>
>
> Clause pattern123<V, T>(Context context, Verb verb, Noun agent, Noun patient)
> {
>   switch(V)
>   {
>     case voice::active:
>       switch(T)
>       {
>         // past and present share one template; only the verb form differs
>         case tense::past:
>         case tense::present:
>           return join(" ", [
>             agent,
>             conjugate_en(verb, V, T, …),
>             "the",
>             patient
>           ]);
>         case tense::future:
>           return join(" ", [
>             agent,
>             "will",
>             conjugate_en(verb, V, T, …),
>             "the",
>             patient
>           ]);
>         …
>       }
>     case voice::passive:
>       switch(T)
>       {
>         case tense::past:
>           return join(" ", [
>             "the",
>             patient,
>             "was",
>             conjugate_en(verb, V, T, …),
>             "by",
>             agent
>           ]);
>         case tense::present:
>           return join(" ", [
>             "the",
>             patient,
>             "is",
>             "being",
>             conjugate_en(verb, V, T, …),
>             "by",
>             agent
>           ]);
>         …
>       }
>   }
> }
>
>
> We could then simply express that the verb “to kick” is an instance of
> pattern pattern123 with something resembling:
>
>
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient)
> {
>   return pattern123<V, T>(context, new Verb("kick", …), agent, patient);
> }
>
> or
>
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient)
> {
>   return pattern123<V, T>(context, getVerb("https://…#kick"), agent, patient);
> }
>
> and this pattern could be reused for a large number of verbs.
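As a rough check that the pattern idea works, here is a Python sketch of the same structure. It makes several assumptions: `conjugate_en` is a toy that only handles regular verbs which take plain "-ed" and "-s" endings, the generic parameters for voice and tense become ordinary string arguments, the context parameter is omitted, and the elided cases of the pseudocode are left unimplemented rather than guessed.

```python
from functools import partial


def conjugate_en(verb, voice, tense):
    """Toy conjugator: regular verbs with plain '-ed' / '-s' endings only."""
    if voice == "passive":
        return verb + "ed"  # past participle
    if tense == "past":
        return verb + "ed"
    if tense == "present":
        return verb + "s"   # third person singular
    return verb             # bare form, e.g. after "will"


def pattern123(verb, voice, tense, agent, patient):
    """Shared clause pattern for agent/patient verbs, as in the pseudocode."""
    form = conjugate_en(verb, voice, tense)
    if voice == "active":
        aux = ["will"] if tense == "future" else []
        words = [agent, *aux, form, "the", patient]
    elif voice == "passive":
        aux = {"past": ["was"], "present": ["is", "being"]}.get(tense)
        if aux is None:
            # The original pseudocode elides the remaining passive tenses.
            raise NotImplementedError(f"passive {tense} is not covered")
        words = ["the", patient, *aux, form, "by", agent]
    else:
        raise ValueError(f"unknown voice: {voice}")
    return " ".join(words)


# Each verb is just the pattern with the verb filled in.
kick = partial(pattern123, "kick")
paint = partial(pattern123, "paint")

print(kick("active", "past", "Bobby", "ball"))
print(kick("passive", "present", "Bobby", "ball"))
print(paint("active", "future", "Vina", "wall"))
```

Declaring a verb as an instance of a pattern then costs one line, which is the reuse being argued for.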
>
> This use of patterns should extend to scenarios where there are more
> parameters for other thematically-related nouns:
>
>
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient, Noun instrument)
> {
>   return pattern1234<V, T>(context, new Verb("kick", …), agent, patient, instrument);
> }
>
> For English, there might be more generic parameters than those for voice
> and tense; we could add those for aspect and mood (e.g. pattern1234<V, T,
> A, M>(…)).
>
> In conclusion, with generic programming and functions for conjugation and
> declension, we have expressiveness with which to discover, encode and reuse
> language-specific patterns.
>
>
> Best regards,
> Adam Sobieski
>
>
>
Nice one, Adam!
I'm liking the constraints paradigm because it follows the real-world
domain, conceived of as an inverse pyramid:
Not all claims are expressed in a Wikipedia, and of those
  Not all claims are codified in Wikidata, and of those
    Not all claims are "salient", given our topic of interest, and of those,
      Not all claims are yet expressible as natural language output, and of those,
        Not all claims are expressible in the target language, and of those,
          Natural language generation filters apply for sense, volume...
            Funnelled into rendering:
              "I sat at my desk, stared into the garden and thought '42 will do'.
              I typed it out. End of story." Douglas Adams (Q42)
Do you have a link for that OU paper [1]?
Best regards,
Al.
On Sunday, 26 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Today's Topics:
>
> 1. Re: Conjugation and Declension Functions (Adam Sobieski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 26 Jul 2020 15:38:35 +0000
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> Message-ID:
> <CH2PR12MB41847550B85B34171077395FC5750(a)CH2PR12MB4184.namprd12.prod.outlook.com>
>
> Content-Type: text/plain; charset="windows-1252"
>
> We could also view the matter as being one of specifying a set of
> constraints [1][2] with which to describe a desired sentence.
>
> That is, the syntactic sugar:
>
> (1) hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, [ {role: ‘agent’, value: Bobby}, {role:
> ‘patient’, value: ball}, {role: ‘instrument’, value: bat} ]);
>
> could be viewed as indicating a number of simultaneous constraints with
> regard to a desired sentence. With an expression tree model, we could form
> expressions for a constraint-based system while making use of objects in
> scope.
>
>
> (a) constraints.add(expr.equals(expr.symbol('verb'),
>     expr.constant(getVerb('https://example.com/verb#hit'))));
>
> The generic arguments “<voice::active, tense::past, aspect::perfective,
> mood::indicative>” could translate or map to:
>
> (b) constraints.add(expr.equals(expr.symbol('voice'),
>     expr.constant('active')));
> (c) constraints.add(expr.equals(expr.symbol('tense'),
>     expr.constant('past')));
> (d) constraints.add(expr.equals(expr.symbol('aspect'),
>     expr.constant('perfective')));
> (e) constraints.add(expr.equals(expr.symbol('mood'),
>     expr.constant('indicative')));
>
> Next, the array of role-adorned nouns, “[ {role: ‘agent’, value:
> Bobby}, {role: ‘patient’, value: ball}, {role: ‘instrument’, value: bat} ]”,
> could map to:
>
> (f) constraints.add(expr.equals(expr.property(e0, 'role'),
>     expr.constant('agent')));
> (g) constraints.add(expr.equals(expr.property(e0, 'value'),
>     expr.constant(Bobby)));
>
> (h) constraints.add(expr.equals(expr.property(e1, 'role'),
>     expr.constant('patient')));
> (i) constraints.add(expr.equals(expr.property(e1, 'value'),
>     expr.constant(ball)));
>
> (j) constraints.add(expr.equals(expr.property(e2, 'role'),
>     expr.constant('instrument')));
> (k) constraints.add(expr.equals(expr.property(e2, 'value'),
>     expr.constant(bat)));
>
> (l) constraints.add(expr.holds('occursInSentenceBefore', Bobby, ball));
> (m) constraints.add(expr.holds('occursInSentenceBefore', ball, bat));
>
> There exist mappings from the information in the generic function
> invocation (1) to rough-draft pseudocode for describing a set of
> constraints ((a) – (m)).
>
> We could also add scalars as arguments when adding constraints to a set.
> For instance, we could add scalar arguments to (l) and (m):
>
> constraints.add(expr.holds(‘occursInSentenceBefore’, Bobby, ball), 0.5);
> constraints.add(expr.holds(‘occursInSentenceBefore’, ball, bat), 0.5);
>
> Constraint-based approaches are both expressive and extensible. With a set
> of weighted constraints describing a desired output sentence, we could then
> invoke a sentence realizer:
>
> realizeSentence(constraints) --> “Bobby hit the ball with the bat”
>
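The weighted ordering constraints (l) and (m) can be sketched in Python.
This is only a minimal illustration under stated assumptions: the names
Before, score and best_order are hypothetical, only the ordering
constraints are modelled (not the full constraint set or a sentence
realizer), and the search is a brute-force pass over all orderings.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Before:
    """Soft constraint: `first` should occur before `second` in the sentence."""
    first: str
    second: str
    weight: float = 1.0


def score(order, constraints):
    """Sum the weights of the ordering constraints that `order` satisfies."""
    position = {word: i for i, word in enumerate(order)}
    return sum(c.weight for c in constraints
               if position[c.first] < position[c.second])


def best_order(items, constraints):
    """Brute-force search: try every ordering and keep the highest-scoring one."""
    return max(permutations(items), key=lambda order: score(order, constraints))


# Hypothetical stand-ins for constraints (l) and (m), each with weight 0.5.
constraints = [Before("Bobby", "ball", 0.5), Before("ball", "bat", 0.5)]
order = best_order(["bat", "ball", "Bobby"], constraints)
print(order)
```

Because the weights are soft, an ordering that violates one constraint
still gets a score and could win if other constraints were added that
outweigh it.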
>
> Best regards,
> Adam
>
> [1] Piwek, Paul, and Kees Van Deemter. Constraint-based natural language
> generation: A survey. Technical Report 2006/03, Computing Department, The
> Open University, 2006.
> [2] https://en.wikipedia.org/wiki/Constraint_programming
>
> From: Adam Sobieski<mailto:adamsobieski@hotmail.com>
> Sent: Saturday, July 25, 2020 1:43 PM
> To: General public mailing list for the discussion of Abstract Wikipedia
> (aka Wikilambda)<mailto:abstract-wikipedia@lists.wikimedia.org>
> Subject: RE: Conjugation and Declension Functions
>
>
> Thank you all for the comments and feedback thus far.
>
>
>
> I would like to indicate that we could also utilize the expressiveness of
> JavaScript/TypeScript for array literals with object literal elements. In
> this alternative approach, arguments could be placed into an array of
> objects which each attach a thematic role to an argument. “Bobby”, “ball”
> and “bat” could be objects of type Noun, and we could pass to a function an
> array literal of object literals to get at the desired concepts of: (1)
> sequence, (2) roles, (3) values. That is rather the matter: attaching
> thematic roles to elements of a sequence of arguments.
>
>
>
> To the example pseudocode:
>
>
>
> Noun Bobby;
>
> Noun ball;
>
> Noun bat;
>
>
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, [ {role: ‘agent’, value: Bobby}, {role:
> ‘patient’, value: ball}, {role: ‘instrument’, value: bat} ]) --> “Bobby hit
> the ball with the bat”
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, [ {role: ‘agent’, value: Bobby}, {role:
> ‘instrument’, value: bat}, {role: ‘patient’, value: ball} ]) --> “Bobby,
> with the bat, hit the ball”
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, [ {role: ‘instrument’, value: bat}, {role:
> ‘agent’, value: Bobby}, {role: ‘patient’, value: ball} ]) --> “With the
> bat, Bobby hit the ball”
>
>
>
> or, if the grammatical arguments (voice, tense, aspect, mood) are
> properties of the context object:
>
>
>
> hit(context, [ {role: ‘agent’, value: Bobby}, {role: ‘patient’, value:
> ball}, {role: ‘instrument’, value: bat} ]) --> “Bobby hit the ball with the
> bat”
>
> hit(context, [ {role: ‘agent’, value: Bobby}, {role: ‘instrument’, value:
> bat}, {role: ‘patient’, value: ball} ]) --> “Bobby, with the bat, hit the
> ball”
>
> hit(context, [ {role: ‘instrument’, value: bat}, {role: ‘agent’, value:
> Bobby}, {role: ‘patient’, value: ball} ]) --> “With the bat, Bobby hit the
> ball”
>
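The role-array calls above can be sketched in Python. This is an
illustrative sketch only: RoleNounPair is borrowed from the signature
suggested in this message, the context argument is left out, the verb is
fixed to past-tense active "hit", and only the three role orders shown
above are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleNounPair:
    role: str   # "agent", "patient" or "instrument"
    value: str  # surface form of the noun, e.g. "Bobby"


def hit(arguments):
    """Realize a past-tense active clause, keeping the caller's role order."""
    roles = [a.role for a in arguments]
    noun = {a.role: a.value for a in arguments}
    agent = noun["agent"]
    patient = f"the {noun['patient']}"
    instrument = f"with the {noun['instrument']}"
    if roles == ["agent", "patient", "instrument"]:
        return f"{agent} hit {patient} {instrument}"
    if roles == ["agent", "instrument", "patient"]:
        return f"{agent}, {instrument}, hit {patient}"
    if roles == ["instrument", "agent", "patient"]:
        return f"{instrument.capitalize()}, {agent} hit {patient}"
    raise ValueError(f"unsupported role order: {roles}")


bobby = RoleNounPair("agent", "Bobby")
ball = RoleNounPair("patient", "ball")
bat = RoleNounPair("instrument", "bat")
print(hit([bobby, ball, bat]))
print(hit([bobby, bat, ball]))
print(hit([bat, bobby, ball]))
```

The caller selects the paraphrase purely by reordering the list, which
is the expressiveness described above.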
>
>
> A downside would be that the function signatures would resemble:
>
>
>
> hit<…>(Context, Array)
>
>
>
> or
>
>
>
> hit<…>(Context, Array<RoleNounPair>)
>
>
>
> An upside would be expressiveness with which to indicate desired output
> paraphrases.
>
>
>
> These topics are discussed in Building Natural Language Generation Systems
> by Reiter and Dale [1], in chapter 6 on Surface Realization, in section 6.8
> on Bidirectional Grammars, pages 194 – 195, where the authors indicate that
> while “a parser might produce the same logical form for the following three
> sentences:
>
>
>
> 1. Mary gave John a ball.
> 2. Mary gave a ball to John.
> 3. John was given a ball by Mary.
>
>
>
> “The input to a realizer, however, should explicitly provide a means of
> choosing among these forms; the alternative is to choose at random.” (as
> noted, another alternative is to select using a manual of style).
>
>
>
> The authors also indicate that Winograd [2] provided a list of paraphrases
> for consideration:
>
>
>
> 1. Jon bought a painting for Vina.
> 2. Jon bought Vina a painting.
> 3. Vina was bought a painting by Jon.
> 4. What Jon bought Vina was a painting.
> 5. What Vina was bought by Jon was a painting.
> 6. It was a painting that Jon bought for Vina.
> 7. It was Jon that bought Vina a painting.
> 8. It was Vina that Jon bought a painting for.
>
>
>
> On the topic of “a” or “the”, definite or indefinite noun phrases, we can
> note a discussion in Reiter and Dale [1], section 5.4.2, on page 145. That
> is another important discussion topic.
>
>
>
> In the set of approaches under discussion, by making use of grammatical
> arguments (e.g. voice, tense, aspect, mood) and by placing thematic roles
> on elements of a sequence of arguments, we have an expressiveness beyond
> that of predicate calculus with which to distinguish paraphrases.
>
>
>
>
>
> Best regards,
>
> Adam
>
>
>
> [1] Reiter, Ehud, and Robert Dale. Building natural language generation
> systems. Cambridge university press, 2000.
>
> [2] Winograd, Terry. "Language as a cognitive process." (1983).
>
> From: Adam Sobieski<mailto:adamsobieski@hotmail.com>
> Sent: Friday, July 24, 2020 7:07 PM
> To: General public mailing list for the discussion of Abstract Wikipedia
> (aka Wikilambda)<mailto:abstract-wikipedia@lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
>
> In addition to broaching the discovery, encoding and reuse of
> language-specific patterns, which I view as a discussion topic when
> comparing and contrasting approaches (see: the DRY principle [1] and the
> abstraction principle [2]), and indicating how conjugation and declension
> functions would facilitate the discovery, encoding and reuse of these
> patterns, I would like to share some topics which arose as I explored
> adding the thematic relation [3] of instrument to the agent-patient pair.
>
> When exploring how best to add the thematic relation of instrument to the
> agent-patient pair (e.g. adding “using the bat” to “Bobby hit the ball”), I
> observed that, for the same input grammatical arguments, there was a set of
> possible output paraphrases:
>
>
> 1. “Bobby hit the ball using the bat”
> 2. “Bobby, using the bat, hit the ball”
> 3. “Using the bat, Bobby hit the ball”
>
> I wondered: how might we be able to generate each?
>
> One possibility is indicated. There could be a type for each thematic
> relation [3]. Then, using explicit type conversions to these types, we
> could have different functions for different sequences of input arguments.
>
> That is,
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Agent)Bobby, (Patient)ball, (Instrument)bat)
> --> “Bobby hit the ball using the bat”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Agent)Bobby, (Instrument)bat, (Patient)ball)
> --> “Bobby, using the bat, hit the ball”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Instrument)bat, (Agent)Bobby, (Patient)ball)
> --> “Using the bat, Bobby hit the ball”
>
> or
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, Bobby as Agent, ball as Patient, bat as
> Instrument) --> “Bobby hit the ball using the bat”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, Bobby as Agent, bat as Instrument, ball as
> Patient) --> “Bobby, using the bat, hit the ball”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, bat as Instrument, Bobby as Agent, ball as
> Patient) --> “Using the bat, Bobby hit the ball”
>
> These functions could each wrap the use of patterns (as indicated in
> previous email) and attempt to realize output sentences utilizing the
> arguments in the same sequence in which they were provided. This would,
> however, mean that the callers of the functions would be responsible for
> “shuffling” the input arguments to express the desired paraphrase ((1),
> (2), (3)).
>
> There are, of course, other approaches to consider and other possibilities
> to consider with respect to addressing the matter of outputting the example
> paraphrases.
>
> Any thoughts on these topics?
>
>
> Best regards,
> Adam
>
> [1] https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
> [2] https://en.wikipedia.org/wiki/Abstraction_principle_(compute
> r_programming)
> [3] https://en.wikipedia.org/wiki/Thematic_relation
>
> P.S.: We could also put the grammatical arguments on the context object
> (context.voice, context.tense, context.aspect, context.mood, et cetera) and
> then make use of generic parameters for other uses.
>
> From: Adam Sobieski<mailto:adamsobieski@hotmail.com>
> Sent: Thursday, July 23, 2020 7:38 PM
> To: General public mailing list for the discussion of Abstract Wikipedia
> (aka Wikilambda)<mailto:abstract-wikipedia@lists.wikimedia.org>
> Subject: RE: Conjugation and Declension Functions
>
>
> I had an idea about discovering, encoding and reusing patterns in
> languages that I would like to share with the group.
>
>
>
> The following is rough-draft pseudocode for a function with which to
> generate sentences for the verb “to kick”. The function has generic
> parameters for voice and tense and parameters for context, agent and
> patient.
>
>
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient)
> {
>   switch(V)
>   {
>     case voice::active:
>       switch(T)
>       {
>         case tense::past:      // past and present share one word order
>         case tense::present:
>           return join(" ", [agent, conjugate_en("kick", V, T, …),
>                             "the", patient]);
>         case tense::future:
>           return join(" ", [agent, "will", conjugate_en("kick", V, T, …),
>                             "the", patient]);
>         …
>       }
>     case voice::passive:
>       switch(T)
>       {
>         case tense::past:
>           return join(" ", ["the", patient, "was",
>                             conjugate_en("kick", V, T, …), "by", agent]);
>         case tense::present:
>           return join(" ", ["the", patient, "is", "being",
>                             conjugate_en("kick", V, T, …), "by", agent]);
>         …
>       }
>   }
> }
>
>
>
> and
>
>
>
> kick<voice::active, tense::past>(context, Bobby, ball) returns “Bobby
> kicked the ball”.
>
>
>
> In English, most combinations of tense, aspect, mood and voice are
> expressed periphrastically, using constructions with auxiliary verbs. After
> implementing a number of these for similar verbs, per the above example, we
> would notice a pattern. That pattern could be expressed as something
> resembling:
>
>
>
> Clause pattern123<V, T>(Context context, Verb verb, Noun agent,
>                         Noun patient)
> {
>   switch(V)
>   {
>     case voice::active:
>       switch(T)
>       {
>         case tense::past:      // past and present share one word order
>         case tense::present:
>           return join(" ", [agent, conjugate_en(verb, V, T, …),
>                             "the", patient]);
>         case tense::future:
>           return join(" ", [agent, "will", conjugate_en(verb, V, T, …),
>                             "the", patient]);
>         …
>       }
>     case voice::passive:
>       switch(T)
>       {
>         case tense::past:
>           return join(" ", ["the", patient, "was",
>                             conjugate_en(verb, V, T, …), "by", agent]);
>         case tense::present:
>           return join(" ", ["the", patient, "is", "being",
>                             conjugate_en(verb, V, T, …), "by", agent]);
>         …
>       }
>   }
> }
>
>
> We could then simply express that the verb “to kick” is an instance of
> pattern pattern123 with something resembling:
>
>
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient)
> {
>   return pattern123<V, T>(context, new Verb("kick", …), agent, patient);
> }
>
> or
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient)
> {
>   return pattern123<V, T>(context, getVerb("https://…#kick"), agent,
>                           patient);
> }
>
> and this pattern could be reused for a large number of verbs.
>
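The reuse of pattern123 across verbs can be sketched in Python. This is
a sketch under stated assumptions, not the proposal itself: the Verb
fields stand in for what conjugate_en would compute, the context
argument is dropped, the agent is assumed to be third person singular,
and only the tenses listed in the pseudocode are covered.

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class Verb:
    base: str             # "kick"
    past: str             # "kicked"
    past_participle: str  # "kicked"
    third_singular: str   # "kicks"


def pattern123(verb, voice, tense, agent, patient):
    """Agent/patient clause pattern shared by many English transitive verbs."""
    if voice == "active":
        form = {"past": verb.past,
                "present": verb.third_singular,
                "future": f"will {verb.base}"}[tense]
        return f"{agent} {form} the {patient}"
    if voice == "passive":
        auxiliary = {"past": "was", "present": "is being"}[tense]
        return f"The {patient} {auxiliary} {verb.past_participle} by {agent}"
    raise ValueError(f"unsupported voice: {voice}")


# Each verb becomes an instance of the pattern by binding its Verb entry.
kick = partial(pattern123, Verb("kick", "kicked", "kicked", "kicks"))
bite = partial(pattern123, Verb("bite", "bit", "bitten", "bites"))

print(kick("active", "past", "Bobby", "ball"))
print(bite("passive", "present", "the dog", "postman"))
```

Adding a verb then costs one line of lexical data rather than a copy of
the whole switch statement.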
> This use of patterns should extend to scenarios where there are more
> parameters for other thematically-related nouns:
>
>
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient,
>                   Noun instrument)
> {
>   return pattern1234<V, T>(context, new Verb("kick", …), agent, patient,
>                            instrument);
> }
>
> For English, there might be more generic parameters than those for voice
> and tense; we could add those for aspect and mood (e.g. pattern1234<V, T,
> A, M>(…)).
>
> In conclusion, with generic programming and functions for conjugation and
> declension, we have expressiveness with which to discover, encode and reuse
> language-specific patterns.
>
>
> Best regards,
> Adam Sobieski
>
>
>
>
Louis indicated that we should consider modeling and approaches encompassing the features of all languages [1].
Factors that affect verb conjugation are numerous and we can envision a verb conjugation function which gathers these factors together from across languages, a function resembling:
conjugate(language, stem, person, number, gender, tense, aspect, mood, voice, formality, clusivity, transitivity, valency, evidentiality)
The following example takes a Context parameter, as we haven’t yet discussed global variables, and uses a “::” syntax for enums. It shows that, with multilingual conjugation and declension functions, we could encapsulate multilingual conjugation and declension in language-specific convenience functions, perhaps resembling:
Clause kick<T>(Context context, Noun agent, Noun patient)
{
  return join(" ", [
    agent,
    context.conjugate("en-US", "kick", person::third, agent.number, agent.gender, T, aspect::perfective, mood::indicative, voice::active, context.formality, clusivity::no, transitivity::transitive, valency::transitive, context.evidentiality),
    "the",
    patient
  ]);
}
and, as expected:
kick<tense::past>(context, Bobby, ball) would produce: “Bobby kicked the ball”.
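A minimal Python sketch of this layering follows, assuming hypothetical
names (Features, CONJUGATORS, conjugate_en) and a naive rule set for
regular English verbs only. Most of the factors listed above are left
as a comment rather than implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    person: int = 3
    number: str = "singular"
    tense: str = "present"
    # Further factors (gender, aspect, mood, voice, formality, clusivity,
    # evidentiality, ...) would be added here with neutral defaults.


def conjugate_en(stem, f):
    """Naive rules for regular English verbs; irregulars need a lexicon."""
    if f.tense == "past":
        return stem + "d" if stem.endswith("e") else stem + "ed"
    if f.tense == "present" and f.person == 3 and f.number == "singular":
        return stem + "s"
    return stem


# Registry of language-specific conjugators, keyed by primary language tag.
CONJUGATORS = {"en": conjugate_en}


def conjugate(language, stem, features):
    """Dispatch to the language-specific conjugator, ignoring the region tag."""
    return CONJUGATORS[language.split("-")[0]](stem, features)


def kick(tense, agent, patient):
    """Language-specific convenience function wrapping the generic conjugator."""
    verb = conjugate("en-US", "kick", Features(tense=tense))
    return f"{agent} {verb} the {patient}"


print(kick("past", "Bobby", "ball"))
```

Bundling the factors into one object keeps the multilingual signature
stable while letting each language read only the features it needs.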
It appears that there are a multitude of ways to approach implementing NLG-related Wikilambda functions. Conjugation and declension functions could be components of some approaches.
What do you think about conjugation and declension functions?
Best regards,
Adam
[1] https://wals.info/
You're welcome, Adam.
Far be it from me to disagree with Reiter and Dale (and I don't). I find it
odd, however, that you put "hit<…>(Context, Array)". I wonder how, in our
Wikipedic context, we get the idea of a verb? And why do we put it outside
our array?
The reality is that Wikipedias are about things, in the main: people,
places, objects, types and historical events. These things have properties
and different types of involvement in events, but I guess more than half
the time we have to invent (or select or find...) a suitable verb for the
set of objects we find to be related (from the claims in Wikidata). If
English didn't demand a finite verb, our Wikipedia would manage perfectly
well without them, just as Wikidata does (although they "lurk" within
Properties).
To my way of thinking, if there is any substance of note in the sense of
the verb, it must be in the array of Senses (never mind their lexical
form). Of course, you might reason that we start with an array of "involved
senses" and infer the verb, which we can then inflect. It's part of
Lexicalization rather than Realization, in other words. Maybe that is
generally so, but I am not convinced that we should immediately assume that
it must be so in our Wikipedic context.
For me, in any event, Realization in our context is just the final
transformation of a near-final representation of our content, so I wouldn't
expect it to change the sequence of ideas or change around syntactic roles.
If you're deriving roles and sequences, then you're doing Lexicalization
(or you're at a higher level) and you won't generally know what the verb is
going to be at that stage (how could you?). It could be "strike", "hit" or
"bat", for example. And the involvement of a bat in our array might be a
good reason to choose the verb "bat" or a good reason to avoid it,
depending on the particular context. If we choose "bat" as the verb, we
will probably also drop it from the instrument role because that role is
implied by the verb ("hit with a bat"). That "fact" will also keep us from
choosing the verb "bat" if we need to refer to the bat as an object (to
specify that it was her father's bat, for example). If the verb "bat" is
going to subsume the instrument role, the instrument must be wholly
indefinite ("hit with 'any old' bat" rather than "hit with one of the
bats").
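The choice between "bat" and "hit" can be put as a small Python sketch.
The function name and its three inputs are hypothetical, and the rule is
deliberately cruder than the context-dependent judgement described
above: it only encodes that the verb "bat" may subsume the instrument
when the bat is wholly indefinite and not referred to elsewhere.

```python
def lexicalize(instrument, definite, mentioned_elsewhere):
    """Return (verb, instrument_kept) for a striking event."""
    if instrument == "bat" and not definite and not mentioned_elsewhere:
        # The verb implies the instrument, so the instrument role is dropped.
        return ("bat", None)
    # Otherwise keep a neutral verb and leave the instrument explicit.
    return ("hit", instrument)


print(lexicalize("bat", definite=False, mentioned_elsewhere=False))
print(lexicalize("bat", definite=True, mentioned_elsewhere=False))
```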
Returning to Reiter and Dale, I think it makes sense to view this problem
in reverse, as inverse parsing. As they say, you can parse the three forms
into a single "logical form" (which some might call "abstract content") and
there must be some loss of information in this process, since we cannot
guarantee that we will get back to the form we started with if we invert
the parsing. It is this lost information that we need to retain when we
have parsed (for future reference) and to have available before we proceed
to Realize (or "render"). Our logical "manual of style" is what we might
use to derive the missing information that we must have available, if we
cannot infer it from the selected Wikidata content.
To begin again at the end, I might say we can parse "Bobby batted the ball"
as something like:
Sentence< {comment: here, Sentence = Subject + Verb Phrase}
  Subject< Noun Phrase< Noun< Alias<"Bobby">, entity<Q???>>>>
  + Verb Phrase< {comment: here, Verb Phrase = Verb + Instrument + Direct
    Object}
    Verb< Sense<"hit">, Tense<simple past>, Voice<active>>
    + Instrument< Noun Phrase< Noun< Sense<"bat">> + Determiner<indefinite>>>
    + Direct Object< Noun Phrase< Noun< Sense<"ball">> + Determiner<definite>>>
  >
>.
Assuming I've parsed that correctly (and it's been a few years since I had
to), we can assert that this "logical form" shall be valid input into our
Realization function. What do we get back? "Bobby hit the ball with a bat".
I had to cheat, of course. The reason, I suppose, is that the subsumed noun
phrase was prepositional, and non-prepositional noun phrases, in English,
tend to attach themselves to their verbs ("I gave him it" versus "I gave it
to him", for example). So when I surfaced the subsumed instrumental, "with
a bat" (which is more interpretation than parsing), the result was
naturally in the wrong order, coming between the verb and the existing
non-prepositional noun phrase, "the ball". The order shouldn't matter to
the renderer, unless it's explicit. But I did say that I wouldn't expect
Realization to change the sequence of ideas, and then it had to because
"Bobby hit with a bat the ball" is un-English in any unfanciful context
(although perfectly understandable). Full disclosure, as they say!
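A small Python sketch of that Realization step, assuming hypothetical
NounPhrase and Clause types and a realizer that always puts the
non-prepositional direct object next to the verb and the prepositional
instrument after it. The verb is passed already inflected, and the
article choice is naive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NounPhrase:
    sense: str
    determiner: Optional[str] = None  # "definite", "indefinite" or None

    def render(self):
        # Naive article choice: "a" is wrong before vowel sounds.
        article = {"definite": "the ", "indefinite": "a ", None: ""}
        return article[self.determiner] + self.sense


@dataclass(frozen=True)
class Clause:
    subject: NounPhrase
    verb_past: str
    direct_object: NounPhrase
    instrument: Optional[NounPhrase] = None


def realize(clause):
    """Direct object attaches to the verb; the "with" phrase comes last."""
    parts = [clause.subject.render(), clause.verb_past,
             clause.direct_object.render()]
    if clause.instrument is not None:
        parts.append("with " + clause.instrument.render())
    return " ".join(parts)


logical_form = Clause(
    subject=NounPhrase("Bobby"),
    verb_past="hit",
    direct_object=NounPhrase("ball", "definite"),
    instrument=NounPhrase("bat", "indefinite"),
)
print(realize(logical_form))
```

Because the Clause fields are named rather than ordered, the input
carries no sequence at all here; an explicit order would have to be an
extra field.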
So, if interpretive parsing surfaces subsumed elements, some Lexicalization
may be required to establish a "logical form" that is valid input into
Realization. Alternatively, we might consider that to occur within the
inverse of Referring Expression Generation, since it is loosely analogous
to surfacing nouns from pronouns and might be considered to be a kind of
anaphora (locally, at least, it is likely to be exophoric: "batted" implies
an unspecified bat, as discussed above, but surfacing it renders it
endophoric, if you'll forgive the pun, and thus anaphoric at that level).
By the way, Reiter claims to have written or admits to writing the main
articles on natural language generation in the English Wikipedia. You can
find the links in:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Related_and_previous_wor…
.
Best regards,
Al.
On Saturday, 25 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Send Abstract-Wikipedia mailing list submissions to
> abstract-wikipedia(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
> or, via email, send a message with subject or body 'help' to
> abstract-wikipedia-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> abstract-wikipedia-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Abstract-Wikipedia digest..."
>
>
> Today's Topics:
>
> 1. Re: Conjugation and Declension Functions (Adam Sobieski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 25 Jul 2020 17:43:55 +0000
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wiki
> media.org>
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> Message-ID:
> <CH2PR12MB41848A048AE721573F59A218C5740(a)CH2PR12MB4184.namprd
> 12.prod.outlook.com>
>
> Content-Type: text/plain; charset="windows-1252"
>
> Thank you all for the comments and feedback thus far.
>
>
>
> I would like to indicate that we could also utilize the expressiveness of
> JavaScript/TypeScript for array literals with object literal elements. In
> this alternative approach, arguments could be placed into an array of
> objects which each attach a thematic role to an argument. “Bobby”, “ball”
> and “bat” could be objects of type Noun, and we could pass to a function an
> array literal of object literals to get at the desired concepts of: (1)
> sequence, (2) roles, (3) values. That is rather the matter: attaching
> thematic roles to elements of a sequence of arguments.
>
>
>
> To the example pseudocode:
>
>
>
> Noun Bobby;
>
> Noun ball;
>
> Noun bat;
>
>
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, [ {role: ‘agent’, value: Bobby}, {role:
> ‘patient’, value: ball}, {role: ‘instrument’, value: bat} ]) --> “Bobby hit
> the ball with the bat”
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, [ {role: ‘agent’, value: Bobby}, {role:
> ‘instrument’, value: bat}, {role: ‘patient’, value: ball} ]) --> “Bobby,
> with the bat, hit the ball”
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, [ {role: ‘instrument’, value: bat}, {role:
> ‘agent’, value: Bobby}, {role: ‘patient’, value: ball} ]) --> “With the
> bat, Bobby hit the ball”
>
>
>
> or, if the grammatical arguments (voice, tense, aspect, mood) are
> properties of the context object:
>
>
>
> hit(context, [ {role: ‘agent’, value: Bobby}, {role: ‘patient’, value:
> ball}, {role: ‘instrument’, value: bat} ]) --> “Bobby hit the ball with the
> bat”
>
> hit(context, [ {role: ‘agent’, value: Bobby}, {role: ‘instrument’, value:
> bat}, {role: ‘patient’, value: ball} ]) --> “Bobby, with the bat, hit the
> ball”
>
> hit(context, [ {role: ‘instrument’, value: bat}, {role: ‘agent’, value:
> Bobby}, {role: ‘patient’, value: ball} ]) --> “With the bat, Bobby hit the
> ball”
>
>
>
> A downside would be that the function signatures would resemble:
>
>
>
> hit<…>(Context, Array)
>
>
>
> or
>
>
>
> hit<…>(Context, Array<RoleNounPair>)
>
>
>
> An upside would be expressiveness with which to indicate desired output
> paraphrases.
>
>
>
> These topics are discussed in Building Natural Language Generation Systems
> by Reiter and Dale [1], in chapter 6 on Surface Realization, in section 6.8
> on Bidirectional Grammars, pages 194 – 195, where the authors indicate that
> while “a parser might produce the same logical form for the following three
> sentences:
>
>
>
> 1. Mary gave John a ball.
> 2. Mary gave a ball to John.
> 3. John was given a ball by Mary.
>
>
>
> “The input to a realizer, however, should explicitly provide a means of
> choosing among these forms; the alternative is to choose at random.” (as
> noted, another alternative is to select using a manual of style).
>
>
>
> The authors also indicate that Winograd [2] provided a list of paraphrases
> for consideration:
>
>
>
> 1. Jon bought a painting for Vina.
> 2. Jon bought Vina a painting.
> 3. Vina was bought a painting by Jon.
> 4. What Jon bought Vina was a painting.
> 5. What Vina was bought by Jon was a painting.
> 6. It was a painting that Jon bought for Vina.
> 7. It was Jon that bought Vina a painting.
> 8. It was Vina that Jon bought a painting for.
>
>
>
> On the topic of “a” or “the”, definite or indefinite noun phrases, we can
> note a discussion in Reiter and Dale [1], section 5.4.2, on page 145. That
> is another important discussion topic.
>
>
>
> In the set of approaches under discussion, by making use of grammatical
> arguments (e.g. voice, tense, aspect, mood) and by placing thematic roles
> on elements of a sequence of arguments, we have an expressiveness beyond
> that of predicate calculus with which to distinguish paraphrases.
>
>
>
>
>
> Best regards,
>
> Adam
>
>
>
> [1] Reiter, Ehud, and Robert Dale. Building natural language generation
> systems. Cambridge university press, 2000.
>
> [2] Winograd, Terry. "Language as a cognitive process." (1983).
>
> From: Adam Sobieski<mailto:adamsobieski@hotmail.com>
> Sent: Friday, July 24, 2020 7:07 PM
> To: General public mailing list for the discussion of Abstract Wikipedia
> (aka Wikilambda)<mailto:abstract-wikipedia@lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
>
> In addition to broaching the discovery, encoding and reuse of
> language-specific patterns, which I view as a discussion topic when
> comparing and contrasting approaches (see: the DRY principle [1] and the
> abstraction principle [2]), and indicating how conjugation and declension
> functions would facilitate the discovery, encoding and reuse of these
> patterns, I would like to share some topics which arose as I explored
> adding the thematic relation [3] of instrument to the agent-patient pair.
>
> When exploring how best to add the thematic relation of instrument to the
> agent-patient pair (e.g. adding “using the bat” to “Bobby hit the ball”), I
> observed that, for the same input grammatical arguments, there was a set of
> possible output paraphrases:
>
>
> 1. “Bobby hit the ball using the bat”
> 2. “Bobby, using the bat, hit the ball”
> 3. “Using the bat, Bobby hit the ball”
>
> I wondered: how might we be able to generate each?
>
> One possibility is indicated. There could be a type for each thematic
> relation [3]. Then, using explicit type conversions to these types, we
> could have different functions for different sequences of input arguments.
>
> That is,
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Agent)Bobby, (Patient)ball, (Instrument)bat)
> --> “Bobby hit the ball using the bat”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Agent)Bobby, (Instrument)bat, (Patient)ball)
> --> “Bobby, using the bat, hit the ball”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Instrument)bat, (Agent)Bobby, (Patient)ball)
> --> “Using the bat, Bobby hit the ball”
>
> or
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, Bobby as Agent, ball as Patient, bat as
> Instrument) --> “Bobby hit the ball using the bat”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, Bobby as Agent, bat as Instrument, ball as
> Patient) --> “Bobby, using the bat, hit the ball”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, bat as Instrument, Bobby as Agent, ball as
> Patient) --> “Using the bat, Bobby hit the ball”
>
> These functions could each wrap the use of patterns (as indicated in
> previous email) and attempt to realize output sentences utilizing the
> arguments in the same sequence in which they were provided. This would,
> however, mean that the callers of the functions would be responsible for
> “shuffling” the input arguments to express the desired paraphrase ((1),
> (2), (3)).
>
> There are, of course, other approaches to consider and other possibilities
> to consider with respect to addressing the matter of outputting the example
> paraphrases.
>
> Any thoughts on these topics?
>
>
> Best regards,
> Adam
>
> [1] https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
> [2] https://en.wikipedia.org/wiki/Abstraction_principle_(compute
> r_programming)
> [3] https://en.wikipedia.org/wiki/Thematic_relation
>
> P.S.: We could also put the grammatical arguments on the context object
> (context.voice, context.tense, context.aspect, context.mood, et cetera) and
> then make use of generic parameters for other uses.
>
> From: Adam Sobieski<mailto:adamsobieski@hotmail.com>
> Sent: Thursday, July 23, 2020 7:38 PM
> To: General public mailing list for the discussion of Abstract Wikipedia
> (aka Wikilambda)<mailto:abstract-wikipedia@lists.wikimedia.org>
> Subject: RE: Conjugation and Declension Functions
>
>
> I had an idea about discovering, encoding and reusing patterns in
> languages that I would like to share with the group.
>
>
>
> The following is rough-draft pseudocode for a function with which to
> generate sentences for the verb “to kick”. The function has generic
> parameters for voice and tense and parameters for context, agent and
> patient.
>
>
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient)
>
> {
>
> switch(V)
>
> {
>
> case voice::active:
>
> switch(T)
>
> {
>
> case tense::past:
>
> case tense::present:
>
> return join(" ", [
>
> agent,
>
> conjugate_en(“kick”, V, T, …),
>
> “the”,
>
> patient
>
> ]);
>
> case tense::future:
>
> return join(" ", [
>
> agent,
>
> “will”,
>
> conjugate_en(“kick”, V, T, …),
>
> “the”,
>
> patient
>
> ]);
>
> …
>
> }
>
> case voice::passive:
>
> switch(T)
>
> {
>
> case tense::past:
>
> return join(" ", [
>
> “the”,
>
> patient,
>
> “was”,
>
> conjugate_en(“kick”, V, T, …),
>
> “by”
>
> agent
>
> ]);
>
> case tense::present:
>
> return join(" ", [
>
> “the”,
>
> patient,
>
> “is”,
>
> “being”,
>
> conjugate_en(“kick”, V, T, …),
>
> “by”
>
> agent
>
> ]);
>
> …
>
> }
>
> }
>
> }
>
>
>
> and
>
>
>
> kick<voice::active, tense::past>(context, Bobby, ball) returns “Bobby
> kicked the ball”.
>
>
>
> In English, most combinations of tense, aspect, mood and voice are
> expressed periphrastically, using constructions with auxiliary verbs. After
> implementing a number of these for similar verbs, per the above example, we
> would notice a pattern. That pattern could be expressed as something
> resembling:
>
>
>
> Clause pattern123<V, T>(Context context, Verb verb, Noun agent, Noun patient)
> {
>     switch(V)
>     {
>     case voice::active:
>         switch(T)
>         {
>         // past and present share the same word order
>         case tense::past:
>         case tense::present:
>             return join(" ", [
>                 agent,
>                 conjugate_en(verb, V, T, …),
>                 "the",
>                 patient
>             ]);
>         case tense::future:
>             return join(" ", [
>                 agent,
>                 "will",
>                 conjugate_en(verb, V, T, …),
>                 "the",
>                 patient
>             ]);
>         …
>         }
>     case voice::passive:
>         switch(T)
>         {
>         case tense::past:
>             return join(" ", [
>                 "the",
>                 patient,
>                 "was",
>                 conjugate_en(verb, V, T, …),
>                 "by",
>                 agent
>             ]);
>         case tense::present:
>             return join(" ", [
>                 "the",
>                 patient,
>                 "is",
>                 "being",
>                 conjugate_en(verb, V, T, …),
>                 "by",
>                 agent
>             ]);
>         …
>         }
>     }
> }
>
>
> We could then simply express that the verb “to kick” is an instance of
> pattern pattern123 with something resembling:
>
>
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient)
> {
>     return pattern123<V, T>(context, new Verb("kick", …), agent, patient);
> }
>
> or
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient)
> {
>     return pattern123<V, T>(context, getVerb("https://…#kick"), agent, patient);
> }
>
> and this pattern could be reused for a large number of verbs.
>
> This use of patterns should extend to scenarios where there are more
> parameters for other thematically-related nouns:
>
>
>
> Clause kick<V, T>(Context context, Noun agent, Noun patient,
>                   Noun instrument)
> {
>     return pattern1234<V, T>(context, new Verb("kick", …), agent, patient,
>                              instrument);
> }
>
> For English, there might be more generic parameters than those for voice
> and tense; we could add those for aspect and mood (e.g. pattern1234<V, T,
> A, M>(…)).
>
> In conclusion, with generic programming and functions for conjugation and
> declension, we have expressiveness with which to discover, encode and reuse
> language-specific patterns.
>
>
> Best regards,
> Adam Sobieski
>
>
>
I would like to say first that I am new to this idea and project and not as
familiar with coding or linguistics as some of you, so I apologize in
advance if I am severely misunderstanding something in this thread.
While any of these phrases would sound decent in a sentence, I believe that
the context of the rest of the sentence would be needed to determine the
correct thematic relation. Even if the output phrases all contain the same
words, natural speech and language would not necessarily pick among these
phrases at random, because the three phrases have different "set-ups", as I
would call them. I'll give examples to make this clearer.
Let us assume we have the agent-patient-instrument triple of Bobby, the ball,
and the bat, and finish this clause with the directional phrase "over the
stadium". Our sentences would then most likely be as follows:
[1] Bobby hit the ball using the bat over the stadium.
[2] Bobby, using the bat, hit the ball over the stadium.
[3] Using the bat, Bobby hit the ball over the stadium.
Although all of these are technically correct sentences in English, in my
opinion sentence [1] would definitely not look as natural in text to a native
English speaker as [2] or [3] (and, again in my opinion, [3] is best).
Let's do another example with the phrase "while he hummed a song."
[1] Bobby hit the ball using the bat while he hummed a song.
[2] Bobby, using the bat, hit the ball while he hummed a song.
[3] Using the bat, Bobby hit the ball while he hummed a song.
Now, with this new phrase, sentences [1] and [2] have become the most
natural sentences (again, in my opinion), even though with the phrase "over
the stadium" sentence [1] was the least natural and [3] read best. As such,
I believe that the most accurate method of finding the most natural phrasing
would need to find a pattern in the context and then apply the matching
construction.
One last thought concerns spoken rather than written English, but it may
still help with our problem. To my thinking, the stressed word in the
previous sentence may also affect the construction of the next. Let's treat
our agent-patient-instrument clause as a complete sentence, and take the
sentence before it in spoken English to be "Bobby had a ball and a bat",
which introduces Bobby, the ball, and the bat beforehand. The stress could
affect the construction like so, where * * indicates the stressed word in
the sentence.
[1] *Bobby* had a ball and a bat. Bobby hit the ball using the bat.
[2] Bobby had a *ball* and a bat. Bobby, using the bat, hit the ball.
[3] Bobby had a ball and a *bat.* Using the bat, Bobby hit the ball.
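The three stress examples above can be read as a small selection rule: the participant stressed in the previous sentence picks where the instrument phrase goes. Below is a minimal Python sketch of that reading. The function name `realize_hit`, the role labels, and the stress-to-position table are assumptions made for illustration and are taken only from examples [1] to [3]; they are not an agreed design, and the verb is fixed to "hit" in the past tense.

```python
from typing import Optional

# Hypothetical table read directly off examples [1]-[3]: the role that was
# stressed in the previous sentence decides where "using the ..." is placed.
POSITION_BY_STRESS = {
    "agent": "final",         # [1] Bobby hit the ball using the bat.
    "patient": "medial",      # [2] Bobby, using the bat, hit the ball.
    "instrument": "initial",  # [3] Using the bat, Bobby hit the ball.
}


def realize_hit(agent: str, patient: str, instrument: str,
                stressed: Optional[str] = None) -> str:
    """Build one of the three paraphrases; default to the sentence-final form."""
    position = POSITION_BY_STRESS.get(stressed, "final")
    phrase = f"using the {instrument}"
    if position == "initial":
        # Capitalize only the first letter so the instrument noun is untouched.
        return f"{phrase[0].upper()}{phrase[1:]}, {agent} hit the {patient}."
    if position == "medial":
        return f"{agent}, {phrase}, hit the {patient}."
    return f"{agent} hit the {patient} {phrase}."


print(realize_hit("Bobby", "ball", "bat", stressed="instrument"))
```

A real renderer would need the stress or focus information from somewhere; here it is simply passed in by the caller.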
Again, I am not very experienced in either linguistics or programming, so I
may have missed the point of something, but I hope my thoughts help with
this.
Sincerely,
SWork
On Fri, Jul 24, 2020, 6:07 PM <
abstract-wikipedia-request(a)lists.wikimedia.org> wrote:
> Send Abstract-Wikipedia mailing list submissions to
> abstract-wikipedia(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
> or, via email, send a message with subject or body 'help' to
> abstract-wikipedia-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> abstract-wikipedia-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Abstract-Wikipedia digest..."
>
>
> Today's Topics:
>
> 1. Re: Conjugation and Declension Functions (Adam Sobieski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 24 Jul 2020 23:07:03 +0000
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <
> abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> Message-ID:
> <
> CH2PR12MB41841C5E1C4E820B9766AB1BC5770(a)CH2PR12MB4184.namprd12.prod.outlook.com
> >
>
> Content-Type: text/plain; charset="windows-1252"
>
> In addition to broaching the discovery, encoding and reuse of
> language-specific patterns, which I view as a discussion topic when
> comparing and contrasting approaches (see: the DRY principle [1] and the
> abstraction principle [2]), and indicating how conjugation and declension
> functions would facilitate the discovery, encoding and reuse of these
> patterns, I would like to share some topics which arose as I explored
> adding the thematic relation [3] of instrument to the agent-patient pair.
>
> When exploring how best to add the thematic relation of instrument to the
> agent-patient pair (e.g. adding “using the bat” to “Bobby hit the ball”), I
> observed that, for the same input grammatical arguments, there was a set of
> possible output paraphrases:
>
>
> 1. “Bobby hit the ball using the bat”
> 2. “Bobby, using the bat, hit the ball”
> 3. “Using the bat, Bobby hit the ball”
>
> I wondered: how might we be able to generate each?
>
> One possibility is indicated. There could be a type for each thematic
> relation [3]. Then, using explicit type conversions to these types, we
> could have different functions for different sequences of input arguments.
>
> That is,
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Agent)Bobby, (Patient)ball, (Instrument)bat)
> --> “Bobby hit the ball using the bat”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Agent)Bobby, (Instrument)bat, (Patient)ball)
> --> “Bobby, using the bat, hit the ball”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, (Instrument)bat, (Agent)Bobby, (Patient)ball)
> --> “Using the bat, Bobby hit the ball”
>
> or
>
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, Bobby as Agent, ball as Patient, bat as
> Instrument) --> “Bobby hit the ball using the bat”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, Bobby as Agent, bat as Instrument, ball as
> Patient) --> “Bobby, using the bat, hit the ball”
> hit<voice::active, tense::past, aspect::perfective,
> mood::indicative>(context, bat as Instrument, Bobby as Agent, ball as
> Patient) --> “Using the bat, Bobby hit the ball”
>
> These functions could each wrap the use of patterns (as indicated in
> previous email) and attempt to realize output sentences utilizing the
> arguments in the same sequence in which they were provided. This would,
> however, mean that the callers of the functions would be responsible for
> “shuffling” the input arguments to express the desired paraphrase ((1),
> (2), (3)).
>
> There are, of course, other approaches to consider and other possibilities
> to consider with respect to addressing the matter of outputting the example
> paraphrases.
>
> Any thoughts on these topics?
>
>
> Best regards,
> Adam
>
> [1] https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
> [2]
> https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)
> [3] https://en.wikipedia.org/wiki/Thematic_relation
>
Hi folks!
While reading the thread:
Use case: generation of short description
https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000012.h…
I wondered about the reverse use case: generating long names when short
names or abbreviations are found in wiki text (sometimes very hard to
disambiguate and sometimes easy, based on signals). Imagine a renderer that
is smart enough to expand abbreviated *demonyms* into their full long form.
Example: *Brit* is the short name for *British* in American English and, in
fact, in a few other languages.
Would the use case potentially be fulfilled by a smart renderer that
could use the statement short name P1813
<https://www.wikidata.org/wiki/Property:P1813> on a form of a
Lexeme itself, as I did on https://www.wikidata.org/wiki/Lexeme:L156273 ?
Or on the Item (location) itself, as I did for *Brit* on United Kingdom Q145
<https://www.wikidata.org/wiki/Q145> ?
I ask because I am thinking about how to help future developers who code a
smart renderer, so that it gets the signals it needs for this use case.
Once I know the best-practice place to add those signals, I plan to
populate the short-name demonyms. (My hunch is that the Lexeme space is
probably the best place, rather than the Item space using P1549:P1813 as I
did on United Kingdom Q145 <https://www.wikidata.org/wiki/Q145>, or both
Lexemes and Items for good measure, but that is not DRY?)
In general, I am looking for best practices for dealing with abbreviations
and expanding them contextually. (Yeah, rocket science in some respects, but
signals might help.)
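To make the expansion use case concrete, here is a minimal Python sketch of the renderer-side step only. Everything in it is an assumption for illustration: the function name `expand_short_names` is hypothetical, and the lookup table is hard-coded with the one pair mentioned above (*Brit* to *British*), standing in for whatever would eventually be read from short name (P1813) statements. No Wikidata API is called, and no contextual disambiguation is attempted.

```python
import re
from typing import Mapping

# Hypothetical local table; a real renderer would presumably populate it from
# short name (P1813) statements on Lexemes or Items instead of hard-coding it.
SHORT_TO_LONG = {"Brit": "British"}


def expand_short_names(text: str,
                       table: Mapping[str, str] = SHORT_TO_LONG) -> str:
    """Replace whole-word short names with their long forms."""
    if not table:
        return text
    # Longest keys first, so a longer short name wins over one that is its prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: table[match.group(1)], text)


print(expand_short_names("the Brit press"))
```

The word-boundary match leaves words such as "Britain" or "British" untouched. Deciding when an expansion is actually appropriate in context is the hard part and is left open here.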
Thad
https://www.linkedin.com/in/thadguidry/
Welcome, Jan!
I start by agreeing with you, but there has to be a but.
I certainly agree that there would be a standard way to express a
particular Wikidata statement. For new statements, there might be a default
implied by the Property being added to the Wikidata Item or, if necessary,
a choice among a small number of alternatives. But as soon as the Property
implies or
connects to a different Item, there is a question about which might come
first, even in a neutral context. Is it "the UK's Head of State is Queen
Elizabeth II" or "Queen Elizabeth II is the UK's Head of State"? (Wikidata
has Q9682 with position held (Property P39) connecting to (among others)
Q9134365 (monarch of the United Kingdom) with subclass (P279) head of state
(Q48352)... but I simplified.) On a page about her, [she] is Queen of the
United Kingdom and other Commonwealth Realms but on a page about Canada,
[the country's] queen (or monarch) is Queen Elizabeth II.
Anyway, in general, yes, "Bobby hit [a or the] ball with [a or the] bat".
But if you're talking about the bat, it's likely to be "The bat Bobby hit a
ball with" or "The bat that Bobby used to hit a ball" or "The bat with
which Bobby hit a ball", according to the agreed style. And if you're
talking about the notable ball and the not-so-notable Bobby: "The ball that
was hit with the bat by Bobby".
And (agreeing with "no") no, we don't need to consider all the permutations
of {Bobby, ball, bat, hum, hit, over the stadium, ...}, but we do need to
know how we avoid considering so many and how we handle those which remain.
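One way to avoid enumerating permutations is to let the topic of the page select a single construction. The following Python sketch illustrates this with the three cases described above. The function name `describe_hit`, the role labels, and the template strings are illustrative assumptions; the definite article is used throughout, although the choice between "a" and "the" is left open above.

```python
# Hypothetical templates, one per topic, following the examples above.
TEMPLATES = {
    "agent": "{agent} hit the {patient} with the {instrument}",
    "instrument": "The {instrument} with which {agent} hit the {patient}",
    "patient": "The {patient} that was hit with the {instrument} by {agent}",
}


def describe_hit(agent: str, patient: str, instrument: str,
                 topic: str = "agent") -> str:
    """Pick the construction that puts the page's topic first."""
    if topic not in TEMPLATES:
        raise ValueError(f"unknown topic: {topic!r}")
    return TEMPLATES[topic].format(
        agent=agent, patient=patient, instrument=instrument
    )


print(describe_hit("Bobby", "ball", "bat", topic="instrument"))
```

With this shape, the number of constructions grows with the number of roles that can be a topic, not with the number of orderings of all the words.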
Best regards,
Al.
On Saturday, 25 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Today's Topics:
>
> 1. Re: Conjugation and Declension Functions (Jan Ainali)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 25 Jul 2020 09:30:59 +0200
> From: Jan Ainali <ainali.jan(a)gmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> Message-ID:
> <CAKwu9WF8B5f1dwzdisASnHrnEarJwaxL0iHu73CFUoX3j0cabw(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Sat, 25 July 2020 at 01:07, Adam Sobieski
> <adamsobieski(a)hotmail.com> wrote:
>
> > When exploring how best to add the thematic relation of instrument to the
> > agent-patient pair (e.g. adding “using the bat” to “Bobby hit the ball”), I
> > observed that, for the same input grammatical arguments, there was a set of
> > possible output paraphrases:
> >
> >
> >
> > 1. “Bobby hit the ball using the bat”
> > 2. “Bobby, using the bat, hit the ball”
> > 3. “Using the bat, Bobby hit the ball”
> >
> >
> >
> > I wondered: how might we be able to generate each?
> >
> >
> >
>
> Since we are generating text for an encyclopedia, do we need to be able to
> choose among them? I think we could have a manual of style that says
> that form 1 is the one we use. That simplifies the language for the reader,
> and it also simplifies our task, because we would not be trying to
> reproduce every nuance of spoken language.
>
> /Jan
>
Hello all,
The first topic of discussion I want to start is also the most awkward to
start with - it is akin to the question “Can you all hear me?” at the
beginning of a talk. The ones who already hear will obviously be happy, and
the ones who don’t won’t have a chance to chime in.
What communication channels should we start with?
My thought would be that there is a need for three different channels:
* First and foremost, a channel to document decisions, essays, documents,
plans, etc.
* Second, a channel for asynchronous discussion, announcements, etc.
* Third, a channel for synchronous discussion, for quick discussions,
office hours, socialization, and later, when testing and deployment starts,
for quick feedback
Let’s go through these.
First type of channel: persistent documentation and planning. Currently it
is on Meta, but there are plenty of others that could be considered
(criterion: it must enable translation):
* Stay on Meta
* MediaWiki.org
* WikiSpore
* Wikidata
* Start our own wiki right away
* A mix of the above
* Others?
Second type of channel: asynchronous, ephemeral discussions. Currently it
is this mailing list, and a growing number of metawiki talkpages (thanks
Chris Cooley for starting a discussion about this here!). Here I don’t see
that many possible channels:
* A different existing mailing list
* A second new mailing list to focus on technical aspects
* Zulip
* dropping this channel in favor of the first and third type of channel
* Others?
Third type of channel: synchronous discussion. Currently, nothing is
‘blessed’ as such a channel, but there are already a number of lively
unofficial places that have been opened and discussed here (thanks ZI
Jony!). As far as I can tell, there’s:
* #wikipedia-abstract on IRC
* #wikilambda on IRC
* Zulip
* AbstractWikipedia channel on Telegram is already quite lively
* Dropping this channel in favor of the first and second channel
* Others?
I have listed these options with the goal of being reasonably complete, not
because I would be happy with all of them. Personally, I’d be quite unhappy
with, e.g., blessing a Facebook group as the main official channel, so I have
not included that. I am equally uneasy with a Telegram chat, but it’s also
about going where the communities are - for example, my preference for the
chat would be IRC, but I am a dinosaur.
Here are my thoughts:
* I would like to keep the number of channels small, so our energy doesn’t
fizzle out. That’s particularly important at the beginning.
* Any official channel may require additional Foundation approvals, but I
don’t want to use resources evaluating all the possible channels beforehand
if most of them are not of interest for us anyway.
* We should have permanent records of all official channels.
* All official channels should be under appropriate Terms of Service and
a Code of Conduct.
* Ideally, the community would coalesce and grow on official channels.
* Ideally, the channels we choose are aligned with our values.
No decision we make now is meant to be permanent, and as the project
develops and the community grows, we expect to see this shift and change.
I am afraid that the main language for discussion in the beginning will be
English. We will rely on community support to cover contributions in other
languages as best as possible, but I am afraid it will not be possible to
translate every discussion contribution and every essay. Sorry.
Besides these discussion channels, we will also have the usual stack of
technologies supporting a Wikimedia development project: Phabricator for
tasks, Git for version control, etc. For that, we’ll mostly follow
Foundation best practices.
I am looking forward to hearing from you,
Denny