Hi folks!
While reading the thread:
Use case: generation of short description
https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000012.h…
I wondered about the reverse use case... generating long names if short
names or abbreviations are found in wiki text (sometimes very hard to
disambiguate and sometimes easy based on signals). Imagine a renderer that
was semi-smart enough to expand abbreviations for *demonyms* into their
full long form.
Example: *Brit* is the short name for *British* in American English and a
few other languages actually.
Would the use case potentially be fulfilled with a smart renderer that
could use the statement short name P1813
<https://www.wikidata.org/wiki/Property:P1813> on a form applied on a
Lexeme itself as I did on https://www.wikidata.org/wiki/Lexeme:L156273 ?
or on the Item (location) itself as I did for *Brit* on United Kingdom Q145
<https://www.wikidata.org/wiki/Q145> ?
I ask because I am thinking of how to help developers in the future where
they code a smart renderer that best gets the signals it needs for this use
case. Once I know where to add those signals in a best practice, I plan to
populate the short name demonyms. (my hunch is that the Lexeme space is
probably the best, rather than the Item space and using P1549:P1813 as I
did on United Kingdom Q145 <https://www.wikidata.org/wiki/Q145> or both
Lexemes and Items for good measure, but it's not DRY ?)
In general, looking for best practices for dealing with abbreviations and
expanding them contextually. (yeah, rocket science in some respects, but
signals might help)
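
A minimal sketch of the renderer idea, assuming a small in-memory table stands in for short name (P1813) statements. The table contents and the names `SHORT_NAMES` and `expand_short_names` are hypothetical, and this does no contextual disambiguation at all; it only shows where such signals would be consumed.

```python
import re

# Hypothetical stand-in for short name (P1813) statements, keyed by
# language code. A real renderer would read these from Wikidata.
SHORT_NAMES = {
    "en": {"Brit": "British"},
}

def expand_short_names(text: str, lang: str) -> str:
    """Replace whole-word short names with their long forms."""
    table = SHORT_NAMES.get(lang, {})
    if not table:
        return text
    # Match complete words only, so "Britain" is left untouched.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)
```

Whether the table is filled from Lexemes or from Items does not change this lookup, which is one reason to settle on a single place for the data.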
Thad
https://www.linkedin.com/in/thadguidry/
Welcome, Jan!
I start by agreeing with you, but there has to be a but.
I certainly agree that there should be a standard way to express a
particular Wikidata statement. For new statements, there might be a default
implied by the Property being added to the Wikidata Item or, if necessary,
a choice between a small number. But as soon as the Property implies or
connects to a different Item, there is a question about which might come
first, even in a neutral context. Is it "the UK's Head of State is Queen
Elizabeth II" or "Queen Elizabeth II is the UK's Head of State"? (Wikidata
has Q9682 with position held (Property P39) connecting to (among others)
Q9134365 (monarch of the United Kingdom) with subclass (P279) head of state
(Q48352)... but I simplified.) On a page about her, [she] is Queen of the
United Kingdom and other Commonwealth Realms but on a page about Canada,
[the country's] queen (or monarch) is Queen Elizabeth II.
Anyway, in general, yes, "Bobby hit [a or the] ball with [a or the] bat".
But if you're talking about the bat, it's likely to be "The bat Bobby hit a
ball with" or "The bat that Bobby used to hit a ball" or "The bat with
which Bobby hit a ball", according to the agreed style. And if you're
talking about the notable ball and the not-so-notable Bobby: "The ball that
was hit with the bat by Bobby".
And (agreeing with "no") no, we don't need to consider all the permutations
of {Bobby, ball, bat, hum, hit, over the stadium, ...}, but we do need to
know how we avoid considering so many and how we handle those which remain.
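
One way to avoid enumerating permutations is to key a small set of agreed phrasings on the entity the page is about. The sketch below assumes exactly that; the `TEMPLATES` table and `realize_hit` function are hypothetical, and the phrasings are the ones given above.

```python
# Hypothetical phrasings for one event, keyed by the role of the page topic.
TEMPLATES = {
    "agent": "{agent} hit a ball with the bat",
    "instrument": "The bat with which {agent} hit a ball",
    "patient": "The ball that was hit with the bat by {agent}",
}

def realize_hit(agent: str, focus: str) -> str:
    """Pick the phrasing whose leading noun phrase matches the page topic."""
    if focus not in TEMPLATES:
        raise ValueError(f"unknown focus: {focus}")
    return TEMPLATES[focus].format(agent=agent)
```

A style guide would then fix one phrasing per focus, and the remaining permutations are never generated.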
Best regards,
Al.
On Saturday, 25 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Send Abstract-Wikipedia mailing list submissions to
> abstract-wikipedia(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
> or, via email, send a message with subject or body 'help' to
> abstract-wikipedia-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> abstract-wikipedia-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Abstract-Wikipedia digest..."
>
>
> Today's Topics:
>
> 1. Re: Conjugation and Declension Functions (Jan Ainali)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 25 Jul 2020 09:30:59 +0200
> From: Jan Ainali <ainali.jan(a)gmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> Message-ID:
> <CAKwu9WF8B5f1dwzdisASnHrnEarJwaxL0iHu73CFUoX3j0cabw(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Sat, 25 July 2020 at 01:07, Adam Sobieski <adamsobieski(a)hotmail.com> wrote:
>
> > When exploring how best to add the thematic relation of instrument to the
> > agent-patient pair (e.g. adding “using the bat” to “Bobby hit the ball”), I
> > observed that, for the same input grammatical arguments, there was a set of
> > possible output paraphrases:
> >
> > 1. “Bobby hit the ball using the bat”
> > 2. “Bobby, using the bat, hit the ball”
> > 3. “Using the bat, Bobby hit the ball”
> >
> > I wondered: how might we be able to generate each?
> >
>
> Since we are generating text for an encyclopedia, do we need to be able to
> pick which one of them? I think we could have a manual of style that says
> that method 1. is the one we use. That simplifies the language not only for
> the reader, but also our task by not trying to be able to create all
> nuances of spoken language.
>
> /Jan
>
Hello all,
The first topic of discussion I want to start is also the most awkward to
start with - it is akin to the question “Can you all hear me?” at the
beginning of a talk. The ones who already hear will obviously be happy, and
the ones who don’t won’t have a chance to chime in.
What communication channels should we start with?
My thought would be that there is a need for three different channels:
* First and foremost, a channel to document decisions, essays, documents,
plans, etc.
* Second, a channel for asynchronous discussion, announcements, etc.
* Third, a channel for synchronous discussion, for quick discussions,
office hours, socialization, and later, when testing and deployment starts,
for quick feedback
Let’s go through these.
First type of channel: persistent documentation and planning. Currently it
is on Meta, but there are plenty of others that could be considered
(criteria: Must enable translation):
* Stay on Meta
* MediaWiki.org
* WikiSpore
* Wikidata
* Start our own wiki right away
* A mix of the above
* Others?
Second type of channel: asynchronous, ephemeral discussions. Currently it
is this mailing list, and a growing number of metawiki talkpages (thanks
Chris Cooley for starting a discussion about this here!). Here I don’t see
that many possible channels:
* A different existing mailing list
* A second new mailing list to focus on technical aspects
* Zulip
* dropping this channel in favor of the first and third type of channel
* Others?
Third channel: synchronous discussion. Currently, nothing is ‘blessed’ as
such a channel, but there’s already a lively number of unofficial places
that have been opened and discussed here (thanks ZI Jony!). As far as I can
tell, there’s:
* #wikipedia-abstract on IRC
* #wikilambda on IRC
* Zulip
* AbstractWikipedia channel on Telegram is already quite lively
* Dropping this channel in favor of the first and second channel
* Others?
I have listed these options more with a goal of partial-completeness, not
because I would be happy with all of them. Personally, I’d be so unhappy
with e.g. blessing a Facebook group as the main official channel, so I have
not included that. I am equally uneasy with a Telegram chat, but it’s also
about going where the communities are - for example, my preference for the
chat would be IRC, but I am a dinosaur.
Here are my thoughts:
* I would like to keep the number of channels small, so we don’t fritter
our energy away. That’s particularly important in the beginning.
* Any official channel may require additional Foundation approvals, but I
don’t want to use resources evaluating all the possible channels beforehand
if most of them are not of interest for us anyway.
* We should have permanent records of all official channels.
* All official channels should be under appropriate Terms of Service and
Code of Conduct.
* Ideally, the community would coalesce and grow on official channels.
* Ideally, the channels we choose are aligned with our values.
No decision we make now is meant to be permanent, and as the project
develops and the community grows, we expect to see this shift and change.
I am afraid that the main language for discussion in the beginning will be
English. We will rely on community support to cover contributions in other
languages as best as possible, but I am afraid it will not be possible to
translate every discussion contribution and every essay. Sorry.
Besides these discussion channels, we will also have the usual stack of
technologies supporting a Wikimedia development project: Phabricator for
tasks, Git for version control, etc. For that, we’ll mostly follow
Foundation best practices.
I am looking forward to hearing from you,
Denny
First of all, welcome, SWork! I don't think you misunderstood anything,
but we're fine with that if you did. "Bobby hit the ball over the stadium
using the bat" sounds okay too.
I'll try to focus on your topic, Adam. It does seem as though we're moving
into a broader problem area, but we'll always get back to inflections
eventually.
So, yes, context is everything. Almost. So let's remember we're talking
primarily about Wikipedia. If we were talking about conversations, that
would introduce all sorts of additional and fascinating complications, like
how you add the thing you just remembered into the words you are in the
middle of speaking. But let's try to relate that sort of problem back to
our Wikipedic context.
We assume that we have a notable person aka "Bobby" and some record of an
event in this person's life (the infamous ball-hitting incident). This, I
think, corresponds to Adam's original "set-up" (to use your term, Swork).
Let's further assume that someone once wrote something in Wikipedia like
"Roberta, known to her friends as "Bobby", is the youngest female ever to
hit a ball over the <name> stadium. She used the bat her father had used in
1983, when..." And let's further assume that this event is represented in
some vaguely familiar way in Wikidata... (details avoided!) And now,
finally, we discover from Roberta's autobiography ("Call me Bobby") that
she was humming the very tune that her father used to hum... so we add that
to Wikidata (during tune Q864573216897 performed by Q<Roberta> as a hum,
reference Q<Call me Bobby>, p.132).
So, before the change, our process found "Bobby", "hit a ball" and "used
the bat". I've left the determiners (articles "a" and "the") in place
because they suggest it's any old ball but a particular bat. And Adam's
function didn't output "Bobby hit a ball used the bat" but (given the
set-up, which refers to the bat in an article about Bobby's father etc "his
daughter, ...") "...Bobby, using the bat, hit a ball..." ("over the
stadium", as it happens).
After the change in Wikidata (given the relative significance of these
well-sourced details), our selection process has transformed the recorded
fact about the tune to the action "hum" and ignored the memorable tune. It
has also, quite consistently, determined that the humming is incidental, so
"during tune" has become something equivalent to <while>. So we have
"Bobby", "hit a ball", "used the bat" plus <while>"Bobby", "hum". Given our
previous context, Adam's function will now produce "Bobby, humming, used
the bat to hit a ball..." (or something equivalent). The point is that it
is this particular context that pretty much requires "Bobby" to come first
and the "ball" to come last. In a different context, we might require
"Bobby" to come later: "Humming, while using the bat to hit a ball,
Bobby..." And notice this (it wasn't even deliberate), we ever so casually
shifted Bobby from being the subject of the verb "hit" to being the subject
of the verb "used" to being the subject of some other verb in the wider
context. Had the bat been more prominent, we might have ended up with that
as the subject: "[later] the bat was used by [his daughter] Bobby to hit a
ball..., while humming..."
Backtracking a little, if we leave open the question of which verb is the
main verb, we could just as easily come up with "Bobby hummed, while using
the bat to hit a ball". And going back to Adam's original example, we might
have got "Bobby used the bat to hit a ball" or even "Bobby batted a ball".
Only with my invented Wikipedic context can I suggest that "Bobby used the
bat to hit a ball" sounds best to my English ears. (I know that wasn't the
question but it's good to share.)
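
A sketch of the order-sensitive call that Adam proposes in the message quoted below, where the caller (informed by context) decides which role comes first. The role classes, the `PARAPHRASES` table and this `hit` function are hypothetical Python stand-ins for his pseudocode; only the three paraphrases come from the thread.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Agent:
    name: str

@dataclass(frozen=True)
class Patient:
    name: str

@dataclass(frozen=True)
class Instrument:
    name: str

# One template per supported sequence of role-tagged arguments.
PARAPHRASES = {
    (Agent, Patient, Instrument): "{agent} hit the {patient} using the {instrument}",
    (Agent, Instrument, Patient): "{agent}, using the {instrument}, hit the {patient}",
    (Instrument, Agent, Patient): "Using the {instrument}, {agent} hit the {patient}",
}

def hit(*args) -> str:
    """Realize a past-tense 'hit' clause, keeping roles in the order given."""
    order = tuple(type(arg) for arg in args)
    template = PARAPHRASES.get(order)
    if template is None:
        raise ValueError("unsupported argument order")
    # Map each role class name ("Agent" -> "agent") to the supplied noun.
    roles = {type(arg).__name__.lower(): arg.name for arg in args}
    return template.format(**roles)
```

With this shape, the question of which noun comes first is answered outside the function, by whatever inspects the surrounding article.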
Best regards,
Al.
On Saturday, 25 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Today's Topics:
>
> 1. Re: Conjugation and Declension Functions (Exocamp Exocamp)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 24 Jul 2020 23:06:45 -0500
> From: Exocamp Exocamp <scamgiver(a)gmail.com>
> To: abstract-wikipedia(a)lists.wikimedia.org
> Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> Message-ID:
> <CAPBFV5evw=80nP7rhnXqXOJgOYS3Jgrf2X_h79n-7eE+RhajJA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I would like to say first that I am new to this idea and project and not as
> familiar with coding or linguistics as some of you, so I apologize in
> advance if I am severely misunderstanding something in this thread.
>
> While any of these phrases would sound decent in a sentence, I believe that
> the context of the rest of the sentence would be needed to determine the
> correct thematic relation. Even if the output phrases all contain the same
> words, natural speech and language would not pick these phrases necessarily
> at random, because the three phrases have different "set-ups" as I would
> call them. I'll give examples so I become much more clear.
>
> Let us assume we have the agent-patient-instrument pair of Bobby, the ball,
> and the bat, and finish this clause with the directional relation "over the
> stadium". Thus our sentences would most likely be as follows:
>
> [1] Bobby hit the ball using the bat over the stadium.
> [2] Bobby, using the bat, hit the ball over the stadium.
> [3] Using the bat, Bobby hit the ball over the stadium.
>
> Despite all of these being technically correct sentences in English,
> sentence [1] in my opinion would definitely not look as natural to a native
> English speaker in text as [2] or [3] (though in my opinion [3] is best).
> Let's do another example with the phrase "while he hummed a song."
>
> [1] Bobby hit the ball using the bat while he hummed a song.
> [2] Bobby, using the bat, hit the ball while he hummed a song.
> [3] Using the bat, Bobby hit the ball while he hummed a song.
>
> Now with this new phrase, sentences [1] and [2] have become the most
> natural sentences (again, in my opinion), when in the previous sentence
> where the phrase "over the stadium" was used, these two were the most
> *un*natural. As such, I believe that the most accurate method to find the
> most natural phrasing would need to find a pattern in the context in order
> to apply the correct formation to.
>
> One last thought of mine has to do with spoken English and not written, but
> still may assist with our problem. From my thinking, the stressed word on a
> previous sentence may also affect the construction of the next. Let's treat
> our agent-patient-instrument as a complete sentence, and the sentence
> before this in spoken English as "Bobby had a ball and a bat" - introducing
> Bobby, ball, and bat beforehand. The stress could affect the construction
> like so, where * * indicates the stressed word in the sentence.
>
> [1] *Bobby* had a ball and a bat. Bobby hit the ball using the bat.
> [2] Bobby had a *ball* and a bat. Bobby, using the bat, hit the ball.
> [3] Bobby had a ball and a *bat.* Using the bat, Bobby hit the ball.
>
> Again, I am not a very experienced person in either linguistics or
> programming, so I may have missed the point of something, but I hope my
> thoughts help with this.
>
> Sincerely,
> SWork
>
>
> On Fri, Jul 24, 2020, 6:07 PM <
> abstract-wikipedia-request(a)lists.wikimedia.org> wrote:
>
> > Today's Topics:
> >
> > 1. Re: Conjugation and Declension Functions (Adam Sobieski)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Fri, 24 Jul 2020 23:07:03 +0000
> > From: Adam Sobieski <adamsobieski(a)hotmail.com>
> > To: "General public mailing list for the discussion of Abstract
> > Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
> > Subject: Re: [Abstract-wikipedia] Conjugation and Declension Functions
> > Message-ID:
> > <CH2PR12MB41841C5E1C4E820B9766AB1BC5770@CH2PR12MB4184.namprd12.prod.outlook.com>
> >
> > Content-Type: text/plain; charset="windows-1252"
> >
> > In addition to broaching the discovery, encoding and reuse of
> > language-specific patterns, which I view as a discussion topic when
> > comparing and contrasting approaches (see: the DRY principle [1] and the
> > abstraction principle [2]), and indicating how conjugation and declension
> > functions would facilitate the discovery, encoding and reuse of these
> > patterns, I would like to share some topics which arose as I explored
> > adding the thematic relation [3] of instrument to the agent-patient pair.
> >
> > When exploring how best to add the thematic relation of instrument to the
> > agent-patient pair (e.g. adding “using the bat” to “Bobby hit the ball”), I
> > observed that, for the same input grammatical arguments, there was a set of
> > possible output paraphrases:
> >
> > 1. “Bobby hit the ball using the bat”
> > 2. “Bobby, using the bat, hit the ball”
> > 3. “Using the bat, Bobby hit the ball”
> >
> > I wondered: how might we be able to generate each?
> >
> > One possibility is the following. There could be a type for each thematic
> > relation [3]. Then, using explicit type conversions to these types, we
> > could have different functions for different sequences of input arguments.
> >
> > That is,
> >
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, (Agent)Bobby, (Patient)ball, (Instrument)bat)
> > --> “Bobby hit the ball using the bat”
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, (Agent)Bobby, (Instrument)bat, (Patient)ball)
> > --> “Bobby, using the bat, hit the ball”
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, (Instrument)bat, (Agent)Bobby, (Patient)ball)
> > --> “Using the bat, Bobby hit the ball”
> >
> > or
> >
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, Bobby as Agent, ball as Patient, bat as
> > Instrument) --> “Bobby hit the ball using the bat”
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, Bobby as Agent, bat as Instrument, ball as
> > Patient) --> “Bobby, using the bat, hit the ball”
> > hit<voice::active, tense::past, aspect::perfective,
> > mood::indicative>(context, bat as Instrument, Bobby as Agent, ball as
> > Patient) --> “Using the bat, Bobby hit the ball”
> >
> > These functions could each wrap the use of patterns (as indicated in
> > previous email) and attempt to realize output sentences utilizing the
> > arguments in the same sequence in which they were provided. This would,
> > however, mean that the callers of the functions would be responsible for
> > “shuffling” the input arguments to express the desired paraphrase ((1),
> > (2), (3)).
> >
> > There are, of course, other approaches and other possibilities to consider
> > with respect to outputting the example paraphrases.
> >
> > Any thoughts on these topics?
> >
> >
> > Best regards,
> > Adam
> >
> > [1] https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
> > [2] https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)
> > [3] https://en.wikipedia.org/wiki/Thematic_relation
> >
> > P.S.: We could also put the grammatical arguments on the context object
> > (context.voice, context.tense, context.aspect, context.mood, et cetera) and
> > then make use of generic parameters for other uses.
> >
> > From: Adam Sobieski<mailto:adamsobieski@hotmail.com>
> > Sent: Thursday, July 23, 2020 7:38 PM
> > To: General public mailing list for the discussion of Abstract Wikipedia
> > (aka Wikilambda)<mailto:abstract-wikipedia@lists.wikimedia.org>
> > Subject: RE: Conjugation and Declension Functions
> >
> >
> > I had an idea about discovering, encoding and reusing patterns in
> > languages that I would like to share with the group.
> >
> >
> >
> > The following is rough-draft pseudocode for a function with which to
> > generate sentences for the verb “to kick”. The function has generic
> > parameters for voice and tense and parameters for context, agent and
> > patient.
> >
> >
> >
> > Clause kick<V, T>(Context context, Noun agent, Noun patient)
> > {
> >   switch(V)
> >   {
> >     case voice::active:
> >       switch(T)
> >       {
> >         case tense::past:
> >         case tense::present:
> >           return join(" ", [agent, conjugate_en("kick", V, T, …), "the", patient]);
> >         case tense::future:
> >           return join(" ", [agent, "will", conjugate_en("kick", V, T, …), "the", patient]);
> >         …
> >       }
> >     case voice::passive:
> >       switch(T)
> >       {
> >         case tense::past:
> >           return join(" ", ["the", patient, "was", conjugate_en("kick", V, T, …), "by", agent]);
> >         case tense::present:
> >           return join(" ", ["the", patient, "is", "being", conjugate_en("kick", V, T, …), "by", agent]);
> >         …
> >       }
> >   }
> > }
> >
> >
> >
> > and
> >
> >
> >
> > kick<voice::active, tense::past>(context, Bobby, ball) returns “Bobby
> > kicked the ball”.
> >
> >
> >
> > In English, most combinations of tense, aspect, mood and voice are
> > expressed periphrastically, using constructions with auxiliary verbs. After
> > implementing a number of these for similar verbs, per the above example, we
> > would notice a pattern. That pattern could be expressed as something
> > resembling:
> >
> >
> >
> > Clause pattern123<V, T>(Context context, Verb verb, Noun agent, Noun patient)
> > {
> >   switch(V)
> >   {
> >     case voice::active:
> >       switch(T)
> >       {
> >         case tense::past:
> >         case tense::present:
> >           return join(" ", [agent, conjugate_en(verb, V, T, …), "the", patient]);
> >         case tense::future:
> >           return join(" ", [agent, "will", conjugate_en(verb, V, T, …), "the", patient]);
> >         …
> >       }
> >     case voice::passive:
> >       switch(T)
> >       {
> >         case tense::past:
> >           return join(" ", ["the", patient, "was", conjugate_en(verb, V, T, …), "by", agent]);
> >         case tense::present:
> >           return join(" ", ["the", patient, "is", "being", conjugate_en(verb, V, T, …), "by", agent]);
> >         …
> >       }
> >   }
> > }
> >
> >
> > We could then simply express that the verb “to kick” is an instance of
> > pattern pattern123 with something resembling:
> >
> >
> >
> > Clause kick<V, T>(Context context, Noun agent, Noun patient)
> > {
> >   return pattern123<V, T>(context, new Verb("kick", …), agent, patient);
> > }
> >
> > or
> >
> > Clause kick<V, T>(Context context, Noun agent, Noun patient)
> > {
> >   return pattern123<V, T>(context, getVerb("https://…#kick"), agent, patient);
> > }
> >
> > and this pattern could be reused for a large number of verbs.
> >
> > This use of patterns should extend to scenarios where there are more
> > parameters for other thematically-related nouns:
> >
> >
> >
> > Clause kick<V, T>(Context context, Noun agent, Noun patient, Noun instrument)
> > {
> >   return pattern1234<V, T>(context, new Verb("kick", …), agent, patient, instrument);
> > }
> >
> > For English, there might be more generic parameters than those for voice
> > and tense; we could add those for aspect and mood (e.g. pattern1234<V, T,
> > A, M>(…)).
> >
> > In conclusion, with generic programming and functions for conjugation and
> > declension, we have expressiveness with which to discover, encode and reuse
> > language-specific patterns.
> >
> >
> > Best regards,
> > Adam Sobieski
> >
> >
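
The pattern reuse described in the quoted message above can be tried out with a small runnable sketch. It is written in Python rather than the thread's pseudocode, and it assumes a toy `conjugate_en` that only handles regular verbs; that helper, the string-valued voice and tense arguments, and the use of `functools.partial` are illustrative choices, not part of Adam's proposal.

```python
from functools import partial

def conjugate_en(verb: str, voice: str, tense: str) -> str:
    """Toy conjugator for regular English verbs only (an assumption)."""
    if voice == "passive" or tense == "past":
        return verb + "ed"   # past tense / past participle
    if tense == "present":
        return verb + "s"    # third person singular
    return verb              # bare form, e.g. after "will"

def pattern123(verb: str, voice: str, tense: str, agent: str, patient: str) -> str:
    """Shared clause pattern for transitive verbs with an agent and a patient."""
    form = conjugate_en(verb, voice, tense)
    if voice == "active" and tense in ("past", "present"):
        words = [agent, form, "the", patient]
    elif voice == "active" and tense == "future":
        words = [agent, "will", form, "the", patient]
    elif voice == "passive" and tense == "past":
        words = ["the", patient, "was", form, "by", agent]
    elif voice == "passive" and tense == "present":
        words = ["the", patient, "is", "being", form, "by", agent]
    else:
        raise ValueError(f"unsupported combination: {voice}, {tense}")
    return " ".join(words)

# Declaring a verb as an instance of the pattern is then a one-liner.
kick = partial(pattern123, "kick")
```

Calling `kick("active", "past", "Bobby", "ball")` gives "Bobby kicked the ball", matching the example in the quoted message; irregular verbs such as "hit" would need real conjugation data.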
Hi all,
I am not aware of details on the funding of Abstract Wikipedia, neither the
funding needs nor currently available funds or any function of budget
(estimated, approved, requested, targeted etc.) over time.
Thanks for any pointers,
Daniel
Hello all,
Based on a suggestion of Chris Cooley on the wiki, let's pull the topic of
"Theoretical and practical underpinnings of the natural language
generation" earlier. I started a page (basically empty), and hope we'll
over time work together on the structure of the page.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Related_and_previous_wor…
Abstract Wikipedia will generate natural language text from an abstract
representation. This is not a novel idea, and it has been tried a number of
times before.
On this new page we aim to collect different existing approaches. It tries
to summarize the core ideas of the different approaches, their advantages
and disadvantages, and point to existing implementations. This page (by and
for the community) will help to choose which approach to focus on first
when we get to it.
We have some time - I hope that part of the project will kick off in 12-18
months, if all goes well. It would be great if we had a good and
understandable overview of the state of the art by then.
Thanks everyone,
Denny
Hello all,
I made a draft of possible function examples. I am not saying all of these
are useful, or we will have these, or that these are the right signatures
for the suggested functions, but this is more to illustrate the possible
scope of the project we are aiming for.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Early_function_examples
Feel free to add, discuss, improve.
Stay safe,
Denny
I like your "phrasebook" analogy, Charles. Working from the Wikidata
Statement, "Donald Trump has been President of the United States since
<start date>" inflects naturally (in Wikipedia, if the Wikidata Statement
changes) to "Donald Trump was President of the United States from <start
date> to <end date>, when he was succeeded by <successor>." Clearly it's
not just the verb that is inflected. In context, <successor> would be a
link if not previously linked, so we need "link inflection" too. I would be
inclined to return it as a link and let the receiver apply their particular
policy for duplicate links, but I've never seen them as a particular
problem anyway.
There is also a contingent form for death in office: "Donald Trump was
President of the United States from <start date> until his death on <date
of death>, when he was succeeded by <Vice President>." And maybe for the
exception to the rule: "Donald Trump was President of the United States
from <start date> until his death on <date of death> when, instead of being
succeeded by <Vice President>, who was Vice President at the time, he was
succeeded by <successor>, who was <role of successor> when the President
died. This is because <reason>."
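
A minimal sketch of the first two forms, assuming the statement has already been reduced to plain strings. The function name, its parameters and the `pronoun` argument are hypothetical; the contingent forms for death in office are left out.

```python
from typing import Optional

def position_sentence(person: str, position: str, start: str,
                      end: Optional[str] = None,
                      successor: Optional[str] = None,
                      pronoun: str = "he") -> str:
    """Render a 'position held' statement; the tense follows the end date."""
    if end is None:
        # No end date recorded: the person still holds the position.
        return f"{person} has been {position} since {start}."
    sentence = f"{person} was {position} from {start} to {end}"
    if successor is not None:
        # Assumes a singular pronoun that takes "was".
        sentence += f", when {pronoun} was succeeded by {successor}"
    return sentence + "."
```

Adding an end date to the data changes the verb, the preposition and the trailing clause together, which is the sense in which more than the verb is inflected.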
Al.
On Thursday, 23 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Today's Topics:
>
> 1. Re: A few examples of functions (Denny Vrandečić)
> 2. Budget of Abstract Wikipedia? (Daniel Mietchen)
> 3. Re: Conjugation and Declension Functions (Adam Sobieski)
> (Charles Matthews)
> 4. Re: Conjugation and Declension Functions (Adam Sobieski)
> (Adam Sobieski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Jul 2020 12:10:40 -0700
> From: Denny Vrandečić <dvrandecic(a)wikimedia.org>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] A few examples of functions
> Message-ID:
> <CA+bik1dPE60Vc-MWXso3AFTA5OpXSCdm-3W=FQT9CBY-G9Fjtg(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Louis,
>
> yes, you are right, but I want to point out that plural was not defined
> as plural: string ➝ string but as plural: English noun ➝ string. So the
> question is, what is English noun - and if that in turn was a wrapper
> object around Wikidata Lexemes, say L3337, it would indeed be rather easy
> to ask for the plural and get back "men" as a string.
>
> Yes, one question is what does it mean to be side-effect free. If for
> example we define that calls to Wikidata can be considered side-effect
> free, as they are basically calls to a static(-ish) knowledge base, we got
> that fixed. That is also needed for many of the other example functions.
> The problem in that case is how much caching can we get away with.
>
> So if we assume that Wikidata is available as a knowledge base for the
> functions, then I think that all the functions offered in the examples
> should be fine.
>
> What do you think?
>
> Thanks,
> Denny
>
>
>
>
>
> On Wed, Jul 22, 2020 at 8:28 AM Louis Lecailliez <
> louis.lecailliez(a)outlook.fr> wrote:
>
> > Hi Denny,
> >
> > while a lot of functions make sense while we stay in the software realm,
> > I'll note again that the ones concerning language have signatures far too
> > simplified to fulfill their role.
> >
> > For example
> > * plural: English noun ➝ string
> > is going to work as a pure function if and only if all irregular plural
> > forms of English (stuff like man/men) are hardcoded into the function
> > itself; and I think we all agree that's not a good engineering practice.
> > * plural: English noun, Dictionary<string, string> ➝ string
> > would be a more appropriate signature here, conceptually at least. In
> > reality, data will be pulled from Wikidata (if I'm not mistaken), which
> > means the function has side effects and is not pure. It seems the question
> > of access to the data layer (whatever it is) is underspecified.
> >
> > I think it would be better to remove language generation signatures for
> > the time being to not give the impression the problem space is trivially
> > solvable by a function or two.
> >
> > Best regards,
> > Louis Lecailliez
> >
> > ------------------------------
> > *From:* Abstract-Wikipedia <abstract-wikipedia-bounces@lists.wikimedia.org>
> > on behalf of Denny Vrandečić <dvrandecic(a)wikimedia.org>
> > *Sent:* Wednesday, 22 July 2020 14:30
> > *To:* Abstract Wikipedia list <abstract-wikipedia(a)lists.wikimedia.org>
> > *Subject:* [Abstract-wikipedia] A few examples of functions
> >
> > Hello all,
> >
> > I made a draft of possible function examples. I am not saying all of these
> > are useful, or we will have these, or that these are the right signatures
> > for the suggested functions, but this is more to illustrate the possible
> > scope of the project we are aiming for.
> >
> > https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Early_function_examples
> >
> > Feel free to add, discuss, improve.
> >
> > Stay safe,
> > Denny
> >
> >
>
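To make the quoted exchange a little more concrete, here is a minimal Python sketch of the two signatures under discussion. It is only an illustration under stated assumptions: FAKE_LEXEME_STORE is a hypothetical in-memory stand-in for Wikidata (only the pairing of L3337 with "men" comes from the thread), and EnglishNoun, fetch_form and plural_with_dictionary are made-up names. The cached lookup shows one way a read from a static(-ish) knowledge base could be treated as side-effect free.

```python
from dataclasses import dataclass
from functools import lru_cache

# Hypothetical stand-in for the Wikidata Lexeme store. The data layout is an
# assumption; only "L3337 has the plural 'men'" is taken from the thread.
FAKE_LEXEME_STORE = {
    "L3337": {"singular": "man", "plural": "men"},
}


@dataclass(frozen=True)
class EnglishNoun:
    """Wrapper object around a lexeme ID, as suggested in the quoted reply."""
    lexeme_id: str


@lru_cache(maxsize=None)
def fetch_form(lexeme_id: str, feature: str) -> str:
    """Read one form from the store; repeated calls are served from the cache."""
    return FAKE_LEXEME_STORE[lexeme_id][feature]


def plural(noun: EnglishNoun) -> str:
    """plural: English noun -> string, with irregular forms living in the data."""
    return fetch_form(noun.lexeme_id, "plural")


def plural_with_dictionary(singular: str, irregular: dict) -> str:
    """The alternative signature with an explicit dictionary argument.

    The noun is simplified to a plain string here, and the "+s" fallback
    is a toy rule, not a full treatment of English plurals.
    """
    return irregular.get(singular, singular + "s")


print(plural(EnglishNoun("L3337")))                     # men
print(plural_with_dictionary("man", {"man": "men"}))    # men
print(plural_with_dictionary("ball", {}))               # balls
```

In both variants the function body stays free of hardcoded irregular forms; the difference is only whether the data arrives as an explicit argument or through the noun object.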
I'm loving all these emails but I find it hard to keep track of the
different topics. Someone find me a Wiki!
So, your question, Adam: "What do you think about conjugation and
declension functions?"
In general, I'm against them. (Well, I'm English. What would you expect?)
That said, I would expect correct inflections of natural language output
(and valid inferences from natural language input). I just wonder whether
we need specific functions for this and, if we do, whether they need to be
generic across languages. My view at the moment is that we need neither.
Instead, I see inflections as special cases of vocabulary and structure
selection. To refer to your example, Bobby and the ball can be related in
many ways, depending on the context. If we assume "Association Football"
(soccer), we might default to kicking. But if it's a foul, we should assume
he handled the ball. Otherwise, we can only assume (lawful) contact.
Staying on topic: "assume" is uninflected, so we might infer the present
tense; "kicking" might be a present participle but I think it's a gerund;
"it's" implies the present tense but "handled" implies the past (if it's
not passive). So, if I need to inflect the concept Bobby-kicking-the-ball,
I'm expecting to end up with "Bobby kicked the ball"... But maybe he
"didn't kick the ball". Maybe he "used to kick the ball" but now he's a
goalkeeper he "can handle the ball" without its being a foul.
So, if you like, we have some extra dimensions of inflection, to do with
negatives and interrogatives and modes and word function (should it be
"Bobby's handling the ball is lawful" or "Bobby handling the ball is
lawful" or "Bobby's handling OF the ball WAS lawful" or "handling the ball
was lawful for Bobby"). Considering only results, maybe these are just
minor tweaks of the near-final result. That's my "lazy evaluation"
perspective.
Perhaps it's worth re-visiting forms of a word. Let's say English has a
handful of forms ("kick", "kicks", "kick's", "kicks'", "kicking",
"kickings", "kicking's", "kickings'", "kicked"... and you might just treat
"kicked" as a noun) but empirically there are just three (what?!): "kick",
"kicking" and "kicked" (each of which has more than one use). Then there
are rules about adding an s (with or without an apostrophe) that are more
or less universal (and, of course, ambiguous). So my lazy English rules
would be: decide which of the three is best; add an s (or not); (if
written) use an apostrophe (or not). Okay, so that's not quite right, but I
think it's nearer to the truth than is generally supposed. The point is, if
we start with forms that actually occur (and here I would extend forms to
include auxiliary words) and then consider what rules different forms in
and out (when we need to), we won't necessarily end up with traditional
patterns. We might end up with something more interesting and more useful
(or of greater interest and more use, to "inflect" differently).
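A minimal Python sketch of those lazy rules, purely as an illustration: the names BASE_FORMS and surface_form are made up, and the apostrophe is modelled as three options (none, before the s, after the s) because "kick's" and "kicks'" differ only in where it goes. That third option is my assumption, not part of the three-step rule as stated.

```python
# The three forms that "actually occur" for this verb, per the paragraph above.
BASE_FORMS = ("kick", "kicking", "kicked")


def surface_form(base: str, add_s: bool = False, apostrophe: str = "none") -> str:
    """Step 1: pick a base form. Step 2: add an s (or not).
    Step 3 (written language only): place an apostrophe (or not)."""
    if apostrophe not in ("none", "before", "after"):
        raise ValueError("apostrophe must be 'none', 'before' or 'after'")
    if not add_s:
        if apostrophe != "none":
            raise ValueError("an apostrophe only makes sense together with an s")
        return base
    if apostrophe == "before":
        return base + "'s"
    if apostrophe == "after":
        return base + "s'"
    return base + "s"


for base in BASE_FORMS:
    forms = [
        surface_form(base),
        surface_form(base, add_s=True),
        surface_form(base, add_s=True, apostrophe="before"),
        surface_form(base, add_s=True, apostrophe="after"),
    ]
    print(", ".join(forms))
```

The loop reproduces "kick", "kicks", "kick's", "kicks'" and the four "kicking" variants, but it also produces "kickeds" and friends, which is the over-generation that a rule ruling forms in and out would have to handle.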
Best regards,
Alan [has general form "Al."]
On Wednesday, 22 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
>
> Today's Topics:
>
> 1. Conjugation and Declension Functions (Adam Sobieski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Jul 2020 07:01:08 +0000
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: [Abstract-wikipedia] Conjugation and Declension Functions
> Message-ID:
> <CH2PR12MB4184E6AA57F4490441D7B7F6C5790(a)CH2PR12MB4184.namprd
> 12.prod.outlook.com>
>
> Content-Type: text/plain; charset="windows-1252"
>
> Louis indicated that we should consider modeling and approaches
> encompassing the features of all languages [1].
>
> Factors that affect verb conjugation are numerous and we can envision a
> verb conjugation function which gathers these factors together from across
> languages, a function resembling:
>
> conjugate(language, stem, person, number, gender, tense, aspect, mood,
> voice, formality, clusivity, transitivity, valency, evidentiality)
>
> The following example utilizes a Context parameter as we haven’t yet
> discussed global variables. The following example uses a “::” syntax for
> enums. The following example shows that, with multilingual conjugation and
> declension functions, we could encapsulate multilingual conjugation and
> declension in language-specific convenience functions, perhaps resembling:
>
> kick<T>(Context context, Noun agent, Noun patient)
> {
>     join(" ", [
>         agent,
>         context.conjugate("en-US", "kick", person::second, agent.number,
>             agent.gender, T, aspect::perfective, mood::indicative,
>             voice::active, context.formality, clusivity::no,
>             transitivity::transitive, valency::transitive,
>             context.evidentiality),
>         "the",
>         patient
>     ])
> }
>
> and, as expected:
>
> kick<tense::past>(context, Bobby, ball) would produce: “Bobby kicked the
> ball”.
>
> It appears that there are a multitude of ways to approach implementing
> NLG-related Wikilambda functions. Conjugation and declension functions
> could be components of some approaches.
>
> What do you think about conjugation and declension functions?
>
>
> Best regards,
> Adam
>
> [1] https://wals.info/
>
>
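For anyone who wants to experiment with the quoted kick<T> idea, here is a rough Python sketch. It is an assumption-laden toy, not a proposal for the real function: conjugate keeps only four of the quoted parameters (language, stem, person, number, tense), handles only regular English verbs with a naive "+ed" / "+s" rule, and the Tense, Number and Noun types are hypothetical. It also passes third person for a named agent, where the quoted pseudocode has person::second.

```python
from dataclasses import dataclass
from enum import Enum


class Tense(Enum):
    PAST = "past"
    PRESENT = "present"


class Number(Enum):
    SINGULAR = "singular"
    PLURAL = "plural"


@dataclass(frozen=True)
class Noun:
    text: str
    number: Number = Number.SINGULAR


def conjugate(language: str, stem: str, person: int,
              number: Number, tense: Tense) -> str:
    """Toy conjugation for regular English verbs only (assumed rules)."""
    if language != "en-US":
        raise NotImplementedError(f"no rules for {language}")
    if tense is Tense.PAST:
        return stem + "ed"
    if person == 3 and number is Number.SINGULAR:
        return stem + "s"
    return stem


def kick(agent: Noun, patient: Noun, tense: Tense) -> str:
    """Language-specific convenience function built on conjugate."""
    verb = conjugate("en-US", "kick", 3, agent.number, tense)
    return " ".join([agent.text, verb, "the", patient.text])


print(kick(Noun("Bobby"), Noun("ball"), Tense.PAST))     # Bobby kicked the ball
print(kick(Noun("Bobby"), Noun("ball"), Tense.PRESENT))  # Bobby kicks the ball
```

Negation, questions and modals ("didn't kick", "used to kick", "can handle") have no place in this signature at all, which is roughly the point made above about extra dimensions of inflection.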
Hi! As suggested by Louis and seconded by Chris and Tiago on this list, and
since we already had the Wikidata workshop coming up at ISWC, we decided to
extend the topics of the workshop to also cover Abstract Wikipedia. This
will be a peer-reviewed venue, with publication of the papers, etc. It is
virtually "co-located" with ISWC 2020.
Here is the call for papers.
*The First Wikidata Workshop*
Co-located with the 19th International Semantic Web Conference (ISWC
2020).
Date: October 29, 2020
The workshop will be held online, afternoon European time.
Website: https://wikidataworkshop.github.io/
== Important dates ==
Papers due: August 10, 2020
Notification of accepted papers: September 11, 2020
Camera-ready papers due: September 21, 2020
Workshop date: October 29, 2020
== Overview ==
Wikidata is an openly available knowledge base, hosted by the Wikimedia
Foundation. It can be accessed and edited by both humans and machines and
acts as a common structured-data repository for several Wikimedia projects,
including Wikipedia, Wiktionary, and Wikisource. It is used in a variety of
applications by researchers and practitioners alike.
In recent years, we have seen an increase in the number of publications
around Wikidata. While there are several dedicated venues for the broader
Wikidata community to meet, none of them focuses on publishing original,
peer-reviewed research. This workshop fills this gap - we hope to provide a
forum to build this fledgling scientific community and promote novel work
and resources that support it.
The workshop seeks original contributions that address the opportunities
and challenges of creating, contributing to, and using a global,
collaborative, open-domain, multilingual knowledge graph such as Wikidata.
We encourage a range of submissions, including novel research, opinion
pieces, and descriptions of systems and resources, which are naturally
linked to Wikidata and its ecosystem, or enabled by it. What we’re less
interested in are works which use Wikidata alongside or in lieu of other
resources to carry out some computational task - unless the work feeds back
into the Wikidata ecosystem, for instance by improving or commenting on
some Wikidata aspect, or suggesting new design features, tools and
practices.
We also encourage submissions on the topic of Abstract Wikipedia,
particularly around collaborative code management, natural language
generation by a community, the abstract representation of knowledge, and
the interaction between Abstract Wikipedia and Wikidata on the one side, and
Abstract Wikipedia and the language Wikipedias on the other.
We welcome interdisciplinary work, as well as interesting applications
which shed light on the benefits of Wikidata and discuss areas of
improvement.
The workshop is planned as an interactive half-day event, in which most of
the time will be dedicated to discussions and exchange rather than frontal
presentations. For this reason, all accepted papers will be presented in
short talks and accompanied by a poster. We are considering online options
in response to ongoing challenges such as travel restrictions and the
recent Covid-19 pandemic.
== Topics ==
Topics of submissions include, but are not limited to:
- Data quality and vandalism detection in Wikidata
- Referencing in Wikidata
- Anomaly, bias, or novelty detection in Wikidata
- Algorithms for aligning Wikidata with other knowledge graphs
- The Semantic Web and Wikidata
- Community interaction in Wikidata
- Multilingual aspects in Wikidata
- Machine learning approaches to improve data quality in Wikidata
- Tools, bots and datasets for improving or evaluating Wikidata
- Participation, diversity and inclusivity aspects in the Wikidata ecosystem
- Human-bot interaction
- Managing knowledge evolution in Wikidata
- Abstract Wikipedia
== Submission guidelines ==
We welcome the following types of contributions:
- Full research paper: Novel research contributions (7-12 pages)
- Short research paper: Novel research contributions of smaller scope than
full papers (3-6 pages)
- Position paper: Well-argued ideas and opinion pieces, not yet in the
scope of a research contribution (6-8 pages)
- Resource paper: New dataset or other resource directly relevant to
Wikidata, including the publication of that resource (8-12 pages)
- Demo paper: New system critically enabled by Wikidata (6-8 pages)
Submissions must be in PDF or HTML, formatted in the style of the Springer
Publications format for Lecture Notes in Computer Science (LNCS). For
details on the LNCS style, see Springer’s Author Instructions.
The papers will be peer-reviewed by at least two researchers. Accepted
papers will be published as open access papers on CEUR (we only publish to
CEUR if the authors agree to have their papers published).
Papers have to be submitted through EasyChair:
https://easychair.org/conferences/?conf=wikidataworkshop2020
== Proceedings ==
The complete set of papers will be published with the CEUR Workshop
Proceedings (CEUR-WS.org).
== Organizing committee ==
- Lucie-Aimée Kaffee, University of Southampton
- Oana Tifrea-Marciuska, Bloomberg
- Elena Simperl, King’s College London
- Denny Vrandečić, Wikimedia Foundation
== Programme committee ==
- Dan Brickley, Google
- Andrew D. Gordon, Microsoft Research & University of Edinburgh
- Dennis Diefenbach, Université Jean Monnet
- Aidan Hogan, Universidad de Chile
- Markus Krötzsch, Technische Universität Dresden
- Edgar Meij, Bloomberg
- Claudia Müller-Birn, FU Berlin
- Finn Årup Nielsen, Technical University of Denmark
- Thomas Pellissier Tanon, Télécom ParisTech
- Lydia Pintscher, Wikidata, Wikimedia Deutschland
- Alessandro Piscopo, BBC
- Marco Ponza, University of Pisa
- Simon Razniewski, Max Planck Institute for Informatics
- Miriam Redi, Wikimedia Foundation
- Cristina Sarasua, University of Zurich
- Maria-Esther Vidal, TIB Hannover
- Pavlos Vougiouklis, Huawei Technologies, Edinburgh
- Zainan Victor Zhou, Google