Generating Natural-language Stories using Wikidata Historical Data


Adam Sobieski

17 Nov 2022

10:47 a.m.

Abstract Wikipedia,

Hello. I have recently been thinking about the generation of natural-language stories from Wikidata data, e.g., graphs of interrelated real-world historical events.

I am thinking about whether the resulting machine-generated stories would be objectively or subjectively narrated. These topics appear to pertain to the philosophy of history [1] and neutrality [2], resembling encyclopedists' ideals of neutrality with respect to point of view [3].

In my opinion, there would be much to learn from developing natural-language story-generating systems which could have parameters set or which could receive secondary input data to subsequently produce subjective stories. With such systems, developers could control and vary the subjectivities of the resulting natural-language output, e.g., as pertaining to sentiment.

What do you think about the idea that natural-language story-generating systems could use parameters or additional inputs to vary the subjectivities of the output?

Without a means of controlling and varying the subjectivities of output stories and language, shouldn't one desire for the output to be as measurably objective as possible?

What do you think about providing the capability for developers to trace backwards from natural-language outputs (from words, phrases, sentences, and paragraphs) into source code and data? Developers would then be able to more readily version software and data using metrics and evaluation tools, e.g., Grammarly or sentiment analysis. In theory, systems could provide accompanying "debugging data" alongside natural-language outputs, this data including mappings from selections of natural language, wikitext, or hypertext to stack traces or other data structures.

Best regards,
Adam Sobieski

[1] https://en.wikipedia.org/wiki/Philosophy_of_history
[2] https://en.wikipedia.org/wiki/Philosophy_of_history#Philosophy_of_neutrality
[3] https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
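To make the "debugging data" idea a little more concrete, here is a minimal Python sketch of a generator that records, for each piece of output text, which data item and which rule produced it, so that a character position in the output can be traced back. Everything in it is an assumption made for illustration: the `Fragment` type, the `realize` and `trace` functions, and the identifiers such as `event:E1` are hypothetical and are not part of Wikidata, Wikifunctions, or any Abstract Wikipedia code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    text: str    # realized natural-language text
    source: str  # hypothetical identifier of the data item behind the text
    rule: str    # hypothetical name of the template or function that emitted it


def realize(fragments):
    """Join fragments with single spaces and record each fragment's character span."""
    parts, spans, pos = [], [], 0
    for frag in fragments:
        if parts:
            pos += 1  # account for the joining space
        start, end = pos, pos + len(frag.text)
        spans.append((start, end, frag))
        parts.append(frag.text)
        pos = end
    return " ".join(parts), spans


def trace(spans, index):
    """Return the fragment covering a character index, or None for a joining space."""
    for start, end, frag in spans:
        if start <= index < end:
            return frag
    return None


# Toy input; the identifiers are placeholders, not real Wikidata IDs.
fragments = [
    Fragment("The event", "event:E1", "subject_np"),
    Fragment("began in", "property:start_time", "verb_phrase"),
    Fragment("1900.", "value:year", "year_literal"),
]
sentence, spans = realize(fragments)
hit = trace(spans, sentence.index("1900"))
print(sentence)
print(hit.source, hit.rule)
```

A real system would presumably attach richer records (stack traces, revision IDs of the data), but the span-to-record mapping is the part that lets an editor select text and jump back to its origin.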


Cory Massaro

17 Nov 2022

7:49 p.m.

Hi Adam!

This is a really interesting set of questions; for me, these are the most important discussions to have around Abstract Wikipedia. I'll clarify whether I'm answering from the perspective of what the team's currently doing or my own personal views and desires.

First, there's a natural tension between the AW project and the notion of neutral POV. Abstract Wikipedia is an intrinsically pluralistic project, amplifying a given linguistic community's ability to represent their point of view. Neutral POV is intrinsically hegemonic, as "neutrality" is decided (even if implicitly) by the relative loudness of voices, and so usually comes down to what's said in a small number of languages with many, or highly privileged, speakers.

> Without a means of controlling and varying the subjectivities of output
> stories and language, shouldn't one desire for the output to be as
> measurably objective as possible?

WMF perspective: presumably, all information in any Wiki project is conveyed from a neutral point of view. This policy is what's usually invoked to combat misinformation, say when a corporate entity or political group brigades a given Wiki and makes highly biased edits.

> I am thinking about whether resultant machine-generated stories would be
> objectively or subjectively narrated. These topics appear to pertain to the
> philosophy of history [1] and neutrality [2], resembling encyclopedists'
> ideals of neutrality with respect to point of view [3].

Personal note: I did a little cross-linguistic literature review of Wikis a while ago, which I'd be happy to share with you. There are real differences of perspective in many domains. Some obvious differences concerned contested territories, actively occupied territories, etc. Others had to do with historical events and political philosophies; some highlights would be the English Wikipedia article on "Communism" and its Spanish equivalent, or a comparison between the Arabic Wikipedia's article on the crusades and that in English or French.

What would be really interesting here (speaking for myself) would be to augment these articles. Arguably, the most "objective" thing one can do would be to represent diversity (while trying to avoid misinformation ... somehow ;) ). So what would it look like if Wikipedia articles told history from multiple perspectives, and Abstract Wikipedia allowed us to gather those multiple perspectives and add them to the articles in multiple Wikis?

> What do you think about the idea that natural-language story generating
> systems could use parameters or additional inputs to vary the
> subjectivities of the output?

Continuing from the above, and still speaking for myself: it would be interesting to adopt primitives from epistemic logic in Wikidata. So Wikidata could represent not just a fact, but the perspective from which that fact is true. This would require a real organized effort (or a really good polyglot data-mining ML), but it's a way that Abstract Wikipedia could maintain "neutrality" while also admitting some level of subjectivity. And again, there are a ton of dangers here, like legitimizing misinformed (or downright malicious) perspectives. Maybe we then need a "truth score," like in neutrosophic logic <http://fs.unm.edu/IntrodNeutLogic.pdf>? I don't know.

Concretely, I think that something in this area would be a great vertical to focus on as a pilot for Abstract Wikipedia!

> What do you think about providing the capability for developers to be able
> to trace backwards from natural-language outputs (from words, phrases,
> sentences, and paragraphs) into source code and data? Developers would,
> then, be able to more readily version software and data utilizing metrics
> and evaluation tools, e.g., Grammarly or sentiment analysis. In theory,
> systems could provide accompanying "debugging data" alongside
> natural-language outputs, this data including mappings from selections of
> natural language, wikitext, or hypertext to stack traces or other data
> structures.

This is part of why Abstract Wikipedia avoids machine learning. As presently conceived, every step of the NLG process can be inspected. We don't yet package this "debugging data" up nicely; however, it would absolutely be possible to do so, and it might have applications in preventing abuse: presumably, if somebody misuses Abstract Wikipedia to produce fake news, this metadata could provide a signal to automate detection and remediation. I'm not an expert on that, though; the disinformation folks at the WMF would know more.

Happy to chat further!

Best,
Cory
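One way to picture the suggestion above (a fact stored together with the perspective from which it holds, plus some kind of "truth score") is the following Python sketch. It is only an illustration under stated assumptions: the `Claim` type, its field names, and the `render` function are hypothetical and do not describe the actual Wikidata data model; the (t, i, f) triple follows the neutrosophic-logic idea of separate degrees of truth, indeterminacy, and falsity, and treating each as a float in [0, 1] is an assumption of this sketch. The sample data is invented placeholder content.

```python
from dataclasses import dataclass
from collections import defaultdict


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    value: str
    perspective: str  # hypothetical field: who holds this claim to be true
    t: float          # degree of truth (assumed to lie in [0, 1])
    i: float          # degree of indeterminacy (assumed to lie in [0, 1])
    f: float          # degree of falsity (assumed to lie in [0, 1])


def by_perspective(claims, subject, predicate):
    """Group the claims about one subject and predicate by perspective."""
    grouped = defaultdict(list)
    for claim in claims:
        if claim.subject == subject and claim.predicate == predicate:
            grouped[claim.perspective].append(claim)
    return dict(grouped)


def render(claims, subject, predicate):
    """Produce one attributed sentence per claim, ordered by perspective name."""
    lines = []
    for perspective, items in sorted(by_perspective(claims, subject, predicate).items()):
        for c in items:
            lines.append(
                f"According to {perspective}, {c.subject} {c.predicate} {c.value} "
                f"(t={c.t}, i={c.i}, f={c.f})."
            )
    return lines


# Invented placeholder data, not drawn from any real dataset.
claims = [
    Claim("Territory X", "is administered by", "State B", "Source 2", 0.6, 0.3, 0.2),
    Claim("Territory X", "is administered by", "State A", "Source 1", 0.7, 0.2, 0.1),
]
for line in render(claims, "Territory X", "is administered by"):
    print(line)
```

The sketch deliberately renders every perspective with attribution instead of picking a winner; how the scores would be assigned, and by whom, is exactly the open question raised in the message.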

Douglas Clark

8:18 p.m.

Adam and Cory,

This exact topic is one reason why I proposed building a paraphrase graph. While machine learning is a bit of a black box, the resulting graph would be human-readable, and early in the project that readability would be required, as paraphrase detection is only about 90% accurate. We will need to do better over time (see Heaps' law).

With a full graph of English Wikipedia, for instance, an operator could invoke multiple pathways through the graph, and with Wikidata metadata for each node, new history and even connection stories could be told (akin to James Burke's *Connections* show). All of the pathways would reflect whatever level of neutrality is present in the Wiki. As you move out of a Wikipedia paraphrase graph into a larger web paraphrase graph, the base "ground truth" of the Wikipedia graph would be a guide to assessing misinformation as well as the amount of objectivity.

We will never know our combined communication structure until we build a paraphrase graph. We know that Zipf's law holds to 9 orders of magnitude for phrases, thus the network effect is present in how we communicate with each other. What better way to tell our story in different ways than to use the source material of us communicating knowledge to each other?

Doug
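The notion of "multiple pathways through the graph" can be sketched as enumerating the simple (cycle-free) paths between two nodes with a depth-first search. The Python below is a toy illustration only: the adjacency-dict representation, the `simple_paths` function, and the node labels are assumptions made for this example, and nothing here reflects how an actual paraphrase graph would be built or stored.

```python
def simple_paths(graph, start, goal, path=None):
    """Yield every cycle-free path from start to goal using depth-first search."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbor in graph.get(start, []):
        if neighbor not in path:  # skip nodes already on the current path
            yield from simple_paths(graph, neighbor, goal, path)


# Hypothetical graph: keys are phrase nodes, values are nodes they link to.
graph = {
    "a": ["b", "c"],
    "b": ["d", "a"],  # the edge back to "a" forms a cycle that the search skips
    "c": ["d"],
    "d": [],
}

for route in simple_paths(graph, "a", "d"):
    print(" -> ".join(route))
```

Each yielded route is one candidate "story" ordering; on a graph the size of a full Wikipedia, exhaustive enumeration would not be feasible, so some bound on path length or a ranking of edges would be needed.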

Adam Sobieski

18 Nov 2022

2:48 a.m.

Cory, All,

It is good to see that others are also interested in and thinking about these topics. It is also good to see that eventual debugging and fine-tuning experiences for developers, editors, and end-users are presently being considered, experiences which involve being able to inspect and select portions of system output to trace back into source code and data to make improvements.

It appears that controlling subjectivities and ensuring objectivity are relevant throughout the stages of natural-language generation: content determination, document structuring, aggregation, lexical choice, referring expression generation, and realization.

I had been thinking about subjectivities and style with respect to lexical choice, referring expression generation, and realization (choosing the specific nouns, verbs, adjectives, and adverbs), and I agree with your point about content determination: encyclopedia-article-generating systems could enhance objectivity by choosing to present content from multiple perspectives.

Yes, I would be interested in more information about your cross-linguistic literature review of Wikis.

Thank you for sharing that publication about neutrosophic logic, which describes every logical variable x as being "described by a triple: x = (t, i, f), where t is the degree of truth, f is the degree of false, and i is the level of indeterminacy."

Best regards,
Adam
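As a small illustration of how a parameter could vary subjectivity at the lexical-choice stage specifically, the Python sketch below picks a verb for the same underlying event according to a `tone` argument. The `LEXICON` table, the tone names, and the `realize_event` function are all hypothetical and chosen for this example; the sketch does not describe any existing Abstract Wikipedia or Wikifunctions component.

```python
# Hypothetical lexicon: one abstract concept, several surface verbs by tone.
LEXICON = {
    "defeat": {
        "neutral": "defeated",
        "favorable": "triumphed over",
        "unfavorable": "crushed",
    },
}


def realize_event(agent, concept, patient, tone="neutral"):
    """Render an (agent, concept, patient) event, choosing the verb by tone.

    Unknown tones fall back to the neutral variant, so the default output
    stays as plain as the lexicon allows.
    """
    variants = LEXICON[concept]
    verb = variants.get(tone, variants["neutral"])
    return f"{agent} {verb} {patient}."


for tone in ("neutral", "favorable", "unfavorable"):
    print(tone, "->", realize_event("Army A", "defeat", "Army B", tone=tone))
```

Because the data passed in is identical for every tone, any difference in the output is attributable to the parameter alone, which is what would make such variation measurable with external tools such as sentiment analysis.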


abstract-wikipedia@lists.wikimedia.org
