Hi Adam!
This is a really interesting set of questions; for me, these are the most
important discussions to have around Abstract Wikipedia. I'll clarify
whether I'm answering from the perspective of what the team's currently
doing or my own personal views and desires.
First, there's a natural tension between the AW project and the notion of
neutral POV. Abstract Wikipedia is an intrinsically pluralistic project,
amplifying a given linguistic community's ability to represent their point
of view. Neutral POV is intrinsically hegemonic, as "neutrality" is decided
(even if implicitly) by the relative loudness of voices, and so usually
comes down to what's said in a small number of languages with many, or
highly privileged, speakers.
> Without a means of controlling and varying the subjectivities of output
> stories and language, shouldn't one desire for the output to be as
> measurably objective as possible?
WMF perspective: presumably, all information in any Wiki project is
conveyed from a neutral point of view. This policy is what's usually
invoked to combat misinformation, say when a corporate entity or political
group brigades a given Wiki and makes highly biased edits.
> I am thinking about whether resultant machine-generated stories would be
> objectively or subjectively narrated. These topics appear to pertain to the
> philosophy of history [1] and neutrality [2], resembling encyclopedists'
> ideals of neutrality with respect to point of view [3].
Personal note: I did a little cross-linguistic literature review of Wikis
a while ago, which I'd be happy to share with you. There are real
differences of perspective in many domains. Some obvious differences
concerned contested territories, actively occupied territories, etc. Others
had to do with historical events and political philosophies; some
highlights would be the English Wikipedia article on "Communism" and its
Spanish equivalent, or a comparison between the Arabic Wikipedia's article
on the Crusades and that in English or French.
What would be really interesting here (speaking for myself) would be to
augment these articles. Arguably, the most "objective" thing one can do
would be to represent diversity (while trying to avoid misinformation ...
somehow ;) ). So what would it look like if Wikipedia articles told history
from multiple perspectives, and Abstract Wikipedia allowed us to gather
those multiple perspectives and add them to the articles in multiple Wikis?
> What do you think about the idea that natural-language story generating
> systems could use parameters or additional inputs to vary the
> subjectivities of the output?
Continuing from the above, and still speaking for myself: it would be
interesting to adopt primitives from epistemic logic in Wikidata. So
Wikidata could represent not just a fact, but the perspective from which
that fact is true. This would require a real organized effort (or a really
good polyglot data-mining ML), but it's a way that Abstract Wikipedia could
maintain "neutrality" while also admitting some level of subjectivity. And
again, there are a ton of dangers here, like legitimizing misinformed (or
downright malicious) perspectives. Maybe we then need a "truth score," like
in neutrosophic logic <http://fs.unm.edu/IntrodNeutLogic.pdf>?
I don't know. Concretely, I think that something in this area would be a
great vertical to focus on as a pilot for Abstract Wikipedia!
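To make the "fact plus perspective" idea a bit more concrete, here is a
minimal sketch in Python. Everything in it is an assumption made for
illustration: the `Claim` record, its `perspective` field, and the
`render` function are hypothetical names, not part of Wikidata's data
model or of anything the team has built, and the data is a placeholder.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A statement together with the perspective that asserts it (hypothetical)."""
    subject: str
    predicate: str
    value: str
    perspective: str  # who holds this claim to be true


def group_by_perspective(claims, subject, predicate):
    """Map each asserted value to the sorted list of perspectives asserting it."""
    views = defaultdict(list)
    for claim in claims:
        if claim.subject == subject and claim.predicate == predicate:
            views[claim.value].append(claim.perspective)
    return {value: sorted(who) for value, who in views.items()}


def render(claims, subject, predicate):
    """Render one sentence; attribute values to perspectives only when they differ."""
    views = group_by_perspective(claims, subject, predicate)
    if not views:
        return ""
    if len(views) == 1:
        (value,) = views
        return f"{subject} {predicate} {value}."
    parts = [
        f"{value} (according to {', '.join(who)})"
        for value, who in sorted(views.items())
    ]
    return f"{subject} {predicate}: " + "; ".join(parts) + "."


if __name__ == "__main__":
    # Placeholder data; no real territory or source is implied.
    claims = [
        Claim("Territory X", "is administered by", "State A", "Source 1"),
        Claim("Territory X", "is administered by", "State B", "Source 2"),
        Claim("Territory X", "is administered by", "State A", "Source 3"),
    ]
    print(render(claims, "Territory X", "is administered by"))
```

When all perspectives agree, the sentence reads as a plain fact; when they
disagree, each value is attributed. A "truth score" could be one more field
on the record, but how to assign it is exactly the open question above.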
> What do you think about providing the capability for developers to be able
> to trace backwards from natural-language outputs (from words, phrases,
> sentences, and paragraphs) into source code and data? Developers would,
> then, be able to more readily version software and data utilizing metrics
> and evaluation tools, e.g., Grammarly or sentiment analysis. In theory,
> systems could provide accompanying “debugging data” alongside
> natural-language outputs, this data including mappings from selections of
> natural language, wikitext, or hypertext to stack traces or other data
> structures.
This is part of why Abstract Wikipedia avoids machine learning. As
presently conceived, every step of the NLG process can be inspected. We
don't yet package this "debugging data" up nicely; however, it would
absolutely be possible to do so, and might have applications to prevent
abuse--presumably, if somebody misuses Abstract Wikipedia to produce fake
news, this metadata could provide signal to automate detection and
remediation. I'm not an expert on that, though; the disinformation folks at
the WMF would know more.
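As a rough illustration of what packaged "debugging data" might look like,
here is a small Python sketch. It is not how Abstract Wikipedia does or
will do this: the `Span` record, the `producer` and `source` fields, and
the function names are all hypothetical. The idea is only that each piece
of output text remembers which function produced it and from which data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One fragment of output text plus where it came from (hypothetical)."""
    start: int     # offset of the first character in the output
    end: int       # offset one past the last character
    text: str
    producer: str  # name of the function that rendered this fragment
    source: str    # identifier of the data item the fragment was built from


def render_with_trace(fragments):
    """Join (text, producer, source) fragments; return the output and its spans."""
    pieces, spans, position = [], [], 0
    for text, producer, source in fragments:
        spans.append(Span(position, position + len(text), text, producer, source))
        pieces.append(text)
        position += len(text)
    return "".join(pieces), spans


def trace(spans, offset):
    """Return the span covering a character offset, or None if there is none."""
    for span in spans:
        if span.start <= offset < span.end:
            return span
    return None


if __name__ == "__main__":
    # Placeholder fragments; the producer and source names are made up.
    output, spans = render_with_trace([
        ("Ada ", "render_subject", "item:1"),
        ("wrote ", "render_verb", "item:2"),
        ("notes.", "render_object", "item:3"),
    ])
    hit = trace(spans, output.index("wrote"))
    print(output)
    print(hit.producer, hit.source)
```

Selecting any character in the output then leads back to one producer and
one data item, which is the kind of mapping the question describes.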
Happy to chat further!
Best,
Cory
On Thu, Nov 17, 2022 at 6:04 AM Adam Sobieski <adamsobieski(a)hotmail.com>
wrote:
> Abstract Wikipedia,
>
>
>
> Hello. I am recently thinking about the generation of natural-language
> stories from Wikidata data, e.g., graphs of interrelated real-world
> historical events.
>
> I am thinking about whether resultant machine-generated stories would be
> objectively or subjectively narrated. These topics appear to pertain to the
> philosophy of history [1] and neutrality [2], resembling encyclopedists'
> ideals of neutrality with respect to point of view [3].
>
> In my opinion, there would be much to learn from developing
> natural-language story generating systems which could have parameters set
> or which could receive secondary input data to subsequently produce
> subjective stories. With such systems, developers could control and vary
> the subjectivities of resultant natural-language output, e.g., as
> pertaining to sentiment.
>
> What do you think about the idea that natural-language story generating
> systems could use parameters or additional inputs to vary the
> subjectivities of the output?
>
> Without a means of controlling and varying the subjectivities of output
> stories and language, shouldn't one desire for the output to be as
> measurably objective as possible?
>
> What do you think about providing the capability for developers to be able
> to trace backwards from natural-language outputs (from words, phrases,
> sentences, and paragraphs) into source code and data? Developers would,
> then, be able to more readily version software and data utilizing metrics
> and evaluation tools, e.g., Grammarly or sentiment analysis. In theory,
> systems could provide accompanying “debugging data” alongside
> natural-language outputs, this data including mappings from selections of
> natural language, wikitext, or hypertext to stack traces or other data
> structures.
>
> Best regards,
>
> Adam Sobieski
>
> [1] https://en.wikipedia.org/wiki/Philosophy_of_history
>
> [2] https://en.wikipedia.org/wiki/Philosophy_of_history#Philosophy_of_neutrality
>
> [3] https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
>
> _______________________________________________
> Abstract-Wikipedia mailing list -- abstract-wikipedia(a)lists.wikimedia.org
> List information:
> https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikime…
>