I came at this problem from an enterprise work management perspective.
Using first
principles reasoning, all collaborative work is initiated, managed and
terminated by communication. Thus, each large enterprise has a
differentiated language field that can be used for risk management, work
forecasting and modeling, and innovation discovery and nurturing. Our
highly abstracted work management paradigm of today is ill suited for the
coming massive programs to address living on a viable earth.
On Tue, Oct 5, 2021 at 9:41 PM Adam Sobieski <adamsobieski(a)hotmail.com>
wrote:
On these topics, I also found articles about the related subjects of
“language change” [7] and “historical linguistics” [8].
In these regards, understanding languages to be organic and dynamic, I
think that we scientists and scholars should, over the course of time,
measure and evaluate the outputs of large-scale NLG systems, e.g., Abstract
Wikipedia, to compare these generated corpora to other human- and
machine-generated corpora. For instance, how might various measurements of
the English generated by Abstract Wikipedia compare to measurements of the
English Wikipedia as a corpus?
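As one hedged illustration of what such corpus measurements might look like, the sketch below computes a few coarse statistics (token count, vocabulary size, type-token ratio, most frequent words) over placeholder strings; a real comparison would load, say, Abstract Wikipedia output and an English Wikipedia dump, and would use far richer measures.

```python
# Illustrative sketch only: the two "corpora" below are tiny placeholder
# strings, not real Abstract Wikipedia or English Wikipedia text.
import re
from collections import Counter

def corpus_stats(text: str) -> dict:
    """Compute a few coarse corpus measurements."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
        "top_words": counts.most_common(5),
    }

generated = "the city is large . the city has a river ."
human = "the sprawling city straddles a broad river that floods in spring ."

print(corpus_stats(generated))
print(corpus_stats(human))
```

Comparing such profiles across generated and human-written corpora over time is one simple way to track whether NLG output drifts away from, or converges toward, human usage.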
It is interesting to consider which varieties of measurements,
evaluations, and analytics will be useful for NLG development teams –
measuring objectivity and subjectivity is but one topic of many. It is
interesting to consider how NLG development teams will best go about
utilizing these measurements, evaluations, and analytics when versioning
documents, software logic, and data to hone and to finetune system outputs.
Best regards,
Adam
[7]
https://en.wikipedia.org/wiki/Language_change
[8]
https://en.wikipedia.org/wiki/Historical_linguistics
*From: *Adam Sobieski <adamsobieski(a)hotmail.com>
*Sent: *Tuesday, October 5, 2021 3:37 PM
*To: *General public mailing list for the discussion of Abstract
Wikipedia and Wikifunctions <abstract-wikipedia(a)lists.wikimedia.org>
*Subject: *[Abstract-wikipedia] Re: Objectivity and Subjectivity in
Computational Historical Narration
Doug,
Thank you. Technologies like BERT, GPT-3, and Lamda [1] are, indeed,
impressive and it is also interesting to consider what might be on the
horizon in AI.
The thoughts on language that you shared remind me of “social
constructionism” [2] and “lexical entrainment” [3]. You also describe
language as dynamic and “relative in time and location”, which reminds me of
philology [4] and cognitive philology [5].
On the topics of computational historical narration – generating objective
history-educational documents from structured (wiki)data – my view is
that we can measure subjectivity and objectivity increasingly well,
e.g., with frame analysis and sentiment analysis, and that these
measurements and evaluations can be of use for training AI, for
implementing NLG algorithms, and for implementing NLU technologies,
applications of NLU including word-processing-related tools, products, and
services (see also: [6]).
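A minimal sketch of lexicon-based subjectivity measurement follows. The tiny marker lexicon is purely illustrative, not a real resource; production systems would use curated lexicons or trained models (e.g., for frame analysis or sentiment analysis), but the scoring idea is the same.

```python
# Hypothetical marker lexicon for illustration only -- a real system would
# use a curated subjectivity lexicon or a trained classifier.
SUBJECTIVE_TERMS = {"tragic", "glorious", "heroic", "disastrous", "brilliant"}

def subjectivity_score(sentence: str) -> float:
    """Fraction of tokens that are flagged as subjectivity markers."""
    tokens = sentence.lower().replace(".", "").split()
    if not tokens:
        return 0.0
    flagged = sum(1 for t in tokens if t in SUBJECTIVE_TERMS)
    return flagged / len(tokens)

print(subjectivity_score("The treaty was signed in 1648."))         # 0.0
print(subjectivity_score("The glorious victory was truly heroic."))
```

Scores like this, aggregated over a generated historical narrative, could feed back into training or into versioned evaluations of NLG outputs.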
Best regards,
Adam
[1]
https://blog.google/technology/ai/lamda/
[2]
https://en.wikipedia.org/wiki/Social_constructionism
[3]
https://en.wikipedia.org/wiki/Lexical_entrainment
[4]
https://en.wikipedia.org/wiki/Philology
[5]
https://en.wikipedia.org/wiki/Cognitive_philology
[6]
https://github.com/w3c/document-services
*From: *Douglas Clark <clarkdd(a)gmail.com>
*Sent: *Tuesday, October 5, 2021 11:52 AM
*To: *General public mailing list for the discussion of Abstract
Wikipedia and Wikifunctions <abstract-wikipedia(a)lists.wikimedia.org>
*Subject: *[Abstract-wikipedia] Re: Objectivity and Subjectivity in
Computational Historical Narration
Adam,
I like to think of language as a field [1]. Each discrete meaning occupies
a position within the field. Each discrete meaning is a concept (not word)
that can be conveyed with any word grouping as long as the meaning is the
same. A concept that appears the same from a word structure, but is
different with context is a separate meaning. “Bring me the server,” can
refer to a restaurant setting as well as an IT setting. They are two very
separate meanings. Even at the most abstracted level, one is to bring an
object, and the other to bring a person. With use, meanings accrete influence
following Zipf’s inverse power law [2]. Context is a binding force within
the field. A language field is inert and completely objective until acted
upon by an observer. Each communication updates the field meanings and
modifies the contextual binding force. Fields nest from a person’s, to a
family’s language field to a field of legal jargon to fields of slang to
the field of all human communication. They all follow Zipf’s Law and they
all use context to limit meaning choices to drive understanding.
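The rank-frequency shape that Zipf's law predicts (frequency roughly proportional to 1/rank) can be checked on any token stream; the sketch below uses an invented toy frequency distribution purely to show the mechanics, where a real check would fit a power law to a large corpus.

```python
# Toy illustration of Zipf's law: under frequency ~ 1/rank, the product
# freq * rank should stay roughly constant across ranks. The token counts
# below are invented to approximate that shape.
from collections import Counter

tokens = (["the"] * 100 + ["of"] * 50 + ["language"] * 33 +
          ["field"] * 25 + ["context"] * 20)
ranked = Counter(tokens).most_common()

for rank, (word, freq) in enumerate(ranked, start=1):
    print(rank, word, freq, freq * rank)  # products hover near 100
```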
When observer A wants to communicate their subjective reality, the context
they set within the shared language field constrains the possible direction
of the communication. For centuries, pizza probably had zero contextual
relationship to pedophilia, yet now has a fairly strong contextual
binding. A discussion about taking a walk has massive contextual binding to
meanings with outdoor settings. Conversely, a conversation is not very
likely to transition next to the topic of SpaceX’s upcoming launch with an
initial context of making baked beans. It’s much more likely to include
topics such as fire up the BBQ and get the corn bread out of the oven.
So while the initial observer’s selection of context is driven by a
subjective input, the attraction (I think of it as meaning gravity) of high
use meanings (Zipf’s Law) and contextual constraints for follow-on topics
morphs communication toward objectivity among the participating observers.
For any outside observer, the communication would still be perceived as
subjective, since they did not participate in the communication by setting
their own contexts. However, if a linguist came along 100 years later, the
communication would appear as an objective documentation of the
communication event. So, language use is also relative in time and location.
So that’s a bunch of stuff to say that using machine learning to generate
language will be as objective as the field(s) ingested for both training
and building the model. If the ML uses teen slang for training, it will not
perform well for aerospace uses, but it will objectively represent the
current state of teen slang. Observers constrained to the contexts of teen
slang would find it exceedingly difficult to use the ML products for
microbiology. Further, if the field was of 1950’s teen slang, today’s teens
would find little use in any ML product.
Most training datasets in use today, including GPT-3’s, use huge corpora of
words and word n-grams. Context is set by word distances. Those rules do not
represent the language field I have described. The rules are abstracted and
artificial. I do not choose language by paying attention to word spacing.
Observers do not communicate with words, they communicate with concepts in
contexts. That’s why word choice many times does not matter, if the concept
and the context are the same. Shakespeare got it:
*“What's in a name? That which we call a rose*
* By any other word would smell as sweet..."*
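The "Bring me the server" example above can be sketched as context-driven sense resolution. The cue sets and the two senses below are invented for illustration; a real field model would derive such bindings from usage rather than a hand-written table.

```python
# Hypothetical context cues for the two senses of "server" discussed above.
# Invented for illustration; real contextual bindings would be learned.
CONTEXT_CUES = {
    "restaurant": {"menu", "table", "kitchen", "tip"},
    "it": {"rack", "datacenter", "reboot", "network"},
}

def server_sense(surrounding: str) -> str:
    """Pick the sense of 'server' whose context cues best match the
    surrounding communication."""
    words = set(surrounding.lower().split())
    scores = {sense: len(words & cues) for sense, cues in CONTEXT_CUES.items()}
    return max(scores, key=scores.get)

print(server_sense("the rack in the datacenter needs a reboot"))  # "it"
print(server_sense("ask the kitchen and check the menu"))         # "restaurant"
```

The point of the sketch is the same as the thread's: the resolving signal is the concept-in-context, not the surface word or its distance to neighbors.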
That’s why I advocate for the building of the universal language field.
The only way to objectively generate language for all observers is to
build a 99.9999% near real time complete field of human communication and
then let context metadata resolve the language to the appropriate sub
field(s). We have language fields going back to at least 3100 BCE. ML is
great at using categories (metadata) to parse data into meaningful models.
For example, there are indications that communication that includes
long distances between concepts (with respect to the field), or that has
low context bindings, may indicate the presence of misinformation. We would
see that instantly if we had ML searching for communication meeting those
criteria. It could also signal a breakthrough in some human endeavor as
connecting seemingly unrelated things is a hallmark of innovation. We could
use algorithms like Polya’s Urn to help identify innovating communication.
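The Pólya urn process mentioned above is simple to simulate: each draw reinforces the drawn color, so early random choices can lock in dominant "meanings" over time, the same rich-get-richer dynamic behind Zipfian distributions. The parameters below are arbitrary and only illustrate the mechanism.

```python
# Sketch of a Polya urn: draw a ball with probability proportional to its
# color's count, then return it along with one more of the same color.
import random

def polya_urn(steps: int, seed: int = 0) -> dict:
    random.seed(seed)  # fixed seed so the run is reproducible
    urn = {"red": 1, "blue": 1}  # start with one ball of each color
    for _ in range(steps):
        pick = random.choices(list(urn), weights=list(urn.values()))[0]
        urn[pick] += 1  # reinforcement: add another ball of the drawn color
    return urn

print(polya_urn(1000))  # one color typically comes to dominate
```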
Lastly, we have dictionaries, we have encyclopedias, we have thesauri, but
we do not have a reference of human communication. We couldn’t before
because the curation was beyond a lifetime of effort. We can now.
Doug
[1]
https://en.wikipedia.org/wiki/Vector_field
[2]
https://en.wikipedia.org/wiki/Zipf%27s_law
On Wed, Sep 29, 2021 at 7:09 PM Adam Sobieski <adamsobieski(a)hotmail.com>
wrote:
Wikidata,
Abstract Wikipedia,
Hello. I am recently thinking about objectivity and subjectivity with
respect to natural language generation, in particular in the contexts of
story generation using historical data [1][2].
In the near future, digital humanities scholars – in particular historians
– could modify collections of data and finetune generation-related
parameters, watching as resultant multimodal historical narratives emerged
and varied. In these regards, we can envision both computer-aided and
automated historical narrative generation tools and technologies.
Could AI be a long-sought objective narrator for historians? Is all
narration, or all language use, inherently subjective? What might the
nature of “generation-related parameters” and “finetuning” be for style and
subjectivity [3][4][5][6][7][8] when generating natural language and
multimodal historical narratives from historical data [1][2]?
Thank you. Hopefully, these topics are interesting.
Best regards,
Adam Sobieski
[1] Metilli, Daniele, Valentina Bartalesi, and Carlo Meghini. "A
Wikidata-based tool for building and visualising narratives." International
Journal on Digital Libraries 20, no. 4 (2019): 417-432.
[2] Metilli, Daniele, Valentina Bartalesi, Carlo Meghini, and Nicola
Aloia. "Populating narratives using Wikidata events: An initial
experiment." In Italian Research Conference on Digital Libraries, pp.
159-166. Springer, Cham, 2019.
[3]
https://en.wikipedia.org/wiki/Subjectivity
[4]
https://en.wikipedia.org/wiki/Objectivity_(philosophy)
[5]
https://en.wikipedia.org/wiki/Political_subjectivity
[6]
https://en.wikipedia.org/wiki/Framing_(social_sciences)
[7]
https://en.wikipedia.org/wiki/Focalisation
[8]
https://en.wikipedia.org/wiki/Point_of_view_(philosophy)
_______________________________________________
Abstract-Wikipedia mailing list -- abstract-wikipedia(a)lists.wikimedia.org
List information:
https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikime…