By analogy with multipart MIME [1] and parallel markup in MathML3 [2][3], we can envision envelopes that contain multiple representations, each describing the content, structure, or meaning of a portion of natural language.

 

An envelope could, for instance, contain a text version of a sentence, a parse tree of that sentence, and a model of its semantic meaning:

 

<envelope>

  <part type="text/plain" lang="en-US">Bobby hit the ball.</part>

  <part type="application/parse+xml" lang="en-US">

    <s id="…" xmlns="…">

      <np id="…">Bobby</np>

      <vp id="…">

        <v id="…">hit</v>

        <np id="…">

          <d id="…">the</d>

          <n id="…">ball</n>

        </np>

      </vp>

    </s>

  </part>

  <part type="application/calculus+xml">

    …

  </part>

</envelope>
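
As a rough illustration of how such an envelope might be consumed, here is a minimal Python sketch using the standard library's ElementTree. The id values are invented for the example, the namespace is left out because it is not specified above, and the function name parts_by_type is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical envelope: the ids are placeholders chosen for this sketch,
# and the namespace declaration is omitted because none is defined yet.
ENVELOPE = """
<envelope>
  <part type="text/plain" lang="en-US">Bobby hit the ball.</part>
  <part type="application/parse+xml" lang="en-US">
    <s id="s1">
      <np id="np1">Bobby</np>
      <vp id="vp1">
        <v id="v1">hit</v>
        <np id="np2"><d id="d1">the</d><n id="n1">ball</n></np>
      </vp>
    </s>
  </part>
</envelope>
"""


def parts_by_type(envelope_xml):
    """Return a dict mapping each part's media type to its <part> element."""
    root = ET.fromstring(envelope_xml)
    return {part.get("type"): part for part in root.findall("part")}


parts = parts_by_type(ENVELOPE)

# The plain-text representation of the sentence.
text = parts["text/plain"].text

# The words of the sentence, read off the leaf elements of the parse tree.
leaves = [
    el.text.strip()
    for el in parts["application/parse+xml"].iter()
    if len(el) == 0 and el.text and el.text.strip()
]
```

Keying the parts by media type mirrors how a multipart MIME consumer picks the representation it understands and ignores the rest.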

 

The content in one part could cross-reference or link to specific content in other parts of the same envelope. One way to accomplish this, for XML-based formats, would be to use XML id and xref attributes. Another would be to make the content addressable, using XML id attributes for XML-based formats or otherwise ensuring it is URI-addressable for other formats, and then to include a part that interrelates the content of the other parts, for instance with RDF.
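
The id and xref approach could be sketched roughly as follows. This is only an illustration under assumptions: the pred and arg elements in the semantic part, the id values, and the function name resolve_xrefs are all hypothetical, since no vocabulary for that part has been proposed.

```python
import xml.etree.ElementTree as ET

# Hypothetical envelope: <pred> and <arg> are invented element names used
# only to show a semantic part pointing back into the parse part.
ENVELOPE = """
<envelope>
  <part type="application/parse+xml">
    <s id="s1"><np id="np1">Bobby</np><vp id="vp1"><v id="v1">hit</v></vp></s>
  </part>
  <part type="application/calculus+xml">
    <pred id="p1" xref="v1"><arg id="a1" xref="np1"/></pred>
  </part>
</envelope>
"""


def resolve_xrefs(envelope_xml):
    """Map the id of each element carrying an xref to the element it targets.

    A dangling xref (no element with that id in the envelope) maps to None.
    """
    root = ET.fromstring(envelope_xml)
    # Index every element in the envelope that has an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    links = {}
    for el in root.iter():
        target = el.get("xref")
        if target is not None:
            links[el.get("id")] = by_id.get(target)
    return links


links = resolve_xrefs(ENVELOPE)
```

Because ids are indexed across the whole envelope, a reference in one part resolves to content in any other part, which is the behavior described above.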

 

Is there any interest in envelopes which can contain multiple, interrelated representations of the content, structure and meaning of portions of natural language?

 

 

Best regards,

Adam Sobieski

 

[1] https://en.wikipedia.org/wiki/MIME#Multipart_messages

[2] https://www.w3.org/TR/MathML3/chapter5.html

[3] https://www.w3.org/TR/MathML3/chapter5.html#mixing.parallel

 

From: Tiago Timponi Torrent
Sent: Monday, July 6, 2020 7:32 PM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda)
Subject: Re: [Abstract-wikipedia] Knowledge Representation for Natural Language Generation

 

Hi, Adam

 

This is very similar to how Berkeley Construction Grammar, Sign-Based Construction Grammar and Embodied Construction Grammar conceive of the representation of linguistic knowledge, with some variation depending on the specific type of CxG.

 

Cheers

 

Tiago

 

On Mon, Jul 6, 2020 at 19:43, Adam Sobieski <adamsobieski@hotmail.com> wrote:

I would like to move this discussion to a new thread for those interested in knowledge representation for natural language generation.

 

Something which interested me about UNL when it was recently brought to my attention was that it expands upon predicate calculus by providing the expressiveness for placing attributes upon objects and relations. I presumed that one could also place them upon expressions (@a4, @a8, @a13) and upon sets of expressions as well (@a14).

 

UNL’

 

{

  r1.@a1(o1(icl>domain1).@a2, o2(icl>domain2).@a3).@a4

  r2.@a5(o3(icl>domain3).@a6, o4(icl>domain4).@a7).@a8

  r3.@a9(o5(icl>domain5).@a10, o6(icl>domain6).@a11, o7(icl>domain7).@a12).@a13

}.@a14

 

I then considered that, beyond attributes, one could place attribute-value pairs upon objects, relations, expressions and sets of expressions.

 

Something New

 

{

  r1.[@a1=v1](o1.[@a2=v2], o2.[@a3=v3]).[@a4=v4]

  r2.[@a5=v5](o3.[@a6=v6], o4.[@a7=v7]).[@a8=v8]

  r3.[@a9=v9](o5.[@a10=v10], o6.[@a11=v11], o7.[@a12=v12]).[@a13=v13]

}.[@a14=v14]
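
To make the idea concrete, here is a minimal Python sketch of a data structure that carries attribute-value pairs on objects, relations, expressions and sets of expressions, and prints them in the notation above. The class names Term, Expression and ExpressionSet are hypothetical and are not part of UNL.

```python
from dataclasses import dataclass, field


def _fmt(attrs):
    """Render attribute-value pairs in the .[@name=value] notation."""
    return "".join(f".[@{k}={v}]" for k, v in attrs.items())


@dataclass
class Term:
    """An object or a relation name, with optional attribute-value pairs."""
    name: str
    attrs: dict = field(default_factory=dict)

    def __str__(self):
        return self.name + _fmt(self.attrs)


@dataclass
class Expression:
    """A relation applied to arguments; the expression itself has attributes."""
    relation: Term
    args: list
    attrs: dict = field(default_factory=dict)

    def __str__(self):
        inner = ", ".join(str(a) for a in self.args)
        return f"{self.relation}({inner}){_fmt(self.attrs)}"


@dataclass
class ExpressionSet:
    """A set of expressions that can carry attributes as a whole."""
    expressions: list
    attrs: dict = field(default_factory=dict)

    def __str__(self):
        body = "\n".join(f"  {e}" for e in self.expressions)
        return "{\n" + body + "\n}" + _fmt(self.attrs)


expr = Expression(
    Term("r1", {"a1": "v1"}),
    [Term("o1", {"a2": "v2"}), Term("o2", {"a3": "v3"})],
    {"a4": "v4"},
)
rendered = str(expr)
rendered_set = str(ExpressionSet([expr], {"a14": "v14"}))
```

Here rendered reproduces the first line of the example above, and the same attrs dictionaries could later hold URIs or nested objects instead of plain strings.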

 

As in W3C technologies, URIs could be used for objects, relations and attributes instead of plain text strings.

 

Beyond attribute-value pairs, it is also possible that objects, relations, expressions, and sets of expressions could each be treated as objects in the sense of object graphs (or potentially semantic graphs).

 

In the Wikipedia article about UNL, there is the example: “The sky was blue?!”. The Wikipedia article gives the tabular representation in UNL for that utterance as:

 

aoj( blue(icl>color).@entry.@past.@interrogative.@exclamation , sky(icl>natural world).@def)

 

In UNL’, a tabular representation for that utterance might be:

 

aoj.@past( blue(icl>color).@entry , sky(icl>natural world).@def).@interrogative.@exclamation

 

In the new knowledge representation format, we could map the valueless attributes to Boolean-valued attributes with the value true, much as valueless HTML attributes are given explicit values in XHTML:

 

aoj.[@past=true]( blue(icl>color).[@entry=true] , sky(icl>natural world).[@def=true]).[@interrogative=true].[@exclamation=true]

 

We could also map semantics to other attribute-value pairs, for example:

 

aoj.[@tense=past](…)
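
Both mappings are mechanical enough to sketch with a regular expression in Python. This is only a sketch: the function name to_pairs and the SEMANTIC_MAP table are hypothetical, and the pattern assumes attribute names consist of word characters.

```python
import re

# Hypothetical table mapping a valueless attribute to a richer pair.
SEMANTIC_MAP = {"past": ("tense", "past")}


def to_pairs(expr, semantic_map=None):
    """Rewrite each valueless .@name as .[@name=true].

    Names found in semantic_map are rewritten to the mapped
    (attribute, value) pair instead of the Boolean default.
    """
    semantic_map = semantic_map or {}

    def repl(match):
        name = match.group(1)
        key, value = semantic_map.get(name, (name, "true"))
        return f".[@{key}={value}]"

    # Matches ".@name" only, so already-bracketed pairs are left untouched.
    return re.sub(r"\.@(\w+)", repl, expr)


unl_prime = (
    "aoj.@past( blue(icl>color).@entry , "
    "sky(icl>natural world).@def).@interrogative.@exclamation"
)
as_booleans = to_pairs(unl_prime)
as_tense = to_pairs("aoj.@past(…)", SEMANTIC_MAP)
```

Applying to_pairs to its own output changes nothing, so the conversion can be run safely on mixed input.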

 

I am finding these topics interesting. I am also presently considering an object model for the new knowledge representation format, much as the DOM is a convenience for developers working with XML.

 

Any thoughts on these ideas or about knowledge representation for natural language generation in general?

 

 

Best regards,

Adam Sobieski

 


--

Tiago Timponi Torrent

PPG-Linguística - FrameNet Brasil

Universidade Federal de Juiz de Fora

http://tiagotorrent.com