A Document Abstraction Layer


Adam Sobieski

19 Jul 2020

10:11 a.m.

Introduction

A document abstraction layer (DAL) could provide an API for natural language generation (NLG) output while encapsulating a configurable, mechanical transformation to specific wikitext implementations. NLG software could then output wikitext through a DAL, which would enable certain conveniences and features.
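A minimal JavaScript sketch of this separation, under stated assumptions: NLG code builds a tree of hypothetical DALNode objects and never writes wikitext itself, while a separate toWikitext() function applies one output rule per node kind. The class, the node kinds and the exact wikitext layout are illustrative, not an existing API.

```javascript
// Hypothetical DAL node: NLG code builds these and never writes wikitext directly.
class DALNode {
  constructor(kind, props = {}) {
    this.kind = kind;       // 'article' | 'section' | 'paragraph' | 'sentence'
    this.props = props;     // e.g. { title } for sections, { text } for sentences
    this.children = [];
  }
  append(child) {
    this.children.push(child);
    return child;           // returning the child allows chained construction
  }
}

// The mechanical transformation: one wikitext output rule per node kind.
function toWikitext(node) {
  const inner = node.children.map(toWikitext);
  switch (node.kind) {
    case 'article':   return inner.join('\n\n');
    case 'section':   return `== ${node.props.title} ==\n` + inner.join('\n\n');
    case 'paragraph': return inner.join(' ');
    case 'sentence':  return node.props.text;
    default: throw new Error(`unknown node kind: ${node.kind}`);
  }
}

// NLG code: build the tree ...
const article = new DALNode('article');
const section = article.append(new DALNode('section', { title: 'History' }));
const paragraph = section.append(new DALNode('paragraph'));
paragraph.append(new DALNode('sentence', { text: 'This is the first sentence.' }));
paragraph.append(new DALNode('sentence', { text: 'This is the second sentence.' }));

// ... and let the DAL produce the wikitext.
console.log(toWikitext(article));
```

Because the output rules live only in toWikitext(), the wikitext produced for a node kind can change without touching the NLG code that builds the tree.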

Feature 1: Provenance, Sources and Derivations

A DAL could include a means for annotating portions of natural language with provenance data, sources or derivations. It could provide an API for easily attaching provenance data to document elements (e.g. words, phrases, sentences, paragraphs, sections) and would encapsulate the correct output of the corresponding wikitext (e.g. <ref> and <references> elements) in output articles [1][2][3][4].

That is, one could attach provenance data (sources or derivations) to a particular sentence in a DAL object model, and the DAL’s configurable, mechanical transformation to wikitext would take care of outputting a <ref> element at the end of that sentence and of ensuring that the automatically generated wikitext article contains a <references> element in a “References” section.
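A sketch of this behaviour, assuming a hypothetical Sentence class with an addSource() method: each attached source becomes a <ref> element placed right after the sentence, and a “References” section containing <references /> is appended only when at least one reference was emitted. Citation text is passed through unescaped here; a real implementation would need to guard against stray wikitext inside it.

```javascript
// Hypothetical sentence node that carries its own sources.
class Sentence {
  constructor(text) {
    this.text = text;
    this.sources = [];   // citation texts attached by NLG code
  }
  addSource(citation) {
    this.sources.push(citation);
    return this;         // chainable
  }
}

// One <ref> per attached source, placed right after the sentence text.
function renderSentence(sentence) {
  return sentence.text + sentence.sources.map((c) => `<ref>${c}</ref>`).join('');
}

// Join sentences; if any <ref> was emitted, append a References section with
// <references />, so NLG code never has to remember to add it.
function renderArticle(sentences) {
  const body = sentences.map(renderSentence).join(' ');
  const hasRefs = sentences.some((s) => s.sources.length > 0);
  return hasRefs ? `${body}\n\n== References ==\n<references />` : body;
}

const s1 = new Sentence('This is a sourced claim.').addSource('Example source, p. 1.');
const s2 = new Sentence('This claim has no source.');
console.log(renderArticle([s1, s2]));
```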

Furthermore, a DAL could encapsulate the naming of citation instances (<ref name="multiple">) [2] to ensure proper formatting of the references section when materials are cited multiple times.
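One way a DAL might name citations automatically, sketched under the assumption that NLG code identifies each source by a key: a first pass counts how often each key is cited; a source cited once gets a plain <ref>, while a source cited several times gets <ref name="..."> on first use and the short form <ref name="..." /> afterwards, as described in [2]. The generated names (ref1, ref2, ...) are hypothetical.

```javascript
// Hypothetical renderer that assigns <ref name="..."> automatically.
// Each source is { key, text }; the key identifies repeated citations (an assumption).
function renderWithNamedRefs(sentences) {
  // First pass: count how often each source key is cited.
  const counts = new Map();
  for (const s of sentences) {
    for (const src of s.sources) counts.set(src.key, (counts.get(src.key) || 0) + 1);
  }

  const names = new Map();   // source key -> generated ref name
  return sentences.map((s) => {
    const refs = s.sources.map((src) => {
      // Cited only once: no name is needed.
      if (counts.get(src.key) === 1) return `<ref>${src.text}</ref>`;
      // Already defined earlier: reuse it with the short, self-closing form.
      if (names.has(src.key)) return `<ref name="${names.get(src.key)}" />`;
      // First of several uses: define the named reference with its content.
      const name = `ref${names.size + 1}`;
      names.set(src.key, name);
      return `<ref name="${name}">${src.text}</ref>`;
    }).join('');
    return s.text + refs;
  }).join(' ');
}

const book = { key: 'book', text: 'Example book, p. 12.' };
const paper = { key: 'paper', text: 'Example paper.' };
const out = renderWithNamedRefs([
  { text: 'First claim.', sources: [book] },
  { text: 'Second claim.', sources: [paper] },
  { text: 'Third claim.', sources: [book] },
]);
console.log(out);
```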

Feature 2: Computational Traces and Diagnostics

A DAL could include a means of attaching computational traces and diagnostic messages or information to specific portions of generated natural language output (e.g. words, phrases, sentences, paragraphs, sections).

While viewing automatically generated articles in a “debug mode” or “developer mode”, developers could view and interact with content-specific computational traces, diagnostic messages or other information in a number of ways (hoverboxes, margin notes, et cetera).

That is, a DAL could encapsulate the output of specific wikitext implementations that give developers a means of interacting with automatically generated articles, so that they can view and understand the computational processes involved in generating specific portions of natural language.

As citations and references are implemented by means of an extension [1], this set of features might require a new MediaWiki extension.
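A sketch of one possible “debug mode” rendering that needs no new extension, assuming each sentence stores its diagnostics as an array of strings: in debug mode a sentence with diagnostics is wrapped in an HTML <span> whose title attribute carries the messages, which browsers show as a simple hover tooltip; in normal mode only the plain text is emitted. The tooltip approach and the escaping table are assumptions; richer hoverboxes or margin notes would likely need such an extension, and the escaping shown is not a complete sanitizer.

```javascript
// Characters escaped inside title="..." using HTML character references: the
// HTML-significant ones plus characters that wikitext might otherwise interpret.
const ESCAPES = {
  '&': '&amp;', '"': '&quot;', '<': '&lt;', '>': '&gt;',
  '[': '&#91;', ']': '&#93;', '{': '&#123;', '}': '&#125;', '|': '&#124;',
};
const escapeAttr = (s) => s.replace(/[&"<>[\]{}|]/g, (ch) => ESCAPES[ch]);

// Render one sentence; in debug mode its diagnostics become a hover tooltip.
function renderSentence(sentence, { debugMode = false } = {}) {
  if (!debugMode || sentence.diagnostics.length === 0) return sentence.text;
  const tip = escapeAttr(sentence.diagnostics.join(' / '));
  return `<span title="${tip}" style="border-bottom:1px dotted">${sentence.text}</span>`;
}

const sentence = {
  text: 'This is a generated sentence.',
  diagnostics: ['rule: describe-length', 'input: {length: 42}'],   // hypothetical messages
};
console.log(renderSentence(sentence));                       // normal reading mode
console.log(renderSentence(sentence, { debugMode: true }));  // developer mode
```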

Feature 3: Versioning a Document Abstraction Layer

One could version a DAL, adding features to it, without impacting any existing natural language generation software code. One could also adjust specific output wikitext implementations for particular scenarios without impacting any natural language generation software code.
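A sketch of what versioning could look like, assuming output rules are grouped into hypothetical renderer versions chosen by configuration. The NLG-facing function only describes content, so switching from a version that emits <references /> to an assumed later version that emits English Wikipedia's {{Reflist}} template changes the wikitext without changing any NLG code.

```javascript
// Hypothetical output rules, one entry per DAL renderer version.
const RENDERERS = {
  '1.0': { referencesList: () => '<references />' },
  // An assumed later version that emits English Wikipedia's {{Reflist}} template instead.
  '1.1': { referencesList: () => '{{Reflist}}' },
};

// NLG-facing code: it describes content and never mentions a renderer version.
function generateArticle() {
  return [{ text: 'This is a sourced claim.', sources: ['Example source.'] }];
}

// DAL-facing code: the version is a configuration parameter of rendering only.
function render(sentences, version) {
  const rules = RENDERERS[version];
  if (!rules) throw new Error(`unknown DAL renderer version: ${version}`);
  const body = sentences
    .map((s) => s.text + s.sources.map((c) => `<ref>${c}</ref>`).join(''))
    .join(' ');
  return `${body}\n\n== References ==\n${rules.referencesList()}`;
}

const sentences = generateArticle();
console.log(render(sentences, '1.0'));
console.log(render(sentences, '1.1'));
```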

Conclusion

Should natural language generation output directly to wikitext or instead make use of an abstraction layer? What do you think of these ideas?

Best regards, Adam Sobieski

[1] https://www.mediawiki.org/wiki/Extension:Cite
[2] https://www.mediawiki.org/wiki/Help:Cite
[3] https://en.wikipedia.org/wiki/Help:Cite_link_labels
[4] https://github.com/SemanticMediaWiki/SemanticCite/blob/master/README.md


Adam Sobieski

20 Jul 2020

4:48 a.m.

A DAL could include a means of attaching computational traces and diagnostic messages or information to specific portions of generated natural language output.

Exploring these concepts further, and resembling the W3C document object model (DOM) [6][7], the scripting environment could provide a global document object [8] as the DAL API for the output article. Also resembling the DOM, an interface like Element [9] could be defined which, pertinent to DAL scenarios, provides an API resembling the console interface [2][3].

Each element [9] of a DAL document tree (e.g. word, phrase, sentence, paragraph, section) could provide a console-like API [2][3] for its own separate diagnostic events and messages. A parameter could toggle whether elements inherit particular diagnostic events and messages from their parent elements.
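A sketch of per-element consoles with the inheritance parameter, assuming a hypothetical DALElement class: each element keeps its own message log, and getMessages() takes an inherit option that is false (own messages only), true (also all ancestor messages) or an array of levels to inherit selectively. Only debug() and warn() are shown; other console-style levels would follow the same pattern.

```javascript
// Hypothetical DAL element with its own console-like message log.
class DALElement {
  constructor(tagName, parent = null) {
    this.tagName = tagName;
    this.parent = parent;
    this.messages = [];   // this element's own diagnostic messages
  }
  debug(...args) { this.messages.push({ level: 'debug', text: args.join(' ') }); }
  warn(...args) { this.messages.push({ level: 'warn', text: args.join(' ') }); }

  // inherit: false = own messages only; true = ancestors' messages too (outermost
  // first); an array of levels such as ['warn'] = inherit only messages at those levels.
  getMessages({ inherit = false } = {}) {
    let inherited = [];
    if (inherit && this.parent) {
      inherited = this.parent.getMessages({ inherit });
      if (Array.isArray(inherit)) {
        inherited = inherited.filter((m) => inherit.includes(m.level));
      }
    }
    return [...inherited, ...this.messages];
  }
}

const article = new DALElement('article');
const paragraph = new DALElement('paragraph', article);
const sentence = new DALElement('sentence', paragraph);

article.warn('article-level warning');
paragraph.debug('paragraph-level debug message');
sentence.debug('sentence-level debug message');

const texts = (opts) => sentence.getMessages(opts).map((m) => m.text);
console.log(texts());                      // own messages only
console.log(texts({ inherit: true }));     // plus all ancestor messages
console.log(texts({ inherit: ['warn'] })); // plus inherited warnings only
```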

For example, we could access the output article via a global document interface (resembling [6][7]), use the DOM-style getElementsByTagName() method to access the first paragraph of the output article, and then call a method resembling those of console [2][3][4] on that paragraph element:

document.getElementsByTagName('paragraph').item(0).debug("this is a debugging message attached to a paragraph");

Each element [9] of a DAL document tree could thus have its own console-like output [2][3]. A DAL would then encapsulate the output of arbitrarily intricate wikitext, so that developers could view articles in a “debug mode” or “developer mode” and interact with the computational traces and diagnostic messages relevant to specific portions of natural language content.

Best regards, Adam

[1] https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.trace?view=ne...
[2] https://developer.mozilla.org/en-US/docs/Web/API/Console_API
[3] https://developer.mozilla.org/en-US/docs/Web/API/console
[4] https://developer.mozilla.org/en-US/docs/Web/API/Console/trace
[5] https://v8.dev/docs/stack-trace-api
[6] https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model
[7] https://dom.spec.whatwg.org/
[8] https://developer.mozilla.org/en-US/docs/Web/API/Document
[9] https://developer.mozilla.org/en-US/docs/Web/API/Element

From: Adam Sobieski <adamsobieski@hotmail.com>
Sent: Sunday, July 19, 2020 4:37 AM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) <abstract-wikipedia@lists.wikimedia.org>
Subject: [Abstract-wikipedia] A Document Abstraction Layer


Adam Sobieski

2:30 p.m.

The Document Abstraction Layer and the Document Object Model

A document abstraction layer (DAL) abstracts and models Wikipedia articles in a manner designed to (1) make NLG development more convenient and (2) encapsulate the output of the resulting wikitext strings.

As envisioned, the DAL extends the document object model (DOM) [1][2], adding functionality that makes NLG output scenarios more convenient, including but not limited to (1) adding provenance data to specific portions of articles and (2) adding computational traces and diagnostic messages to specific portions of articles.

As envisioned, the DAL uses a simpler document model than that of HTML5, for example one composed of articles, sections, paragraphs, sentences, phrases and words.

For Node.js and other JavaScript-based runtimes, the jsdom project [3] provides an implementation of the DOM written in JavaScript.
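The simpler document model could be pinned down with explicit nesting rules, much as a DTD would do for XML. A sketch in which the allowed-children table is an assumption (for instance, that sections may nest and that phrases may contain phrases or words), with validate() reporting every violation it finds:

```javascript
// Assumed content model for the simpler DAL document: allowed child kinds per kind.
const CONTENT_MODEL = {
  article:   ['section'],
  section:   ['section', 'paragraph'],
  paragraph: ['sentence'],
  sentence:  ['phrase', 'word'],
  phrase:    ['phrase', 'word'],
  word:      [],                    // words are leaves that carry text
};

// Returns a list of human-readable violations; an empty list means the tree is valid.
function validate(node, path = node.kind) {
  const allowed = CONTENT_MODEL[node.kind];
  if (!allowed) return [`${path}: unknown element kind "${node.kind}"`];
  const errors = [];
  (node.children || []).forEach((child, i) => {
    const childPath = `${path}/${child.kind}[${i}]`;
    if (!allowed.includes(child.kind)) {
      errors.push(`${childPath}: "${child.kind}" is not allowed inside "${node.kind}"`);
    }
    errors.push(...validate(child, childPath));   // keep checking deeper levels
  });
  return errors;
}

const w = (text) => ({ kind: 'word', text });
const good = { kind: 'article', children: [
  { kind: 'section', children: [
    { kind: 'paragraph', children: [
      { kind: 'sentence', children: [w('Hello'), { kind: 'phrase', children: [w('small'), w('world')] }] },
    ] },
  ] },
] };
const bad = { kind: 'article', children: [{ kind: 'paragraph', children: [] }] };

console.log(validate(good));
console.log(validate(bad));
```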

Examples

Provenance, Sources and Derivations

document.getElementsByTagName('sentence').item(0).addSource(s);
document.getElementsByTagName('sentence').item(0).addDerivation(d);

These examples show how the DAL could make it convenient for NLG developers to add provenance data to specific portions of Wikipedia articles.
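A sketch of how these two calls might be backed, assuming a hypothetical DALElement class that stands in for both the document and its elements: sources (what a reader could check) and derivations (how the text was produced) are kept in separate lists, and getElementsByTagName() returns an object with an item() method, in DOM tree order, so the example lines run unchanged. The shapes of s and d are illustrative.

```javascript
// Hypothetical DAL element; here it also stands in for the document itself.
class DALElement {
  constructor(tagName, text = '') {
    this.tagName = tagName;
    this.text = text;
    this.children = [];
    this.sources = [];       // external sources a reader could check
    this.derivations = [];   // records of how the text was produced
  }
  appendChild(child) {
    this.children.push(child);
    return child;
  }
  addSource(source) {
    this.sources.push(source);
    return this;
  }
  addDerivation(derivation) {
    this.derivations.push(derivation);
    return this;
  }
  // Descendants with the given tag name in tree (pre-)order, exposed through
  // an item(i) method that returns null past the end, like a DOM HTMLCollection.
  getElementsByTagName(name) {
    const found = [];
    const visit = (el) => {
      for (const child of el.children) {
        if (child.tagName === name) found.push(child);
        visit(child);
      }
    };
    visit(this);
    return { length: found.length, item: (i) => found[i] ?? null };
  }
}

const document = new DALElement('article');
const paragraph = document.appendChild(new DALElement('paragraph'));
paragraph.appendChild(new DALElement('sentence', 'This is a generated sentence.'));

const s = { title: 'Example source' };                      // hypothetical source record
const d = { rule: 'describe-entity', input: 'example-1' };  // hypothetical derivation record

// The two calls from the example, unchanged.
document.getElementsByTagName('sentence').item(0).addSource(s);
document.getElementsByTagName('sentence').item(0).addDerivation(d);
```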

Computational Traces and Diagnostics

document.getElementsByTagName('paragraph').item(0).debug("this is a debugging message attached to a paragraph");

This example shows how functionality based on the console API [4][5] could be added to DAL elements so that NLG developers could easily attach informational messages, warning messages, error messages, log messages, debugging messages and computational traces to specific portions of Wikipedia articles.
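A sketch of how console-style levels could be added to DAL elements, assuming a hypothetical DALElement class: one method per level is generated on the prototype, each call stores an entry, trace() additionally keeps a JavaScript stack trace, and entriesAt() filters entries by level, as a developer-mode view might. The methods return the element so calls can be chained.

```javascript
// Console-style levels, mirroring console.log(), info(), warn(), error(), debug() and trace().
const LEVELS = ['log', 'info', 'warn', 'error', 'debug', 'trace'];

// Hypothetical DAL element holding its own diagnostic entries.
class DALElement {
  constructor(tagName) {
    this.tagName = tagName;
    this.entries = [];
  }
  // Entries at the given levels only, e.g. for a developer-mode view of problems.
  entriesAt(...levels) {
    return this.entries.filter((e) => levels.includes(e.level));
  }
}

// Generate one chainable method per level on the prototype.
for (const level of LEVELS) {
  DALElement.prototype[level] = function (...args) {
    const entry = { level, message: args.map(String).join(' ') };
    // Like console.trace(), trace() also records the current call stack.
    if (level === 'trace') entry.stack = new Error().stack;
    this.entries.push(entry);
    return this;
  };
}

const paragraph = new DALElement('paragraph');
paragraph
  .info('chose template', 'T1')
  .warn('missing input value')
  .trace('render path');
console.log(paragraph.entriesAt('warn', 'error'));
```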

The DAL would encapsulate the output of the resulting wikitext strings, including for “debug mode” and/or “developer mode” scenarios. Arguments for “debug mode” and/or “developer mode” output include:

1. It would be convenient when developing NLG software.
2. When responding to an anomaly, developers would be able to quickly navigate to relevant source code from specific portions of NLG output.
3. Developers new to the Abstract Wikipedia project would be able to quickly learn about the codebase by viewing and understanding the processes of computation which generated portions of automatically-generated articles and navigating to relevant source code from specific portions of those articles.
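Navigating from output back to source code requires recording where each message was emitted. A sketch using V8's Error.captureStackTrace() (available in Node.js and Chromium-based runtimes): debug() records the file, line and column of the code that called it, which a developer-mode renderer could then turn into a link. The DALElement class and generateIntro() are hypothetical, and the location parsing assumes the usual V8 frame format and file paths without spaces.

```javascript
// Returns "file:line:column" for the code that called `fn`, using V8's
// Error.captureStackTrace (Node.js, Chromium-based runtimes).
function callerLocation(fn) {
  const holder = {};
  Error.captureStackTrace(holder, fn);  // omits fn's own frame and everything above it
  const frame = holder.stack
    .split('\n')
    .map((line) => line.trim())
    .find((line) => line.startsWith('at '));
  // Typical frames: "at name (file:line:col)" or "at file:line:col".
  const match = frame && frame.match(/\(?([^()\s]+:\d+:\d+)\)?$/);
  return match ? match[1] : 'unknown';
}

// Hypothetical DAL element whose debug() remembers where it was called from.
class DALElement {
  constructor(tagName, text) {
    this.tagName = tagName;
    this.text = text;
    this.messages = [];
  }
  debug(message) {
    this.messages.push({ message, at: callerLocation(DALElement.prototype.debug) });
    return this;
  }
}

// Hypothetical NLG step; the recorded location points at the debug() call below.
function generateIntro(sentence) {
  sentence.debug('chose opening template');
  return sentence;
}

const intro = generateIntro(new DALElement('sentence', 'This is a generated sentence.'));
console.log(intro.messages[0]);
```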

Obtaining Wikitext for a Document Tree

document.innerWikitext
or document.getWikitext()
or document.documentElement.innerWikitext
or document.documentElement.getWikitext()

These examples show some possibilities for obtaining resultant wikitext strings from DAL documents.
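A sketch showing that the property forms and method forms could simply be two views of the same rendering, assuming hypothetical DALDocument and DALElement classes with a documentElement, as in the DOM. The wikitext rules (section headings, sentences joined by spaces) are illustrative.

```javascript
// Hypothetical element: a getter and a method exposing the same rendered wikitext.
class DALElement {
  constructor(tagName, { text = '', title = '' } = {}) {
    this.tagName = tagName;
    this.text = text;
    this.title = title;
    this.children = [];
  }
  appendChild(child) {
    this.children.push(child);
    return child;
  }
  getWikitext() {
    const inner = this.children.map((c) => c.getWikitext());
    switch (this.tagName) {
      case 'article':   return inner.join('\n\n');
      case 'section':   return `== ${this.title} ==\n` + inner.join('\n\n');
      case 'paragraph': return inner.join(' ');
      default:          return this.text;   // sentences are leaves here
    }
  }
  // Read-only property form of the same value.
  get innerWikitext() {
    return this.getWikitext();
  }
}

// Hypothetical document wrapper whose root element is exposed as documentElement.
class DALDocument {
  constructor() {
    this.documentElement = new DALElement('article');
  }
  getWikitext() {
    return this.documentElement.getWikitext();
  }
  get innerWikitext() {
    return this.documentElement.innerWikitext;
  }
}

const document = new DALDocument();
const section = document.documentElement.appendChild(new DALElement('section', { title: 'Overview' }));
const paragraph = section.appendChild(new DALElement('paragraph'));
paragraph.appendChild(new DALElement('sentence', { text: 'First sentence.' }));
paragraph.appendChild(new DALElement('sentence', { text: 'Second sentence.' }));

console.log(document.innerWikitext);
```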

What do you think of these ideas?

Best regards, Adam

[1] https://dom.spec.whatwg.org/
[2] https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model
[3] https://github.com/jsdom/jsdom#readme
[4] https://developer.mozilla.org/en-US/docs/Web/API/Console_API
[5] https://developer.mozilla.org/en-US/docs/Web/API/console

From: Adam Sobieski <adamsobieski@hotmail.com>
Sent: Sunday, July 19, 2020 10:48 PM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] A Document Abstraction Layer


Arthur Smith

21 Jul 2020

1:55 a.m.

Hi Adam,

XML was designed to allow the creation of structured documents; it sounds like you're describing a particular XML DTD, with some tools (JavaScript?) to turn it into wikitext. We're going to need some more structure than bare text/wikitext, maybe something like this, in the underlying abstract text (language-independent) articles. But I don't think we're close to the point of narrowing down exactly what that should look like yet! Maybe Denny has more of an idea for it, though. But I wouldn't object to coming up with some standard XML DTD for the basic abstract text content.

Arthur

On Mon, Jul 20, 2020 at 8:31 AM Adam Sobieski <adamsobieski@hotmail.com> wrote:

...


Abstract-Wikipedia mailing list Abstract-Wikipedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia

Adam Sobieski

9:13 a.m.

Arthur,

Yes, I am describing the use of structured XML documents during NLG output phases, such that the resulting structured documents could then be algorithmically converted to wikitext strings, e.g. by JavaScript. I hope that producing structured documents during NLG output phases would make things more convenient for NLG software developers.

The NLG output features that I am describing could be implemented in pure XML using the standard DOM. The following example shows how both provenance data and console data could be added to a sentence element in an XML-based abstraction layer:

<article xmlns="..." xmlns:meta="...">
  <section>
    <paragraph>
      <sentence>
        <meta:provenance>...</meta:provenance>
        <meta:console>...</meta:console>
      </sentence>
    </paragraph>
  </section>
</article>

In my opinion, NLG software developers would want convenience functions for dealing with per-element provenance data and per-element console output. To this end, we could extend the Element API [1] or provide a DALElement interface offering such convenience functions to developers.
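Because Node.js has no built-in DOM, this sketch uses a minimal hypothetical XElement class in place of Element, plus a DALElement subclass whose addSource() and log() convenience functions create (or reuse) the meta:provenance and meta:console children from the example above and fill them. The child names meta:source and meta:log and the namespace URIs are hypothetical placeholders, since the example leaves them as “...”.

```javascript
// Minimal XML-like element standing in for the DOM's Element (an assumption of this sketch).
class XElement {
  constructor(name, attrs = {}) {
    this.name = name;
    this.attrs = attrs;
    this.children = [];   // XElement instances or text strings
  }
  appendChild(child) {
    this.children.push(child);
    return child;
  }
  toXML() {
    const esc = (s) => String(s)
      .replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
    const attrs = Object.entries(this.attrs).map(([k, v]) => ` ${k}="${esc(v)}"`).join('');
    const inner = this.children.map((c) => (typeof c === 'string' ? esc(c) : c.toXML())).join('');
    return inner ? `<${this.name}${attrs}>${inner}</${this.name}>` : `<${this.name}${attrs}/>`;
  }
}

// Hypothetical DALElement: convenience functions that hide the meta:* child elements.
class DALElement extends XElement {
  // Find the meta:<localName> child, creating it on first use.
  metaChild(localName) {
    const name = `meta:${localName}`;
    return this.children.find((c) => c instanceof XElement && c.name === name)
      || this.appendChild(new XElement(name));
  }
  addSource(citation) {
    this.metaChild('provenance').appendChild(new XElement('meta:source')).appendChild(citation);
    return this;
  }
  log(message) {
    this.metaChild('console').appendChild(new XElement('meta:log')).appendChild(message);
    return this;
  }
}

// Placeholder namespace URIs (hypothetical).
const article = new XElement('article', { xmlns: 'urn:example:dal', 'xmlns:meta': 'urn:example:dal-meta' });
const sentence = article
  .appendChild(new XElement('section'))
  .appendChild(new XElement('paragraph'))
  .appendChild(new DALElement('sentence'));
sentence.appendChild('This is a generated sentence.');
sentence.addSource('Example source').log('chose template T1');
console.log(article.toXML());
```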

Best regards, Adam

[1] https://developer.mozilla.org/en-US/docs/Web/API/Element

From: Arthur Smith <arthurpsmith@gmail.com>
Sent: Monday, July 20, 2020 7:56 PM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] A Document Abstraction Layer

