A DAL could include a means of attaching computational traces and diagnostic messages or
information to specific portions of generated natural language output.
Exploring these concepts further: resembling the W3C Document Object Model (DOM) [6][7], a
scripting environment could expose a global document object [8] as the entry point to the
DAL API for the output article. Also resembling the DOM, an interface like Element [9]
could be defined and, pertinent to DAL scenarios, provide an API resembling the console
interface [2][3].
Each element [9] of a DAL document tree (e.g. word, phrase, sentence, paragraph, section)
could provide a console-like API [2][3] for its own separate diagnostic events and messages.
There could be a parameter for toggling whether elements inherit particular diagnostic
events and messages from their parent elements.
As an example, we could access the output article via a global document interface
(resembling [6][7]), use some API to access the first paragraph of the output article, and
then use an interface resembling console [2][3][4] upon that paragraph element:
document.getElementsByTagName('paragraph').item(0)
    .debug("this is a debugging message attached to a paragraph");
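To make the idea more concrete, here is a minimal sketch of how such an element interface might behave, including the inheritance toggle. All names in it (DalElement, inheritDiagnostics, diagnostics) are hypothetical and not taken from any existing library, and getElementsByTagName returns a plain array here rather than a DOM-style collection with item().

```javascript
// Hypothetical sketch: none of these names come from an existing library.
class DalElement {
  constructor(tagName, { inheritDiagnostics = false } = {}) {
    this.tagName = tagName;
    this.parent = null;
    this.children = [];
    this.inheritDiagnostics = inheritDiagnostics; // toggle for inheriting parent messages
    this.ownMessages = [];
  }

  appendChild(child) {
    child.parent = this;
    this.children.push(child);
    return child;
  }

  // Console-like methods record messages on the element instead of printing them.
  log(text) { this.record('log', text); }
  debug(text) { this.record('debug', text); }
  warn(text) { this.record('warn', text); }
  record(level, text) {
    this.ownMessages.push({ level, text, source: this.tagName });
  }

  // Own messages, preceded by the parent's when inheritance is switched on.
  diagnostics() {
    const inherited =
      this.inheritDiagnostics && this.parent ? this.parent.diagnostics() : [];
    return [...inherited, ...this.ownMessages];
  }

  // Depth-first search over descendants, loosely modelled on the DOM method.
  getElementsByTagName(tagName) {
    const found = [];
    for (const child of this.children) {
      if (child.tagName === tagName) found.push(child);
      found.push(...child.getElementsByTagName(tagName));
    }
    return found;
  }
}

const article = new DalElement('article');
const paragraph = article.appendChild(new DalElement('paragraph'));
const sentence = paragraph.appendChild(
  new DalElement('sentence', { inheritDiagnostics: true })
);

article.getElementsByTagName('paragraph')[0]
  .debug('this is a debugging message attached to a paragraph');
sentence.warn('a placeholder warning attached to a sentence');

console.log(sentence.diagnostics().map((m) => `${m.source}/${m.level}: ${m.text}`));
```

In this sketch the sentence reports the paragraph's message followed by its own, because only the sentence opted in to inheritance. The paragraph reports just its own message.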
Each element [9] of a DAL document tree could have its own console-like output [2][3]. A
DAL would then encapsulate the output of arbitrarily intricate wikitext so that developers
could view articles in a “debug mode” or “developer mode” and interact with the
computational traces and diagnostic messages relevant to specific portions of natural
language content.
Best regards,
Adam
[1] https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.trace?view=n…
[2] https://developer.mozilla.org/en-US/docs/Web/API/Console_API
[3] https://developer.mozilla.org/en-US/docs/Web/API/console
[4] https://developer.mozilla.org/en-US/docs/Web/API/Console/trace
[5] https://v8.dev/docs/stack-trace-api
[6] https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model
[7] https://dom.spec.whatwg.org/
[8] https://developer.mozilla.org/en-US/docs/Web/API/Document
[9] https://developer.mozilla.org/en-US/docs/Web/API/Element
From: Adam Sobieski<mailto:adamsobieski@hotmail.com>
Sent: Sunday, July 19, 2020 4:37 AM
To: General public mailing list for the discussion of Abstract Wikipedia (aka
Wikilambda)<mailto:abstract-wikipedia@lists.wikimedia.org>
Subject: [Abstract-wikipedia] A Document Abstraction Layer
Introduction
A document abstraction layer (DAL) could provide an API for natural language generation
output while encapsulating a configurable and mechanical transformation to specific
wikitext implementations. Natural language generation software could, then, output through
a DAL to wikitext and this would facilitate certain conveniences and features.
Feature 1: Provenance, Sources and Derivations
A DAL could include a means for annotating portions of natural language with provenance
data, sources or derivations. A DAL could provide an API for easily attaching provenance
data to document elements (e.g. words, phrases, sentences, paragraphs, sections) and it
would encapsulate the correct output of wikitext (e.g. <ref> and <references> elements)
for output articles [1][2][3][4].
That is, one could attach provenance data, sources or derivations, to a particular
sentence in a DAL object model and the DAL’s configurable, mechanical transformation to
wikitext would encapsulate a specific implementation of outputting a <ref> element
at the end of the sentence and ensuring that the automatically-generated wikitext article
contained a <references> element in a “References” section.
Furthermore, a DAL could encapsulate the naming of citation instances
(<ref name="multiple">) [2] to ensure proper formatting of the references section when
materials are cited multiple times.
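As a rough illustration of the transformation described above, the following sketch renders sentences with attached sources to wikitext. The first citation of a source emits a full named <ref>, later citations reuse the name, and a “References” section is appended. The function name, the data shape, and the placeholder source are assumptions made for this example only.

```javascript
// Hypothetical sketch: the data shapes and function names are assumptions.
function renderArticle(paragraphs) {
  const cited = new Set(); // source ids that already have a full <ref>

  const renderSentence = ({ text, sources = [] }) => {
    const refs = sources.map(({ id, citation }) => {
      if (cited.has(id)) return `<ref name="${id}" />`; // reuse the named reference
      cited.add(id);
      return `<ref name="${id}">${citation}</ref>`; // first use carries the citation text
    });
    return text + refs.join('');
  };

  const body = paragraphs
    .map((sentences) => sentences.map(renderSentence).join(' '))
    .join('\n\n');

  // Only emit a References section when something was actually cited.
  return cited.size > 0 ? `${body}\n\n== References ==\n<references />` : body;
}

// Placeholder content; "src1" and its citation text are made up for illustration.
const source = { id: 'src1', citation: 'Placeholder citation text' };
const wikitext = renderArticle([
  [
    { text: 'First generated sentence.', sources: [source] },
    { text: 'Second generated sentence.', sources: [source] },
  ],
]);
console.log(wikitext);
```

The natural language generation code only attaches sources to sentences. Where the <ref> goes and how repeated citations are named stays inside the renderer.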
Feature 2: Computational Traces and Diagnostics
A DAL could include a means of attaching computational traces and diagnostic messages or
information to specific portions of generated natural language output (e.g. words,
phrases, sentences, paragraphs, sections).
While viewing automatically generated articles in a “debug mode” or “developer mode”,
developers could view and interact with content-specific computational traces or
diagnostic messages in a number of ways (hoverboxes, margin notes, et cetera).
That is, a DAL could encapsulate outputting specific wikitext implementations for
providing developers with a means of interacting with automatically-generated articles to
view and to understand the computational processes involved in generating specific
portions of natural language.
As citations and references are implemented by means of an extension [1], this set of
features might require a new MediaWiki extension.
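For the simplest hoverbox case, one conceivable implementation is sketched below. It assumes that a plain <span> with a title attribute is acceptable as the debug-mode output. Margin notes or richer interaction would likely need more than this. The node shape, the debug option, and the dal-debug class name are hypothetical.

```javascript
// Hypothetical sketch: the node shape, option name, and CSS class are assumptions.
function escapeAttribute(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function renderSentence(node, { debug = false } = {}) {
  // Outside debug mode, or with nothing to show, emit the text unchanged.
  if (!debug || node.diagnostics.length === 0) return node.text;
  const tooltip = escapeAttribute(node.diagnostics.join(' | '));
  return `<span class="dal-debug" title="${tooltip}">${node.text}</span>`;
}

// Placeholder diagnostic messages, made up for illustration.
const node = {
  text: 'A generated sentence.',
  diagnostics: ['rule "intro" fired', 'step 2 of 3'],
};
console.log(renderSentence(node));
console.log(renderSentence(node, { debug: true }));
```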
Feature 3: Versioning a Document Abstraction Layer
One could version a DAL, adding features to it, without impacting any existing natural
language generation software code. One could also adjust specific output wikitext
implementations for particular scenarios without impacting any natural language generation
software code.
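A small sketch of this separation follows: the generator is written once against the DAL API, and the wikitext implementation is chosen by a renderer profile. The profile names (plain-v1, plain-v2) and the createDal function are invented for this example.

```javascript
// Hypothetical sketch: the renderer profiles and names are assumptions.
const renderers = {
  'plain-v1': { heading: (t) => `== ${t} ==`, emphasis: (t) => `''${t}''` },
  'plain-v2': { heading: (t) => `== ${t} ==`, emphasis: (t) => `<em>${t}</em>` },
};

function createDal(profile) {
  const r = renderers[profile];
  if (!r) throw new Error(`unknown renderer profile: ${profile}`);
  const parts = [];
  return {
    heading(text) { parts.push(r.heading(text)); return this; },
    emphasis(text) { parts.push(r.emphasis(text)); return this; },
    toWikitext() { return parts.join('\n'); },
  };
}

// Generator code is written once, against the DAL API only.
function generate(dal) {
  return dal.heading('Summary').emphasis('generated text').toWikitext();
}

console.log(generate(createDal('plain-v1')));
console.log(generate(createDal('plain-v2')));
```

Switching the profile changes the emitted wikitext while generate() stays untouched.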
Conclusion
Should natural language generation output directly to wikitext or instead make use of an
abstraction layer? What do you think of these ideas?
Best regards,
Adam Sobieski
[1] https://www.mediawiki.org/wiki/Extension:Cite
[2] https://www.mediawiki.org/wiki/Help:Cite
[3] https://en.wikipedia.org/wiki/Help:Cite_link_labels
[4] https://github.com/SemanticMediaWiki/SemanticCite/blob/master/README.md