Introduction

 

A document abstraction layer (DAL) could provide an API for natural language generation output while encapsulating a configurable, mechanical transformation to specific wikitext implementations. Natural language generation software could then output wikitext through a DAL, which would enable a number of conveniences and features.

 

Feature 1: Provenance, Sources and Derivations

 

A DAL could include a means for annotating portions of natural language with provenance data, sources or derivations. It could provide an API for easily attaching provenance data to document elements (e.g. words, phrases, sentences, paragraphs, sections) while encapsulating the correct output of the corresponding wikitext (e.g. <ref> and <references> elements) in generated articles [1][2][3][4].

 

That is, one could attach provenance data, sources or derivations to a particular sentence in a DAL object model, and the DAL’s configurable, mechanical transformation to wikitext would take care of outputting a <ref> element at the end of that sentence and of ensuring that the automatically generated wikitext article contains a <references> element in a “References” section.
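To make the sentence-level case concrete, here is a minimal sketch of what such an object model and transformation might look like. The names `Sentence`, `Document`, `add` and `to_wikitext` are hypothetical illustrations, not an existing library; the model is deliberately flat (sentences only), and the source strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sentence:
    """A generated sentence with optional provenance (hypothetical DAL element)."""
    text: str
    source: Optional[str] = None  # e.g. a citation string for this sentence


@dataclass
class Document:
    """A hypothetical DAL document model: a flat list of sentences."""
    sentences: List[Sentence] = field(default_factory=list)

    def add(self, text: str, source: Optional[str] = None) -> "Document":
        """Append a sentence, optionally attaching a source; returns self for chaining."""
        self.sentences.append(Sentence(text, source))
        return self


def to_wikitext(doc: Document) -> str:
    """Mechanically transform the document model into wikitext."""
    parts = []
    for s in doc.sentences:
        if s.source is not None:
            # Emit the <ref> element directly after the sentence it annotates.
            parts.append(f"{s.text}<ref>{s.source}</ref>")
        else:
            parts.append(s.text)
    body = " ".join(parts)
    # Add a References section only if at least one sentence carries a source.
    if any(s.source is not None for s in doc.sentences):
        body += "\n\n== References ==\n<references />"
    return body


doc = Document()
doc.add("First generated sentence.", source="Example source A")
doc.add("Second generated sentence.")
print(to_wikitext(doc))
```

The point of the design is that the generation code only ever calls `add(...)`; where the `<ref>` goes and whether a `<references />` section is needed are decided entirely inside the transformation.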

 

Furthermore, a DAL could encapsulate the naming of citation instances (<ref name="multiple">) [2], so that a source cited multiple times is formatted correctly and listed only once in the references section.
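A possible sketch of this naming logic, assuming sentences arrive as hypothetical `(text, source)` pairs: sources cited once get an anonymous `<ref>`, while a repeated source gets a generated name on first use and the self-closing `<ref name="..." />` form afterwards, which is the Cite syntax for reusing a reference [2]. The `ref1`, `ref2`, ... naming scheme is an arbitrary choice for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def render_with_named_refs(sentences: List[Tuple[str, Optional[str]]]) -> str:
    """Render (text, source) pairs, naming <ref>s for sources cited more than once."""
    counts = Counter(src for _, src in sentences if src is not None)
    names: Dict[str, str] = {}  # source -> generated ref name (hypothetical scheme)
    parts = []
    for text, src in sentences:
        if src is None:
            parts.append(text)
        elif counts[src] == 1:
            # Cited once: an anonymous <ref> is sufficient.
            parts.append(f"{text}<ref>{src}</ref>")
        elif src in names:
            # Later citation of a repeated source: self-closing reference by name.
            parts.append(f'{text}<ref name="{names[src]}" />')
        else:
            # First citation of a repeated source: full content plus a name.
            names[src] = f"ref{len(names) + 1}"
            parts.append(f'{text}<ref name="{names[src]}">{src}</ref>')
    body = " ".join(parts)
    if counts:
        body += "\n\n== References ==\n<references />"
    return body


print(render_with_named_refs([
    ("Claim one.", "Source X"),
    ("Claim two.", "Source Y"),
    ("Claim three.", "Source X"),
]))
```

Because the names are generated by the DAL, the generation code never has to track which sources it has already cited.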

 

Feature 2: Computational Traces and Diagnostics

 

A DAL could include a means of attaching computational traces and diagnostic messages or information to specific portions of generated natural language output (e.g. words, phrases, sentences, paragraphs, sections).

 

While viewing automatically generated articles in a “debug mode” or “developer mode”, developers could view and interact with the computational traces and diagnostic information attached to specific content in a number of ways (hoverboxes, margin notes, et cetera).

 

That is, a DAL could encapsulate the output of specific wikitext implementations that give developers a means of interacting with automatically generated articles, so that they can view and understand the computational processes involved in generating specific portions of natural language.

 

Just as citations and references are implemented by means of an extension [1], this set of features might require a new MediaWiki extension.
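Before any new extension exists, a crude hoverbox could be approximated with plain HTML that wikitext already accepts. The sketch below assumes that a `<span title="...">` is an acceptable stand-in (the browser shows the `title` as a tooltip); margin notes or richer interaction would still need something like the extension mentioned above. `Fragment`, `trace` and `render` are hypothetical names.

```python
import html
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """A span of generated text with an optional computational trace (hypothetical)."""
    text: str
    trace: Optional[str] = None


def render(fragments: List[Fragment], debug: bool = False) -> str:
    """Render fragments; in debug mode, expose traces as hover tooltips."""
    out = []
    for f in fragments:
        if debug and f.trace is not None:
            # The title attribute shows a native browser tooltip on hover;
            # html.escape keeps quotes in the trace from breaking the attribute.
            tip = html.escape(f.trace, quote=True)
            out.append(f'<span title="{tip}">{f.text}</span>')
        else:
            out.append(f.text)
    return " ".join(out)


frags = [
    Fragment("Generated sentence one.", trace='template "intro" selected'),
    Fragment("Generated sentence two."),
]
print(render(frags))              # normal article view: traces omitted
print(render(frags, debug=True))  # developer mode: traces as tooltips
```

The same traces could later be rendered by a dedicated extension simply by changing `render`, without touching the code that attaches them.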

 

Feature 3: Versioning a Document Abstraction Layer

 

One could version a DAL, adding features to it, without impacting any existing natural language generation code. One could also adjust the specific wikitext implementations it outputs for particular scenarios without impacting any natural language generation code.
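One way to picture this separation is a sketch in which the generation code produces only DAL content, while a versioned, hypothetical `RendererConfig` decides how that content becomes wikitext. As an illustrative scenario, one configuration emits `<references />` and another emits the `{{Reflist}}` template used on English Wikipedia; the sentence and source are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RendererConfig:
    """Hypothetical, versioned settings for the wikitext transformation."""
    version: str
    references_heading: str = "References"
    references_markup: str = "<references />"


V1 = RendererConfig(version="1.0")
V2 = RendererConfig(version="2.0", references_markup="{{Reflist}}")


def generate_article() -> List[Tuple[str, Optional[str]]]:
    """Stand-in for NLG code: it builds DAL content and never touches wikitext."""
    return [("Generated sentence.", "Example source")]


def render(sentences: List[Tuple[str, Optional[str]]], config: RendererConfig) -> str:
    """Transform DAL content to wikitext according to a given renderer version."""
    body = " ".join(f"{t}<ref>{s}</ref>" if s else t for t, s in sentences)
    if any(s for _, s in sentences):
        body += f"\n\n== {config.references_heading} ==\n{config.references_markup}"
    return body


content = generate_article()  # unchanged across renderer versions
print(render(content, V1))
print(render(content, V2))
```

Swapping `V1` for `V2` changes the output wikitext while `generate_article` stays exactly the same.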

 

Conclusion

 

Should natural language generation output directly to wikitext or instead make use of an abstraction layer? What do you think of these ideas?

 

 

Best regards,

Adam Sobieski

 

[1] https://www.mediawiki.org/wiki/Extension:Cite

[2] https://www.mediawiki.org/wiki/Help:Cite

[3] https://en.wikipedia.org/wiki/Help:Cite_link_labels

[4] https://github.com/SemanticMediaWiki/SemanticCite/blob/master/README.md