Re: [Abstract-wikipedia] A Document Abstraction Layer

20 Jul 2020

Hi Adam,

    XML was designed to allow the creation of structured documents; it sounds
like you're describing a particular XML DTD, with some tools (JavaScript?)
to turn it into wikitext. We're going to need more structure than bare
text/wikitext, perhaps something like this, in the underlying abstract text
(language-independent) articles. But I don't think we're close to the point
of narrowing down exactly what that should look like yet! Maybe Denny has
more of an idea for it, though. But I wouldn't object to coming up with a
standard XML DTD for the basic abstract text content.

   Arthur

On Mon, Jul 20, 2020 at 8:31 AM Adam Sobieski <adamsobieski(a)hotmail.com> wrote:

...
  The Document Abstraction Layer and the Document Object Model

 A document abstraction layer (DAL) abstracts and models Wikipedia articles
 in a manner designed to: (1) facilitate NLG development, and (2) encapsulate
 the output of the resulting wikitext strings.

 As envisioned, the DAL extends the document object model (DOM) [1][2],
 adding functionality that simplifies NLG output scenarios, including but
 not limited to: (1) adding provenance data to specific portions of
 articles, and (2) adding computational traces and diagnostic messages to
 specific portions of articles.

 As envisioned, the DAL uses a simpler document model than that of HTML5,
 for example one composed of articles, sections, paragraphs, sentences,
 phrases and words.
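A minimal sketch of such a simplified element tree, assuming a hypothetical `DalElement` class and a fixed ordering of the six tag names above (coarsest to finest). The `getElementsByTagName(...).item(i)` shape loosely mirrors the DOM, but this is not JSDOM or any existing library.

```javascript
// Simplified DAL element tree (hypothetical). Tag names are ordered from
// coarsest to finest; a child must be finer-grained than its parent.
const DAL_TAGS = ["article", "section", "paragraph", "sentence", "phrase", "word"];

class DalElement {
  constructor(tagName, text = "") {
    if (!DAL_TAGS.includes(tagName)) {
      throw new Error(`Unknown DAL tag: ${tagName}`);
    }
    this.tagName = tagName;
    this.text = text; // only meaningful for "word" leaves
    this.parent = null;
    this.children = [];
  }

  appendChild(child) {
    // Reject nesting that is not strictly finer, e.g. a section inside a sentence.
    if (DAL_TAGS.indexOf(child.tagName) <= DAL_TAGS.indexOf(this.tagName)) {
      throw new Error(`<${child.tagName}> cannot be a child of <${this.tagName}>`);
    }
    child.parent = this;
    this.children.push(child);
    return child;
  }

  // Descendants in document order, loosely mirroring DOM getElementsByTagName.
  getElementsByTagName(tagName) {
    const found = [];
    const visit = (el) => {
      for (const child of el.children) {
        if (child.tagName === tagName) found.push(child);
        visit(child);
      }
    };
    visit(this);
    // Like HTMLCollection.item(i), return null when i is out of range.
    return { length: found.length, item: (i) => found[i] ?? null };
  }
}
```

Enforcing the nesting order in `appendChild` is one design choice; a looser model could allow, say, a phrase to span sentence boundaries.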

 Relevant to Node.js and to other JavaScript-based runtimes, the DOM is
 implemented in JavaScript in the JSDOM project [3].

 Examples

 Provenance, Sources and Derivations

 document.getElementsByTagName('sentence').item(0).addSource(s);

 document.getElementsByTagName('sentence').item(0).addDerivation(d);

 These examples show how the DAL could make it convenient for NLG developers
 to add provenance data to specific portions of Wikipedia articles.
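A minimal sketch of what `addSource` and `addDerivation` might store, assuming a hypothetical `ProvenancedElement` class. Only the two method names come from the examples above; the object shapes (`{ title, url }` for sources, `{ from, via }` for derivations) are illustrative assumptions.

```javascript
// Hypothetical DAL element carrying provenance data. The method names follow
// the examples above; the shapes of sources and derivations are assumptions.
class ProvenancedElement {
  constructor(tagName) {
    this.tagName = tagName;
    this.sources = [];     // citable sources, e.g. { title, url }
    this.derivations = []; // how the text was produced, e.g. { from, via }
  }

  addSource(source) {
    if (!source || typeof source.title !== "string") {
      throw new TypeError("source must have a string 'title'");
    }
    this.sources.push(Object.freeze({ ...source })); // store an immutable copy
    return this; // allow chaining
  }

  addDerivation(derivation) {
    this.derivations.push(Object.freeze({ ...derivation }));
    return this;
  }

  // Everything recorded about where this portion of the article came from.
  provenance() {
    return { sources: [...this.sources], derivations: [...this.derivations] };
  }
}
```

Keeping sources (things a reader can cite) separate from derivations (how the NLG system produced the text) lets a wikitext renderer emit only the former as references.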

 Computational Traces and Diagnostics

 document.getElementsByTagName('paragraph').item(0).debug("this is a debugging message attached to a paragraph");

 This example shows how functionality based on the console API [4][5] could
 be added to DAL elements so that NLG developers could easily attach
 informational messages, warning messages, error messages, log messages,
 debugging messages and computational traces to specific portions of
 Wikipedia articles.
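A minimal sketch of console-style methods on a DAL element, assuming a hypothetical `DiagnosticElement` class. The level names match methods of the console API [4][5]; the severity ordering used for filtering is an assumption, since the console API itself does not define one.

```javascript
// Console API method names mirrored on DAL elements; this severity order is
// an assumption made for filtering, not something the console API defines.
const LEVELS = ["debug", "log", "info", "warn", "error"];

class DiagnosticElement {
  constructor(tagName) {
    this.tagName = tagName;
    this.diagnostics = [];
    // Generate one method per level, e.g. el.debug("..."), el.warn("...").
    for (const level of LEVELS) {
      this[level] = (...args) => {
        // Join arguments with spaces, roughly as console methods print them.
        this.diagnostics.push({ level, message: args.map(String).join(" ") });
      };
    }
  }

  // Messages at or above a minimum level, e.g. messagesAtLeast("warn").
  messagesAtLeast(minLevel) {
    const min = LEVELS.indexOf(minLevel);
    return this.diagnostics.filter((d) => LEVELS.indexOf(d.level) >= min);
  }
}
```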

 The DAL would encapsulate the output of the resulting wikitext strings,
 including for “debug mode” and/or “developer mode” scenarios. Arguments for
 “debug mode” and/or “developer mode” output include:

    1. It would be convenient when developing NLG software.
    2. When responding to an anomaly, developers would be able to quickly
    navigate to relevant source code from specific portions of NLG output.
    3. Developers new to the Abstract Wikipedia project would be able to
    quickly learn about the codebase by viewing and understanding the
    computational processes that generated portions of automatically generated
    articles, and by navigating to relevant source code from specific portions
    of those articles.
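To support navigating from output back to source code, a diagnostic could record the NLG call site when it is attached. A minimal sketch, assuming a hypothetical `TracedElement` class and relying on the V8/Node.js `Error.captureStackTrace` API and V8's stack-string format (see the V8 stack trace API), neither of which is part of the ECMAScript standard.

```javascript
// Sketch: record the NLG call site with each diagnostic so a developer-mode
// view could link from article text back to source code. V8/Node.js only.
class TracedElement {
  constructor(tagName) {
    this.tagName = tagName;
    this.traces = [];
  }

  debug(message) {
    const holder = {};
    // Omit debug() itself, so the frames start at the code that called it.
    Error.captureStackTrace(holder, TracedElement.prototype.debug);
    const frames = holder.stack
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.startsWith("at ")); // drop the header line
    this.traces.push({ message, frames });
  }
}
```

Each frame string (e.g. `at someFunction (file.js:line:column)`) is what a developer-mode view could turn into a link to the relevant source file.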

 Obtaining Wikitext for a Document Tree

 document.innerWikitext

 or document.getWikitext()

 or document.documentElement.innerWikitext

 or document.documentElement.getWikitext()

 These examples show some possibilities for obtaining resultant wikitext
 strings from DAL documents.
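A minimal sketch of how `getWikitext()` and an `innerWikitext` getter could serialise a simplified DAL tree, on both the document and its `documentElement`. The classes `WikitextElement` and `DalDocument` are hypothetical, and the formatting rules (words carry their own punctuation, `== Title ==` section headings, blank lines between paragraphs) are assumptions.

```javascript
// Sketch: serialising a simplified DAL tree to wikitext (hypothetical classes).
class WikitextElement {
  constructor(tagName, { title = "", text = "" } = {}) {
    this.tagName = tagName;
    this.title = title; // used by "section"
    this.text = text;   // used by "word"
    this.children = [];
  }

  appendChild(child) {
    this.children.push(child);
    return child;
  }

  getWikitext() {
    const parts = this.children.map((child) => child.getWikitext());
    switch (this.tagName) {
      case "word":
        return this.text;
      case "phrase":
      case "sentence":
      case "paragraph":
        return parts.join(" ");
      case "section":
        return [`== ${this.title} ==`, parts.join("\n\n")].join("\n");
      case "article":
        return parts.join("\n\n");
      default:
        throw new Error(`Unknown DAL tag: ${this.tagName}`);
    }
  }

  get innerWikitext() {
    return this.getWikitext();
  }
}

// Document wrapper so both document.* and document.documentElement.* work.
class DalDocument {
  constructor(documentElement) {
    this.documentElement = documentElement;
  }

  getWikitext() {
    return this.documentElement.getWikitext();
  }

  get innerWikitext() {
    return this.getWikitext();
  }
}
```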

 What do you think of these ideas?

 Best regards,

 Adam

 [1] https://dom.spec.whatwg.org/

 [2] https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model

 [3] https://github.com/jsdom/jsdom#readme

 [4] https://developer.mozilla.org/en-US/docs/Web/API/Console_API

 [5] https://developer.mozilla.org/en-US/docs/Web/API/console

 *From: *Adam Sobieski <adamsobieski(a)hotmail.com>
 *Sent: *Sunday, July 19, 2020 10:48 PM
 *To: *General public mailing list for the discussion of Abstract
 Wikipedia (aka Wikilambda) <abstract-wikipedia(a)lists.wikimedia.org>
 *Subject: *Re: [Abstract-wikipedia] A Document Abstraction Layer

 A DAL could include a means of attaching computational traces and
 diagnostic messages or information to specific portions of generated
 natural language output.

 Exploring these concepts, and resembling the W3C document object model
 (DOM) [6][7], we could have a global document object [8] in the scripting
 environment for the output-article DAL API. Also resembling the DOM, an
 interface like Element [9] could be defined and, pertinent to DAL
 scenarios, provide an API resembling the console interface [2][3].

 Each element [9] of a DAL document tree (e.g. word, phrase, sentence,
 paragraph, section) could provide a console-like API [2][3] for its own
 separate diagnostic events and messages. There could be a parameter for
 toggling whether elements inherit particular diagnostic events and messages
 from their parent elements.
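A minimal sketch of the inheritance toggle, assuming a hypothetical `InheritingElement` class and a hypothetical `inherit` option on `getDiagnostics`. With `inherit: true`, an element reports its ancestors' messages (outermost first) followed by its own.

```javascript
// Sketch of per-element diagnostics with an optional inheritance toggle.
class InheritingElement {
  constructor(tagName, parent = null) {
    this.tagName = tagName;
    this.parent = parent;
    this.messages = [];
  }

  debug(message) {
    this.messages.push({ source: this.tagName, message });
  }

  // inherit=false: only this element's own messages.
  // inherit=true: ancestors' messages (outermost to innermost), then its own.
  getDiagnostics({ inherit = false } = {}) {
    const own = [...this.messages];
    if (!inherit || this.parent === null) return own;
    return [...this.parent.getDiagnostics({ inherit: true }), ...own];
  }
}
```

Defaulting `inherit` to `false` keeps each element's view focused; a developer-mode hoverbox on a sentence might switch it on to show the paragraph-level reasoning as context.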

 As an example, we could access the output article via a global document
 interface (resembling [6][7]), use some API to access the first paragraph
 of the output article, and then use an interface resembling console
 [2][3][4] on that paragraph element:

 document.getElementsByTagName('paragraph').item(0).debug("this is a debugging message attached to a paragraph");

 Each element [9] of a DAL document tree could have its own console-like
 output [2][3]. A DAL would then encapsulate the output of arbitrarily
 intricate wikitext, so that developers could view articles in a “debug
 mode” or “developer mode” and interact with the computational traces and
 diagnostic messages relevant to specific portions of natural language
 content.

 Best regards,

 Adam

 [1] https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.trace?view=n…

 [2] https://developer.mozilla.org/en-US/docs/Web/API/Console_API

 [3] https://developer.mozilla.org/en-US/docs/Web/API/console

 [4] https://developer.mozilla.org/en-US/docs/Web/API/Console/trace

 [5] https://v8.dev/docs/stack-trace-api

 [6] https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model

 [7] https://dom.spec.whatwg.org/

 [8] https://developer.mozilla.org/en-US/docs/Web/API/Document

 [9] https://developer.mozilla.org/en-US/docs/Web/API/Element

 *From: *Adam Sobieski <adamsobieski(a)hotmail.com>
 *Sent: *Sunday, July 19, 2020 4:37 AM
 *To: *General public mailing list for the discussion of Abstract
 Wikipedia (aka Wikilambda) <abstract-wikipedia(a)lists.wikimedia.org>
 *Subject: *[Abstract-wikipedia] A Document Abstraction Layer

 Introduction

 A *document abstraction layer* (DAL) could provide an API for natural
 language generation output while encapsulating a configurable,
 mechanical transformation to specific wikitext implementations. Natural
 language generation software could then output wikitext through a DAL,
 which would enable the conveniences and features described below.

 Feature 1: Provenance, Sources and Derivations

 A DAL could include a means of annotating portions of natural language
 with provenance data, sources or derivations. A DAL could provide an API
 for easily attaching provenance data to document elements (e.g. words,
 phrases, sentences, paragraphs, sections), and it would encapsulate
 handling the correct output of wikitext (e.g. <ref> and <references>
 elements) for output articles [1][2][3][4].

 That is, one could attach provenance data, sources or derivations to a
 particular sentence in a DAL object model, and the DAL’s configurable,
 mechanical transformation to wikitext would encapsulate a specific
 implementation of outputting a <ref> element at the end of the sentence
 and of ensuring that the automatically generated wikitext article contains
 a <references> element in a “References” section.
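A minimal sketch of that transformation, assuming plain `{ text, sources }` sentence objects and hypothetical helper names. It uses the Cite extension's `<ref>` and `<references />` tags [1][2] and renders each citation with external-link syntax `[url title]`; a real DAL might emit a citation template instead.

```javascript
// Hypothetical citation shape: { title, url }, rendered as an external link.
function citationWikitext({ title, url }) {
  return url ? `[${url} ${title}]` : title;
}

function sentenceWikitext(sentence) {
  // Refs go right after the sentence's final punctuation.
  const refs = sentence.sources.map((s) => `<ref>${citationWikitext(s)}</ref>`).join("");
  return sentence.text + refs;
}

// paragraphs: an array of paragraphs, each an array of sentence objects.
function articleWikitext(paragraphs) {
  const body = paragraphs.map((p) => p.map(sentenceWikitext).join(" ")).join("\n\n");
  const hasRefs = paragraphs.some((p) => p.some((s) => s.sources.length > 0));
  // Only append a References section when something was actually cited.
  return hasRefs ? `${body}\n\n== References ==\n<references />` : body;
}
```

The NLG code only attaches sources to sentences; where the `<ref>` goes and whether a References section is needed are decided entirely inside the renderer.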

 Furthermore, a DAL could encapsulate the naming of citation instances
 (<ref name="multiple">) [2] to ensure proper formatting of the references
 section when materials are cited multiple times.
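A minimal sketch of automatic ref naming: the first citation of a source emits `<ref name="...">...</ref>` and later citations emit the self-closing `<ref name="..." />`, which is the Cite syntax for reusing a reference [2]. Keying sources by `url` and generating names like `src1`, `src2` are assumptions; the letter prefix is there because Cite rejects ref names that are plain integers.

```javascript
// Sketch: assign each distinct source a ref name on first use and reuse it,
// so the References section lists each source only once.
function createRefNamer() {
  const names = new Map(); // source key (url) -> assigned ref name
  return function refFor(source) {
    const key = source.url;
    if (names.has(key)) {
      // Subsequent citations: self-closing reference to the named ref.
      return `<ref name="${names.get(key)}" />`;
    }
    const name = `src${names.size + 1}`; // letter prefix avoids numeric names
    names.set(key, name);
    return `<ref name="${name}">[${source.url} ${source.title}]</ref>`;
  };
}
```

Creating one namer per article keeps the numbering local to that article's wikitext.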

 Feature 2: Computational Traces and Diagnostics

 A DAL could include a means of attaching computational traces and
 diagnostic messages or information to specific portions of generated
 natural language output (e.g. words, phrases, sentences, paragraphs,
 sections).

 While viewing automatically generated articles in a “debug mode” or
 “developer mode”, developers could view and interact with content-specific
 computational traces, diagnostic messages or information in a number of
 ways (hoverboxes, margin notes, et cetera).

 That is, a DAL could encapsulate the output of specific wikitext
 implementations that give developers a means of interacting with
 automatically generated articles, so as to view and understand the
 computational processes involved in generating specific portions of natural
 language.
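One simple hoverbox approach, sketched under assumptions: in developer mode, wrap each element that has diagnostics in a `<span title="...">`, so the messages appear as a browser tooltip; in reader mode, emit the text unchanged. The `dal-debug` class name and the `mode` parameter are hypothetical, and the escaping shown covers only HTML attribute characters; a real renderer would also need to neutralise wikitext markup such as `{{` and `[[`.

```javascript
// Minimal HTML-attribute escaping (does not handle wikitext markup).
function escapeAttribute(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// element: { text, diagnostics: [{ level, message }] }
function renderWithDiagnostics(element, mode = "reader") {
  if (mode !== "developer" || element.diagnostics.length === 0) {
    return element.text; // reader mode: no debugging markup at all
  }
  const tooltip = element.diagnostics.map((d) => `[${d.level}] ${d.message}`).join("; ");
  return `<span class="dal-debug" title="${escapeAttribute(tooltip)}">${element.text}</span>`;
}
```

Margin notes or richer hoverboxes would likely need the new MediaWiki extension mentioned below; a `title` tooltip is only the simplest option available in plain wikitext.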

 As citations and references are implemented by means of an extension [1],
 this set of features might require a new MediaWiki extension.

 Feature 3: Versioning a Document Abstraction Layer

 One could version a DAL, adding features to it, without impacting any
 existing natural language generation software code. One could also adjust
 specific output wikitext implementations for particular scenarios without
 impacting any natural language generation software code.
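A minimal sketch of that separation, assuming a hypothetical versioned renderer table: the NLG output (`sentence`) stays the same, while the configured version decides whether citations are emitted as bare external links or through the `{{cite web}}` template used on English Wikipedia. The version labels and function names are illustrative assumptions.

```javascript
// Versioned citation renderers; version labels are illustrative.
const citationRenderers = {
  "1.0": (src) => `<ref>[${src.url} ${src.title}]</ref>`,
  // A later version switches to the {{cite web}} template.
  "2.0": (src) => `<ref>{{cite web |url=${src.url} |title=${src.title}}}</ref>`,
};

function renderSentence(sentence, { version = "1.0" } = {}) {
  const renderCitation = citationRenderers[version];
  if (!renderCitation) throw new Error(`Unknown DAL renderer version: ${version}`);
  return sentence.text + sentence.sources.map(renderCitation).join("");
}

// NLG output: identical regardless of which renderer version is configured.
const sentence = {
  text: "Paris is the capital of France.",
  sources: [{ title: "Example", url: "https://example.org" }],
};
```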

 Conclusion

 Should natural language generation output directly to wikitext or instead
 make use of an abstraction layer? What do you think of these ideas?

 Best regards,

 Adam Sobieski

 [1] https://www.mediawiki.org/wiki/Extension:Cite

 [2] https://www.mediawiki.org/wiki/Help:Cite

 [3] https://en.wikipedia.org/wiki/Help:Cite_link_labels

 [4] https://github.com/SemanticMediaWiki/SemanticCite/blob/master/README.md

 _______________________________________________
 Abstract-Wikipedia mailing list
 Abstract-Wikipedia(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
