Thanks, Charles. That's a very fine piece of work! And so relevant, not just to the quiz use case but to the whole of the NLG side of our project. I wonder whether there have been any further developments. I don't see any links to that PDF on Google. I've put the link in our to-do list [1].

There's some good stuff in there about performance as well as data quality, which are both areas we should certainly be looking into.

On the question of data structure, I guess it rather depends where you are sitting. The essence of my quiZiverse idea is that the consumer handles a relatively small dataset (client side) because WMF servers are running the functions. Essentially, the result is a pretty manageable structured object (in the more collaborative mode, perhaps a ZObject) and it can be grown iteratively. Given those assumptions, it hardly matters which formats are desired as inputs to (for consumption within) an actual quiz.

Since I was envisaging passing out "links" that are essentially a pre-written call back to the function with different arguments, those potential call-backs could be queued for processing server-side, so that the eventual call-back is referencing a freshly minted structured object (with a fairly limited shelf-life, unless it's a refresh of a pre-existing ZObject). So, again, lots of options to explore on the technical side.

I'm inclined to disagree with you on the question of hints, though. The structure I was envisaging is very straightforward; the "distractor" is an answer to some other question, so the "hint" is just that question phrased as a statement. It could be more complicated, but it should still be a fairly simple connection to the next function call. Of course, "where next?" may depend on whether the question was answered correctly, or there might be a choice to be made, but I think that would still resolve quite simply into a single next call. The functionality is an interactive "crawler" [2], at the end of the day, with each "next step" deferred until required or pre-prepared if responsiveness might be an issue.
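
To make that structure concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names QuizItem, Distractor, FunctionCall and make_question are hypothetical and do not refer to any existing function or ZObject type. Each distractor carries its hint (the other question phrased as a statement) and a pre-written call for the next step, which stays deferred until it is needed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FunctionCall:
    """A deferred, pre-written call back to a (hypothetical) quiz function."""
    function: str
    arguments: dict


@dataclass
class Distractor:
    """A wrong answer that is the right answer to some other question."""
    answer: str
    hint: str                # that other question, phrased as a statement
    next_call: FunctionCall  # call that would generate the other question


@dataclass
class QuizItem:
    question: str
    correct_answer: str
    distractors: list = field(default_factory=list)
    next_if_correct: Optional[FunctionCall] = None

    def respond(self, answer: str):
        """Return (is_correct, hint, next_call) for the chosen answer."""
        if answer == self.correct_answer:
            return True, None, self.next_if_correct
        for distractor in self.distractors:
            if distractor.answer == answer:
                return False, distractor.hint, distractor.next_call
        return False, None, None


# Hypothetical example item; "make_question" is an assumed function name.
item = QuizItem(
    question="What is the capital of Australia?",
    correct_answer="Canberra",
    distractors=[
        Distractor(
            answer="Sydney",
            hint="Sydney is the capital of New South Wales.",
            next_call=FunctionCall("make_question", {"subject": "Sydney"}),
        )
    ],
)
```

Calling item.respond("Sydney") yields the hint plus the single next call, so "where next?" resolves to one FunctionCall whichever way the question was answered.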

Keep it simple; iterate collaboratively; make it great!

Best regards,
Al.
[1] https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Related_and_previous_work/Natural_language_generation 
[2] https://en.wikipedia.org/wiki/Web_crawler

On Monday, 3 August 2020, <abstract-wikipedia-request@lists.wikimedia.org> wrote:

Today's Topics:

   1. Re: Comprehension questions (Charles Matthews)
   2. Natural Language and Mathematics Generation (Adam Sobieski)
   3. Re: Natural Language and Mathematics Generation (Charles Matthews)
   4. Loose notes (Andy)


----------------------------------------------------------------------

Message: 1
Date: Sun, 2 Aug 2020 17:22:11 +0100 (BST)
From: Charles Matthews <charles.r.matthews@ntlworld.com>
To: "General public mailing list for the discussion of Abstract
        Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] Comprehension questions
Message-ID: <29359200.234452.1596385332016@mail2.virginmedia.com>
Content-Type: text/plain; charset="utf-8"


> On 02 August 2020 at 16:16 Grounder UK <grounderuk@gmail.com> wrote:
>

>     But we don't just want the answer, we want a quiz! Equally, maybe we don't just want the question and the answer, we want some wrong answers and some tips.
>

It goes back to 2016, just to generate questions from Wikidata:

https://pub.tik.ee.ethz.ch/students/2016-FS/BA-2016-03.pdf

Technically the incorrect answers in multiple choice are called "distractors". Clearly this is a rather simple data structure to handle. Hints assume quite a bit more.

At the beginning of 2017, I decided to take seriously the suggestion (from Magnus Manske) that questions should be treated as structured data. I even suggested Wikidata should have a namespace for them (this didn't go down well). A road not taken then, and just as the Comprende! tool was finished I got diverted into a Wikimedian in Residence position. So much for that.

Anyway, one take on this is that AW output might be some kind of structured data, rather than the sectioned prose (+media files and tables and templated data) familiar from Wikipedia.

By the way, mathematics in wikitext has traditionally been a threefold mix of approaches (HTML, PNG, LaTeX): not an elegant solution.

Charles

------------------------------

Message: 2
Date: Mon, 3 Aug 2020 00:51:06 +0000
From: Adam Sobieski <adamsobieski@hotmail.com>
To: "General public mailing list for the discussion of Abstract
        Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.wikimedia.org>
Subject: [Abstract-wikipedia] Natural Language and Mathematics
        Generation
Message-ID:
        <CH2PR12MB418455F1C1E8479E06A6DDB4C54D0@CH2PR12MB4184.namprd12.prod.outlook.com>

Content-Type: text/plain; charset="utf-8"

I would like to broach, for discussion, the generation of natural language and mathematics for Abstract Wikipedia. Regardless of the eventual natural language generation approaches, it seems desirable to be able to include mathematics in automatically-generated encyclopedia articles.

In the thread: A Document Abstraction Layer [1], it was mentioned that natural language generation algorithms could output to, instead of text strings, a custom XML format which could then be mechanically and configurably converted into intricate wikitext.

That custom XML could resemble:


<article xmlns="..." xmlns:meta="...">
  <head>...</head>
  <body>
    <section>
      <head>...</head>
      <body>
        <paragraph>
          <sentence>
            <head>
              <meta:provenance>...</meta:provenance>
              <meta:console>...</meta:console>
            </head>
            <body>Next, consider the variable <math latex="x" />.</body>
          </sentence>
        </paragraph>
      </body>
    </section>
  </body>
</article>

or:


<article xmlns="..." xmlns:meta="...">
  <head>...</head>
  <body>
    <section>
      <head>...</head>
      <body>
        <paragraph>
          <sentence>
            <head>
              <meta:provenance>...</meta:provenance>
              <meta:console>...</meta:console>
            </head>
            <body>Next, consider the variable <math>x</math>.</body>
          </sentence>
        </paragraph>
      </body>
    </section>
  </body>
</article>

A <math> element could be of use for expressing mathematical notations in natural language articles. A <math> element with LaTeX syntax could simplify the complex matter of outputting mathematics into wikitext [2].
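
As a rough illustration of the mechanical conversion step, here is a minimal Python sketch that flattens a sentence <body> into wikitext. It is a sketch under assumptions, not a proposed implementation: the function name body_to_wikitext is hypothetical, the sample input omits the namespaces, and only the two <math> variants shown above are handled. The output relies on wikitext's existing <math> tag for LaTeX.

```python
import xml.etree.ElementTree as ET


def body_to_wikitext(body_xml: str) -> str:
    """Flatten a sentence <body> element into a wikitext string.

    Hypothetical helper: handles only un-namespaced <math> children and
    drops any other child element (keeping the text that follows it).
    """
    body = ET.fromstring(body_xml)
    parts = [body.text or ""]
    for child in body:
        if child.tag == "math":
            # Accept both variants: LaTeX in an attribute or as element text.
            latex = child.get("latex") or (child.text or "")
            parts.append(f"<math>{latex}</math>")
        # Text after the child element belongs to the sentence.
        parts.append(child.tail or "")
    return "".join(parts)


attribute_form = '<body>Next, consider the variable <math latex="x" />.</body>'
element_form = '<body>Next, consider the variable <math>x</math>.</body>'

print(body_to_wikitext(attribute_form))
print(body_to_wikitext(element_form))
```

Both variants come out as the same wikitext, which suggests the choice between them is mostly a matter of how convenient each is for the generating algorithms.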

What do you think?


Best regards,
Adam

[1] https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000151.html
[2] https://en.wikipedia.org/wiki/Wikipedia:Rendering_math


------------------------------

Message: 3
Date: Mon, 3 Aug 2020 07:02:20 +0100 (BST)
From: Charles Matthews <charles.r.matthews@ntlworld.com>
To: "General public mailing list for the discussion of Abstract
        Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] Natural Language and Mathematics
        Generation
Message-ID: <1759645849.242731.1596434540716@mail2.virginmedia.com>
Content-Type: text/plain; charset="utf-8"


> On 03 August 2020 at 01:51 Adam Sobieski <adamsobieski@hotmail.com> wrote:
>
>     A <math> element could be of use for expressing mathematical notations in natural language articles. A <math> element with LaTeX syntax could simplify the complex matter of outputting mathematics into wikitext [2].
>
>     What do you think?
>

Judging by the articles MathML and MathJax on Wikipedia, it seems premature to come to a decision. English Wikipedia, with a great deal of legacy code, has fallen back onto another solution. Browser dependence has had a big effect on the issue.

>
>     [1] https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000151.html
>
>     [2] https://en.wikipedia.org/wiki/Wikipedia:Rendering_math
>




------------------------------

Message: 4
Date: Mon, 3 Aug 2020 12:29:03 +0200
From: Andy <borucki.andrzej@gmail.com>
To: abstract-wikipedia@lists.wikimedia.org
Subject: [Abstract-wikipedia] Loose notes
Message-ID:
        <CAE2KeAK00kSL=jJp8gNGPNp_N8KGH0yXXUXKSa6XLM9R-ParvA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

Abstract Wikipedia offers several benefits:

- First, it creates a multi-language corpus for training machine
translation. The big disadvantage of the existing multi-language corpora is
that most of their data comes from movie subtitles, which are very inaccurate.

- Second, it will provide data for training Word Sense Disambiguation (WSD),
and in many languages(!).

The abstract form should contain a graph of senses. Will the senses be chosen
from English WordNet/UNL or from English Wiktionary? UNL is a good piece of
work, but it has been inactive for years and is not evolving. Wiktionary
senses have the advantage of being grouped by etymology: quite different
senses fall into different etymology groups. Will Abstract Wikipedia be linked
with Wiktionary? If so, Wiktionary sense numbers should become persistent or,
better, have unique identifiers. Wiktionary also has the advantage that senses
are translated into other languages, with the disadvantage that the
translations point to words, not senses, in the other language. Alternatively,
Abstract Wikipedia could have its own sense list with identifiers, but how
would it link with Wiktionary?

Graph: it should be possible to create text in many/all languages. For
example, English has “I saw”, while Polish has “widziałem”/“widziałam”: Polish
needs the gender. So the abstract form should carry the gender for the verb,
even though some languages do not use it.
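
A minimal Python sketch of this point, under stated assumptions: the class AbstractClause, the function render and the two tiny lookup tables are hypothetical illustrations, not a proposed format. The abstract form stores the subject's gender; the English renderer ignores it, while the Polish renderer requires it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AbstractClause:
    """Hypothetical abstract form of a one-verb clause."""
    verb_sense: str
    person: int
    number: str
    tense: str
    subject_gender: Optional[str] = None  # "m" or "f"; unused by some languages


# Toy lexicons covering only the example from the text.
ENGLISH = {("see", 1, "sg", "past"): "I saw"}
POLISH = {
    ("see", 1, "sg", "past", "m"): "widziałem",
    ("see", 1, "sg", "past", "f"): "widziałam",
}


def render(clause: AbstractClause, language: str) -> str:
    key = (clause.verb_sense, clause.person, clause.number, clause.tense)
    if language == "en":
        # English past tense does not mark gender, so the feature is ignored.
        return ENGLISH[key]
    if language == "pl":
        # Polish past tense agrees with the subject's gender.
        if clause.subject_gender is None:
            raise ValueError("Polish past tense needs the subject's gender")
        return POLISH[key + (clause.subject_gender,)]
    raise ValueError(f"unsupported language: {language}")


clause = AbstractClause("see", 1, "sg", "past", subject_gender="f")
print(render(clause, "en"))
print(render(clause, "pl"))
```

If the gender is left out of the abstract form, the English text can still be produced but the Polish rendering fails, which is the argument for recording it even when some languages do not use it.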

The sense dictionary can grow gradually along with the abstract text. When I
edit abstract text, the editor should make me add a word with its senses to
the dictionary if it does not exist, and let me add a new sense if that sense
does not exist.

What is needed:

abstract text = corpus

a growing dictionary of senses

a growing dictionary mapping senses to national-language senses

possibly a link with Wiktionaries


Best regards,

Andrzej

------------------------------

Subject: Digest Footer

_______________________________________________
Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia


------------------------------

End of Abstract-Wikipedia Digest, Vol 2, Issue 4
************************************************