I would like to broach, for discussion, the generation of natural language and mathematics for Abstract Wikipedia. Regardless of the eventual natural language generation approaches, it seems desirable to be able to include mathematics in automatically-generated encyclopedia articles.
In the thread "A Document Abstraction Layer" [1], it was mentioned that natural language generation algorithms could output, instead of text strings, a custom XML format which could then be mechanically and configurably converted into intricate wikitext.
That custom XML could resemble:
<article xmlns="..." xmlns:meta="...">
  <head>...</head>
  <body>
    <section>
      <head>...</head>
      <body>
        <paragraph>
          <sentence>
            <head>
              <meta:provenance>...</meta:provenance>
              <meta:console>...</meta:console>
            </head>
            <body>Next, consider the variable <math latex="x" />.</body>
          </sentence>
        </paragraph>
      </body>
    </section>
  </body>
</article>
or:
<article xmlns="..." xmlns:meta="...">
  <head>...</head>
  <body>
    <section>
      <head>...</head>
      <body>
        <paragraph>
          <sentence>
            <head>
              <meta:provenance>...</meta:provenance>
              <meta:console>...</meta:console>
            </head>
            <body>Next, consider the variable <math>x</math>.</body>
          </sentence>
        </paragraph>
      </body>
    </section>
  </body>
</article>
A <math> element could be of use for expressing mathematical notations in natural language articles. A <math> element with LaTeX syntax could simplify the complex matter of outputting mathematics into wikitext [2].
What do you think?
Best regards, Adam
[1] https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000151.ht...
[2] https://en.wikipedia.org/wiki/Wikipedia:Rendering_math
On 03 August 2020 at 01:51 Adam Sobieski adamsobieski@hotmail.com wrote:
A <math> element could be of use for expressing mathematical notations in natural language articles. A <math> element with LaTeX syntax could simplify the complex matter of outputting mathematics into wikitext [2]. What do you think?
Judging by the articles MathML and MathJax on Wikipedia, it seems premature to come to a decision. English Wikipedia, with a great deal of legacy code, has fallen back onto another solution. Browser dependence has had a big effect on the issue.
[1] https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000151.html
[2] https://en.wikipedia.org/wiki/Wikipedia:Rendering_math
Abstract-Wikipedia mailing list Abstract-Wikipedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
Relevant previous work in this area includes: WebALT [1] and GF-Alfa [2]. Both were built using Grammatical Framework technology.
By utilizing <math>LaTeX</math> elements in an XML-based intermediate output format, one could simply copy that mathematical content to the resultant output wikitext [3]. Wikitext utilizes this same convention for mathematical expressions [3].
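To make the copy-through idea concrete, here is a minimal sketch in Python. It assumes the sentence <body> carries no XML namespace and contains only text and <math> children. The function name body_to_wikitext is hypothetical and not part of any existing tool. It accepts either variant from the first message (the latex attribute or the element text) and emits wikitext with the LaTeX copied unchanged.

```python
import xml.etree.ElementTree as ET


def body_to_wikitext(xml_text):
    """Flatten a sentence <body> into wikitext, copying LaTeX through unchanged.

    Assumption: no XML namespace on the elements, and only <math> children
    carry content that needs converting. Other child elements are skipped
    in this sketch (only their trailing text is kept).
    """
    body = ET.fromstring(xml_text)
    parts = [body.text or ""]
    for child in body:
        if child.tag == "math":
            # Accept either variant: <math latex="x" /> or <math>x</math>.
            latex = child.get("latex")
            if latex is None:
                latex = child.text or ""
            parts.append("<math>" + latex + "</math>")
        # Text that follows the child element inside <body>.
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    print(body_to_wikitext(
        '<body>Next, consider the variable <math latex="x" />.</body>'))
    print(body_to_wikitext(
        '<body>Next, consider the variable <math>x</math>.</body>'))
```

Both variants flatten to the same wikitext sentence, so the choice between the attribute form and the element-content form would mostly be a matter of taste in the intermediate format.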
Whether or not to include mathematics in Abstract Wikipedia is an important decision to make at a future point. Choosing to include mathematics would entail discussions about representing mathematical knowledge on Wikidata. It would entail discussions about how specific senses of certain words have mathematical meaning. It would entail discussions about how algorithms should determine when to use mathematical and scientific notations and when they should, instead, use paraphrases with the semantic content expressed using natural language. These are just some of the discussion topics which would arise should we desire to include mathematical and scientific notations in Abstract Wikipedia articles.
Best regards, Adam
[1] http://mathdox.org/new-web/projects/webalt.html
[2] http://cth.altocumulus.org/~hallgren/Alfa/Tutorial/GFplugin.html
[3] https://en.wikipedia.org/wiki/Help:Displaying_a_formula
From: Adam Sobieski <adamsobieski@hotmail.com>
Sent: Sunday, August 2, 2020 8:51 PM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) <abstract-wikipedia@lists.wikimedia.org>
Subject: [Abstract-wikipedia] Natural Language and Mathematics Generation
On 03 August 2020 at 16:50 Adam Sobieski adamsobieski@hotmail.com wrote:
By utilizing <math>LaTeX</math> elements in an XML-based intermediate output format, one could simply copy that mathematical content to the resultant output wikitext [3]. Wikitext utilizes this same convention for mathematical expressions [3]. Whether or not to include mathematics in Abstract Wikipedia is an important decision to make at a future point. Choosing to include mathematics would entail discussions about representing mathematical knowledge on Wikidata. It would entail discussions about how specific senses of certain words have mathematical meaning. It would entail discussions about how algorithms should determine when to use mathematical and scientific notations and when they should, instead, use paraphrases with the semantic content expressed using natural language. These are just some of the discussion topics which would arise should we desire to include mathematical and scientific notations in Abstract Wikipedia articles.
I'm disagreeing with much of this.
On LaTeX: while it is "industry standard", I'd like to draw attention to a point made in https://en.wikipedia.org/wiki/Help:Displaying_a_formula#Rendering: "Latex does not have full support for Unicode characters, and not all characters render."
It goes on to suggest that Vietnamese, for example, would not be well catered for, in terms of its diacritics.
I appreciate that we are only talking currently about scoping, and high-level initial planning. But given AW's objectives, this is not a good sign, and I don't think we should just assume that LaTeX as an incumbent gets waved through. It is pre-Web, and something closer to HTML would be preferable, in my view.
My background is in mathematics, and I began my Wikipedia career writing mathematics articles. There are certainly issues, such as prose/notation balance. Mathematical language is heavily overloaded, from the disambiguation aspect. But I'm not really recognising the landscape of issues set out there.
Charles
Charles,
There is also MathML to consider. Work is underway at the W3C with respect to a new version of MathML, MathML4 [1][2]. Work is underway with respect to adding MathML support to Chromium [3][4].
Instead of LaTeX, MathML could be the way to go.
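As a rough illustration of what the MathML route might look like for the same example, here is a small Python sketch that builds presentation MathML for a single identifier. The helper name variable_to_mathml is hypothetical. The namespace URI and the <mi> (identifier) element are standard MathML. Since MathML is XML, the identifier is carried as ordinary Unicode text.

```python
import xml.etree.ElementTree as ET

# The standard MathML namespace URI.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def variable_to_mathml(name):
    """Return presentation MathML for one identifier, e.g. the variable x.

    Hypothetical helper for illustration only; a real generator would need
    to cover operators, numbers, fractions and so on.
    """
    # Serialize MathML as the default namespace rather than with a prefix.
    ET.register_namespace("", MATHML_NS)
    math = ET.Element("{%s}math" % MATHML_NS)
    mi = ET.SubElement(math, "{%s}mi" % MATHML_NS)
    mi.text = name
    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(variable_to_mathml("x"))
```

Whether such markup would be embedded directly in the intermediate XML or generated at the conversion step is an open design question.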
Best regards, Adam
[1] https://www.w3.org/community/mathml4/
[2] https://mathml-refresh.github.io/mathml/
[3] https://www.chromestatus.com/feature/5240822173794304
[4] https://mathml.igalia.com/
From: Charles Matthews via Abstract-Wikipedia <abstract-wikipedia@lists.wikimedia.org>
Sent: Monday, August 3, 2020 1:53 PM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] Natural Language and Mathematics Generation