Bawolff,
Thank you. I just checked the Parsoid output and do see nested <section> elements.
Brainstorming, what do you think about wikitext syntax for putting class, style, or other
attributes on sections?
Best regards,
Adam
From: Brian Wolff<mailto:bawolff@gmail.com>
Sent: Tuesday, January 11, 2022 10:51 PM
To: Wikitech-l<mailto:wikitech-l@lists.wikimedia.org>
Subject: [Wikitech-l] Re: Wikitext, Document Models, and HTML5 Output
Have you seen the html structure of parsoid?
E.g.
https://en.wikipedia.org/api/rest_v1/page/html/Dog
--
Bawolff
On Monday, January 10, 2022, Adam Sobieski
<adamsobieski@hotmail.com<mailto:adamsobieski@hotmail.com>> wrote:
Wikitech-l,
Hello. I have a question about the HTML output of wiki parsers. How simple or complex
would it be for a wiki parser to output, instead of a flat document structure inside of
a <div> element, an <article> element containing nested <section> elements?
Recently, in the Community Wishlist Survey
Sandbox<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Sandbo…ox>, the
speech synthesis of Wikipedia
articles<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Sandb…>
was broached. The proposer of these ideas indicated that, for best results, some content,
e.g., “See also” sections, should not be synthesized.
In response to these interesting ideas, I mentioned some ideas from EPUB (referencing
pronunciation lexicons from
HTML<https://www.w3.org/publishing/epub3/epub-contentdocs.html#sec-pls> and SSML
attributes in
HTML<https://www.w3.org/publishing/epub3/epub-contentdocs.html#sec-xhtml…ib>)
as well as the CSS Speech
Module<https://www.w3.org/TR/css-speech-1/>, and noted that output HTML
content could be styled using the CSS Speech Module’s speak property.
In these regards, I started thinking about how one might extend wikitext syntax to be able
to style sections, e.g.:
== See also == {style="speak:never"}
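To make the proposed syntax concrete, here is a minimal Python sketch of how a heading line with a trailing attribute block might be recognized. The `{...}` suffix is only the idea floated above, not existing wikitext, and the names `HEADING_LINE`, `ATTR`, and `parse_heading` are hypothetical helpers invented for this illustration.

```python
import re

# Hypothetical grammar: "== Title ==" optionally followed by {name="value" ...}.
# This mirrors the proposal in this message; it is not MediaWiki syntax.
HEADING_LINE = re.compile(
    r'^(?P<eq>={1,6})\s*(?P<title>.+?)\s*(?P=eq)\s*(?:\{(?P<attrs>[^}]*)\})?\s*$'
)
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def parse_heading(line):
    """Return (level, title, attrs) for a heading line, or None otherwise."""
    m = HEADING_LINE.match(line)
    if m is None:
        return None
    # A missing attribute block yields an empty dict.
    attrs = dict(ATTR.findall(m.group("attrs") or ""))
    return len(m.group("eq")), m.group("title"), attrs


print(parse_heading('== See also == {style="speak:never"}'))
```

The sketch does not handle a `}` inside a quoted value; a real parser extension would need a proper tokenizer.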
Next, I inspected the HTML of some Wikipedia articles and realized that, due to the
structure of the output HTML documents, it isn’t simple to style or to add attributes to
sections. There are only <h2>, <h3>, <h4> (et cetera) elements inside of
a containing <div> element; sections are not yet structured elements.
The gist is that, instead of outputting HTML like:
<div class="mw-parser-output">
  <h2><span class="mw-headline" id="Heading">Heading</span></h2>
  <p>Paragraph 1</p>
  <p>Paragraph 2</p>
  <h3><span class="mw-headline" id="Subheading">Subheading</span></h3>
  <p>Paragraph 3</p>
  <p>Paragraph 4</p>
</div>
could a wiki parser output HTML5 like:
<article class="mw-parser-output">
  <section id="Heading">
    <header><h2><span class="mw-headline">Heading</span></h2></header>
    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
    <section id="Subheading">
      <header><h3><span class="mw-headline">Subheading</span></h3></header>
      <p>Paragraph 3</p>
      <p>Paragraph 4</p>
    </section>
  </section>
</article>
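As a rough illustration of the transformation between the two snippets above, the following Python sketch nests a flat heading structure into sections as a post-processing step. It assumes the input is well-formed XML, as in the example, so that the standard library's ElementTree can parse it. `nest_sections` is a hypothetical helper, and this is not how any existing wiki parser implements sectioning.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h([1-6])$")


def nest_sections(flat_html):
    """Wrap each heading and the content that follows it in a <section>."""
    root = ET.fromstring(flat_html)
    article = ET.Element("article", dict(root.attrib))
    # Stack of (heading level, container); the article acts as level 0.
    stack = [(0, article)]
    for child in list(root):
        m = HEADING.match(child.tag)
        if m is None:
            # Non-heading content goes into the innermost open section.
            stack[-1][1].append(child)
            continue
        level = int(m.group(1))
        # Close every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section")
        # Move the id from the headline span onto the section.
        span = child.find("span[@id]")
        if span is not None:
            section.set("id", span.attrib.pop("id"))
        ET.SubElement(section, "header").append(child)
        stack.append((level, section))
    return article


flat = (
    '<div class="mw-parser-output">'
    '<h2><span class="mw-headline" id="Heading">Heading</span></h2>'
    '<p>Paragraph 1</p><p>Paragraph 2</p>'
    '<h3><span class="mw-headline" id="Subheading">Subheading</span></h3>'
    '<p>Paragraph 3</p><p>Paragraph 4</p>'
    '</div>'
)
print(ET.tostring(nest_sections(flat), encoding="unicode"))
```

The stack tracks which sections are still open, so a later heading at the same or a shallower level closes the deeper ones before a new section starts.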
My initial thoughts are that the latter HTML5 is better structured, more semantic, more
styleable, and potentially more accessible. If there is any interest, I could write up a
lengthier comparison of the two, explaining why one might be better, and more useful,
than the other.
Is this the correct mailing list to discuss any of these wiki technology, wiki parsing,
wikitext, document model, and HTML5 output topics?
Best regards,
Adam