First: <section> tags are definitely on the near-future roadmap. There are
some issues with balancing tags (when the section deliberately has an
unclosed <div>) that are awaiting the completion of the Parsoid transition,
but it will certainly happen.
WRT adding additional attributes/properties to certain constructs -- yes,
this is a somewhat pervasive issue with wikitext. List items and headings
are probably the biggest examples of "un-annotatable" constructs.
Typically folks hack around the issue by using literal <div> or <span> tags
in their markup ... but then that leads to the nesting issues described in
the second sentence above.
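
To make the nesting problem concrete, here is a minimal Python sketch (an illustration only, not how MediaWiki or Parsoid actually balances tags) that reports which tags a section's markup leaves open or closes without opening. The names `BalanceChecker` and `unbalanced_tags` are hypothetical, and the check is deliberately naive: it ignores HTML's implicit-close rules (for example an unclosed <p>).

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Tracks open tags with a stack (hypothetical helper, naive rules)."""

    def __init__(self):
        super().__init__()
        self.stack = []         # tags opened but not yet closed
        self.stray_closes = []  # closing tags that match nothing open

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.stray_closes.append(tag)


def unbalanced_tags(fragment):
    """Return (still_open, stray_closes) for one section's HTML fragment."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.stack, checker.stray_closes


# A section that deliberately opens a <div> and leaves it for a later
# section to close cannot simply be wrapped in <section>...</section>.
print(unbalanced_tags('<div class="box"><p>Intro</p>'))
```

A section for which either list is non-empty is the case that makes automatic <section> wrapping awkward.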
contains a discussion, originally
in the context of list item attributes, and in
it is proposed to have a
general-purpose `{{#attr}}` parser function which would attach itself to
the nearest containing HTML tag. More discussion of that proposal in
. I think that would
address your use case?
--scott
On Tue, Jan 11, 2022 at 6:38 PM Adam Sobieski <adamsobieski(a)hotmail.com>
wrote:
Wikitech-l,
Hello. I have a question about the HTML output of wiki parsers. I wonder
how simple or complex it would be for a wiki parser to output,
instead of a flat document structure inside of a <div> element, an
<article> element containing nested <section> elements?
Recently, in the Community Wishlist Survey Sandbox
<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Sandbox>, the
speech synthesis of Wikipedia articles
<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Sandbox#Spoken_articles>
was broached. The proposer of these ideas indicated that, for best results,
some content, e.g., “See also” sections, should not be synthesized.
In response, I mentioned some ideas from EPUB: referencing
pronunciation lexicons from HTML
<https://www.w3.org/publishing/epub3/epub-contentdocs.html#sec-pls> and SSML
attributes in HTML
<https://www.w3.org/publishing/epub3/epub-contentdocs.html#sec-xhtml-ssml-attrib>.
I also mentioned the CSS Speech Module <https://www.w3.org/TR/css-speech-1/>,
and that output HTML content could be styled using that module’s speak
property.
In these regards, I started thinking about how one might extend wikitext
syntax to be able to style sections, e.g.:
== See also == {style="speak:never"}
Next, I inspected the HTML of some Wikipedia articles and realized that,
due to the structure of the output HTML documents, it isn’t simple to style
or to add attributes to sections. There are only <h2>, <h3>, <h4> (et
cetera) elements inside of a containing <div> element; sections are not
yet structured elements.
The gist is that, instead of outputting HTML like:
<div class="mw-parser-output">
  <h2><span class="mw-headline" id="Heading">Heading</span></h2>
  <p>Paragraph 1</p>
  <p>Paragraph 2</p>
  <h3><span class="mw-headline" id="Subheading">Subheading</span></h3>
  <p>Paragraph 3</p>
  <p>Paragraph 4</p>
</div>
could a wiki parser output HTML5 like:
<article class="mw-parser-output">
  <section id="Heading">
    <header><h2><span class="mw-headline">Heading</span></h2></header>
    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
    <section id="Subheading">
      <header><h3><span class="mw-headline">Subheading</span></h3></header>
      <p>Paragraph 3</p>
      <p>Paragraph 4</p>
    </section>
  </section>
</article>
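
As a rough illustration of the transformation from the flat form to the nested form, here is a minimal Python sketch. It is not how any existing wiki parser works; the function name `nest_sections` and the input shape (a list of `(tag, text)` pairs) are assumptions made for the example, and ids are derived by simply replacing spaces with underscores.

```python
from html import escape


def nest_sections(blocks):
    """Wrap a flat run of headings and blocks in nested <section> elements.

    blocks: (tag, text) pairs in document order, e.g. ("h2", "Heading").
    This input shape is an assumption made for the sketch.
    """
    out = ['<article class="mw-parser-output">']
    open_levels = []  # heading levels of the currently open sections

    def indent():
        return "  " * (len(open_levels) + 1)

    for tag, text in blocks:
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            level = int(tag[1])
            # A heading closes every open section at the same or a deeper level.
            while open_levels and open_levels[-1] >= level:
                open_levels.pop()
                out.append(f"{indent()}</section>")
            # Simplified id: spaces become underscores (assumption).
            section_id = escape(text.replace(" ", "_"), quote=True)
            out.append(f'{indent()}<section id="{section_id}">')
            open_levels.append(level)
            out.append(
                f'{indent()}<header><{tag}><span class="mw-headline">'
                f"{escape(text)}</span></{tag}></header>"
            )
        else:
            out.append(f"{indent()}<{tag}>{escape(text)}</{tag}>")

    # Close whatever is still open at the end of the document.
    while open_levels:
        open_levels.pop()
        out.append(f"{indent()}</section>")
    out.append("</article>")
    return "\n".join(out)


blocks = [
    ("h2", "Heading"), ("p", "Paragraph 1"), ("p", "Paragraph 2"),
    ("h3", "Subheading"), ("p", "Paragraph 3"), ("p", "Paragraph 4"),
]
print(nest_sections(blocks))
```

The stack of open heading levels is what decides whether a new heading starts a subsection (deeper level) or closes earlier sections first (same or shallower level).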
Initial thoughts regarding the latter HTML5 include that it is better
structured, more semantic, more styleable, and potentially more accessible.
If there is any interest, I could write up some lengthier discussion about
one versus the other, why one might be better – and more useful – than the
other.
Is this the correct mailing list to discuss any of these wiki technology,
wiki parsing, wikitext, document model, and HTML5 output topics?
Best regards,
Adam