Wikitech-l,
Hello. I have a question about the HTML output of wiki parsers. How simple or complex would it be for a wiki parser to output, instead of a flat document structure inside a <div> element, an <article> element containing nested <section> elements?
Recently, in the Community Wishlist Survey Sandbox<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Sandbox>, the speech synthesis of Wikipedia articles<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Sandbox#Spoken_ar…> was broached. The proposer of these ideas indicated that, for best results, some content, e.g., “See also” sections, should not be synthesized.
In response to these interesting ideas, I mentioned some ideas from EPUB: referencing pronunciation lexicons from HTML<https://www.w3.org/publishing/epub3/epub-contentdocs.html#sec-pls> and using SSML attributes in HTML<https://www.w3.org/publishing/epub3/epub-contentdocs.html#sec-xhtml-ssml-at…>. I also mentioned the CSS Speech Module<https://www.w3.org/TR/css-speech-1/> and noted that output HTML content could be styled using the CSS Speech Module’s speak property.
In these regards, I started thinking about how one might extend wikitext syntax to be able to style sections, e.g.:
== See also == {style="speak:never"}
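As a rough illustration of how a parser might recognize such a heading line, here is a minimal Python sketch. The trailing {...} attribute block is only the hypothetical syntax floated above, not existing wikitext, and the names parse_heading, HEADING_RE, and ATTR_RE are made up for this example.

```python
import re

# Hypothetical syntax from this message: == Title == {key="value" ...}
# This is NOT existing wikitext; the attribute block is an assumption.
HEADING_RE = re.compile(r'^(={1,6})\s*(.+?)\s*\1\s*(?:\{(.*)\})?\s*$')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_heading(line):
    """Return (level, title, attributes), or None if the line is not a heading."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    equals, title, attr_text = match.groups()
    # A heading without an attribute block yields an empty dictionary.
    attributes = dict(ATTR_RE.findall(attr_text or ""))
    return len(equals), title, attributes
```

With this sketch, parse_heading('== See also == {style="speak:never"}') yields level 2, the title "See also", and a dictionary holding the style attribute. A real implementation would also have to decide how such attributes are sanitized before reaching the output HTML.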
Next, I inspected the HTML of some Wikipedia articles and realized that, due to the structure of the output HTML documents, it isn’t simple to style or to add attributes to sections. There are only <h2>, <h3>, <h4> (et cetera) elements inside of a containing <div> element; sections are not yet structured elements.
The gist is that, instead of outputting HTML like:
<div class="mw-parser-output">
<h2><span class="mw-headline" id="Heading">Heading</span></h2>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<h3><span class="mw-headline" id="Subheading">Subheading</span></h3>
<p>Paragraph 3</p>
<p>Paragraph 4</p>
</div>
could a wiki parser output HTML5 like:
<article class="mw-parser-output">
<section id="Heading">
<header><h2><span class="mw-headline">Heading</span></h2></header>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<section id="Subheading">
<header><h3><span class="mw-headline">Subheading</span></h3></header>
<p>Paragraph 3</p>
<p>Paragraph 4</p>
</section>
</section>
</article>
Initial thoughts regarding the latter HTML5 include that it is better structured, more semantic, more styleable, and potentially more accessible. If there is any interest, I could write up some lengthier discussion about one versus the other, why one might be better – and more useful – than the other.
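To make the question more concrete, here is a small Python sketch of the nesting step, assuming the parser already has a flat list of headings and paragraphs. The block tuples and the function name nest_sections are invented for this illustration and do not correspond to any existing MediaWiki or Parsoid API; the id generation is also simplified to replacing spaces with underscores.

```python
from html import escape


def nest_sections(blocks):
    """Convert a flat list of blocks into nested <section> markup.

    Each block is either ("heading", level, title) or ("para", text).
    """
    out = ['<article class="mw-parser-output">']
    open_levels = []  # stack of heading levels for currently open sections

    for block in blocks:
        if block[0] == "heading":
            _, level, title = block
            # Close every open section at the same or a deeper level.
            while open_levels and open_levels[-1] >= level:
                open_levels.pop()
                out.append("</section>")
            anchor = escape(title.replace(" ", "_"), quote=True)
            out.append(f'<section id="{anchor}">')
            out.append(
                f'<header><h{level}><span class="mw-headline">'
                f"{escape(title)}</span></h{level}></header>"
            )
            open_levels.append(level)
        else:
            out.append(f"<p>{escape(block[1])}</p>")

    # Close whatever is still open at the end of the document.
    out.extend("</section>" for _ in open_levels)
    out.append("</article>")
    return "\n".join(out)
```

Feeding it an h2, two paragraphs, an h3, and two more paragraphs reproduces the nested example above. The stack of open heading levels is the whole trick: a new heading closes every open section at its own level or deeper before opening its own.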
Is this the correct mailing list to discuss any of these wiki technology, wiki parsing, wikitext, document model, and HTML5 output topics?
Best regards,
Adam
If you are not an extension developer, you can safely ignore this message.
I am excited to announce that the Minerva Neue skin will be bundled with
MediaWiki in 1.38 per https://phabricator.wikimedia.org/T191743
The Minerva Neue skin powers the mobile site of Wikimedia projects, so it
makes sense to include this skin for 3rd party MediaWiki instances that
want a responsive but simplified experience.
The implication of this is that extension developers should make sure
their code is compatible with MinervaNeue.
Note that Minerva Neue operates in two modes (a desktop or mobile mode)
depending on whether MobileFrontend is installed. However, MobileFrontend
is still not part of the MediaWiki bundle, so the mobile mode doesn't need
to be tested at this point.
You can tell if you are in Minerva desktop mode by testing in an incognito
window and verifying that you see a more (ellipsis) dropdown in the toolbar.
For example, test on https://en.wikipedia.org/wiki/Minerva?useskin=minerva NOT
https://en.m.wikipedia.org/wiki/Minerva?useskin=minerva
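If you want to generate such test links for several pages, a tiny Python helper along these lines may be convenient. It is only a sketch: the function name minerva_desktop_test_url is made up, and the check for an ".m." label in the host name is an assumption based on how Wikimedia mobile domains look, so it may not fit other installations.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def minerva_desktop_test_url(page_url):
    """Return page_url with useskin=minerva set, rejecting mobile-domain URLs."""
    parts = urlsplit(page_url)
    # Assumption: mobile domains contain an ".m." label, e.g. en.m.wikipedia.org.
    if ".m." in parts.netloc:
        raise ValueError("use the desktop domain, not the mobile one")
    query = dict(parse_qsl(parts.query))
    query["useskin"] = "minerva"
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling it with https://en.wikipedia.org/wiki/Minerva gives back the first URL above, while the en.m.wikipedia.org variant raises a ValueError.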
If you have any questions or concerns, please feel free to reply to this
e-mail or the Phabricator ticket.
Thanks for reading!
Jon
🚂🌈Summary of 1.38.0-wmf.16 train deployment
This email is a summary of the Wikimedia production deployment of
1.38.0-wmf.16
*wmf.16* is in production across all wikis and I'll be handing the
conductor hat to Dan for *wmf.17* which starts rolling today.
- Conductor: Mukunda Modell
- Backup Conductor: Antoine "hashar" Musso
- Blocker Task: T293957 <https://phabricator.wikimedia.org/T293957>
- Current Status <https://versions.toolforge.org>
📈 Stats
Stats for this train compared to the last 5 trains.
- 483 patches ▁▃▂▂█
- 0 Rollbacks █▆▃▆▁
- 1 Day of delay ▆█▁▂▂
- 4 Blockers ▁█▇▅▄
🎉 Traintastic Folks 🎉 Thanks to folks who reported or resolved blockers:
- Samuel
- Urbanecm
- James D. Forrester
- Timo Tijhof
- Ed Sanders
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month—though we're going to have
them a week later this month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service, Wikimedia Commons Query Service,
etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, January 12th, 2022
Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
& WAT
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Hope to talk to you in a week!
—Trey
Trey Jones
Staff Computational Linguist, Search Platform
Wikimedia Foundation
UTC–5 / EST
Hi!
I just published the first version of a Go package which provides
utilities for processing Wikidata entities JSON dumps and Wikimedia
Enterprise HTML dumps. It processes them in parallel on multiple cores,
so processing is rather fast. I hope it will be useful to others, too.
https://gitlab.com/tozd/go/mediawiki
Any feedback is welcome.
Mitar
--
http://mitar.tnode.com/
https://twitter.com/mitar_m