On 11-08-09 05:03 PM, John Elliot wrote:
On 10/08/2011 9:49 AM, Daniel Friesen wrote:
WikiText is loose, so there are no errors: if the parser doesn't like something you entered, it's not going to pass that through raw and let an HTML validator say it's wrong. It's going to decide it doesn't like it and treat it as plaintext.
Well, the validation feature that I added to my web-site helped me catch a bug for you.
If you are outputting WikiText that includes the HTML-like <h1>, <h2>, etc., tags, then make sure you're not outputting them in the context of table content, because that is invalid. In order to turn such WikiText into compliant HTML, the <h1> WikiText should be converted to a <span class="h1"> HTML element, and so forth. The various skins should be updated to do something sensible with the h* classes.
<h#> tags are not invalid inside of table contents. A <td>'s contents are flow content, and <h#> tags are flow content.
<h#> tags are, however, invalid inside of <th> tags, whose contents are phrasing content. In that context the correct thing would not necessarily be to turn the h# into a span, but to fold it into the header that's already there, which may or may not be what the user wants. Either change can break a user's site styles.
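To make the <th> case concrete, here is a minimal sketch of a check that flags headings nested inside a <th>. It is not part of MediaWiki or of any validator mentioned in this thread. The names HeadingInThFinder and headings_in_th are hypothetical, and it assumes the markup uses explicit </th> closing tags, as well-formed XML output would.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingInThFinder(HTMLParser):
    """Records every heading start tag seen while a <th> is open."""

    def __init__(self):
        super().__init__()
        self.th_depth = 0   # number of currently open <th> elements
        self.findings = []  # (tag, line, column) for each offending heading

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.th_depth += 1
        elif tag in HEADINGS and self.th_depth > 0:
            line, column = self.getpos()
            self.findings.append((tag, line, column))

    def handle_endtag(self, tag):
        # Assumption: every <th> is closed explicitly.
        if tag == "th" and self.th_depth > 0:
            self.th_depth -= 1


def headings_in_th(html):
    """Return the headings found inside <th> elements in the given markup."""
    finder = HeadingInThFinder()
    finder.feed(html)
    finder.close()
    return finder.findings


sample = "<table><tr><th><h2>Title</h2></th><td><h3>Fine</h3></td></tr></table>"
for tag, line, column in headings_in_th(sample):
    print(f"<{tag}> inside <th> at line {line}, column {column}")
```

The sketch only reports where the problem is. Whether to fold the heading into the <th> or turn it into a span is the policy question, and it deliberately leaves that alone.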
Would you like to argue for a $wgStricterParsing bool that will sacrifice parser output consistency for things like folding == headers into parent th's (perhaps turn into a span if they explicitly use a <h#> instead of ==), and other things we haven't been able to do to the parser for compat reasons?
I'll let you know if my HTML validator helps me to easily catch any other bugs like this for you.
We've already established that MediaWiki is broken because it's outputting empty <ul> elements, so maybe you can have a look at fixing that up too.
That was an HTML4/XHTML1 rule that's been removed. An empty <ul></ul> is valid HTML5. Wikipedia is just currently set to output an XHTML DOCTYPE and well-formed XML because some bots still screen-scrape content, and their developers were given a second chance to fix them to use the API before HTML5 is turned on permanently.
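For anyone who still wants to find these while the XHTML DOCTYPE is in place, a rough sketch of locating <ul> elements with no <li> children follows. The HTML 4.01 and XHTML 1.0 DTDs require at least one <li> in a list, and HTML5 allows zero. EmptyUlFinder and find_empty_uls are hypothetical names for illustration, not anything MediaWiki ships, and the sketch assumes lists are closed explicitly.

```python
from html.parser import HTMLParser


class EmptyUlFinder(HTMLParser):
    """Collects the position of every <ul> that closes without an <li> child."""

    def __init__(self):
        super().__init__()
        self.open_lists = []  # one [tag, position, li_count] entry per open list
        self.empty_uls = []   # (line, column) of each empty <ul>

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.open_lists.append([tag, self.getpos(), 0])
        elif tag == "li" and self.open_lists:
            # Simplification: an <li> is counted for the innermost open list.
            self.open_lists[-1][2] += 1

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.open_lists and self.open_lists[-1][0] == tag:
            _, position, li_count = self.open_lists.pop()
            if tag == "ul" and li_count == 0:
                self.empty_uls.append(position)


def find_empty_uls(html):
    """Return (line, column) positions of <ul> elements with no <li> children."""
    finder = EmptyUlFinder()
    finder.feed(html)
    finder.close()
    return finder.empty_uls


page = "<div><ul></ul><ul><li>item</li></ul></div>"
for line, column in find_empty_uls(page):
    print(f"empty <ul> at line {line}, column {column}")
```

Under HTML5 none of these hits would be errors, so a check like this only matters for output served with the older DOCTYPE.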
Thanks.
John.