HTML5 and non valid attributes/elements of previous versions (bug 40329)

Derk-Jan Hartman

19 Sep 2012

4:23 p.m.

I would like to open some discussion about https://bugzilla.wikimedia.org/show_bug.cgi?id=40329 This bug is about the fact that we currently do a 'partial' transform of the HTML5-invalid attribute 'align'.

We all agree that this is bad; what we need to figure out is what to do next:

1: Disable the transform and output the align attribute even though it's not valid HTML5. Solve validness later.
2: Remove the attribute from HTML5 and 'break' the content. Fix by users (or bot).
3: Disable HTML5, correct the content of the wikis (possibly with a bot) and remove the attribute in HTML5 mode, re-enable HTML5.
4: Fix the transform (not that easy).

My personal preference is 1, since this is causing trouble now and 1 solves the immediate problems; we just add to the lack of valid HTML5 output that we already have. In my opinion 2 would be too disruptive and 3 would take too long.

Danny is of the opinion that we should never transform at the parser side and that we should fix the content instead (2 or 3).

So, how best to fix the issue, and what should our strategy be for content that is not valid HTML5 in general? <Discuss>

Derric Atzrott

19 Sep

4:30 p.m.

Can't we do both 1 and 4? Remove it for now, fix the transform, and then re-enable the transform and disable the align attribute?

Thank you, Derric Atzrott

Derk-Jan Hartman

4:40 p.m.

The problem there is that you need to target "direct child elements that are in block mode", which is rather hard. You might be able to get away with targeting all direct child elements, but that could wreck the margins of inline elements.

Rob Lanphier

7:08 p.m.

Agreed, let's do #1. It sounds like this isn't our only HTML5 validity problem. We shouldn't punish our readers with poorly-formatted content in the name of technical correctness.

Rob

Derric Atzrott

7:15 p.m.

Long term though I still think we should aim for technical correctness. Which is why I advocate #4 as well.

DJ would you mind explaining again (in different terms) why we can't do both #1 and #4 (#1 as a temporary measure while we achieve #4)? I don't think I quite understood your first explanation.

Thank you, Derric Atzrott

Derk-Jan Hartman

20 Sep

8:10 a.m.

I'll try. Before HTML4, the "align" attribute (other than for 'table') with the value "center" meant "Center all of my content". Since the attribute has been removed from the spec, you need to replace it with CSS rules. Unfortunately, no CSS rules can exactly reproduce the behavior of the attribute.

You have "text-align:center;" but this only applies to content that is inline. It does not center a div inside a table cell for instance, where align="center" would have done this. To get this behavior, you need to apply "margin-left:auto; margin-right:auto;" on the div.

So the pseudo CSS rules would be something like:

table[align=center] { text-align: center; }

table[align=center] > *[display=block or table] { margin-left: auto; margin-right: auto; }

The problem is that as far as I know, that last one is not possible with CSS. You cannot say: "apply this style to all direct children that are in block mode".

If you look at the internals of Mozilla and WebKit, you will note that they have the same problem (they also transform the align attribute). They therefore have browser-specific text-align values such as -webkit-right, -moz-right and -webkit-center, which extend the default behavior of text-align (which applies only to inline content) so that block elements are aligned as well. I'm not sure how other browser vendors handle this.
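The limitation described above can be illustrated with a small sketch. This is not MediaWiki's sanitizer code: the function name `align_to_css` and the mapping it applies are assumptions made for illustration, and only the values left, right and center are handled.

```python
# Hypothetical sketch: names and mapping are illustrative, not MediaWiki code.
VALID_ALIGN = {"left", "right", "center"}


def align_to_css(tag: str, align: str) -> str:
    """Return inline CSS that approximates a legacy align attribute on `tag`."""
    tag, align = tag.lower(), align.lower()
    if align not in VALID_ALIGN:
        return ""  # unknown value: emit nothing rather than guess
    if tag == "table":
        # On a table, align positions the table box itself.
        if align == "center":
            return "margin-left: auto; margin-right: auto;"
        return f"float: {align};"
    # On td, div, p, ...: text-align only moves inline content. Block-level
    # children would also need auto margins, but the transform cannot know
    # which children are rendered as blocks.
    return f"text-align: {align};"


print(align_to_css("td", "center"))
print(align_to_css("table", "center"))
```

The second branch is where the transform stays 'partial': the style is attached to the element carrying align, while the legacy attribute also affected how its block-level children were placed.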

Derk-Jan Hartman

8:13 a.m.

Small mistake in the example, I meant td instead of table.

td[align=center] { text-align: center; }

td[align=center] > *[display=block or table]{ margin-left: auto; margin-right: auto; }

DJ

Krinkle

11:12 p.m.

I agree with others, #1 seems to be the best choice.

The W3C validator is not a visitor nor a user of the software. It's a useful tool to find problems, but as long as browsers are not standards compliant, and the W3C validator stays ignorant of that fact, we have very good reason to choose to optimize for real browsers, and not the hypothetical browser in the eyes of the validator.

The HTML output of the MediaWiki software is meant for users. Users that have browsers in front of them.

All relevant browsers support "align", regardless of whether the page is in HTML5 mode.

Having said that, we should spread the word to users to stop using "align" and to make layouts in CSS instead (through classes), which by design makes use of "align" impossible and requires text-align and margin instead.

Even if we could transform it correctly, I would oppose automatic transformation (be it output-only in the parser, or by a bot changing the actual wikitext), because the "align" attribute is a means to an end that has lots of implications and possible unintended side effects, unlike text-align and margin, which are very specific and targeted at their purpose. Replacing a single align attribute with all kinds of inline styles loses the original intention of that attribute and adds a lot of bloat to the output that we don't really need anyway.

-- Krinkle

Jon Robson

21 Sep

1:19 a.m.

To solve validness I'd suggest creating styles for this in MediaWiki:Common.css and on a regular basis running reports to surface which articles use the text-align property. It would be great to have a dedicated wiki page linking to these articles and asking editors to fix them. It would give people who care about Wikipedia an easy way to contribute.

I have a similar problem in mobile - at some point I'd like us to deprecate use of the style attribute in wikitext in favour of using stylesheets and the class attribute which is much more manageable and would be interested in whatever solution you come to here.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

-- Jon Robson http://jonrobson.me.uk @rakugojon

MZMcBride

2:40 a.m.

Finding specific text strings like these requires scanning XML dumps. There are a few projects dedicated to this on various wikis. English examples:

* https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Check_Wikipedia
* https://en.wikipedia.org/wiki/Wikipedia:Dump_reports

Scanning dumps (or really dealing with them in any form) is pretty awful. There's been some brainstorming in the past about how to set up a system where users (or operators) could regularly run arbitrary regular expressions on all of the current wikitext, but such a setup requires _a lot_ of everything involved (disk space, RAM, bandwidth, processing power, etc.). Maybe one day Labs will have something like this.

It's a well-known fact that if you give Wikimedians lists of things to do, they will eventually get done. I've done this for years with https://en.wikipedia.org/wiki/Wikipedia:Database_reports.

MZMcBride

Gabriel Wicke

5:04 p.m.

We have a dump grepper tool in the Parsoid codebase (see js/tests/dumpGrepper.js) that takes about 25 minutes to grep an XML dump of the English Wikipedia. The memory involved is minimal and constant, the thing is mostly CPU-bound.

It should not be hard to hook this up to a web service. Our parser web service in js/api could serve as a template for that.
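The streaming approach described above can be sketched in a few lines. This is not the Parsoid dumpGrepper.js tool: `grep_dump` and `local_name` are hypothetical names, and the sketch only assumes an XML export with `page`, `title` and `text` elements, read incrementally so that memory use stays small.

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def grep_dump(source, pattern: str):
    """Yield the title of each page whose wikitext matches `pattern`.

    `source` is a filename or a binary file object containing the XML dump.
    """
    regex = re.compile(pattern)
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root
    title, matched = None, False
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            if elem.text and regex.search(elem.text):
                matched = True
        elif name == "page":
            if matched:
                yield title
            matched = False
            root.clear()  # discard finished pages instead of keeping the tree
```

A call such as `grep_dump("dump.xml", r"align\s*=")` (the filename is a placeholder) would then list candidate pages; the regular expression is deliberately crude and would also match text outside attributes.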

Gabriel

Daniel Friesen

27 Sep

8:51 p.m.

Another option would be to start indexing tag/attr/property usage. I've thought of doing this before. Sometimes you want to clean up the use of certain tags. Other times you want to stop using a parser function or tag hook from an extension in your pages. Other times your wiki is full of -moz-border-radius properties added by people who never quite got the fact that it's a standardized property with other forms that need to be included.

So aggregating this information into parser output properties we can display on a special page would make it easier for users to track down.
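A minimal sketch of such an index, assuming HTML-style tags only (wikitext table syntax such as {| align=center is not covered). The class `UsageIndexer` and the function `index_usage` are hypothetical names, not part of MediaWiki; the sketch counts how often each (tag, attribute) pair occurs in a fragment of markup.

```python
from collections import Counter
from html.parser import HTMLParser


class UsageIndexer(HTMLParser):
    """Count (tag, attribute) pairs seen in a fragment of markup."""

    def __init__(self):
        super().__init__()
        self.attr_usage = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        for name, _value in attrs:
            self.attr_usage[(tag, name)] += 1


def index_usage(markup: str) -> Counter:
    """Return a Counter keyed by (tag, attribute) for `markup`."""
    indexer = UsageIndexer()
    indexer.feed(markup)
    indexer.close()
    return indexer.attr_usage


usage = index_usage('<div align="center"><p ALIGN=right>x</p></div>')
print(usage[("div", "align")], usage[("p", "align")])
```

Counts like these could be stored per page and summed for a report; how they would be attached to parser output is left open here.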

...of course we could always opt for the easier [[Category:Pages using deprecated WikiText]] built-in maintenance category.

Another thing I've wanted to do is build an on-wiki mass-replacement tool: one that properly uses the job queue, has a good UI, and has some extra features. That could help clean up smaller wikis too.

-- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Daniel Friesen

9:07 p.m.

A bit of a side topic, but could someone point out some URLs where the deprecated/removed align attribute is used along with block elements that are supposed to be centered too?

I've seen a lot of WikiText. But frankly, I have never seen any article or template that even tried to use align to center non-inline content (besides {| align=center).

-- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Derk-Jan Hartman

29 Sep

2:13 p.m.

I'm on vacation, but it seems the discussion is getting a bit out of control here: https://bugzilla.wikimedia.org/show_bug.cgi?id=40329 If someone could bring back some sensibility...

BTW this has been 'broken' for weeks now, we should take some actual action, or just accept the status quo.

wikitech-l@lists.wikimedia.org

13 comments

participants (8)

Daniel Friesen
Derk-Jan Hartman
Derric Atzrott
Gabriel Wicke
Jon Robson
Krinkle
MZMcBride
Rob Lanphier