I would like to open some discussion about bug 40329: https://bugzilla.wikimedia.org/show_bug.cgi?id=40329
This bug is about the fact that we currently do a 'partial' transform of the HTML5-invalid attribute 'align'.
We all agree that this is bad; what we need to figure out is what to do next:
1: Disable the transform and output the align attribute even though it's not valid HTML5. Solve validity later.
2: Remove the attribute in HTML5 mode and 'break' the content. Fix by users (or a bot).
3: Disable HTML5, correct the content of the wikis (possibly with a bot), remove the attribute in HTML5 mode, and re-enable HTML5.
4: Fix the transform (not that easy).
My personal preference is option 1: this is causing trouble now, and option 1 solves the immediate problems; it merely adds to the lack of valid HTML5 output that we already have. In my opinion, 2 would be too disruptive and 3 would take too long.
Danny is of the opinion that we should never transform on the parser side and that we should fix the content instead (2 or 3).
So, how best to fix the issue, and what should our strategy be for content that is not HTML5-valid in general? <Discuss>
DJ
Can't we do both 1 and 4? Disable the transform for now, fix it, and then re-enable the transform and drop the align attribute?
Thank you, Derric Atzrott
On Wed, Sep 19, 2012 at 6:30 PM, Derric Atzrott <datzrott@alizeepathology.com> wrote:
Can't we do both 1 and 4? Disable the transform for now, fix it, and then re-enable the transform and drop the align attribute?
The problem there is that you need to target "direct child elements that are in block mode", which is rather hard. You might be able to get away with targeting all direct child elements, but that could wreck the margins of inline elements.
DJ
On Wed, Sep 19, 2012 at 9:23 AM, Derk-Jan Hartman d.j.hartman+wmf_ml@gmail.com wrote:
My personal preference is option 1: this is causing trouble now, and option 1 solves the immediate problems; it merely adds to the lack of valid HTML5 output that we already have. In my opinion, 2 would be too disruptive and 3 would take too long.
Agreed, let's do #1. It sounds like this isn't our only HTML5 validity problem. We shouldn't punish our readers with poorly-formatted content in the name of technical correctness.
Rob
Agreed, let's do #1. It sounds like this isn't our only HTML5 validity problem. We shouldn't punish our readers with poorly-formatted content in the name of technical correctness.
Rob
Long term, though, I still think we should aim for technical correctness, which is why I advocate #4 as well.
DJ, would you mind explaining again (in different terms) why we can't do both #1 and #4 (#1 as a temporary measure while we achieve #4)? I don't think I quite understood your first explanation.
Thank you, Derric Atzrott
On Wed, Sep 19, 2012 at 9:15 PM, Derric Atzrott <datzrott@alizeepathology.com> wrote:
DJ, would you mind explaining again (in different terms) why we can't do both #1 and #4 (#1 as a temporary measure while we achieve #4)? I don't think I quite understood your first explanation.
I'll try. Before HTML4, the "align" attribute (other than on 'table') with the value "center" meant "center all of my content". Since the attribute has been removed from the HTML5 spec, you need to replace it with CSS rules. Unfortunately, however, there are no CSS rules that can exactly reproduce the behavior of the attribute.
You have "text-align: center;", but this only applies to inline content. It does not center a div inside a table cell, for instance, whereas align="center" would have done so. To get that behavior, you need to apply "margin-left: auto; margin-right: auto;" to the div.
So the pseudo-CSS rules would be something like:
table[align=center] { text-align: center; }
table[align=center] > *[display=block or table] { margin-left: auto; margin-right: auto; }
The problem is that, as far as I know, that last rule is not possible in CSS. You cannot say: "apply this style to all direct children that are in block mode".
If you look at the internals of Mozilla and WebKit, you will note that they have the same problem (they also transform the align attribute). That is why they have browser-specific text-align values such as -webkit-right, -moz-right, -webkit-center, etc., which extend text-align beyond its default of applying only to inline content so that it applies to block elements as well. How other browser vendors handle this, I'm not sure.
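To make the gap concrete, here is a minimal Python sketch of what a complete emulation of align="center" on a table cell would have to emit, assuming (hypothetically) that the display type of every direct child is already known. The function name and the (tag, display) input format are illustrative assumptions, not MediaWiki code; the point is that the second half needs per-child display information that a stylesheet cannot select on.

```python
from typing import Dict, List, Tuple

# Display types that ignore text-align and need auto margins instead,
# mirroring the "display=block or table" pseudo-selector in the message.
BLOCK_DISPLAYS = {"block", "table"}


def expand_align_center(
    children: List[Tuple[str, str]],
) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Return (cell_style, child_styles) that emulate align="center" on a cell.

    `children` lists the cell's direct children as (tag, display) pairs.
    Knowing `display` up front is the hypothetical part of this sketch.
    """
    # Inline content is centered by text-align on the cell itself.
    cell_style = {"text-align": "center"}
    child_styles: List[Dict[str, str]] = []
    for _tag, display in children:
        if display in BLOCK_DISPLAYS:
            # Block and table boxes are centered with auto side margins.
            child_styles.append({"margin-left": "auto", "margin-right": "auto"})
        else:
            # Inline children need nothing extra.
            child_styles.append({})
    return cell_style, child_styles
```

For example, expand_align_center([("span", "inline"), ("div", "block")]) returns the cell style plus [{}, {"margin-left": "auto", "margin-right": "auto"}].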
DJ
Small mistake in the example: I meant td instead of table.
td[align=center] { text-align: center; }
td[align=center] > *[display=block or table]{ margin-left: auto; margin-right: auto; }
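Since a selector cannot test display, one possible approximation (an assumption, not something proposed in this thread) is to enumerate tag names that are block-level by default. The Python sketch below generates such rules from a hypothetical BLOCK_TAGS list; it would still miss elements whose display was changed by other styles, and it would wrongly target listed tags that were restyled as inline.

```python
# Hypothetical list of tags that are block-level (or tables) by default.
# Real pages can restyle any of them, so this is only an approximation.
BLOCK_TAGS = ["div", "p", "table", "ul", "ol", "dl", "blockquote", "pre", "center"]


def align_center_css(container: str = "td") -> str:
    """Build approximate CSS for container[align="center"]."""
    sel = f'{container}[align="center"]'
    # One child selector per tag, since CSS cannot select on `display`.
    children = ",\n".join(f"{sel} > {tag}" for tag in BLOCK_TAGS)
    return (
        f"{sel} {{ text-align: center; }}\n"
        f"{children} {{ margin-left: auto; margin-right: auto; }}\n"
    )


if __name__ == "__main__":
    print(align_center_css("td"))
```

The trade-off versus "all direct children" is that inline children are left untouched, at the cost of maintaining a tag list.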
On Thu, Sep 20, 2012 at 10:10 AM, Derk-Jan Hartman <d.j.hartman+wmf_ml@gmail.com> wrote:
On Sep 19, 2012, at 6:23 PM, Derk-Jan Hartman d.j.hartman+wmf_ml@gmail.com wrote:
So, how best to fix the issue, and what should our strategy be for content that is not HTML5-valid in general?
<Discuss>
I agree with the others: #1 seems to be the best choice.
The W3C validator is neither a visitor nor a user of the software. It's a useful tool for finding problems, but as long as browsers are not standards-compliant, and the W3C validator stays ignorant of that fact, we have very good reason to optimize for real browsers, not for the hypothetical browser in the eyes of the validator.
The HTML output of the MediaWiki software is meant for users: users who have browsers in front of them.
All relevant browsers support "align", regardless of whether the page is in HTML5 mode.
Having said that, we should spread the word to users to stop using "align" and to build layouts in CSS instead (through classes), which by design makes "align" unusable and requires text-align and margin instead.
Even if we could transform it correctly, I would oppose automatic transformation (be it output-only in the parser, or a bot changing the actual wikitext), because the "align" attribute is a means to an end with lots of implications and possible unintended side-effects, unlike text-align and margin, which are very specific and targeted at their purpose. By replacing a single align attribute with all kinds of inline styles, the original intention of that attribute would be lost, at the cost of a lot of output bloat that we don't really need anyway.
-- Krinkle
To solve validity, I'd suggest creating styles for this in MediaWiki:Common.css and regularly running reports to surface which articles use the text-align property. It would be great to have a dedicated wiki page linking to these articles and asking editors to fix them. It would give people who care about Wikipedia an easy way to contribute.
I have a similar problem in mobile: at some point I'd like us to deprecate use of the style attribute in wikitext in favour of stylesheets and the class attribute, which is much more manageable, so I would be interested in whatever solution you come to here.
On Thu, Sep 20, 2012 at 4:12 PM, Krinkle krinklemail@gmail.com wrote:
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Jon Robson wrote:
To solve validity, I'd suggest creating styles for this in MediaWiki:Common.css and regularly running reports to surface which articles use the text-align property. It would be great to have a dedicated wiki page linking to these articles and asking editors to fix them. It would give people who care about Wikipedia an easy way to contribute.
Finding specific text strings like these requires scanning XML dumps. There are a few projects dedicated to this on various wikis. English examples:
* https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Check_Wikipedia
* https://en.wikipedia.org/wiki/Wikipedia:Dump_reports
Scanning dumps (or really dealing with them in any form) is pretty awful. There's been some brainstorming in the past about setting up a system where users (or operators) could regularly run arbitrary regular expressions over all of the current wikitext, but such a setup requires _a lot_ of everything involved (disk space, RAM, bandwidth, processing power, etc.). Maybe one day Labs will have something like this.
It's a well-known fact that if you give Wikimedians lists of things to do, they will eventually get done. I've done this for years with https://en.wikipedia.org/wiki/Wikipedia:Database_reports.
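In the spirit of those reports, a worklist can be as simple as a bulleted wikitext page. A minimal Python sketch, assuming a scan has already produced (page title, hit count) pairs; the function name and output layout are hypothetical, not how the existing database reports are generated.

```python
from typing import Iterable, Tuple


def wikitext_report(rows: Iterable[Tuple[str, int]], heading: str) -> str:
    """Format (page title, hit count) pairs as a wikitext worklist."""
    lines = [f"== {heading} ==", ""]
    # Most hits first, then alphabetical, so the worst pages lead the list.
    for title, hits in sorted(rows, key=lambda row: (-row[1], row[0])):
        lines.append(f"* [[{title}]] ({hits})")
    return "\n".join(lines) + "\n"
```

Ordering by hit count puts the pages with the most cleanup work at the top, which tends to suit a "list of things to do".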
MZMcBride
On 09/20/2012 07:40 PM, MZMcBride wrote:
Scanning dumps (or really dealing with them in any form) is pretty awful. There's been some brainstorming in the past about setting up a system where users (or operators) could regularly run arbitrary regular expressions over all of the current wikitext, but such a setup requires _a lot_ of everything involved (disk space, RAM, bandwidth, processing power, etc.). Maybe one day Labs will have something like this.
We have a dump grepper tool in the Parsoid codebase (see js/tests/dumpGrepper.js) that takes about 25 minutes to grep an XML dump of the English Wikipedia. The memory involved is minimal and constant; the tool is mostly CPU-bound.
It should not be hard to hook this up to a web service. Our parser web service in js/api could serve as a template for that.
Gabriel
On Fri, 21 Sep 2012 10:04:50 -0700, Gabriel Wicke gwicke@wikimedia.org wrote:
We have a dump grepper tool in the Parsoid codebase (see js/tests/dumpGrepper.js) that takes about 25 minutes to grep an XML dump of the English Wikipedia. The memory involved is minimal and constant; the tool is mostly CPU-bound.
It should not be hard to hook this up to a web service. Our parser web service in js/api could serve as a template for that.
Gabriel
Another option would be to start indexing tag/attr/property usage. I've thought of doing this before. Sometimes you want to clean up the use of certain tags. Other times you want to stop using a parser function or tag hook from an extension in your pages. Other times your wiki is full of -moz-border-radius properties added by people who never quite grasped that it's a standardized property with other forms that also need to be included.
Aggregating this information into parser output properties that we can display on a special page would make it easier for users to track these down.
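The counting step of such an index could start as a per-page tally. The Python sketch below counts (tag, attribute) pairs and inline-style properties in rendered HTML using the standard library's html.parser; the class and function names are hypothetical, and storing the counts as parser output properties or showing them on a special page is not covered.

```python
from collections import Counter
from html.parser import HTMLParser


class UsageCounter(HTMLParser):
    """Tally (tag, attribute) pairs and inline-style properties in HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.attr_counts: Counter = Counter()
        self.style_counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via the default handle_startendtag.
        for name, value in attrs:
            self.attr_counts[(tag, name)] += 1
            if name == "style" and value:
                # Naive split of "prop: value; prop: value" declarations.
                for decl in value.split(";"):
                    prop = decl.split(":", 1)[0].strip().lower()
                    if prop:
                        self.style_counts[prop] += 1


def index_usage(html: str) -> UsageCounter:
    """Feed one page of rendered HTML and return the filled counter."""
    counter = UsageCounter()
    counter.feed(html)
    counter.close()
    return counter
```

Counting per (tag, attribute) pair rather than per attribute keeps cases like td align and div align apart, which matters for the question of block-level centering.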
...of course we could always opt for the easier [[Category:Pages using deprecated WikiText]] built-in maintenance category.
Another thing I've wanted to do is build an on-wiki mass-replacement tool: one that properly uses the job queue, has a good UI, and offers some extra features. That could help clean up smaller wikis too.
A bit of a side topic, but could someone point out some URLs where the deprecated/removed align attribute is used with block elements that are also supposed to be centered?
I've seen a lot of WikiText. But frankly, I have never seen any article or template that even tried to use align to center non-inline content (besides {| align=center).
I'm on vacation, but it seems the discussion is getting a bit out of control here:
https://bugzilla.wikimedia.org/show_bug.cgi?id=40329
If someone can bring back some sensibility...
BTW, this has been 'broken' for weeks now; we should either take some actual action or just accept the status quo.
DJ
On 27 sep. 2012, at 23:07, Daniel Friesen daniel@nadir-seen-fire.com wrote:
-- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]