Can someone explain the point of these lines to me?
https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269
If these tags are optional, they can be there, so why remove them? If you remove them, you should probably also care about them in closeElement() so that mediawiki doesn't produce html which has only ending tags for html and head.
It seems that on WMF installation we don't use anyway as there is head tag. So why is it there? What purpose does it server?
On Wed, Feb 18, 2015 at 1:23 PM, Petr Bena benapetr@gmail.com wrote:
Can someone explain the point of these lines to me?
https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269
Someone thought it would be clever to reduce page size by a few bytes here and there. :)
In practice we always have attributes on the <html> so this would only take effect on <head> for this particular case. However there are other bits which can be more aggressively dropped, such as closing tags for table parts etc.
Here's the commit that added this and related bits: https://github.com/wikimedia/mediawiki/commit/eefe1b13a382a7d11c2137bbf900b7...
If these tags are optional, they can be there, so why remove them? If you remove them, you should probably also care about them in closeElement() so that mediawiki doesn't produce html which has only ending tags for html and head.
I believe that's also covered, yes. (Keep in mind also that the way the HTML content model works is that those elements always exist in the DOM; the tags can simply be omitted from the markup because they are implied by the surrounding content.)
It seems that on WMF installation we don't use anyway as there is head tag. So why is it there? What purpose does it server?
I think we still have the well-formed XML mode enabled for backwards-compatibility?
-- brion
I was digging and I found that commit you sent - eefe1b13a but you can see that this commit modified 2 functions: both open and closeElement, however for some reason the code was removed from closeElement so now calling openElement('html') would produce empty string, while closeElement('html') would produce </html> and that should IMHO create HTML code that would only contain closing tags in case that html / head didn't have any parameters. Actually head, at least on wmf installation doesn't have any parameters, but it has opening tag, which is weird. (Perhaps wgWellFormedXml is true by default?)
On Wed, Feb 18, 2015 at 10:31 PM, Brion Vibber bvibber@wikimedia.org wrote:
On Wed, Feb 18, 2015 at 1:23 PM, Petr Bena benapetr@gmail.com wrote:
Can someone explain the point of these lines to me?
https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269
Someone thought it would be clever to reduce page size by a few bytes here and there. :)
In practice we always have attributes on the <html> so this would only take effect on <head> for this particular case. However there are other bits which can be more aggressively dropped, such as closing tags for table parts etc.
Here's the commit that added this and related bits: https://github.com/wikimedia/mediawiki/commit/eefe1b13a382a7d11c2137bbf900b7...
If these tags are optional, they can be there, so why remove them? If you remove them, you should probably also care about them in closeElement() so that mediawiki doesn't produce html which has only ending tags for html and head.
I believe that's also covered, yes. (Keep in mind also that the way the HTML content model works is that those elements always exist in the DOM; the tags can simply be omitted from the markup because they are implied by the surrounding content.)
It seems that on WMF installation we don't use anyway as there is head tag. So why is it there? What purpose does it server?
I think we still have the well-formed XML mode enabled for backwards-compatibility?
-- brion _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Uploaded https://gerrit.wikimedia.org/r/191473/ for consistency with closeElement. Thanks for reporting.
Il 18/02/2015 22:38, Petr Bena ha scritto:
I was digging and I found that commit you sent - eefe1b13a but you can see that this commit modified 2 functions: both open and closeElement, however for some reason the code was removed from closeElement so now calling openElement('html') would produce empty string, while closeElement('html') would produce </html> and that should IMHO create HTML code that would only contain closing tags in case that html / head didn't have any parameters. Actually head, at least on wmf installation doesn't have any parameters, but it has opening tag, which is weird. (Perhaps wgWellFormedXml is true by default?)
On Wed, Feb 18, 2015 at 10:31 PM, Brion Vibber bvibber@wikimedia.org wrote:
On Wed, Feb 18, 2015 at 1:23 PM, Petr Bena benapetr@gmail.com wrote:
Can someone explain the point of these lines to me?
https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269
Someone thought it would be clever to reduce page size by a few bytes here and there. :)
In practice we always have attributes on the <html> so this would only take effect on <head> for this particular case. However there are other bits which can be more aggressively dropped, such as closing tags for table parts etc.
Here's the commit that added this and related bits: https://github.com/wikimedia/mediawiki/commit/eefe1b13a382a7d11c2137bbf900b7...
If these tags are optional, they can be there, so why remove them? If you remove them, you should probably also care about them in closeElement() so that mediawiki doesn't produce html which has only ending tags for html and head.
I believe that's also covered, yes. (Keep in mind also that the way the HTML content model works is that those elements always exist in the DOM; the tags can simply be omitted from the markup because they are implied by the surrounding content.)
It seems that on WMF installation we don't use anyway as there is head tag. So why is it there? What purpose does it server?
I think we still have the well-formed XML mode enabled for backwards-compatibility?
-- brion _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I think it's the other way, omiting the html tags is probably correct, as long as both opening and ending tags are omitted :P so I think the closeElement function needs a fix
On Wed, Feb 18, 2015 at 10:42 PM, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Uploaded https://gerrit.wikimedia.org/r/191473/ for consistency with closeElement. Thanks for reporting.
Il 18/02/2015 22:38, Petr Bena ha scritto:
I was digging and I found that commit you sent - eefe1b13a but you can see that this commit modified 2 functions: both open and closeElement, however for some reason the code was removed from closeElement so now calling openElement('html') would produce empty string, while closeElement('html') would produce </html> and that should IMHO create HTML code that would only contain closing tags in case that html / head didn't have any parameters. Actually head, at least on wmf installation doesn't have any parameters, but it has opening tag, which is weird. (Perhaps wgWellFormedXml is true by default?)
On Wed, Feb 18, 2015 at 10:31 PM, Brion Vibber bvibber@wikimedia.org wrote:
On Wed, Feb 18, 2015 at 1:23 PM, Petr Bena benapetr@gmail.com wrote:
Can someone explain the point of these lines to me?
https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269
Someone thought it would be clever to reduce page size by a few bytes here and there. :)
In practice we always have attributes on the <html> so this would only take effect on <head> for this particular case. However there are other bits which can be more aggressively dropped, such as closing tags for table parts etc.
Here's the commit that added this and related bits:
https://github.com/wikimedia/mediawiki/commit/eefe1b13a382a7d11c2137bbf900b7...
If these tags are optional, they can be there, so why remove them? If you remove them, you should probably also care about them in closeElement() so that mediawiki doesn't produce html which has only ending tags for html and head.
I believe that's also covered, yes. (Keep in mind also that the way the HTML content model works is that those elements always exist in the DOM; the tags can simply be omitted from the markup because they are implied by the surrounding content.)
It seems that on WMF installation we don't use anyway as there is head tag. So why is it there? What purpose does it server?
I think we still have the well-formed XML mode enabled for backwards-compatibility?
-- brion _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
That patch you made even breaks tests: https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-HEAD/13002/con... :)
so either remove a unit test for that feature, or just fix the closeElement but I don't even know if it's really broken, maybe there is some more magic on behing that would fix the broken html before it gets printed out
On Wed, Feb 18, 2015 at 10:47 PM, Petr Bena benapetr@gmail.com wrote:
I think it's the other way, omiting the html tags is probably correct, as long as both opening and ending tags are omitted :P so I think the closeElement function needs a fix
On Wed, Feb 18, 2015 at 10:42 PM, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Uploaded https://gerrit.wikimedia.org/r/191473/ for consistency with closeElement. Thanks for reporting.
Il 18/02/2015 22:38, Petr Bena ha scritto:
I was digging and I found that commit you sent - eefe1b13a but you can see that this commit modified 2 functions: both open and closeElement, however for some reason the code was removed from closeElement so now calling openElement('html') would produce empty string, while closeElement('html') would produce </html> and that should IMHO create HTML code that would only contain closing tags in case that html / head didn't have any parameters. Actually head, at least on wmf installation doesn't have any parameters, but it has opening tag, which is weird. (Perhaps wgWellFormedXml is true by default?)
On Wed, Feb 18, 2015 at 10:31 PM, Brion Vibber bvibber@wikimedia.org wrote:
On Wed, Feb 18, 2015 at 1:23 PM, Petr Bena benapetr@gmail.com wrote:
Can someone explain the point of these lines to me?
https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269
Someone thought it would be clever to reduce page size by a few bytes here and there. :)
In practice we always have attributes on the <html> so this would only take effect on <head> for this particular case. However there are other bits which can be more aggressively dropped, such as closing tags for table parts etc.
Here's the commit that added this and related bits:
https://github.com/wikimedia/mediawiki/commit/eefe1b13a382a7d11c2137bbf900b7...
If these tags are optional, they can be there, so why remove them? If you remove them, you should probably also care about them in closeElement() so that mediawiki doesn't produce html which has only ending tags for html and head.
I believe that's also covered, yes. (Keep in mind also that the way the HTML content model works is that those elements always exist in the DOM; the tags can simply be omitted from the markup because they are implied by the surrounding content.)
It seems that on WMF installation we don't use anyway as there is head tag. So why is it there? What purpose does it server?
I think we still have the well-formed XML mode enabled for backwards-compatibility?
-- brion _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benapetr@gmail.com wrote:
(Perhaps wgWellFormedXml is true by default?)
It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml
Yep
That makes me feel like this is bug which everyone overlooked because it's true by default. If you set it to false mediawiki would likely write out syntactically wrong code.
On Wed, Feb 18, 2015 at 10:43 PM, Gergo Tisza gtisza@wikimedia.org wrote:
On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benapetr@gmail.com wrote:
(Perhaps wgWellFormedXml is true by default?)
It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Most of browsers would probably handle it OK, but validators not.
On Wed, Feb 18, 2015 at 10:45 PM, Petr Bena benapetr@gmail.com wrote:
Yep
That makes me feel like this is bug which everyone overlooked because it's true by default. If you set it to false mediawiki would likely write out syntactically wrong code.
On Wed, Feb 18, 2015 at 10:43 PM, Gergo Tisza gtisza@wikimedia.org wrote:
On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benapetr@gmail.com wrote:
(Perhaps wgWellFormedXml is true by default?)
It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 19/02/15 08:43, Gergo Tisza wrote:
On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benapetr@gmail.com wrote:
(Perhaps wgWellFormedXml is true by default?)
It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml
There was a Bugzilla report and Gerrit change requesting that it be set to false:
https://phabricator.wikimedia.org/T52040 https://gerrit.wikimedia.org/r/#/c/70036/
I was against it, partly because of the omitted <head> tag.
-- Tim Starling
I would also recommend against actively trying to emit barely parsing output. Any savings after compression should be rather small, and if only end tags are omitted the DOM will of course still be the same size after parsing.
In Parsoid we went to some modest lengths https://github.com/wikimedia/parsoid/blob/master/lib/XMLSerializer.js to produce polyglot markup http://www.w3.org/TR/html-polyglot/, which is both valid XML and HTML5. This has enabled consumers to use either XML or HTML5 parsers, which has proven very useful in practice. For example, this makes it easier to consume this content using PHP's libxml. Doing the same in MediaWiki core is admittedly harder, but I still think that we should follow the robustness principle https://en.wikipedia.org/wiki/Robustness_principle wherever we can.
Gabriel
On Wed, Feb 18, 2015 at 5:59 PM, Tim Starling tstarling@wikimedia.org wrote:
On 19/02/15 08:43, Gergo Tisza wrote:
On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benapetr@gmail.com wrote:
(Perhaps wgWellFormedXml is true by default?)
It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml
There was a Bugzilla report and Gerrit change requesting that it be set to false:
https://phabricator.wikimedia.org/T52040 https://gerrit.wikimedia.org/r/#/c/70036/
I was against it, partly because of the omitted <head> tag.
-- Tim Starling
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
wikitech-l@lists.wikimedia.org