I am looking at bug 1310, which involves parser behavior such that when given nested tag extensions, i.e.:
<tag> AAA <tag>BBB</tag> CCC </tag>
The parser selects the tag block as running from the first open tag to the FIRST close tag, i.e. in the example it gives:
AAA <tag>BBB
as the inner text of the first tag. It should be fairly straightforward to modify this to handle nested tags by checking for additional open tags in the inner string.
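A minimal sketch of the two behaviours described above, written in Python for illustration rather than in MediaWiki's PHP. The function names `first_close_inner` and `nested_inner` are hypothetical, and the sketch assumes tags without attributes; it is not the actual parser code.

```python
import re

def first_close_inner(text, tag):
    """Inner text from the first open tag to the FIRST close tag."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    start = text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    end = text.find(close_tag, start)
    return None if end == -1 else text[start:end]

def nested_inner(text, tag):
    """Inner text of the first tag, counting further open tags
    so that the MATCHING close tag ends the block."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    start = text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    token = re.compile(re.escape(open_tag) + "|" + re.escape(close_tag))
    depth = 1
    for m in token.finditer(text, start):
        depth += 1 if m.group() == open_tag else -1
        if depth == 0:
            return text[start:m.start()]
    return None  # unbalanced input: no matching close tag

sample = "<tag> AAA <tag>BBB</tag> CCC </tag>"
print(repr(first_close_inner(sample, "tag")))
print(repr(nested_inner(sample, "tag")))
```

On the example string, the first function stops at the inner close tag, while the second returns everything up to the outer one.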
However, since this is parser behavior going back to the dawn of time (first reported in MW 1.4), I wanted to ask if there are known use cases where the current behavior is actually the expected behavior? In other words, are there any use cases in the current code base or extensions that would necessarily break if the parser were changed to allow nested tags? I can't think of any right now, but I wouldn't want to modify an old parser quirk without giving it a good look. For the record, my particular interest is related to nested refs.
-Robert Rohde
2009/9/20 Robert Rohde rarohde@gmail.com:
However, since this is parser behavior going back to the dawn of time (first reported in MW 1.4), I wanted to ask if there are known use cases where the current behavior is actually the expected behavior? In other words, are there any use cases in the current code base or extensions that would necessarily break if the parser were changed to allow nested tags? I can't think of any right now, but I wouldn't want to modify an old parser quirk without giving it a good look. For the record, my particular interest is related to nested refs.
What's the actual use case for nested refs? (Do you have an example page or two where they're useful and there's no way to do it right without nesting the refs?)
- d.
2009/9/20 David Gerard dgerard@gmail.com:
What's the actual use case for nested refs? (Do you have an example page or two where they're useful and there's no way to do it right without nesting the refs?)
“There’s no way”? Well, obviously, there is always a way to do it differently… But nested refs can be useful especially with grouped references [1], when a footnote can refer to a source (or, more generally, refs from one group can refer to another group).
Note that there is a workaround for this problem, explained at the enwp help page. [2]
-- [[cs:User:Mormegil | Petr Kadlec]]
[1] http://www.mediawiki.org/wiki/Extension:Cite/Cite.php#Grouped_references
[2] http://en.wikipedia.org/wiki/Wikipedia:Footnotes#Known_bugs
2009/9/20 Petr Kadlec petr.kadlec@gmail.com:
“There’s no way”? Well, obviously, there is always a way to do it differently… But nested refs can be useful especially with grouped references [1], when a footnote can refer to a source (or, more generally, refs from one group can refer to another group).
Ah, that makes sense :-) I keep thinking of references as just a reference, not as the extended footnotes many people use.
- d.
On Sun, Sep 20, 2009 at 10:04 AM, David Gerard dgerard@gmail.com wrote:
2009/9/20 Robert Rohde rarohde@gmail.com:
However, since this is parser behavior going back to the dawn of time (first reported in MW 1.4), I wanted to ask if there are known use cases where the current behavior is actually the expected behavior? In other words, are there any use cases in the current code base or extensions that would necessarily break if the parser were changed to allow nested tags? I can't think of any right now, but I wouldn't want to modify an old parser quirk without giving it a good look. For the record, my particular interest is related to nested refs.
What's the actual use case for nested refs? (Do you have an example page or two where they're useful and there's no way to do it right without nesting the refs?)
There is a workaround (of sorts) using #tag which has been recommended for nesting refs on enwiki for two years [1]. So, nested refs will exist in the wild in some fashion whether we like them or not. I'd like to fully support nested refs IF it isn't going to break other things. If it is likely to be a major mess to change this piece of the parser, then I wouldn't try.
The most common use case seems to be when someone wants to have a note on a reference or a reference on a note (where each is broken up into different sections using ref's group attribute). It is still very rare to use ref nesting though. Some examples are at [2][3][4].
-Robert Rohde
[1] http://en.wikipedia.org/wiki/Wikipedia:Footnotes#Known_bugs
[2] http://en.wikipedia.org/wiki/Super_Nintendo_Entertainment_System#Content_not...
[3] http://en.wikipedia.org/wiki/Battle_of_Barnet#Footnotes
[4] http://en.wikipedia.org/wiki/List_of_Governors_of_California#Notes
Robert Rohde wrote:
I am looking at bug 1310, which involves parser behavior such that when given nested tag extensions, i.e.:
<tag> AAA <tag>BBB</tag> CCC </tag>
The parser selects the tag block as running from the first open tag to the FIRST close tag, i.e. in the example it gives:
AAA <tag>BBB
as the inner text of the first tag. It should be fairly straightforward to modify this to handle nested tags by checking for additional open tags in the inner string.
This syntax was chosen because originally <pre> and <nowiki> were the only such tags (I called them xmlish elements in [[mw:Preprocessor ABNF]]). Those two tags were imagined as being useful solely for escaping HTML and other wikitext, so that it is displayed literally on the page. No nesting behaviour was desirable.
<math>, and then the extension interface, were added afterwards using the same syntax. There is no application for nesting with <math> since the contents are TeX.
<ref> was the first tag to assume that its contents were some kind of wikitext. Unfortunately, this was an inefficient and ugly hack on the software side, and could have been much more easily done, with appropriate nesting behaviour, if a different syntax had been chosen. Other tags were later added, following this bad example.
So if you ask me if there's a use case, I would say most likely yes, especially for <nowiki> and <pre>, and very likely for the extensions that shell out, like <math> and <lilypond>. These use cases would become especially obvious if an extension registered a short name like <->, since there would then be no syntax for communicating this string to a shell command.
But it would be possible to enable or disable nesting on a per-tag basis at registration time.
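One way the per-tag option could look, sketched in Python under stated assumptions: the registry `NESTABLE`, the `register_tag` function and its `nestable` parameter are all hypothetical names for illustration, not MediaWiki's actual hook-registration API, and attributes on tags are ignored.

```python
import re

NESTABLE = {}  # hypothetical registry: tag name -> whether nesting is allowed

def register_tag(name, nestable=False):
    """Record at registration time whether this tag may nest."""
    NESTABLE[name] = nestable

def extract_inner(text, name):
    """Inner text of the first <name> block, honouring the tag's nesting flag."""
    open_tag, close_tag = f"<{name}>", f"</{name}>"
    start = text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    token = re.compile(re.escape(open_tag) + "|" + re.escape(close_tag))
    depth = 1
    for m in token.finditer(text, start):
        if m.group() == open_tag:
            # Non-nesting tags treat an inner open tag as literal text.
            if NESTABLE[name]:
                depth += 1
            continue
        depth -= 1
        if depth == 0:
            return text[start:m.start()]
    return None

register_tag("ref", nestable=True)   # contents are wikitext, so nesting is wanted
register_tag("nowiki")               # keeps the existing first-close behaviour

print(extract_inner("<ref>a<ref>b</ref>c</ref>", "ref"))
print(extract_inner("<nowiki>a<nowiki>b</nowiki>c</nowiki>", "nowiki"))
```

Defaulting `nestable` to off in this sketch means tags that never opt in keep the old first-close behaviour.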
-- Tim Starling
On Sun, Sep 20, 2009 at 8:14 PM, Tim Starling tstarling@wikimedia.org wrote:
Robert Rohde wrote:
I am looking at bug 1310, which involves parser behavior such that when given nested tag extensions, i.e.:
<tag> AAA <tag>BBB</tag> CCC </tag>
The parser selects the tag block as running from the first open tag to the FIRST close tag, i.e. in the example it gives:
AAA <tag>BBB
as the inner text of the first tag. It should be fairly straightforward to modify this to handle nested tags by checking for additional open tags in the inner string.
This syntax was chosen because originally <pre> and <nowiki> were the only such tags (I called them xmlish elements in [[mw:Preprocessor ABNF]]). Those two tags were imagined as being useful solely for escaping HTML and other wikitext, so that it is displayed literally on the page. No nesting behaviour was desirable.
Actually, if one is following the HTML4 spec then <pre> would be expected to nest. (Not particularly useful as far as I can see, but it is what it is.)
I could see arguments in either direction for nowiki. For example, it might be nice to be able to wrap <nowiki> around arbitrary blocks without worrying if another nowiki was already present in the middle.
<math>, and then the extension interface, were added afterwards using the same syntax. There is no application for nesting with <math> since the contents are TeX.
<ref> was the first tag to assume that its contents were some kind of wikitext. Unfortunately, this was an inefficient and ugly hack on the software side, and could have been much more easily done, with appropriate nesting behaviour, if a different syntax had been chosen. Other tags were later added, following this bad example.
So if you ask me if there's a use case, I would say most likely yes, especially for <nowiki> and <pre>, and very likely for the extensions that shell out, like <math> and <lilypond>. These use cases would become especially obvious if an extension registered a short name like <->, since there would then be no syntax for communicating this string to a shell command.
I can't really think of an example where it would be valid and useful to enclose a single tag, i.e. <math> x + <math> 5 = 6 </math> is silly, but I can't rule out that there might be some circumstance somewhere where one would want that behavior.
But it would be possible to enable or disable nesting on a per-tag basis at registration time.
That seems like probably the best option.
-Robert Rohde
Robert Rohde wrote:
Actually, if one is following the HTML4 spec then <pre> would be expected to nest. (Not particularly useful as far as I can see, but it is what it is.)
<pre> in wikitext is explicitly not the same as <pre> in HTML. Unlike in HTML, HTML-like tags which appear inside <pre> are considered to be literal and are escaped. This behaviour is often used.
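To illustrate the difference, here is a rough Python sketch of treating the contents of a wikitext <pre> as literal text. The function `render_wikitext_pre` is a hypothetical name, and the real parser's handling (of character entities, for example) may well differ in detail.

```python
import html

def render_wikitext_pre(inner):
    """Escape HTML-like tags inside <pre> so they display literally."""
    return "<pre>" + html.escape(inner, quote=False) + "</pre>"

print(render_wikitext_pre("<b>bold</b> & <pre>inner</pre>"))
```

Because the inner tags are escaped rather than interpreted, an inner <pre> is just text here, which is why nesting is not wanted for this tag.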
-- Tim Starling
On Mon, Sep 21, 2009 at 1:46 AM, Tim Starling tstarling@wikimedia.org wrote:
<pre> in wikitext is explicitly not the same as <pre> in HTML. Unlike in HTML, HTML-like tags which appear inside <pre> are considered to be literal and are escaped. This behaviour is often used.
The choice of name was probably unfortunate, though. There's now no actual way to put arbitrary content in an HTML <pre> in wikitext -- only if it plays nicely with the leading space syntax.
wikitech-l@lists.wikimedia.org