Here are a few syntax features that I propose we deprecate for 1.5. The reasons are mainly to simplify the wikitext to make parsing simpler.
1. <p>
The only reason one would want to use <p> is to avoid the effect two newlines have of spawning a new paragraph, that is, to be able to write something like:
<p>This
Is all one
Paragraph</p>
This feature is seldom (if ever) used.
2. <i> and <b>
We have a syntax for these two, namely ''' and '' for <b> and <i> respectively.
3. {{msg:}}
Leftover stuff from the 1.2 days, let's kill it;)
Ævar Arnfjörð Bjarmason wrote:
Here are a few syntax features that I propose we deprecate for 1.5. The reasons are mainly to simplify the wikitext to make parsing simpler.
Hi Ævar,
I support all three points on your list (why do you want to keep <br> and <pre>?)
Deprecating it would result in error messages? A simple handbook addition? A social pressure to convert remaining ones?
Mathias
On Wednesday, May 11, 2005, at 2:21 PM, Mathias Schindler neubau@presroi.de wrote:
Ævar Arnfjörð Bjarmason wrote:
Here are a few syntax features that I propose we deprecate for 1.5. The reasons are mainly to simplify the wikitext to make parsing simpler.
[Snip: <p>, <i>, and <b> elements, and the {{msg:xxx}} transclude tag.]
Hi Ævar,
I support all three points on your list
Yes, they seem sensible.
(why do you want to keep <br> and <pre>?)
The <pre> element allows styling which pre-pending a ' ' to each line doesn't, which can be useful (and is used in places).
The <br> element is used a lot to enforce linebreaks where you can't do it otherwise (inline as template variables, for instance) on w:en, at the very least. I don't see that it would be particularly easy to avoid using them.
Deprecating it would result in error messages? A simple handbook addition? A social pressure to convert remaining ones?
I'd imagine an automatic conversion would be easiest to manage.
Yours,
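A minimal sketch of what such an automatic conversion might look like for item 2, assuming only bare <b> and <i> tags (no attributes) need rewriting. The function name and the tag table are hypothetical and nothing here is taken from the MediaWiki code base.

```python
import re

# Hypothetical mapping from the HTML tags under discussion to wikitext quotes.
TAG_TO_QUOTES = {"b": "'''", "i": "''"}

# Matches <b>, </b>, <i>, </i> without attributes, case-insensitively.
_TAG_RE = re.compile(r"</?([bi])\s*>", re.IGNORECASE)


def html_emphasis_to_wikitext(text: str) -> str:
    """Replace bare <b>/<i> tags with the equivalent quote markup."""
    return _TAG_RE.sub(lambda m: TAG_TO_QUOTES[m.group(1).lower()], text)


if __name__ == "__main__":
    print(html_emphasis_to_wikitext("a <b>bold</b> and an <i>italic</i> word"))
```

Tags carrying attributes are left untouched by this sketch, and a tag that sits directly next to a literal apostrophe would produce a longer run of ticks, so a real conversion would need to treat those cases separately.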
The <br> element is used a lot to enforce linebreaks where you can't do it otherwise (inline as template variables, for instance) on w:en, at the very least. I don't see that it would be particularly easy to avoid using them.
Don't forget forcing a <br style="clear: both" />, commonly needed for columns of images and things.
-- Austin
Moin,
On Wednesday 11 May 2005 15:51, Austin Hair wrote:
The <br> element is used a lot to enforce linebreaks where you can't do it otherwise (inline as template variables, for instance) on w:en, at the very least. I don't see that it would be particularly easy to avoid using them.
Don't forget forcing a <br style="clear: both" />, commonly needed for columns of images and things.
Wouldn't a <div class="clear"></div> with appropriate CSS magic work as well if not better?
Best wishes,
Tels
-- Signed on Wed May 11 18:28:14 2005 with key 0x93B84C15.
Austin Hair wrote:
The <br> element is used a lot to enforce linebreaks where you can't do it otherwise (inline as template variables, for instance) on w:en, at the very least. I don't see that it would be particularly easy to avoid using them.
Don't forget forcing a <br style="clear: both" />, commonly needed for columns of images and things.
No, you don't want to force that! Otherwise you'll have problems displaying images alongside a poem... (That's just an example.)
Moin,
On Wednesday 11 May 2005 15:28, James D. Forrester wrote:
On Wednesday, May 11, 2005, at 2:21 PM, Mathias Schindler
neubau@presroi.de wrote:
Ævar Arnfjörð Bjarmason wrote:
Here are a few syntax features that I propose we deprecate for 1.5. The reasons are mainly to simplify the wikitext to make parsing simpler.
[Snip: <p>, <i>, and <b> elements, and the {{msg:xxx}} transclude tag.]
Hi Ævar,
I support all three points on your list
Yes, they seem sensible.
(why do you want to keep <br> and <pre>?)
The <pre> element allows styling which pre-pending a ' ' to each line doesn't, which can be useful (and is used in places).
And inside <pre>, the {{template}} syntax doesn't work, while it works in ' '. This distinction might be important. (Whether this is a bug or a feature, I do not know :)
Best wishes,
Tels
-- Signed on Wed May 11 18:26:37 2005 with key 0x93B84C15.
Tels wrote:
The <pre> element allows styling which pre-pending a ' ' to each line doesn't, which can be useful (and is used in places).
And inside <pre>, the {{template}} syntax doesn't work, while it works in ' '. This distinction might be important. (Whether this is a bug or a feature, I do not know :)
It's a feature -- <pre>-in-wiki essentially means: <nowiki> + <pre>-in-HTML.
"Mathias" == Mathias Schindler neubau@presroi.de writes:
Ævar Arnfjörð Bjarmason wrote:
Here are a few syntax features that I propose we deprecate for 1.5. The reasons are mainly to simplify the wikitext to make parsing simpler.
Hi Ævar,
I support all three points on your list (why do you want to keep <br> and <pre>?)
<br style="clear: right;" /> is pretty damn useful when there's more than one floated image on a page.
Anders Wegge Jakobsen wrote:
<br style="clear: right;" /> is pretty damn useful when there's more than one floated image on a page.
No, not at all. Such br's often break floating images, especially when there is more than one on a page.
Remember that the *right*-floating images (left-floating on RTL wikis) have a "clear:right" (resp. clear:left) attribute themselves. This means that right-floating images will display underneath each other and never next to each other. Ideally, all right-floating tables should have this attribute too so they fit in with the vertical alignment of everything. Unfortunately, too many people use "float:right" (or even worse, the now-obsolete align='right') without a corresponding "clear:right" -- I usually have to fix this myself.
If you add a <br style="clear: right;" />, you will cause not only other images, but also text and headings to move below existing right-floating images. This usually generates a lot of unnecessary vertical space. I really really hate it when people do that when they should have added the "clear:right" to a table that was missing it.
Timwi
Timwi (timwi@gmx.net) [050514 02:29]:
Remember that the *right*-floating images (left-floating on RTL wikis) have a "clear:right" (resp. clear:left) attribute themselves. This means that right-floating images will display underneath each other and never next to each other.
Er, this appears not to be the case at all in 1.4, at least. I frequently end up having to kludge this with tables.
- d.
"Timwi" == Timwi timwi@gmx.net writes:
Anders Wegge Jakobsen wrote:
<br style="clear: right;" /> is pretty damn useful when there's more than one floated image on a page.
No, not at all. Such br's often break floating images, especially when there is more than one on a page.
Why say break, when it's the intended purpose?
Remember that the *right*-floating images (left-floating on RTL wikis) have a "clear:right" (resp. clear:left) attribute themselves. This means that right-floating images will display underneath each other and never next to each other.
I'm talking about pages like http://da.wikipedia.org/wiki/Bonde_%28skak%29, which would look quite broken if the text could not be aligned where it's supposed to be.
...
If you add a <br style="clear: right;" />, you will cause not only other images, but also text and headings to move below existing right-floating images.
Often, that's the point of doing so.
This usually generates a lot of unnecessary vertical space. I really really hate it when people do that when they should have added the "clear:right" to a table that was missing it.
That's a whole other matter than the practice of bringing floating pictures into the vicinity of the text describing them.
Anders Wegge Jakobsen wrote:
"Timwi" == Timwi timwi@gmx.net writes:
No, not at all. Such br's often break floating images, especially when there is more than one on a page.
Why say break, when it's the intended purpose?
When I say "break" I mean it's *NOT* the intended purpose. 99% of all <br>s I have seen on en did *NOT* accomplish the intended purpose and instead "broke" things.
I'm talking about pages like http://da.wikipedia.org/wiki/Bonde_%28skak%29, which would look quite broken if the text could not be aligned where it's supposed to be.
This is a good example of the rare page where the <br> tags are useful and used correctly.
If you add a <br style="clear: right;" />, you will cause not only other images, but also text and headings to move below existing right-floating images.
Often, that's the point of doing so.
That's not what the examples I saw were trying to accomplish (or when they were, they were doing it in a place where it was inappropriate, such as forcing the See Also section underneath a lengthy taxobox; why do you want to do that?).
Timwi
"Timwi" == Timwi timwi@gmx.net writes:
Anders Wegge Jakobsen wrote:
"Timwi" == Timwi timwi@gmx.net writes:
No, not at all. Such br's often break floating images, especially when there is more than one on a page.
Why say break, when it's the intended purpose?
When I say "break" I mean it's *NOT* the intended purpose. 99% of all <br>s I have seen on en did *NOT* accomplish the intended purpose and instead "broke" things.
Your point may be correct as far as en is concerned. Many other wikis have so few contributors that there is simply not enough text in a given article to take up the slack induced by pictures.
I'm talking about pages like http://da.wikipedia.org/wiki/Bonde_%28skak%29, which would look quite broken if the text could not be aligned where it's supposed to be.
This is a good example of the rare page where the <br> tags are useful and used correctly.
Believe it or not, we have many similar pages on dawiki.
If you add a <br style="clear: right;" />, you will cause not only other images, but also text and headings to move below existing right-floating images.
Often, that's the point of doing so.
That's not what the examples I saw were trying to accomplish (or when they were, they were doing it in a place where it was inappropriate, such as forcing the See Also section underneath a lengthy taxobox; why do you want to do that?).
Typographic conventions perhaps? I'm just guessing here.
What's actually being discussed is the reverse, or "clear: both" for good measure. For a right-floating image, a break styled "clear: left" prevents strict adherence to the box model from forcing text into a teeny-tiny margin if even a single line of text aligns horizontally with a floating element (image). You may sometimes wind up with a slight gap between paragraphs, but it's a far sight better than the alternative.
Nobody suggested that the images themselves should clear left.
-- Austin
Ævar Arnfjörð Bjarmason wrote:
I support all three points on your list (why do you want to keep <br> and <pre>?)
First, I didn't say anything about <br> and <pre>, and second, we need to keep those because there's no equivalent for them in the wikisyntax.
Some people believe that there should not be a <br> or an equivalent. I can sort of appreciate this sentiment, because <br> is very often used in completely inappropriate contexts. Many contexts where <br> is currently believed to be useful and legitimate can be replaced with a new, more appropriate wiki-syntax.
Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
<p>
The only reason one would want to use <p> is to avoid the effect two newlines have of spawning a new paragraph, that is, to be able to write something like:
I've used it in a few cases where a single bullet point needs to have multiple paragraphs. (Yes, this really does happen sometimes.) There's no other way to do this in wikicode.
Other than that, I concur.
Brent 'Dax' Royal-Gordon wrote:
Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
<p>
The only reason one would want to use <p> is to avoid the effect two newlines have of spawning a new paragraph, that is, to be able to write something like:
I've used it in a few cases where a single bullet point needs to have multiple paragraphs. (Yes, this really does happen sometimes.) There's no other way to do this in wikicode.
Well, that's not really an argument to keep <p>, but rather an argument to extend the list syntax so you don't need the <p>.
I've always thought this was a bit of a stupid limitation of the list syntax. I would extend it to allow blocks:
* First list item
**{ This list item allows several paragraphs.
I think this syntax can be very useful. -- ~~~~ }
Hi,
Le Friday 13 May 2005 18:28, Timwi a écrit :
The only reason one would want to use <p> is to avoid the effect two newlines have of spawning a new paragraph, that is, to be able to write something like:
I've used it in a few cases where a single bullet point needs to have multiple paragraphs. (Yes, this really does happen sometimes.) There's no other way to do this in wikicode.
Well, that's not really an argument to keep <p>, but rather an argument to extend the list syntax so you don't need the <p>.
I've always thought this was a bit of a stupid limitation of the list syntax. I would extend it to allow blocks:
* First list item
**{ This list item allows several paragraphs.
I think this syntax can be very useful. -- ~~~~ }
Yes, this would be very useful. And even more so for #, ##, etc.
BTW, while talking about new syntax, could a new code be added for texts in verse? Several people requested this. This would avoid a wrong use of <br/> (or worse).
Thanks, Yann
Yann Forget wrote:
BTW, while talking about new syntax, could a new code be added for texts in verse? Several people requested this. This would avoid a wrong use of <br/> (or worse).
You can already use : for this:
: Integral of t squared dt
: From one to the cube-root of three
:: Is two-thirds cosine
:: Of three pi over nine
: Plus log of the cube-root of e.

   3 ___
   \/ 3
  /
  |    2        2       3 pi        3 __
  |   t  dt  = --- cos(------) + ln(\/e )
  |             3         9
 /
  1
Timwi
On 5/11/05, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
Here are a few syntax features that I propose we deprecate for 1.5. The reasons are mainly to simplify the wikitext to make parsing simpler.
(Hello, I'm the guy who wrote "Parsers", but now from a subscribed address)
I'd also like to suggest the removal of the workaround (occupying most of the first half of parser.php doQuotes) dealing with confusion of quotes-as-text and quotes-as-markup within words. In other words, the workaround provides that "l'''italic''plain" renders as "l<i>italic</i>plain".
My feelings are that this is a rather complicated workaround for a user-input error (see doQuotes() if you want to know how complicated), and that furthermore, it's bad for parsing. It introduces an ambiguity to the wikitext syntax that can't be resolved without an arbitrary amount of forward context. This makes it hard to express wikitext as a grammar, and almost as hard to parse it quickly. Hopefully its use is rather rare; would it be possible to run a scan to find out more conclusively?
Thank you for your time, Andrew
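A rough idea of what such a scan could look like, assuming the page texts are already available as plain strings. The regular expression and the function name are hypothetical, and the pattern only approximates the case in question (a one-letter word directly followed by exactly three ticks), so the count is an estimate at best.

```python
import re

# A one-letter word directly followed by exactly three apostrophes,
# as in l'''homme''. Four or more ticks are deliberately not matched.
ELISION_RE = re.compile(r"(?<![\w'])\w'''(?!')")


def count_candidate_lines(pages):
    """Count lines with at least one candidate, over an iterable of page texts."""
    hits = 0
    for text in pages:
        for line in text.splitlines():
            if ELISION_RE.search(line):
                hits += 1
    return hits


if __name__ == "__main__":
    sample = ["l'''homme'' est\n'''bold''' text", "nothing to see here"]
    print(count_candidate_lines(sample))
```

A one-letter word that happens to precede ordinary bold text is counted as well, so the result would still need a manual spot check.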
I'd also like to suggest the removal of the workaround (occupying most of the first half of parser.php doQuotes) dealing with confusion of quotes-as-text and quotes-as-markup within words. In other words, the workaround provides that "l'''italic''plain" renders as "l<i>italic</i>plain".
<snipped>
Hopefully its use is rather rare; would it be possible to run a scan to find out more conclusively?
Fatal error: failed assert( not_much_in_use ); on line 345 :) This is widely used (at least) on w:fr: and is *required* - many words starting with a vowel will yield a leading "l'", sometimes we want to highlight only the word after.
Thank you for your time, Andrew
Nicolas
On Wed, May 11, 2005, Nicolas Weeger wrote:
I'd also like to suggest the removal of the workaround (occupying most of the first half of parser.php doQuotes) dealing with confusion of quotes-as-text and quotes-as-markup within words. In other words, the workaround provides that "l'''italic''plain" renders as "l<i>italic</i>plain".
Fatal error: failed assert( not_much_in_use ); on line 345 :) This is widely used (at least) on w:fr: and is *required* - many words starting with a vowel will yield a leading "l'", sometimes we want to highlight only the word after.
The proper fix for this situation is to use the real apostrophe character (U+2019 right single quotation mark, UTF-8 0xE28099) for apostrophes and continue using ' for wiki markup.
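The code point and byte sequence mentioned above can be checked with Python's standard unicodedata module. This is only a quick sanity check of the two characters under discussion; the helper name is made up for the example.

```python
import unicodedata


def describe(ch: str):
    """Return (code point, Unicode name, UTF-8 bytes in hex) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex())


if __name__ == "__main__":
    for ch in ("'", "\u2019"):
        print(*describe(ch))
```

The first character is the plain ASCII tick used by the wiki markup and the second is the typographic one proposed for running text.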
On 5/12/05, Sam Hocevar sam@zoy.org wrote:
On Wed, May 11, 2005, Nicolas Weeger wrote:
I'd also like to suggest the removal of the workaround (occupying most of the first half of parser.php doQuotes) dealing with confusion of quotes-as-text and quotes-as-markup within words. In other words, the workaround provides that "l'''italic''plain" renders as "l<i>italic</i>plain".
Fatal error: failed assert( not_much_in_use ); on line 345 :) This is widely used (at least) on w:fr: and is *required* - many words starting with a vowel will yield a leading "l'", sometimes we want to highlight only the word after.
The proper fix for this situation is to use the real apostrophe character (U+2019 right single quotation mark, UTF-8 0xE28099) for apostrophes and continue using ' for wiki markup.
Unicode RIGHT SINGLE QUOTATION MARK is not an apostrophe, any more than Unicode SWUNG DASH is a tilde. U+0027 APOSTROPHE is the proper apostrophe character.
What *might* help is if we could use the &apos; HTML entity. IE doesn't support it, but perhaps MediaWiki could translate it after it's done with the '' => <i> transformation.
Brent 'Dax' Royal-Gordon wrote
What *might* help is if we could use the &apos; HTML entity. IE doesn't support it, but perhaps MediaWiki could translate it after it's done with the '' => <i> transformation.
Are you suggesting that we explain to francophone users that they have to type a HTML entity every time they write a contracted definite article? :-)
On Thu, May 12, 2005, Brent 'Dax' Royal-Gordon wrote:
The proper fix for this situation is to use the real apostrophe character (U+2019 right single quotation mark, UTF-8 0xE28099) for apostrophes and continue using ' for wiki markup.
Unicode RIGHT SINGLE QUOTATION MARK is not an apostrophe, any more than Unicode SWUNG DASH is a tilde. U+0027 APOSTROPHE is the proper apostrophe character.
I wonder what version of the Unicode standard you are referring to. Both the U+0027 and the U+2019 comments mention that (I quote) U+2019 is the preferred character to use for apostrophe.
What *might* help is if we could use the &apos; HTML entity. IE doesn't support it, but perhaps MediaWiki could translate it after it's done with the '' => <i> transformation.
&apos; is the ASCII apostrophe, not the apostrophe used in Latin languages, which is ’.
Regards,
Andrew Rodland wrote:
I'd also like to suggest the removal of the workaround (occupying most of the first half of parser.php doQuotes) dealing with confusion of quotes-as-text and quotes-as-markup within words. In other words, the workaround provides that "l'''italic''plain" renders as "l<i>italic</i>plain".
This is incorrect -- it renders as "l'<i>italic</i>plain", and this is intended. As has already been mentioned, this is used extremely often on French wikis.
I was the one that made the modification to this code, and I think it solves the issue quite nicely.
You said that it was a "user-input error", but you haven't offered a suggestion what the *correct* input should be in order to yield the output "l'<i>homme</i>". Before my modification, the only way to do it was "l'''homme''", which is obviously too cumbersome, too unreadable, and too unintuitive to new editors.
Greetings, Timwi
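To make the behaviour under discussion concrete, here is a simplified sketch of the heuristic for a single line. It is not the doQuotes() code itself: the function name, the regular expressions and the tie-breaking rule are assumptions chosen to reproduce the examples quoted in this thread.

```python
import re


def render_quotes(line: str) -> str:
    """Hypothetical, simplified version of the apostrophe heuristic."""
    # Even indices hold text, odd indices hold runs of two or more ticks.
    parts = re.split(r"('{2,})", line)
    runs = range(1, len(parts), 2)
    for i in runs:
        n = len(parts[i])
        if n == 4:                    # '''' is a literal ' followed by bold
            parts[i - 1] += "'"
            parts[i] = "'''"
        elif n > 5:                   # surplus ticks are literal text
            parts[i - 1] += "'" * (n - 5)
            parts[i] = "'''''"
    italics = sum(1 for i in runs if len(parts[i]) in (2, 5))
    bolds = sum(1 for i in runs if len(parts[i]) in (3, 5))
    if italics % 2 == 1 and bolds % 2 == 1:
        # Unbalanced line: read one ''' as a literal ' plus '',
        # preferring a run right after a one-letter word (l'''homme'').
        triples = [i for i in runs if parts[i] == "'''"]
        preferred = [i for i in triples
                     if re.search(r"(?:^|\W)\w$", parts[i - 1])]
        pick = (preferred or triples or [None])[0]
        if pick is not None:
            parts[pick - 1] += "'"
            parts[pick] = "''"

    out, open_tags = [], []           # open_tags is a stack of "i" / "b"

    def toggle(tag):
        if tag not in open_tags:
            out.append(f"<{tag}>")
            open_tags.append(tag)
            return
        reopen = []
        while True:                   # close down to `tag`, keep nesting valid
            top = open_tags.pop()
            out.append(f"</{top}>")
            if top == tag:
                break
            reopen.append(top)
        for t in reversed(reopen):
            out.append(f"<{t}>")
            open_tags.append(t)

    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(part)
        elif part == "''":
            toggle("i")
        elif part == "'''":
            toggle("b")
        else:                         # ''''' toggles both, innermost first
            order = list(reversed(open_tags))
            order += [t for t in ("i", "b") if t not in open_tags]
            for t in order:
                toggle(t)
    while open_tags:                  # close whatever is still open
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)


if __name__ == "__main__":
    print(render_quotes("l'''italic''plain"))
```

The decision in the middle of the function depends on counting every run of ticks on the line first, which is the forward-context problem raised earlier in the thread.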
On Fri, 13 May 2005, Timwi wrote:
Andrew Rodland wrote:
I'd also like to suggest the removal of the workaround (occupying most of the first half of parser.php doQuotes) dealing with confusion of quotes-as-text and quotes-as-markup within words. In other words, the workaround provides that "l'''italic''plain" renders as "l<i>italic</i>plain".
This is incorrect -- it renders as "l'<i>italic</i>plain", and this is intended. As has already been mentioned, this is used extremely often on French wikis.
I'm sorry but I have been out for some days, and I missed this discussion. The "l'''italic'' plain" syntax is *extensively* used in - guess it - the Italian wikipedia. Actually, the most common use is probably "l''''italic''' plain", which renders as "l'<b>italic</b> plain".
Putting in <nowiki> tags, or whatever XML-compliant tag, is NOT an option, even if it's "simpler" in technical terms.
<article>
  <title>On XML wikipedia syntax</title>
  <body>
    <p type="normal">It is <i>really</i> hard to write proper <a href="http://xml-site">XML</a> or HTML, don't you think?</p><br />
    <p type="emphasys">
      This is exactly why a wikisyntax was developed.
    </p>
    <p style="conclusion">
      <header level="MediaWiki == equivalent">So...</header>
      No, we can't force people to learn XML in order to write a wikipedia article.
    </p>
  </body>
</article>
The current mediawiki syntax is a hack, but it's doing a nice job of making things simpler for the majority of editors, who aren't technically oriented. While changing it to allow better future developments is good, it's not worth it if we scare away 90% of our users.
Alfio
To continue the discussion Andrew started on the ticks: the dilemma with the current syntax is that we can't easily guarantee that ''' always starts a bold segment.
* Current syntax: '''italic''plain => '<i>italic</i>plain
* Proposed syntax: '''italic''plain => <b>italic<i>plain</i></b>
* Previous behaviour with new syntax: <nowiki>'</nowiki>''italic''plain
and:
* Current syntax: ''''foo => '<b>foo</b>
* Proposed syntax: ''''foo => ''''foo (four standalone ticks would be undefined)
This would make parsing a lot easier.
"Ævar Arnfjörð Bjarmason" avarab@gmail.com wrote in message news:51dd1af8050512052238732cc4@mail.gmail.com...
To continue the discussion Andrew started on the ticks: the dilemma with the current syntax is that we can't easily guarantee that ''' always starts a bold segment.
- Current syntax: '''italic''plain => '<i>italic</i>plain
- Proposed syntax: '''italic''plain => <b>italic<i>plain</i></b>
- Previous behaviour with new syntax: <nowiki>'</nowiki>''italic''plain
and:
- Current syntax: ''''foo => '<b>foo</b>
- Proposed syntax: ''''foo => ''''foo (four standalone ticks would be undefined)
This would make parsing a lot easier.
I was wondering whether we could define a [[en:Parsing expression grammar]] which could then be made into a [[en:Packrat parser]].
As far as I can tell from those articles, and the references therein, this would help a great deal.
Phil Boswell wrote:
I was wondering whether we could define a [[en:Parsing expression grammar]] which could then be made into a [[en:Packrat parser]].
As far as I can tell from those articles, and the references therein, this would help a great deal.
I looked into this a while ago when I was dabbling at writing a parser for wikitext, and came to the conclusion it was probably not practical for MediaWiki use. The advantage of packrat parsers, of course, is that they allow (the equivalent of) full GLR grammar, not some subset like LR(1), so writing a grammar is not very hard even for a complex one like wikitext.
The disadvantage is that the RAM needed for the parser state grows linearly in the *total input size*---not in the depth of the parse tree as with most---and the constant on that is fairly large as well. For some of our larger pages, back-of-the-envelope calculations using the data in the master's thesis about packrat parsing yield estimates of 3-6MB of RAM needed for the parser state. I imagine Wikipedia sometimes parses enough pages simultaneously to be eating up gigabytes of RAM at these levels.
-Mark
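The memory behaviour described above comes from the memo table that a packrat parser keeps for the whole input. The toy below is only an illustration with a made-up two-rule grammar (Sum <- Num ('+' Num)*, Num <- [0-9]+); the class and function names are hypothetical, and it does not reproduce the figures from the thesis.

```python
class Packrat:
    """Toy packrat parser for:  Sum <- Num ('+' Num)* ;  Num <- [0-9]+"""

    def __init__(self, text):
        self.text = text
        self.memo = {}  # (rule name, start position) -> end position or None

    def apply(self, rule, pos):
        # Each (rule, position) pair is computed once and then kept
        # until the whole parse is finished.
        key = (rule.__name__, pos)
        if key not in self.memo:
            self.memo[key] = rule(self, pos)
        return self.memo[key]

    def num(self, pos):
        end = pos
        while end < len(self.text) and self.text[end].isdigit():
            end += 1
        return end if end > pos else None

    def sum(self, pos):
        end = self.apply(Packrat.num, pos)
        if end is None:
            return None
        while end < len(self.text) and self.text[end] == "+":
            nxt = self.apply(Packrat.num, end + 1)
            if nxt is None:
                break
            end = nxt
        return end


def memo_entries(text):
    """Parse `text` completely and report how many memo entries were kept."""
    parser = Packrat(text)
    if parser.apply(Packrat.sum, 0) != len(text):
        raise ValueError("input does not match the toy grammar")
    return len(parser.memo)


if __name__ == "__main__":
    for count in (10, 100, 1000):
        print(count, memo_entries("+".join(["7"] * count)))
```

The number of stored entries follows the length of the input rather than the nesting depth, and a grammar with many rules multiplies that by the number of rules tried at each position.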
wikitech-l@lists.wikimedia.org