Re:[Wikitech-l] Re: Proposal for deprecation of syntax features for 1.5


Nicolas Weeger

12 May 2005

12:47 p.m.

...

Current syntax: '''italic''plain => 'italicplain

Proposed syntax: '''italic''plain => italicplain

Previous behaviour with new syntax: <nowiki>'</nowiki>''italic''plain

For the record, I shall say that I fully expect riots on w:fr: if this new behaviour becomes mandatory :)

Some versions, months ago, did require the <nowiki> tag, and it was really awful to manage. We'd also need to *fix* all pages with this syntax, and we can't really fix them automatically; each one needs checking on a case-by-case basis.

Nicolas Weeger



Andrew Rodland

12 May

3:58 p.m.

New subject: Proposal for deprecation of syntax features for 1.5

On 5/12/05, Nicolas Weeger nicolas.weeger@laposte.net wrote:

...

...

Current syntax: '''italic''plain => 'italicplain

Proposed syntax: '''italic''plain => italicplain

Previous behaviour with new syntax: <nowiki>'</nowiki>''italic''plain

For the record, I shall say that I fully expect riots on w:fr: if this new behaviour becomes mandatory :)

Some versions, months ago, did require the <nowiki> tag, and it was really awful to manage. We'd also need to *fix* all pages with this syntax, and we can't really fix them automatically; each one needs checking on a case-by-case basis.

Not quite true; it could be fixed automatically by having the existing parser code apply the rule that it does now, and write a <nowiki> around the ' that it interprets literally. It wouldn't be quite semantically perfect, but articles would keep their appearances (and it could even be applied before/without removing the offending code). I agree that <nowiki> is somewhat unwieldy, but it's a solution to the problem that already exists (and always has) and doesn't require mangling the grammar so that it's impossible to know whether ''' means ''' or not.
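A minimal sketch of the automatic conversion described here, in Python. It assumes the per-line rule that Timwi spells out later in this thread (a second ''' on the line means bold; otherwise a following '' means apostrophe plus italics; otherwise three literal apostrophes). The name `migrate_line` is hypothetical, only the first ''' on a line is handled, and this is not MediaWiki's actual parser code.

```python
NOWIKI_OPEN = "<nowiki>"
NOWIKI_CLOSE = "</nowiki>"


def migrate_line(line: str) -> str:
    """Make the current reading of the first ''' on a line explicit.

    Assumed rule (per Timwi's description in this thread):
      * another ''' later on the line  -> bold, leave untouched
      * otherwise a '' later on        -> literal apostrophe + italics
      * otherwise                      -> three literal apostrophes
    """
    first = line.find("'''")
    if first == -1:
        return line  # nothing to migrate

    rest = line[first + 3:]
    if "'''" in rest:
        return line  # already unambiguous bold

    if "''" in rest:
        # Wrap only the apostrophe that is currently rendered literally.
        return line[:first] + NOWIKI_OPEN + "'" + NOWIKI_CLOSE + "''" + rest

    # No closing markup at all: protect all three apostrophes.
    return line[:first] + NOWIKI_OPEN + "'''" + NOWIKI_CLOSE + rest
```

With the example from the start of the thread, `migrate_line("'''italic''plain")` gives `<nowiki>'</nowiki>''italic''plain`, which is the form listed there as "previous behaviour with new syntax". A bot could apply such a function line by line before the parser change lands.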

A slightly more radical approach that just crossed my mind would be to add a token reminiscent of TeX's "\/" which would produce no output, but break up tokens. For the sake of readability in this email, let's imagine it's ";"

* l';''italic''bold => "l'<i>italic</i>bold"
* l'';'italic''bold => "l<i>'italic</i>bold"

gives you full control without the 17 characters of <nowiki></nowiki>.
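The separator idea can be sketched as a tiny tokenizer in Python. Everything here is an assumption made for illustration: the name `render`, the use of ";" as the separator (as in the email, it only stands in for some other character), and the strict reading in which ''' always toggles bold and '' always toggles italics. Unclosed tags are simply left unclosed.

```python
import re

# Longest apostrophe run first, then the separator, then ordinary text.
TOKEN = re.compile(r"'''|''|'|;|[^';]+")


def render(text: str) -> str:
    """Render one line under the strict rule, with ';' as a silent token break."""
    out = []
    bold = False
    italic = False
    for tok in TOKEN.findall(text):
        if tok == "'''":
            out.append("</b>" if bold else "<b>")
            bold = not bold
        elif tok == "''":
            out.append("</i>" if italic else "<i>")
            italic = not italic
        elif tok == ";":
            # The separator emits nothing; its only job is to split
            # a run of apostrophes into two tokens.
            continue
        else:
            out.append(tok)  # plain text or a single literal apostrophe
    return "".join(out)
```

Here `render("l';''italic''bold")` keeps the apostrophe outside the italics, while `render("l'';'italic''bold")` puts it inside, so the position of the separator carries the extra information.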

Tels

6:20 p.m.


-----BEGIN PGP SIGNED MESSAGE-----

Moin, On Thursday 12 May 2005 17:58, Andrew Rodland wrote:

...

On 5/12/05, Nicolas Weeger nicolas.weeger@laposte.net wrote:

...
...

Current syntax: '''italic''plain => 'italicplain

Proposed syntax: '''italic''plain => italicplain

Previous behaviour with new syntax:

<nowiki>'</nowiki>''italic''plain

For the record, I shall say that I fully expect riots on w:fr: if this new behaviour becomes mandatory :)

Some versions, months ago, did require the <nowiki> tag, and it was really awful to manage. We'd also need to *fix* all pages with this syntax, and we can't really fix them automatically; each one needs checking on a case-by-case basis.

Not quite true; it could be fixed automatically by having the existing parser code apply the rule that it does now, and write a <nowiki> around the ' that it interprets literally. It wouldn't be quite semantically perfect, but articles would keep their appearances (and it could even be applied before/without removing the offending code). I agree that <nowiki> is somewhat unwieldy, but it's a solution to the problem that already exists (and always has) and doesn't require mangling the grammar so that it's impossible to know whether ''' means ''' or not.

A slightly more radical approach that just crossed my mind would be to add a token reminiscent of TeX's "\/" which would produce no output, but break up tokens. For the sake of readability in this email, let's imagine it's ";"

l';''italic''bold => "l'<i>italic</i>bold"

l'';'italic''bold => "l<i>'italic</i>bold"

"\" as in

* l'\''italic''bold
* l''\'italic''bold

? "\" escaping is used a lot elsewhere and "\" would be the first char that springs to my mind.

Best wishes,

Tels

...

gives you full control without the 17 characters of <nowiki></nowiki>.

- -- Signed on Thu May 12 20:19:22 2005 with key 0x93B84C15. Visit my photo gallery at http://bloodgate.com/photos/ PGP key on http://bloodgate.com/tels.asc or per email.

"Spammed if you do, spammed if you don't." - Murphy's Law


-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) iQEVAwUBQoOeY3cLPEOTuEwVAQHMEAf+LjH+mywiDF2ov/tUfCjP39cH0PURR1zU XPk8b2hdAHcYs0RnXMIEbdPXkJHZaYI0X/SakPEdd9KAAXcvrIX0zqV6kQdcURKc XlY47o8JQt0oPLzgBRb+9Txktwjs1ur9ANqjriqSZFY1SrodiqIPVB4sSuKAxg8T DxAdPO41/pK7MhzLT5/N4KSDUq/lADS2woZM0lxQShhaM2vagZg855XFdS+4yqu+ gpm8iRMQnoA51xLkUFwXBCwS/KNHVya4rv5nNdR/R9V+XGTY787jfz4GkBGizGIb 3VsJHXjDLyqbrkVL6pG1OlbL4SeUI5EG4Dk2XsB5P01oXUK3wyeMxQ== =v1r2 -----END PGP SIGNATURE-----

Nicolas Weeger

6:50 p.m.


...

"\" as in

l'\''italic''bold

l''\'italic''bold

? "\" escaping is used a lot elsewhere and "\" would be the first char that springs to my mind.

That's too tech-savvy. The aim of wikisyntax is to be easy to understand - we can't ask people to put a \ before ' or surround it by <nowiki></nowiki>. I don't mind a slightly more complex syntax for tables, templates & such, "advanced" behaviour. But for your daily usage (and ' is used often, at least in French), something quite simple to use is imo waaaaaaay better.

...

Best wishes,

Tels

Regards Nicolas Weeger

Tels

7:29 p.m.


-----BEGIN PGP SIGNED MESSAGE-----

Moin,

On Thursday 12 May 2005 20:50, Nicolas Weeger wrote:

...

...
"\" as in

l'\''italic''bold

l''\'italic''bold

? "\" escaping is used a lot elsewhere and "\" would be the first char that springs to my mind.

That's too tech-savvy. The aim of wikisyntax is to be easy to understand - we can't ask people to put a \ before ' or surround it by <nowiki></nowiki>. I don't mind a slightly more complex syntax for tables, templates & such, "advanced" behaviour. But for your daily usage (and ' is used often, at least in French), something quite simple to use is imo waaaaaaay better.

Yes, I agree on the "simple" solution. However, the current situation isn't "simple" for things like l'''italic'' because it is unclear what exactly that should mean. Maybe it means

l'<i>italic</i>

or it means

l<i>'italic</i>

How is the computer (and the human writing it) to know which is which? You need at least one more bit of information to distinguish between these two variants.

Best wishes,

Tels

- -- Signed on Thu May 12 21:26:43 2005 with key 0x93B84C15. Visit my photo gallery at http://bloodgate.com/photos/ PGP key on http://bloodgate.com/tels.asc or per email.

This email violates U.S. patent #4,197,590:

for (x = 0; x < width; x++) {
  for (y = 0; y < height; y++) {
    setPixel(x+xm, y+ym, getPixel(x+xm, y+ym) ^ getCursorPixel(x, y));
  }
}


-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) iQEVAwUBQoOuqHcLPEOTuEwVAQGlTQf+OuBRC1X8b1kFVVtdCnNe2aFU+/EdsCfh VfEXdJmcUefVitB2NI+p+3rVaxcP8nLdGWcM5n+NSvFNQVykidB1ifZQqyOda7SE 9pR9EbzuhSV1HRJOOqc6P8pj01bIVtkn9y+aNhIci3sezscPsNYNFh54JX3PFKBo uP8rkeiGWBOLberV4qaG8WzQf17RpuruabLHI1EAz7z8oh2WUm+74B7kqQhRfK7c jtpHqCtYHoQ3XuyyvkT5CMMyMAl+6uwYE6B+FxiTz0/61y/DzyQl8tyMgthRH9Kb sAdc4nhvS448PEwxS3JbWUJ2cuemyYReUo2w7UvJRCmPIV3mIL0U9g== =8RoC -----END PGP SIGNATURE-----

Andrew Rodland

8:32 p.m.


On 5/12/05, Tels nospam-abuse@bloodgate.com wrote:

...

Moin,

On Thursday 12 May 2005 20:50, Nicolas Weeger wrote:

...
That's too tech-savvy. The aim of wikisyntax is to be easy to understand - we can't ask people to put a \ before ' or surround it by <nowiki></nowiki>. I don't mind a slightly more complex syntax for tables, templates & such, "advanced" behaviour. But for your daily usage (and ' is used often, at least in French), something quite simple to use is imo waaaaaaay better.

Yes, I agree on the "simple" solution. However, the current situation isn't "simple" for things like l'''italic'' because it is unclear what exactly that should mean. Maybe it means

l'<i>italic</i>

or it means

l<i>'italic</i>

How is the computer (and the human writing it) to know which is which? You need at least one more bit of information to distinguish between these two variants.

Exactly. Now add to that the complication that ''' supposedly means something completely different from _both_ of those when it's not in the middle of a word, and that (as with much of the syntax) there has never been any well-defined rule governing this behavior, only a last-resort hack in PHP code. It's not simple conceptually, and it's far from simple for the computer, and it's bad for parsing. I hate to be argumentative, but I don't believe that the issue is as simple (heh) as you make it out to be, Nicolas.

Nicolas Weeger

8:35 p.m.


...

Exactly. Now add to that the complication that ''' supposedly means something completely different from _both_ of those when it's not in the middle of a word, and that (as with much of the syntax) there has never been any well-defined rule governing this behavior, only a last-resort hack in PHP code. It's not simple conceptually, and it's far from simple for the computer, and it's bad for parsing. I hate to be argumentative, but I don't believe that the issue is as simple (heh) as you make it out to be, Nicolas.

I don't believe it's simple either, conceptually or computer-ally (whoa, a neologism).

Merely pointing out that:
1) there'd be many broken pages if the syntax was changed, and some huge work to fix everything
2) people (on w:fr:, can't tell for other languages) would be really bothered by the change, and would simply revolt and go hang the people changing everything :)

Nicolas Weeger

Timwi

14 May

1:41 a.m.


Andrew Rodland wrote:

...

Exactly. Now add to that the complication that ''' supposedly means something completely different from _both_ of those when it's not in the middle of a word, and that (as with much of the syntax) there has never been any well-defined rule governing this behavior,

Just because you don't understand the rule doesn't mean it's not well-defined. It's perfectly well-defined: If there is another ''' in the same line, it means bold. If there isn't, but there's a '', it means apostrophe-plus-italics. If neither, it means three apostrophes. Normal editors don't have to know about or understand this rule in detail as long as the behaviour is what they expect, which apparently on the French Wikipedia it is.
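The rule as stated can be written down in a few lines of Python. This is a simplified sketch, not the code from MediaWiki or flexbisonparse: the name `classify_first_triple` is made up, it only classifies the first ''' on a line by looking at the rest of that line, and it ignores longer runs of apostrophes.

```python
from typing import Optional


def classify_first_triple(line: str) -> Optional[str]:
    """Classify the first ''' on a single line, or return None if there is none."""
    pos = line.find("'''")
    if pos == -1:
        return None

    rest = line[pos + 3:]
    if "'''" in rest:
        return "bold"  # another ''' on the same line
    if "''" in rest:
        return "apostrophe+italic"  # no second ''', but a '' follows
    return "literal"  # neither: three plain apostrophes
```

For instance, `classify_first_triple("l'''homme''")` returns `"apostrophe+italic"`, which is the reading French editors rely on.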

...

only a last-resort hack in PHP code. It's not simple conceptually,

I think you're expecting a simple solution to a complex problem.

...

and it's far from simple for the computer, and it's bad for parsing.

It's perfectly easy for the computer, just as long as it's programmed right. I have demonstrated this both in the current MediaWiki parser and in flexbisonparse by replicating the same behaviour without trouble.

Timwi

1:31 a.m.


Tels wrote:

...

Yes, I agree on the "simple" solution. However, the current situation isn't "simple" for things like l'''italic'' because it is unclear what exactly that should mean. Maybe it means

l'<i>italic</i>

or it means

l<i>'italic</i>

How is the computer (and the human writing it) to know which is which? You need at least one more bit of information to distinguish between these two variants.

The computer knows which to output because it's been programmed a certain way.

The human can reasonably expect to get the behaviour that would clearly be more useful. We keep talking about French, but actually a great many languages are affected in the same way. All you need is an apostrophised contraction before a word that can potentially be italicised or bolded.

Therefore, l'''italic'' should always output l'<i>italic</i>, and the "one more bit of information" you talked about should only be added in the unusual case. The only language I have come across where you will commonly want the apostrophe to be italicised or bolded is Klingon, though I could imagine that Hawaiian may also be affected, as a word can begin with an apostrophe in both of these languages.

Greetings, Timwi

Karl Eichwalder

13 May

4:41 a.m.


Tels nospam-abuse@bloodgate.com writes:

...

? "\" escaping is used a lot elsewhere and "\" would be the first char that springs to my mind.

The backslash is only easy to enter on hacker keyboards (US flavor)...

-- http://www.gnu.franken.de/ke/ | ,__o | _-_<, | (*)/'(*) Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C

Timwi

14 May

1:44 a.m.


Karl Eichwalder wrote:

...

Tels nospam-abuse@bloodgate.com writes:

...
? "\" escaping is used a lot elsewhere and "\" would be the first char that springs to my mind.

The backslash is only easy to enter on hacker keyboards (US flavor)...

That is a moot argument, because there is no single character that is "easy to type" on every official national keyboard layout. Indeed, we already have a lot of characters ([[]], {{}} and ~~~~) that are a nightmare on German keyboards.

Karl Eichwalder

4:51 a.m.


Timwi timwi@gmx.net writes:

...

That is a moot argument, because there is no single character that is "easy to type" on every official national keyboard layout. Indeed, we already have a lot of characters ([[]], {{}} and ~~~~) that are a nightmare on German keyboards.

This does not mean it does not hurt to add even more strange characters. On the contrary, think again about a sane markup language like XML.


Ævar Arnfjörð Bjarmason

13 May

9:23 a.m.


...

And we'd also need to *fix* all pages with this syntax, and we can't really fix automatically, need to check on a case-by-case basis.

Why not? The current parser manages just fine with it, you could write a bot that changes everything with the same rule.

The proposed change is just that ''' always opens a bold tag and '' always opens an italic tag. Not having to check *the entire wikitext* for context (as is done currently) would make things a lot easier, not just for machines but for people too. Currently the only way to tell whether ''' is really opening a whole lot of bold text or just opening some italic text is to find the matching '' or '''; that's not intuitive at all.
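To make the proposal concrete, here is a rough Python sketch of a context-free renderer in which every ''' toggles bold and every '' toggles italics, with no lookahead. The name `render_strict` and the HTML-style output are assumptions for illustration only; this is not how any MediaWiki release is implemented.

```python
import re


def render_strict(line: str) -> str:
    """Toggle <b>/<i> on each ''' / '' as it is met, without looking ahead."""
    is_open = {"'''": False, "''": False}
    tag_name = {"'''": "b", "''": "i"}

    def toggle(match):
        mark = match.group(0)
        is_open[mark] = not is_open[mark]
        prefix = "" if is_open[mark] else "/"
        return "<" + prefix + tag_name[mark] + ">"

    # Try ''' before '' so a run of three is never read as 2 + 1.
    return re.sub(r"'''|''", toggle, line)
```

Balanced markup such as `'''bold''' and ''italic''` renders as expected, but `render_strict("l'''homme''")` opens a bold and an italic span and closes neither, which is the case the French-language examples in this thread are about.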

Alfio Puglisi

7:51 p.m.


On Fri, 13 May 2005, [ISO-8859-1] Ævar Arnfjörð Bjarmason wrote:

...

...
And we'd also need to *fix* all pages with this syntax, and we can't really fix automatically, need to check on a case-by-case basis.

Why not? The current parser manages just fine with it, you could write a bot that changes everything with the same rule.

The proposed change is just that ''' always opens a bold tag and '' always opens an italic tag, not having to check *the entire wikitext* for context (as is done currently)

I'm pretty sure that only the current line of text is checked, not the entire article.

Alfio

Timwi

14 May

1:51 a.m.


Ævar Arnfjörð Bjarmason wrote:

...

The proposed change is just that ''' always opens a bold tag and '' always opens an italic tag, not having to check *the entire wikitext* for context (as is done currently)

This is not true, neither in MediaWiki's current parser, nor flexbisonparse.

...

would make things a lot easier, not just for machines but for people too. Currently the only way to tell whether ''' is really opening a whole lot of bold text or just opening some italic text is to find the matching '' or '''; that's not intuitive at all.

I can't follow this argument at all. The current behaviour is extremely easy for our human editors because it does *what they expect*. The construction "l'''homme''" should *not* open a bold because this is *not what they expect*. It is *way* easier for them than having to type absolutely *anything* extra (even if it's just a "\").


wikitech-l@lists.wikimedia.org
