Re: Proposal for deprecation of syntax features for 1.5


Andrew Rodland

11 May 2005

4:49 p.m.

(forwarded to list. Whoops.) On 5/11/05, Nicolas Weeger nicolas.weeger@laposte.net wrote:

...

...
I'd also like to suggest the removal of the workaround (occupying most of the first half of parser.php doQuotes) dealing with confusion of quotes-as-text and quotes-as-markup within words. In other words, the workaround provides that "l'''italic''plain" renders as "l'" followed by "italic" in italics and then "plain", that is, l'<i>italic</i>plain.

<snipped> Hopefully its use is rather rare; would it be possible to run a scan to find out more conclusively?

Fatal error: failed assert( not_much_in_use ); on line 345 :) This is widely used (at least) on w:fr: and is *required* - many words starting with a vowel will yield a leading "l'", sometimes we want to highlight only the word after.

...
Thank you for your time, Andrew

Nicolas

I had guessed that it might find some use in fr -- It's too bad to hear that it's "widely" used. However, I should note that it's not _required_. <nowiki> resolves the ambiguity nicely. The fact that it takes 17 characters is, well, unfortunate, but it's _correct_. The workaround, on the other hand, does bad things to the language, and makes the implementation of a more advanced parser exceedingly difficult.
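To make the 17-character cost concrete, here is a minimal Python sketch of writing the elided-article case with an explicit escape instead of relying on the workaround. The helper name `elided_italic` is hypothetical and not part of MediaWiki; the sketch only builds the wikitext string.

```python
NOWIKI_OPEN = "<nowiki>"
NOWIKI_CLOSE = "</nowiki>"


def elided_italic(prefix: str, word: str) -> str:
    """Build wikitext for an elided article followed by an italic word,
    e.g. l' + ''italic''. Wrapping the apostrophe in <nowiki> keeps it from
    merging with the '' that follows. (Hypothetical helper.)"""
    return f"{prefix}{NOWIKI_OPEN}'{NOWIKI_CLOSE}''{word}''"


# Extra characters the escape costs, not counting the apostrophe itself.
NOWIKI_OVERHEAD = len(NOWIKI_OPEN) + len(NOWIKI_CLOSE)

print(elided_italic("l", "italic"))  # l<nowiki>'</nowiki>''italic''
print(NOWIKI_OVERHEAD)               # 17
```

The escape is unambiguous but verbose, which is exactly the trade-off being argued about.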

Andrew


Delirium

11 May

4:53 p.m.

New subject: Proposal for deprecation of syntax features for 1.5

Andrew Rodland wrote:

...

I had guessed that it might find some use in fr -- It's too bad to hear that it's "widely" used. However, I should note that it's not _required_. <nowiki> resolves the ambiguity nicely. The fact that it takes 17 characters is, well, unfortunate, but it's _correct_. The workaround, on the other hand, does bad things to the language, and makes the implementation of a more advanced parser exceedingly difficult.

I'm not sure if it would be a more or less drastic change, but making bold be four tics instead of three would also solve the problem. Then without lookahead: 2 tics is italics; 3 tics is an apostrophe plus italics; 4 tics is bold; 5 tics is an apostrophe plus bold; 6 tics is bold and italic; and 7 tics is an apostrophe plus bold and italic.
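A minimal Python sketch of the four-tic table, assuming each run of apostrophes is handled independently (no lookahead) and that tags left open close at the end of the line. The table name `RUNS` and function names are illustrative; treating runs longer than 7 as extra literal apostrophes is an assumption, since the proposal stops at 7.

```python
import re

# Proposed scheme (hypothetical): bold is four tics.
# run length -> (literal apostrophes to emit first, tags to toggle)
RUNS = {
    1: (1, ()),
    2: (0, ("i",)),
    3: (1, ("i",)),
    4: (0, ("b",)),
    5: (1, ("b",)),
    6: (0, ("b", "i")),
    7: (1, ("b", "i")),
}


def _close(tag, stack, out):
    """Close `tag`, closing and then reopening any tags nested inside it."""
    idx = stack.index(tag)
    inner = stack[idx + 1:]
    out.extend(f"</{t}>" for t in reversed(stack[idx:]))
    del stack[idx:]
    out.extend(f"<{t}>" for t in inner)
    stack.extend(inner)


def render_four_tic(line: str) -> str:
    out, stack, pos = [], [], 0
    for m in re.finditer(r"'+", line):
        out.append(line[pos:m.start()])
        n = len(m.group())
        # Runs longer than 7 are not covered by the proposal; this sketch
        # assumes the excess apostrophes are literal text.
        literal, tags = RUNS[min(n, 7)]
        out.append("'" * (literal + max(0, n - 7)))
        # Close already-open tags innermost-first, then open the others.
        to_close = sorted((t for t in tags if t in stack), key=stack.index, reverse=True)
        for tag in to_close:
            _close(tag, stack, out)
        for tag in tags:
            if tag not in to_close:
                out.append(f"<{tag}>")
                stack.append(tag)
        pos = m.end()
    out.append(line[pos:])
    out.extend(f"</{t}>" for t in reversed(stack))  # close whatever is still open
    return "".join(out)


print(render_four_tic("l'''italic''plain"))        # l'<i>italic</i>plain
print(render_four_tic("''''bold'''' and ''it''"))  # <b>bold</b> and <i>it</i>
```

Every run is decided from its own length alone, which is the "without lookahead" property being claimed.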

-Mark

Brent 'Dax' Royal-Gordon

5:25 p.m.


Delirium delirium@hackish.org wrote:

...

I'm not sure if it would be a more or less drastic change, but making bold be four tics instead of three would also solve the problem. Then without lookahead: 2 tics is italics; 3 tics is an apostrophe plus italics; 4 tics is bold; 5 tics is an apostrophe plus bold; 6 tics is bold and italic; and 7 tics is an apostrophe plus bold and italic.

If we're going to do that, we might as well face up to the fact that '' and ''' aren't particularly logical choices in the first place. (I'm not going to argue that we should change it, just that if we're going to change it we should make it more intuitive and convenient, not less.)

-- Brent 'Dax' Royal-Gordon brent@brentdax.com Perl and Parrot hacker

Karl Eichwalder

12 May

2:16 a.m.


Delirium delirium@hackish.org writes:

...

I'm not sure if it would be a more or less drastic change, but making bold be four tics instead of three would also solve the problem. Then without lookahead: 2 tics is italics; 3 tics is an apostrophe plus italics; 4 tics is bold; 5 tics is an apostrophe plus bold; 6 tics is bold and italic; and 7 tics is an apostrophe plus bold and italic.

Sure, or simply use proper XML markup... This would have the nice side effect that it can span more than a single line.

Yes, I know, XML was not invented here and thus it isn't an option.

-- http://www.gnu.franken.de/ke/ | ,__o | _-_<, | (*)/'(*) Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C

Brent 'Dax' Royal-Gordon

2:35 a.m.


Karl Eichwalder ke@gnu.franken.de wrote:

...

Sure, or simply use proper XML markup... This would have the nice side effect that it can span more than a single line.

Yes, I know, XML was not invented here and thus it isn't an option.

If you want to know why it was really done this way instead of muttering inflammatory remarks, see http://www.c2.com/cgi/wiki?WhyDoesntWikiDoHtml.

-- Brent 'Dax' Royal-Gordon brent@brentdax.com Perl and Parrot hacker

Karl Eichwalder

3:33 a.m.


Brent 'Dax' Royal-Gordon brentdax@gmail.com writes:

...

Karl Eichwalder ke@gnu.franken.de wrote:

...
Sure, or simply use proper XML markup... This would have the nice side effect that it can span more than a single line.

Yes, I know, XML was not invented here and thus it isn't an option.

If you want to know why it was really done this way instead of muttering inflammatory remarks, see http://www.c2.com/cgi/wiki?WhyDoesntWikiDoHtml.

I do not ask for HTML, I vote for XML. Our all-time-morphing wiki-syntax is not user-friendly. It keeps developers busy - how boring if we would go for XML as an authoring language! No need to write yet another parser...

The wiki-syntax is short, but not easy. Sometimes whitespace is critical, but you will never know for sure. It is a very limited presentation or layout language.

-- http://www.gnu.franken.de/ke/ | ,__o | _-_<, | (*)/'(*) Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C

Tels

3:43 a.m.


Hi,

On Thursday 12 May 2005 09:33, Karl Eichwalder wrote:

...

Brent 'Dax' Royal-Gordon brentdax@gmail.com writes:

...
Karl Eichwalder ke@gnu.franken.de wrote:

...
Sure, or simply use proper XML markup... This would have the nice side effect that it can span more than a single line.

Yes, I know, XML was not invented here and thus it isn't an option.

If you want to know why it was really done this way instead of muttering inflammatory remarks, see http://www.c2.com/cgi/wiki?WhyDoesntWikiDoHtml.

I do not ask for HTML, I vote for XML.

From a human point of view, XML is even worse than HTML - way too much formatting gets in the way of writing the actual content. Wiki syntax is exactly why I love wikis over editing HTML/XML pages manually.

Yes, you could write the XML with some sort of WYSIWYG editor - but that is beside the point.

(If you think about it, there is a reason that we write email in plain text and not HTML, XML or anything else :)

...

Our all-time-morphing wiki-syntax is not user-friendly. It keeps developers busy - how boring if we would go for XML as an authoring language! No need to write yet another parser...

I do think you are slightly wrong. While parsing XML is easy, making sense of what it actually means is not so easy. And writing XML is much harder than writing plain text (which wiki syntax basically is).

...

The wiki-syntax is short, but not easy. Sometimes whitespace is critical, but you will never know for sure. It is a very limited presentation or layout language.

It certainly has some drawbacks. But throwing it out for XML is a solution in search of a problem :)

Best wishes,

Tels

-- Signed on Thu May 12 09:39:12 2005 with key 0x93B84C15. Visit my photo gallery at http://bloodgate.com/photos/ PGP key on http://bloodgate.com/tels.asc or per email.

"My glasses, my glasses. I cannot see without my glasses." - "My glasses, my glasses. I cannot be seen without my glasses."


Karl Eichwalder

13 May

12:56 a.m.


Tels nospam-abuse@bloodgate.com writes:

...

And writing XML is much harder than writing plain text (which wiki syntax basically is).

The same is true of XML.

...

It certainly has some drawbacks. But throwing it out for XML is a solution in search of a problem :)

I think differently. XML is simple and self-explanatory. Wiki syntax is cryptic and by no means ''plain text''. A lot of ASCII characters are special; they are overloaded with markup or layout side effects. We waste a lot of time getting these side effects right.

-- http://www.gnu.franken.de/ke/ | ,__o | _-_<, | (*)/'(*) Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C

Daniel Wunsch

17 May

9:24 p.m.


On Friday 13 May 2005 06:56, Karl Eichwalder wrote:

...

I think differently. XML is simple and self-explanatory. Wiki syntax is cryptic and by no means ''plain text''.

why do you think comma people invented punctuation question mark because a dot is easier to distinguish from the doublequote plain text doublequote that you seem to prefer semicolon closing parentheses linefeed linefeed daniel

Lee Daniel Crocker

12 May

8:20 a.m.


On Thu, 2005-05-12 at 09:33 +0200, Karl Eichwalder wrote:

...

I do not ask for HTML, I vote for XML. Our all-time-morphing wiki-syntax is not user-friendly.

Yes, the current wiki syntax is dreadful, and getting worse with every release. But using XML doesn't solve the problem; it just puts up a big barrier to entry for newcomers.

I know I've been claiming to be working on it for some time now, but I really am...and I'm very close to releasing something for comment, so keep an eye out.

FWIW, in the current thread, I'm all for deprecating unneeded syntaxes now, even before we reform the whole package.

-- Lee Daniel Crocker lee@piclab.com http://creativecommons.org/licenses/publicdomain/

Ævar Arnfjörð Bjarmason

9:07 a.m.


...

Yes, the current wiki syntax is dreadful, and getting worse with every release. But using XML doesn't solve the problem; it just puts up a big barrier to entry for newcomers.

Fully agreed, and here are some of the biggest problems with it:

1. It's not thought out

The whole syntax is something which just grew over time. It's not clean, it's not pretty, and it's not consistent. Nor is it easy to parse, since there's often no definition of how things are supposed to work aside from the parser code itself and conventions of use.

2. Too much reliance on too few meta characters

* [] is used for one thing (good)
* [[]] is used for several *completely different* things:
** [[Normal links]]
** [[Image:Magical includes of images.png]]
** [[th:is]] is an interwiki link, [[thi:s]] is not; how about [[:th:is]]?
** [[January 1]], [[1970]] <- things that get magical transformations.
** Go to a [[Wikipedia:Page]] and add a [[Category:Tag]] to it; it might not do what you expect at first.
* {{this}} is used for one thing, and easy to distinguish from {{{this}}}.
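The overloading can be made concrete with a small Python sketch that decides what a [[...]] link does purely from its prefix. The prefix sets `INTERWIKI_PREFIXES` and `MAGIC_NAMESPACES` are illustrative assumptions, not the real per-wiki configuration, and the rules are simplified (date links and other special cases are ignored).

```python
# Hypothetical configuration; real wikis define these per site.
INTERWIKI_PREFIXES = {"th", "fr", "de"}
MAGIC_NAMESPACES = {"Image": "embed image", "Category": "add page to category"}


def classify_link(target: str) -> str:
    """Roughly what [[target]] does, judged only by its prefix.
    Simplified: date links and other special cases are not modelled."""
    if target.startswith(":"):
        return "inline link (magic suppressed)"
    prefix, sep, _ = target.partition(":")
    if sep and prefix in MAGIC_NAMESPACES:
        return MAGIC_NAMESPACES[prefix]
    if sep and prefix in INTERWIKI_PREFIXES:
        return "interwiki link"
    return "plain link"


for t in ["Normal links", "Image:Foo.png", "th:is", "thi:s", ":th:is", "Category:Tag"]:
    print(f"[[{t}]] -> {classify_link(t)}")
```

The same two-bracket delimiter yields four different behaviours depending on a prefix the reader has to know about.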

I've sometimes pondered how to solve this, and I've often come to the conclusion that adding another type of double bracket (««»») for magic stuff would solve a lot.

««Image:Foo.png»» would work like [[Image:Foo.png]] does currently, and [[Image:Foo.png]] would work like [[:Image:Foo.png]] does now (no need to escape the magicness, since [[]] would never be used for anything besides ordinary links). ««Category:Foo»» and [[Category:Foo]] would be the same, and so would interwiki links.
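For the Image: case, which is the one spelled out precisely, the mapping is mechanical. A minimal Python sketch of the rewrite a conversion bot might apply, assuming image links contain no nested [[...]] in their captions; the function name is hypothetical.

```python
import re

# [[Image:...]] or [[:Image:...]]; group 1 is the optional leading colon.
# Assumes no nested [[...]] inside image captions.
IMAGE_LINK = re.compile(r"\[\[(:?)(Image:[^\[\]]+)\]\]")


def to_guillemet_syntax(text: str) -> str:
    """Rewrite image links from the current syntax to the proposed one."""
    def repl(m):
        colon, target = m.group(1), m.group(2)
        if colon:
            # [[:Image:X]] (link to the image page) becomes plain [[Image:X]].
            return f"[[{target}]]"
        # [[Image:X]] (embed the image) becomes ««Image:X»».
        return f"««{target}»»"
    return IMAGE_LINK.sub(repl, text)


print(to_guillemet_syntax("See [[Image:Foo.png|thumb]] and [[:Image:Foo.png]]; [[Normal]]."))
```

Note that the rewrite is not idempotent: running it twice would turn the converted [[Image:X]] into ««Image:X»», so whatever drives it has to know which pages are already converted.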

Then again, I've often thought that that might suck just as much, because at least in the case of categories, interwiki links, and other metadata it would be best to just add them to some yet-to-be-made metadata page. This would make it easier to maintain these links (interwiki links could even be maintained automatically across projects).

I'm sure others have a lot of nice ideas on how to change the syntax to be more consistent and/or easier to parse, which of course brings us to the next problem:

3. We can't (easily) change it

It sucks, but we really can't: it would instantly render gigabytes of current wikitext incompatible with the parser. There are basically two solutions to this problem: continue supporting two syntaxes for some time, which puts a big limit on what kind of changes can be made, or automatically convert the text with some kind of bot.

Lee Daniel Crocker

6:32 p.m.


On Thu, 2005-05-12 at 13:07 +0000, Ævar Arnfjörð Bjarmason wrote:

...

I'm sure others have a lot of nice ideas on how to change the syntax to be more consistent and/or easier to parse, which of course brings us to the next problem:

We can't (easily) change it

It sucks, but we really can't: it would instantly render gigabytes of current wikitext incompatible with the parser. There are basically two solutions to this problem: continue supporting two syntaxes for some time, which puts a big limit on what kind of changes can be made, or automatically convert the text with some kind of bot.

I don't really see that as a big problem at all. If we change the syntax, yes, we'll have to lock the databases, making everything read-only while an update process goes through all those gigabytes. But we really can do that overnight, only inconveniencing users a little.

A slightly bigger problem is bots that expect the old syntax. We'll have to give plenty of warning and make sure they're turned off or updated prior to the switch.

-- Lee Daniel Crocker <lee at piclab.com> http://www.piclab.com/lee/ http://creativecommons.org/licenses/publicdomain/

MaPhi Werner

13 May

1:38 a.m.


Lee Daniel Crocker wrote:

...

I don't really see that as a big problem at all. If we change the syntax, yes, we'll have to lock the databases, making everything read-only while an update process goes through all those gigabytes. But we really can do that overnight, only inconveniencing users a little.

Overnight? There is no "overnight" on a system that's accessed world-wide, 24x7.

In my opinion, if the wiki syntax is changed dramatically, then the db table should receive a new column called "syntax version". This would allow the parser to distinguish between old and new syntax, rendering both correctly.
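A minimal Python sketch of the "syntax version" column idea, assuming the flag is stored per revision so that a page's whole text is in one syntax (no mixing within a page). `Revision`, `parse_v1`, `parse_v2` and `PARSERS` are hypothetical stand-ins, not MediaWiki code; the two parsers are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Revision:
    text: str
    syntax_version: int  # the proposed extra column; one value per stored revision


def parse_v1(text: str) -> str:
    return f"[old parser] {text}"  # placeholder for the existing parser


def parse_v2(text: str) -> str:
    return f"[new parser] {text}"  # placeholder for the new parser


PARSERS: Dict[int, Callable[[str], str]] = {1: parse_v1, 2: parse_v2}


def render(rev: Revision) -> str:
    """Pick the parser from the revision's syntax_version column."""
    parser = PARSERS.get(rev.syntax_version)
    if parser is None:
        raise ValueError(f"unknown syntax version: {rev.syntax_version}")
    return parser(rev.text)


print(render(Revision("''hello''", 1)))  # [old parser] ''hello''
```

The cost of this design is that both parsers have to be maintained for as long as any old-version revision remains.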

MaPhi

Patrick Collison

4:51 a.m.


There's a partial overnight. See http://wikimedia.org/stats/live/org.wikimedia.all.squid.requests-hits-day.png.

If there is a decision to change the syntax in a major way, locking the database for 12 hours is hardly a huge price to pay. How could syntax versions work? Can you mix syntax versions in the same article? If so, does the parser know what regions use what syntax? If not, can only new articles use the new syntax? I don't think a hack is the solution to a hack-ish syntax.

On 13 May 05, at 06:38, MaPhi Werner wrote:

...

Lee Daniel Crocker wrote:

...
I don't really see that as a big problem at all. If we change the syntax, yes, we'll have to lock the databases, making everything read-only while an update process goes through all those gigabytes. But we really can do that overnight, only inconveniencing users a little.

Overnight? There is no "overnight" on a system that's accessed world-wide, 24x7.

In my opinion, if the wiki syntax is changed dramatically, then the db table should receive a new column called "syntax version". This would allow the parser to distinguish between old and new syntax, rendering both correctly.

MaPhi _______________________________________________ Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l

Gerard Meijssen

5:22 a.m.


Patrick Collison wrote:

...

There's a partial overnight. See http://wikimedia.org/stats/live/org.wikimedia.all.squid.requests-hits-day.png.

If there is a decision to change the syntax in a major way, locking the database for 12 hours is hardly a huge price to pay. How could syntax versions work? Can you mix syntax versions in the same article? If so, does the parser know what regions use what syntax? If not, can only new articles use the new syntax? I don't think a hack is the solution to a hack-ish syntax.

On 13 May 05, at 06:38, MaPhi Werner wrote:

...
Lee Daniel Crocker wrote:

...
I don't really see that as a big problem at all. If we change the syntax, yes, we'll have to lock the databases, making everything read-only while an update process goes through all those gigabytes. But we really can do that overnight, only inconveniencing users a little.

Overnight? There is no "overnight" on a system that's accessed world-wide, 24x7.

In my opinion, if the wiki syntax is changed dramatically, then the db table should receive a new column called "syntax version". This would allow the parser to distinguish between old and new syntax, rendering both correctly.

MaPhi

Hi, you are right that we could do all kinds of things. However, I have the distinct impression that some people would lynch you if you implemented what is under discussion. What has been proposed is technology- and English-oriented. It would make it next to impossible to edit French, for instance. MediaWiki is there to ENABLE people to add content. When schemes like this make that impossible, they are not acceptable, however good they may seem from a technical point of view.

This is one of those few instances where developers should not have the final say about the future of the MediaWiki software.

If you want to enhance the syntax, the first thing that comes to my mind is working on a syntax that is easy to use in languages like Farsi, because I found it next to impossible to edit Farsi with the MediaWiki syntax. I would not be surprised if this is one of the major reasons why projects in other scripts have such a hard time.

Thanks. GerardM

Timwi

12:59 p.m.


Patrick Collison wrote:

...

There's a partial overnight. See http://wikimedia.org/stats/live/org.wikimedia.all.squid.requests-hits-day.png.

If there is a decision to change the syntax in a major way, locking the database for 12 hours is hardly a huge price to pay. How could syntax versions work? Can you mix syntax versions in the same article? If so, does the parser know what regions use what syntax? If not, can only new articles use the new syntax? I don't think a hack is the solution to a hack-ish syntax.

I think the idea was that old articles would be flagged as "old syntax", and then a conversion bot can be run *while the site remains up*. As soon as the bot has converted an article, it will be flagged as "new syntax" and can henceforth be edited in the new syntax.
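The flag-and-convert approach can be sketched in Python as a bot loop that converts one page at a time and flips its flag in the same step. `Page`, its `syntax` field and `migrate` are hypothetical names; `convert` stands for whatever old-to-new rewrite is eventually chosen.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Page:
    title: str
    text: str
    syntax: str = "old"  # hypothetical flag: "old" or "new"


def migrate(pages: Iterable[Page], convert: Callable[[str], str]) -> int:
    """Convert old-syntax pages one at a time while the site stays up.

    Text and flag are updated together for each page (in a real database
    this would be one transaction per page), so the renderer can always
    pick the right parser. Pages already flagged "new" are skipped, which
    makes the bot safe to stop and restart."""
    converted = 0
    for page in pages:
        if page.syntax != "old":
            continue
        page.text, page.syntax = convert(page.text), "new"
        converted += 1
    return converted
```

Because the loop skips converted pages, it can run for days in the background instead of needing a read-only window.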

I don't agree with the proposal, though.

Patrick Collison

14 May

7:30 a.m.


Apologies, I certainly got the wrong end of the stick there.

On 13 May 05, at 17:59, Timwi wrote:

...

Patrick Collison wrote:

...
There's a partial overnight. See http://wikimedia.org/stats/live/org.wikimedia.all.squid.requests-hits-day.png.

If there is a decision to change the syntax in a major way, locking the database for 12 hours is hardly a huge price to pay. How could syntax versions work? Can you mix syntax versions in the same article? If so, does the parser know what regions use what syntax? If not, can only new articles use the new syntax? I don't think a hack is the solution to a hack-ish syntax.

I think the idea was that old articles would be flagged as "old syntax", and then a conversion bot can be run *while the site remains up*. As soon as the bot has converted an article, it will be flagged as "new syntax" and can henceforth be edited in the new syntax.

I don't agree with the proposal, though.


Lee Daniel Crocker

13 May

7:35 p.m.


On Fri, 2005-05-13 at 07:38 +0200, MaPhi Werner wrote:

...

Overnight? There is no "overnight" on a system that's accessed world-wide, 24x7.

In my opinion, if the wiki syntax is changed dramatically, then the db table should receive a new column called "syntax version". This would allow the parser to distinguish between old and new syntax, rendering both correctly.

That's another way to do the conversion: leave both parsers in for the time it takes to do the update. By "overnight" I mean "10-20 hours, or however long it takes". I don't think making the English Wikipedia read-only for that long is unacceptable at all (and all the others are small enough that it's not an issue).

-- Lee Daniel Crocker <lee at piclab.com> http://www.piclab.com/lee/ http://creativecommons.org/licenses/publicdomain/

Karl Eichwalder

12:39 a.m.


Lee Daniel Crocker lee@piclab.com writes:

...

On Thu, 2005-05-12 at 09:33 +0200, Karl Eichwalder wrote:

...
I do not ask for HTML, I vote for XML. Our all-time-morphing wiki-syntax is not user-friendly.

Yes, the current wiki syntax is dreadful, and getting worse with every release. But using XML doesn't solve the problem; it just puts up a big barrier to entry for newcomers.

I think differently. Those who don't know or don't like XML can write simple paragraphs without any markup at all. After editing a few articles, the markup is self-explanatory:
<table>
  <row>
    <entry>...</entry>
  </row>
  <row>
    <entry>...</entry>
  </row>
</table>

I never edit table boxes; they are too complicated, and it isn't worth learning a wiki syntax. I cannot imagine anything more boring than learning random markup codes. But that's just me ;)

...

FWIW, in the current thread, I'm all for deprecating unneeded syntaxes now, even before we reform the whole package.

If well planned, the syntax switch could happen at edit/save time: simply store converted articles in a new table, etc.
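The convert-at-edit/save-time variant can be sketched in Python as a storage layer with two tables: articles stay in the old table until someone edits them, and every save lands in the new one. The class and method names are hypothetical, and `convert` stands for an as-yet-unwritten old-to-new converter.

```python
from typing import Callable, Dict, Tuple


class LazyMigratingStore:
    """Hypothetical storage layer: unconverted articles stay in the old table
    until someone edits them; saves always go to the new table."""

    def __init__(self, old_table: Dict[str, str], convert: Callable[[str], str]):
        self.old_table = old_table
        self.new_table: Dict[str, str] = {}
        self.convert = convert

    def load(self, title: str) -> Tuple[str, str]:
        """Return (text, syntax) so the caller can pick the matching parser."""
        if title in self.new_table:
            return self.new_table[title], "new"
        return self.old_table[title], "old"

    def text_for_editing(self, title: str) -> str:
        """Edit boxes always show the new syntax, converting on first edit."""
        text, syntax = self.load(title)
        return self.convert(text) if syntax == "old" else text

    def save(self, title: str, new_syntax_text: str) -> None:
        self.new_table[title] = new_syntax_text
```

Articles nobody edits are never converted under this scheme, so the old parser has to stay available indefinitely for rendering them.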

-- http://www.gnu.franken.de/ke/ | ,__o | _-_<, | (*)/'(*) Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C

Timwi

1:03 p.m.


Karl Eichwalder wrote:

...

Lee Daniel Crocker lee@piclab.com writes:

...
On Thu, 2005-05-12 at 09:33 +0200, Karl Eichwalder wrote:

...
I do not ask for HTML, I vote for XML. Our all-time-morphing wiki-syntax is not user-friendly.

Yes, the current wiki syntax is dreadful, and getting worse with every release. But using XML doesn't solve the problem; it just puts up a big barrier to entry for newcomers.

I think differently. Those who don't know or don't like XML can write simple paragraphs without any markup at all. After editing a few articles, the markup is self-explanatory:
<table>
  <row>
    <entry>...</entry>
  </row>
  <row>
    <entry>...</entry>
  </row>
</table>
I never edit table boxes, too complicated,

Haha! LOL! That was a really good one. Just so you know, the current table syntax is *waaaay* easier for non-techies than the XML-based syntax you have mentioned above. When you say you find it easier, what you really mean is just that you're used to it, or that it is similar to something you're used to (namely, HTML), or that you came up with it (and therefore naturally understand it). Unfortunately that doesn't speak for non-technically-minded people!

Timwi

12:49 p.m.


Karl Eichwalder wrote:

...

I do not ask for HTML, I vote for XML.

The same reasons against HTML apply to XML.

...

Our all-time-morphing wiki-syntax is not user-friendly.

Certainly more than XML.

...

The wiki-syntax is short, but not easy. Sometimes whitespace is critical, but you will never know for sure.

Well, I don't know about other people, but I always know for sure, even before I wrote flexbisonparse.

...

It is a very limited presentation or layout language.

It's supposed to be.

Timwi

12:46 p.m.


Delirium wrote:

...

Andrew Rodland wrote:

...
I had guessed that it might find some use in fr -- It's too bad to hear that it's "widely" used. However, I should note that it's not _required_. <nowiki> resolves the ambiguity nicely. The fact that it takes 17 characters is, well, unfortunate, but it's _correct_. The workaround, on the other hand, does bad things to the language, and makes the implementation of a more advanced parser exceedingly difficult.

I'm not sure if it would be a more or less drastic change, but making bold be four tics instead of three would also solve the problem.

That would solve the problem only if we can somehow automatically convert all existing articles that use triple-apostrophes to mean "bold". That's not worth it.

My suggestion would instead be to use ""..."" to mean "bold". This would not interfere with ''...'' at all. We could have both '''...''' and ""..."" for the transition period, and once all use of '''...''' has been removed (which would be a long long time), the syntax can be deprecated. This, too, I don't think would be worth it, but on the other hand, I just ''know'' that someone is going to write a script to do the conversion automatically.
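The kind of conversion script anticipated here could start from a single regular expression. A minimal Python sketch, assuming bold spans are well-formed and on one line; the names are hypothetical. It deliberately leaves ''''' runs alone, and it does not reproduce doQuotes' reinterpretation of a lone ''': in a line like l'''italic'' and '''bold''' the first ''' would be mis-paired with the opening of the real bold, so a real script would first need the existing parser's quote resolution.

```python
import re

# A '''...''' span on one line, where neither delimiter is part of a longer
# apostrophe run (so ''''' bold-italic runs are left alone).
BOLD_SPAN = re.compile(r"(?<!')'''(?!')(.+?)(?<!')'''(?!')")


def bold_to_double_quotes(line: str) -> str:
    """Naively rewrite '''bold''' as ""bold""; italics ('') are untouched."""
    return BOLD_SPAN.sub(r'""\1""', line)


print(bold_to_double_quotes("'''bold''' and ''italic''"))  # ""bold"" and ''italic''
```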

...

Then without lookahead: 2 tics is italics; 3 tics is an apostrophe plus italics; 4 tics is bold; 5 tics is an apostrophe plus bold; 6 tics is bold and italic; and 7 tics is an apostrophe plus bold and italic.

How do you know that 5 tics isn't a single apostrophe wrapped in italics? :)

Timwi

12:42 p.m.


Andrew Rodland wrote:

...

I had guessed that it might find some use in fr -- It's too bad to hear that it's "widely" used. However, I should note that it's not _required_.

Of *course* it's required. By saying it isn't, you're thinking too technically. Humans aren't like that, humans just want to write their text and not ugly tags and syntax elements just for a single apostrophe.

...

<nowiki> resolves the ambiguity nicely.

Again -- "nicely" only in the technical sense, but not in the human usability sense.

...

The workaround, on the other hand, does bad things to the language, and makes the implementation of a more advanced parser exceedingly difficult.

You are making two assumptions here that are both false.

Firstly, you are assuming that the language becomes more ambiguous this way. This is false, because by handling this case explicitly, I have actually made it *less* ambiguous. Previously, it was only a side-effect of the way regular expressions match text that three apostrophes were rendered as open-italics followed by an apostrophe. Now I have specifically written code to define three apostrophes to mean "an apostrophe followed by open-italics, unless there is another triple-apostrophe in the line, in which case it's open-bold". No ambiguity there.
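The rule as stated above can be sketched in Python. This is a simplified model of that one sentence, not a port of parser.php's doQuotes(), which handles more cases; function names are hypothetical, and run lengths other than 2 and 3 are simply kept as literal text here. The `lone_triple` check is the whole-line scan the next message objects to.

```python
import re


def tokenize_quotes(line: str) -> list:
    """Tokenize apostrophe markup on one line: ''' is bold unless it is the
    only ''' on the line, in which case it is a literal apostrophe followed
    by italic markup. Other run lengths are kept as literal text."""
    runs = list(re.finditer(r"'+", line))
    # Deciding what a ''' means requires scanning the whole line first.
    lone_triple = sum(1 for m in runs if len(m.group()) == 3) == 1
    tokens, pos = [], 0
    for m in runs:
        if m.start() > pos:
            tokens.append(("text", line[pos:m.start()]))
        n = len(m.group())
        if n == 2:
            tokens.append(("i",))
        elif n == 3 and lone_triple:
            tokens += [("text", "'"), ("i",)]
        elif n == 3:
            tokens.append(("b",))
        else:
            tokens.append(("text", m.group()))
        pos = m.end()
    if pos < len(line):
        tokens.append(("text", line[pos:]))
    return tokens


def render_tokens(tokens: list) -> str:
    """Turn tokens into HTML, toggling <i>/<b> and closing anything left open."""
    out, stack = [], []
    for tok in tokens:
        if tok[0] == "text":
            out.append(tok[1])
        elif tok[0] in stack:
            idx = stack.index(tok[0])
            inner = stack[idx + 1:]
            out.extend(f"</{t}>" for t in reversed(stack[idx:]))
            del stack[idx:]
            out.extend(f"<{t}>" for t in inner)  # reopen tags nested inside
            stack.extend(inner)
        else:
            out.append(f"<{tok[0]}>")
            stack.append(tok[0])
    out.extend(f"</{t}>" for t in reversed(stack))
    return "".join(out)


print(render_tokens(tokenize_quotes("l'''italic''plain")))      # l'<i>italic</i>plain
print(render_tokens(tokenize_quotes("'''bold''' and ''it''")))  # <b>bold</b> and <i>it</i>
```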

The second assumption you are making (explicitly, even) is that it is more difficult to implement, when in fact you really just mean that you found it harder because it is not the way regular expressions normally work (and because you find the behaviour confusing because you don't normally think of French). I didn't find this particularly difficult to do -- neither in the current parser, nor in flexbisonparse.

Greetings, Timwi

Andrew Rodland

5:40 p.m.


On 5/13/05, Timwi timwi@gmx.net wrote:

...

Andrew Rodland wrote:

...
I had guessed that it might find some use in fr -- It's too bad to hear that it's "widely" used. However, I should note that it's not _required_.

Of *course* it's required. By saying it isn't, you're thinking too technically. Humans aren't like that, humans just want to write their text and not ugly tags and syntax elements just for a single apostrophe.

I didn't know that it was too technical of me to think that "required" should mean "required".

...

...
<nowiki> resolves the ambiguity nicely.

Again -- "nicely" only in the technical sense, but not in the human usability sense.

It's nice in the human-usability sense that you can say exactly what you mean, instead of having to guess how it's going to be interpreted (speaking of which, can you show me any document, preferably in English, which explains this behavior?). I agree that <nowiki> is rather unwieldy, but that in itself doesn't make the existing solution a good one.

...

...
The workaround, on the other hand, does bad things to the language, and makes the implementation of a more advanced parser exceedingly difficult.

You are making two assumptions here that are both false.

Firstly, you are assuming that the language becomes more ambiguous this way. This is false, because by handling this case explicitly, I have actually made it *less* ambiguous. Previously, it was only a side-effect of the way regular expressions match text that three apostrophes were rendered as open-italics followed by an apostrophe. Now I have specifically written code to define three apostrophes to mean "an apostrophe followed by open-italics, unless there is another triple-apostrophe in the line, in which case it's open-bold". No ambiguity there.

How does "a side-effect of the way regular expressions match text" turn the markup for bold into an apostrophe and the markup for italic?

...

The second assumption you are making (explicitly, even) is that it is more difficult to implement, when in fact you really just mean that you found it harder because it is not the way regular expressions normally work (and because you find the behaviour confusing because you don't normally think of French). I didn't find this particularly difficult to do -- neither in the current parser, nor in flexbisonparse.

If you had read my messages, you might have noticed that my reasoning was based neither on anything to do with regexes at all, nor on linguistic prejudice, but on a simple consideration. It is impossible, at the time that the parser sees a ''', to resolve what type of token it is, without looking ahead to the end of the line (an unbounded and unknown distance away). _That's_ what I called ambiguity. The alternative is that '' means '', and ''' means '''.

My current feeling is that the "cleanest" solution to the problem would be to introduce a separator which produces no output, but breaks up tokens; then you could write (with ∙ as sequence operator) ' ∙ '', '' ∙ ', ' ∙ ''', ''' ∙ ', and even '' ∙ '' all you want, with no ambiguity to the parser and no considerable hassle to the user. "Otherwise how is the computer supposed to know what you mean?" is an argument anyone can understand. The existing code in doQuotes() simply operates by logically _separating_ the consecutive quotes, so automatic conversion wouldn't be overly taxing, nor time-critical.

I haven't seen flexbisonparse, but the reason it's "easy" in the current parser is, as I'm sure you know, that it makes N passes over the entire string, with the benefit of unlimited lookahead. You're right that it _can_ be done -- I think I've got it down. But it's still not pretty. And it's still, I think, a violation of expectations. Nonetheless, I'll shut up about it.
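The separator idea can be sketched in Python: with ∙ ending an apostrophe run, each run is resolved from its own length, with no lookahead. Using U+2219 for ∙ and the function name are assumptions, and rejecting runs of four or more apostrophes instead of guessing is this sketch's own choice, not part of the proposal.

```python
import re

SEP = "\u2219"  # "∙": produces no output, only ends an apostrophe run


def tokenize_with_separator(line: str) -> list:
    """Tokenize apostrophes with no lookahead: ' is literal, '' is italic,
    ''' is bold. Longer runs are rejected rather than guessed at."""
    tokens = []
    # SEP is excluded from both alternatives, so it is dropped but still splits runs.
    for m in re.finditer(f"'+|[^'{SEP}]+", line):
        s = m.group()
        if s[0] != "'":
            tokens.append(("text", s))
        elif len(s) == 1:
            tokens.append(("text", "'"))
        elif len(s) == 2:
            tokens.append(("italic",))
        elif len(s) == 3:
            tokens.append(("bold",))
        else:
            raise ValueError(f"ambiguous run of {len(s)} apostrophes; insert {SEP}")
    return tokens


print(tokenize_with_separator("l'" + SEP + "''x''"))
```

Converting existing text would amount to inserting ∙ wherever doQuotes() currently splits a run.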

Andrew

Timwi

11:11 p.m.


Andrew Rodland wrote:

...

It's nice in the human-usability sense that you can say exactly what you mean,

You can. You *can* use <nowiki> or ' or any other kludge you like. The point is that you shouldn't have to do this when what you're trying to do is something that occurs frequently in half of all articles!

...

How does "a side-effect of the way regular expressions match text" turn the markup for bold into an apostrophe and the markup for italic?

If you had read my message, you might have noticed that I've been trying to tell you that ''' needn't always be "bold" because it's frequently not what the user meant (especially when there's no matching close-bold).

...

...
The second assumption you are making (explicitly, even) is that it is more difficult to implement, when in fact you really just mean that you found it harder because it is not the way regular expressions normally work (and because you find the behaviour confusing because you don't normally think of French). I didn't find this particularly difficult to do -- neither in the current parser, nor in flexbisonparse.

If you had read my messages, you might have noticed that my reasoning was based neither on anything to do with regexes at all, nor on linguistic prejudice, but on a simple consideration. It is impossible, at the time that the parser sees a ''', to resolve what type of token it is, without looking ahead to the end of the line (an unbounded and unknown distance away).

Right, so *this* is what you're on about. I'm afraid there's another false assumption you're making, namely inefficiency or otherwise inherent evil of what you call "look-ahead". You're forgetting that the same applies to [[ and {{ and {| and || and a whole host of other things. *This is not a problem.* Neither for the current parser (because it passes the entire text several times anyway) nor for a proper parser like flexbisonparse (simply because of the way LALR parsers work). If you want me to elaborate on the latter, please feel free to ask and I'll explain.
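The whole-line approach can be sketched as follows: collect every quote run on the line first, and only then decide what each ''' means. This is a simplified sketch loosely modelled on the idea behind doQuotes() (if the italic and bold counts are both odd, one ''' is probably an apostrophe followed by ''). It is not a transcription of parser.php: the function names, the preference for a ''' that follows a one-letter word such as "l", and treating runs of 4 or 6+ as literal are assumptions.

```python
import re

QUOTE_RUN = re.compile(r"'{2,}")  # a run of two or more apostrophes
KINDS = {2: "ITALIC", 3: "BOLD", 5: "BOLD_ITALIC"}


def _follows_single_letter(line: str, start: int) -> bool:
    """True if the run at `start` comes right after a one-letter word, as in l'''."""
    before = line[:start]
    if not before or not before[-1].isalpha():
        return False
    return len(before) == 1 or not before[-2].isalnum()


def classify_quote_runs(line: str) -> list[str]:
    """Classify each quote run on a line, using counts taken over the whole line."""
    runs = list(QUOTE_RUN.finditer(line))
    # Assumption for brevity: runs of 4 or 6+ apostrophes stay literal.
    kinds = [KINDS.get(len(m.group()), "LITERAL") for m in runs]
    italics = sum(k in ("ITALIC", "BOLD_ITALIC") for k in kinds)
    bolds = sum(k in ("BOLD", "BOLD_ITALIC") for k in kinds)
    # This decision needs both totals, i.e. it looks ahead to the end of the line.
    if italics % 2 == 1 and bolds % 2 == 1:
        candidates = [i for i, k in enumerate(kinds) if k == "BOLD"]
        preferred = [i for i in candidates if _follows_single_letter(line, runs[i].start())]
        chosen = (preferred or candidates or [None])[0]
        if chosen is not None:
            # Reread this ''' as a literal apostrophe followed by italic ''.
            kinds[chosen] = "APOSTROPHE+ITALIC"
    return kinds
```

For `On trouve l'''homme'' sur la Terre.` the counts are one italic and one bold, both odd, so the ''' after "l" is reread as an apostrophe plus italics; a line with balanced `'''bold'''` and `''it''` is left alone.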

...

The alternative is that '' means '', and ''' means '''.

And indeed, they do.

...

The existing code in doQuotes() simply operates by logically _separating_ the consecutive quotes, so automatic conversion wouldn't be overly taxing, nor time-critical.

... nor required.

...

And it's still, I think, a violation of expectations.

What you're saying essentially amounts to saying that this:

On trouve l'''homme'' sur la Terre.

should be rendered as:

On trouve lhomme sur la Terre.

even though that is clearly not what the user meant.

Timwi

Fredrik Johansson

11:24 p.m.

New subject: Proposal for deprecation of syntax features for 1.5

Why not the intuitive /italicized text/ and *bold text*?

Fredrik

Brent 'Dax' Royal-Gordon

14 May

2:10 a.m.

New subject: Proposal for deprecation of syntax features for 1.5

Fredrik Johansson fredrik.johansson@gmail.com wrote:

...

Why not the intuitive /italicized text/ and *bold text*?

For any big change in syntax, there would be a cost to translate the various wikis to the new syntax, but that only happens once. Perhaps more importantly, there would be a significant cognitive change involved--people (and bots) would have to get used to the new syntax.

As for the specific characters you proposed, * at the beginning of a line already means bullet; however, there is arguably a more accurate heuristic for this than the ones people are currently defending--basically, "does the rest of this paragraph contain an odd or even number of stars?" Italicized text is somewhat more difficult; sequences like foo/bar/baz aren't uncommon, and would be rendered unintelligible by a transformation to italics.
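One literal reading of the star heuristic: a star opens bold only if the rest of the paragraph holds an odd number of further stars (so it has a partner), and otherwise stays literal. This is a minimal sketch under assumptions: the paragraph is a single line, leading stars are taken as the bullet marker, and the function name and `<b>` output are hypothetical.

```python
def render_stars(paragraph: str) -> str:
    """Bold *...* spans using the 'odd number of stars remaining' rule."""
    body = paragraph.lstrip("*")
    prefix = paragraph[: len(paragraph) - len(body)]  # leading stars: bullet marker
    total = body.count("*")
    seen = 0
    in_bold = False
    out: list[str] = []
    for ch in body:
        if ch != "*":
            out.append(ch)
            continue
        seen += 1
        remaining = total - seen  # stars after this one in the paragraph
        if in_bold:
            out.append("</b>")
            in_bold = False
        elif remaining % 2 == 1:
            out.append("<b>")  # an odd number remain, so this star has a partner
            in_bold = True
        else:
            out.append("*")  # no partner under this rule: keep it literal
    return prefix + "".join(out)
```

With three stars, as in `5 * 3 = *15*`, the first star has an even number after it and stays literal, while the last two pair up. The rule cannot tell intent, though: `he said *ducks, runs and hides*` is bolded just as Thunderbird does.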

One thing worth looking at is Kwid, the markup language of Kwiki. In Kwid, the canonical forms of the inline formatting commands are {*bold*}, {/italics/}, {`code`} and so on, but when it's unambiguous (whitespace or punctuation before the leading character and after the trailing character), the curly brackets can be omitted: *bold*, /italics/, `code`. I don't really like a lot of things about Kwid, but this one might be worth stealing--*if* we're really contemplating a large syntax change.
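The Kwid rule described above can be approximated with a single regular expression: the braced form is accepted anywhere, and the bare form only where no word character touches the outside of the delimiters. This is a sketch, not Kwid's actual grammar; the extra requirement of no whitespace just inside the delimiters, the tag mapping and the function name are assumptions. Both forms share one pattern so that `re.sub` never rescans its own output (the slash in an emitted `</b>` must not be read as italics).

```python
import re

TAGS = {"*": "b", "/": "i", "`": "code"}

# Branch 1: canonical braced form {*...*}, {/.../}, {`...`}, allowed anywhere.
# Branch 2: bare form; assumed check for "unambiguous": no word character
# right before the opener or right after the closer, no space just inside.
INLINE = re.compile(
    r"\{([*/`])(.+?)\1\}"
    r"|(?<!\w)([*/`])(?=\S)(.+?)(?<=\S)\3(?!\w)"
)


def _wrap(m: re.Match) -> str:
    # Exactly one branch matched; pick its delimiter and body.
    if m.group(1) is not None:
        delim, body = m.group(1), m.group(2)
    else:
        delim, body = m.group(3), m.group(4)
    tag = TAGS[delim]
    return f"<{tag}>{body}</{tag}>"


def render_inline(text: str) -> str:
    """Render Kwid-style inline markup to HTML tags in one left-to-right pass."""
    return INLINE.sub(_wrap, text)
```

Under this rule `foo/bar/baz` is left untouched because each slash has a letter on both sides, while mid-word formatting still works through the braces, e.g. `x{/in/}word`.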

I had a copy of Meta's cur table from 6 April 2005 lying around, so I loaded it into MySQL for a little analysis:

SELECT COUNT(*) FROM cur WHERE NOT cur_is_redirect;
-> 12782
SELECT COUNT(*) FROM cur WHERE cur_text REGEXP " \*[^\n]*\*[ \n]";
-> 289
SELECT COUNT(*) FROM cur WHERE cur_text REGEXP " /[^\n]*/[ \n]";
-> 291

So these sequences certainly aren't unheard of in wikitext, but they're not terribly common. (I'd love to see somebody run this against a few Wikipedias, btw--as I said, I only used Meta because that's what I had handy.)
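For anyone wanting to repeat the count on another wiki without loading it into MySQL, the same two patterns can be applied to page texts in Python. This is a sketch under assumptions: obtaining the page texts (for example non-redirect pages read from a dump) is left to the caller, the function name is hypothetical, and it is assumed that Python's `re` treats these simple patterns the same way as MySQL's REGEXP.

```python
import re
from typing import Iterable

# The patterns from the queries above: a space, an opening delimiter,
# anything up to a closing delimiter on the same line, then a space or newline.
PATTERNS = {
    "stars": re.compile(r" \*[^\n]*\*[ \n]"),
    "slashes": re.compile(r" /[^\n]*/[ \n]"),
}


def count_matching_pages(texts: Iterable[str]) -> dict[str, int]:
    """Count pages, and how many pages contain at least one match per pattern."""
    counts = {"pages": 0, **{name: 0 for name in PATTERNS}}
    for text in texts:
        counts["pages"] += 1
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[name] += 1
    return counts
```

Like the SQL, this counts pages containing at least one match rather than total occurrences.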

-- Brent 'Dax' Royal-Gordon brent@brentdax.com Perl and Parrot hacker

Neil Barsema

3:59 a.m.

New subject: Proposal for deprecation of syntax features for 1.5

Brent 'Dax' Royal-Gordon wrote:

...

Fredrik Johansson fredrik.johansson@gmail.com wrote:

...
Why not the intuitive /italicized text/ and *bold text*?

I like this syntax.

...

As for the specific characters you proposed, * at the beginning of a line already means bullet; however, there is arguably a more accurate heuristic for this than the ones people are currently defending--basically, "does the rest of this paragraph contain an odd or even number of stars?" Italicized text is somewhat more difficult; sequences like foo/bar/baz aren't uncommon, and would be rendered unintelligible by a transformation to italics.

The funny thing is my Thunderbird mailclient has *no* trouble rendering all the *bold* /italicized/ and possible/troubling/combinations you mention in your mail.... ;-)

Neil

Timwi

3:51 p.m.

New subject: Proposal for deprecation of syntax features for 1.5

Neil Barsema wrote:

...

The funny thing is my Thunderbird mailclient has *no* trouble rendering all the *bold* /italicized/ and possible/troubling/combinations you mention in your mail.... ;-)

That's different. Firstly, it still displays the asterisks and the slashes, so you are not losing anything. Secondly, even when it does so where it "shouldn't", it is no biggie, but on Wikipedia it would be.

Just yesterday I wrote "*ducks, runs and hides*" somewhere in this newsgroup, and it bolded it even though that wasn't the real semantics of the asterisks. You forgive this because you know Thunderbird can't tell, but on Wikipedia that would be different.

Timwi

Ray Saintonge

11:27 a.m.

New subject: Proposal for deprecation of syntax features for 1.5

Fredrik Johansson wrote:

...

Why not the intuitive /italicized text/ and *bold text*?

What's intuitive about that?

Andrew Rodland

2:50 p.m.

New subject: Proposal for deprecation of syntax features for 1.5

I think it's not so much "intuitive" as "people have been using it for so long that it just seems natural". And // certainly conveys a notion of slantiness, while ** has a history of making things stand out. Not that I would ever support such a change, but it certainly makes more sense than '' and ''' :)

On 5/14/05, Ray Saintonge saintonge@telus.net wrote:

...

Fredrik Johansson wrote:

...
Why not the intuitive /italicized text/ and *bold text*?

What's intuitive about that?

Ec

Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l

wikitech-l@lists.wikimedia.org

30 comments

15 participants


participants (15)

Andrew Rodland
Brent 'Dax' Royal-Gordon
Daniel Wunsch
Delirium
Fredrik Johansson
Gerard Meijssen
Karl Eichwalder
Lee Daniel Crocker
MaPhi Werner
Neil Barsema
Patrick Collison
Ray Saintonge
Tels
Timwi
Ævar Arnfjörð Bjarmason