Hello, list members.
I have noticed that there is a Summer of Code project for the definition and formalization of the wiki markup language. But the current meta-wiki page states that this would be "mostly a documentation step and developer discipline issue."
Well, the Summer of Code rules clearly specify that a valid project can't be one of documentation but must be one of coding (and I assume "discipline issues", whatever that means, are also non-coding).
With that said, I am interested in such a project if it involves coding. To make this a valid Summer of Code project, I propose to write a wiki parser, for which I have already designed some draft rules in a yacc/bison style.
Who should I talk to? Maybe I can email some of my draft yacc/bison parser to the mentor?
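As a purely hypothetical illustration (not Pedro's actual draft rules): a scanner feeding such a yacc/bison grammar usually begins by classifying runs of apostrophes, since MediaWiki renders ''text'' as italic and '''text''' as bold. A minimal C sketch of such a classifier:

```c
#include <stddef.h>

/* Hypothetical sketch, not the actual draft grammar: classify a run of
 * apostrophes the way a scanner for MediaWiki quote markup might,
 * before handing tokens to a yacc/bison grammar. */
typedef enum { TOK_TEXT, TOK_ITALIC, TOK_BOLD, TOK_BOLD_ITALIC } QuoteTok;

static QuoteTok classify_quote_run(const char *p, size_t *len)
{
    size_t n = 0;
    while (p[n] == '\'')
        n++;
    *len = n;
    if (n >= 5) return TOK_BOLD_ITALIC; /* ''''' toggles bold and italic */
    if (n >= 3) return TOK_BOLD;        /* ''' (real MediaWiki treats n == 4 specially) */
    if (n == 2) return TOK_ITALIC;      /* '' */
    return TOK_TEXT;                    /* a lone apostrophe is plain text */
}
```

The token names and the treatment of four-apostrophe runs are simplifications; the point is only that this layer sits below the grammar proper.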
Cheers.
-- Pedro de Medeiros - Computer Science - University of Brasília Email: pedro.medeiros@gmail.com - Home Page: http://www.nonseq.net Linux User No.: 234250 - ICQ: 2878740 - Jabber: medeiros@jabber.org
Pedro de Medeiros wrote:
To make this a valid Summer of Code project, I propose to write a wiki parser, for which I have already designed some draft rules in a yacc/bison style.
Have you looked at the existing parser attempts in SVN (I don't remember if they're all still there)? Getting the first 90% of a real parser for MediaWiki syntax will take a small fraction of the time required to get a full parser. This makes it easy to create another almost-but-not-quite-finished parser by the end of the summer, and we'd be no better off for it.
I strongly recommend investigating the existing parser attempts, and finishing one of them.
Who should I talk to? Maybe I can email some of my draft yacc/bison parser to the mentor?
If Arne "Timwi" Heizmann was interested in mentoring someone for SoC, he'd likely be a good person to mentor this project.
On 4/24/06, Ivan Krstic krstic@fas.harvard.edu wrote:
Pedro de Medeiros wrote:
To make this a valid Summer of Code project, I propose to write a wiki parser, for which I have already designed some draft rules in a yacc/bison style.
Have you looked at the existing parser attempts in SVN (I don't remember if they're all still there)?
As a matter of fact, I did, some time ago. But the code was difficult to understand, it wouldn't compile, and running bison on the .y file reported lots (maybe 2000?) of grammar conflicts.
Getting the first 90% of a real parser for MediaWiki syntax will take a small fraction of the time required to get a full parser. This makes it easy to create another almost-but-not-quite-finished parser by the end of the summer, and we'd be no better off for it.
That also depends on what platform this parser needs to run on, and what it will be used for. For instance, a C/C++ parser would be a necessary step toward creating a PHP module for wiki parsing.
If a parser takes only a small fraction of the time, maybe I could also write the PHP module that uses the parser. :)
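A hedged sketch of the C-side API such a PHP module might wrap (the function name and the XML wrapping are invented for illustration; a real parser would emit a full document tree rather than just a root element):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical API surface for a PHP extension to wrap: one function
 * taking wikitext and returning a malloc'd XML string. The body below
 * only wraps the input in a root element -- a stand-in, not a parser. */
static char *wiki_to_xml(const char *src)
{
    const char *open = "<article>", *close = "</article>";
    size_t n = strlen(open) + strlen(src) + strlen(close) + 1;
    char *out = malloc(n);
    if (!out)
        return NULL;
    snprintf(out, n, "%s%s%s", open, src, close);
    return out; /* caller frees */
}
```

The single-function, string-in/string-out shape is the design point: it keeps the C parser free of any MediaWiki internals, which is what makes a thin PHP binding feasible.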
I strongly recommend investigating the existing parser attempts, and finishing one of them.
I have seen some attempts. But none of them is in C/C++, or even complete.
Who should I talk to? Maybe I can email some of my draft yacc/bison parser to the mentor?
If Arne "Timwi" Heizmann was interested in mentoring someone for SoC, he'd likely be a good person to mentor this project.
Thanks.
Cheers,
-- Pedro de Medeiros
Pedro de Medeiros wrote:
As a matter of fact, I did, some time ago. But the code was difficult to understand, it wouldn't compile, and running bison on the .y file reported lots (maybe 2000?) of grammar conflicts.
The latter part of this is surprising. Timwi, can you clarify? I seem to remember that the parser was incomplete, but what there was of it worked fine.
That also depends on what platform this parser needs to run on, and what it will be used for.
Not really. My point was that it's easy to build a parser for *a lot* of the MediaWiki syntax, and hard to build one for all of it. We need the latter.
If a parser takes only a small fraction of the time, maybe I could also write the PHP module that uses the parser. :)
You misunderstood me; see above.
I have seen some attempts. But none of them is in C/C++, or even complete.
Timwi's work used standard tools that would eventually produce a C parser, IIRC. Of course none of the parsers in SVN are complete -- if they were, we wouldn't be having this discussion.
On 4/24/06, Ivan Krstic krstic@fas.harvard.edu wrote:
Pedro de Medeiros wrote:
As a matter of fact, I did, some time ago. But the code was difficult to understand, it wouldn't compile, and running bison on the .y file reported lots (maybe 2000?) of grammar conflicts.
The latter part of this is surprising. Timwi, can you clarify? I seem to remember that the parser was incomplete, but what there was of it worked fine.
If I am not mistaken, the name of the directory is flexbisonparse. CVS tells me it has been dormant for several months. The code was probably confusing because wiki syntax is not really context-free, so there are a lot of considerations to make beforehand.
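To illustrate the context-sensitivity being described: whether '' opens italics can depend on text that appears later on the line, which a plain context-free rule cannot see. A simplified C sketch (a deliberate simplification of MediaWiki's real apostrophe handling, which is more involved):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: treat '' at position pos as an italics
 * opener only if a matching '' appears later on the same line. This
 * lookahead is exactly the kind of context a pure CFG rule lacks. */
static bool opens_italics(const char *line, size_t pos)
{
    if (strncmp(line + pos, "''", 2) != 0)
        return false;
    /* only markup if a closing pair follows on this line */
    return strstr(line + pos + 2, "''") != NULL;
}
```

This is why the .y file alone accumulates conflicts: the disambiguation has to happen in the lexer or in post-processing, not in the grammar rules themselves.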
I would also like to know from Timwi (or anyone else involved with it) what is planned for this parser in the future and how it will be used.
That also depends on what platform this parser needs to run on, and what it will be used for.
Not really. My point was that it's easy to build a parser for *a lot* of the MediaWiki syntax, and hard to build one for all of it. We need the latter.
Ok. But what I said about writing the wiki parser is not beside the point: I suppose a complete wiki parser is pretty much needed. Whether one finishes the incomplete parser or rewrites a new one based on it is a matter of approach. And I think the existing one needs some heavy lifting and, from what I saw of it, it won't be an easy task to complete.
Timwi's work used standard tools that would eventually produce a C parser, IIRC. Of course none of the parsers in SVN are complete -- if they were, we wouldn't be having this discussion.
Yes, I guess it is pretty much unmaintained nowadays. Most of it is more than two years old. I guess this serves as an indication that it needs heavy lifting. :)
Cheers,
-- Pedro de Medeiros
Pedro de Medeiros schrieb:
On 4/24/06, Ivan Krstic krstic@fas.harvard.edu wrote:
Pedro de Medeiros wrote:
As a matter of fact, I did, some time ago. But the code was difficult to understand, it wouldn't compile, and running bison on the .y file reported lots (maybe 2000?) of grammar conflicts.
The latter part of this is surprising. Timwi, can you clarify? I seem to remember that the parser was incomplete, but what there was of it worked fine.
If I am not mistaken, the name of the directory is flexbisonparse. CVS tells me it has been dormant for several months. The code was probably confusing because wiki syntax is not really context-free, so there are a lot of considerations to make beforehand.
I would also like to know from Timwi (or anyone who is involved with it) what is planned for this parser in the future and how it will be used.
That also depends on what platform this parser needs to run on, and what it will be used for.
Not really. My point was that it's easy to build a parser for *a lot* of the MediaWiki syntax, and hard to build one for all of it. We need the latter.
Ok. But what I said about writing the wiki parser is not beside the point: I suppose a complete wiki parser is pretty much needed. Whether one finishes the incomplete parser or rewrites a new one based on it is a matter of approach. And I think the existing one needs some heavy lifting and, from what I saw of it, it won't be an easy task to complete.
Timwi's work used standard tools that would eventually produce a C parser, IIRC. Of course none of the parsers in SVN are complete -- if they were, we wouldn't be having this discussion.
Yes, I guess it is pretty much unmaintained nowadays. Most of it is more than two years old. I guess this serves as an indication that it needs heavy lifting. :)
After Timwi stopped working on it, I had a look but couldn't make enough sense of the lexer to contribute anything useful.
The parser was actually quite developed at the time, with the notable exception of HTML-style tags; anything between two matching ones was not parsed at all. If someone could add that, it would be 90% complete (rough estimate).
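A minimal sketch of the missing piece Magnus describes: locating the span between a matching pair of HTML-style tags so the enclosed text can be fed back through the parser instead of being skipped. Tag matching, nesting, and attributes are all simplified for illustration:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the text between <tag> and
 * </tag> in src, setting *len to its length, or NULL if the pair is
 * absent. Nesting and tag attributes are deliberately ignored. */
static const char *inner_span(const char *src, const char *tag, size_t *len)
{
    char open[32], close[32];
    snprintf(open, sizeof open, "<%s>", tag);
    snprintf(close, sizeof close, "</%s>", tag);
    const char *s = strstr(src, open);
    if (!s)
        return NULL;
    s += strlen(open);
    const char *e = strstr(s, close);
    if (!e)
        return NULL;
    *len = (size_t)(e - s);
    return s;
}
```

In a real parser the recovered span would be re-tokenized recursively, which is the step Magnus suggests would bring the parser to roughly 90% completeness.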
Magnus
Ivan Krstic wrote:
Pedro de Medeiros wrote:
To make this a valid Summer of Code project, I propose to write a wiki parser, for which I have already designed some draft rules in a yacc/bison style.
Have you looked at the existing parser attempts in SVN (I don't remember if they're all still there)? Getting the first 90% of a real parser for MediaWiki syntax will take a small fraction of the time required to get a full parser. This makes it easy to create another almost-but-not-quite-finished parser by the end of the summer, and we'd be no better off for it.
I strongly recommend investigating the existing parser attempts, and finishing one of them.
<AOL>Me! Me!</AOL>
SVN module "wiki2xml", directory "php". Includes a mostly-working, reasonably fast converter (almost-parser) to XML, and several subsequent converters to XHTML, DocBook, OpenDocument, and plain text. Also includes a script to convert a Wikipedia dump to lots o' text files, which can then be browsed offline using the wiki-to-XML-to-(X)HTML converters. I'm currently working on plugging in the Lucene engine to add offline full-text search.
Magnus
On Mon, Apr 24, 2006 at 11:34:10AM -0400, Pedro de Medeiros wrote:
With that said, I am interested in such a project if it involves coding. To make this a valid Summer of Code project, I propose to write a wiki parser, for which I have already designed some draft rules in a yacc/bison style.
Check the list archive; Tim or Brion just pointed out in the last 48 hours that the prep work for this is sufficiently wide-ranging that it's unlikely to be the best choice of project (i.e., if it requires modifying things, it's a Major Project).
Cheers, -- jra
On 4/24/06, Jay R. Ashworth jra@baylink.com wrote:
On Mon, Apr 24, 2006 at 11:34:10AM -0400, Pedro de Medeiros wrote:
With that said, I am interested in such a project if it involves coding. To make this a valid Summer of Code project, I propose to write a wiki parser, for which I have already designed some draft rules in a yacc/bison style.
Check the list archive; Tim or Brion just pointed out in the last 48 hours that the prep work for this is sufficiently wide-ranging that it's unlikely to be the best choice of project (i.e., if it requires modifying things, it's a Major Project).
A solid, high speed, external parser is valuable even if not yet integrated into Mediawiki.
On 4/24/06, Jay R. Ashworth jra@baylink.com wrote:
On Mon, Apr 24, 2006 at 11:34:10AM -0400, Pedro de Medeiros wrote:
With that said, I am interested in such a project if it involves coding. To make this a valid Summer of Code project, I propose to write a wiki parser, for which I have already designed some draft rules in a yacc/bison style.
Check the list archive; Tim or Brion just pointed out in the last 48 hours that the prep work for this is sufficiently wide-ranging that it's unlikely to be the best choice of project (i.e., if it requires modifying things, it's a Major Project).
Well, it was more like 72 hours ago. But that revolves around a different project: the MediaWiki PHP parser itself.
But a fast C/C++ parser is not part of MediaWiki and is a completely different matter.
Cheers,
-- Pedro de Medeiros
A solid first step would be defining and formalizing, and then proposing alternatives once the problems are known. That several folks have attempted something in the past is a clear indication that this needs to be done.
The Major part isn't the actual coding, it's reworking the old files for any changes. But it's hard to get buy-in for the Major work until the defining and formalizing has been done.
I don't know whether the final product should be C or PHP or both, and won't know until problems have been identified and buy-in has been accomplished.
I'm in favor of this project.
On Mon, 24 Apr 2006, William Allen Simpson wrote:
The Major part isn't the actual coding, it's reworking the old files for any changes. But it's hard to get buy-in for the Major work until the defining and formalizing has been done.
The fact that the first 90% seems to slide smoothly, and the other 90% is bumpy, also seems to cry out for formalisation.
Once that is done, the 'other' 90% could perhaps be altered to make it look more like 10%.
In other words, if it is formalised, we might have a good shot at simplifying it.
Cheers, Andy!
At 25.04.2006, Andy Rabagliati wrote:
On Mon, 24 Apr 2006, William Allen Simpson wrote:
The Major part isn't the actual coding, it's reworking the old files for any changes. But it's hard to get buy-in for the Major work until the defining and formalizing has been done.
The fact that the first 90% seems to slide smoothly, and the other 90% is bumpy, also seems to cry out for formalisation.
Once that is done, the 'other' 90% could perhaps be altered to make it look more like 10%.
In other words, if it is formalised, we might have a good shot at simplifying it.
Isn't it the other way round? First you cut features, then you formalize, and then you extend?
As said above, the first 80% is usually easy, and the last 20% will cost you all your hair.
Dirk
On Mon, Apr 24, 2006 at 02:07:42PM -0400, Pedro de Medeiros wrote:
On 4/24/06, Jay R. Ashworth jra@baylink.com wrote:
On Mon, Apr 24, 2006 at 11:34:10AM -0400, Pedro de Medeiros wrote:
With that said, I am interested in such a project if it involves coding. To make this a valid Summer of Code project, I propose to write a wiki parser, for which I have already designed some draft rules in a yacc/bison style.
Check the list archive; Tim or Brion just pointed out in the last 48 hours that the prep work for this is sufficiently wide-ranging that it's unlikely to be the best choice of project (i.e., if it requires modifying things, it's a Major Project).
Well, it was more like 72 hours ago. But that revolves around a different project: the MediaWiki PHP parser itself.
But a fast C/C++ parser is not part of MediaWiki and is a completely different matter.
As the other reply before yours notes: clearly, I missed the import of the OP's query.
Cheers, -- jra
Pedro de Medeiros wrote:
Hello, list members.
I have noticed that there is a Summer of Code project for the definition and formalization of the wiki markup language. But the current meta-wiki page states that this would be "mostly a documentation step and developer discipline issue."
Well, the Summer of Code rules clearly specify that a valid project can't be one of documentation but must be one of coding (and I assume "discipline issues", whatever that means, are also non-coding).
This wouldn't be a suitable SoC project, as it's a major core change with far-ranging consequences and thus requires lots of compatibility work etc.
-- brion vibber (brion @ pobox.com)
I realize you are talking about MediaWiki markup only, but I would like to encourage you to also watch what other wiki engines are doing. Wiki engine implementers will come together at
WikiSym 2006 (www.wikisym.org/ws2006)
and I expect a standardized wiki markup to be an important topic (I hope we will have a workshop on it). In any case, if you proceed with this work, I would like to encourage you to keep the wiki standards mailing list informed; see
http://www.wikisym.org/cgi-bin/mailman/listinfo/
We have plenty of engine implementers there fighting related problems.
Dirk
Dirk Riehle, ph: +49 172 184 8755, web: http://www.riehle.org Interested in wiki research? Please see http://www.wikisym.org !
At 24.04.2006, Pedro de Medeiros wrote:
Hello, list members.
I have noticed that there is a Summer of Code project for the definition and formalization of the wiki markup language. But the current meta-wiki page states that this would be "mostly a documentation step and developer discipline issue."
Well, the Summer of Code rules clearly specify that a valid project can't be one of documentation but must be one of coding (and I assume "discipline issues", whatever that means, are also non-coding).
With that said, I am interested in such a project if it involves coding. To make this a valid Summer of Code project, I propose to write a wiki parser, for which I have already designed some draft rules in a yacc/bison style.
Who should I talk to? Maybe I can email some of my draft yacc/bison parser to the mentor?
Cheers.
-- Pedro de Medeiros
On Tue, Apr 25, 2006 at 11:49:27AM +0200, Dirk Riehle wrote:
and I expect a standardized wiki markup to be an important topic (I hope we will have a workshop on it).
I wish you hadn't told me that, Dirk.
My one fond wish, which I've come to understand will never happen with MWtext, is to get *bold* and _italics_ as markup tokens, noting that that's how they've been used on Usenet and in email for almost 2x10E-6 centuries.
Now I have hope again.
You bastard.
Cheers, -- jr 'nicest possible way' a
Moin,
On Tuesday 25 April 2006 16:03, Jay R. Ashworth wrote:
On Tue, Apr 25, 2006 at 11:49:27AM +0200, Dirk Riehle wrote:
and I expect a standardized wiki markup to be an important topic (I hope we will have a workshop on it).
I wish you hadn't told me that, Dirk.
My one fond wish, which I've come to understand will never happen with MWtext, is to get *bold* and _italics_ as markup tokens, noting that
Wouldn't that be *bold*, _underlined_ and /italics/? :-P
Best wishes,
Tels
On Tue, Apr 25, 2006 at 07:21:37PM +0200, Tels wrote:
On Tuesday 25 April 2006 16:03, Jay R. Ashworth wrote:
On Tue, Apr 25, 2006 at 11:49:27AM +0200, Dirk Riehle wrote:
and I expect a standardized wiki markup to be an important topic (I hope we will have a workshop on it).
I wish you hadn't told me that, Dirk.
My one fond wish, which I've come to understand will never happen with MWtext, is to get *bold* and _italics_ as markup tokens, noting that
Wouldn't that be *bold*, _underlined_ and /italics/? :-P
:-)
In general, no. Handwritten underlines are customarily rendered in print as italics, hence that mapping. You're right; the extra flexibility would be nice, but it doesn't match what I most commonly see people do, and that was my goal. It makes no sense to change markup just for fun; mine was an Installed Base attempt to conform to the Principle of Least Astonishment.
Cheers, -- jra
Jay R. Ashworth wrote:
On Tue, Apr 25, 2006 at 07:21:37PM +0200, Tels wrote:
Wouldn't that be *bold*, _underlined_ and /italics/? :-P
In general, no. Handwritten underlines are customarily rendered in print as italics, hence that mapping. You're right; the extra flexibility would be nice, but it doesn't match what I most commonly see people do, and that was my goal. It makes no sense to change markup just for fun; mine was an Installed Base attempt to conform to the Principle of Least Astonishment.
Actually, Jay, you must not go back far enough on Usenet, because those actually *WERE* the old markup. Underscores are _underlines_ and slashes are /italics/ (probably because of the association with slant).
Much to my surprise, I see that Thunderbird displays them! So, we have both Installed Base and Least Astonishment.
Admittedly, I also like <parameter>, but that got hijacked by some newfangled thing or other. Apparently, somebody thought the standard [n|t]roff markups weren't good enough?
However, a bunch of us stopped using Usenet somewhere around 1988, when it became unusable with too much useless traffic. (I hear it's gotten worse.) Mailing lists are far better!
I'd be in favor of bringing the usenet markups here, and reverting '' to "<em>" and ''' to "<strong>", as I seem to remember from not-so-long-ago around wikidom.
But first, even better to get a firm grasp on the existing syntax....
On Tue, Apr 25, 2006 at 07:27:46PM -0400, William Allen Simpson wrote:
Jay R. Ashworth wrote:
On Tue, Apr 25, 2006 at 07:21:37PM +0200, Tels wrote:
Wouldn't that be *bold*, _underlined_ and /italics/? :-P
In general, no. Handwritten underlines are customarily rendered in print as italics, hence that mapping. You're right; the extra flexibility would be nice, but it doesn't match what I most commonly see people do, and that was my goal. It makes no sense to change markup just for fun; mine was an Installed Base attempt to conform to the Principle of Least Astonishment.
Actually, Jay, you must not go back far enough on Usenet, because those actually *WERE* the old markup. Underscores are _underlines_ and slashes are /italics/ (probably because of the association with slant).
1983, though DejaGoo doesn't seem to be able to find any of my postings earlier than Dec '85. People who used /this marking/ were, IME, substantially in the minority.
Much to my surprise, I see that Thunderbird displays them! So, we have both Installed Base and Least Astonishment.
Does it, really? Cool!
Admittedly, I also like <parameter>, but that got hijacked by some newfangled thing or other. Apparently, somebody thought the standard [n|t]roff markups weren't good enough?
I dunno.
However, a bunch of us stopped using Usenet somewhere around 1988, when it became unusable with too much useless traffic. (I hear it's gotten worse.) Mailing lists are far better!
No, it's not. No, they're not.
If you think Usenet is useless, you're a) not using slrn or haven't figured out how to score, or b) hanging out in the wrong newsgroups.
I'd be in favor of bringing the usenet markups here, and reverting '' to "<em>" and ''' to "<strong>", as I seem to remember from not-so-long-ago around wikidom.
But first, even better to get a firm grasp on the existing syntax....
You missed the part where I said that I don't ever expect MWtext to change, right? Don't go getting *me* in trouble, here... :-)
Cheers, -- jra
William Allen Simpson wrote:
Actually, Jay, you must not go back far enough on Usenet, because those actually *WERE* the old markup. Underscores are _underlines_ and slashes are /italics/ (probably because of the association with slant).
Either way, I think this was discussed within the Wikipedia community in 2001, when the project was less than a year old, and the conclusion was that it could well have been the right thing to do, but it was too late to change because there were already more than 10,000 articles, or something like that.
Nothing really stops you from downloading the latest dump, converting it to your own favorite format, and setting up your own fork. But I don't think any discussion will lead wikipedia.org to change.
You could even start alt.dumps.wikipedia, where you post an article (converted to Usenet markup) each hour for the next 125 years.
On Wed, Apr 26, 2006 at 02:56:09AM +0200, Lars Aronsson wrote:
You could even start alt.dumps.wikipedia, where you post an article (converted to Usenet markup) each hour for the next 125 years.
Well, the straw man is delightful, Lars, really, but I *had* already made clear that I realized the issue was academic in both senses of the word.
Cheers, -- jra
Moin,
On Wednesday 26 April 2006 01:27, William Allen Simpson wrote:
Jay R. Ashworth wrote:
On Tue, Apr 25, 2006 at 07:21:37PM +0200, Tels wrote:
Wouldn't that be *bold*, _underlined_ and /italics/? :-P
In general, no. Handwritten underlines are customarily rendered in print as italics, hence that mapping. You're right; the extra flexibility would be nice, but it doesn't match what I most commonly see people do, and that was my goal. It makes no sense to change markup just for fun; mine was an Installed Base attempt to conform to the Principle of Least Astonishment.
Actually, Jay, you must not go back far enough on Usenet, because those actually *WERE* the old markup. Underscores are _underlines_ and slashes are /italics/ (probably because of the association with slant).
That's what I remember, too, but with so many new people rushing onto the net and everybody trying to do something clever, the original markups got lost.
Much to my surprise, I see that Thunderbird displays them! So, we have both Installed Base and Least Astonishment.
Wow, didn't know that :)
Best wishes,
Tels
On Tue, Apr 25, 2006 at 07:21:37PM +0200, Tels wrote:
Moin,
On Tuesday 25 April 2006 16:03, Jay R. Ashworth wrote:
My one fond wish, which I've come to understand will never happen with MWtext, is to get *bold* and _italics_ as markup tokens, noting that
Wouldn't that be *bold*, _underlined_ and /italics/? :-P
I'd rather not. /italics/ always looks like a matching regex to me. It's confusing.
On Tue, Apr 25, 2006 at 02:13:59PM -0600, Chad Perrin wrote:
My one fond wish, which I've come to understand will never happen with MWtext, is to get *bold* and _italics_ as markup tokens, noting that
Wouldn't that be *bold*, _underlined_ and /italics/? :-P
I'd rather not. /italics/ always looks like a matching regex to me. It's confusing.
I knew there was a *good* reason (y'know; in addition to "yick!" :-).
Hey, chaper. How you been?
Cheers, -- jra
On Tue, Apr 25, 2006 at 04:43:35PM -0400, Jay R. Ashworth wrote:
On Tue, Apr 25, 2006 at 02:13:59PM -0600, Chad Perrin wrote:
I'd rather not. /italics/ always looks like a matching regex to me. It's confusing.
I knew there was a *good* reason (y'know; in addition to "yick!" :-).
Hey, chaper. How you been?
Busy! Among other things, I've been doing stuff like this:
http://techrepublic.com.com/5100-10877-6064734.html
Thanks for noticing my absence.
Chad Perrin wrote:
On Tue, Apr 25, 2006 at 07:21:37PM +0200, Tels wrote:
Moin,
On Tuesday 25 April 2006 16:03, Jay R. Ashworth wrote:
My one fond wish, which I've come to understand will never happen with MWtext, is to get *bold* and _italics_ as markup tokens, noting that
Wouldn't that be *bold*, _underlined_ and /italics/? :-P
I'd rather not. /italics/ always looks like a matching regex to me. It's confusing.
That's, like, *so* totally a reason not to have it. Like, you know, Wiki syntax is so totally aimed at people who use regular expressions.
Timwi
On 4/25/06, Timwi timwi@gmx.net wrote:
That's, like, *so* totally a reason not to have it. Like, you know, Wiki syntax is so totally aimed at people who use regular expressions.
As long as we're talking about radical changes to millions of pages and tens of software packages, I like Markdown and Restructured Text.
I've heard of markdown, never used it though - what is restructured text?
On Apr 25, 2006, at 1:56 PM, Jeremy Dunck wrote:
On 4/25/06, Timwi timwi@gmx.net wrote:
That's, like, *so* totally a reason not to have it. Like, you know, Wiki syntax is so totally aimed at people who use regular expressions.
As long as we're talking about radical changes to millions of pages and tens of software packages, I like Markdown and Restructured Text.
On 4/25/06, Elliott F. Cable ecable@avxw.com wrote:
I've heard of markdown, never used it though - what is restructured text?
It was sort of tongue-in-cheek, but: http://en.wikipedia.org/wiki/ReStructured_Text
On Tue, Apr 25, 2006 at 10:47:32PM +0100, Timwi wrote:
Chad Perrin wrote:
On Tue, Apr 25, 2006 at 07:21:37PM +0200, Tels wrote:
Moin,
On Tuesday 25 April 2006 16:03, Jay R. Ashworth wrote:
My one fond wish, which I've come to understand will never happen with MWtext, is to get *bold* and _italics_ as markup tokens, noting that
Wouldn't that be *bold*, _underlined_ and /italics/? :-P
I'd rather not. /italics/ always looks like a matching regex to me. It's confusing.
That's, like, *so* totally a reason not to have it. Like, you know, Wiki syntax is so totally aimed at people who use regular expressions.
My impression is that it's aimed at "everybody" -- which includes me.
On Wed, Apr 26, 2006 at 05:55:17PM +0100, Timwi wrote:
Chad Perrin wrote:
My impression is that it's aimed at "everybody" -- which includes me.
It is definitely not aimed at _pleasing_ everybody, especially not minorities like programmers.
Hopefully it's not aimed at disenfranchising "minorities", either -- especially since minorities like programmers aren't as much of a minority at Wikipedia as they are at, say, the Coopersmith's bar downtown.
Chad Perrin wrote:
On Wed, Apr 26, 2006 at 05:55:17PM +0100, Timwi wrote:
Chad Perrin wrote:
My impression is that it's aimed at "everybody" -- which includes me.
It is definitely not aimed at _pleasing_ everybody, especially not minorities like programmers.
Hopefully it's not aimed at disenfranchising "minorities", either -- especially since minorities like programmers aren't as much of a minority at Wikipedia as they are at, say, the Coopersmith's bar downtown.
Yeah, so let's design MediaWiki so it's useful only to the programmers, since no-one else would be helpful in writing the encyclopedia anyway. Especially since programmers know everything.
Timwi
On Thu, Apr 27, 2006 at 01:47:02PM +0100, Timwi wrote:
Chad Perrin wrote:
On Wed, Apr 26, 2006 at 05:55:17PM +0100, Timwi wrote:
Chad Perrin wrote:
My impression is that it's aimed at "everybody" -- which includes me.
It is definitely not aimed at _pleasing_ everybody, especially not minorities like programmers.
Hopefully it's not aimed at disenfranchising "minorities", either -- especially since minorities like programmers aren't as much of a minority at Wikipedia as they are at, say, the Coopersmith's bar downtown.
Yeah, so let's design MediaWiki so it's useful only to the programmers, since no-one else would be helpful in writing the encyclopedia anyway. Especially since programmers know everything.
This thread has gotten so dense that I can no longer discern what the topic is, much less who's on which side. If anyone sees fit to continue it, I suggest they start by enumerating those two things.
Cheers, -- jra
This thread has gotten so dense that I can no longer discern what the topic is, much less who's on which side. If anyone sees fit to continue it, I suggest they start by enumerating those two things.
Not sure what you mean by 'dense'. My "side" of the thread is just that I get agitated at illogical/fallacious arguments like "let's not use slashes for italics -- they look like regular expressions." That's all.
Timwi
On 4/27/06, Timwi timwi@gmx.net wrote:
This thread has gotten so dense that I can no longer discern what the topic is, much less who's on which side. If anyone sees fit to continue it, I suggest they start by enumerating those two things.
Not sure what you mean by 'dense'. My "side" of the thread is just that I get agitated at illogical/fallacious arguments like "let's not use slashes for italics -- they look like regular expressions." That's all.
According to the principle of least astonishment, if someone types something between /slashes/ in a wiki document, he doesn't expect it to turn into a regular expression that does some unobvious pattern matching when "commit" is pressed. So I agree with you.
Cheers.
-- Pedro de Medeiros
On Thu, Apr 27, 2006 at 02:30:09PM -0400, Pedro de Medeiros wrote:
On 4/27/06, Timwi timwi@gmx.net wrote:
This thread has gotten so dense that I can no longer discern what the topic is, much less who's on which side. If anyone sees fit to continue it, I suggest they start by enumerating those two things.
Not sure what you mean by 'dense'. My "side" of the thread is just that I get agitated at illogical/fallacious arguments like "let's not use slashes for italics -- they look like regular expressions." That's all.
According to the principle of least astonishment, if someone types something between /slashes/ in a wiki document, he doesn't expect it to turn into regular expressions to do some unobvious pattern matching when "commit" is pressed. So I agree with you.
See my other message: the actual issue is when people *type an example of a regular expression into a wikipage*. While that may not happen much on Wikipedia, remember that Not All Mediawikiae Are Wikipedia, a rule that's pertinent when discussing this category of topic.
Cheers, -- jra
To pitch in: many, many MediaWiki-run sites definitely host content where making regexes impossible to type would be a bad idea.
Also, first thing that came to mind for ME was URLs - most URLs have /something/like/this.ext in them - it would take tons of extra code to make the rule not apply within URLs...
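To make the collision concrete, here is a minimal sketch (Python, using a hypothetical slash-italics rule that is NOT part of MediaWiki syntax) of how a naive implementation would mangle an ordinary URL:

```python
import re

# Hypothetical rule (not real MediaWiki syntax): text between single
# slashes becomes italics. A naive regex-based implementation:
ITALICS = re.compile(r"/([^/\n]+)/")

def naive_render(wikitext):
    return ITALICS.sub(r"<i>\1</i>", wikitext)

# The intended case works:
print(naive_render("this is /emphasised/ text"))
# ...but any URL (or regex example) on the page gets chewed up:
print(naive_render("see http://example.org/some/path/file.html"))
```

Exempting URLs would mean teaching the tokenizer to recognise every URL form before applying the italics rule - exactly the "tons of extra code" referred to above.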
On a sidetopic, I'm having trouble submitting things to this list unless I'm replying to something like this - I've tried to send at least 15 notes to this list, and they won't go that I can tell, unless people have completely ignored all of them so far...
On Apr 27, 2006, at 12:18 PM, Jay R. Ashworth wrote:
On Thu, Apr 27, 2006 at 02:30:09PM -0400, Pedro de Medeiros wrote:
On 4/27/06, Timwi timwi@gmx.net wrote:
This thread has gotten so dense that I can no longer discern what the topic is, much less who's on which side. If anyone sees fit to continue it, I suggest they start by enumerating those two things.
Not sure what you mean by 'dense'. My "side" of the thread is just that I get agitated at illogical/fallacious arguments like "let's not use slashes for italics -- they look like regular expressions." That's all.
According to the principle of least astonishment, if someone types something between /slashes/ in a wiki document, he doesn't expect it to turn into regular expressions to do some unobvious pattern matching when "commit" is pressed. So I agree with you.
See my other message: the actual issue is when people *type an example of a regular expression into a wikipage*. While that may not happen much on Wikipedia, remember that Not All Mediawikiae Are Wikipedia, a rule that's pertinent when discussing this category of topic.
Cheers,
-- jra
Jay R. Ashworth jra@baylink.com Designer Baylink RFC 2100 Ashworth & Associates The Things I Think '87 e24 St Petersburg FL USA http://baylink.pitas.com +1 727 647 1274
A: Because it messes up the order in which people normally
read text. Q: Why is top-posting such a bad thing?
A: Top-posting. Q: What is the most annoying thing on Usenet and in e-mail?
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
On Thu, Apr 27, 2006 at 12:46:53PM -0800, Elliott F. Cable wrote:
Also, first thing that came to mind for ME was URLs - most URLs have /something/like/this.ext in them - it would take tons of extra code to make the rule not apply within URLs...
Excellent point -- or, an excellent addendum to my point; however one wants to look at these things. :-)
On a sidetopic, I'm having trouble submitting things to this list unless I'm replying to something like this - I've tried to send at least 15 notes to this list, and they won't go that I can tell, unless people have completely ignored all of them so far...
I do not recall having seen any threads from you; are you getting bounce messages?
Cheers, -- jra
On Apr 27, 2006, at 1:28 PM, Jay R. Ashworth wrote:
On Thu, Apr 27, 2006 at 12:46:53PM -0800, Elliott F. Cable wrote:
Also, first thing that came to mind for ME was URLs - most URLs have /something/like/this.ext in them - it would take tons of extra code to make the rule not apply within URLs...
Excellent point -- or, an excellent addendum to my point; however one wants to look at these things. :-)
On a sidetopic, I'm having trouble submitting things to this list unless I'm replying to something like this - I've tried to send at least 15 notes to this list, and they won't go that I can tell, unless people have completely ignored all of them so far...
I do not recall having seen any threads from you; are you getting bounce messages?
Ok, I changed the subject of this, let's see if that works - no bounce messages; and I always CC everything I send to myself, so I have 4 copies of every e-mail I send - draft, sent mail, on my server, and sent back to me - and I got the 4th back, so I know it was sent to the mailing list server. No bounce messages, nothing - just doesn't work... And they never show up in the mailing list folder that all my mediawiki-l and wikitech-l messages are moved to automagically...
It seems to be related to your email client. You probably have a way to see the headers of your emails; there you can check where the message has been, from or to the mailing list.
You may have to check your filters too.
To remove multiple copies of email, you could use a client like Gmail, which handles de-duplication pretty well.
To be published immediately on the list, you should also use the registered email address - the address at which you receive the list messages (it is probably already the case, but...).
Plyd
On 4/27/06, Elliott F. Cable ecable@avxw.com wrote:
On Apr 27, 2006, at 1:28 PM, Jay R. Ashworth wrote:
On Thu, Apr 27, 2006 at 12:46:53PM -0800, Elliott F. Cable wrote:
Also, first thing that came to mind for ME was URLs - most URLs have /something/like/this.ext in them - it would take tons of extra code to make the rule not apply within URLs...
Excellent point -- or, an excellent addendum to my point; however one wants to look at these things. :-)
On a sidetopic, I'm having trouble submitting things to this list unless I'm replying to something like this - I've tried to send at least 15 notes to this list, and they won't go that I can tell, unless people have completely ignored all of them so far...
I do not recall having seen any threads from you; are you getting bounce messages?
Ok, I changed the subject of this, let's see if that works - no bounce messages; and I always CC everything I send to myself, so I have 4 copies of every e-mail I send - draft, sent mail, on my server, and sent back to me - and I got the 4th back, so I know it was sent to the mailing list server. No bounce messages, nothing - just doesn't work... And they never show up in the mailing list folder that all my mediawiki-l and wikitech-l messages are moved to automagically...
On Apr 27, 2006, at 11:34 PM, Plyd wrote:
It seems to be related to your email client. You probably have a way to see the headers of your emails; there you can check where the message has been, from or to the mailing list.
You may have to check your filters too.
To remove multiple copies of email, you could use a client like Gmail, which handles de-duplication pretty well.
To be published immediately on the list, you should also use the registered email address - the address at which you receive the list messages (it is probably already the case, but...).
Plyd
On 4/27/06, Elliott F. Cable ecable@avxw.com wrote:
On Apr 27, 2006, at 1:28 PM, Jay R. Ashworth wrote:
On Thu, Apr 27, 2006 at 12:46:53PM -0800, Elliott F. Cable wrote:
Also, first thing that came to mind for ME was URLs - most URLs have /something/like/this.ext in them - it would take tons of extra code to make the rule not apply within URLs...
Excellent point -- or, an excellent addendum to my point; however one wants to look at these things. :-)
On a sidetopic, I'm having trouble submitting things to this list unless I'm replying to something like this - I've tried to send at least 15 notes to this list, and they won't go that I can tell, unless people have completely ignored all of them so far...
I do not recall having seen any threads from you; are you getting bounce messages?
Ok, I changed the subject of this, let's see if that works - no bounce messages; and I always CC everything I send to myself, so I have 4 copies of every e-mail I send - draft, sent mail, on my server, and sent back to me - and I got the 4th back, so I know it was sent to the mailing list server. No bounce messages, nothing - just doesn't work... And they never show up in the mailing list folder that all my mediawiki-l and wikitech-l messages are moved to automagically...
Yes, Brion already helped me with my first problem, I was submitting from ecable@avxw.com while having signed up under ecable@avxworkshop.com, so I fixed that; but I still can't submit new mail to the list... ~~/
On 4/27/06, Jay R. Ashworth jra@baylink.com wrote:
On Thu, Apr 27, 2006 at 02:30:09PM -0400, Pedro de Medeiros wrote:
On 4/27/06, Timwi timwi@gmx.net wrote:
This thread has gotten so dense that I can no longer discern what the topic is, much less who's on which side. If anyone sees fit to continue it, I suggest they start by enumerating those two things.
Not sure what you mean by 'dense'. My "side" of the thread is just that I get agitated at illogical/fallacious arguments like "let's not use slashes for italics -- they look like regular expressions." That's all.
According to the principle of least astonishment, if someone types something between /slashes/ in a wiki document, he doesn't expect it to turn into regular expressions to do some unobvious pattern matching when "commit" is pressed. So I agree with you.
See my other message: the actual issue is when people *type an example of a regular expression into a wikipage*. While that may not happen much on Wikipedia, remember that Not All Mediawikiae Are Wikipedia, a rule that's pertinent when discussing this category of topic.
It is then an exception, not a rule. Suppose you have a programming language X that uses some of the same markup the wiki uses, and you want to list some example source code in that language: should you change the wiki markup to avoid the conflict, or just quote the source code? I guess the latter - so why should things be different for regular expressions?
There might be good reasons for not using /slashes/, I just think this was not one of them. :)
Cheers. -- Pedro de Medeiros
On Thu, Apr 27, 2006 at 05:08:30PM -0400, Pedro de Medeiros wrote:
See my other message: the actual issue is when people *type an example of a regular expression into a wikipage*. While that may not happen much on Wikipedia, remember that Not All Mediawikiae Are Wikipedia, a rule that's pertinent when discussing this category of topic.
It is then an exception, not a rule. Suppose you have a programming language X that uses some of the same markup the wiki uses, and you want to list some example source code in that language: should you change the wiki markup to avoid the conflict, or just quote the source code? I guess the latter - so why should things be different for regular expressions?
Well, as the other poster notes: *URL's*. The issue is twofold: which item is more newly defined, and which one is more common.
There might be good reasons for not using /slashes/, I just think this was not one of them. :)
URLs are definitely a better example, but as Mr Cable notes, there are *lots* of programming wikis; overloading /RE/ is about as bad as overloading /U/R/L.
And since we *can* avoid it, we pretty much *must* avoid it.
Cheers, -- jra
If I may, I'd like to add something older than either one - emphasis. I personally associate *this* with emphasis, and I associate bold text with emphasis... but that doesn't mean I want *this* and '''this''' to be combined. I also happen to associate ALL CAPS TEXT WITH EMPHASIS - BUT I DON'T WANT MEDIAWIKI TO AUTOMATICALLY CONVERT ALL CAPS TO BOLD, DO I? No, I don't. I think *this* sort of emphasis is a fundamentally different sort of subconscious emphasis than '''this''' (where I expect that to mean bold to you), unrelated to any sort of conflicts. Also, '''this''' is something I never encountered before the MediaWiki software, yet it somehow made sense that ''this'' is emphasized, '''this''' is more so, and finally '''''this''''' is most so. Whereas /this/ doesn't really make sense...
Another related topic to bring up - the whole idea behind CSS applies here, too. <em> and <i> do the same thing, right? Wrong. <i> is a formatting control, which I believe technically shouldn't even be in HTML - <em> is a content control, saying 'this text is important'. This is also important for alternative browsing - audible, or any sort of non-visual or non-textual browsing, where <i> means absolutely nothing, but <em> means emphasized. In the same way, I understand fundamentally that ''this'' or '''this''' or '''''this''''' are different levels of in-line emphasized text - not headings like ==this==, but DIFFERENT from the text around them. In other words, emphasized. However, /this/ and *this* are mere formatting controls - exactly what the entire web standards movement is fighting against. Yes, /this/ sort of logically slants text - but that is all. It merely slants text. Unless you directly associate slanted text with emphasis, /this/ IN AND OF ITSELF doesn't emphasize the text, and neither does *this*! Just another thought to consider before making the leap to that sort of formatting...
On Apr 27, 2006, at 1:30 PM, Jay R. Ashworth wrote:
On Thu, Apr 27, 2006 at 05:08:30PM -0400, Pedro de Medeiros wrote:
See my other message: the actual issue is when people *type an example of a regular expression into a wikipage*. While that may not happen much on Wikipedia, remember that Not All Mediawikiae Are Wikipedia, a rule that's pertinent when discussing this category of topic.
It is then an exception, not a rule. Suppose you have a programming language X that uses some of the same markup the wiki uses, and you want to list some example source code in that language: should you change the wiki markup to avoid the conflict, or just quote the source code? I guess the latter - so why should things be different for regular expressions?
Well, as the other poster notes: *URL's*. The issue is twofold: which item is more newly defined, and which one is more common.
There might be good reasons for not using /slashes/, I just think this was not one of them. :)
URLs are definitely a better example, but as Mr Cable notes, there are *lots* of programming wikis; overloading /RE/ is about as bad as overloading /U/R/L.
And since we *can* avoid it, we pretty much *must* avoid it.
Cheers,
-- jra
On 4/27/06, Jay R. Ashworth jra@baylink.com wrote:
URLs are definitely a better example, but as Mr Cable notes, there are *lots* of programming wikis;
Any wiki that displays code should properly quote it. Have a look at any source code example on Wikipedia. It is good practice.
overloading /RE/ is about as bad as overloading /U/R/L.
Not really. /U/R/L has a meaning in wiki markup, /RE/ doesn't. It is just plain text. So you can't say they compare that much.
Besides, "overloading /RE/" is not correct, because /RE/ is not processed by wiki markup (as stated, /RE/ is just plain text). On the other hand, "Overloading /U/R/L" is more correct.
Cheers, -- Pedro de Medeiros
On Apr 27, 2006, at 2:46 PM, Pedro de Medeiros wrote:
On 4/27/06, Jay R. Ashworth jra@baylink.com wrote:
URLs are definitely a better example, but as Mr Cable notes, there are *lots* of programming wikis;
Any wiki that displays code should properly quote it. Have a look at any source code example on Wikipedia. It is good practice.
overloading /RE/ is about as bad as overloading /U/R/L.
Not really. /U/R/L has a meaning in wiki markup, /RE/ doesn't. It is just plain text. So you can't say they compare that much.
Besides, "overloading /RE/" is not correct, because /RE/ is not processed by wiki markup (as stated, /RE/ is just plain text). On the other hand, "Overloading /U/R/L" is more correct.
In any case, as he said, this won't be changing any time soon, so let's leave it at this, shall we? There are arguments both for and against this type of /italic/ *bold* and _underlined_ syntax, we've heard them all now, and they won't affect anything at all until a decision is made to totally re-define the syntax anyway; so let's let this go and move on.
On Thu, Apr 27, 2006 at 06:53:13PM +0100, Timwi wrote:
This thread has gotten so dense that I can no longer discern what the topic is, much less who's on which side. If anyone sees fit to continue it, I suggest they start by enumerating those two things.
Not sure what you mean by 'dense'. My "side" of the thread is just that I get agitated at illogical/fallacious arguments like "let's not use slashes for italics -- they look like regular expressions." That's all.
I don't believe it's a fallacious argument. Let me cast it for you slightly differently:
Choices for inline text markup coding should be made so as to collide with the least possible number of already extant uses of that set of punctuation.
That's why [[ ]], '' '', and ''' ''' are pretty good choices, while the leading space for indentation is slightly less so.
While *bold* would be contextual, since <BOL>* is already in use for list items, _italics_ would not, and doesn't collide with anything.
/italics/, though, would, probably unexpectedly, collide with the writing of Regexes, and there's no good way to disambiguate from context (as there is with *bold face marking*).
Cheers, -- jra
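Jay's contextual-disambiguation point can be sketched in a few lines of Python (a toy classifier, not MediaWiki's actual parser; the inline *bold* rule is hypothetical):

```python
# Toy disambiguation sketch: a "*" at beginning-of-line is a list bullet,
# while a *pair* of inline asterisks would be read as (hypothetical) bold.
def classify(line):
    if line.startswith("*"):
        return ("list-item", line[1:].strip())
    if line.count("*") >= 2:          # an inline pair can mark bold
        return ("inline-bold", line)
    return ("plain", line)            # a lone "*" stays literal

print(classify("* first bullet"))
print(classify("this is *important* text"))
print(classify("2 * 3 = 6"))
```

A /slash/ rule has no such beginning-of-line anchor, so the same trick cannot distinguish italics from a regex or a URL path.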
On Thu, Apr 27, 2006 at 04:16:13PM -0400, Jay R. Ashworth wrote:
On Thu, Apr 27, 2006 at 06:53:13PM +0100, Timwi wrote:
This thread has gotten so dense that I can no longer discern what the topic is, much less who's on which side. If anyone sees fit to continue it, I suggest they start by enumerating those two things.
Not sure what you mean by 'dense'. My "side" of the thread is just that I get agitated at illogical/fallacious arguments like "let's not use slashes for italics -- they look like regular expressions." That's all.
I don't believe it's a fallacious argument. Let me cast it for you slightly differently:
Choices for inline text markup coding should be made so as to collide with the least possible number of already extant uses of that set of punctuation.
That's why [[ ]], '' '', and ''' ''' are pretty good choices, while the leading space for indentation is slightly less so.
While *bold* would be contextual, since <BOL>* is already in use for list items, _italics_ would not, and doesn't collide with anything.
/italics/, though, would, probably unexpectedly, collide with the writing of Regexes, and there's no good way to disambiguate from context (as there is with *bold face marking*).
Thanks. That (and some of your other commentary) clarifies where I was heading, and I wasn't coming up with a useful way to phrase it. I'm glad someone else did.
Additionally, my statements weren't "fallacious" because they didn't purport to be some kind of valid argument that they were not. One might claim that my premises were wrong or inapplicable, but I don't think "invalid" or "fallacious" in any way applied to my statements.
Besides, I only started this by saying something like "I'd rather we didn't do it that way." I wasn't trying to mandate policy or call anyone names, so the implicit vitriol with which my perspective seems to have been met is a mite bewildering to me.
On Apr 27, 2006, at 2:06 PM, Chad Perrin wrote:
On Thu, Apr 27, 2006 at 04:16:13PM -0400, Jay R. Ashworth wrote:
On Thu, Apr 27, 2006 at 06:53:13PM +0100, Timwi wrote:
This thread has gotten so dense that I can no longer discern what the topic is, much less who's on which side. If anyone sees fit to continue it, I suggest they start by enumerating those two things.
Not sure what you mean by 'dense'. My "side" of the thread is just that I get agitated at illogical/fallacious arguments like "let's not use slashes for italics -- they look like regular expressions." That's all.
I don't believe it's a fallacious argument. Let me cast it for you slightly differently:
Choices for inline text markup coding should be made so as to collide with the least possible number of already extant uses of that set of punctuation.
That's why [[ ]], '' '', and ''' ''' are pretty good choices, while the leading space for indentation is slightly less so.
While *bold* would be contextual, since <BOL>* is already in use for list items, _italics_ would not, and doesn't collide with anything.
/italics/, though, would, probably unexpectedly, collide with the writing of Regexes, and there's no good way to disambiguate from context (as there is with *bold face marking*).
Thanks. That (and some of your other commentary) clarifies where I was heading, and I wasn't coming up with a useful way to phrase it. I'm glad someone else did.
Additionally, my statements weren't "fallacious" because they didn't purport to be some kind of valid argument that they were not. One might claim that my premises were wrong or inapplicable, but I don't think "invalid" or "fallacious" in any way applied to my statements.
Besides, I only started this by saying something like "I'd rather we didn't do it that way." I wasn't trying to mandate policy or call anyone names, so the implicit vitriol with which my perspective seems to have been met is a mite bewildering to me.
Don't read something into this which isn't there (-: I see a lot of argument on this list... almost more than there is useful discussion (-: I didn't see any 'vitriol' - if there was, I'm sure it was fully unintended! Let's keep it civil. I happen to agree with Chad, for reasons stated in my other message (this seems to have forked into two discussions on the same topic - let's merge them, shall we?): I think that /this/ and *this* just wouldn't be appropriate for MediaWiki. And although there is no current underline syntax, and I think _this_ would work wonderfully for it in MediaWiki, the discussion of underlining on Wikipedia has already been had, and I don't think there is enough demand outside of Wikipedia for an underline syntax to be added. (However, if underlining syntax WERE added, I'd cast my vote for _this_ - but again, see my other post about content and meaning vs. display.)
Jay R. Ashworth wrote:
I don't believe it's a fallacious argument. Let me cast it for you slightly differently:
Choices for inline text markup coding should be made so as to collide with the least possible number of already extant uses of that set of punctuation.
Okay, so let's use "@*$&#^@(&#^$(#@" for italics and "!(&@^$(&@^$#@&%" for bold. Since, obviously, they collide with pretty few extant uses.
Timwi
Moin,
On Saturday 29 April 2006 01:17, Timwi wrote:
Jay R. Ashworth wrote:
I don't believe it's a fallacious argument. Let me cast it for you slightly differently:
Choices for inline text markup coding should be made so as to collide with the least possible number of already extant uses of that set of punctuation.
Okay, so let's use "@*$&#^@(&#^$(#@" for italics and "!(&@^$(&@^$#@&%"
Heh, stop all that @*$&#^@(&#^$(#@ swearing!
Best wishes,
Tels
Moin,
On Tuesday 25 April 2006 22:13, Chad Perrin wrote:
On Tue, Apr 25, 2006 at 07:21:37PM +0200, Tels wrote:
Moin,
On Tuesday 25 April 2006 16:03, Jay R. Ashworth wrote:
My one fond wish, which I've come to understand will never happen with MWtext, is to get *bold* and _italics_ as markup tokens, noting that
Wouldn't that be *bold*, _underlined_ and /italics/? :-P
I'd rather not. /italics/ always looks like a matching regex to me. It's confusing.
Yeah, you are right. Funny thing is, even with me /(writing|carving)/ regexes a thousand times a day, it didn't occur to me that it looks like one. :)
I still like the POD way of doing B<bold>, I<italics>, U<underlined>, C<code>, L<link> text. But it's more to type, especially the shifty angle brackets.
As they say: simple, easy-to-parse, nestable. Pick two :)
Best wishes,
Tels
Hello everybody. I haven't participated on this list for a while, but my attention has been drawn to this thread.
With that said, I am interested in such a project if it involves coding. To make this change into a valid Summer of Code project, I propose to do a wiki parser, for which I have already designed some draft rules in a yacc/bison manner.
I am interested in continuing developing my parser (flexbisonparse) if other people are interested in helping me. I am happy to explain what I remember of how it works, because I know it's really hard to figure out, but I'm sure it's not that hard to explain.
I am disappointed that people are *still* trying to re-start the effort from scratch. Surely the plethora of existing parsers has shown that every new effort will end up the same, especially if no effort is made to understand the existing unfinished products and to recognise their flaws and faults. You'll just make the same mistakes again and again.
My mistake was to fail to recognise the importance and complexity of HTML and HTML-like tags in the wiki mark-up. My parser can parse everything non-HTML/SGML-based that was part of the syntax at the time I wrote it. With co-operation, I'm sure we can do the rest. Without, I'm sure no-one can.
Timwi
On 4/25/06, Timwi timwi@gmx.net wrote:
Hello everybody. I haven't participated on this list for a while, but my attention has been drawn to this thread.
With that said, I am interested in such a project if it involves coding. To make this change into a valid Summer of Code project, I propose to do a wiki parser, for which I have already designed some draft rules in a yacc/bison manner.
I am interested in continuing developing my parser (flexbisonparse) if other people are interested in helping me. I am happy to explain what I remember of how it works, because I know it's really hard to figure out, but I'm sure it's not that hard to explain.
Yes, that was exactly my point previously (about it being hard to figure out). I just assumed this project was dead and no one would help me understand it. If you can help me, I won't think twice before working on it. :)
I am disappointed that people are *still* trying to re-start the effort from scratch. Surely the plethora of existing parsers has shown that every new effort will end up the same, especially if no effort is made to understand the existing unfinished products and to recognise their flaws and faults. You'll just make the same mistakes again and again.
My take on "restarting" doesn't actually mean doing everything from zero and forgetting everything that was previously attempted. Depending on how much I can make of the existing code, it may involve breaking it down, learning from it, and reassembling it as I come to understand how it works - and maybe rewriting the parts I find confusing and could improve.
That is what I think is implied when working on free software: you reuse and rebuild, standing on the shoulders of previous developers. :)
My mistake was to fail to recognise the importance and complexity of HTML and HTML-like tags in the wiki mark-up. My parser can parse everything non-HTML/SGML-based that was part of the syntax at the time I wrote it. With co-operation, I'm sure we can do the rest. Without, I'm sure no-one can.
Maybe we should talk about what is missing and what to do next privately?
Cheers, -- Pedro de Medeiros
At 1:44 PM -0400 4/25/06, Pedro de Medeiros wrote:
That is what I think is implied when working on free software: you reuse and rebuild, standing on the shoulders of previous developers. :)
The famous quotation...
If I have seen further it is by standing on the shoulders of Giants -- Isaac Newton in: Letter to Robert Hooke, February 5, 1675/1676
...inspired the following variations:
In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand. -- Gerald Holton
If I have not seen as far as others, it is because giants were standing on my shoulders. -- Hal Abelson
In computer science, we stand on each other's feet. -- Brian K. Reid
http://reddit.com/info/13pj/comments
Timwi schrieb:
Hello everybody. I haven't participated on this list for a while, but my attention has been drawn to this thread.
With that said, I am interested in such a project if it involves coding. To make this change into a valid Summer of Code project, I propose to do a wiki parser, for which I have already designed some draft rules in a yacc/bison manner.
I am interested in continuing developing my parser (flexbisonparse) if other people are interested in helping me. I am happy to explain what I remember of how it works, because I know it's really hard to figure out, but I'm sure it's not that hard to explain.
I am disappointed that people are *still* trying to re-start the effort from scratch. Surely the plethora of existing parsers has shown that every new effort will end up the same, especially if no effort is made to understand the existing unfinished products and to recognise their flaws and faults. You'll just make the same mistakes again and again.
As this is probably aimed at me :-) I did try to improve the flexbisonparse software, but I don't have that much experience with the matter, so I failed to understand the inner workings, especially of the lexer, which frankly looks like crypto to me ;-)
My wiki2xml actually rebuilds the workings of a "real" parser, except it is not generated by a compiler-compiler but written manually. While this potentially adds another error source, it is not that different from a real parser IMHO.
My mistake was to fail to recognise the importance and complexity of HTML and HTML-like tags in the wiki mark-up. My parser can parse everything non-HTML/SGML-based that was part of the syntax at the time I wrote it. With co-operation, I'm sure we can do the rest. Without, I'm sure no-one can.
I think you'll also find that "template hell" has gotten worse since then. While improving wiki2xml, I found lots of constructs that cannot be resolved by putting the template name and parameters in a neat XML tag. Live transclusion - changing the very text that is parsed - is IMHO the only solution for generating valid XML while maintaining the intended result. This might prove hard to do in bison, though.
Magnus
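Magnus's transclusion point can be illustrated with a small sketch (Python; the template names and bodies are invented for the example). A template may expand to an unbalanced fragment - here one template opens a table and another closes it - so wrapping each call in a self-contained XML element cannot yield valid XML, while expanding transclusions into the source text before parsing can:

```python
import re

# Invented example templates: each expands to *half* of a wikitable,
# so no neat <template> XML element could represent either call site.
TEMPLATES = {
    "table-start": "{|",
    "table-end": "|}",
}

def expand(wikitext):
    """Substitute {{name}} transclusions into the source text before
    the parse step, leaving unknown templates untouched."""
    return re.sub(r"\{\{([^{}]+)\}\}",
                  lambda m: TEMPLATES.get(m.group(1).strip(), m.group(0)),
                  wikitext)

page = "{{table-start}}\n| a cell\n{{table-end}}"
print(expand(page))  # only after expansion is the table markup balanced
```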
I think all of you are missing a pretty big point here: once you have a proper parser, it doesn't matter what the wiki syntax looks like by default. You could make it spit out content from its internal parse tree in many formats (DocBook, XHTML, POD, plain text, LaTeX), and - if you were willing to accept losing some formatting (although that need not be the case) - you could allow users to write text in any of these formats, then parse it and save some sort of glorified parse tree in the db.
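As a sketch of that idea (Python; the node kinds and target formats are illustrative, not an actual MediaWiki interface), a single parse tree can feed any number of output renderers:

```python
# A tiny parse tree: tuples of (node-kind, *children); strings are leaves.
def render(node, fmt):
    kind, children = node[0], node[1:]
    inner = "".join(c if isinstance(c, str) else render(c, fmt)
                    for c in children)
    if kind == "doc":
        return inner
    if kind == "em":                      # one node, many notations
        return {"xhtml": "<em>%s</em>" % inner,
                "latex": "\\emph{%s}" % inner,
                "plain": inner}[fmt]
    raise ValueError("unknown node kind: %r" % kind)

tree = ("doc", "stand on the ", ("em", "shoulders"), " of giants")
print(render(tree, "xhtml"))
print(render(tree, "latex"))
```

Accepting input in several syntaxes is then the mirror problem: each syntax needs its own front-end producing this same tree.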