templates fields delimiters


Ashar Voultoiz

28 Jul 2004

2:17 p.m.

Hello,

There is a bad bug currently with templates : http://sourceforge.net/tracker/?func=detail&atid=411192&aid=965725&a...

Basically, we use | as the delimiter between parameters, which prevents users from renaming links ( [[foo|blabla]] ) inside a template call:

Which will not be parsed correctly.

In the bug report, one proposal is to use |CR instead:

The parser will then be much easier and the |CR doesn't conflict with anything.

The fix in the code seems trivial, but users will have to fix the already existing templates (but hey, we are still in beta).

-- Ashar Voultoiz


Rowan Collins

28 Jul

4:48 p.m.

...

There is a bad bug currently with templates : http://sourceforge.net/tracker/?func=detail&atid=411192&aid=965725&a...

...

{{template para1=[[bla|something]]| para2=ooo| }}

The parser will then be much easier and the |CR doesn't conflict with anything.

I've replied at Sourceforge already, but to reiterate: I think enforcing <CR> is a bad idea, because it forces layout which may not be appropriate in all circumstances. Better to use something equally unique, but less obtrusive, such as "||":

-- Rowan Collins BSc [IMSoP]

Timwi

5:27 p.m.

Rowan Collins wrote:

...

I've replied at Sourceforge already, but to reiterate: I think enforcing <CR> is a bad idea, because it forces layout which may not be appropriate in all circumstances. Better to use something equally unique, but less obtrusive, such as "||":

{{template || para1=[[bla|something]] || para2=ooo }}

Neither of your suggestions would fix something like this:

{{template1||para1={{template2||para=hah, you're screwed!}}}}

(or would it?)

Better to fix the parser properly...

Timwi
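A minimal sketch of the concern raised above, assuming a hypothetical `naive_split` helper that strips the outer braces and splits on the delimiter without tracking nesting. It is not MediaWiki code; it only illustrates that "||" leaves a piped link intact but still cuts a nested template call in half.

```python
def naive_split(template_call: str, delimiter: str = "||") -> list[str]:
    """Strip the outer {{ }} and split on the delimiter, ignoring nesting.

    Hypothetical helper for illustration only.
    """
    inner = template_call.strip()
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    return [part.strip() for part in inner.split(delimiter)]


# The "||" proposal: the single pipe inside the link is left alone.
simple = "{{template || para1=[[bla|something]] || para2=ooo }}"
simple_parts = naive_split(simple)

# The nested case: the inner template's own "||" is taken for an
# outer delimiter, so the inner call is split into two fragments.
nested = "{{template1||para1={{template2||para=hah, you're screwed!}}}}"
nested_parts = naive_split(nested)

print(simple_parts)
print(nested_parts)
```

Any delimiter that is also legal inside a nested call has the same problem, which is why the discussion moves on to tracking nesting rather than picking a different separator.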

David Iberri

5:41 p.m.

--- Rowan Collins rowan.collins@gmail.com wrote:

...

...
There is a bad bug currently with templates : http://sourceforge.net/tracker/?func=detail&atid=411192&aid=965725&a...

I've replied at Sourceforge already, but to reiterate: I think enforcing <CR> is a bad idea, because it forces layout which may not be appropriate in all circumstances. Better to use something equally unique, but less obtrusive, such as "||":

{{template || para1=[[bla|something]] || para2=ooo }}

Sorry if this is obvious, but maybe something could be learned by the way the parser handles wikilinks in image captions. For example, wikilinks with alt. titles work fine, even though image properties are separated by pipes:

[[Image:George-Washington.jpg|thumb|200px|[[George Washington|Washington]] was...]]

-- David Iberri

Rowan Collins

7:52 p.m.

On Wed, 28 Jul 2004 10:41:18 -0700 (PDT), David Iberri diberri@yahoo.com wrote:

...

Sorry if this is obvious, but maybe something could be learned by the way the parser handles wikilinks in image captions. For example, wikilinks with alt. titles work fine, even though image properties are separated by pipes

So they do... Question is, does anyone know how that *does* work? [Parser.php is more than a little, um, opaque...]

-- Rowan Collins BSc [IMSoP]

Gabriel Wicke

8:01 p.m.

On Wed, 2004-07-28 at 20:52 +0100, Rowan Collins wrote:

...

So they do... Question is, does anyone know how that *does* work? [Parser.php is more than a little, um, opaque...]

Yes, by calling replaceInternalLinks twice ;-)

-- Gabriel Wicke

Timwi

10:30 p.m.

Gabriel Wicke wrote:

...

On Wed, 2004-07-28 at 20:52 +0100, Rowan Collins wrote:

...
So they do... Question is, does anyone know how that *does* work? [Parser.php is more than a little, um, opaque...]

Yes, by calling replaceInternalLinks twice ;-)

Ouch.

Gabriel Wicke

11:21 p.m.

On Wed, 2004-07-28 at 23:30 +0100, Timwi wrote:

...

...
Yes, by calling replaceInternalLinks twice ;-)

Ouch.

Not necessarily: it's more or less the fastest way to do this with a regex-based parser like the current MediaWiki one. The second call is not expensive. It does search the entire text, though.

-- Gabriel Wicke
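A rough sketch of why two passes are enough for one level of nesting, assuming a regex that only matches links containing no further brackets. The names `INNERMOST_LINK`, `render` and `replace_links_once` are hypothetical, and the `LINK(...)` output format is invented for the example; this is not the actual replaceInternalLinks implementation.

```python
import re

# Matches a [[...]] span with no brackets inside it, i.e. an innermost link.
INNERMOST_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def render(match: re.Match) -> str:
    """Render a link as LINK(target -> label); the last field is the label."""
    fields = match.group(1).split("|")
    target, label = fields[0], fields[-1]
    return f"LINK({target} -> {label})"


def replace_links_once(text: str) -> str:
    """One pass over the whole text, replacing every innermost link."""
    return INNERMOST_LINK.sub(render, text)


source = ("[[Image:George-Washington.jpg|thumb|200px|"
          "[[George Washington|Washington]] was...]]")

first = replace_links_once(source)   # resolves only the inner link
second = replace_links_once(first)   # the outer link is now innermost

print(first)
print(second)
```

Each call scans the full text, which matches the remark that the second call searches everything; deeper nesting would need one more pass per level.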

Ævar Arnfjörð Bjarmason

29 Jul

10:38 a.m.

I think this is one of the bugs with the highest priority; people are currently using all sorts of hacks to get around it:

1. Using things like {| {{infobox_args}} ... in tables
2. Just making incorrect links and creating redirects for them
3. Manually creating thumbnails to use as images

Isn't this just down to the kind of parser currently in use? It is possible to do something like this in the bash shell, for example:

    $ echo $(($((2+2))+1+$(echo 1)))
    6

On Thu, 29 Jul 2004 01:21:09 +0200, Gabriel Wicke lists@wikidev.net wrote:

...

On Wed, 2004-07-28 at 23:30 +0100, Timwi wrote:

...
...
Yes, by calling replaceInternalLinks twice ;-)

Ouch.

Not necessarily: it's more or less the fastest way to do this with a regex-based parser like the current MediaWiki one. The second call is not expensive. It does search the entire text, though.

-- Gabriel Wicke

Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l

Bill Clark

28 Jul

8:01 p.m.

On Wed, 28 Jul 2004 20:52:24 +0100, Rowan Collins rowan.collins@gmail.com wrote:

...

So they do... Question is, does anyone know how that *does* work? [Parser.php is more than a little, um, opaque...]

I haven't looked at it (and don't have time at this exact moment) but I imagine it's something like this: Keep track of how deep you are nested within the [['s, and treat the | characters differently according to context. If you're at the top level, a | character is a delimiter for that level, but if you're already nested inside another [[ level (i.e. you're inside a link) then treat the | character according to the piped-link interpretation. Look for ]]'s to know when you should pop back out to the previous level/context.

-Bill Clark
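The approach described above can be sketched as a single scan that keeps a nesting counter. This is only an illustration under stated assumptions: `split_top_level` is a hypothetical name, and counting {{ }} alongside [[ ]] is an extension that the message above does not mention.

```python
def split_top_level(body: str) -> list[str]:
    """Split on '|' only at nesting depth 0.

    Depth increases at '[[' or '{{' and decreases at ']]' or '}}'.
    A '|' seen at depth > 0 belongs to the nested link or template
    and is copied through unchanged.
    """
    parts: list[str] = []
    current: list[str] = []
    depth = 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


# Contents of the image link from earlier in the thread, outer brackets removed.
image_fields = split_top_level(
    "Image:George-Washington.jpg|thumb|200px|[[George Washington|Washington]] was..."
)

# A nested template call with single-pipe delimiters, outer braces removed.
template_fields = split_top_level("template1|para1={{template2|para=hah}}")

print(image_fields)
print(template_fields)
```

With the counter in place, the plain | delimiter no longer conflicts with piped links or nested calls, so no new separator syntax would be needed.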

Ashar Voultoiz

29 Jul

5:52 p.m.

Bill Clark wrote:

...

On Wed, 28 Jul 2004 20:52:24 +0100, Rowan Collins rowan.collins@gmail.com wrote:

...
So they do... Question is, does anyone know how that *does* work? [Parser.php is more than a little, um, opaque...]

I haven't looked at it (and don't have time at this exact moment) but I imagine it's something like this: Keep track of how deep you are nested within the [['s, and treat the | characters differently according to context. If you're at the top level, a | character is a delimiter for that level, but if you're already nested inside another [[ level (i.e. you're inside a link) then treat the | character according to the piped-link interpretation. Look for ]]'s to know when you should pop back out to the previous level/context.

-Bill Clark

Hello,

That would be the way to do it using a tokenizer. Unfortunately, the tokenizer is disabled for performance reasons.

-- Ashar Voultoiz

Ævar Arnfjörð Bjarmason

7:59 p.m.

So in other words the code exists but is not currently enabled?

On Thu, 29 Jul 2004 19:52:39 +0200, Ashar Voultoiz thoane@altern.org wrote:

...

Bill Clark wrote:

...
On Wed, 28 Jul 2004 20:52:24 +0100, Rowan Collins rowan.collins@gmail.com wrote:

...
So they do... Question is, does anyone know how that *does* work? [Parser.php is more than a little, um, opaque...]

I haven't looked at it (and don't have time at this exact moment) but I imagine it's something like this: Keep track of how deep you are nested within the [['s, and treat the | characters differently according to context. If you're at the top level, a | character is a delimiter for that level, but if you're already nested inside another [[ level (i.e. you're inside a link) then treat the | character according to the piped-link interpretation. Look for ]]'s to know when you should pop back out to the previous level/context.

-Bill Clark

Hello,

That would be the way to do it using a tokenizer. Unfortunately, the tokenizer is disabled for performance reasons.

-- Ashar Voultoiz


Gabriel Wicke

31 Jul

2:45 p.m.

On Thu, 2004-07-29 at 19:59 +0000, Ævar Arnfjörð Bjarmason wrote:

...

So in other words the code exists but is not currently enabled?

The code exists in PHP, but is a few times slower than the current parser. That tokenizer also handles only a few of the [count > 20] passes.

I'm currently writing a new parser using BisonGen (which builds both a C Python module and a pure Python parser) that handles the entire parsing in one step. The C version also performs very well (0.014 seconds vs. 0.17 seconds for the pure Python version). The output will be a DOM object tree; includes and the like will be handled by manipulating that tree before dumping it as [insert your favourite format here]. Where feasible, this parser also supports the current Moin syntax in addition to the MW one; it's intended to work with Moin, of course (which has a relatively clean design and profits from the Python infrastructure). Some more details at http://moinmoin.wikiwikiweb.de/NewWikiParser.

-- Gabriel Wicke

Timwi

3:01 p.m.

Gabriel Wicke wrote:

...

I'm currently writing a new parser using BisonGen (builds both a C python module and a pure python parser) that handles the entire parsing in one step.

Oooh. Very good. I was wondering when someone would start doing this and whether I would have to do it.

I would like to know: how are you going to describe the syntax? Is it going to be extensible enough to allow for new syntax elements later (or, if a bug creeps into the grammar, will it be easy to fix without rewriting the entire parser)?

Thanks! Timwi


participants (7)

Ashar Voultoiz
Bill Clark
David Iberri
Gabriel Wicke
Rowan Collins
Timwi
Ævar Arnfjörð Bjarmason