Wikitext-l January 2008

wikitext-l@lists.wikimedia.org

5 participants
6 discussions

Philosophising about grammars
by Steve Bennett 30 Jan '08

One issue I've been having has to do with high-level punctuation getting tangled up in embedded text. In wikitext, it's generally OK to write a literal ]]: it means two right square brackets in a row. But of course in [[image:foo.jpg|caption]], the ]] means the end of the image element, not just raw text.

I can see, and have sort of tried, three ways to handle this:

1) Using traditional grammar approaches, backtracking and so forth, hoping the parser is smart enough to match the right string and "pull back" at the right moment. Unfortunately, this seems very difficult without an extremely good knowledge of the compiler-compiler, and is probably slow to boot.

2) Using bottom-up* context flags like "inside image element", so that when a ]] is found, we know whether or not we can treat it as a literal. Problem: you end up smearing knowledge about the image element everywhere. Why should the RIGHT_SQUARE_BRACKET literal need to know anything about image elements?

3) Using top-down restrictions on literals like "prohibit literal double right square bracket". Similar to 2), but when a ]] is found, the literal rule just dumbly looks at the corresponding flag to decide whether to match it as a literal.

Method 3 now seems the most promising. I was using 2), but it suddenly became very complex. I now have code that looks like this:

    image_caption
    @init {prohibit_literal_link_end++; prohibit_literal_pipe++;}
        : inline_text? -> ^(TEXT inline_text?)
        ;
    finally {prohibit_literal_link_end--; prohibit_literal_pipe--;}

    ...

    literal_link_end
        : {prohibit_literal_link_end <= 0}?=> link_end
        ;

This seems to be relatively readable too: "An image caption is any text, except that there can't be an unescaped literal pipe or link_end (]]) in it." and "A literal link end is any raw link_end you encounter, unless someone has said you can't have one." It seems to keep me a bit saner, too.

Anyway, just thought I would share.

Steve

* I'm using the terms "bottom up" and "top down" extremely loosely here.
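To make method 3 concrete outside ANTLR, here is a minimal hand-written recursive-descent sketch in Python. It is not Steve's grammar: the class TinyWikiParser and its methods are hypothetical, it only knows plain text and [[image:NAME|CAPTION]], and it does not support nested elements inside a caption. The two counters play the role of prohibit_literal_link_end and prohibit_literal_pipe, and the try/finally mirrors the @init/finally pair.

```python
class TinyWikiParser:
    """Hypothetical recursive-descent sketch of "top-down restrictions on literals".

    Only plain text and [[image:NAME|CAPTION]] are recognised; nested
    elements inside a caption are not supported.
    """

    def __init__(self, src: str) -> None:
        self.src = src
        self.pos = 0
        # Counters rather than booleans, so nested rules could stack restrictions.
        self.prohibit_literal_link_end = 0
        self.prohibit_literal_pipe = 0

    def at(self, token: str) -> bool:
        return self.src.startswith(token, self.pos)

    def expect(self, token: str) -> None:
        if not self.at(token):
            raise SyntaxError(f"expected {token!r} at offset {self.pos}")
        self.pos += len(token)

    def literal_allowed(self) -> bool:
        # Like the gated predicate: a literal only checks its own counter and
        # knows nothing about which element imposed the restriction.
        if self.at("]]"):
            return self.prohibit_literal_link_end <= 0
        if self.at("|"):
            return self.prohibit_literal_pipe <= 0
        return True

    def inline_text(self) -> str:
        start = self.pos
        while (self.pos < len(self.src) and not self.at("[[image:")
               and self.literal_allowed()):
            self.pos += 1
        return self.src[start:self.pos]

    def image_caption(self) -> tuple:
        # @init: forbid literal ']]' and '|' while the caption is being parsed.
        self.prohibit_literal_link_end += 1
        self.prohibit_literal_pipe += 1
        try:
            return ("TEXT", self.inline_text())
        finally:
            # finally: lift the restrictions again, even if parsing raised.
            self.prohibit_literal_link_end -= 1
            self.prohibit_literal_pipe -= 1

    def image(self) -> tuple:
        self.expect("[[image:")
        start = self.pos
        # Simplification: the file name just runs up to '|' or ']]'.
        while self.pos < len(self.src) and not (self.at("|") or self.at("]]")):
            self.pos += 1
        name = self.src[start:self.pos]
        caption = None
        if self.at("|"):
            self.pos += 1
            caption = self.image_caption()
        self.expect("]]")
        return ("IMAGE", name, caption)

    def parse(self) -> list:
        nodes = []
        while self.pos < len(self.src):
            if self.at("[[image:"):
                nodes.append(self.image())
            else:
                nodes.append(("TEXT", self.inline_text()))
        return nodes
```

For example, parsing "a ]] b [[image:foo.jpg|caption]] c" yields a TEXT node holding the literal "a ]] b ", then ("IMAGE", "foo.jpg", ("TEXT", "caption")), then TEXT " c": the first ]] is ordinary text, while the one after the caption closes the image, and neither literal check needs to know that an image is involved.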

Re: [Wikitext-l] ANTLR grammar
by David Gerard 25 Jan '08

On 25/01/2008, Steve Bennett <stevage(a)gmail.com> wrote:
> Meant to publish the grammar before I go away for this long weekend.
> Didn't get around to it, so in case someone starts clamoring for a
> look at it, here it is. In totally rough, ugly, draft state of course.

I've put it up on mediawiki.org for further hacking :-)

http://www.mediawiki.org/wiki/Markup_spec/ANTLR/draft

Also edited:

http://www.mediawiki.org/wiki/Markup_spec/ANTLR
http://www.mediawiki.org/wiki/Markup_spec/ANTLR/Images

- d.

What needs to be done/can be done for the ANTLR grammar?
by David Gerard 24 Jan '08

Steve (and others):

What needs to be done for the ANTLR grammar that can be parallelised, so that the many people desperately after reliable independent parsing of wikitext can contribute to the effort?

Also: how can we speed up ANTLR-generated PHP, so this has half a chance of being implemented?

- d.

Fwd: [Mediawiki-l] Wiki Creole grammar, schema, transformations made available
by Thomas Dalton 21 Jan '08

Forwarding, just in case anyone on this list isn't on the main MediaWiki one.

---------- Forwarded message ----------
From: Dirk Riehle <dirk(a)riehle.org>
Date: 20 Jan 2008 19:25
Subject: [Mediawiki-l] Wiki Creole grammar, schema, transformations made available
To: wiki-research-l(a)lists.wikimedia.org, mediawiki-l(a)lists.wikimedia.org

For those who were interested in a MediaWiki grammar etc., here is a first step:

--------

For research purposes as well as the Wiki Creole community's convenience, we are making our EBNF grammar, the XML schema definition, and the to/from XML transformations available. You can use these specifications to create your own parsers, as well as use standard technology (DOM, XSLT) to work with wiki pages and display or save them.

For more, see the dedicated Wiki Creole page at http://www.riehle.org/wiki-creole as well as the WikiCreole community at http://www.wikicreole.com

Dirk

--
Phone: + 1 (650) 215 3459
Web: http://www.riehle.org

_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l

Fwd: [Mediawiki-l] numbered list broken by image or template
by David Gerard 20 Jan '08

How are numbered lists implemented in the present grammar? Would it be hard (in future) to put in some sort of number-from provision, or to tell the parser not to insert a </ol>?

- d.

---------- Forwarded message ----------
From: Herta Van den Eynde <herta.vandeneynde(a)gmail.com>
Date: 16 Jan 2008 13:23
Subject: Re: [Mediawiki-l] numbered list broken by image or template
To: MediaWiki announcements and site admin list <mediawiki-l(a)lists.wikimedia.org>

On 16/01/2008, Kilian <winkelklammern(a)texttheater.de> wrote:
> On Wednesday, 16.01.2008, at 13:38 +0100, Herta Van den Eynde wrote:
> > When you use a numbered list, and insert an image or a template, the
> > numbering is broken.
> > E.g.
> >
> > # one
> > # two
> > [[Image:some-image.png]]
> > # three
> >
> > will display:
> >
> > 1. one
> > 2. two
> >
> > Image:some-image.png
> >
> > 1. three
> >
> >
> > Is there a way to restart the numbering where you left off, so that the
> > third element still reads:
> >
> > 3. three
> >
> > Kind regards,
> >
> > Herta
>
> Hi Herta,
>
> the problem is not the image but the line break. Here's how to mask it
> so as not to break the item:
>
> # one
> # two<br/>[[Image:Some-image.png]]
> # three
>
> ~ Kilian

Thanks, Kilian. That does indeed solve the problem with images. Unfortunately, many (most?) of our templates contain line breaks. Any way to work around those?

Kind regards,

Herta

--
Herta Van den Eynde
"Life on Earth may be expensive, but it comes with a free ride around the Sun."
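To illustrate the question, here is a hypothetical, line-based Python sketch of how a renderer might turn "#" lines into HTML. It is not MediaWiki's parser: render_numbered_lists and its continue_numbering flag are invented for illustration, and only single-level lists are handled. It shows why the image line ends the first <ol>, and how a "number-from" provision could map onto the standard HTML start attribute of <ol>.

```python
def render_numbered_lists(text: str, continue_numbering: bool = False) -> str:
    """Hypothetical line-based renderer for single-level '#' lists.

    Consecutive '#' lines share one <ol>; any other line closes it, so a
    later '#' line opens a fresh list whose numbering starts at 1 again.
    With continue_numbering=True, the reopened list gets an HTML start
    attribute so its numbering picks up where the previous list stopped.
    """
    out: list[str] = []
    in_list = False
    items_so_far = 0  # items emitted across all lists in this text
    for line in text.splitlines():
        if line.startswith("#"):
            if not in_list:
                start = items_so_far + 1 if continue_numbering else 1
                out.append("<ol>" if start == 1 else f'<ol start="{start}">')
                in_list = True
            items_so_far += 1
            out.append(f"<li>{line[1:].strip()}</li>")
        else:
            if in_list:
                out.append("</ol>")  # the non-list line ends the current list
                in_list = False
            if line.strip():
                out.append(f"<p>{line}</p>")
    if in_list:
        out.append("</ol>")
    return "\n".join(out)
```

On Herta's example, the default mode produces two separate <ol> elements, so "three" is numbered 1 again; with continue_numbering=True the second list opens as <ol start="3">. A template that expands to several lines would break the list in exactly the same way as the image line, which is why the <br/> trick only helps for single-line content.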

Nested pre/nowiki blocks - bug or just undefined behaviour?
by Steve Bennett 03 Jan '08

Compare and contrast:

1. <pre> a <nowiki> block </nowiki> </pre>
2. <pre> a <nowiki> block </pre>
3. a <nowiki> block </nowiki>
4. a <nowiki> block

Why is the <nowiki> rendered literally in 2, but stripped out in 1?

My working understanding of nowiki and pre was that both of them altered the parsing/lexing behaviour, treating everything other than their closing partner literally. So <pre> <nowiki> </pre> should render <nowiki> literally, and <nowiki> <pre> </nowiki> should render <pre> literally. But this doesn't seem to be quite the case.

Would anyone care to hazard a guess as to what the correct behaviour *should* be? Does anyone rely on one treatment over the other? The current behaviour seems inconsistent, especially comparing 2 with 4 above.

Steve
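The "working understanding" described above can be written down as a tiny lexing model, which makes it easier to compare against what the parser actually does. This is a hypothetical Python sketch, not MediaWiki's implementation: lex_opaque and OPAQUE_TAGS are invented names, and treating an opening tag with no closing partner as plain text is an assumption the thread doesn't settle.

```python
import re

# Tags that, in the "working understanding" above, make everything up to their
# own closing tag literal.
OPAQUE_TAGS = ("pre", "nowiki")
OPEN_RE = re.compile(r"<(" + "|".join(OPAQUE_TAGS) + r")>")


def lex_opaque(text: str) -> list[tuple[str, str]]:
    """Split text into ("wiki", ...) runs and (tag, literal_body) segments.

    Hypothetical model, not MediaWiki's behaviour: the first opening tag wins,
    its body runs to its own closing tag and is never re-scanned, and an
    opening tag with no closing partner stays in the surrounding wikitext.
    """
    segments: list[tuple[str, str]] = []
    wiki_start = 0   # start of the pending run of ordinary wikitext
    search_from = 0  # where to look for the next opening tag
    while (m := OPEN_RE.search(text, search_from)) is not None:
        tag = m.group(1)
        close_tag = f"</{tag}>"
        close = text.find(close_tag, m.end())
        if close == -1:
            # Unclosed (assumption): leave the tag as plain text, keep scanning.
            search_from = m.end()
            continue
        if m.start() > wiki_start:
            segments.append(("wiki", text[wiki_start:m.start()]))
        segments.append((tag, text[m.end():close]))
        wiki_start = search_from = close + len(close_tag)
    if wiki_start < len(text):
        segments.append(("wiki", text[wiki_start:]))
    return segments
```

Under this model, examples 1 and 2 both keep <nowiki> as literal text inside the pre body, example 3 yields a nowiki segment " block ", and example 4 leaves "a <nowiki> block" as ordinary wikitext. That is the consistent treatment the question is asking about, which the current parser does not appear to follow.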
