Steve (and others): What needs to be done for the ANTLR grammar that
can be parallelised, so that the many people desperately after
reliable independent parsing of wikitext can contribute to the effort?
Also: how to speed up ANTLR-generated PHP, so this has half a chance
of being implemented?
- d.
Forwarding, just in case anyone is on this list that isn't on the main
mediawiki one.
---------- Forwarded message ----------
From: Dirk Riehle <dirk(a)riehle.org>
Date: 20 Jan 2008 19:25
Subject: [Mediawiki-l] Wiki Creole grammar, schema, transformations
made available
To: wiki-research-l(a)lists.wikimedia.org, mediawiki-l(a)lists.wikimedia.org
For those who were interested in a Mediawiki grammar etc, here is a
first step:
--------
For research purposes as well as the Wiki Creole community's
convenience, we are making our EBNF grammar, the XML schema definition,
and the to/from XML transformations available. You can use these
specifications to create your own parsers as well as use standard
technology (DOM, XSLT) to work with wiki pages and display or save them.
For more, see the dedicated Wiki Creole page at
http://www.riehle.org/wiki-creole as well as the WikiCreole community at
http://www.wikicreole.com
Dirk
--
Phone: + 1 (650) 215 3459
Web: http://www.riehle.org
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
How are numbered lists implemented in the present grammar? Would it be
hard (in future) to put in some sort of number-from provision or tell
the parser not to insert a </ol>?
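To make the "number-from" idea concrete, here is a minimal sketch of one possible behaviour. It is not how the present grammar or the MediaWiki parser works. The function name render_numbered is hypothetical, and the rule "always continue counting after an interruption" is an assumption made for illustration only. It relies on the standard HTML start attribute of <ol>.

```python
def render_numbered(lines):
    """Render '#' list lines, resuming the count after an interruption.

    Hypothetical rule: every new list fragment continues from the
    number of items already emitted, via <ol start="N">.
    """
    out = []
    count = 0          # items emitted so far, across all fragments
    in_list = False
    for line in lines:
        if line.startswith("#"):
            if not in_list:
                start = count + 1
                out.append("<ol>" if start == 1 else f'<ol start="{start}">')
                in_list = True
            count += 1
            out.append(f"<li>{line[1:].strip()}</li>")
        else:
            if in_list:
                out.append("</ol>")
                in_list = False
            out.append(f"<p>{line}</p>")
    if in_list:
        out.append("</ol>")
    return out


for html_line in render_numbered(
    ["# one", "# two", "[[Image:some-image.png]]", "# three"]
):
    print(html_line)
```

A real design would also need some way to say when numbering should restart, since two unrelated lists on one page should not share a counter.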
- d.
---------- Forwarded message ----------
From: Herta Van den Eynde <herta.vandeneynde(a)gmail.com>
Date: 16 Jan 2008 13:23
Subject: Re: [Mediawiki-l] numbered list broken by image or template
To: MediaWiki announcements and site admin list
<mediawiki-l(a)lists.wikimedia.org>
On 16/01/2008, Kilian <winkelklammern(a)texttheater.de> wrote:
> On Wednesday, 16.01.2008, at 13:38 +0100, Herta Van den Eynde wrote:
> > When you use a numbered list, and insert an image or a template, the
> > numbering is broken.
> > E.g.
> >
> > # one
> > # two
> > [[Image:some-image.png]]
> > # three
> >
> > will display:
> >
> > 1. one
> > 2. two
> >
> > Image:some-image.png
> >
> > 1. three
> >
> >
> > Is there a way to restart the numbering where you left off, so that the
> > third element still reads:
> >
> > 3. three
> >
> > Kind regards,
> >
> > Herta
> >
>
> Hi Herta,
>
> the problem is not the image but the line break. Here's how to mask it
> so as not to break the item:
>
> # one
> # two<br/>[[Image:Some-image.png]]
> # three
>
> ~ Kilian
Thanks, Kilian. That does indeed solve the problem with images.
Unfortunately many (most?) of our templates contain line breaks. Any
way to work around those?
Kind regards,
Herta
--
Herta Van den Eynde
"Life on Earth may be expensive,
but it comes with a free ride around the Sun."
Compare and contrast:
1. <pre> a <nowiki> block </nowiki> </pre>
2. <pre> a <nowiki> block </pre>
3. a <nowiki> block </nowiki>
4. a <nowiki> block
Why is the <nowiki> rendered literally in 2, but stripped out in 1?
My working understanding of nowiki and pre was that both of them
altered the parsing/lexing behaviour, treating everything other than
its closing partner literally. So <pre> <nowiki> </pre> should render
<nowiki> literally, and <nowiki> <pre> </nowiki> should render <pre>
literally. But this doesn't seem to be quite the case.
Would anyone care to hazard a guess as to what the correct behaviour
*should* be? Does anyone rely on one treatment over the other? The
current behaviour seems inconsistent, especially comparing 2 with 4
above.
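For reference, the working model described above (each tag makes everything up to its own closing partner literal) can be sketched as below. This is only an illustration of that model, not the behaviour of the MediaWiki parser. The function name render_literal_blocks is hypothetical, and the handling of an unclosed tag (leave the opener as ordinary text) is an assumption.

```python
def render_literal_blocks(text):
    """Split text into ("text" | "pre" | "nowiki", content) segments.

    Model: <pre> or <nowiki> makes everything up to its own closing
    tag literal, including any other tags found inside.
    """
    out = []
    pos = 0
    while pos < len(text):
        # Find the nearest opening tag of either kind.
        found = [(text.find(f"<{t}>", pos), t) for t in ("pre", "nowiki")]
        found = [(i, t) for i, t in found if i != -1]
        if not found:
            out.append(("text", text[pos:]))
            break
        start, tag = min(found)
        open_end = start + len(tag) + 2          # just past "<tag>"
        close = text.find(f"</{tag}>", open_end)
        if close == -1:
            # Assumption: an unclosed opener stays as ordinary text.
            out.append(("text", text[pos:open_end]))
            pos = open_end
            continue
        if start > pos:
            out.append(("text", text[pos:start]))
        out.append((tag, text[open_end:close]))
        pos = close + len(tag) + 3               # just past "</tag>"
    return out


for case in (
    "<pre> a <nowiki> block </nowiki> </pre>",
    "<pre> a <nowiki> block </pre>",
    "a <nowiki> block </nowiki>",
    "a <nowiki> block",
):
    print(render_literal_blocks(case))
```

Under this model, case 1 would keep both <nowiki> tags as literal text inside the <pre> block, which is exactly where the observed behaviour differs.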
Steve
http://uncyclopedia.org/wiki/Print_Screen
Wikitext: [[Image:Print Screen.gif|thumb|[[Image:Print Screen B.gif|50px]]]]
I expect it would also be useful if you don't have the right script to
hand for a caption.
Steve, does your Image: syntax allow this sort of thing?
- d.
I gather that when wikis were first invented, it was more or less
assumed that everyone knew some HTML and the wikitext syntax language
was simply a shorthand. However, is this still the case, or should it
be considered a markup language in its own right?
Here's a simple example to demonstrate the difference:
----
:one
:two

:three


:four
----
If you consider wikitext to be a markup/formatting/display language,
then you would expect there to be little or no gap between "one" and
"two", a much bigger gap between "two" and "three", and twice as big
again between "three" and "four".
That's not what happens. Instead, it's converted to this:
<dl>
<dd>one</dd>
<dd>two</dd>
</dl>
<dl>
<dd>three</dd>
</dl>
<p><br /></p>
<dl>
<dd>four</dd>
</dl>
The significant thing is that the only difference between one/two and
two/three is that the latter is two separate "definition lists" rather
than two list items in the same list. The visual difference is minute.
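The conversion shown above can be reproduced with a small sketch. It assumes that one blank line merely ends the current list and that two or more blank lines also emit an empty paragraph, which is what the HTML output suggests. It is not the actual MediaWiki code, and the function name colon_lines_to_html is hypothetical.

```python
def colon_lines_to_html(lines):
    """Group consecutive ':' lines into <dl> blocks, as in the output above."""
    out = []
    items = []      # <dd> contents of the list currently being built
    blanks = 0      # consecutive blank lines seen since the last item

    def flush():
        # Emit the pending list, if any, as one <dl> block.
        if items:
            out.append("<dl>")
            out.extend(f"<dd>{item}</dd>" for item in items)
            out.append("</dl>")
            items.clear()

    for line in lines:
        if line.startswith(":"):
            if blanks:
                flush()                      # a blank line ended the list
                if blanks >= 2:
                    out.append("<p><br /></p>")
            items.append(line[1:].strip())
            blanks = 0
        elif not line.strip():
            blanks += 1
        else:
            flush()
            out.append(f"<p>{line}</p>")
            blanks = 0
    flush()
    return out


print("\n".join(colon_lines_to_html(
    [":one", ":two", "", ":three", "", "", ":four"]
)))
```

The point of the sketch is that the blank lines never become vertical space directly. They only decide where one <dl> ends and the next begins.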
So, to properly use the : operator, you need to know how the : is
converted into HTML, then how that HTML will render in most browsers.
Is this really what we want? Don't we generally want the wikitext to
render the way the user expects it to, rather than how HTML dictates
it should render? Should we consider going as far as to convert the
above into <span> tags with styles to indent a certain distance from
the left, rather than abusing the <dl> tag this way?
Opinions and comments please!
Steve
I'm about to head off for a week and a half, so here's a quick
progress report. My ANTLR grammar so far is here:
http://www.mediawiki.org/wiki/User:Stevage/ANTLR
It handles many features, but most aren't really complete.
Supports:
* Internal links
* External links (limited range of characters allowed)
* Images (all options)
* Headings (limits on ='s in the text)
* Nowiki, pre
* French punctuation (foo ? -> foo&nbsp;?)
* HTML entities (&nbsp; is recognised, &foo; is converted to literals)
* Dangerous HTML, < -> &lt; etc
* Bold, italics (supports the basic rules, not the single-character stuff)
* Paragraphs
* Space-indented blocks
* Lists (intentionally doesn't support nested ; lists, does support ;foo:blah)
* ISBN, RFC, PMID (fully, I think)
Does not support:
* Categories
* Tables
* Inline HTML (<b>, <div> etc)
* __TOC__ etc
* HTML comments
Other limitations:
* Very reduced ranges of characters for many things, like it doesn't
know that é is a letter rather than punctuation, for instance
* Case sensitivity in some places (<NOWIKI> is not recognised)
At the moment, it simply builds an AST, but converting from that AST
to HTML should be pretty trivial. I have in mind some simple
tree-cleaning steps first, like concatenating consecutive P blocks
into one (I'm using BR to indicate a gap of two or more new lines),
concatenating consecutive OL etc.
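As a rough illustration of the tree-cleaning step, here is a sketch that merges adjacent sibling nodes of the same kind. The Node class and the merge_adjacent function are hypothetical stand-ins and are not the tree types ANTLR generates. The node kinds P, OL and BR are the ones mentioned above. Everything else is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # e.g. "P", "OL", "BR", "TEXT"
    children: list = field(default_factory=list)
    text: str = ""


MERGEABLE = {"P", "OL"}   # assumption: kinds that are joined when adjacent


def merge_adjacent(node):
    """Return a copy of node with adjacent mergeable siblings joined."""
    merged = []
    for child in node.children:
        child = merge_adjacent(child)           # clean subtrees first
        if merged and child.kind in MERGEABLE and merged[-1].kind == child.kind:
            merged[-1].children.extend(child.children)
        else:
            merged.append(child)
    return Node(node.kind, merged, node.text)


tree = Node("ROOT", [
    Node("P", [Node("TEXT", text="a")]),
    Node("P", [Node("TEXT", text="b")]),
    Node("BR"),
    Node("P", [Node("TEXT", text="c")]),
])
cleaned = merge_adjacent(tree)
print([child.kind for child in cleaned.children])
```

Because a BR node sits between the second and third P blocks, those two are left separate, which matches the idea of using BR to mark a gap of two or more new lines.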
I offer this up just for curiosity's sake - no one should try and hack on it ;)
[hrm, on closer inspection, that's not the latest version of that
file. oh well.]
Steve
Currently you can pretty much use <pre> anywhere with weird,
unpredictable results:
[[foo|Here's some <pre>text</pre>. Blah.]]
The result isn't terribly useful. This case is slightly more useful:
----
[[image:foo.jpg|thumb|Some <pre>
Pre
formatted
text.
</pre>]]
----
Would it be reasonable to restrict the use of <pre> to:
- General paragraph texts, but not internal or external link captions
- Image captions
Perhaps anywhere else it could be either treated as equivalent to
<nowiki>, ignored, or rendered literally?
Thoughts?
Steve
Can someone with the know-how set this list up so that it is carried by
Gmane? It is one of the few MW lists that isn't, and I find it much easier
to read as a newsgroup than a mailing list.
I had a go myself, but didn't have enough info to complete it.
Cheers,
Mark Clements (HappyDog).
I'm surprised to discover that this works:
----
[[Foo|
blah
blah]]
----
And so does this:
----
[[Image:foo.jpg|
blah
|
thumb
]]
----
And even:
----
[[image:foo.jpg|
some text over here|thumb
]]
----
And even more annoyingly:
----
[[Link|
***This is not a list
]]
----
In all these cases, each run of newlines in a link or image caption is
collapsed down to a single space.
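The collapsing described here amounts to something like the following sketch. It only models the observed effect on a caption string. The function name collapse_caption_whitespace is hypothetical, and whether the real parser also trims the leading space is not something I have checked.

```python
import re


def collapse_caption_whitespace(caption):
    """Replace each run of whitespace containing a newline with one space."""
    return re.sub(r"\s*\n\s*", " ", caption)


# Caption part of [[Foo|<newline>blah<newline>blah]]
print(repr(collapse_caption_whitespace("\nblah\nblah")))
```

Note that a line such as "***This is not a list" goes through unchanged apart from the newlines around it, which is why it never gets a chance to be parsed as a list.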
So, the usual questions arise:
1. Does anyone know about, let alone use, this feature?
2. Is it useful?
3. Would anyone mind if it was gone?
My concerns are:
1. Lines that look like lists or space-indented text may not be (which
reduces the number of ways you can parse the text)
2. The behaviour of newlines is different from normal text (normally 2
newlines in a row would give you a paragraph break)
3. The behaviour of newlines is different before and after the first pipe.
Steve