Hi,
I've been coding a wiki parser in JavaScript in the hope that it could be of some use to the project (especially in giving the servers some relief).
You can view a demo here: http://gusanos.sourceforge.net/wp/wikitest.htm
So far it supports the following features:
*Headings (all levels)
*Normal paragraphs
*Internal and external links (even with hidden namespaces and parentheses)
*Normal inline formatting (italics and bold)
*Lists and definition lists (can be nested)
*Tables with full nesting (can even nest other tables)
*Horizontal bars
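As a rough illustration of the line-based approach, here is what a heading rule might look like (a simplified sketch, not necessarily the exact code the demo uses):

    // Sketch of a line-based heading rule; the demo's real code may differ.
    function convertHeading(line) {
        var m = line.match(/^(={1,6})(.+?)\1\s*$/); // e.g. "== Title =="
        if (!m) return null;                        // not a heading line
        var level = m[1].length;
        var text = m[2].replace(/^\s+|\s+$/g, '');  // trim surrounding spaces
        return '<h' + level + '>' + text + '</h' + level + '>';
    }
    // convertHeading('== Features ==') -> "<h2>Features</h2>"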
What still needs to be done:
*<nowiki> tags
*Undesired HTML stripping
*Images
*Interproject links
*Interwiki links
*Categories
*Signatures
*TOC
*Hieroglyphs
*Templates
*?
I think this could be useful for quick previews, avoiding extra server hits.
Some notes:
I've tested it in the following browsers (in all of which it more or less works): Firefox, Opera 7, Konqueror 3.3, Internet Explorer 6.
IE5 just yields an error and does nothing, but I don't feel like wasting time making it work there.
What do you think?
-Pedro Fayolle (aka Pilaf)
This is a great script. These links might be useful to you when making the script display images: the first folder is the first character of an MD5 hash of the image name, and the sub-folder is the first two characters of the same hash. E.g. the MD5 of Wiki.jpg is dfc8b3d43bf2b5d7ef76c43459f6b06f, so the image path ends: /d/df/Wiki.jpg
http://pajhome.org.uk/crypt/md5/ http://pajhome.org.uk/crypt/md5/md5src.html
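For example, with the hex_md5() function from the library linked above, the path suffix could be derived like this (a sketch; note that MediaWiki hashes the name with spaces replaced by underscores):

    // Sketch: derive the image path suffix with hex_md5() from the
    // pajhome.org.uk library linked above.
    function imagePathSuffix(name) {
        name = name.replace(/ /g, '_');   // MediaWiki hashes the underscore form
        var hash = hex_md5(name);         // "dfc8b3d4..." for "Wiki.jpg"
        return '/' + hash.charAt(0) + '/' + hash.substr(0, 2) + '/' + name;
    }
    // imagePathSuffix('Wiki.jpg') -> "/d/df/Wiki.jpg"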
Pedro Fayolle wrote:
Hi,
I've been coding a wiki parser in JavaScript with the hope it could be of some use for the project (especially in giving some relief to the servers).
You can view a demo here: http://gusanos.sourceforge.net/wp/wikitest.htm
This works very well, but is it really a *parser* or "just" a *converter*? Don't get me wrong, I fell for the same thing at least twice :-)
Timwi and I (well, mostly Timwi) have been working on a wiki-to-XML *parser*, written in Bison, a parser-generator language that outputs C code. It is already pretty advanced (HTML parsing is currently kinda broken, though), and having an XML file as output beats wiki-to-HTML, as it can be converted into (X)HTML just as easily (relatively speaking ;-) as into PDF, OpenOffice XML, RDF, or something else entirely (even back into wikitext, as a markup beautifier!).
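To give an idea, such a parser might turn a heading and a bold word into a tree like this (element names invented here for illustration; the real flexbisonparse output may differ):

    <article>
      <heading level="2">Example</heading>
      <paragraph>Some <bold>bold</bold> text.</paragraph>
    </article>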
So, if you've written a converter, well, great, but I wrote at least three of them in C++ and abandoned them all at different stages of development (at least one did most of what yours does, including TOCs). Language flamewars aside, I think exchanging the current PHP converter for another is such a complex task (in terms of integration with the existing framework) that "merely" improving conversion speed might not be worth the effort.
If, however, you've written a "real" parser, it would be most interesting to know if it could produce XML instead.
Anyway, you might like to look at the Bison code of Timwi's parser, in the CVS module "flexbisonparse". You can try it with the CVS HEAD Wikipedia, following the instructions in "ParserXML.php", in case you're interested.
Magnus
Magnus Manske (magnus.manske@web.de) [050124 09:51]:
Pedro Fayolle wrote:
You can view a demo here: http://gusanos.sourceforge.net/wp/wikitest.htm
So, if you've written a converter, well, great, but I wrote at least three of them in C++ and abandoned them all at different stages of development (at least one did most of what yours does, including TOCs). Language flamewars aside, I think exchanging the current PHP converter for another is such a complex task (in terms of integration with the existing framework) that "merely" improving conversion speed might not be worth the effort.
The difference with this one is that it's browser-side. As an editor, I would find something like this most useful for those times when the wiki is too slow even to give previews!
What sort of load does hitting 'preview' put on the site?
(I suppose it'd still have to load images and so on to really work ...)
- d.
On Mon, Jan 24, 2005 at 09:54:57AM +1100, David Gerard wrote:
(I suppose it'd still have to load images and so on to really work ...)
The images should be cached, and even if they're not, they're static content and should only hurt bandwidth-wise, not server-load-wise.
I guess I don't have such a clear understanding of what "parser" stands for and what it doesn't, so I might have misnamed my script; it better fits the definition of a converter, since it's wiki-to-HTML only. Sorry about the confusion.
Anyway, the aim of this script is not to add flexibility to wiki code, as Timwi's project does (which looks very promising, BTW), but to allow editors to generate instantaneous page previews in their browsers, enhancing their editing experience while reducing the load on the servers.
I haven't tried this yet, but I think a live-preview feature could be attached to the edit page by adding the script to the footer of the page (which is a simple MediaWiki message). I have done this before with the insertable special characters box on the Spanish wiki and it worked just fine (it seems the English wiki is using one now too, though I haven't checked if it's the same method). I will try it later on my own MediaWiki installation.
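A rough sketch of the glue code (wiki2html() is a made-up name standing in for the converter's entry point; wpTextbox1 is MediaWiki's edit textarea):

    // Hypothetical glue code for a client-side preview on the edit page.
    function livePreview() {
        var textarea = document.getElementById('wpTextbox1'); // MediaWiki's edit box
        var box = document.getElementById('livePreviewBox');
        if (!box) {
            box = document.createElement('div');
            box.id = 'livePreviewBox';
            textarea.parentNode.insertBefore(box, textarea);
        }
        box.innerHTML = wiki2html(textarea.value); // wiki2html() is hypothetical
    }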
Once again, sorry about the confusion. I will rename the project to avoid further misunderstandings.
-Pedro
Having 50% of the screen occupied by the wiki source and 50% by the preview when editing could be cool, especially if the preview is in a scrollable box, scrolled by default to the position of the cursor in the textarea and updated in real time. (This would have to be optional, since it could be dangerous for slow computers :D)
Even if the preview isn't perfect, the fact that it could be "real time" and on the client side makes it useful, I think.
Black Fox wrote:
Having 50% of the screen occupied by the wiki source and 50% by the preview when editing could be cool, especially if the preview is in a scrollable box, scrolled by default to the position of the cursor in the textarea and updated in real time. (This would have to be optional, since it could be dangerous for slow computers :D)
Even if the preview isn't perfect, the fact that it could be "real time" and on the client side makes it useful, I think.
I think a better way to produce instant previews would be using Mozile, or the IE equivalent, whatever it's called.
That way you get WYSIWYG editing, not just fast previews. Just press F7 on any page view (except ones containing irreversible wikitext to XHTML conversion), and you can edit the text right there on the view page.
Converting back from XHTML to wikitext is made a bit difficult because it's ambiguous. The algorithm would have to minimise the differences between the old wikitext and the new wikitext.
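One (purely hypothetical) way to express that idea: when a rendered element could map back to several wikitext spellings, prefer whichever spelling the old wikitext already contains:

    // Sketch of the difference-minimising choice; not a real algorithm.
    function chooseMarkup(candidates, oldWikitext) {
        for (var i = 0; i < candidates.length; i++) {
            if (oldWikitext.indexOf(candidates[i]) != -1) return candidates[i];
        }
        return candidates[0]; // fall back to a canonical form
    }
    // A <b> element could have come from either spelling:
    // chooseMarkup(["'''bold'''", "<b>bold</b>"], oldText)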
Infobox-style templates can be handled by putting a separate editable div around them. Articles with templates that are inline with text couldn't be handled in this way.
Just something I've been musing about. It was also discussed on meta a while ago.
-- Tim Starling
Having 50% of the screen occupied by the wiki source and 50% by the preview when editing could be cool, especially if the preview is in a scrollable box, scrolled by default to the position of the cursor in the textarea and updated in real time. (This would have to be optional, since it could be dangerous for slow computers :D)
As-you-type previews with the current code are not something I would recommend, especially for long pages, as the script would run through the entire wikicode every time you press a key. Anyway, what's the big deal about hitting the preview button every once in a while?
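If someone really wanted as-you-type behaviour, the cost could at least be bounded by re-converting only after the user pauses. A sketch, again with the hypothetical wiki2html() standing in for the converter:

    // Throttling sketch: only re-convert after a pause in typing.
    var previewTimer = null;
    function schedulePreview(textarea, box) {
        if (previewTimer) clearTimeout(previewTimer);
        previewTimer = setTimeout(function () {
            box.innerHTML = wiki2html(textarea.value); // wiki2html() is hypothetical
        }, 500); // wait half a second after the last keystroke
    }
    // Wiring: textarea.onkeyup = function () { schedulePreview(textarea, box); };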
Even if the preview isn't perfect, the fact that it could be "real time" and on the client side makes it useful, I think.
That's the whole point of it :o) I'll try to make it as feature-complete as possible, though, so it should be close to perfect in the end.
-Pedro
Pedro Fayolle wrote:
| I've been coding a wiki parser in JavaScript in the hope that it could be of some use to the project (especially in giving the servers some relief).
[snip]
| I think this could be useful for quick previews, avoiding extra server hits.
Honestly, I don't think this is a viable path.
Preview is valuable because it produces *exactly* the output that the wiki does. A JavaScript work-alike parser is unlikely to work exactly the same even in the best case, and isn't going to provide for extensions at all without invoking the PHP parser on the server.
Having two parallel parsers is also a bad practice, introducing extra work in maintaining them both and keeping them in sync. Someday we may have a real working 'alternate' parser, but if so it's going to have to prove itself worth the effort of maintaining it; as it is we have enough trouble with sloppily-written pages not working when copied to another wiki that's not running with tidy to clean up HTML kinks, and that's just a post-processing phase rather than the parser itself.
There's some experimental code in 1.5 for fetching previews via an XMLHttpRequest, which avoids the skin rendering overhead (and in many cases should avoid message cache initialization overhead) required for a full HTML page submission, while letting the PHP parser return real rendering results. Most of the time spent in rendering non-trivial pages is concentrated in a few hotspots (particularly title normalization, link checking, and link generation), and IMO optimization effort would be better spent on these.
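The client side of such a scheme might look roughly like this (the URL and parameter below are made up for illustration; the actual experimental 1.5 interface may differ):

    // Sketch of fetching a server-rendered preview via XMLHttpRequest.
    function fetchPreview(wikitext, callback) {
        var req = window.XMLHttpRequest ? new XMLHttpRequest()
                                        : new ActiveXObject('Microsoft.XMLHTTP'); // IE6
        req.open('POST', '/index.php?action=ajaxpreview', true); // hypothetical endpoint
        req.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
        req.onreadystatechange = function () {
            if (req.readyState == 4 && req.status == 200) callback(req.responseText);
        };
        req.send('text=' + encodeURIComponent(wikitext));
    }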
This is not to say that a JavaScript wikitext parser is useless; but I don't think we would be able to use it for things like previews.
-- brion vibber (brion @ pobox.com)
Preview is valuable because it produces *exactly* the output that the wiki does. A JavaScript work-alike parser is unlikely to work exactly the same even in the best case, and isn't going to provide for extensions at all without invoking the PHP parser on the server.
I never intended this to be a replacement for the current preview, as it is simply impossible to provide a feature-complete preview without invoking server-side processing. On the contrary, this is just an extension for people who understand the limitations of using such an alternative. For those people I believe it could prove very useful.
Having two parallel parsers is also a bad practice, introducing extra work in maintaining them both and keeping them in sync. Someday we may have a real working 'alternate' parser, but if so it's going to have to prove itself worth the effort of maintaining it; as it is we have enough trouble with sloppily-written pages not working when copied to another wiki that's not running with tidy to clean up HTML kinks, and that's just a post-processing phase rather than the parser itself.
Now, I don't even intend for this to become integrated with the current software; I don't see the need for it as long as there are ways of customizing one's wiki (namely with monobook.js). Of course, not everyone would understand the difference between server-side and client-side previews, so having this enabled by default would be a big mistake.
So, in other words, I want this to be some sort of plug-in which particular users could install for themselves _if_ they want to. I'm sure there's nothing wrong with that. What's more, I think MediaWiki could do with some more ways of customizing user interfaces.
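Installing it could then be as simple as one line in the user's monobook.js (the script URL below is made up for illustration):

    // Hypothetical monobook.js snippet: load the converter for this user only.
    document.write('<script type="text/javascript" '
        + 'src="http://example.org/wikiconverter.js"><\/script>');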
There's some experimental code in 1.5 for fetching previews via an XMLHttpRequest, which avoids the skin rendering overhead (and in many cases should avoid message cache initialization overhead) required for a full HTML page submission, while letting the PHP parser return real rendering results. Most of the time spent in rendering non-trivial pages is concentrated in a few hotspots (particularly title normalization, link checking, and link generation), and IMO optimization effort would be better spent on these.
Sounds really interesting. I'm really for that kind of approach. Look at GMail, for instance: I think it's amazing. MediaWiki could take great advantage of those kinds of techniques.
This is not to say that a JavaScript wikitext parser is useless; but I don't think we would be able to use it for things like previews.
Hm... since I can't fully agree here, I can only hope I've stated my view clearly. In the meantime I will continue to enhance and extend the script to support as many features as possible.
Regards,
-Pedro