As some of you might remember, I tried to write a wiki(pedia) offline reader/editor a while ago. After some trouble with the GUI, and because someone pointed out (rightly!) that my parser wouldn't handle Unicode, I stopped working on it.
Lately, I had several people asking me about a stand-alone wiki(pedia) parser. The latest related request was on the mailing list yesterday ("Java code...").
Now, I have created a (partly) working parser, written in C++. It is based on a string class I also wrote that natively uses 16-bit characters and thus supports Unicode. (A function to actually import/export MySQL Unicode data remains to be written, though.)
Together with this, I have started rewriting the common Wikipedia functions (skins, languages, user management, etc.). That part is still in its early stages, but it can already render "Wikipedia:How to edit a page" in a half-complete standard skin. This includes a function to parse the "LanguageXX.php" files, so there's no need to rewrite all of those. (The import is a little slow, though, so it won't be a permanent solution; it rather screams "conversion".)
As an example, the whole thing comes as a command-line tool that can render wiki-style text (from a file or a pipe) into HTML, either without a skin or in the standard skin, selected by a parameter.
I have committed the sources (hereby placed under the GPL) to the Wikipedia CVS, in a new module called "Waikiki" (seemed like the natural extension of "wiki" to me ;-)
This could be the basis for several wiki(pedia) software projects:
- Offline editor
- Reader, to be distributed on CD/DVD
- Wiki(pedia) Apache module
So, fire up those compilers and go to work! ;-)
Magnus
Why not just use the existing PHP code? Can't PHP be invoked from the command line just like any C++ program? I'm not very familiar with Wikimedia, but if it's modular then you should just be able to use the existing parser to do exactly what you've done in C++. Plus PHP is free software so it can be used for things like distribution on physical media.
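For what it's worth, the command-line plumbing itself would be simple; the open question is how cleanly the existing parser can be called outside of a web request. A minimal sketch (wikiTextToHtml() is a hypothetical stand-in, not an entry point that exists in the current code):

<?php
// Minimal sketch of a command-line wrapper around the existing PHP parser.
// wikiTextToHtml() is only a placeholder for whatever entry point turns
// wikitext into HTML; it is an assumption, not a function in the code base.
function wikiTextToHtml( $text ) {
    // ...call into the existing parser here...
    return $text; // placeholder: returns the input unchanged
}

$wikitext = file_get_contents( 'php://stdin' ); // read wikitext from a file or a pipe
print wikiTextToHtml( $wikitext );
?>

Invoked as, say, "php render.php < article.txt > article.html", that would cover the file-or-pipe case Magnus describes.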
OK, in the last few days I have heard about Java and C++ parsers. Add mine, in Perl, and there are three. Then there's TomeRaider, plus some others I don't remember clearly. Of course, one could also modify the existing PHP scripts... I think Wikipedia will need to make a decision soon.
As for my code, it's an SQL dump -> static HTML converter written in Perl that is working fairly well (for now, only on the English wiki). I haven't posted any news about it since I've had some fairly busy weeks, and given the current hardware problems, it's better if the focus stays there for a while.
Ah, and while it runs, my script can be configured to download new images and TeX formulas directly from the Wikipedia web server, so of course I will NOT run it again until the new server is up :-)
The script is temporarily hosted at:
http://www.arcetri.astro.it/~puglisi/wiki2static.txt
(renamed to "txt" due to some server misconfig). I don't have web space to host the 1GB output...
Ciao, Alfio
On Sat, 11 Oct 2003, Chris Seaton wrote:
Why not just use the existing PHP code? Can't PHP be invoked from the command line just like any C++ program? I'm not very familiar with Wikimedia, but if it's modular then you should just be able to use the existing parser to do exactly what you've done in C++. Plus PHP is free software so it can be used for things like distribution on physical media.
Magnus Manske magnus.manske@web.de writes:
Now, I have created a (partly) working parser, written in C++. It is based on a string class I also wrote that natively uses 16-bit characters and thus supports Unicode. (A function to actually import/export MySQL Unicode data remains to be written, though.)
...
This could be the basis for several wiki(pedia) software projects:
- Offline editor
- Reader, to be distributed on CD/DVD
- Wiki(pedia) Apache module
I've been working on an Emacs major mode for editing Wikipedia articles offline (which I've placed at [[Image:Wikipedia-mode.el]], for lack of a better location). The major missing feature is the ability to download and upload article source (i.e. the stuff in the edit box) automagically to and from the Wikipedia. Is this the kind of thing your parser could help with?
-- CYD
Magnus Manske wrote in part:
This could be the basis for several wiki(pedia) software projects:
- Offline editor
- Reader, to be distributed on CD/DVD
- Wiki(pedia) Apache module
In the short term, I think it would be nice to:
* Split the PHP parser off from the OutputPage class, into a PhpParser class.
* Create another PHP class called WaikikiParser (see the sketch below). This would call Waikiki through whatever means are convenient. Ideally, it would be a statically linked PHP extension; apparently that's the only means available for efficiently calling C++ from PHP.
* Allow people to pick which one they want to use in LocalSettings.php.
So in Setup.php we'd have:
$wgParser = new $wgParserName;
And in OutputPage.php we'd have:
function addWikiText( $text ) {
    global $wgParser;
    $this->addHTML( $wgParser->parse( $text ) );
}
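To make the WaikikiParser idea above a bit more concrete, here is a rough sketch of what that class might look like. The waikiki_parse() function is a hypothetical name for whatever the statically linked C++ extension would export, and PhpParser is the split-off pure-PHP parser class from the first bullet; neither exists yet.

<?php
// Rough sketch only: waikiki_parse() is a hypothetical extension function,
// and PhpParser is the proposed split-off pure-PHP parser class.
class WaikikiParser {
    function parse( $text ) {
        if ( function_exists( 'waikiki_parse' ) ) {
            // C++ parser, reached through the statically linked extension
            return waikiki_parse( $text );
        }
        // Fall back to the pure-PHP parser if the extension isn't compiled in
        $fallback = new PhpParser();
        return $fallback->parse( $text );
    }
}
?>

With something like that in place, LocalSettings.php would only need to set $wgParserName to 'WaikikiParser' or 'PhpParser', and the Setup.php and OutputPage.php snippets above would work unchanged.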
Just my opinion. I don't know if I will have time to implement this myself.
-- Tim Starling.