I hacked a little C++ utility (hereby under GPL, source files attached) that converts HTML to wiki markup. A few points:
* It should compile on any Windows/*nix system (you might have to turn off warnings, though)
* Pipe the HTML in, and get wiki markup out
* Shouldn't touch existing wiki markup
* Keeps HTML if there's no wiki markup defined for it
* Other wiki markup (for other wikis) can be added with only a few lines of source
* Internally uses a new string class with 32-bit chars (potential for Unicode there; it should also work with Unicode "as is")
Downsides:
* Doesn't check HTML/wiki markup validity (broken HTML will become broken wiki markup, which might be less bad, though)
* Ignores <nowiki> (though I don't think that matters)
Idea: Have a checkbox on the edit page (or maybe in the preferences instead) that says: "Convert HTML to wiki markup on preview"
Conversion *should* only take place prior to preview, so a human can make certain nothing's broken.
Magnus Manske
I'd like to hook this up to a CGI script to make a web page where people can paste in HTML and get wikitext out by clicking a button. That would make it easy to convert HTML tables I come across into wikitables quickly, with just a couple of copy-and-pastes. However, I seem to be completely inept at compiling this. I tried "cc main.cpp", but ld complains about Undefined symbols: std:: ....
I apologize for seeming clueless, but my only C++ experience is with Visual C++. Is my build environment messed up, or do I need to compile differently? Perhaps a makefile would be helpful here. I'm running Mac OS X 10.3 with the developer tools installed.
- David
Wikitech-l mailing list Wikitech-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
David,
I haven't looked at the source, but try 'g++ main.cpp'; it's a C++ program (std is a namespace, a feature that C doesn't know about).
Cheers, Ivan
Following Ivan's instructions, I got it to compile. I guess I was mistaken in assuming that "cc" auto-detects the language based on file type and calls the correct compiler. Needless to say, the --help output was misleading. However, the resulting executable seems to be broken; the following inputs produce the following outputs:
input: <i>day</i>
output: ''dayday
input: <table border=1 cellpadding=2 cellspacing=0> <tr> <td>eI</td> <td>eI</td> <td> <i>day</i> </td> </tr> </table>
output: {| border="1" cellpadding="2" cellspacing="0" |eIeI<||eIeI<|| ''dayday< < <
I appreciate your hard work in coding this, Magnus. I find it hard to believe the code you wrote is this profoundly broken. Perhaps I didn't compile it correctly? I haven't analyzed the code at all yet (that's my next step), but it looks to me like the parsing is (mostly) correct and the wrong variables are being output, or something. Thanks in advance for anyone's help.
- David
New wikitask: transform the 698 lines of C++ into 15 lines of Perl :) Any takers? I'm too busy at the moment.
(With sincere apologies to Magnus)
Cheers, Ivan
On Mon, May 03, 2004 at 04:11:43PM -0400, David Friedland wrote:
I want to try to connect this somehow to a CGI to make a webpage where people can paste in HTML and get wikitext out by clicking a button. This would make it easy to convert HTML tables I run across to wikitables quickly, by just a couple copy and pastes.
Try the tremendous table2wiki.py from the pywikipediabot at http://pywikipediabot.sf.net
Ideally, copy the HTML table into Wikipedia and run the bot over it.
ciao tom
--- "Thomas R. Koll" tomk32@gmx.de wrote:
Try the tremendous table2wiki.py from the pywikipediabot at http://pywikipediabot.sf.net
Earlier in this thread, someone asked for a Perl program similar to Magnus' C++ HTML-to-wikitext converter. I decided to give it a whirl and wound up writing HTML::WikiConverter (for lack of a better package name). I've created a small Mason page that uses the module, at:
http://diberri.dyndns.org/html2wiki.html
More instructions are at that URL. Basically, just enter an HTML snippet into the text box, press the convert button, and wikitext magically appears :-) I'll be posting the source of the module within the next day or so (after I add a few more features, like IMG-tag processing).
Cheers, David (User:Diberri)
wikitech-l@lists.wikimedia.org