Anyone written an MS word/javascript macro to convert to/from MS word format? I was just thinking it might be kind of cool to be able to pop the code of the page out into word, where ==Level 2== would be converted into the Heading 2 format and so forth. It might even be able to embed images within the article.
Then, when you were finished editing, it could convert it all back again.
Stupid idea? :)
Steve
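A minimal sketch of the heading mapping described above, written in Python rather than Word's macro language. The function names are hypothetical, and it assumes Word's built-in styles are named "Heading 1", "Heading 2" and so on; it only handles headings, not images or other markup.

```python
import re

# Matches a whole-line MediaWiki heading such as "==Level 2==".
HEADING_RE = re.compile(r"^(={1,6})\s*(.*?)\s*\1\s*$")

def heading_to_style(line):
    """Return (style_name, text) for a heading line, or None for any other line."""
    m = HEADING_RE.match(line)
    if m is None:
        return None
    level = len(m.group(1))          # number of "=" signs gives the heading level
    return f"Heading {level}", m.group(2)

def style_to_heading(style_name, text):
    """Inverse mapping: ("Heading 2", "Level 2") becomes "==Level 2==". """
    level = int(style_name.split()[-1])
    marks = "=" * level
    return f"{marks}{text}{marks}"

print(heading_to_style("==Level 2=="))
print(style_to_heading("Heading 2", "Level 2"))
```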
"Steve Bennett" wrote:
Anyone written an MS word/javascript macro to convert to/from MS word format? I was just thinking it might be kind of cool to be able to pop the code of the page out into word, where ==Level 2== would be converted into the Heading 2 format and so forth. It might even be able to embed images within the article.
Then, when you were finished editing, it could convert it all back again.
Stupid idea? :)
Steve
Well, not really stupid, as we had people trying to upload Word documents, but complicated. Word is a proprietary format. The data is binary and there's no public specification. The format has more or less been reverse-engineered so that programs such as OpenOffice or AbiWord can work with it, but there are still issues, especially with embedded objects (images, equations...).
Maybe it could be done with RTF. Easier of course with OpenDocument http://tools.wikimedia.de/~magnus/wiki2xml/w2x.php ;) BTW Magnus, your page says it's able to export to PDF, but there's no such option in the tool. I think you used the XML export and then converted it to PDF, but I found no references on how to do it. Maybe you should explain it on the page.
On 5/21/06, Platonides Platonides@gmail.com wrote:
Well, not really stupid, as we had people trying to upload Word documents, but complicated. Word is a proprietary format. The data is binary and there's no public specification. The format has more or less been reverse-engineered so that programs such as OpenOffice or AbiWord can work with it, but there are still issues, especially with embedded objects (images, equations...).
Oh, I wasn't thinking of trying to actually read Word documents, but simply using Word's macro language to generate Word content on the fly. So, you'd be in Wikipedia, and using Mozex or similar, extract the current page markup, and magically turn it into something that could be pasted into Word. After editing, a different macro would do the reverse - turn the Word content (disregarding anything not supported by MediaWiki) back into Wiki markup.
The major issue would obviously be achieving idempotency: towiki(toword(X)) = X. Otherwise, every time you went through the process, all the text on the page would risk being changed, causing spurious diffs etc...
Steve
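A rough sketch of how the property towiki(toword(X)) = X could be checked (strictly speaking this is a round-trip property rather than idempotency). The two converters below are toy stand-ins that only understand headings, and every name here is hypothetical; real macros would cover far more markup.

```python
import re

HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*\1$")

def to_word(wikitext):
    """Toy stand-in for the wiki->Word macro: a list of (style, text) paragraphs."""
    paragraphs = []
    for line in wikitext.split("\n"):
        m = HEADING_RE.match(line)
        if m:
            paragraphs.append((f"Heading {len(m.group(1))}", m.group(2)))
        else:
            paragraphs.append(("Normal", line))
    return paragraphs

def to_wiki(paragraphs):
    """Toy stand-in for the Word->wiki macro."""
    lines = []
    for style, text in paragraphs:
        if style.startswith("Heading "):
            marks = "=" * int(style.split()[1])
            lines.append(f"{marks}{text}{marks}")
        else:
            lines.append(text)
    return "\n".join(lines)

def round_trips(wikitext):
    """True when to_wiki(to_word(X)) == X."""
    return to_wiki(to_word(wikitext)) == wikitext

print(round_trips("==History==\nSome text."))    # heading written without padding
print(round_trips("== History ==\nSome text."))  # padding is lost on the way back
```

The second call illustrates the risk: the toy converter drops the spaces inside the heading, so an untouched line would still show up as changed.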
maybe this could be solved by the implementation of a wysiwyg editor. For example, tinyMCE automatically translates copy/pastes from Word into html.
Plyd
On 5/21/06, Steve Bennett stevage@gmail.com wrote:
On 5/21/06, Platonides Platonides@gmail.com wrote:
Well, not really stupid, as we had people trying to upload Word documents, but complicated. Word is a proprietary format. The data is binary and there's no public specification. The format has more or less been reverse-engineered so that programs such as OpenOffice or AbiWord can work with it, but there are still issues, especially with embedded objects (images, equations...).
Oh, I wasn't thinking of trying to actually read Word documents, but simply using Word's macro language to generate Word content on the fly. So, you'd be in Wikipedia, and using Mozex or similar, extract the current page markup, and magically turn it into something that could be pasted into Word. After editing, a different macro would do the reverse - turn the Word content (disregarding anything not supported by MediaWiki) back into Wiki markup.
The major issue would obviously be achieving idempotency: towiki(toword(X)) = X. Otherwise, every time you went through the process, all the text on the page would risk being changed, causing spurious diffs etc...
Steve _______________________________________________ Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
On 5/21/06, Plyd wiki.vincent@amplyd.com wrote:
maybe this could be solved by the implementation of a wysiwyg editor. For example, tinyMCE automatically translates copy/pastes from Word into html.
Sounds excellent, a tiny in place content editor that avoids the need to even use wiki syntax.
Steve
Steve Bennett wrote:
On 5/21/06, Plyd wiki.vincent@amplyd.com wrote:
maybe this could be solved by the implementation of a wysiwyg editor. For example, tinyMCE automatically translates copy/pastes from Word into html.
Sounds excellent, a tiny in place content editor that avoids the need to even use wiki syntax.
Steve
I dunno... I have TinyMCE embedded into my copy of Movable Type, and I can tell you it does some /serious/ reformatting every time you load it. There's another user on that copy of MT who always copy-pastes stuff from Word, and things like unclosed tags frequently ensure that the resulting page is mangled to pieces. TinyMCE attempts to account for mismatched tags, but the result is that it italicizes the entire post because the user didn't think to close <i>, an inline tag that doesn't even need to be closed in HTML.
TinyMCE might work for things like forums, where the post is rarely edited and the forum code is relatively simple. But with MediaWiki's rather complicated wiki syntax ([[Template:Infobox Language]] anyone?), combined with various editors' coding idiosyncrasies, even an embedded code highlighter would be a pain to implement AFAICT.
On 5/22/06, Minh Nguyen mxn@zoomtown.com wrote:
I dunno... I have TinyMCE embedded into my copy of Movable Type, and I can tell you it does some /serious/ reformatting every time you load it.
[snip]
I was going to stay quiet on this one, but someone else is objecting...
WYSIWYG editing isn't what we want most editors on Wikipedia to have.
We want them to have WYSIWYM (What you see is what you mean). We generally do not want editors fussing with appearance too much while working on content, at least so long as we want people to stay productive (rather than spending hours futzing with getting the most attractive customized fonts for every heading) and so long as we want the look and feel to be even remotely consistent.
Our current system gets us pretty close to this, although the syntax does present a little learning curve. However, if people are really put off by the inability to do pretty formatting without a minimal amount of studying... can we really expect them to grok wiki writing overall?
So, improved editing systems.. not a bad thing, so long as we stay clear of turning the interface into a toy.
Gregory Maxwell wrote:
On 5/22/06, Minh Nguyen mxn@zoomtown.com wrote:
I dunno... I have TinyMCE embedded into my copy of Movable Type, and I can tell you it does some /serious/ reformatting every time you load it.
[snip]
I was going to stay quiet on this one, but someone else is objecting...
WYSIWYG editing isn't what we want most editors on Wikipedia to have.
We want them to have WYSIWYM (What you see is what you mean). We generally do not want editors fussing with appearance too much while working on content, at least so long as we want people to stay productive (rather than spending hours futzing with getting the most attractive customized fonts for every heading) and so long as we want the look and feel to be even remotely consistent.
Our current system gets us pretty close to this, although the syntax does present a little learning curve. However, if people are really put off by the inability to do pretty formatting without a minimal amount of studying... can we really expect them to grok wiki writing overall?
So, improved editing systems.. not a bad thing, so long as we stay clear of turning the interface into a toy.
Well, I would be more in favor of having a built-in code highlighter instead of a WYSIWYG editor. I think it'd improve readability, especially with the new endnote syntax threatening to take over some articles. I was going to propose doing something like this for the Summer of Code, but I just couldn't think of an elegant way to implement it. Using TinyMCE for code would require converting HTML-encoded HTML/wikitext into HTML/wikitext, and a feature using Flash or Java probably wouldn't be that popular with users.
A developer for the XUL Widgets project http://xulwidgets.mozdev.org/ has been doing some thinking about this, though: http://weblogs.mozillazine.org/weirdal/archives/015930.html and http://weblogs.mozillazine.org/weirdal/archives/015932.html. A pretty-printing editor written in XUL for Firefox and Netscape 6/7 would be pretty cool IMO.
On 5/22/06, Minh Nguyen mxn@zoomtown.com wrote:
Well, I would be more in favor of having a built-in code highlighter instead of a WYSIWYG editor. I think it'd improve readability, especially with the new endnote syntax threatening to take over some articles. I was going to propose doing something like this for the Summer of Code, but I just couldn't think of an elegant way to implement it. Using TinyMCE for code would require converting HTML-encoded HTML/wikitext into HTML/wikitext, and a feature using Flash or Java probably wouldn't be that popular with users.
I wasn't saying we should directly take TinyMCE. This was just an example of what could be done.
About WYSIWYG, I think that we could limit its functionality in order to encourage people to write "normally" and not add lots of useless, stupid tags. But WYSIWYG has real value for tables & co., because for beginners they are a complete mess. What about a hybrid system?
Plyd
On 5/22/06, Plyd wiki.vincent@amplyd.com wrote:
About WYSIWYG, I think that we could limit its functionality in order to encourage people to write "normally" and not add lots of useless, stupid tags. But WYSIWYG has real value for tables & co., because for beginners they are a complete mess. What about a hybrid system?
Ok, there are obviously two different meanings to WYSIWYG that are being used.
1) Allowing free formatting, such that any combinations of colours, fonts, font sizes etc can be used (not a good idea)
2) Allowing normal Wiki formatting (bold, italics, headings, tables, images) to be edited graphically, rather than through Wiki syntax (good idea?)
Steve
Steve Bennett wrote:
On 5/22/06, Plyd wiki.vincent@amplyd.com wrote:
About WYSIWYG, I think that we could limit its functionality in order to encourage people to write "normally" and not add lots of useless, stupid tags. But WYSIWYG has real value for tables & co., because for beginners they are a complete mess. What about a hybrid system?
Ok, there are obviously two different meanings to WYSIWYG that are being used.
- Allowing free formatting, such that any combinations of colours,
fonts, font sizes etc can be used (not a good idea)
We have this right now, so I can tell you right now that this is orthogonal to the issue of WYSIWYG (or WYSIWYM).
Fancy colors aren't something we'd want to make _easy_ because they're not something we generally encourage in clean text. But they'd be _possible_ because they're possible in our markup.
- Allowing normal Wiki formatting (bold, italics, headings, tables,
images) to be edited graphically, rather than through Wiki syntax (good idea?)
At least potentially this is a good idea.
In practice, public-ready implementation will have to hold off until the markup is defined well enough that we have reversible transformations and can 'hold out' fancy stuff like images, templates, extensions, and parser functions cleanly into WYSIWYM-friendly chunks.
We will likely not have "true" WYSIWYG in the sense of the editing view being exactly 100% one and the same as the reading view, because things like embedded data, plugins, templates etc really require some separation.
I'd like to get a working group together to start making the specification happen soon:
http://www.mediawiki.org/wiki/Markup_spec
-- brion vibber (brion @ pobox.com)
On 5/22/06, Brion Vibber brion@pobox.com wrote:
Fancy colors aren't something we'd want to make _easy_ because they're not something we generally encourage in clean text. But they'd be _possible_ because they're possible in our markup.
Agree, and agree with not making them part of any WYSIWYG package, precisely to avoid making them more accessible than necessary.
In practice, public-ready implementation will have to hold off until the markup is defined well enough that we have reversible transformations and can 'hold out' fancy stuff like images, templates, extensions, and parser functions cleanly into WYSIWYM-friendly chunks.
Can you elaborate on what WYSIWYM means in this context? What's the distinction? Our [[WYSIWYM]] article is not helpful.
Also, why the dependency on a well-defined markup? To avoid implementing a moving target?
We will likely not have "true" WYSIWYG in the sense of the editing view being exactly 100% one and the same as the reading view, because things like embedded data, plugins, templates etc really require some separation.
That's fine. At some stage, fine tuning will always require hand tweaking. But just basic editing, refactoring, reformatting etc would be great. Even something that displays {{templates}} and [[Category:Links]] like that would be fine, IMHO.
There'd be a slight question over how to display <nowiki> code, I suppose. True WYSIWYG would display <nowiki>[[link]]</nowiki> as simply: [[link]], but that would conflict with my suggestion above, and would be ambiguous for the reverse transformation (<nowiki>[[</nowiki>link]] would be mapped onto the same).
I guess you already know about all these issues :)
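A small illustration of the <nowiki> ambiguity, as a sketch only: `displayed_text` is a hypothetical helper that models nothing but the stripping of <nowiki> wrappers, which is the one step relevant to the point above.

```python
import re

NOWIKI_RE = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)

def displayed_text(wikitext):
    """What a 'true WYSIWYG' view would show: nowiki wrappers dropped, contents kept."""
    return NOWIKI_RE.sub(lambda m: m.group(1), wikitext)

source_a = "<nowiki>[[link]]</nowiki>"
source_b = "<nowiki>[[</nowiki>link]]"

# Two different sources collapse onto the same displayed text, so a reverse
# transformation cannot tell which of them the editor originally wrote.
print(displayed_text(source_a))
print(displayed_text(source_b))
print(source_a == source_b)
```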
I'd like to get a working group together to start making the specification happen soon:
What's required before that starts? Can people just start hacking on it, wiki style?
Steve
On Mon, May 22, 2006 at 12:02:56PM +0200, Steve Bennett wrote:
I'd like to get a working group together to start making the specification happen soon:
What's required before that starts? Can people just start hacking on it, wiki style?
I just did... :-)
Cheers, -- jra
On 5/22/06, Minh Nguyen mxn@zoomtown.com wrote:
Well, I would be more in favor of having a built-in code highlighter instead of a WYSIWYG editor.
I sometimes edit using a wikitext mode in emacs.. it is fantastic and removes a number of annoyances, such as when a stray tick causes a huge chunk of an article to be stuck in bold.
That would very much be a worthwhile feature.
"Gregory" == Gregory Maxwell gmaxwell@gmail.com writes:
I sometimes edit using a wikitext mode in emacs.. it is fantastic and removes a number of annoyances, such as when a stray tick causes a huge chunk of an article to be stuck in bold.
Just curious: What wikitext mode do you use?
Uwe Brauer
On 5/22/06, Gregory Maxwell gmaxwell@gmail.com wrote:
We want them to have WYSIWYM (What you see is what you mean). We generally do not want editors fussing with appearance too much while working on content, at least so long as we want people to stay productive (rather than spending hours futzing with getting the most attractive customized fonts for every heading) and so long as we want the look and feel to be even remotely consistent.
I don't think anyone has proposed allowing customised fonts for each heading. By WYSIWYG, I simply mean "You see bold when you want bold - you don't see ''' ". What distinction are you getting at with "WYSIWYM"?
Our current system gets us pretty close to this, although the syntax does present a little learning curve. However, if people are really put off by the inability to do pretty formatting without a minimal amount of studying... can we really expect them to grok wiki writing overall?
You're belittling the value of such a project by suggesting it would only be useful for those "unable" to cope with wiki syntax. I have no problems with it, but I suspect I would be faster editing directly with word, being able to select text, press ctrl+b to bold/unbold etc. But god, can we please not turn this discussion into an argument on the value of WYSIWYG in general? There are those who like it, and those who don't. So long as those who like it do not impinge upon those who don't (by forcing it on them, reformatting whole articles etc) and as long as there is someone willing to develop such a thing, there should not be a problem.
So, improved editing systems.. not a bad thing, so long as we stay clear of turning the interface into a toy.
I'm not sure what you mean by that.
I might have a go at hacking up a MS Word macro conversion script. It could be cute at any rate.
Steve
On 5/22/06, Steve Bennett stevage@gmail.com wrote:
I might have a go at hacking up a MS Word macro conversion script. It could be cute at any rate.
Actually this isn't a bad start: http://www.homeopathy.at/programming/word2wiki.bas
It has the slightly curious behaviour of embedding wiki tags within the word document, so you end up with both Word (WYSIWYG) and Wiki formatting simultaneously. However, copying and pasting back to the wiki should be fine.
There are some other solutions too: http://meta.wikimedia.org/wiki/Word_macros
However, the other half (wiki->word) is still missing. Any ideas?
Steve
Steve Bennett wrote:
You're belittling the value of such a project by suggesting it would only be useful for those "unable" to cope with wiki syntax. I have no problems with it, but I suspect I would be faster editing directly with word, being able to select text, press ctrl+b to bold/unbold etc. But god, can we please not turn this discussion into an argument on the value of WYSIWYG in general? There are those who like it, and those who don't. So long as those who like it do not impinge upon those who don't (by forcing it on them, reformatting whole articles etc) and as long as there is someone willing to develop such a thing, there should not be a problem.
While I actually like the idea of your macro script, I'm worried that any WYSIWYG-style project /would/ impinge on those who wouldn't prefer to use it. From the fun I've had with TinyMCE and FrontPage, it seems that WYSIWYG editors have a knack for reformatting entire documents. As far as TinyMCE is concerned, that editor actually reduces its posts down to a single, incredibly lengthy line of code by removing all linebreaks.
Hasn't Wikipedia seen edit wars where users have converted whole articles to fit a certain footnote templating scheme? Imagine if that were done to all but the most basic aspects of an article's source code. One consequence would be that diffs would quickly become useless as every line of an article's source is automatically changed in some way. I'd imagine that a macro like the one you're proposing would be more suitable for a company's internal MediaWiki installation, where the underlying page source wouldn't matter so much to people.
I'm not familiar with Word's macro language, but for some inspiration, you may want to take a look at Parser.php http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Parser.php?view=log, which contains the functions that convert wikitext into XHTML.
On Mon, May 22, 2006 at 10:03:45AM +0200, Steve Bennett wrote:
On 5/22/06, Gregory Maxwell gmaxwell@gmail.com wrote:
We want them to have WYSIWYM (What you see is what you mean). We generally do not want editors fussing with appearance too much while working on content, at least so long as we want people to stay productive (rather than spending hours futzing with getting the most attractive customized fonts for every heading) and so long as we want the look and feel to be even remotely consistent.
I don't think anyone has proposed allowing customised fonts for each heading. By WYSIWYG, I simply mean "You see bold when you want bold - you don't see ''' ". What distinction are you getting at with "WYSIWYM"?
Don't we already get that with the preview?
On 5/22/06, Chad Perrin perrin@apotheon.com wrote:
Don't we already get that with the preview?
Yep. Sort of how people with traditional cameras could see what their photos looked like by printing them.
Editing *in* the preview is an improvement.
Steve
On Mon, May 22, 2006 at 08:47:24PM +0200, Steve Bennett wrote:
On 5/22/06, Chad Perrin perrin@apotheon.com wrote:
Don't we already get that with the preview?
Yep. Sort of how people with traditional cameras could see what their photos looked like by printing them.
Editing *in* the preview is an improvement.
According to exactly what criteria?
On Mon, May 22, 2006 at 10:03:45AM +0200, Steve Bennett wrote:
You're belittling the value of such a project by suggesting it would only be useful for those "unable" to cope with wiki syntax. I have no problems with it, but I suspect I would be faster editing directly with word,
This conflates a different issue, and I'd like to haul it up into the light:
"editing directly with Word..."
Any work-cycle which entails pulling part or all of a page out from an active wiki into a PC-based external editor will have lag-time/edit collision issues which will increase in importance proportionally to the traffic on the wiki.
This is to say that you might get away with it on a small to medium intranet wiki, but I don't expect it would be a practical approach to Wikipedia.
Cheers, -- jra
"Steve" == Steve Bennett stevage@gmail.com writes:
Anyone written an MS word/javascript macro to convert to/from MS word format? I was just thinking it might be kind of cool to be able to pop the code of the page out into word, where ==Level 2== would be converted into the Heading 2 format and so forth. It might even be able to embed images within the article.
You could save the Word file as HTML. (You might have to run tidy or something like it, since Word's HTML output was/is not very clean.) Then there is a Perl script, html2wiki (--dialect MediaWiki), which works quite nicely.
Uwe Brauer
On 5/22/06, Uwe Brauer oub@mat.ucm.es wrote:
You could save the Word file as HTML. (You might have to run tidy or something like it, since Word's HTML output was/is not very clean.) Then there is a Perl script, html2wiki (--dialect MediaWiki), which works quite nicely.
Yeah, that's a good point. Since my goal is to make something that quickly and painlessly converts back and forth from Word and Wiki, I'd have to make that scripting launchable automatically. But I suspect the dedicated Word macros would be easier.
Steve
(You might have to run tidy or something like it, since Word's HTML output was/is not very clean.)
The whole export to Word and back thing will only be feasible if the wikicode that is put in and the wikicode that comes out after editing differ only in the edited parts. Otherwise diffs between revisions will be cluttered and rendered useless.
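To make the diff concern concrete, here is a rough sketch using Python's standard difflib. The sample page text and the "lossy" output are invented for illustration; `changed_lines` is a hypothetical helper, not part of any existing tool.

```python
import difflib

def changed_lines(before, after):
    """Return the removed ("- ") and added ("+ ") lines between two revisions."""
    diff = difflib.ndiff(before.split("\n"), after.split("\n"))
    return [d for d in diff if d.startswith(("- ", "+ "))]

original = "==History==\nFirst paragraph.\nSecond paragraph."

# A conversion that preserves untouched lines: only the edited line differs.
careful = "==History==\nFirst paragraph, revised.\nSecond paragraph."

# A conversion that also re-spaces headings and adds trailing whitespace:
# every line now differs, although only one was actually edited.
lossy = "== History ==\nFirst paragraph, revised. \nSecond paragraph. "

print(len(changed_lines(original, careful)))
print(len(changed_lines(original, lossy)))
```

With the careful conversion the diff contains one removed and one added line; with the lossy one every line of the page appears on both sides of the diff.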
wikitech-l@lists.wikimedia.org