Allow me, ladies, gentlemen and those in between, to shove a screenshot under your noses.
http://upload.wikimedia.org/wikipedia/mediawiki/0/09/Drafts_%28mockup%29.png
This assumes Tim doesn't beat me to it again, of course. But if I have another ton of notes and planning go to waste, I will be pissed off.
(Nothing planned at present, aside from the UI mockup here. I'm not starting this for a while, one reason being that I think our users can suffer a bit longer. Teach them a lesson in simple manners...)
Rob Church
On Sun, Aug 27, 2006 at 04:03:29AM +0100, Rob Church wrote:
Allow me, ladies, gentlemen and those in between, to shove a screenshot under your noses.
http://upload.wikimedia.org/wikipedia/mediawiki/0/09/Drafts_%28mockup%29.png
Velly intelesting...
This assumes Tim doesn't beat me to it again, of course. But if I have another ton of notes and planning go to waste, I will be pissed off.
Now, now...
(Nothing planned at present, aside from the UI mockup here. I'm not starting this for a while, one reason being that I think our users can suffer a bit longer. Teach them a lesson in simple manners...)
Could you expand a bit on the use case you're trying to solve? I can see a lot of usage for that on non-WMF MW's, but I'm not sure how well it would fit in on The Big Ones...
Cheers, -- jra
On Sun, Aug 27, 2006 at 04:03:29AM +0100, Rob Church wrote:
Allow me, ladies, gentlemen and those in between, to shove a screenshot under your noses.
http://upload.wikimedia.org/wikipedia/mediawiki/0/09/Drafts_%28mockup%29.png
I'm getting a 404 for that URL, but I would like to check it out; has the image maybe been moved somewhere else?
Better having them with a preview than having people saying "I don't like multiple drafts" after several weeks of working to implement them effectively. :-)
Exactly - better to get any feedback (good and bad) before doing something, if you possibly can. Even if the implementation phase is ages away or just a maybe, it's still a very inexpensive and quick way to improve the idea or discover whether it should even be done at all - so I for one have no problem at all with people sticking something out there and saying: "this is just a mock-up, it may never happen, but nevertheless I'd welcome any thoughts you may have".
All the best, Nick.
On 28/08/06, Nick Jenkins nickpj@gmail.com wrote:
On Sun, Aug 27, 2006 at 04:03:29AM +0100, Rob Church wrote:
Allow me, ladies, gentlemen and those in between, to shove a screenshot under your noses.
http://upload.wikimedia.org/wikipedia/mediawiki/0/09/Drafts_%28mockup%29.png
I'm getting a 404 for that URL, but I would like to check it out; has the image maybe been moved somewhere else?
Whee, I love the whole Alt-D = delete thing. :(
Better having them with a preview than having people saying "I don't like multiple drafts" after several weeks of working to implement them effectively. :-)
Exactly - better to get any feedback (good and bad) before doing something, if you possibly can. Even if the implementation phase is ages away or just a maybe, it's still a very inexpensive and quick way to improve the idea or discover whether it should even be done at all - so I for one have no problem at all with people sticking something out there and saying: "this is just a mock-up, it may never happen, but nevertheless I'd welcome any thoughts you may have".
Unfortunately, many people can't accept that "maybe" is a perfectly viable conditional.
Rob Church
On Sun, Aug 27, 2006 at 04:03:29AM +0100, Rob Church wrote:
Allow me, ladies, gentlemen and those in between, to shove a screenshot under your noses.
Unfortunately, many people can't accept that "maybe" is a perfectly viable conditional.
It looks good. You won't get any arguments against auto-saving of drafts from me (it's on my list of "software features that I would like to see become standard").
Some random thoughts on the idea:
* Could it maybe discard the current auto-saved draft if the user clicks the "Cancel" link under the edit textarea? Generally if I click this, I've decided what I was about to say wasn't worth saying, in which case the draft probably isn't worth keeping either.
* Is it worth storing a diff, rather than the whole page? Say you're on a busy page, your browser crashes or there's a power cut, you reconnect, and reload the saved draft - however other people have modified the page, and now you have to merge your changes with their changes. However, maybe this is taking a decent idea and making it just too damn complicated - in which case please disregard this - I'd rather see a slightly simpler version that might happen, than an over-complex version which won't happen ;-)
All the best, Nick.
On 28/08/06, Nick Jenkins nickpj@gmail.com wrote:
Some random thoughts on the idea:
- Could it maybe discard the current auto-saved draft if the user clicks the "Cancel" link under the edit textarea? Generally if I click this, I've decided what I was about to say wasn't worth saying, in which case the draft probably isn't worth keeping either.
Yes, that makes sense... **grabs a piece of paper and begins scribbling**
- Is it worth storing a diff, rather than the whole page? Say you're on a busy page, your browser crashes or there's a power cut, you reconnect, and reload the saved draft - however other people have modified the page, and now you have to merge your changes with their changes. However, maybe this is taking a decent idea and making it just too damn complicated - in which case please disregard this - I'd rather see a slightly simpler version that might happen, than an over-complex version which won't happen ;-)
I won't store diffs, but it might be nice to see if, when loading a draft, the changes from the draft could be merged into the current edit window, rather than replacing it.
Rob Church
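To make the idea above concrete: here is a minimal sketch (not Rob's actual design - he has deliberately said nothing about implementation - and every name in it is invented) of storing the full draft text plus the revision it was started from, then computing the user's changes on demand with Python's standard difflib rather than storing a diff:

    import difflib

    def draft_changes(base_text, draft_text):
        # Store the full draft plus the id of the base revision it was
        # started from; the diff can then be computed on demand, for
        # display or for merging into the current edit window.
        return "".join(difflib.unified_diff(
            base_text.splitlines(keepends=True),
            draft_text.splitlines(keepends=True),
            fromfile="base revision",
            tofile="draft",
        ))

    # Hypothetical usage: base_text is the article text the draft was
    # started from, draft_text is the auto-saved draft.
    print(draft_changes("Hello world.\n", "Hello, wiki world.\n"))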
I won't store diffs, but it might be nice to see if, when loading a draft, the changes from the draft could be merged into the current edit window, rather than replacing it.
That sounds good.
And another possibility whilst we're kicking around ideas would be to have a "my drafts" link at the top of the page, which when you clicked on it would show a Special page with the list of outstanding drafts, which would make it easy to take up where you left off. Think of "my watchlist" being kind of like an email inbox, and "my drafts" being kind of like a draft emails folder.

Oh, and it also should allow drafts to be deleted, so that you can keep them under control. And it should show the edit summary description, like an email subject line, which would also have the side-benefit of encouraging people to enter useful edit descriptions early, because that'd be what would show up in their "my drafts" page.

In fact, to make it easier to visualize, here's a mock-up screenshot of what I was thinking: http://files.nickj.org/MediaWiki/my-drafts-mockup.png (it's possible to expand this with various sort options - e.g. sort by most recent draft, sort by oldest draft, sort by article name, etc - but you get the idea).
All the best, Nick.
On 28/08/06, Nick Jenkins nickpj@gmail.com wrote:
And another possibility whilst we're kicking around ideas would be to have a "my drafts" link at the top of the page, which when you clicked on it would show a Special page with the list of outstanding drafts, which would make it easy to take up where you left off.
Yes, that also makes sense. :)
Rob Church
Nick Jenkins wrote:
It looks good. You won't get any arguments against auto-saving of drafts from me (it's on my list of "software features that I would like to see become standard").
Me, too. I can't help but think, though, that the right place to standardize it is in ~5 browsers, not on N million web servers. (Which is not to discourage Rob from implementing this nifty feature. It's just that in an ideal world he wouldn't have to.)
On 8/28/06, Nick Jenkins nickpj@gmail.com wrote:
Some random thoughts on the idea:
- Could it maybe discard the current auto-saved draft if the user clicks the "Cancel" link under the edit textarea? Generally if I click this, I've decided what I was about to say wasn't worth saying, in which case the draft probably isn't worth keeping either.
Come to think of it, if it behaved exactly like Gmail, that wouldn't be a bad thing. Rough summary:
* Every few seconds while typing an email, it's saved as a draft
* When clicking reply to a message for which you have started a draft, the edit box is automatically populated with the draft.
* You can see all drafts that you have for all messages by clicking the "Drafts" link.
Steve
On Mon, Aug 28, 2006 at 09:27:20AM +0200, Steve Bennett wrote:
On 8/28/06, Nick Jenkins nickpj@gmail.com wrote:
> Some random thoughts on the idea:
> * Could it maybe discard the current auto-saved draft if the user clicks the "Cancel" link under the edit textarea? Generally if I click this, I've decided what I was about to say wasn't worth saying, in which case the draft probably isn't worth keeping either.
Come to think of it, if it behaved exactly like Gmail, that wouldn't be a bad thing. Rough summary:
- Every few seconds while typing an email, it's saved as a draft
Does the JS/DOM implementation permit optimizing that to
* Every time, while typing in the edit box, that you pause for more than N milliseconds, save the draft
? (Default of, I think, 5 seconds; watch the load and tune as necessary.)
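The thread never pins down the client-side code, but the timing logic Jay describes - reset a countdown on every keystroke, save only once the user pauses - can be sketched as follows. This is a hypothetical illustration only: Python's asyncio stands in for the browser's timers, and save_draft and the 5-second default are assumptions taken from the message above.

    import asyncio

    class DraftAutosaver:
        """Debounced autosave: every keystroke resets a countdown, and
        the draft is saved only after the user has paused for `delay`
        seconds. A JS implementation would mirror this with setTimeout."""

        def __init__(self, save_draft, delay=5.0):
            self.save_draft = save_draft  # hypothetical persistence callback
            self.delay = delay            # Jay's suggested default: 5 seconds
            self._pending = None

        def on_keystroke(self, text):
            # A new keystroke supersedes any pending save.
            # (Assumes this is called from within a running event loop.)
            if self._pending is not None:
                self._pending.cancel()
            self._pending = asyncio.get_running_loop().create_task(
                self._save_later(text))

        async def _save_later(self, text):
            try:
                await asyncio.sleep(self.delay)
                self.save_draft(text)
            except asyncio.CancelledError:
                pass  # cancelled because the user kept typing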
Cheers, -- jra
On 8/28/06, Jay R. Ashworth jra@baylink.com wrote:
- Every few seconds while typing an email, it's saved as a draft
Does the JS/DOM implementation permit optimizing that to
- Every time, while typing in the edit box, that you pause for more than N milliseconds, save the draft
?
Yes, because that's what Gmail actually does. Or, at least, I think it does. It may be rather more involved. I notice it hasn't saved this, despite my patiently waiting . . . ah, there it goes. I think it's kind of complicated how it decides (maybe it doesn't bother if you've only typed a certain amount?). Anyway, other stuff Gmail does with drafts:
* It has a Save Now button, right between Save and Discard. When the current text is saved, the button grays out and changes to read "Saved".
* If you have a reply entered and try leaving the page, it shows a popup dialog asking if you want to discard changes (if not, it cancels the page-leave, but doesn't save the draft, I think).
On Mon, Aug 28, 2006 at 01:20:29PM -0400, Simetrical wrote:
On 8/28/06, Jay R. Ashworth jra@baylink.com wrote:
- Every few seconds while typing an email, it's saved as a draft
Does the JS/DOM implementation premit optimizing that to
- Every time, while typing in the edit box, that you pause for more than N milliseconds, save the draft
?
Yes, because that's what Gmail actually does. Or, at least, I think it does. It may be rather more involved. I notice it hasn't saved this, despite my patiently waiting . . . ah, there it goes. I think it's kind of complicated how it decides (maybe it doesn't bother if you've only typed a certain amount?). Anyway, other stuff Gmail does with drafts:
- It has a Save Now button, right between Save and Discard.
I like this.
When the current text is saved, the button grays out and changes to read "Saved".
I don't like this. Browsers aren't smart enough for that. Hell, *apps* aren't smart enough for that. When I say save, *save* damnit.
- If you have a reply entered and try leaving the page, it shows a popup dialog asking if you want to discard changes (if not, it cancels the page-leave, but doesn't save the draft, I think).
Yeah; I've seen that on-leave idea a couple of places. I like it quite a bit, though permitting a browser to do it can have.. consequences.
Cheers, -- jra
On 8/28/06, Jay R. Ashworth jra@baylink.com wrote:
When the current text is saved, the button grays out and changes to read "Saved".
I don't like this. Browsers aren't smart enough for that. Hell, *apps* aren't smart enough for that. When I say save, *save* damnit.
Fwiw, it works well. If you really desperately want to save again for some paranoid reason, you can always add and remove a few spaces and press 'save' then.
- If you have a reply entered and try leaving the page, it shows a popup dialog asking if you want to discard changes (if not, it cancels the page-leave, but doesn't save the draft, I think).
Yeah; I've seen that on-leave idea a couple of places. I like it quite a bit, though permitting a browser to do it can have.. consequences.
It works well here.
Steve
On Mon, Aug 28, 2006 at 08:11:45PM +0200, Steve Bennett wrote:
On 8/28/06, Jay R. Ashworth jra@baylink.com wrote:
When the current text is saved, the button grays out and changes to read "Saved".
I don't like this. Browsers aren't smart enough for that. Hell, *apps* aren't smart enough for that. When I say save, *save* damnit.
Fwiw, it works well. If you really desperately want to save again for some paranoid reason, you can always add and remove a few spaces and press 'save' then.
I've seen editors smart enough to ignore whitespace that doesn't affect formatting. I still don't like it.
Example:
MySpace: click "Log In" button. Button changes to "Logging In" and greys out. Page submit times out (as it does all too frequently). Now you can't re-click it, you have to reload the page first. Novice user gets frustrated.
The only common reason given for this that's *almost* acceptable is "I don't want them to submit their credit card sale twice"... though the *proper* answer for that one is "send a random cookie as a hidden field on the form, pass it all the way to the SQL insert statement; if the Unique trigger fires, complain."
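A minimal sketch of that random-cookie idea, with sqlite3 standing in for the real database and every table and function name invented purely for illustration:

    import secrets
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sale (token TEXT UNIQUE NOT NULL, amount INTEGER)")

    def render_form():
        # The "random cookie": a one-time token embedded as a hidden form field.
        return secrets.token_hex(16)

    def submit_sale(token, amount):
        # Pass the token all the way through to the INSERT; a resubmitted
        # form trips the UNIQUE constraint instead of charging twice.
        try:
            with db:
                db.execute("INSERT INTO sale (token, amount) VALUES (?, ?)",
                           (token, amount))
            return "sale recorded"
        except sqlite3.IntegrityError:
            return "duplicate submission - complain"

    token = render_form()
    print(submit_sale(token, 100))  # sale recorded
    print(submit_sale(token, 100))  # duplicate submission - complain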
- If you have a reply entered and try leaving the page, it shows a popup dialog asking if you want to discard changes (if not, it cancels the page-leave, but doesn't save the draft, I think).
Yeah; I've seen that on-leave idea a couple of places. I like it quite a bit, though permitting a browser to do it can have.. consequences.
It works well here.
Oh; I'm just saying that leaving that feature on in your browser can enable... Bad Guys.
Cheers, -- jra
On 8/27/06, Nick Jenkins nickpj@gmail.com wrote:
On Sun, Aug 27, 2006 at 04:03:29AM +0100, Rob Church wrote:
Allow me, ladies, gentlemen and those in between, to shove a screenshot under your noses.
http://upload.wikimedia.org/wikipedia/mediawiki/0/09/Drafts_%28mockup%29.png
I'm getting a 404 for that URL, but I would like to check it out; has the image maybe been moved somewhere else?
Still works for me. Regards, - Dan Li
On Sun, Aug 27, 2006 at 04:03:29AM +0100, Rob Church wrote:
Allow me, ladies, gentlemen and those in between, to shove a screenshot under your noses.
http://upload.wikimedia.org/wikipedia/mediawiki/0/09/Drafts_%28mockup%29.png
I'm getting a 404 for that URL, but I would like to check it out; has the image maybe been moved somewhere else?
Still works for me. Regards,
- Dan Li
Yep, I can see it too now, but that's because it was deleted and then restored: http://www.mediawiki.org/wiki/Special:Log/delete
All the best, Nick.
On 8/27/06, Nick Jenkins nickpj@gmail.com wrote:
On Sun, Aug 27, 2006 at 04:03:29AM +0100, Rob Church wrote:
Allow me, ladies, gentlemen and those in between, to shove a screenshot under your noses.
http://upload.wikimedia.org/wikipedia/mediawiki/0/09/Drafts_%28mockup%29.png
I'm getting a 404 for that URL, but I would like to check it out; has the image maybe been moved somewhere else?
Still works for me. Regards,
- Dan Li
Yep, I can see it too now, but that's because it was deleted and then restored: http://www.mediawiki.org/wiki/Special:Log/delete
All the best, Nick.
Yeah, sorry about that. Your email and Rob's email took about 3 minutes to get to me ;)
Regards, - Dan Li
Rob Church wrote:
Allow me, ladies, gentlemen and those in between, to shove a screenshot under your noses.
http://upload.wikimedia.org/wikipedia/mediawiki/0/09/Drafts_%28mockup%29.png
This assumes Tim doesn't beat me to it again, of course. But if I have another ton of notes and planning go to waste, I will be pissed off.
(Nothing planned at present, aside from the UI mockup here. I'm not starting this for a while, one reason being that I think our users can suffer a bit longer. Teach them a lesson in simple manners...)
It's an excellent idea. I suggest an alternative/complementary approach: only keep the last "draft", & save it when the user clicks on "Show preview" or "Show changes". This will make the feature available to users who don't use JavaScript (I sometimes use w3m or Dillo), and will fulfill the purpose (I don't know if it's the one you had in mind) of encouraging users not to hit "Save" every five minutes.
Greetings.
Rob Church wrote:
http://upload.wikimedia.org/wikipedia/mediawiki/0/09/Drafts_%28mockup%29.png
Very interesting. Fear of losing edits is one of the driving forces behind high-frequency consecutive saves.
On 8/27/06, Ligulem ligulem@pobox.com wrote:
Rob Church wrote:
http://upload.wikimedia.org/wikipedia/mediawiki/0/09/Drafts_%28mockup%29.png
Very interesting. Fear of losing edits is one of the driving forces behind high-frequency consecutive saves.
Are HFCS's bad? They also reduce the chances of edit conflicts, don't they?
The thing I don't get in this mockup is, why would you have multiple drafts for one article? Also, are you planning on storing the version number that the original draft was created from, to let MediaWiki's normal edit conflict/resolution processes work their magic?
I like the idea of having one draft that I can come back to in the next half an hour or so. I'm not sure I understand the benefit of having multiple drafts that hang around for months at a time?
Steve
Steve Bennett wrote:
Are HFCS's bad? They also reduce the chances of edit conflicts, don't they?
(I assume HFCS = "high frequency consecutive saves"). Some people don't like them (they inflate edit count, use resources). Thus I try to reduce them myself. But I'm prone to HFCS myself (not on templates!).
The thing I don't get in this mockup is, why would you have multiple drafts for one article?
Why limit to only one? I can imagine having several backups, going to older variants if I see I'm on a wrong track.
An undo function comes to mind. Revert before published.
That might lead to private revision trees per user and page, forked at some time in the past. Duh.
Just some ugly brainstorming ;-) ...
On Sun, Aug 27, 2006 at 02:13:08PM +0200, Ligulem wrote:
Steve Bennett wrote:
Are HFCS's bad? They also reduce the chances of edit conflicts, don't they?
(I assume HFCS = "high frequency consecutive saves"). Some people don't like them (they inflate edit count, use resources). Thus I try to reduce them myself. But I'm prone to HFCS myself (not on templates!).
I concur with Steve; my mental model for Save Early and Often is that the shorter the window you use, the less likely you are to have to clean up an edit conflict.
The thing I don't get in this mockup is, why would you have multiple drafts for one article?
Why limit to only one? I can imagine having several backups, going to older variants if I see I'm on a wrong track.
An undo function comes to mind. Revert before published.
That might lead to private revision trees per user and page, forked at some time in the past. Duh.
Indeed; this could well be overimplemented, if one wasn't careful.
I too tend to think, Rob, that one draft per user/page is a limit... and even that might have ... notable effects on the DBMS. Separate table, at the least, I would assume...
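For illustration only - none of this reflects Rob's unannounced plans, and all the column names are made up - a separate drafts table of the kind Jay suggests might look something like this, with the UNIQUE constraint enforcing the one-draft-per-user/page limit (drop it to allow Ligulem's several drafts), plus a query of the sort Nick's "my drafts" page would need:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE drafts (
            draft_user    INTEGER NOT NULL,  -- who owns the draft
            draft_page    INTEGER NOT NULL,  -- page being edited
            draft_base    INTEGER NOT NULL,  -- revision the draft started from
            draft_summary TEXT,              -- edit summary, a la Nick's mockup
            draft_text    TEXT NOT NULL,     -- the full draft text
            draft_touched TEXT NOT NULL,     -- last autosave, for expiring old drafts
            UNIQUE (draft_user, draft_page)  -- one draft per user/page
        )
    """)

    # A "my drafts" listing, newest first (user id 42 is hypothetical):
    rows = db.execute("""
        SELECT draft_page, draft_summary, draft_touched
        FROM drafts WHERE draft_user = ?
        ORDER BY draft_touched DESC
    """, (42,)).fetchall()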
Cheers, -- jra
On 27/08/06, Steve Bennett stevage@gmail.com wrote:
The thing I don't get in this mockup is, why would you have multiple drafts for one article? Also, are you planning on storing the version number that the original draft was created from, to let MediaWiki's normal edit conflict/resolution processes work their magic?
Yes and no. Bear in mind I've given away nothing about how I'm thinking of implementing this, remember? I haven't worked out all the details myself, yet.
I like the idea of having one draft that I can come back to in the next half an hour or so. I'm not sure I understand the benefit of having multiple drafts that hang around for months at a time?
Then you clearly didn't read the information properly, did you? Because had you done so, you'd have found that I state drafts are cleaned up after a specific time period.
Rob Church
On 8/27/06, Rob Church robchur@gmail.com wrote:
On 27/08/06, Steve Bennett stevage@gmail.com wrote:
The thing I don't get in this mockup is, why would you have multiple drafts for one article? Also, are you planning on storing the version number that the original draft was created from, to let MediaWiki's normal edit conflict/resolution processes work their magic?
Yes and no. Bear in mind I've given away nothing about how I'm thinking of implementing this, remember? I haven't worked out all the details myself, yet.
Ok, well for me that question falls into user experience, rather than implementation, but as you like.
I like the idea of having one draft that I can come back to in the next half an hour or so. I'm not sure I understand the benefit of having multiple drafts that hang around for months at a time?
Then you clearly didn't read the information properly, did you? Because had you done so, you'd have found that I state drafts are cleaned up after a specific time period.
Sure I did, it says quite clearly: "...removed on a regular basis as they become too old." Could you be more specific? Is "too old" a question of days, weeks, months, or years?
Steve
On 27/08/06, Steve Bennett stevage@gmail.com wrote:
Sure I did, it says quite clearly: "...removed on a regular basis as they become too old." Could you be more specific? Is "too old" a question of days, weeks, months, or years?
I've already bloody stated I haven't finished working out the sodding implementation details and don't plan to do anything for a long time, so no, I couldn't be more bloody specific.
"Too old" is what I consider to be a good idea at the time if and when I actually do this, after discussion with users and with Brion.
Christ, in future, remind me not to post tentative previews of possible things to come. It's not worth the bloody hassle of having fifty thousand trivial little questions.
Rob Church
On Sun, Aug 27, 2006 at 10:07:20PM +0200, Ligulem wrote:
Rob Church wrote:
Christ, in future, remind me not to post tentative previews of possible things to come. It's not worth the bloody hassle of having fifty thousand trivial little questions.
Assume good faith. I've learned that from you ;-)
Don't be so sensitive.
I hate to agree with him Rob, but relax. This is the price of open-source *design*; people will kibitz. Don't take it so seriously.
If you haven't made those decisions, then clearly no one's taking issue with yours... :-)
Cheers, -- jra
On 8/27/06, Jay R. Ashworth jra@baylink.com wrote:
On Sun, Aug 27, 2006 at 10:07:20PM +0200, Ligulem wrote:
Rob Church wrote:
Christ, in future, remind me not to post tentative previews of possible things to come. It's not worth the bloody hassle of having fifty thousand trivial little questions.
Assume good faith. I've learned that from you ;-)
Don't be so sensitive.
I hate to agree with him Rob, but relax. This is the price of open-source *design*; people will kibitz. Don't take it so seriously.
If you haven't made those decisions, then clearly no one's taking issue with yours... :-)
Ok, so fwiw, I was just trying to contribute to the pre-implementation requirements gathering phase. Sorry if it was more noise than signal.
Steve
Hi everyone,
I'm currently working on a (python) wiki to pdf converter, based on wiki2pdf, which is no longer actively maintained.
One of the -big- modifications to this script is to replace the all-python parser by a combination of wiki->xml, using flexbisonparse, and xml->pdf, using parts of wiki2pdf. This was proposed by one of the developers of wiki2pdf to make it a bit... more maintainable.
At the moment, things work fine, except for 2 things with flexbisonparse:
1- the external urls are not handled, and are included verbatim in the output.
2- some people use <ref name="XX"> </ref> and <ref name="XX"/> to make references, but these are modified by flexbisonparse, into >ref name="XX"<, which is not very convenient.
I understand that the first problem is somehow difficult to manage, as many cases are to be analysed, but I have the impression that the second is simpler. However, I'm not (at all) an expert in flex and bison, and have great difficulty understanding the code of flexbisonparse.
Is there a developer of this parser around? Do you think the modifications are feasible?
Also, I proposed a patch for minor modifications of the code on bugzilla, but I don't know if it is the right place to do that. http://bugzilla.wikimedia.org/show_bug.cgi?id=7001
Regards
Cyril
On Sun, Aug 27, 2006 at 11:14:51PM +0100, Buttay cyril wrote:
I'm currently working on a (python) wiki to pdf converter, based on wiki2pdf, which is no longer actively maintained.
My hobby horse is "Get The Glue Right", which leads me to ask:
would it not be {easier,more useful} to direct effort towards wikitext2docbook? Doesn't docbook already know how to get to PDF?
/derail
Cheers, -- jra
Jay R. Ashworth wrote:
On Sun, Aug 27, 2006 at 11:14:51PM +0100, Buttay cyril wrote:
I'm currently working on a (python) wiki to pdf converter, based on wiki2pdf, which is no longer actively maintained.
My hobby horse is "Get The Glue Right", which leads me to ask:
would it not be {easier,more useful} to direct effort towards wikitext2docbook? Doesn't docbook already know how to get to PDF?
Well, I picked one of the existing projects (wiki2pdf) for the following reasons:
1- it uses python, which is one of the only languages I am nearly comfortable with (I'm not a programmer)
2- the objective for me is to create a wikibook-to-LaTeX converter. I plan to keep this piece of code client-side, because I know that even a really good parser will need some tweaking of the LaTeX source to produce a good pdf on something as long as a wikibook. Among the features of the program is the automatic download of images and wiki pages, something I can easily do with python
3- the list of alternative parsers ( http://meta.wikimedia.org/wiki/Alternative_parsers ) does not mention wikitext2docbook, and says that flexbisonparse is "Intended as an eventual replacement to the parsing code inside MediaWiki itself", which is rather promising!
On the other hand, the tests I made with DocBook were not very good from a typographic point of view (I think the docbook to pdf conversion uses LaTeX, but the stylesheets are oriented towards automation rather than quality). I will have a look at wikitext2docbook, though.
Regards
Cyril
/derail
Cheers, -- jra
On Mon, Aug 28, 2006 at 12:01:00AM +0100, Buttay cyril wrote:
Jay R. Ashworth wrote:
On Sun, Aug 27, 2006 at 11:14:51PM +0100, Buttay cyril wrote:
I'm currently working on a (python) wiki to pdf converter, based on wiki2pdf, which is no longer actively maintained.
My hobby horse is "Get The Glue Right", which leans me to ask:
would it not be {easier,more useful} to direct effort towards wikitext2docbook? Doesn't docbook already know how to get to PDF?
Well, I picked one of the existing projects (wiki2pdf) for the following reasons:
1- it uses python, which is one of the only languages I am nearly comfortable with (I'm not a programmer)
2- the objective for me is to create a wikibook-to-LaTeX converter. I plan to keep this piece of code client-side, because I know that even a really good parser will need some tweaking of the LaTeX source to produce a good pdf on something as long as a wikibook. Among the features of the program is the automatic download of images and wiki pages, something I can easily do with python
Indeed, and I vote for client side, as well.
3- the list of alternative parsers ( http://meta.wikimedia.org/wiki/Alternative_parsers ) does not mention wikitext2docbook, and says that flexbisonparse is "Intended as an eventual replacement to the parsing code inside MediaWiki itself", which is rather promising!
I don't know that that is what Magnus is calling it, but that's what it does. I forget what language he's doing it in. Check the list archives; he's mentioned it here in the last couple of months (and may well chime in here).
On the other hand, the tests I made with DocBook were not very good from a typographic point of view (I think the docbook to pdf conversion uses LaTeX, but the stylesheets are oriented towards automation rather than quality). I will have a look at wikitext2docbook, though.
Worth a couple minutes, at least, I would think. My perception of it is that if someone's going to put work into fixing the typography, the more people who can benefit from that, the better. Hence, I kibitz.
Cheers, -- jra
Jay R. Ashworth wrote:
On Mon, Aug 28, 2006 at 12:01:00AM +0100, Buttay cyril wrote:
3- the list of alternative parsers ( http://meta.wikimedia.org/wiki/Alternative_parsers ) does not mention wikitext2docbook, and says that flexbisonparse is "Intended as an eventual replacement to the parsing code inside MediaWiki itself", which is rather promising!
I don't know that that is what Magnus is calling it, but that's what it does. I forget what language he's doing it in. Check the list archives; he's mentioned it here in the last couple of months (and may well chime in here).
As someone who's been playing with alternative parsers (though not Magnus), I'm pretty sure the flexbisonparse project is currently dead. Magnus moved to his wiki2xml project (also available in the MediaWiki repository), which is actually coded in PHP. As far as I know, though, it's the single most feature-complete alternative parser we have. Not claiming it's perfect, but it's good... I haven't worked with flexbisonparse, though, so maybe it's better than I know.
I've actually been working on a Python-based wikitext parser, using some techniques that should make the system a bit faster and cleaner... With a lot of luck, I should start making progress on that again in the next month or so.
For anyone who cares, I'll probably be trying to implement a PEG-based parser using mxTextTools, since I think that should be able to parse all of MediaWiki's wikitext, and should be about twice as fast as the current Parser.php (which is about as fast as wiki2xml)... Or I might just end up using ANTLR, if I can bully my current semi-grammar into working in that framework... If anyone knows of a decent PEG parser with a Python API (a packrat parser might be ideal), that'd be great too. *shrugs*
- Eric Astor
Eric Astor wrote:
I've actually been working on a Python-based wikitext parser, using some techniques that should make the system a bit faster and cleaner... With a lot of luck, I should start making progress on that again in the next month or so.
For anyone who cares, I'll probably be trying to implement a PEG-based parser using mxTextTools, since I think that should be able to parse all of MediaWiki's wikitext, and should be about twice as fast as the current Parser.php (which is about as fast as wiki2xml)... Or I might just end up using ANTLR, if I can bully my current semi-grammar into working in that framework... If anyone knows of a decent PEG parser with a Python API (a packrat parser might be ideal), that'd be great too. *shrugs*
- Eric Astor
Yes! I also believe that PEGs and [[packrat parser]]s are the way to go with parsing wikitext, because of the very ad-hoc definition of wikitext.
A basic packrat parser is pretty easy to implement; it's simply a brute-force recursive-descent parser with memoization of (offset, term) -> production mappings. Scheme is a pretty good language to write a packrat parser in, since the grammar itself can be written as an S-expression, and is easy to use for program transformation (see below).
A simple implementation just interprets the grammar tree, matching as it goes.
You can achieve considerable speedups by:
1 using the grammar to generate code, and compiling and executing that instead of interpreting the grammar by hand
2 allowing the grammar to contain both PEG expressions and regexps for low-level lexical matching: regexps will be at least an order of magnitude faster than even compiled PEGs for matching low-level lexical tokens like numbers and names, without removing the ability of PEGs to blur the distinction between lexical and syntactic analysis, which is important for parsing strange things like wikitext.
I've implemented packrat parsing in both Python and Scheme: Scheme was faster, and ultimately more natural.
The one awkward bit is left-recursion removal, which breaks packrat parsers unless you alter the grammar to an equivalent form without left recursion. I did it by hand on my input grammars, but it could easily be done programmatically at grammar-generation time.
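As a concrete illustration of the approach Neil describes - brute-force recursive descent with a memo of (rule, offset) mappings, regex terminals per point 2 above, and a deliberately right-recursive toy grammar to side-step the left-recursion caveat - here is a minimal, hypothetical Python sketch (not any of the parsers discussed in this thread):

    import re

    class Packrat:
        """Recursive descent with a memo of (rule, offset) ->
        (parsed value, next offset); None means 'no match here'."""

        def __init__(self, grammar, text):
            self.grammar = grammar  # rule name -> matching function
            self.text = text
            self.memo = {}

        def parse(self, rule, pos=0):
            key = (rule, pos)
            if key not in self.memo:  # each (rule, pos) is computed once
                self.memo[key] = self.grammar[rule](self, pos)
            return self.memo[key]

    def regex(pattern):
        # Terminals are plain regexes, for low-level lexical speed.
        rx = re.compile(pattern)
        def match(p, pos):
            m = rx.match(p.text, pos)
            return (m.group(), m.end()) if m else None
        return match

    def seq(*rules):
        def match(p, pos):
            values = []
            for r in rules:
                res = p.parse(r, pos)
                if res is None:
                    return None
                values.append(res[0])
                pos = res[1]
            return (values, pos)
        return match

    def choice(*rules):
        def match(p, pos):
            for r in rules:  # PEG ordered choice: first success wins
                res = p.parse(r, pos)
                if res is not None:
                    return res
            return None
        return match

    # Toy right-recursive grammar: sum <- number '+' sum / number
    grammar = {
        "number": regex(r"[0-9]+"),
        "plus":   regex(r"\+"),
        "pair":   seq("number", "plus", "sum"),
        "sum":    choice("pair", "number"),
    }
    print(Packrat(grammar, "1+2+3").parse("sum"))
    # -> (['1', '+', ['2', '+', '3']], 5)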
I'm not sure about the best way to implement an API: have you considered just using the parser to convert from wikitext to something like PYX, which is a very simple-to-parse and Python-friendly representation of an XML data structure? It can then be used either to build an in-core parse tree, or to drive something like a SAX API, or whatever other form of post-processing you like (for example, direct procedural text-to-text generation, which could be very simple indeed, since the output of a successful parse is guaranteed by definition to _exactly_ conform to the grammar specification).
-- Neil
[PYX: http://www.xml.com/pub/a/2000/03/15/feature/index.html]
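For anyone who hasn't met PYX: it is a line-oriented rendering of the XML event stream - '(' opens an element, 'A' carries an attribute, '-' carries character data, ')' closes an element. A hypothetical fragment like <page title="Tintin">Hello</page> would come out roughly as (see the linked article for the full notation):

    (page
    Atitle Tintin
    -Hello
    )page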
Neil Harris wrote:
Yes! I also believe that PEGs and [[packrat parser]]s are the way to go with parsing wikitext, because of the very ad-hoc definition of wikitext.
Absolutely agreed. I only wish PEGs could support backreference matches, as it would clean up list, allowed HTML, and extension handling. In fact, I'm not quite sure how to handle lists without backreferences.
You can achieve considerable speedups by:
1 using the grammar to generate code, and compiling and executing that instead of interpreting the grammar by hand
Definitely - come to think of it, I bet this could be done VERY nicely with Python. Or most other sufficiently self-exposed languages... Hm.
2 allowing the grammar to contain both PEG expressions and regexps for low-level lexical matching: regexps will be at least an order of magnitude faster than even compiled PEGs for matching low-level lexical tokens like numbers and names, without removing the ability of PEGs to blur the distinction between lexical and syntactic analysis, which is important for parsing strange things like wikitext.
This sounds like a great idea for extended PEGs anyway... I'll remember that if I end up building an mxTextTools frontend for PEGs, since mxTextTools can easily hook into arbitrary matching functions (including regex).
I've implemented packrat parsing in both Python and Scheme: Scheme was faster, and ultimately more natural.
That's quite possible - the problem would be that I don't know Scheme, and I am going to be extremely busy for the foreseeable future at school. I'd rather not have to write a packrat parser myself, anyway... However simple they may be, they improve drastically with optimizations, and I don't anticipate having the time to implement a proper system.
Unless a good Python-accessible packrat parser already exists, I'm most likely to just build a solid PEG frontend for mxTextTools. It's a very powerful text parser, and tends to be fast (the module's mostly written in C). I think it could easily support all PEG features. Actually, I think SimpleParse (another mxTextTools frontend) already supports at least 90% of PEG features, so maybe the best idea is simply to rework SimpleParse to use standard PEG syntax instead of its extremely extended BNF variant.
I'm not sure about the best way to implement an API: have you considered just using the parser to convert from wikitext to something like PYX, which is a very simple-to-parse and Python-friendly representation of an XML data structure...
Something like that would probably be ideal, although I'd tend to prefer a more abstract data structure that's programmatically accessible - maybe an mxTextTools tag list (its normal output format) is closer to what I mean.
- Eric Astor
mxTextTools: http://www.egenix.com/files/python/mxTextTools.html SimpleParse: http://simpleparse.sourceforge.net/simpleparse_grammars.html
Jay R. Ashworth wrote:
3- the list of alternative parsers ( http://meta.wikimedia.org/wiki/Alternative_parsers ) does not mention wikitext2docbook, and says that flexbisonparse is "Intended as an eventual replacement to the parsing code inside MediaWiki itself", which is rather promising!
I don't know that that is what Magnus is calling it, but
Where and why did this misconception arise that flexbisonparse was written by Magnus? Quite honestly, it is driving me nuts...
Timwi
Timwi schrieb:
Jay R. Ashworth wrote:
3- the list of alternative parsers ( http://meta.wikimedia.org/wiki/Alternative_parsers ) does not mention wikitext2docbook, and says that flexbisonparse is "Intended as an eventual replacement to the parsing code inside MediaWiki itself", which is rather promising!
I don't know that that is what Magnus is calling it, but
Where and why did this misconception arise that flexbisonparse was written by Magnus? Quite honestly, it is driving me nuts...
Official clarification: flexbisonparse was written by Timwi, and Timwi alone :-)
I had a look at it once, and didn't find my way through the flex jungle, so I gave up quickly. I did, however, base the XML of wiki2xml on the flexbisonparse output; they're not identical, however.
Magnus
Magnus Manske wrote:
Official clarification: flexbisonparse was written by Timwi, and Timwi alone :-)
I had a look at it once, and didn't find my way through the flex jungle, so I gave up quickly. I did, however, base the XML of wiki2xml on the flexbisonparse output; they're not identical, however.
I've just given wiki2xml a go on a big page ( http://en.wikipedia.org/wiki/The_Adventures_of_Tintin ) that is full of references, and I noticed that there is a problem with some of them: the following wikicode:
<ref name="Farr">{{cite journal | last =Farr | first =Michael | authorlink =Michael Farr | coauthors = | year =2004 | month =March | title =Thundering Typhoons | journal =History Today | volume =54 | issue =3 | pages =62 | id = | url = | format = | accessdate = }}</ref>
is translated as:
<extension extension_name="ref" name="Farr">
<xhtml:cite style="font-style:normal">.</xhtml:cite>
</extension>
As you can see, there is quite a bit of missing information.
At the moment, I think I'll carry on with flexbisonparse, adding some python patches to correct the output. Maybe later I'll switch to wiki2xml instead (although it is a bit slower than flexbisonparse, to say the least). This shouldn't be too difficult as they both use some dialect of XML.
Concerning docbook, I'll also have to give it a try, but one of my concerns is that (as far as I know) there is no way to give specific formatting instructions, which, IMO, is mandatory for a nice print output. I have nothing against semantic description, but sometimes you have to fine-tune some specific part (figure position, alignment tolerance...). I'm sure those who use LaTeX intensively will understand...
Cyril
On Mon, Aug 28, 2006 at 04:21:41PM +0100, Buttay cyril wrote:
At the moment, I think I'll carry on with flexbisonparse, adding some python patches to correct the output. Maybe later I'll switch to wiki2xml instead (although it is a bit slower than flexbisonparse, to say the least). This shouldn't be too difficult as they both use some dialect of XML.
Well, if we end up with a standalone parser that gives output that can be transliterated to DocBook, I'll be happy, no matter who wrote it. :-)
Concerning docbook, I'll also have to give it a try, but one of my concerns is that (as far as I know) there is no way to give specific formatting instructions, which, IMO is mandatory for a nice print output. I have nothing against semantic description, but sometimes you have to fine-tune some specific part (figure position, alignment tolerance...). I'm sure those who use LaTeX intensively will understand...
Yeah, it's called a stylesheet, and it's the responsibility of the person who needs a specific kind of final output. Wiring it into the parser/converter would be A Bad Design. Been dealing with them since Ventura Publisher 3.1...
Cheers, -- jra
On Mon, Aug 28, 2006 at 02:21:46PM +0200, Magnus Manske wrote:
Timwi schrieb:
Jay R. Ashworth wrote:
3- the list of alternative parsers ( http://meta.wikimedia.org/wiki/Alternative_parsers ) does not mention wikitext2docbook, and says that flexbisonparse is "Intended as an eventual replacement to the parsing code inside MediaWiki itself", which is rather promising!
I don't know that that is what Magnus is calling it, but
Where and why did this misconception arise that flexbisonparse was written by Magnus? Quite honestly, it is driving me nuts...
Official clarification: flexbisonparse was written by Timwi, and Timwi alone :-)
And I wasn't suggesting otherwise.
I had a look at it once, and didn't find my way through the flex jungle, so I gave up quickly. I did, however, base the XML of wiki2xml on the flexbisonparse output; they're not identical, however.
wiki2xml: that's what you call it.
Yeah: that.
Cheers, -- jra
Magnus Manske wrote:
Official clarification: flexbisonparse was written by Timwi, and Timwi alone :-)
The important thing is that it is "a piece of German engineering". You might not realize how strong this brand is in the English-speaking world. Perhaps we should add that line just under "the free encyclopedia" in the logotype?
Number of Google hits reported for:
976,000 "German engineering" Just look at the distance to No. 2. 273,000 "American engineering" 259,000 "Texas engineering" 147,000 "Michigan engineering" 119,000 "Swiss engineering" 110,000 "Canadian engineering" 104,000 "British engineering" 76,000 "European engineering" 74,700 "French engineering" 71,000 "Florida engineering" 58,600 "California engineering" 56,700 "Indian engineering" 40,200 "Swedish engineering" 37,500 "Chinese engineering" 35,900 "Utah engineering" 33,600 "Italian engineering" 33,400 "English engineering" 33,200 "Japanese engineering" 31,300 "Minnesota engineering" 31,100 "New York engineering" 29,800 "Washington engineering" 27,200 "Thai engineering" 26,200 "Russian engineering" 25,200 "Scottish engineering" 24,000 "Pennsylvania engineering" 21,700 "Oklahoma engineering" 19,600 "Arizona engineering" 19,500 "Ohio engineering" 18,500 "Alabama engineering" 17,600 "Norwegian engineering" 16,400 "Oregon engineering" 16,200 "New Jersey engineering" 13,700 "Dutch engineering" 13,200 "Connecticut engineering" 12,300 "Nevada engineering" 11,000 "Danish engineering" 10,600 "Egyptian engineering" 9,780 "Spanish engineering" 9,670 "Finnish engineering" 891 "Slovak engineering" 806 "Massachusetts engineering" 723 "Austrian engineering" 681 "Czech engineering" 650 "Israeli engineering" 544 "Ukrainian engineering" 529 "Iranian engineering" 485 "Greek engineering" 478 "Mexican engineering" 451 "Hungarian engineering" 432 "Polish engineering" 388 "Portuguese engineering" 325 "Scandinavian engineering" 269 "Welsh engineering" 223 "Belgian engineering"
On 28/08/06, Lars Aronsson lars@aronsson.se wrote:
Magnus Manske wrote:
Official clarification: flexbisonparse was written by Timwi, and Timwi alone :-)
The important thing is that it is "a piece of German engineering". You might not realize how strong this brand is in the English-speaking world. Perhaps we should add that line just under "the free encyclopedia" in the logotype?
Flexbisonparse isn't used on Wikipedia, though. However, MediaWiki started life as a script written by a German biology student...so... ;)
Rob Church
On Mon, Aug 28, 2006 at 11:51:14PM +0200, Lars Aronsson wrote:
Magnus Manske wrote:
Official clarification: flexbisonparse was written by Timwi, and Timwi alone :-)
The important thing is that it is "a piece of German engineering".
Works for me.
Cheers, -- jr "'87 e24" a
Hi,
At the moment, things work fine, except for 2 things with flexbisonparse:
You really think these two problems you have listed are the only ones? :-) I assure you there are heaps of other problems with it.
However, I'm not (at all) an expert in flex and bison, and have great difficulty understanding the code of flexbisonparse.
I find this quite amazing. I am not, and have never been, an "expert" in flex and bison either, and I do agree that the code is not easy, but is it really that much harder than MediaWiki itself?
Is there a developer of this parser around? Do you think the modifications are feasible?
Yes, the modifications are feasible, but in order to implement them, you need to find someone who has the motivation to do that. I am currently not very motivated myself.
Timwi
Steve Bennett wrote:
Ok, so fwiw, I was just trying to contribute to the pre-implementation requirements gathering phase. Sorry if it was more noise than signal.
No problem. Just the "years" was possibly a little bit too inflammatory ;-)
But I think we are all grown up here and can stand it. Long live MediaWiki! You guys are doing an awesome job here.
Now let's get back to business everybody. I love reading about new stuff. Thanks to all of you!
--Ligulem
"Rob Church" wrote:
Christ, in future, remind me not to post tentative previews of possible things to come. It's not worth the bloody hassle of having fifty thousand trivial little questions.
Better having them with a preview than having people saying "I don't like multiple drafts" after several weeks of working to implement them effectively. :-)