On Thu, Aug 17, 2006 at 07:17:11PM -0700, Ben Garney wrote:
On 8/17/06, Jay R. Ashworth <jra@baylink.com> wrote:
I don't think that a Flag Day for some exceedingly esoteric construction which needs to be cleaned up to make a formal parser possible is completely impossible, but it would have to be pretty negligible, pretty important, or both... it goes back to that circle I mentioned.
So what if we had a "lossless" wikisyntax to XML converter? It seems like that wouldn't be an impossibility (given we're already parsing wikisyntax to _HTML_).
What would people think of, e.g., converting the backend to use that XML storage, and then enforcing it on the editor side as well?
I Am Not A Wikimedia Foundation Employee.
That said, there are two sorts of Flag Days: those which affect users and those which don't.
What you're suggesting here (and Simetrical has suggested before) would -- assuming that conversion is *really* lossless, which I'm not sure can be guaranteed at the moment -- only flag programs, not hundreds of thousands of heads.
That makes it just a tad likelier to ever happen.
Obviously we'd have to be clever about the conversion (making VERY sure it's a "lossless" switch, and finding a computationally feasible way to get it done; maybe update every article as it's touched?).
Yeah; and which processor that XML<->WT conversion happened on would be critical for load reasons...
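A rough sketch of the "update every article as it's touched" idea, purely for illustration. Everything here is hypothetical: the `StoredPage` record, the `touch` hook, and the toy converters `toy_to_xml`/`toy_to_wikitext`, which merely escape and wrap the wikitext rather than building real structured WikiXML. The point it shows is that a page only switches to XML storage when the round trip is byte-exact; otherwise the original wikitext is left alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from xml.sax.saxutils import escape, unescape


@dataclass
class StoredPage:
    fmt: str   # hypothetical storage tag: "wikitext" or "xml"
    body: str


def toy_to_xml(text: str) -> str:
    # Hypothetical placeholder converter: escapes and wraps the wikitext.
    return "<wikitext>" + escape(text) + "</wikitext>"


def toy_to_wikitext(xml: str) -> str:
    # Inverse of toy_to_xml: strip the wrapper and undo the escaping.
    return unescape(xml[len("<wikitext>"):-len("</wikitext>")])


def touch(store: Dict[str, StoredPage], title: str,
          to_xml: Callable[[str], str],
          to_wikitext: Callable[[str], str]) -> StoredPage:
    """Migrate a page to XML the first time it is touched, but only if
    converting to XML and back reproduces the original bytes exactly."""
    page = store[title]
    if page.fmt == "xml":
        return page                      # already migrated
    xml = to_xml(page.body)
    if to_wikitext(xml).encode("utf-8") != page.body.encode("utf-8"):
        return page                      # lossy: keep the wikitext copy
    store[title] = StoredPage("xml", xml)
    return store[title]
```

Because the migration is lazy, the conversion cost is spread over normal edit traffic instead of one big batch run, though where that work runs (Apache boxes vs. DB vs. client) is exactly the load question raised above.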
To my way of thinking, if we had an XML backend store and a reliable conversion path, then we could:
a) Provide wikisyntax editing to those who want it (by filtering through the converter).
b) Develop meaningful WYSIWYG editing tools without first having to reimplement the wikisyntax parser in JavaScript and every other language we want to touch.
c) Allow direct access to the XML, making all kinds of researchers happy.
d) Incrementally roll out changes that bring things more in line with the Semantic Web, again with conversion paths.
Engineering-wise, I think a "lossless" path could be built from these components:
- Wikisyntax <-> WikiXML converters.
- WikiXML -> HTML renderer.
Determining whether it works properly can be done by testing against the Wikipedia corpus. If we can go from WikiXML to Wikisyntax and back, byte-exact, we've achieved our goal. Maybe it's OK to relax that restriction (especially if we can determine in some other way that the page is corrupt or invalid, or maybe we keep a list of exceptions), but I think it's one that's both achievable and reasonable.
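A minimal sketch of that corpus test, assuming the real converters can be passed in as plain functions. `wikitext_to_xml` and `xml_to_wikitext` below are hypothetical stand-ins that only escape and wrap the text; `roundtrip_failures` and its `exceptions` set (the "list of exceptions" above) are likewise illustrative names, not an existing MediaWiki API.

```python
from typing import Callable, Dict, FrozenSet, List
from xml.sax.saxutils import escape, unescape


def wikitext_to_xml(text: str) -> str:
    # Hypothetical stand-in for the real Wikisyntax -> WikiXML converter.
    return "<wikitext>" + escape(text) + "</wikitext>"


def xml_to_wikitext(xml: str) -> str:
    # Hypothetical stand-in for the real WikiXML -> Wikisyntax converter.
    prefix, suffix = "<wikitext>", "</wikitext>"
    if not (xml.startswith(prefix) and xml.endswith(suffix)):
        raise ValueError("not a <wikitext> document")
    return unescape(xml[len(prefix):-len(suffix)])


def roundtrip_failures(corpus: Dict[str, str],
                       to_xml: Callable[[str], str],
                       to_wikitext: Callable[[str], str],
                       exceptions: FrozenSet[str] = frozenset()) -> List[str]:
    """Return titles whose wikitext does not survive
    wikitext -> XML -> wikitext byte-for-byte."""
    failures = []
    for title, text in corpus.items():
        if title in exceptions:
            continue  # known-bad pages we've agreed to skip
        back = to_wikitext(to_xml(text))
        if back.encode("utf-8") != text.encode("utf-8"):
            failures.append(title)
    return failures
```

Run over a full dump, an empty result means the converter pair is lossless on the whole corpus; anything else is a concrete list of pages to look at (or argue onto the exception list).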
Hmmm...
We may also want to do validation on the HTML render path; if we want to be really strict we can require that the conversion path gives identical output (perhaps sans whitespace?) to the current parser & renderer.
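One possible reading of "identical output, perhaps sans whitespace", sketched under the assumption that plain regex normalization is good enough for a first pass. `normalize_html` and `renders_match` are hypothetical helpers. Note the caveats: dropping whitespace between tags can hide a real difference (e.g. the space in `<b>a</b> <i>b</i>`), and collapsing whitespace ignores `<pre>`, so a stricter harness would want an HTML-aware comparison.

```python
import re


def normalize_html(html: str) -> str:
    """Crude whitespace normalization for comparing two renders.
    Caveat: also collapses whitespace inside <pre>, where it matters."""
    html = re.sub(r">\s+<", "><", html)  # drop whitespace between tags
    html = re.sub(r"\s+", " ", html)     # collapse remaining whitespace runs
    return html.strip()


def renders_match(current_html: str, candidate_html: str) -> bool:
    # Compare the current parser's output with the WikiXML renderer's output.
    return normalize_html(current_html) == normalize_html(candidate_html)
```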
I, personally, am a bit less concerned there: it's like advertising photography (where the printed ad in the magazine actually has to match the PMS color of the object) vs color pictures of the local parade in the newspaper (where you only care that the clown's face is 'pretty').
Once we have everything in XML, there are a number of good tools and standards to enable us to be Unicode compliant, to do various kinds of conversions and updates on the XML, and otherwise process our data, so we can evolve it forward to meet our needs.
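As a small illustration of the kind of mechanical update that becomes easy once the data is XML, here is a sketch using Python's standard `xml.etree.ElementTree` (XSLT would be the more standards-based route). The element names `tmpl` and `template` are made up for the example; no WikiXML schema exists yet.

```python
import xml.etree.ElementTree as ET


def rename_elements(xml_text: str, old: str, new: str) -> str:
    """Rename every <old> element to <new>, keeping attributes and content.
    A toy stand-in for an incremental schema migration."""
    root = ET.fromstring(xml_text)
    for elem in list(root.iter(old)):  # snapshot before mutating tags
        elem.tag = new
    return ET.tostring(root, encoding="unicode")
```

For example, `rename_elements('<page><tmpl name="cite">x</tmpl></page>', "tmpl", "template")` gives `<page><template name="cite">x</template></page>`, and the reverse rename is the matching conversion path back.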
In any case - if we find that having a lossless path would satisfy the constraints, then those who are interested can focus on writing a validation framework... and then they can go implement it. :)
Aw, *ick*.
Y'all are talking me into it.
Quick! Somebody talk me back out of it! :-)
Cheers,
-- jr "isn't it a good thing it doesn't matter what I think?" a