On 8/17/06, Jay R. Ashworth <jra(a)baylink.com> wrote:
> I don't think that a Flag Day for some exceedingly esoteric
> construction which needs to be cleaned up to make a formal parser
> necessary is completely impossible, but it would have to be pretty
> negligible, pretty important, or both... it goes back to that circle I
> mentioned.
So what if we had a "lossless" wikisyntax-to-XML converter? That doesn't
seem impossible, given that we're already parsing wikisyntax to _HTML_.
What would the reaction be to, e.g., converting the backend to use that XML
storage, then enforcing it on the editor side as well?
Obviously we'd have to be clever about the conversion (making VERY sure
it's a "lossless" switch, and finding a computationally feasible way to get
it done; maybe convert each article as it's touched?).
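To make the "convert as it's touched" idea concrete, here is a rough Python
sketch. Everything in it is hypothetical: StoredPage, migrate_on_touch and
the two converter callables are names I made up for illustration, not
anything that exists in MediaWiki. The point is only that a page gets
switched to XML when the round trip is exact, and is left alone otherwise.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoredPage:
    """Hypothetical stored revision: the body plus a tag naming its format."""
    body: str
    fmt: str  # either "wikitext" or "wikixml"


def migrate_on_touch(
    page: StoredPage,
    to_xml: Callable[[str], str],
    to_wikitext: Callable[[str], str],
) -> StoredPage:
    """Convert a page to XML storage the first time it is touched.

    The converters are passed in as assumptions; this function only
    decides whether the switch is safe for this particular page.
    """
    if page.fmt == "wikixml":
        return page  # already migrated, nothing to do

    xml = to_xml(page.body)
    if to_wikitext(xml) != page.body:
        # Round trip is not exact: keep the original text untouched so
        # nothing is lost, and let someone look at the page later.
        return page

    return StoredPage(body=xml, fmt="wikixml")
```

The cost is spread over normal traffic this way, and pages that never get
touched simply stay in wikisyntax until a batch job picks them up.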
To my way of thinking, if we had an XML backend store and a reliable
conversion path, then we could:
a) Provide wikisyntax editing to those who want it (by filtering through
the converter)
b) Develop meaningful WYSIWYG editing tools without having to first
reimplement the wikisyntax parser in JavaScript and every other language we
want to touch.
c) Allow direct access to the XML, making all kinds of researchers happy.
d) Incrementally roll out changes to bring things more in line with the
Semantic Web, again with conversion paths.
Engineering-wise, I think a "lossless" path could be built from these
components:
1. Wikisyntax <-> WikiXML converters.
2. WikiXML -> HTML renderer.
Determining that it is working properly can be done by testing against the
Wikipedia corpus. If we can go from Wikisyntax to WikiXML and back,
byte-exact, we've achieved our goal. Maybe it's OK to relax that restriction
(especially if we can determine in some other way that a page is corrupt or
invalid, or if we keep a list of exceptions), but I think it's one
that's both achievable and reasonable.
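A minimal sketch of what that round-trip check could look like, in Python.
The harness (find_roundtrip_failures) is the part that matters; the two toy
converters are stand-ins I invented that only know about the ''' bold marker
and treat everything else as opaque text. A real converter pair would have to
cover the whole syntax, and would also have to deal with things this toy
ignores, such as characters that are not legal in XML.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Tuple

BOLD_MARK = re.compile(r"(''')")


def toy_to_xml(source: str) -> str:
    """Toy wikisyntax -> XML: one element per bold marker or text run."""
    root = ET.Element("page")
    for piece in BOLD_MARK.split(source):
        if piece == "":
            continue  # re.split can yield empty strings at the edges
        tag = "boldmark" if piece == "'''" else "text"
        ET.SubElement(root, tag).text = piece
    return ET.tostring(root, encoding="unicode")


def toy_to_wikitext(xml: str) -> str:
    """Toy XML -> wikisyntax: concatenate the stored pieces in order."""
    return "".join(child.text or "" for child in ET.fromstring(xml))


def find_roundtrip_failures(
    pages: Iterable[Tuple[str, str]],
    to_xml: Callable[[str], str],
    to_wikitext: Callable[[str], str],
) -> List[str]:
    """Return the titles of pages that do not survive the trip byte-exact."""
    failures = []
    for title, source in pages:
        restored = to_wikitext(to_xml(source))
        if restored.encode("utf-8") != source.encode("utf-8"):
            failures.append(title)
    return failures
```

Run over a dump, the returned list is exactly the "list of exceptions"
mentioned above: either it is empty, or it tells you which pages need a
converter fix or a manual cleanup.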
We may also want to do validation on the HTML render path; if we want to be
really strict we can require that the conversion path gives identical output
(perhaps sans whitespace?) to the current parser & renderer.
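For the "identical output, sans whitespace" comparison, something along these
lines might be enough as a first pass. This is a Python sketch with made-up
names (normalize_html_whitespace, same_render); it assumes both renderers
hand back HTML as strings, and it deliberately ignores cases where whitespace
is significant, such as <pre> blocks or a space between two inline elements.

```python
import re


def normalize_html_whitespace(html: str) -> str:
    """Collapse whitespace so that layout-only differences disappear."""
    # Turn every run of spaces, tabs and newlines into a single space.
    collapsed = re.sub(r"\s+", " ", html).strip()
    # Drop whitespace that sits between two tags.
    # Caveat: this also hides a real difference between "</b> <i>" and
    # "</b><i>", so a stricter check would need an actual HTML parser.
    return re.sub(r">\s+<", "><", collapsed)


def same_render(old_html: str, new_html: str) -> bool:
    """True if the two renderings match once whitespace is normalized."""
    return normalize_html_whitespace(old_html) == normalize_html_whitespace(new_html)
```

Comparing the current parser's output for a page against the output of the
WikiXML -> HTML renderer would then be a single same_render() call per page.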
Once we have everything in XML, there are a number of good tools and
standards to enable us to be Unicode compliant, to do various kinds of
conversions and updates on the XML, and otherwise process our data, so we
can evolve it forward to meet our needs.
In any case - if we find that having a lossless path would satisfy the
constraints, then those who are interested can focus on writing a validation
framework... and then they can go implement it. :)
--
Ben Garney
Torque Technologies Director
GarageGames.Com, Inc.