Re: [Wikitech-l] Dajoo: a Java-based offline editor/viewer

18 Aug 2006

      On Thu, Aug 17, 2006 at 07:17:11PM -0700, Ben Garney wrote:
...
On 8/17/06, Jay R. Ashworth jra@baylink.com wrote:
...
I don't think that a Flag Day for some exceedingly esoteric
construction which needs to be cleaned up to make a formal parser
necessary is completely impossible, but it would have to be pretty
negligible, pretty important, or both... it goes back to that circle I
mentioned.
So what if we had a "lossless" wikisyntax to XML converter? It seems like
that wouldn't be an impossibility (given we're already parsing wikisyntax to
_HTML_).
What are the reactions to e.g. converting the backend to use that XML
storage, then enforcing it on the editor side, as well?
I Am Not A Wikimedia Foundation Employee.
That said, there are two sorts of Flag Days: those which affect users
and those which don't.
What you're suggesting here (and Simetrical has suggested before) would
-- assuming that conversion is *really* lossless, which I don't know
can be guaranteed at the moment -- only flag programs, not hundreds of
thousands of heads.
That makes it just a tad likelier to ever happen.
...
Obviously we'd have to be clever on the conversion (like making VERY sure
it's a "lossless" switch, and finding a computationally feasible way to get
it done - maybe update every article as it's touched?).
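
As a rough illustration of the "update every article as it's touched" idea, here is a minimal sketch. Everything in it is hypothetical: the `StoredPage` record, the in-memory `store` dict, and the `to_xml` / `to_wikitext` converters are stand-ins, not anything MediaWiki provides. Each page carries a format tag, untouched pages stay as wikisyntax, and a page is converted only when an edit is saved.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StoredPage:
    fmt: str   # "wikitext" (not yet migrated) or "xml" (migrated)
    body: str


def load_for_edit(store: Dict[str, StoredPage], title: str,
                  to_wikitext: Callable[[str], str]) -> str:
    """Hand the editor wikisyntax, whatever format the page is stored in."""
    page = store[title]
    return page.body if page.fmt == "wikitext" else to_wikitext(page.body)


def save_edit(store: Dict[str, StoredPage], title: str, new_wikitext: str,
              to_xml: Callable[[str], str]) -> None:
    """Convert on save, so only pages that are touched pay the conversion cost."""
    store[title] = StoredPage(fmt="xml", body=to_xml(new_wikitext))


# Toy converters, assumed purely for demonstration (no escaping, no parsing).
def toy_to_xml(text: str) -> str:
    return "<page>" + text + "</page>"


def toy_to_wikitext(xml: str) -> str:
    return xml[len("<page>"):-len("</page>")]


store = {"Foo": StoredPage("wikitext", "''old''"),
         "Bar": StoredPage("wikitext", "untouched")}
save_edit(store, "Foo", "''new''", toy_to_xml)
```

After the call, "Foo" is stored as XML while "Bar" is still wikisyntax, and `load_for_edit` returns wikisyntax for both.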
Yeah; and which processor that XML<->WT conversion happened on would be
critical for load reasons...
...
To my way of thinking, if we had an XML backend store and a reliable
conversion path, then we could:
   a) Provide wikisyntax editing to those who want it (by filtering through
the converter)
   b) Develop meaningful WYSIWYG editing tools without having to first
reimplement the wikisyntax parser in JavaScript and every other language we
want to touch.
   c) Allow direct access to the XML, making all kinds of researchers happy.
   d) Incrementally roll out changes to bring things more in line with
Semantic Web, again with conversion paths.
Engineering-wise, a "lossless" path could, to my mind, be built from
these components:

Wikisyntax <-> WikiXML converters.
WikiXML -> HTML renderer.

Determining that it is working properly can be done by testing against the
Wikipedia corpus. If we can go from WikiXML to Wikisyntax and back,
byte-exact, we've achieved our goal. Maybe it's ok to relax that restriction
(especially if we can determine in some other way the page is corrupt or
invalid - or maybe we have a list of exceptions), but I think it's one
that's both achievable and reasonable.
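
A minimal sketch of what such a corpus check might look like. It assumes the test starts from the existing wikisyntax and goes wikisyntax -> XML -> wikisyntax; the function name and the toy converters are hypothetical stand-ins for whatever real converters get written. A page counts as a failure if the round trip is not byte-exact or if either converter raises.

```python
from typing import Callable, Iterable, List, Tuple
from xml.sax.saxutils import escape, unescape


def find_round_trip_failures(
    pages: Iterable[Tuple[str, bytes]],
    to_xml: Callable[[bytes], bytes],
    to_wikitext: Callable[[bytes], bytes],
) -> List[str]:
    """Return titles whose wikisyntax does not survive the round trip byte-exact."""
    failures = []
    for title, wikitext in pages:
        try:
            restored = to_wikitext(to_xml(wikitext))
        except Exception:
            # A converter crash counts as a failure for that page.
            failures.append(title)
            continue
        if restored != wikitext:
            failures.append(title)
    return failures


# Toy converters, assumed for demonstration only: wrap the escaped text in one tag.
def toy_to_xml(wikitext: bytes) -> bytes:
    return b"<page>" + escape(wikitext.decode("utf-8")).encode("utf-8") + b"</page>"


def toy_to_wikitext(xml: bytes) -> bytes:
    inner = xml[len(b"<page>"):-len(b"</page>")]
    return unescape(inner.decode("utf-8")).encode("utf-8")


# A deliberately lossy converter: it silently drops trailing whitespace.
def lossy_to_xml(wikitext: bytes) -> bytes:
    return toy_to_xml(wikitext.rstrip())


corpus = [
    ("A", b"'''bold''' & <nowiki>x</nowiki>"),
    ("B", b"trailing space \n"),
]
clean = find_round_trip_failures(corpus, toy_to_xml, toy_to_wikitext)
lossy = find_round_trip_failures(corpus, lossy_to_xml, toy_to_wikitext)
```

With the toy converters every page survives; with the lossy one only page "B" is reported, which is the kind of list that could feed the "list of exceptions" mentioned above.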
Hmmm...
...
We may also want to do validation on the HTML render path; if we want to be
really strict we can require that the conversion path gives identical output
(perhaps sans whitespace?) to the current parser & renderer.
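
One way the "sans whitespace" comparison might be sketched, assuming both renderers hand back HTML as strings. The helper names are hypothetical. Note that this only collapses runs of whitespace into a single space, so it would also flatten whitespace inside `<pre>` blocks, where it is significant, and it still treats "some whitespace" and "no whitespace" between tags as different.

```python
import re

_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_ws(html: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", html).strip()


def same_render(old_html: str, new_html: str) -> bool:
    """True if the two renderings differ only in the amount of whitespace."""
    return normalize_ws(old_html) == normalize_ws(new_html)


matches = same_render("<p>Hello\n  world</p>\n", "<p>Hello world</p>")
differs = same_render("<p>Hello</p>", "<p>Hullo</p>")
```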
I, personally, am a bit less concerned there: it's like advertising
photography (where the printed ad in the magazine actually has to match
the PMS color of the object) vs color pictures of the local parade in
the newspaper (where you only care that the clown's face is 'pretty').
...
Once we have everything in XML, there are a number of good tools and
standards to enable us to be Unicode compliant, to do various kinds of
conversions and updates on the XML, and otherwise process our data, so we
can evolve it forward to meet our needs.
In any case - if we find that having a lossless path would satisfy the
constraints, then those who are interested can focus on writing a validation
framework... and then they can go implement it. :)
Aw, *ick*.
Y'all are talking me into it.
Quick!  Somebody talk me back out of it! :-)
Cheers,
-- jr "isn't it a good thing it doesn't matter what I think?" a
-- 
Jay R. Ashworth                                                jra@baylink.com
Designer                          Baylink                             RFC 2100
Ashworth & Associates        The Things I Think                        '87 e24
St Petersburg FL USA      http://baylink.pitas.com             +1 727 647 1274

    The Internet: We paved paradise, and put up a snarking lot.
