Re: [Wikitech-l] Dajoo: a Java-based offline editor/viewer

18 Aug 2006

On Thu, Aug 17, 2006 at 07:17:11PM -0700, Ben Garney wrote:
...
  On 8/17/06, Jay R. Ashworth <jra(a)baylink.com> wrote:
    I don't think that a Flag Day for some exceedingly esoteric
    construction which needs to be cleaned up to make a formal parser
    necessary is completely impossible, but it would have to be pretty
    negligible, pretty important, or both... it goes back to that circle I
    mentioned.

  So what if we had a "lossless" wikisyntax to XML converter? It seems like
  that wouldn't be an impossibility (given we're already parsing wikisyntax
  to _HTML_).

  What are the reactions to e.g. converting the backend to use that XML
  storage, then enforcing it on the editor side, as well?

I Am Not A Wikimedia Foundation Employee.

That said, there are two sorts of Flag Days: those which affect users
and those which don't.

What you're suggesting here (and Simetrical has suggested before) would
-- assuming that conversion is *really* lossless, which I don't know
can be guaranteed at the moment -- only flag programs, not hundreds of
thousands of heads.

That makes it just a tad likelier to ever happen.

...
  Obviously we'd have to be clever on the conversion (like making VERY sure
  it's a "lossless" switch, and finding a computationally feasible way to get
  it done - maybe update every article as it's touched?).

Yeah; and which processor that XML<->WT conversion happened on would be
critical for load reasons...
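
For what it's worth, "update every article as it's touched" could look something like the minimal Python sketch below, which does the conversion server-side at save time. Everything in it is hypothetical: `toy_to_xml`/`toy_to_wikitext` stand in for the real converters (they only escape the text into a `<page>` element), and `Page`, `save_page`, and `load_wikitext` are illustrative names, not MediaWiki code. The design choice is that a page only switches to XML when its conversion round-trips byte for byte; otherwise it stays wikitext.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from xml.sax.saxutils import escape, unescape

# Hypothetical toy converters standing in for the real Wikisyntax <-> WikiXML
# pair: they only wrap the text in a <page> element with XML escaping, which
# happens to be lossless.
def toy_to_xml(wikitext: str) -> str:
    return "<page>" + escape(wikitext) + "</page>"

def toy_to_wikitext(xml: str) -> str:
    if not (xml.startswith("<page>") and xml.endswith("</page>")):
        raise ValueError("not a toy <page> document")
    return unescape(xml[len("<page>"):-len("</page>")])

# Hypothetical page record: the stored body plus which format it is in.
@dataclass
class Page:
    title: str
    body: str
    fmt: str  # "wikitext" or "wikixml"

def save_page(store: Dict[str, Page], title: str, wikitext: str,
              to_xml: Callable[[str], str],
              to_wikitext: Callable[[str], str]) -> Page:
    """Migrate a page to XML only when it is saved ("as it's touched").

    The XML form is kept only if it converts back to the submitted wikitext
    exactly; otherwise the page stays in wikitext, so a converter bug
    cannot silently lose content.
    """
    xml = to_xml(wikitext)
    if to_wikitext(xml) == wikitext:
        page = Page(title, xml, "wikixml")
    else:
        page = Page(title, wikitext, "wikitext")
    store[title] = page
    return page

def load_wikitext(store: Dict[str, Page], title: str,
                  to_wikitext: Callable[[str], str]) -> str:
    """Return wikitext for the editor, converting back if stored as XML."""
    page = store[title]
    return to_wikitext(page.body) if page.fmt == "wikixml" else page.body

if __name__ == "__main__":
    store: Dict[str, Page] = {}
    page = save_page(store, "Sandbox", "'''Hello''' & <b>",
                     toy_to_xml, toy_to_wikitext)
    print(page.fmt, page.body)  # wikixml <page>'''Hello''' &amp; &lt;b&gt;</page>

    def lossy_to_xml(text: str) -> str:
        return toy_to_xml(text.rstrip())  # deliberately drops trailing whitespace

    # The lossy conversion fails the round trip, so the page stays wikitext.
    print(save_page(store, "Trailing", "text  \n",
                    lossy_to_xml, toy_to_wikitext).fmt)  # wikitext
```

Note that this puts both conversions on the save path, which is exactly where the load question bites; a cheap check like this only stays cheap if the real converters are.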

...
  To my way of thinking, if we had an XML backend store and a reliable
  conversion path, then we could:
     a) Provide wikisyntax editing to those who want it (by filtering through
        the converter)
     b) Develop meaningful wysiwyg editing tools without having to first
        reimplement the wikisyntax parser in javascript and every other
        language we want to touch.
     c) Allow direct access to the XML, making all kinds of researchers happy.
     d) Incrementally roll out changes to bring things more in line with
        Semantic Web, again with conversion paths.

  Engineering-wise, a "lossless" path could, to my mind, be built from
  these components:
  1. Wikisyntax <-> WikiXML converters.
  2. WikiXML -> HTML renderer.

  Determining that it is working properly can be done by testing against the
  Wikipedia corpus. If we can go from WikiXML to Wikisyntax and back,
  byte-exact, we've achieved our goal. Maybe it's ok to relax that restriction
  (especially if we can determine in some other way the page is corrupt or
  invalid - or maybe we have a list of exceptions), but I think it's one
  that's both achievable and reasonable.

Hmmm...

...
  We may also want to do validation on the HTML render path; if we want to be
  really strict we can require that the conversion path gives identical output
  (perhaps sans whitespace?) to the current parser & renderer.

I, personally, am a bit less concerned there: it's like advertising
photography (where the printed ad in the magazine actually has to match
the PMS color of the object) vs color pictures of the local parade in
the newspaper (where you only care that the clown's face is 'pretty').
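
If someone does want the "sans whitespace" comparison, a minimal Python sketch might be the one below; `normalize_html` and `same_rendering` are hypothetical names. It is deliberately loose, closer to the parade photo than the ad: it treats `<b>a</b> <i>b</i>` and `<b>a</b><i>b</i>` as equal even though a browser shows a space in one and not the other, and it ignores whitespace inside `<pre>`. A strict validator would need to know which whitespace is significant.

```python
import re

def normalize_html(html: str) -> str:
    """Loose normalization: ignore whitespace differences in rendered HTML."""
    html = re.sub(r">\s+<", "><", html)  # drop whitespace between tags
    html = re.sub(r"\s+", " ", html)     # collapse remaining whitespace runs
    return html.strip()

def same_rendering(old_html: str, new_html: str) -> bool:
    """Compare current-parser output with the XML path's output, sans whitespace."""
    return normalize_html(old_html) == normalize_html(new_html)

if __name__ == "__main__":
    print(same_rendering("<ul>\n  <li>a</li>\n</ul>\n", "<ul><li>a</li></ul>"))  # True
    print(same_rendering("<p>Hello world</p>", "<p>Hello, world</p>"))  # False
```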

...
  Once we have everything in XML, there are a number of good tools and
  standards to enable us to be Unicode compliant, to do various kinds of
  conversions and updates on the XML, and otherwise process our data, so we
  can evolve it forward to meet our needs.

  In any case - if we find that having a lossless path would satisfy the
  constraints, then those who are interested can focus on writing a validation
  framework... and then they can go implement it. :)

Aw, *ick*.

Y'all are talking me into it.

Quick!  Somebody talk me back out of it! :-)

Cheers,
-- jr "isn't it a good thing it doesn't matter what I think?" a
-- 
Jay R. Ashworth                                                jra(a)baylink.com
Designer                          Baylink                             RFC 2100
Ashworth & Associates        The Things I Think                        '87 e24
St Petersburg FL USA      http://baylink.pitas.com             +1 727 647 1274

	The Internet: We paved paradise, and put up a snarking lot.
