-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Brion Vibber wrote:
Magnus Manske wrote:
An alternative would be to (optionally) run the wiki source through my
PHP-based converter. It generates roughly the same XML as Timwi's *real*
parser does (or would do), so there would be compatibility if we ever
switch parsers.
Well, that does something completely different, doesn't it? Here we're
providing the 'transparent copy' which is the 'preferred format for
editing', which is the source text. A transformed version for output
could be useful in addition, but is a separate issue. Think program
source and object code...
I'm aware that it's something different, but one can (without too much
hassle) reconstruct wiki text from XML. It might not be *exactly* like
it was, but whether there were one or five blank lines (for example)
doesn't really matter, does it?
The advantage of providing "real" XML would be that the wiki text is
already "pre-parsed", making it much easier for the end user to process
it. He *is* already using an XML parser...
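
To make the "pre-parsed" point concrete, here is a minimal sketch in Python. The element names (article, heading, paragraph, link) and the sample document are invented for illustration; they are not the actual output of the PHP converter or of Timwi's parser. It shows that a stock XML parser is enough to pull out link targets, and that wiki text can be rebuilt from such a tree, up to whitespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are assumptions
# made for this sketch, not the converter's real schema.
SAMPLE = (
    "<article>"
    "<heading level='2'>History</heading>"
    "<paragraph>See <link target='Berlin'>the capital</link> for details.</paragraph>"
    "</article>"
)


def link_targets(xml_text):
    """Return the target of every <link> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.get("target") for el in root.iter("link")]


def inline_to_wikitext(el):
    """Rebuild wiki text for an element's inline content (text and links)."""
    parts = [el.text or ""]
    for child in el:
        if child.tag == "link":
            label = child.text or ""
            target = child.get("target")
            if label and label != target:
                parts.append("[[%s|%s]]" % (target, label))
            else:
                parts.append("[[%s]]" % target)
        # Other inline elements are skipped in this sketch; the text
        # that follows them (the tail) is still kept.
        parts.append(child.tail or "")
    return "".join(parts)


def to_wikitext(xml_text):
    """Rebuild wiki text from the tree, one blank line between blocks."""
    root = ET.fromstring(xml_text)
    blocks = []
    for el in root:
        if el.tag == "heading":
            marks = "=" * int(el.get("level"))
            blocks.append("%s %s %s" % (marks, inline_to_wikitext(el), marks))
        elif el.tag == "paragraph":
            blocks.append(inline_to_wikitext(el))
    return "\n\n".join(blocks)


print(link_targets(SAMPLE))
print(to_wikitext(SAMPLE))
```

Note that to_wikitext() always emits exactly one blank line between blocks, so the original spacing is not preserved. That is the kind of loss meant above.
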
Of course, for plain backup storage, it won't make much sense. Anyway, I
*did* include the word "optionally" :-)
Last but not least, I remember there's a free compression algorithm that
specializes in XML compression, and claims to be more effective for XML
texts than plain-text compression (which goes up to a factor of 10).
Alas, I forgot the name...
The vast bulk of the dump is natural-language text (which is highly
repetitive due to consecutively stored versions), so I'd be surprised if
this would outweigh the benefits of using a standard tool like gzip.
Might be good to know about it for reference though!
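
The repetition argument can be checked with a rough sketch in Python. The vocabulary, the revision sizes and the helper names are made up for the test; this is synthetic text, not real dump data. It compares how well gzip does on one revision against the same page stored as 20 consecutive, nearly identical revisions.

```python
import gzip
import random

# Made-up vocabulary standing in for natural-language article text.
WORDS = ("wiki parser text dump article revision page history edit link "
         "table image user talk category template source format").split()


def make_revisions(count, words_per_revision=300, seed=0):
    """Build `count` consecutive revisions; each one appends a sentence."""
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(words_per_revision))
    revisions = []
    for i in range(count):
        text += " Edit number %d adds one more sentence." % i
        revisions.append(text)
    return revisions


def gzip_ratio(data):
    """Uncompressed size divided by gzip-compressed size."""
    return len(data) / len(gzip.compress(data))


revisions = make_revisions(20)
single = revisions[0].encode("utf-8")
history = "\n".join(revisions).encode("utf-8")

print("single revision:     %.1fx" % gzip_ratio(single))
print("20 stored revisions: %.1fx" % gzip_ratio(history))
```

One caveat: gzip only looks back 32 KB, so the gain from consecutive versions shows up only while a revision is small enough for the previous copy to still be inside that window.
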
I'll try to find it again.
Magnus
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFCfiv6CZKBJbEFcz0RAhKhAJ0a6/PJF6EetGUapAtMH874iphLrgCdF42o
CJJRU8w8AbXSBqJXpOQ9U9k=
=UFKi
-----END PGP SIGNATURE-----