"Tim Starling"
<t.starling(a)physics.unimelb.edu.au> wrote in message news:dtbtss$tkf$1@sea.gmane.org...
An enhanced version of the C++ diff extension, wikidiff2, is now running
on both clusters. It now does character-level diffs on Chinese, Japanese
and Thai, so it produces much better results than the PHP diff algorithm,
in a much shorter time to boot. Chinese had an ad-hoc segmentation scheme
based on inserting a space between every character before the diff, then
removing the spaces afterwards, but unfortunately that left spaces all
over the place where there shouldn't have been spaces. Anyway, it's fixed
now.

Any chance of improving the behaviour when several successive paragraphs
are split apart?
E.g. assuming that A, B, C, D, etc. are words, when:
----
ABCDEF
GHIJKLM
----
becomes
----
A'
B'
C'
D'
E'
F'
G'
H'
I'
J'
K'
L'
----
At present the component parts of the second paragraph do not line up with
that paragraph, so it is difficult to compare the versions:
--Old edit--    --New edit--
ABCDEF          A'
                B'
GHIJKLM         C'
                D'
                E'
                F'
                G'
                H'
                I'
                J'
                K'
                L'
------          ------
As you can see, by the time the fragments G', H', I', J', K', L' appear,
the original might have scrolled off the top. If the alterations were
subtle it might be difficult to check them adequately.
Would it be possible to lay it out like this instead?
--Old edit--    --New edit--
ABCDEF          A'
                B'
                C'
                D'
                E'
                F'
GHIJKLM         G'
                H'
                I'
                J'
                K'
                L'
------          ------
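
To make the request concrete, here is a rough sketch of the pairing I have in mind, in Python rather than the extension's C++. The function name align_split_paragraphs and the word-overlap heuristic are purely my own assumptions for illustration; this is not how wikidiff2 works internally, and it only ever looks one old paragraph ahead.

```python
def align_split_paragraphs(old_paras, new_lines):
    """Pair each new line with the old paragraph it most plausibly came from.

    Returns (old_text, new_text) rows. An old paragraph appears only on the
    row of its first matching new line; later fragments get an empty left
    cell, so each old paragraph sits beside its own pieces.
    Hypothetical helper, assuming fragments keep their original order.
    """
    if not old_paras:
        return [("", line) for line in new_lines]

    old_words = [set(p.split()) for p in old_paras]
    rows = []
    cursor = 0      # index of the old paragraph currently being matched
    shown = set()   # old paragraphs already printed in the left column

    for line in new_lines:
        words = set(line.split())
        # Move on while the next old paragraph shares strictly more words
        # with this line than the current one does (ties stay put).
        while (cursor + 1 < len(old_paras)
               and len(words & old_words[cursor + 1])
               > len(words & old_words[cursor])):
            if cursor not in shown:
                # Old paragraph passed over without any matching fragment.
                rows.append((old_paras[cursor], ""))
                shown.add(cursor)
            cursor += 1
        left = "" if cursor in shown else old_paras[cursor]
        shown.add(cursor)
        rows.append((left, line))

    # Old paragraphs that never found a counterpart go at the end.
    for i, para in enumerate(old_paras):
        if i not in shown:
            rows.append((para, ""))
    return rows


# Example: two paragraphs, each split into two lines.
old = ["the quick brown fox", "jumps over the lazy dog"]
new = ["the quick", "brown fox", "jumps over", "the lazy dog"]
for left, right in align_split_paragraphs(old, new):
    print(f"{left:<26}{right}")
```

A real implementation would presumably want a proper similarity measure rather than raw word overlap, since heavily edited fragments (A' versus A above) may share few exact words with the original.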
HTH HAND
--
Phil
[[en:User:Phil Boswell]]