* Brion Vibber <brion(a)pobox.com> [Thu, 2 Dec 2010 12:15:18 -0800]:
> What is it that your system actually needs to be able to do this for? Is
> there an issue with loading up the previous text items, or are you trying to
> optimize storage on your end by not storing text twice when it happened to
> use the same text blob on the origin site?

I am trying to synchronize the "recent changes" of two wiki sites via XML
chunks (consecutive groups of 10 revisions) created by WikiExporter. It
mostly works, although I still haven't checked everything thoroughly (what
will happen if a revision with an earlier timestamp is imported over a
revision with a later timestamp?). However, ImportReporter::reportPage
also creates an extra null revision for every imported page, for
"informational purposes" ("Imported by WikiSync" in my case).
Unfortunately, at the next synchronization run, such a revision becomes
a difference between the sites, and the synchronization reports that the
sites are not equal (even though there really were no changes, except for
the informational null revision).
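
One way to keep the informational revisions without tripping the comparison is to filter them out before comparing. The following is only a minimal sketch in Python, not WikiSync or MediaWiki code: the record fields (`timestamp`, `text_hash`, `comment`) and the function names are hypothetical, and it assumes each site's revisions can be reduced to such records.

```python
# Hypothetical revision records: dicts with "timestamp", "text_hash", "comment".
IMPORT_COMMENT = "Imported by WikiSync"  # marker comment quoted in the message


def strip_informational(revisions):
    """Drop revisions that repeat the previous text and carry the import comment."""
    kept = []
    prev_hash = None
    for rev in sorted(revisions, key=lambda r: r["timestamp"]):
        repeats_text = rev["text_hash"] == prev_hash
        if not (repeats_text and rev["comment"] == IMPORT_COMMENT):
            kept.append(rev)
        prev_hash = rev["text_hash"]
    return kept


def sites_equal(source_revs, target_revs):
    """Compare two revision lists, ignoring informational import revisions."""
    def fingerprint(revs):
        return [(r["timestamp"], r["text_hash"]) for r in strip_informational(revs)]
    return fingerprint(source_revs) == fingerprint(target_revs)
```

The filter relies on both the repeated text and the marker comment, so an ordinary edit that happens to use the same comment but changes the text is still compared.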

> Beware that there's not anything that really distinguishes null revisions
> from their predecessors, other than that they come later than the previous
> ones. Note that it's also possible for the earlier revision to get deleted
> while a later revision using the same text blob still remains.

That's really bad for me. I should probably patch deletion as well, to
clear the rev_null flag field on a null revision's row when the non-null
revision matching its rev_text_id has been deleted :-( That's too many
patches to core, and I am not even sure that I can intercept all kinds of
revision deletion; I should check that.
With GROUP BY being slow on a large set and FIRST / LAST aggregators
unavailable, it would probably be easier for me just not to call
ImportReporter from my derived WikiImporter class? The informational null
revisions simply won't be created in that case. They are nice for the end
user, which is why I have tried to keep them.
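
For reference, the detection I was considering can be sketched without GROUP BY as a single ordered pass. This is a rough Python illustration only, assuming rows of (rev_id, rev_text_id) have already been fetched; the function name is hypothetical, and rev_null is my own proposed flag, not an existing column.

```python
def flag_null_revisions(rows):
    """Map rev_id -> True if an earlier revision already used the same text blob.

    rows: iterable of (rev_id, rev_text_id) tuples, in any order.
    """
    seen_text_ids = set()
    flags = {}
    for rev_id, text_id in sorted(rows):  # walk revisions in rev_id order
        flags[rev_id] = text_id in seen_text_ids
        seen_text_ids.add(text_id)
    return flags


# Revision 3 reuses the blob of revision 2; revision 4 reuses the blob of revision 1.
rows = [(1, 100), (2, 101), (3, 101), (4, 100)]
flags = flag_null_revisions(rows)

# If revision 2 is deleted, revision 3 no longer looks like a null revision.
flags_after_delete = flag_null_revisions([r for r in rows if r[0] != 2])
```

The sketch shows both weaknesses: any reuse of an older blob gets flagged, not only informational revisions, and deleting the earlier revision silently changes the result for the later one.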

> The previously referenced text blob might also have originally come in in a
> much older revision, not the immediately preceding one; this may be legit
> for certain kinds of reverts, for instance.

Thanks for the explanation.
Dmitriy