* Brion Vibber brion@pobox.com [Thu, 2 Dec 2010 12:15:18 -0800]:
What is it that your system actually needs to be able to do this for? Is there an issue with loading up the previous text items, or are you trying to optimize storage on your end by not storing text twice when it happened to use the same text blob on the origin site?
I am trying to synchronize the "recent changes" of two wiki sites via XML chunks (consecutive groups of 10 revisions) created by WikiExporter. It mostly works (although I still haven't checked everything thoroughly: what will happen if a revision with an earlier timestamp is imported over a revision with a later timestamp?). However, ImportReporter::reportPage also creates an extra null revision for every imported page for "informational purposes" ("Imported by WikiSync" in my case). Unfortunately, at the next synchronization run, that revision becomes a difference between the sites, and the synchronization reports that the sites are not equal (even though there were really no changes except for the informational null revision).
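One possible workaround, sketched in Python rather than in the extension's PHP: skip revisions whose edit comment is exactly the import note before comparing the two histories. The dict keys ("timestamp", "comment", "text") and both function names are hypothetical and only illustrate the idea; this is not how WikiSync actually compares sites.

```python
# Comment text that marks the informational revision in my setup.
IMPORT_NOTE = "Imported by WikiSync"


def without_import_notes(revisions, note=IMPORT_NOTE):
    """Return the revisions whose edit comment is not the import note."""
    return [r for r in revisions if r["comment"] != note]


def same_history(src_revs, dst_revs):
    """Compare (timestamp, text) pairs of two histories, ignoring import notes."""
    def key(rev):
        return (rev["timestamp"], rev["text"])

    src = [key(r) for r in without_import_notes(src_revs)]
    dst = [key(r) for r in without_import_notes(dst_revs)]
    return src == dst


if __name__ == "__main__":
    source = [{"timestamp": "20101201000000", "comment": "fix typo", "text": "a"}]
    target = source + [
        {"timestamp": "20101202000000", "comment": IMPORT_NOTE, "text": "a"}
    ]
    print(same_history(source, target))
```

The obvious weakness is that matching on the comment would also hide a real edit if somebody happened to type the same comment by hand.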
Beware that there's not anything that really distinguishes null revisions from their predecessors, other than that they come later than the previous ones. Note that it's also possible for the earlier revision to get deleted while a later revision using the same text blob still remains.
That's really bad for me - I should probably patch deletion as well, to clear the rev_null flag field on the null revision's row when the non-null revision with the matching rev_text_id is deleted :-( (That is too many patches to the core, and I am not even sure that I can intercept all kinds of revision deletion - I should check that.)
With GROUP BY on a large set being slow and FIRST / LAST aggregators unavailable, it would probably be easier for me just not to call ImportReporter from my derived WikiImporter class. The informational null revisions simply won't be created in that case. They are nice for the end user, which is why I have tried to keep them.
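For reference, the kind of query I had in mind looks roughly like the sketch below, using MIN(rev_id) in place of a FIRST aggregator and limiting the GROUP BY to a single page. It runs against a toy in-memory SQLite table that keeps only three columns of the revision table; the function name is hypothetical, and treating the smallest rev_id as the earliest revision is an assumption that may not hold for imported revisions.

```python
import sqlite3

# Toy stand-in for the revision table; only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text_id INTEGER)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 101), (3, 1, 101), (4, 1, 100)],
)


def later_users_of_same_text(conn, page_id):
    """Return rev_ids on one page that reuse a text blob first used earlier.

    Assumption: the smallest rev_id per rev_text_id is the first user.
    """
    rows = conn.execute(
        """
        SELECT r.rev_id
        FROM revision r
        JOIN (SELECT rev_text_id, MIN(rev_id) AS first_rev
              FROM revision
              WHERE rev_page = ?
              GROUP BY rev_text_id) f
          ON f.rev_text_id = r.rev_text_id
        WHERE r.rev_page = ? AND r.rev_id > f.first_rev
        ORDER BY r.rev_id
        """,
        (page_id, page_id),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(later_users_of_same_text(conn, 1))
```

Even restricted to one page, this only finds candidates: if the first revision using a blob has been deleted, the null revision itself becomes the "first" one and is no longer reported.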
The previously referenced text blob might also have originally come in in a much older revision, not the immediately preceding one; this may be legit for certain kinds of reverts, for instance.
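To make the two cases concrete, here is a small Python sketch that labels each revision of one page by how it relates to earlier text blobs. The function name and labels are hypothetical; the input is assumed to be the page's rev_text_id values ordered from oldest to newest revision.

```python
def classify(text_ids):
    """Label each revision by comparing its text blob id with earlier ones.

    text_ids: rev_text_id values of one page, oldest revision first.
    """
    seen = set()
    labels = []
    prev = None
    for tid in text_ids:
        if tid == prev:
            # Same blob as the immediately preceding revision:
            # this is what a null revision looks like.
            labels.append("same-as-previous")
        elif tid in seen:
            # Blob first used by a much older revision, e.g. some reverts.
            labels.append("reuses-older-text")
        else:
            labels.append("new-text")
        seen.add(tid)
        prev = tid
    return labels


if __name__ == "__main__":
    print(classify([100, 101, 101, 100]))
```

Neither label proves anything by itself: the check depends on the earlier revisions still being present, which is exactly what deletion can break.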
Thanks for the explanation.
Dmitriy