On Thu, Dec 2, 2010 at 10:43 AM, Dmitriy Sintsov <questpc@rambler.ru> wrote:
Indexes are not hard to add, that's true. However, even with indexes, the GROUP BY rev_text_id query on a large revision set is slow. I will probably have to patch Revision::newNullRevision to add a new field value there (for the existing rows it is possible to fill the new field with an UPDATE, but new null revisions will keep being created).
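For readers unfamiliar with the query being discussed, here is a minimal sketch of a GROUP BY rev_text_id lookup that finds text blobs shared by more than one revision. It uses an in-memory SQLite database and a heavily simplified, hypothetical `revision` table with only `rev_id`, `rev_page` and `rev_text_id`; the index name `rev_text_id_idx` is also made up. The real MediaWiki schema and MySQL behavior on a large table will differ.

```python
import sqlite3


def shared_text_ids(rows):
    """Return (rev_text_id, count) pairs for text blobs used by more than one revision.

    Assumes a simplified, hypothetical revision table (rev_id, rev_page, rev_text_id);
    this is not the full MediaWiki schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision ("
        "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text_id INTEGER)"
    )
    # Hypothetical index name; an index helps the lookup but the GROUP BY
    # still has to visit every revision row.
    conn.execute("CREATE INDEX rev_text_id_idx ON revision (rev_text_id)")
    conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT rev_text_id, COUNT(*) FROM revision "
        "GROUP BY rev_text_id HAVING COUNT(*) > 1 ORDER BY rev_text_id"
    ).fetchall()
    conn.close()
    return result


# Page 1: rev 3 is a null revision reusing rev 2's blob; rev 4 reverts to rev 1's blob.
rows = [(1, 1, 100), (2, 1, 101), (3, 1, 101), (4, 1, 100)]
print(shared_text_ids(rows))  # [(100, 2), (101, 2)]
```

Note that the result groups a null revision (text 101) together with a revert (text 100): the query alone cannot tell the two cases apart.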
What is it that your system actually needs to be able to do this for? Is there an issue with loading up the previous text items, or are you trying to optimize storage on your end by not storing text twice when it happened to use the same text blob on the origin site?
Beware that there's nothing that really distinguishes null revisions from their predecessors, other than that they come later than the previous ones. Note that it's also possible for the earlier revision to get deleted while a later revision using the same text blob still remains.
The previously referenced text blob might also have originally come in in a much older revision, not the immediately preceding one; this may be legit for certain kinds of reverts, for instance.
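To illustrate the caveats above, here is a hypothetical sketch (not MediaWiki code) that walks one page's revisions in `rev_id` order and labels each revision by how it relates to earlier text blobs. It shows that a null revision only looks like "same text as the previous revision", that a revert can point at a much older blob, and that once the earlier revision is deleted, a shared blob looks brand new. The input format, a list of `(rev_id, rev_text_id)` pairs, and the label strings are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def classify_text_reuse(revisions: List[Tuple[int, int]]) -> Dict[int, str]:
    """Label each (rev_id, rev_text_id) pair of a single page, in rev_id order.

    Hypothetical helper: "same-as-previous" could be a null revision (or anything
    else reusing the prior blob), "older-blob" reuses a non-adjacent earlier blob
    (e.g. some reverts), and "new-blob" is the first visible use of a blob.
    """
    labels: Dict[int, str] = {}
    seen = set()
    prev_text = None
    for rev_id, text_id in sorted(revisions):
        if text_id == prev_text:
            labels[rev_id] = "same-as-previous"
        elif text_id in seen:
            labels[rev_id] = "older-blob"
        else:
            labels[rev_id] = "new-blob"
        seen.add(text_id)
        prev_text = text_id
    return labels


history = [(1, 100), (2, 101), (3, 101), (4, 100)]
print(classify_text_reuse(history))
# With rev 1 deleted, rev 4's blob no longer appears to be shared:
print(classify_text_reuse(history[1:]))
```

The second call labels rev 4 as "new-blob" even though its text is shared with the deleted rev 1, which is why the surviving revisions alone are not a reliable guide to blob reuse.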
-- brion