The text storage backend could quite legitimately do that on its own. I'm
not quite sure why you refer to the page/archive tables, though: no two
revisions are "identical" (they differ in rev_timestamp if nothing else).
Each revision has a text_id pointing to its text in the text table. I take
it you mean that a revision entry could refer to an existing text_id if the
text was demonstrably identical, rather than creating a new entry and
potentially duplicating the text itself. But the text table is not the final stage in
the process, or at least it doesn't have to be; MediaWiki is happy as long
as throwing that text_id into the database and cranking the handle churns
out the appropriate text; it doesn't care how that text is stored or
retrieved. Only in the default setting is each old_text field populated
with the full text.
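
To make the "refer to an existing text_id" idea concrete, here is a minimal
sketch in Python. It is not MediaWiki code: TextStore, store() and fetch()
are hypothetical names, and an in-memory dict stands in for the text table.
It assumes identical text is found via a SHA-1 lookup and then confirmed by
a full comparison before the id is reused.

```python
import hashlib


class TextStore:
    """Hypothetical stand-in for the text table: maps text_id -> text."""

    def __init__(self):
        self._texts = {}    # text_id -> full text
        self._by_hash = {}  # SHA-1 hex digest -> text_id
        self._next_id = 1

    def store(self, text):
        """Return a text_id for `text`, reusing an existing id if identical."""
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        existing = self._by_hash.get(digest)
        # Compare the full text too, so a hash collision never aliases
        # two different revisions to the same stored text.
        if existing is not None and self._texts[existing] == text:
            return existing
        text_id = self._next_id
        self._next_id += 1
        self._texts[text_id] = text
        self._by_hash.setdefault(digest, text_id)
        return text_id

    def fetch(self, text_id):
        """Callers only need this: hand in a text_id, get the text back."""
        return self._texts[text_id]

    def __len__(self):
        return len(self._texts)
```

A revision row would just record whatever store() returns, so two revisions
with different timestamps can share one stored text.
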
That said, I do agree that this should be done. We do it for images, we
should do it for text, because it's useful for more than just data
compression, as suggested by the OP. It could be used to make evaluation of
reversions in extensions like AbuseFilter and FlaggedRevs much more
effective and efficient, for instance. And it probably *could* be used to
improve the compression of the fully-written text table.
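
As a rough illustration of the reversion case, the sketch below checks
whether a new edit restores the exact text of an earlier revision by
comparing content hashes. It is Python, not AbuseFilter or FlaggedRevs
code; text_hash() and find_reverted_to() are made-up names, and it assumes
a hash of each revision's text is kept alongside the revision. A match is
only a strong hint; a careful implementation would confirm by comparing the
texts themselves.

```python
import hashlib


def text_hash(text):
    """SHA-1 hex digest of a revision's text (assumed to be stored per revision)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_reverted_to(history_hashes, new_text):
    """Return the index of the most recent earlier revision whose hash
    matches new_text, or None if the text has not been seen before.

    history_hashes is ordered oldest to newest.
    """
    target = text_hash(new_text)
    for i in range(len(history_hashes) - 1, -1, -1):
        if history_hashes[i] == target:
            return i
    return None
```

For a page whose history is "good", "good plus", "VANDAL", saving "good plus"
again would match index 1, i.e. the edit is a revert to the second revision.
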
--HM
<jidanni(a)jidanni.org> wrote in message news:87hbxlr3va.fsf@jidanni.org...
Also it could be used to say "do I really need to store this revision in
the 'page' or 'archive' tables, or can I just refer to an existing
identical revision".