The text storage backend could quite legitimately do that on its own. I'm not quite sure why you refer to the page/archive tables, though: no two revisions are "identical" (they have a different rev_timestamp, if nothing else). Each revision has a text_id pointing to its text in the text table, so I take it you mean that a revision entry could refer to an existing text_id if the text was demonstrably identical, rather than creating a new entry and potentially duplicating the text itself. But the text table is not the final stage in the process, or at least it doesn't have to be. MediaWiki is happy as long as throwing that text_id into the database and cranking the handle churns out the appropriate text; it doesn't care how that text is stored or retrieved. Only in the default setting is each old_text field populated with the full text.
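To make the text_id reuse idea concrete, here is a rough sketch in Python rather than MediaWiki's own PHP. The TextStore class, its methods, and the choice of SHA-1 are all hypothetical stand-ins for whatever the storage backend would really do; none of this is existing MediaWiki code.

```python
import hashlib


class TextStore:
    """Hypothetical stand-in for the text table / storage backend."""

    def __init__(self):
        self._text_by_id = {}   # text_id -> stored text
        self._id_by_hash = {}   # content hash -> first text_id with that hash
        self._next_id = 1

    def store(self, text):
        """Return a text_id for text, reusing an existing id if identical."""
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        existing = self._id_by_hash.get(digest)
        # Compare the full text too, so a hash collision can never
        # make two different revisions share one text_id.
        if existing is not None and self._text_by_id[existing] == text:
            return existing
        text_id = self._next_id
        self._next_id += 1
        self._text_by_id[text_id] = text
        self._id_by_hash.setdefault(digest, text_id)
        return text_id

    def fetch(self, text_id):
        """Crank the handle: turn a text_id back into the text."""
        return self._text_by_id[text_id]
```

Two revisions saved with the same text would then carry the same text_id while remaining distinct revision rows, each with its own rev_timestamp.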
That said, I do agree that this should be done. We do it for images, and we should do it for text, because it's useful for more than just data compression, as suggested by the OP. It could be used to make evaluation of reversions in extensions like AbuseFilter and FlaggedRevs much more effective and efficient, for instance. And it probably *could* be used to improve the compression of the fully-written text table.
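As an illustration of the reversion case, here is a minimal Python sketch, assuming each revision had a content hash stored next to it. The function names and the (rev_id, hash) history layout are made up for the example and are not how AbuseFilter or FlaggedRevs actually work.

```python
import hashlib


def text_hash(text):
    """Content hash of a revision's text (SHA-1 chosen only for illustration)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_reverted_to(history, new_text):
    """Return the rev_id of the most recent earlier revision whose stored
    hash matches new_text, or None if the text has not been seen before.

    history is a hypothetical list of (rev_id, content_hash) pairs,
    ordered oldest first.
    """
    target = text_hash(new_text)
    for rev_id, stored_hash in reversed(history):
        if stored_hash == target:
            return rev_id
    return None
```

An extension could then spot a revert by comparing one short hash per revision instead of loading and diffing full texts; a careful implementation would still confirm the match against the actual text.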
--HM
jidanni@jidanni.org wrote in message news:87hbxlr3va.fsf@jidanni.org...
Also it could be used to say "do I really need to store this revision in the 'page' or 'archive' tables, or can I just refer to an existing identical revision".