Re: [Wikitech-l] Proposal/RFC: Checksum of revision text

9 Jul 2009


The text storage backend could quite legitimately do that on its own.  I'm 
not quite sure why you refer to the page/archive tables, though: no two 
revisions are "identical" (they differ in rev_timestamp if nothing else).  
Each revision has a text_id pointing to the text of the revision in the text 
table.  I take it you mean that a revision entry could refer to an existing 
text_id if the text was demonstrably identical, rather than creating a new 
entry and potentially duplicating the text itself.  But the text table is 
not the final stage in the process, or at least it doesn't have to be. 
MediaWiki is happy as long as throwing that text_id into the database and 
cranking the handle churns out the appropriate text; it doesn't care how 
that text is stored or retrieved.  Only in the default setting is each 
old_text field populated with the full text.
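
To make the text_id reuse idea concrete, here is a rough sketch in Python.  It 
is not MediaWiki code: TextStore, store() and fetch() are made-up names, and 
an in-memory dict stands in for the text table.  It only illustrates "look 
up the checksum first, and create a new entry only if no identical text 
exists".

```python
import hashlib


class TextStore:
    """Toy stand-in for the text table (hypothetical, not MediaWiki's API)."""

    def __init__(self):
        self._text_by_id = {}   # text_id -> stored text
        self._id_by_hash = {}   # SHA-1 hex digest -> first text_id with that text
        self._next_id = 1

    def store(self, text):
        """Return a text_id for `text`, reusing an existing one if identical."""
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        existing = self._id_by_hash.get(digest)
        # Compare the full text as well, so a hash collision can never
        # make two different texts share one text_id.
        if existing is not None and self._text_by_id[existing] == text:
            return existing
        text_id = self._next_id
        self._next_id += 1
        self._text_by_id[text_id] = text
        self._id_by_hash.setdefault(digest, text_id)
        return text_id

    def fetch(self, text_id):
        """Crank the handle: turn a text_id back into the revision text."""
        return self._text_by_id[text_id]
```

Two revisions with the same text would then carry the same text_id while 
remaining separate revision rows with their own timestamps.
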
That said, I do agree that this should be done.  We do it for images, and we 
should do it for text, because it's useful for more than just the data 
compression suggested by the OP.  It could be used to make evaluation of 
reversions in extensions like AbuseFilter and FlaggedRevs much more 
effective and efficient, for instance.  And it probably *could* be used to 
improve the compression of the fully-written text table.
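
As a rough illustration of the reversion case, assuming each revision had a 
stored checksum: an extension could compare the checksum of the incoming 
text against the page's earlier checksums instead of loading and comparing 
every old text.  The function names below are invented for this sketch and 
are not part of AbuseFilter or FlaggedRevs.

```python
import hashlib


def text_checksum(text):
    """SHA-1 hex digest of the revision text (the hash choice is an assumption)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_reverted_to(history_checksums, new_text):
    """Return the index of the most recent earlier revision whose checksum
    matches new_text, or None if the text looks new.

    history_checksums is a list of checksums, oldest revision first.
    """
    target = text_checksum(new_text)
    # Walk backwards so the most recent matching revision wins.
    for index in range(len(history_checksums) - 1, -1, -1):
        if history_checksums[index] == target:
            return index
    return None
```

A match only flags a candidate; a careful caller would still compare the 
actual texts before treating the edit as a revert.
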
--HM
jidanni@jidanni.org wrote in message news:87hbxlr3va.fsf@jidanni.org...
...
Also it could be used to say "do I really need to store this revision in
the 'page' or 'archive' tables, or can I just refer to an existing
identical revision".
