On Mon, Apr 6, 2009 at 3:57 PM, <jidanni(a)jidanni.org> wrote:
> I'm curious what
> SELECT COUNT(DISTINCT old_text), COUNT(*) FROM text;
> shows on Wikipedia's database. On mine I get
> COUNT(DISTINCT old_text): 2913
> COUNT(*): 3560
> I.e., 1/7 of the rows are redundant.
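For anyone who wants to try the measurement on a small scale, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory table only mimics the one column the query touches; the helper name redundancy() and the sample rows are made up for illustration, and this says nothing about how a real MediaWiki install stores or compresses text.

```python
import sqlite3


def redundancy(rows):
    """Return (distinct, total, redundant_fraction) for a list of text values.

    Hypothetical helper: builds a throwaway table shaped like the one in the
    quoted query and runs that same query against it.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text BLOB)")
    conn.executemany("INSERT INTO text (old_text) VALUES (?)", [(r,) for r in rows])
    # COUNT(DISTINCT col) counts each distinct non-NULL value once;
    # COUNT(*) counts every row.
    distinct, total = conn.execute(
        "SELECT COUNT(DISTINCT old_text), COUNT(*) FROM text"
    ).fetchone()
    conn.close()
    fraction = (total - distinct) / total if total else 0.0
    return distinct, total, fraction


if __name__ == "__main__":
    # Made-up sample: "a" appears three times, so two of five rows are repeats.
    print(redundancy(["a", "b", "a", "c", "a"]))
    # Same arithmetic applied to the counts quoted above.
    print(f"{(3560 - 2913) / 3560:.3f}")
```

With the counts quoted above, (3560 - 2913) / 3560 comes to about 0.18, so the redundant share on that wiki is somewhat more than 1/7.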
As others have noted, Wikimedia compresses everything and doesn't
really store lots of redundant text.
That said, past analysis of edit summaries suggests that about 1 edit
in 10 on enwiki is a revert.
-Robert Rohde