Gentlemen, why doesn't the database's text table use Linus Torvalds' "git" style mapping of identical contents to the same row, as they have the same SHA1 hash?
Currently even undoing a user's edit just points to a fresh row in the text table instead of pointing to the identical old one.
This despite these words in tables.sql:

  -- It's possible for multiple revisions to use the same text,
  -- for instance revisions where only metadata is altered
  -- or a rollback to a previous version.
Examining my wiki,

  echo "SELECT old_text FROM text;"|mysql --default-character-set=binary radioscanningtw -N|\
  perl -nwle 'use Digest::SHA1 qw/sha1_hex/;$h{sha1_hex($_)}++;END{for(keys %h){print $h{$_}}}'|sort -nr|uniq -c
        1 247
        1 5
        2 4
       10 3
      261 2
     1206 1

The right column is how many rows hold the same text, the left column how many distinct texts occur that often. I find that all but the 1206 texts on the last line are stored more than once.
Listing the five most duplicated texts,

  echo "SELECT old_text FROM text;"|mysql --default-character-set=binary radioscanningtw -N|\
  perl -lnwe 'use Digest::SHA1 qw/sha1_hex/;print sha1_hex($_),"\t", $_'|sort|uniq -c|sort -nr|\
  perl -C -nwle '/.{0,88}/;print $&;exit if $.==5'
      247 da39a3ee5e6b4b0d3255bfef95601890afd80709
        5 bf36408b7db0ea4b834b935ae2992e97fd438539 請問台中港務警察局的頻率、頻道,有人知道嗎?可以分享嗎?
        4 fa21b2d9a4ace2bb86917e7a83ad20a1f5301917 {{c|486.1000}}|{{c|DCS 065}}|{{c|呼 8xx}
        4 a860f97b87c81344239766c2f243bfff05ae7cdd #REDIRECT [[Project:幫助]]
        3 edabb0d4f0f21cd9dfba867ef9cdbc584c8937c1 全國監獄 {{c|150.6750}}

I even have 247 separate entries for a file with 0 bytes, from a page blanking incident. One would be enough.
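
For illustration only, here is a minimal sketch of the git-style idea in shell. It assumes GNU sha1sum is available, and STORE and store_text are hypothetical names of mine, not anything in MediaWiki. Each text is saved under its SHA-1 hash, so identical texts land in the same entry.

```shell
#!/bin/sh
# Hypothetical sketch: a content-addressed text store on the filesystem.
# STORE and store_text are illustrative names, not part of MediaWiki.
STORE=${STORE:-$(mktemp -d)}

# store_text TEXT: save TEXT under its SHA-1 hash and print the hash.
# Identical texts map to the same file, so each is stored only once.
store_text() {
    sum=$(printf '%s' "$1" | sha1sum | cut -d' ' -f1)
    [ -e "$STORE/$sum" ] || printf '%s' "$1" > "$STORE/$sum"
    printf '%s\n' "$sum"
}

# Three "revisions": one edit followed by two page blankings.
store_text 'hello' >/dev/null
store_text '' >/dev/null
store_text '' >/dev/null

# Count the stored entries: two files for three revisions.
ls "$STORE" | wc -l
```

A revision would then keep only the hash it points to, and an undo or a blanking would reuse the existing entry instead of creating a fresh one.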