Gentlemen, why doesn't the database's text table use Linus Torvalds'
"git"-style mapping of identical contents to the same row, since
identical contents have the same SHA1 hash?
Currently, even undoing a user's edit makes the new revision point to a
fresh row in the text table instead of to the identical old one.
This is despite the following comment in tables.sql:
-- It's possible for multiple revisions to use the same text,
-- for instance revisions where only metadata is altered
-- or a rollback to a previous version.
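Here is a minimal sketch of the git-style idea. It assumes an in-memory dictionary as a stand-in for the text table, and `TextStore` and `save` are hypothetical names, not MediaWiki code. Each text is keyed by its SHA-1 digest, so saving content that already exists (such as a revert, or a second blanking) returns the old row id instead of inserting a new row.

```python
import hashlib


class TextStore:
    """Hypothetical content-addressed stand-in for the wiki's text table."""

    def __init__(self):
        self.rows = {}      # row id -> stored text
        self.by_hash = {}   # SHA-1 hex digest -> row id
        self.next_id = 1

    def save(self, text: str) -> int:
        """Return the row id holding `text`, inserting a row only for new content."""
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in self.by_hash:
            # Identical content is already stored: reuse its row.
            return self.by_hash[digest]
        row_id = self.next_id
        self.next_id += 1
        self.rows[row_id] = text
        self.by_hash[digest] = row_id
        return row_id


if __name__ == "__main__":
    store = TextStore()
    first = store.save("Hello")
    blanked = store.save("")         # page blanked: new content, new row
    reverted = store.save("Hello")   # undoing the edit reuses row 1
    again = store.save("")           # blanked again reuses row 2
    print(first, blanked, reverted, again)  # 1 2 1 2
    print(len(store.rows))                  # 2
```

A real implementation would presumably also compare the stored text on a hash match as a guard against SHA-1 collisions. This sketch trusts the digest alone.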
Examining my wiki,
echo "SELECT old_text FROM text;"|mysql --default-character-set=binary radioscanningtw -N|\
perl -nwle 'use Digest::SHA1 qw/sha1_hex/;$h{sha1_hex($_)}++;END{for(keys %h){print $h{$_}}}'|\
sort -nr|uniq -c
1 247
1 5
2 4
10 3
261 2
1206 1
Apart from the 1206 texts that occur only once, every text is stored
more than once: 812 rows hold just 275 distinct texts.
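The first pipeline can be sketched in Python as follows. This assumes the `old_text` values have already been fetched (for example through a MySQL client library) into a list of strings, and `duplicate_histogram` and `summarize` are hypothetical helper names. Applying `summarize` to the histogram above gives 2018 rows holding 1481 distinct texts, so 537 rows would be redundant under a one-row-per-content scheme.

```python
import hashlib
from collections import Counter


def duplicate_histogram(texts):
    """Map occurrences -> number of distinct texts occurring that many times.

    Mirrors the pipeline above: hash every row, count rows per hash,
    then count how many hashes share each row count.
    """
    per_hash = Counter(hashlib.sha1(t.encode("utf-8")).hexdigest() for t in texts)
    return Counter(per_hash.values())


def summarize(histogram):
    """Return (total rows, distinct texts, redundant rows) for a histogram."""
    total = sum(occurrences * n for occurrences, n in histogram.items())
    distinct = sum(histogram.values())
    return total, distinct, total - distinct


if __name__ == "__main__":
    # The histogram reported above, as occurrences -> distinct texts.
    reported = {247: 1, 5: 1, 4: 2, 3: 10, 2: 261, 1: 1206}
    print(summarize(reported))  # (2018, 1481, 537)
```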
echo "SELECT old_text FROM text;"|mysql --default-character-set=binary radioscanningtw -N|\
perl -lnwe 'use Digest::SHA1 qw/sha1_hex/;print sha1_hex($_),"\t",$_'|sort|uniq -c|sort -nr|\
perl -C -nwle '/.{0,88}/;print $&;exit if $.==5'
247 da39a3ee5e6b4b0d3255bfef95601890afd80709
5 bf36408b7db0ea4b834b935ae2992e97fd438539 請問台中港務警察局的頻率、頻道,有人知道嗎?可以分享嗎?
4 fa21b2d9a4ace2bb86917e7a83ad20a1f5301917 {{c|486.1000}}|{{c|DCS 065}}|{{c|呼 8xx}
4 a860f97b87c81344239766c2f243bfff05ae7cdd #REDIRECT [[Project:幫助]]
3 edabb0d4f0f21cd9dfba867ef9cdbc584c8937c1 全國監獄 {{c|150.6750}}
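Here is a Python sketch of the second pipeline. It makes the same assumption that the texts are already in a list of strings, and `top_duplicates` is a hypothetical name. One difference: it truncates only the text preview to `width` characters, whereas the Perl one-liner cuts the whole output line at 88.

```python
import hashlib
from collections import Counter


def top_duplicates(texts, n=5, width=88):
    """Return (count, sha1_hex, preview) for the n most repeated texts."""
    counts = Counter(texts)
    result = []
    for text, count in counts.most_common(n):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        # Keep only the first `width` characters as a preview.
        result.append((count, digest, text[:width]))
    return result


if __name__ == "__main__":
    # Two blanked revisions: the empty text hashes to da39a3ee...
    print(top_duplicates(["", "", "x"], n=1))
    # [(2, 'da39a3ee5e6b4b0d3255bfef95601890afd80709', '')]
```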
I even have 247 separate rows for the empty, 0-byte text (the first
entry above), left over from a page-blanking incident. One row would be
enough.