Gentlemen, why doesn't the database's text table use Linus Torvalds' "git" style mapping of identical contents to the same row, as they have the same SHA1 hash?
Currently even undoing a user's edit just points to a fresh row in the text table instead of pointing to the identical old one.
This despite these words in tables.sql:
-- It's possible for multiple revisions to use the same text,
-- for instance revisions where only metadata is altered
-- or a rollback to a previous version.
Examining my wiki,
echo "SELECT old_text FROM text;"|mysql --default-character-set=binary radioscanningtw -N|\
perl -nwle 'use Digest::SHA1 qw/sha1_hex/;$h{sha1_hex($_)}++;END{for(keys %h){print $h{$_}}}'|sort -nr|uniq -c
      1 247
      1 5
      2 4
     10 3
    261 2
   1206 1
I find all but the last 1206 records are duplicated.
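For readers who prefer not to decode the Perl, the same histogram can be sketched in Python. This assumes the old_text values have already been fetched into a list of byte strings; the `rows` sample and the function name `duplication_histogram` are made up for illustration and are not taken from the wiki above.

```python
import hashlib
from collections import Counter

def duplication_histogram(rows):
    """Map 'number of copies' to 'how many distinct texts have that many copies'."""
    # First count how often each distinct text (identified by SHA-1) occurs.
    copies_per_text = Counter(hashlib.sha1(r).hexdigest() for r in rows)
    # Then count how many texts share each copy count.
    return Counter(copies_per_text.values())

# Hypothetical sample data, not from the real text table.
rows = [b"", b"", b"", b"#REDIRECT [[Help]]", b"#REDIRECT [[Help]]", b"unique text"]
hist = duplication_histogram(rows)

# Same column order as "sort -nr | uniq -c": texts first, then copies.
for copies, texts in sorted(hist.items(), reverse=True):
    print(texts, copies)
```

Every line whose second column is greater than 1 stands for texts that are stored more than once.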
echo "SELECT old_text FROM text;"|mysql --default-character-set=binary radioscanningtw -N|\
perl -lnwe 'use Digest::SHA1 qw/sha1_hex/;print sha1_hex($_),"\t", $_'|sort|uniq -c|sort -nr|\
perl -C -nwle '/.{0,88}/;print $&;exit if $.==5'
    247 da39a3ee5e6b4b0d3255bfef95601890afd80709
      5 bf36408b7db0ea4b834b935ae2992e97fd438539 請問台中港務警察局的頻率、頻道,有人知道嗎?可以分享嗎?
      4 fa21b2d9a4ace2bb86917e7a83ad20a1f5301917 {{c|486.1000}}|{{c|DCS 065}}|{{c|呼 8xx}
      4 a860f97b87c81344239766c2f243bfff05ae7cdd #REDIRECT [[Project:幫助]]
      3 edabb0d4f0f21cd9dfba867ef9cdbc584c8937c1 全國監獄 {{c|150.6750}}
I even have 247 separate entries for a file with 0 bytes, from a page blanking incident. One would be enough.
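To make the proposal concrete, here is a minimal sketch of git-style content addressing, using Python with an in-memory SQLite database. The schema is hypothetical and is not MediaWiki's actual tables.sql: the `sha1` primary key, the `rev_sha1` column, and the helpers `open_store` and `save_revision` are all assumptions made for illustration.

```python
import hashlib
import sqlite3

def open_store():
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per distinct text, keyed by its SHA-1.
    db.execute("CREATE TABLE text (sha1 TEXT PRIMARY KEY, old_text BLOB NOT NULL)")
    db.execute(
        "CREATE TABLE revision ("
        "rev_id INTEGER PRIMARY KEY, "
        "rev_sha1 TEXT NOT NULL REFERENCES text(sha1))"
    )
    return db

def save_revision(db, text):
    """Store a revision; identical contents reuse the existing text row."""
    digest = hashlib.sha1(text).hexdigest()
    # INSERT OR IGNORE does nothing when a row with this hash already exists.
    db.execute("INSERT OR IGNORE INTO text (sha1, old_text) VALUES (?, ?)", (digest, text))
    cur = db.execute("INSERT INTO revision (rev_sha1) VALUES (?)", (digest,))
    return cur.lastrowid

db = open_store()
# An edit, a page blanking, a rollback to the first text, another blanking.
for body in (b"first draft", b"", b"first draft", b""):
    save_revision(db, body)

text_rows = db.execute("SELECT COUNT(*) FROM text").fetchone()[0]
rev_rows = db.execute("SELECT COUNT(*) FROM revision").fetchone()[0]
print(text_rows, rev_rows)
```

With this layout the four revisions share two text rows, and every blanked page points at the single row for the empty string, whose SHA-1 is the da39a3ee... value in the listing above.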
Hello,
Gentlemen, why doesn't the database's text table use Linus Torvalds' "git" style mapping of identical contents to the same row, as they have the same SHA1 hash?
mediawiki-l? anyway, you can use external storage to use CAS-based storage, if you really want.
DM> you can use external storage to use CAS-based storage, if you really want.
Ah,
http://en.wikipedia.org/wiki/Content-addressable_storage#Open_Source_Impleme...
http://en.wikipedia.org/wiki/Git_(software)#Implementation
And while you're at it, he says subversion is for goners,
http://www.google.com/search?q=torvalds+subversion+git
Hi!
Ah,
http://en.wikipedia.org/wiki/Content-addressable_storage#Open_Source_Impleme...
http://en.wikipedia.org/wiki/Git_(software)#Implementation
Thanks for sharing these extremely valuable links. How did you find them?
And while you're at it, he says subversion is for goners, http://www.google.com/search?q=torvalds+subversion+git
Good for him. Should we store all our content in GIT, from now on?
On Mon, Mar 30, 2009 at 12:49 PM, jidanni@jidanni.org wrote:
DM> you can use external storage to use CAS-based storage, if you really want.
Ah,
http://en.wikipedia.org/wiki/Content-addressable_storage#Open_Source_Impleme...
http://en.wikipedia.org/wiki/Git_(software)#Implementation
Neither git nor Linus Torvalds invented Content-Addressable Storage. It has been around for years, but we haven't ever needed it enough to implement it. I assume that if we did need it, we would implement it, as Tim Starling, one of our staff developers, has been working actively on a history recompression project.
And while you're at it, he says subversion is for goners, http://www.google.com/search?q=torvalds+subversion+git
While we would like to move to a distributed RCS, we're not doing it because Linus Torvalds told us to.
wikitech-l@lists.wikimedia.org