On 11-09-20 02:26 PM, Platonides wrote:
Domas Mituzas wrote:
- When reverting, do a select count(*) where md5=? and then do something
more advanced when more than one match is found
Finally, "we don't need an index on it" becomes "we need an index on it", and storage efficiency becomes much more interesting (binary packing, yay ;-)
So, what are the use cases, and how does one index for them? Is it a global hash check, or per page? Etc.
Domas
I don't know why people have started asking about taking hashes when reverting. In a rollback, MediaWiki already knows that the revision is identical to an earlier one. It even avoids storing the same text twice. However, if you want to "check if something is a revert", say clearly that you are testing on every edit whether there's a previous revision [for the same page?] with the same text. Is there "something more advanced to be done", or just dreams? Would, for instance, the WikiTrust extension make use of it, or would it instead need to store its own checksums in another table?
PS: I don't consider bug 2939 a good justification. I think it is good to see that there were messages. The use case given there could be solved with rollback in bot mode.
Care to re-open that bug then and suggest how to implement it? Because that bug was closed as LATER specifically because we don't have a checksum on the revision table.
Also, rollbacks are different from undos and manual reverts. On rollbacks we re-use the text field; however, on a normal undo or edit I don't believe we scan old revisions for matching content and re-use the text. Hence, for any type of revert other than a rollback, we don't know it's a revert without a checksum. And IIRC it was Brion who didn't like the idea of having to fetch and compare at least two text blobs when we need to do this.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]