On 11-09-20 02:26 PM, Platonides wrote:
Domas Mituzas wrote:
* When reverting, do a select count(*) where md5=? and then do something
more advanced when more than one match is found
finally "we don't need an index on it" becomes "we need an index on it",
and storage efficiency becomes much more interesting (binary packing yay ;-)
so, what are the use cases and how does one index for them? is it global
hash check, per page? etc
Domas
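To make the per-page case concrete, here is a minimal sketch using Python's sqlite3 module. The table rev_demo, its columns, and the helper names are hypothetical and are not MediaWiki's actual schema. It stores the MD5 as a 16-byte binary value rather than a 32-character hex string, and indexes it per page so the count(*) lookup does not have to scan.

```python
import hashlib
import sqlite3


def packed_md5(text: str) -> bytes:
    """Return the 16-byte binary MD5 digest (the hex form is 32 characters)."""
    return hashlib.md5(text.encode("utf-8")).digest()


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical revision table."""
    conn = sqlite3.connect(":memory:")
    # Illustrative schema only; not the real MediaWiki revision table.
    conn.execute(
        "CREATE TABLE rev_demo ("
        "rev_id INTEGER PRIMARY KEY, page_id INTEGER, md5 BLOB)"
    )
    # Per-page index: answers "has this page ever had this text?"
    conn.execute("CREATE INDEX rev_demo_page_md5 ON rev_demo (page_id, md5)")
    return conn


def add_revision(conn: sqlite3.Connection, page_id: int, text: str) -> int:
    """Record a revision's checksum and return its new rev_id."""
    cur = conn.execute(
        "INSERT INTO rev_demo (page_id, md5) VALUES (?, ?)",
        (page_id, packed_md5(text)),
    )
    return cur.lastrowid


def count_matches(conn: sqlite3.Connection, page_id: int, text: str) -> int:
    """Count revisions of one page whose checksum equals that of text."""
    row = conn.execute(
        "SELECT COUNT(*) FROM rev_demo WHERE page_id = ? AND md5 = ?",
        (page_id, packed_md5(text)),
    ).fetchone()
    return row[0]
```

A global check would drop page_id from the WHERE clause and would want an index on md5 alone, which is the trade-off the questions above are asking about.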
I don't know why people have started asking about taking hashes when
reverting. MediaWiki already knows in a rollback that the revision is
identical to the previous one. It even avoids storing the same text twice.
However, if you want to "check if something is a revert" say clearly
that you are testing on every edit if there's a previous revision [for
the same page?] with the same text.
Is there a "something more advanced to be done" or just dreams? Would
for instance the WikiTrust extension make use of it, or instead would
need to store its own checksums on another table?
PS: I don't consider bug 2939 a good use case. I think it is good to see
that there were messages. The use case given there could be solved with
rollback in bot mode.
Care to re-open that bug then and suggest how to implement it? Because
that bug was closed as LATER specifically because we don't have a
checksum on the revision table.
Also, rollbacks are different from undos and manual reverts. On rollbacks
we re-use the text field; however, on a normal undo or edit I don't
believe we scan old revisions for matching content and re-use the text.
Hence on any type of revert but a rollback we don't know it's a revert
without a checksum. And iirc it was Brion who didn't like the idea of
having to fetch and compare at least two text blobs when we need to do this.
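A rough sketch of what I mean, in Python. The Revision class and find_reverted_to are hypothetical names for illustration, not anything in MediaWiki. The stored checksum is compared first, and the text is only compared on a checksum match, so most revisions never need their blob fetched.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def md5_hex(text: str) -> str:
    """Hex MD5 of the UTF-8 encoded text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


@dataclass
class Revision:
    """Hypothetical stand-in for a stored revision with a checksum column."""
    rev_id: int
    text: str
    checksum: str = ""

    def __post_init__(self) -> None:
        # Compute the checksum once, at save time, if none was supplied.
        if not self.checksum:
            self.checksum = md5_hex(self.text)


def find_reverted_to(history: List[Revision], new_text: str) -> Optional[int]:
    """Return the rev_id of the most recent earlier revision with the same
    text as new_text, or None if the edit is not a revert.

    history is ordered oldest to newest. The newest revision is skipped,
    since matching it would be a null edit rather than a revert.
    """
    new_sum = md5_hex(new_text)
    for rev in reversed(history[:-1]):
        if rev.checksum != new_sum:
            continue  # cheap rejection: no text comparison needed
        if rev.text == new_text:  # confirm, to guard against collisions
            return rev.rev_id
    return None
```

For example, with a history of "good" (rev 1) followed by "vandalism" (rev 2), saving "good" again is reported as a revert to rev 1, whether it came from an undo or a manual edit.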
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]