On 8/25/07, Brion Vibber <brion(a)wikimedia.org> wrote:
Ok, the image metadata for SHA-1 hashes is now updated when actually
needed for deletion (or in a batch process) instead of on metadata read,
which was what was bogging down the system.
Please set up a batch job to eventually populate the SHA-1 metadata for
non-deleted images. We'd like to use it for duplicate image
detection.
We're already doing this against the deleted images. A bot downloads
the image, computes its SHA-1, looks that hash up in the filearchive
table, and if there is a match, it complains in IRC and the new
image is tagged.
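The bot's check can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the download step is assumed to have already produced the image bytes, `deleted_sha1s` is a hypothetical in-memory stand-in for the filearchive lookup (the real table may store hashes in a different encoding than hex), and printing stands in for the IRC complaint and tagging.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Compute the SHA-1 digest of an image's raw bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def check_new_upload(name: str, data: bytes, deleted_sha1s: set) -> bool:
    """Return True if the upload is bit-identical to a deleted image.

    deleted_sha1s is a hypothetical stand-in for querying the filearchive
    table by hash; the real bot complains in IRC and tags the image instead
    of printing.
    """
    digest = sha1_hex(data)
    if digest in deleted_sha1s:
        print(f"{name} matches a deleted image (sha1 {digest})")
        return True
    return False


if __name__ == "__main__":
    # Hypothetical example data standing in for real image files.
    deleted = {sha1_hex(b"previously deleted image bytes")}
    check_new_upload("Example.jpg", b"previously deleted image bytes", deleted)
```

If the SHA-1 were already stored in the image metadata, the download-and-hash step would disappear and the check would reduce to a hash lookup against both the deleted and non-deleted tables.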
This would be easier if we could skip the download-and-compute-SHA-1
step, and being able to test against non-deleted images would be
handy too.
Of course, it doesn't detect anything that isn't bit-identical, but
catching bit-identical duplicates is still useful.
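To see why only bit-identical copies are caught, note that any change to the file, even a single appended byte, produces an unrelated digest. The byte strings below are hypothetical placeholders for real image data.

```python
import hashlib

# Hypothetical placeholder bytes standing in for an image file.
original = b"\x89PNG\r\n\x1a\n...pixel data..."
# The same content with one extra byte, e.g. after a metadata edit or re-save.
modified = original + b"\x00"

same = hashlib.sha1(original).hexdigest() == hashlib.sha1(modified).hexdigest()
print(same)  # prints False
```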