Gregory Maxwell wrote:
It's also the case that we're doing more than
just duplicate matching.
For example, here are things which are already done or are being worked on:
*File type double-checking (currently offsetting bug 10823)
*Malicious content scanning (with clamav)
*Automated google image lookup (google image search the file name,
grab the top couple results and compare hashes)
*Image fingerprinting (to detect non-bit-identical duplicates)
*Suspect EXIF data (corbis, getty, AP copyright data in exif tags).
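
As a rough illustration of two of the checks listed above (exact-duplicate matching by hash and flagging suspect copyright strings), here is a minimal Python sketch. It is not the code actually in use: the function names and the marker list are hypothetical, and the byte search is a crude stand-in for real EXIF parsing.

```python
import hashlib

# Hypothetical markers to flag; a real list would be maintained by reviewers.
SUSPECT_MARKERS = (b"corbis", b"getty", b"associated press")


def sha1_of(data: bytes) -> str:
    """Return the hex SHA-1 digest of the uploaded file's bytes."""
    return hashlib.sha1(data).hexdigest()


def is_exact_duplicate(upload: bytes, known_hashes: set) -> bool:
    """True if a bit-identical file has already been seen."""
    return sha1_of(upload) in known_hashes


def suspect_markers_in(upload: bytes) -> list:
    """Return the markers found anywhere in the raw bytes (case-insensitive).

    This scans the whole file rather than parsing EXIF tags, so it can
    produce false positives; it only queues a file for human review.
    """
    lowered = upload.lower()
    return [m.decode() for m in SUSPECT_MARKERS if m in lowered]
```

A hit from either function would only flag the upload for a person to look at, which fits the asynchronous workflow described below.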
I'm not sure that putting all of that into MediaWiki makes sense.
A lot of it works best asynchronously.
A lot of it works best as part of
a workflow where software and people work as peers, and we don't
really have good ways for the mediawiki software to participate in that.
A wiki is inherently an asynchronous people-oriented bit of software. ;)
Certainly we'd like even more support for this.
-- brion vibber (brion @ wikimedia.org)