Gregory Maxwell wrote:
> It's also the case that we're doing more than just duplicate matching.
> For example, here are things which are already done or are being
> worked on:
>
> * File type double-checking (currently offsetting bug 10823)
> * Malicious content scanning (with ClamAV)
> * Automated Google Image lookup (Google-image-search the file name,
>   grab the top couple of results, and compare hashes)
> * Image fingerprinting (to detect non-bit-identical duplicates)
> * Suspect EXIF data (Corbis, Getty, and AP copyright data in EXIF tags)
> etc.
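
For concreteness: a check like the suspect-EXIF one can live entirely
outside MediaWiki as a small standalone script. Here is a rough sketch,
assuming Python with the Pillow library; the marker list and the
reporting are illustrative stand-ins, not the actual tool described
above.

import sys

from PIL import Image
from PIL.ExifTags import TAGS

# Strings that suggest a file was lifted from a stock agency or wire
# service. Purely illustrative; a real list would be longer.
SUSPECT_MARKERS = ("corbis", "getty", "associated press")

def suspect_exif(path):
    """Return the EXIF tag values that mention a known agency."""
    hits = []
    with Image.open(path) as img:
        for tag_id, value in img.getexif().items():
            name = TAGS.get(tag_id, str(tag_id))
            if any(m in str(value).lower() for m in SUSPECT_MARKERS):
                hits.append("%s: %s" % (name, value))
    return hits

if __name__ == "__main__":
    for path in sys.argv[1:]:
        for hit in suspect_exif(path):
            print("%s: suspect EXIF tag: %s" % (path, hit))

The image-fingerprinting check is similar in spirit: a perceptual hash
rather than a byte hash, so that scaled or re-encoded copies still
match.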
>
> I'm not sure that putting all of that into MediaWiki makes sense.
It does.
> A lot of it works best asynchronously.
Yep!
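
To put a finer point on "asynchronously": the upload request only needs
to enqueue the file, and a separate worker runs the slow checks (virus
scan, fingerprinting, web lookups) off the critical path. A toy sketch
in Python, with placeholder check bodies rather than real MediaWiki
hooks:

import queue
import threading

upload_queue = queue.Queue()

def run_checks(path):
    # Stand-ins for the real checks listed above.
    print("clamav scan of", path)
    print("perceptual fingerprint of", path)
    print("EXIF inspection of", path)

def worker():
    # Drain the queue forever; each check runs well after the upload
    # request itself has returned to the user.
    while True:
        path = upload_queue.get()
        run_checks(path)
        upload_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# The upload handler just does this and returns immediately:
upload_queue.put("Example_upload.jpg")
upload_queue.join()  # only so this demo waits before exiting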
> A lot of it works best as part of a workflow where software and
> people work as peers, and we don't really have good ways for the
> MediaWiki software to participate in workflows today.
A wiki is inherently an asynchronous, people-oriented bit of software. ;) Certainly we'd like even more support for this.
-- brion vibber (brion @ wikimedia.org)