Does anyone see a reason we can't add filearchive table to the public databases?
Filearchive lists deleted images.
If we suppress the following columns: fa_size, fa_width, fa_height, fa_metadata,
fa_bits, fa_media_type, fa_major_mime, fa_minor_mime, fa_description,
the only piece of data it will disclose which isn't already public is the cryptographic hash of the file content (fa_storage_key). This data is already available through the UI to admins.
I'd like to put up a tool that lets users find deleted images by their content hash. This would also require carrying an index on fa_storage_key. The tool would be used by the automated upload-scanning IRC bots to flag bit-identical duplicates of previously deleted uploads.
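A minimal sketch of the proposal above, using an in-memory SQLite database rather than the toolserver's actual MySQL setup. It shows a view that leaves out the suppressed columns, plus an index on fa_storage_key for hash lookups. The trimmed column list, the view name filearchive_public, and the index name are assumptions made for illustration only.

```python
import sqlite3

# Columns proposed for suppression in the message above.
SUPPRESSED = {
    "fa_size", "fa_width", "fa_height", "fa_metadata", "fa_bits",
    "fa_media_type", "fa_major_mime", "fa_minor_mime", "fa_description",
}

# Hypothetical, trimmed subset of filearchive columns (not the full schema).
ALL_COLUMNS = ["fa_id", "fa_name", "fa_storage_key", "fa_timestamp",
               "fa_size", "fa_description"]


def build_public_view(conn):
    """Create the table, an index on fa_storage_key, and a restricted view."""
    cols = ", ".join(f"{c} TEXT" for c in ALL_COLUMNS)
    conn.execute(f"CREATE TABLE filearchive ({cols})")
    conn.execute(
        "CREATE INDEX fa_storage_key_idx ON filearchive (fa_storage_key)")
    public = ", ".join(c for c in ALL_COLUMNS if c not in SUPPRESSED)
    # The view name is an assumption; it exposes only non-suppressed columns.
    conn.execute(
        f"CREATE VIEW filearchive_public AS SELECT {public} FROM filearchive")


def find_deleted(conn, storage_key):
    """Return (name, timestamp) rows for deleted files with this storage key."""
    cur = conn.execute(
        "SELECT fa_name, fa_timestamp FROM filearchive_public "
        "WHERE fa_storage_key = ?", (storage_key,))
    return cur.fetchall()
```

Users of the view can match on fa_storage_key but never see the suppressed columns, which is the disclosure trade-off described in the message.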
Hello,
On Saturday, 04 August 2007 23:03, Gregory Maxwell wrote:
Does anyone see a reason we can't add filearchive table to the public databases?
I saw none, so I created the views. The only field not visible to users is fa_description.
Perhaps a few fields in recentchanges were created too, if some databases didn't have them yet.
Sincerely, DaB.
On 8/4/07, DaB. WP@daniel.baur4.info wrote:
I saw none, so I created the views. The only field not visible to users is fa_description.
Fantastic. I already have a tool up using it. It's not user-friendly; it's really just a tool meant for other tools to use.
http://tools.wikimedia.de/~gmaxwell/cgi-bin/deletedimage.py?hash=f85b9e4a404...
hash= is the SHA-1 hash of the content of an image file. Wiki is a comma-separated list of databases to check.
If a file with that hash has been previously deleted, the tool will list it.
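As a rough sketch of how another tool might call this one: hash the file content with SHA-1, hex-encode it, and put it in the query string. The base URL is the one given above. The exact parameter name `wiki` and the database name `commonswiki` are assumptions, since the full query string is not shown here. The function only builds the URL and makes no network request.

```python
import hashlib
from urllib.parse import urlencode

# Base URL as given in the message above.
TOOL_URL = "http://tools.wikimedia.de/~gmaxwell/cgi-bin/deletedimage.py"


def sha1_hex(data: bytes) -> str:
    """Return the base-16 SHA-1 digest of the raw file content."""
    return hashlib.sha1(data).hexdigest()


def lookup_url(data: bytes, wikis) -> str:
    """Build a lookup URL for the given file content.

    The 'wiki' parameter name is an assumption; the message only says
    that it is a comma-separated list of databases.
    """
    query = urlencode(
        {"hash": sha1_hex(data), "wiki": ",".join(wikis)},
        safe=",",  # keep the commas in the database list readable
    )
    return f"{TOOL_URL}?{query}"
```

A bot would read the uploaded file's bytes, call `lookup_url(content, ["enwiki", "commonswiki"])`, and fetch the result.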
There is an IRC bot that uses this now. It sits in both #wikipedia-en-image-uploads and #wikimedia-commons-uploads2, watches new uploads, and alerts when a file that is bit-identical to a previously uploaded image is uploaded.
Its output looks like:
Aug 04 22:46:32 <wimb> [[User:Bladez636]] {1255} uploaded [[Image:Butters Fall.gif]] <50 × 50> <7 KB> (Tag {{PD-self}}) : This is an Image of Butters falling from ''[[South Park]]'' this was edited using several scenes from ''[[South Park]]''. This is a Free Image for me creating them. Copyright 1997 [[Trey Parker]] and [[Matt Stone]]. Copyright 2007 Ian McCormick (created Image owns nothing of ''[[South Park]]'')
Aug 04 22:46:32 <wimb> [[Image:Butters Fall.gif]] has an identical image at [[Butters_Fall.gif]] found at enwiki. Uploaded on 20070615002133
Gregory Maxwell wrote:
Fantastic. I already have a tool up using it. It's not user-friendly; it's really just a tool meant for other tools to use.
http://tools.wikimedia.de/~gmaxwell/cgi-bin/deletedimage.py?hash=f85b9e4a404...
hash= is the SHA-1 hash of the content of an image file. Wiki is a comma-separated list of databases to check.
It gives a Content-Type: text/text; charset=UTF-8, which is not a valid MIME type (did you mean text/plain?).
Also, I'm unable to get any output from it :( http://tools.wikimedia.de/~gmaxwell/cgi-bin/deletedimage.py?hash=0t8kj4820en...
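For reference, a small sketch of what a CGI script's output would look like with the registered plain-text type. The helper name `cgi_response` is hypothetical and is not taken from deletedimage.py.

```python
def cgi_response(body: str) -> bytes:
    """Build CGI output: one header line, a blank line, then the body.

    text/plain is the registered MIME type for plain text; text/text is not.
    """
    header = "Content-Type: text/plain; charset=UTF-8\r\n\r\n"
    return (header + body).encode("utf-8")
```

A script would write the returned bytes to standard output.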
On 8/5/07, Platonides platonides@gmail.com wrote:
It gives a Content-Type: text/text; charset=UTF-8, which is not a valid MIME type (did you mean text/plain?).
Heh, thanks. I'll fix that once eagle gets online and I can confirm that the change won't break his tool.
Also, I'm unable to get any output from it :( http://tools.wikimedia.de/~gmaxwell/cgi-bin/deletedimage.py?hash=0t8kj4820en...
It takes base-16 input. So you want: http://tools.wikimedia.de/~gmaxwell/cgi-bin/deletedimage.py?hash=fa4d4ef5024... which is what you were trying to do, and it works fine.
Would base 36 input actually be useful to you? I could add a base 36 interface easily enough.
I made the interface base-16 because most SHA-1 APIs output base-16 and I was told that doing conversion is hard. ;)
I'm planning on adding the deletion reason and the image view URL (for sysops) next.
Gregory Maxwell wrote:
Would base 36 input actually be useful to you? I could add a base 36 interface easily enough.
I made the interface base-16 because most SHA-1 APIs output base-16 and I was told that doing conversion is hard. ;)
Well, I was too lazy to SHA an image, so I grabbed the hash from Commons :-) As MediaWiki uses base-36, it's probably a good idea. It's not so hard to do the conversion... wfBaseConvert in GlobalFunctions.php :D
On 8/5/07, Platonides platonides@gmail.com wrote:
Well, I was too lazy to SHA an image, so I grabbed the hash from Commons :-) As MediaWiki uses base-36, it's probably a good idea. It's not so hard to do the conversion... wfBaseConvert in GlobalFunctions.php :D
I coded a Python equivalent too (since I needed it to convert the base-16 input to the base-36 database format).
But do you actually have a need? I can't figure out anything to do with it on the site.
I wish the site stored the MW hash for all images.. then I could come up with some really cool things to do with it. ;)
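A sketch of what such a base-16 to base-36 conversion could look like in Python. This is not the code from the tool itself. The default padding widths (31 base-36 digits, 40 base-16 digits for SHA-1) are assumptions about the database format, and the function names are hypothetical.

```python
import string

# Digit alphabet for base 36: 0-9 followed by a-z.
DIGITS = string.digits + string.ascii_lowercase


def base16_to_base36(hex_digest: str, pad_to: int = 31) -> str:
    """Convert a base-16 string to base 36, left-padded with zeros.

    pad_to=31 is an assumed width for a SHA-1 hash in base 36.
    """
    value = int(hex_digest, 16)
    out = []
    while value:
        value, rem = divmod(value, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out)).rjust(pad_to, "0")


def base36_to_base16(b36: str, pad_to: int = 40) -> str:
    """Convert a base-36 string back to base 16, left-padded with zeros."""
    return format(int(b36, 36), "x").rjust(pad_to, "0")
```

With both directions available, the tool could accept either form of the hash and normalize it before querying.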
Gregory Maxwell wrote:
I coded a Python equivalent too (since I needed it to convert the base-16 input to the base-36 database format).
But do you actually have a need? I can't figure out anything to do with it on the site.
Being restricted to the filearchive, not too much. We can only use it to count how many times we've been goatsed :P
I wish the site stored the MW hash for all images.. then I could come up with some really cool things to do with it. ;)
Agreed. I requested it a year ago http://bugzilla.wikimedia.org/show_bug.cgi?id=5763
As we're supposed to use filestore on all images, so they could be renamed, it would be fixed at the same time... (I
toolserver-l@lists.wikimedia.org