hi all!
i have two image files i want to upload to my wiki that are detected as zip archives even though they are in fact images (which `file` confirms, too). i have no idea how to teach mediawiki to identify the files correctly. i can change the metadata in the database, but that only leads to another surprise: those changes are reverted the next time the image pages are accessed. huh? where does this information come from then? why is the metadata even stored in the database when it's not authoritative?
can someone please shed some light on this matter? i'm absolutely new to running a mediawiki instance of my own. you can find the two files in question here:
http://linux2.fbi.fh-koeln.de/rdk-smw/Datei:01-0557-3.jpg
http://linux2.fbi.fh-koeln.de/rdk-smw/Datei:05-0995-2.jpg
cheers jens
Those images match the signature of a zip end of central directory record.
unzip -t 01-0557-3.jpg
warning [01-0557-3.jpg]: zipfile claims to be last disk of a multi-part archive;
  attempting to process anyway, assuming all parts have been concatenated
  together in order. Expect "errors" and warnings...true multi-part support
  doesn't exist yet (coming soon).
error [01-0557-3.jpg]: missing 1852006951 bytes in zipfile
  (attempting to process anyway)
error [01-0557-3.jpg]: attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

unzip -t 05-0995-2.jpg
warning [05-0995-2.jpg]: zipfile claims to be last disk of a multi-part archive;
  attempting to process anyway, assuming all parts have been concatenated
  together in order. Expect "errors" and warnings...true multi-part support
  doesn't exist yet (coming soon).
error [05-0995-2.jpg]: missing 1689894850 bytes in zipfile
  (attempting to process anyway)
error [05-0995-2.jpg]: attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

Whereas with a normal file:

unzip -t 01-0557-2.jpg
Archive: 01-0557-2.jpg
  End-of-central-directory signature not found. Either this file is not
  a zipfile, (...)
01-0557-3.jpg contains "50 4B 05 06" at offset 0x4E24B and 05-0995-2.jpg at 0x309DE. Since they're just 4 bytes at an arbitrary offset amongst "random" data, it's apparently a statistical false positive. In fact, the more thorough ZipDirectoryReader detects it as 'zip-bad', 'trailing bytes after the end of the file comment'. Maybe we should strengthen the checks in MimeMagic.php.
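For anyone who wants to reproduce this on their own files, here is a minimal Python sketch that lists every offset where the bytes 50 4B 05 06 occur and applies two basic sanity checks to the 22-byte record that would follow. The function names are made up for illustration, and the checks are only an approximation of the idea behind the 'trailing bytes after the end of the file comment' error. This is not the code MediaWiki actually runs.

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # the bytes 50 4B 05 06
EOCD_LEN = 22              # fixed part of the record, signature included


def eocd_is_plausible(data: bytes, pos: int) -> bool:
    """Check whether a signature hit at `pos` could be a real zip record."""
    if pos + EOCD_LEN > len(data):
        return False
    # Fields after the signature: disk numbers, entry counts,
    # central directory size and offset, comment length (little-endian).
    (_disk, _cd_disk, _n_disk, _n_total,
     cd_size, cd_offset, comment_len) = struct.unpack_from("<HHHHIIH", data, pos + 4)
    # The comment must end exactly at the end of the file.
    if pos + EOCD_LEN + comment_len != len(data):
        return False
    # The central directory must lie entirely before this record.
    return cd_offset + cd_size <= pos


def find_eocd_candidates(data: bytes):
    """Yield (offset, plausible) for every occurrence of the signature."""
    pos = data.find(EOCD_SIG)
    while pos != -1:
        yield pos, eocd_is_plausible(data, pos)
        pos = data.find(EOCD_SIG, pos + 1)
```

Usage would be along the lines of `list(find_eocd_candidates(open("01-0557-3.jpg", "rb").read()))`. A hit that fails the checks is most likely just image data that happens to contain those four bytes.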
What were you changing in the db? I don't get that "magic change back" that you report. Perhaps a metadata update was triggering it.
Setting these values you should be safe:
img_name        01-0557-3.jpg                             05-0995-2.jpg
img_size        377382                                    220187
img_width       599                                       800
img_height      800                                       565
img_metadata    a:1:{s:22:"MEDIAWIKI_EXIF_VERSION";i:2;}  a:1:{s:22:"MEDIAWIKI_EXIF_VERSION";i:2;}
img_bits        8                                         8
img_media_type  BITMAP                                    BITMAP
img_major_mime  image                                     image
img_minor_mime  jpeg                                      jpeg
img_sha1        1fyq01rqsw47yhvl4sqg66gr4d60j0f           ghuv3z91zq81vub5v8zj9tmjziojzrh
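If you want to double-check img_size and img_sha1 against your local copies before touching the table, a small Python sketch like the following should do. It assumes that img_sha1 is the SHA-1 of the file contents written in base 36 and left-padded with zeros to 31 characters, which matches the length of the values above. The helper names are made up for illustration.

```python
import hashlib

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Convert a non-negative integer to lowercase base 36."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(BASE36_DIGITS[rem])
    return "".join(reversed(digits))


def img_sha1_for(data: bytes) -> str:
    """SHA-1 of the data in base 36, zero-padded to 31 characters (assumed format)."""
    as_int = int(hashlib.sha1(data).hexdigest(), 16)
    return to_base36(as_int).rjust(31, "0")


def img_size_for(data: bytes) -> int:
    """File size in bytes, as stored in img_size."""
    return len(data)
```

For example, read the file with `data = open("01-0557-3.jpg", "rb").read()` and compare `img_size_for(data)` and `img_sha1_for(data)` with the row above.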
Platonides [2011-12-10 23:56]:
> 01-0557-3.jpg contains "50 4B 05 06" at offset 0x4E24B and 05-0995-2.jpg at 0x309DE. Since they're just 4 bytes at an arbitrary offset amongst "random" data, it's apparently a statistical false positive.
ok, thanks for your input!
> What were you changing in the db? I don't get that "magic change back" that you report. Perhaps a metadata update was triggering it.
mysql> SELECT img_name FROM image WHERE img_media_type <> 'BITMAP';
+---------------+
| img_name      |
+---------------+
| 01-0557-3.jpg |
| 05-0995-2.jpg |
+---------------+
2 rows in set (0.00 sec)

mysql> UPDATE image SET img_media_type = 'BITMAP', img_major_mime = 'image', img_minor_mime = 'jpeg' WHERE img_media_type <> 'BITMAP';
Query OK, 2 rows affected (0.04 sec)
Rows matched: 2 Changed: 2 Warnings: 0

mysql> SELECT img_name FROM image WHERE img_media_type <> 'BITMAP';
Empty set (0.01 sec)
[...access File:01-0557-3.jpg in browser...]
mysql> SELECT img_name FROM image WHERE img_media_type <> 'BITMAP';
+---------------+
| img_name      |
+---------------+
| 01-0557-3.jpg |
+---------------+
1 row in set (0.00 sec)
[...access File:05-0995-2.jpg in browser...]
mysql> SELECT img_name FROM image WHERE img_media_type <> 'BITMAP';
+---------------+
| img_name      |
+---------------+
| 01-0557-3.jpg |
| 05-0995-2.jpg |
+---------------+
2 rows in set (0.00 sec)
so that seems a bit weird to me.
> Setting these values you should be safe:
well, that didn't work either :(
but what did work, though, is a plain `convert` (without actually changing anything) and uploading that instead. this way it must have gotten rid of any "garbage" there might have been. so, thanks again. for me the problem is solved :) as far as the database issue is concerned i don't know... to be honest, as long as it doesn't affect me, i don't really care. i just don't understand what it's doing there.
cheers jens
On 11/12/11 00:27, Jens Wille wrote:
> but what did work, though, is a plain `convert` (without actually changing anything) and uploading that instead. this way it must have gotten rid of any "garbage" there might have been.
It will have compressed it with a different dictionary, not matching that specific sequence.
> so, thanks again. for me the problem is solved :) as far as the database issue is concerned i don't know... to be honest, as long as it doesn't affect me, i don't really care. i just don't understand what it's doing there.
> cheers jens
Do you know what the old value of img_metadata was?
Platonides [2011-12-11 00:39]:
> It will have compressed it with a different dictionary, not matching that specific sequence.
yeah, makes sense.
> Do you know what the old value of img_metadata was?
it was empty in both cases.
mediawiki-l@lists.wikimedia.org