Hi,
img_description is a tinyblob in the dump simply because it is a tinyblob in the database too [1], so the dump is not truncating anything.
I'm not sure what you expected to find there, but that column contains only the description of the image (similar to the edit summary of a revision), not the content of the associated File: page. If you want that, you will have to use the pages-articles dump.
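To show what the 255-byte TINYBLOB limit means in practice, here is a minimal Python sketch. It assumes the description is stored as UTF-8 bytes, and `fits_tinyblob` is a hypothetical helper, not part of MediaWiki. The limit is counted in bytes, not characters: a character like "ü" takes two bytes in UTF-8, so a German, French or Spanish description can reach the limit with fewer than 255 characters.

```python
# MySQL's TINYBLOB holds at most 2**8 - 1 = 255 bytes.
TINYBLOB_MAX_BYTES = 2**8 - 1


def utf8_length(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_tinyblob(description: str) -> bool:
    """Hypothetical helper: True if the UTF-8 encoded description fits in a TINYBLOB."""
    return utf8_length(description) <= TINYBLOB_MAX_BYTES


if __name__ == "__main__":
    ascii_text = "a" * 255   # 255 characters, 255 bytes
    german_text = "ü" * 200  # 200 characters, but 400 bytes in UTF-8
    for text in (ascii_text, german_text):
        print(len(text), utf8_length(text), fits_tinyblob(text))
```

Running it prints the character count, the byte count and whether the text fits: the ASCII string fits exactly, while the shorter German one does not.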
Petr Onderka [[en:User:Svick]]
[1]: http://www.mediawiki.org/wiki/Manual:Image_table
On Tue, Apr 24, 2012 at 10:41, Bastian Koell <bastian.koell@gmail.com> wrote:
Hello everyone, I was working on a Wikipedia reader when I noticed a small issue: the data in the image metadata dumps (e.g. enwiki-20120403-image.sql.gz) appears to be truncated.
This seems to be because the img_description column is defined as tinyblob, and a tinyblob apparently holds at most 255 bytes. I'd really love to use this dump instead of straining the servers (and taking forever).
Is this my fault, or can you do something to address this issue? Most interesting for me would be Commons, of course, then the German, French and Spanish Wikipedias.
Best from Berlin, Bastian
Please see the column definition: `img_description` tinyblob NOT NULL
And the table structure:

CREATE TABLE `image` (
  `img_name` varbinary(255) NOT NULL DEFAULT '',
  `img_size` int(8) unsigned NOT NULL DEFAULT '0',
  `img_width` int(5) NOT NULL DEFAULT '0',
  `img_height` int(5) NOT NULL DEFAULT '0',
  `img_metadata` mediumblob NOT NULL,
  `img_bits` int(3) NOT NULL DEFAULT '0',
  `img_media_type` enum('UNKNOWN','BITMAP','DRAWING','AUDIO','VIDEO','MULTIMEDIA','OFFICE','TEXT','EXECUTABLE','ARCHIVE') DEFAULT NULL,
  `img_major_mime` enum('unknown','application','audio','image','text','video','message','model','multipart') NOT NULL DEFAULT 'unknown',
  `img_minor_mime` varbinary(32) NOT NULL DEFAULT 'unknown',
  `img_description` tinyblob NOT NULL,
  `img_user` int(5) unsigned NOT NULL DEFAULT '0',
  `img_user_text` varbinary(255) NOT NULL DEFAULT '',
  `img_timestamp` varbinary(14) NOT NULL DEFAULT '',
  `img_sha1` varbinary(32) NOT NULL DEFAULT '',
  PRIMARY KEY (`img_name`),
  KEY `img_size` (`img_size`),
  KEY `img_timestamp` (`img_timestamp`),
  KEY `img_usertext_timestamp` (`img_user_text`,`img_timestamp`),
  KEY `img_sha1` (`img_sha1`)
) ENGINE=InnoDB DEFAULT CHARSET=binary;
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l