Hello everyone, I was working on a Wikipedia reader when I noticed this little issue: the data in the image metadata dumps (e.g. enwiki-20120403-image.sql.gz) appear to be truncated.
The cause seems to be the img_description column being defined as tinyblob, and a tinyblob holds at most 255 bytes. I'd really love to use this dump instead of straining the servers, which also takes forever.
Is this my mistake, or can you do something to address this issue? Most interesting for me would be Commons, of course, then the German, French and Spanish Wikipedias.
Best from Berlin, Bastian
Please see the column definition: `img_description` tinyblob NOT NULL
And the table structure:
CREATE TABLE `image` (
  `img_name` varbinary(255) NOT NULL DEFAULT '',
  `img_size` int(8) unsigned NOT NULL DEFAULT '0',
  `img_width` int(5) NOT NULL DEFAULT '0',
  `img_height` int(5) NOT NULL DEFAULT '0',
  `img_metadata` mediumblob NOT NULL,
  `img_bits` int(3) NOT NULL DEFAULT '0',
  `img_media_type` enum('UNKNOWN','BITMAP','DRAWING','AUDIO','VIDEO','MULTIMEDIA','OFFICE','TEXT','EXECUTABLE','ARCHIVE') DEFAULT NULL,
  `img_major_mime` enum('unknown','application','audio','image','text','video','message','model','multipart') NOT NULL DEFAULT 'unknown',
  `img_minor_mime` varbinary(32) NOT NULL DEFAULT 'unknown',
  `img_description` tinyblob NOT NULL,
  `img_user` int(5) unsigned NOT NULL DEFAULT '0',
  `img_user_text` varbinary(255) NOT NULL DEFAULT '',
  `img_timestamp` varbinary(14) NOT NULL DEFAULT '',
  `img_sha1` varbinary(32) NOT NULL DEFAULT '',
  PRIMARY KEY (`img_name`),
  KEY `img_size` (`img_size`),
  KEY `img_timestamp` (`img_timestamp`),
  KEY `img_usertext_timestamp` (`img_user_text`,`img_timestamp`),
  KEY `img_sha1` (`img_sha1`)
) ENGINE=InnoDB DEFAULT CHARSET=binary;
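For what it's worth, here is a quick sanity check one could run against a local MySQL import of the dump (only a sketch; it assumes the image table has been loaded as-is):

SELECT COUNT(*) AS total,
       SUM(LENGTH(img_description) = 255) AS at_limit,
       MAX(LENGTH(img_description)) AS longest
FROM image;

If at_limit is large and longest never exceeds 255, the column type itself is the ceiling, rather than something cutting the values off during the dump process.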
Hi,
img_description is a tinyblob in the dump simply because it's a tinyblob in the database too [1], so there is no truncation going on.
I'm not sure what you expected to find there, but that column contains only the short description of the image (similar to the edit summary of a revision), not the content of the associated File: page. If you want that, you will have to get the pages-articles dump.
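If you import pages-articles into the standard MediaWiki schema (e.g. with mwdumper), the File: pages live in namespace 6, and the current text of one can be read with something like the following sketch ('Example.jpg' is just a placeholder title, and this assumes an uncompressed local text table):

SELECT old_text
FROM page
JOIN revision ON rev_page = page_id AND rev_id = page_latest
JOIN `text` ON old_id = rev_text_id
WHERE page_namespace = 6             -- the File: namespace
  AND page_title = 'Example.jpg';    -- placeholder title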
Petr Onderka [[en:User:Svick]]
[1]: http://www.mediawiki.org/wiki/Manual:Image_table
Is there a chance of getting an all page names dump, like the ns0 one we already have?
Hmm, I think this issue was raised before on this mailing list, but I'm not too sure what the resolution was.
And to point you to the relevant bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=19542
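In the meantime, assuming a local import of the page table dump (e.g. a *-page.sql.gz file), the all-namespaces equivalent of the existing ns0 titles file is roughly this sketch:

SELECT page_namespace, page_title
FROM page
ORDER BY page_namespace, page_title;

The real all-titles-in-ns0 file is generated server-side as part of each dump run, and a version covering all namespaces would presumably be produced the same way.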
-- Regards, Hydriz
We've created the greatest collection of shared knowledge in history. Help protect Wikipedia. Donate now: http://donate.wikimedia.org