Hello everyone, I was working on a Wikipedia reader when I noticed this little issue: the data in the image metadata dumps (e.g. enwiki-20120403-image.sql.gz) appear to be truncated.
The cause seems to be the img_description column being defined as tinyblob, and a tinyblob holds at most 255 bytes. I'd really love to use this dump instead of straining the servers, which also takes forever.
Is this my mistake, or can you do something to address this issue? Most interesting for me would be Commons, of course, then the German, French and Spanish Wikipedias.
Best from Berlin, Bastian
Please see the column definition: `img_description` tinyblob NOT NULL
And the table structure:
CREATE TABLE `image` (
  `img_name` varbinary(255) NOT NULL DEFAULT '',
  `img_size` int(8) unsigned NOT NULL DEFAULT '0',
  `img_width` int(5) NOT NULL DEFAULT '0',
  `img_height` int(5) NOT NULL DEFAULT '0',
  `img_metadata` mediumblob NOT NULL,
  `img_bits` int(3) NOT NULL DEFAULT '0',
  `img_media_type` enum('UNKNOWN','BITMAP','DRAWING','AUDIO','VIDEO','MULTIMEDIA','OFFICE','TEXT','EXECUTABLE','ARCHIVE') DEFAULT NULL,
  `img_major_mime` enum('unknown','application','audio','image','text','video','message','model','multipart') NOT NULL DEFAULT 'unknown',
  `img_minor_mime` varbinary(32) NOT NULL DEFAULT 'unknown',
  `img_description` tinyblob NOT NULL,
  `img_user` int(5) unsigned NOT NULL DEFAULT '0',
  `img_user_text` varbinary(255) NOT NULL DEFAULT '',
  `img_timestamp` varbinary(14) NOT NULL DEFAULT '',
  `img_sha1` varbinary(32) NOT NULL DEFAULT '',
  PRIMARY KEY (`img_name`),
  KEY `img_size` (`img_size`),
  KEY `img_timestamp` (`img_timestamp`),
  KEY `img_usertext_timestamp` (`img_user_text`,`img_timestamp`),
  KEY `img_sha1` (`img_sha1`)
) ENGINE=InnoDB DEFAULT CHARSET=binary;
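For what it's worth, here is a quick sanity check one could run against a local MySQL import of the dump (only a sketch; it assumes the image table has been loaded as-is):

SELECT COUNT(*) AS total,
       SUM(LENGTH(img_description) = 255) AS at_limit,
       MAX(LENGTH(img_description)) AS longest
FROM image;

If at_limit is large and longest never exceeds 255, the column type itself is the ceiling, rather than something cutting the values off during the dump process.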
Hi,
img_description is a tinyblob in the dump simply because it's a tinyblob in the database too [1], so there is no truncation going on.
I'm not sure what you expected to find there, but that column contains only the short description of the image (similar to the edit summary of a revision), not the content of the associated File: page. If you want that, you will have to get the pages-articles dump.
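If you import pages-articles into the standard MediaWiki schema (e.g. with mwdumper), the File: pages live in namespace 6, and the current text of one can be read with something like the following sketch ('Example.jpg' is just a placeholder title, and this assumes an uncompressed local text table):

SELECT old_text
FROM page
JOIN revision ON rev_page = page_id AND rev_id = page_latest
JOIN `text` ON old_id = rev_text_id
WHERE page_namespace = 6             -- the File: namespace
  AND page_title = 'Example.jpg';    -- placeholder title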
Petr Onderka [[en:User:Svick]]
[1]: http://www.mediawiki.org/wiki/Manual:Image_table
Is there a chance of getting an all page names dump, like the ns0 one we already have?
Hmm, I think this issue was raised before on this mailing list, but I'm not too sure what the resolution was.
And to point you to the relevant bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=19542
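In the meantime, assuming a local import of the page table dump (e.g. a *-page.sql.gz file), the all-namespaces equivalent of the existing ns0 titles file is roughly this sketch:

SELECT page_namespace, page_title
FROM page
ORDER BY page_namespace, page_title;

The real all-titles-in-ns0 file is generated server-side as part of each dump run, and a version covering all namespaces would presumably be produced the same way.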
-- Regards, Hydriz
We've created the greatest collection of shared knowledge in history. Help protect Wikipedia. Donate now: http://donate.wikimedia.org