Hello, I was wondering how the decision is reached to split enwiki pages-meta-history into, say, N XML files. How is N determined? Is it based on something like "let's try to have X many pages per XML file" or "Y many revisions per XML file" or trying to keep the size (GB) of each XML file roughly equivalent? Or is N just an arbitrary number chosen because it sounds nice? :)
Thanks,
Diane
Hey guys,
Sorry for breaking the thread, but I just subscribed, so I think
this'll probably break mailman's threading headers.
This is very exciting news, and IA would love to have a copy! We're
more interested in being a historical mirror (on our item
infrastructure) than a live rsync/http/ftp mirror, but perhaps
we can also work something out for mirroring the latest dumps. (How big
are the last two or so?)
I suppose the next step is for me and Ariel to talk about technical
procedures and details, et cetera, but I just wanted to subscribe to
this ml and introduce myself.
Ariel, when you have a minute to chat, shoot me an email (or skype).
I'm thinking we just pull things at whatever frequency you guys push
out the data to your.org (which may or may not be scheduled yet) and
throw them into new items on the cluster.
Others' thoughts are, of course, always welcome.
Thanks!
Alex Buie
Collections Group
Internet Archive, a registered California non-profit library
abuie@archive.org
This is phase one of a plan to make uploaded media from WMF projects
accessible for download in bulk. It, like many other things lately, is
experimental and subject to breakage, change, etc.
First, a big thanks to Kevin Day from Your.org, who offered us the space
and worked with us for many hours to sort out networking issues, try
different NAS setups, and generally do what was needed to get this
going.
Rsync URL: ftpmirror.your.org::wikimedia-images/projectname/languagecode
For example:
rsync -a ftpmirror.your.org::wikimedia-images/wikipedia/commons /my/dir
would get you all of commons, including archived versions (no deleted
images, of course).
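If you only want a single project, the same pattern should work; for
example (untested, and assuming the projectname/languagecode layout
above applies to English Wikipedia):
rsync -a ftpmirror.your.org::wikimedia-images/wikipedia/en /my/dir  # path is an assumption based on the layout above
would get you just the media uploaded locally to en.wikipedia.org.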
Folks who are trying to download media for a specific project should
bear in mind that they will need not only the files from that project
but also those hosted on commons and used on the local project. I'm
looking into producing lists of those files for easy use by rsyncers.
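Once those lists exist, rsync's --files-from option should make them
easy to use; a rough sketch (the list filename here is hypothetical):
rsync -a --files-from=commons-used-by-enwiki.txt ftpmirror.your.org::wikimedia-images/wikipedia/commons /my/dir  # list name is hypothetical
which would copy from commons only the files named in the list.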
I would suggest that, rather than everyone downloading a zillion copies
of commons at once, folks coordinate a little bit, or just get the
pieces they need :-D
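For example, if you only need one slice of the tree, you can point
rsync at a deeper path (the subdirectory shown is hypothetical,
assuming the module exposes the usual hashed layout):
rsync -a ftpmirror.your.org::wikimedia-images/wikipedia/commons/a/ /my/dir/a/  # subdirectory is an assumption
and skip the rest of commons.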
The data that is there now is probably about 15-20 days old. It will
likely be a little while before I get the media rsync going on a regular
basis; I'm juggling a lot of pieces right now.
Ariel
P.S. This is not an April fools joke, it's April 2 here already :-P
I'm doing a little bit of work on deployment procedures for the dump
scripts as I push out a few small bug fixes and turn on logging. Over
the next day or so you'll notice interruptions or delays while the
conversion is happening.
Ariel
Hi all,
I have been looking at the Wikipedia database schema and I haven't
found any field suggesting that some content is geographically located.
Am I wrong?
If it is possible, I would like to download the geographically located
content of Wikipedia to do something similar to what Google Earth does
with the Wikipedia layer.
Is that possible?
Thanks in advance.
I can't go, but some people on this list should think about a panel that
discusses forkability, archival of content, and other related things. In
case this sounds attractive to someone who is planning to go, the
deadline for submissions is in a week!
http://wikimania2012.wikimedia.org/wiki/Submissions
I'm willing to have my brain picked by anyone who decides this is worth
doing, in case that's helpful.
Ariel
I've cranked up all the usual workers and kicked off an en wp run in
addition. If there is anything that squeaked by my spot checks, we'll
know about it soon...
Ariel