Hi.
I am an MSc student in administration in Montreal, Canada, and am writing my master's thesis on Wikipedia.
I am having trouble importing the "old" table of Wikipedia.
I first downloaded all the dump files (exported on April 21st) and then concatenated them (cat ...). The concatenation of the dump files (English version of Wikipedia) produced a file of around 31 gigabytes. Apparently the compression format has changed: bzip2 does not recognize the resulting file as a bz2 file, but gunzip is able to decompress it once I name the compressed file old_table.sql.gz.
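For anyone reproducing this step, the byte-for-byte concatenation that `cat` performs can be sketched in Python as below. The function name is illustrative and the part file names in the usage note are hypothetical, since the original command is elided. The parts must be passed in the order they were split. Whether the parts are pieces of one split .gz file or separate gzip members, a plain concatenation should still be readable by gunzip, because the gzip format allows several members back to back.

```python
import shutil


def concatenate_parts(part_paths, output_path):
    """Concatenate files byte for byte, like `cat part1 part2 ... > output`."""
    with open(output_path, "wb") as out:
        for part in part_paths:  # order matters: pass parts in split order
            with open(part, "rb") as src:
                # Copy in 1 MiB chunks so a 31 GB dump never has to fit in memory.
                shutil.copyfileobj(src, out, length=1024 * 1024)
```

Usage, with hypothetical zero-padded part names: `concatenate_parts(["old_table.sql.gz.part01", "old_table.sql.gz.part02"], "old_table.sql.gz")`.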
Has the compression format been officially changed?
Moreover, the uncompressed file is only 34,201,462 KB, which is not much bigger than the compressed file. Is that normal?
Nonetheless, the resulting SQL file seems readable, since I was able to import the 'old' table from it. But I don't know whether the file is complete, or whether the old table I imported is missing any records.
I have already downloaded and installed the English version of the Wikipedia database successfully several times, but this time there is something I don't understand. Could you please tell me what I am doing wrong, or whether the procedure has changed?
Thank you.
Kevin Carillo
On 4/26/05, Kevin Carillo kd_caril@jmsb.concordia.ca wrote:
> Has the compression format been officially changed?
Yes. It was bz2 until very recently, when it was changed to .gz.
So clearly you were misled by an outdated instruction page. Was it http://en.wikipedia.org/wiki/Wikipedia:Database_dump_import_problems, or something else? Either way, we'll need to update it.
Kevin Carillo wrote:
> Has the compression format been officially changed?
If you hover your mouse over the "old" link, you'll see that it is indeed supposed to be a gz, not a bzip2.
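Because a file name (or a tool's refusal to open it) can mislead, one way to check which format a download really uses is to read its first bytes. The sketch below relies on the magic numbers defined by the formats themselves: gzip members begin with the bytes 0x1f 0x8b (RFC 1952), and bzip2 streams begin with the ASCII characters `BZh`. The function name is illustrative and not part of any Wikipedia tooling; on Unix, the `file` command gives a similar answer.

```python
# Leading bytes ("magic numbers") that identify each compression format.
MAGIC = {
    b"\x1f\x8b": "gzip",  # gzip member header (RFC 1952)
    b"BZh": "bzip2",      # bzip2 stream header
}


def sniff_compression(path):
    """Guess the compression format from the file's first bytes, ignoring its extension."""
    with open(path, "rb") as f:
        head = f.read(3)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "unknown"
```

For the April dump discussed here, `sniff_compression("old_table.sql.gz")` should return `"gzip"` whatever the file happens to be named.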
If you didn't get *any* error message while uncompressing *and* while importing into your database, then the result is very unlikely to have any problems such as omissions or data corruption.
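To check the "no error while uncompressing" condition without writing out a 34 GB SQL file, one option is to stream-decompress the archive and discard the output, as sketched below (function names are illustrative). Python's `gzip` module reads every member of a multi-member file and verifies each member's CRC-32 and length trailer, so a truncated or corrupted download raises an exception instead of returning quietly. As a side effect, the sketch reports the exact uncompressed size. From the shell, `gzip -t old_table.sql.gz` performs a comparable test.

```python
import gzip
import zlib


def check_gzip(path, chunk_size=1024 * 1024):
    """Decompress a .gz file to the end, discarding output; return the uncompressed size.

    Raises EOFError if the file is truncated, OSError (gzip.BadGzipFile on
    Python 3.8+) on a bad header or failed CRC/length check, and zlib.error
    on corrupted compressed data.
    """
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def gzip_is_complete(path):
    """Print and return whether the whole archive decompressed cleanly."""
    try:
        size = check_gzip(path)
    except (EOFError, OSError, zlib.error) as exc:
        print(f"{path}: damaged or truncated ({exc})")
        return False
    print(f"{path}: OK, {size:,} bytes uncompressed")
    return True
```

`gzip_is_complete("old_table.sql.gz")` prints either the uncompressed byte count or the error that stopped decompression. This only shows that the compressed file is intact; it cannot tell whether the dump itself contains every record.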
Timwi
wikitech-l@lists.wikimedia.org