Hi,
I run www.ameisenwiki.de and I want to create dumps for WikiTaxi. I need the pages-articles.xml.bz2 format.
Currently I try this with php dumpBackup.php --full > /var/www/wiki/dump/pager-articles.xml and create the .bz2 file afterwards.
WikiTaxi is not able to import it (parser error). If I use dumpgenerator.py, I also get incompatible XML. Which tool is used to create these exports?
You can have a look at the dumps here: http://www.ameisenwiki.de/dump/
Hi,
On Thu, Jun 13, 2013 at 09:33:27AM +0200, witzman@ameisenwiki.de wrote:
[...] i need the pages-articles.xml.bz2 format. [...] and create the .bz2 file afterwards
It looks like you effectively created a pages-articles.xml.tar.bz2 (the XML /inside a tar archive/, which is then bzip2-compressed) instead of a pages-articles.xml.bz2.
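One way to confirm this is to look at the first bytes of the decompressed file: a tar archive carries the magic string "ustar" at byte offset 257, while a plain dump starts directly with XML. The function below is a minimal sketch, assuming the bzip2, head, tail and grep command-line tools are available; the name inspect_bz2 is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a .bz2 file holds a tar archive,
# a plain XML dump, or something else.
inspect_bz2() {
  local file="$1"
  # Bytes 257..261 of a tar header contain the magic string "ustar".
  if [ "$(bzip2 -dc -- "$file" | head -c 262 | tail -c 5)" = "ustar" ]; then
    echo "tar"
  # A plain dump starts with "<mediawiki" (or possibly an XML declaration).
  elif bzip2 -dc -- "$file" | head -c 200 | grep -q -e '<mediawiki' -e '<?xml'; then
    echo "xml"
  else
    echo "unknown"
  fi
}
```

For WikiTaxi, the file should report "xml"; "tar" means it was packed through tar.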
You can use the bzip2 program directly (see "man bzip2") to compress the .xml, instead of using it through "tar". That should give you a proper .bz2 file.
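For example, assuming the uncompressed dump already exists, bzip2 can compress it directly; alternatively, the output of dumpBackup.php can be piped through bzip2 so no intermediate .xml file is written. A sketch (the helper name compress_dump is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an existing XML dump into a plain .bz2 file
# (no tar wrapper) and verify the result.
compress_dump() {
  local xml="$1"
  # -k keeps the original .xml, -f overwrites an existing .bz2;
  # the output is written next to the input as "$xml.bz2".
  bzip2 -k -f -- "$xml" || return 1
  # -t tests the integrity of the compressed file without writing anything.
  bzip2 -t -- "$xml.bz2"
}

# Alternative without an intermediate file (dumpBackup.php writes the XML
# to stdout when no --output option is given):
#   php dumpBackup.php --full | bzip2 > /var/www/wiki/dump/pages-articles.xml.bz2
```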
Which tool is used to create these exports?
For Wikimedia, they are generated using https://gerrit.wikimedia.org/r/#/admin/projects/operations/dumps (use the "ariel" branch).
First, stubs (the XML structure without the actual wikitext) are generated. Afterwards, those stubs are filled in with the wikitext.
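With plain MediaWiki, the same two-pass idea can be approximated with dumpBackup.php's --stub option followed by dumpTextPass.php. The sketch below assumes it is run from the wiki's maintenance/ directory and that both scripts accept the options shown; the function name and the output file names are hypothetical, and the actual Wikimedia dump scripts involve more steps than this.

```shell
#!/usr/bin/env bash
# Hypothetical two-pass dump, run from MediaWiki's maintenance/ directory.
# $1: directory that receives the stub file and the final dump.
two_pass_dump() {
  local outdir="$1"
  # Pass 1: page and revision metadata only, no wikitext.
  php dumpBackup.php --full --stub --output=gzip:"$outdir/stub.xml.gz" || return 1
  # Pass 2: read the stubs and fill in the wikitext from the database.
  php dumpTextPass.php --stub=gzip:"$outdir/stub.xml.gz" \
    --output=bzip2:"$outdir/pages-full.xml.bz2"
}
```

Usage, for example: cd into the maintenance/ directory, then run two_pass_dump /var/www/wiki/dump.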
But dumpBackup.php should give you working XML as well (see above). If it does not, I would consider that a bug.
Best regards, Christian
On Thu, 13-06-2013 at 12:26 +0200, Christian Aistleitner wrote:
[...]
You don't need to pipe the output to anything; you can specify that you want bzip2 when giving the --output option to dumpBackup.php, as follows:
php ./dumpBackup.php --full --output=bzip2:/path/to/output/fulls.xml.bz2
This is the command for a one-pass dump; as Christian noted above, the Wikimedia dumps are made in two passes instead (stubs first, then text).
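The one-pass command can be wrapped so the written file is checked before it is handed to WikiTaxi. This is a sketch assuming you pass in the MediaWiki installation directory and an absolute output path; the function name and arguments are hypothetical, and bzip2 -t only checks the compression layer, not whether the XML itself is well-formed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the one-pass dumpBackup.php command.
# $1: MediaWiki installation directory, $2: absolute path of the .xml.bz2 to write.
one_pass_dump() {
  local mw_dir="$1" out="$2"
  # dumpBackup.php compresses its own output when given --output=bzip2:FILE.
  php "$mw_dir/maintenance/dumpBackup.php" --full --output=bzip2:"$out" || return 1
  # Fail if the file is missing, truncated, or not bzip2 data.
  bzip2 -t -- "$out"
}
```

For example, if the wiki were installed in /var/www/wiki (an assumption): one_pass_dump /var/www/wiki /var/www/wiki/dump/pages-articles.xml.bz2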
Ariel
xmldatadumps-l@lists.wikimedia.org