I don't know if this issue has come up already. If it has and was
dismissed, I beg your pardon. If it hasn't...
I hereby propose that pbzip2 (https://launchpad.net/pbzip2) be used
to compress the XML dumps instead of bzip2. Why? Because its sibling
(pbunzip2) has a bug that bunzip2 doesn't have. :-)
Strange? Read on.
A few hours ago, I filed a bug report for pbzip2 (see
https://bugs.launchpad.net/pbzip2/+bug/922804), together with some test
results obtained a few hours before that.
The results indicate that:
bzip2 and pbzip2 are mutually compatible: each one can create
archives that the other one can read. But when it comes to uncompressing,
only pbzip2-compressed archives are good for pbunzip2.
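
The compatibility described above can be illustrated with a minimal Python sketch. It rests on an assumption not stated in the test results: that a pbzip2 archive is a plain concatenation of independently compressed bzip2 streams, which any ordinary bzip2 decoder reads in sequence. The function name `compress_in_chunks` and its parameters are hypothetical; this is not pbzip2's actual code.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def compress_in_chunks(data, chunk_size=900_000, workers=4):
    """Compress `data` as several independent bzip2 streams, concatenated.

    Hypothetical helper that mimics the idea behind pbzip2; the real tool's
    chunking and file layout are assumed here, not copied from its source.
    """
    # Always emit at least one stream so that empty input still yields
    # a valid bzip2 file.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    # Each chunk is compressed on its own, so the work can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


if __name__ == "__main__":
    payload = b"<page>example</page>\n" * 100_000
    archive = compress_in_chunks(payload, chunk_size=100_000)
    # A plain serial decoder reads the concatenated streams back to back.
    print(bz2.decompress(archive) == payload)
```

Because every stream is self-contained, a decompressor may also hand the streams to separate CPUs. An archive written as one single stream offers no such split points, which would be consistent with the test results.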
I propose compressing the archives with pbzip2 for the following reasons:
1) If your archiving machines are SMP systems, this could lead to
better use of system resources (i.e. faster compression).
2) Compression with pbzip2 is harmless for regular users of bunzip2,
so everything should work as usual for them.
3) pbzip2-compressed archives can be uncompressed with pbunzip2 with a
speedup that scales nearly linearly with the number of CPUs in the
So to sum up: it's a no-lose, two-win situation if you migrate to
pbzip2. And that just because pbunzip2 is slightly buggy. Isn't that
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com Geschäftsführer: Richard Jelinek
Human Language Technology Experts Sitz der Gesellschaft: Fürth
69216618 Mind Units Registergericht: AG Fürth, HRB-9201
I run www.ameisenwiki.de and I want to create dumps for WikiTaxi. I
need the pages-articles.xml.bz2 format.
Currently I try this with php dumpBackup.php --full >
/var/www/wiki/dump/pager-articles.xml and then create the .bz2 file.
WikiTaxi is not able to import it (parser error). If I use dumpgenerator.py
I also get an incompatible XML. Which tool is used to create these
You can have a look at the dumps here:
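
One way to narrow down such a parser error is sketched below, assuming Python 3 is available on the wiki host. It compresses the XML file and then checks whether the result is well-formed XML at all, independently of WikiTaxi. The helper names `bzip2_file` and `first_xml_error` are hypothetical and belong to neither MediaWiki nor WikiTaxi.

```python
import bz2
import shutil
import xml.etree.ElementTree as ET


def bzip2_file(src_path, dst_path=None):
    """Compress src_path into a .bz2 file and return the new path."""
    dst_path = dst_path or src_path + ".bz2"
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


def first_xml_error(bz2_path):
    """Return the first XML parse error as text, or None if well-formed."""
    try:
        with bz2.open(bz2_path, "rb") as stream:
            for _event, elem in ET.iterparse(stream):
                elem.clear()  # keep memory use low on larger dumps
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    import sys
    archive = bzip2_file(sys.argv[1])
    print(first_xml_error(archive) or "well-formed XML")
```

If this reports an error near line 1, stray output ahead of the XML (for example a warning printed into the redirected file) would be one possible cause worth ruling out. That is a guess, not a diagnosis.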
Dear list members,
I am pleased to announce the release of WP-MIRROR 0.6.
The main design objective was this: PERFORMANCE. WP-MIRROR 0.6 now
builds the `enwiki' (which is the most demanding case) in 80% less
time and with 75% less memory than v0.5.
Feature: One new feature was added. WP-MIRROR 0.6 can now mirror
wikis from most other WMF projects (e.g. wikibooks, wiktionary, etc).
Reliability: Downloads are now performed with the aid of `wget' which
has an automatic restart feature. This virtually eliminates the
problem of partial downloads.
Images: WP-MIRROR 0.6 makes use of image dump tarballs found at
<http://ftpmirror.your.org/>. WP-MIRROR 0.6 then does a thorough job
of identifying image files missing from the tarballs, and downloads
them efficiently using HTTP/1.1 persistent connections.
Packaging: The DEB package for WP-MIRROR 0.6 should work
`out-of-the-box' with no user configuration for the following
distributions:
o Debian GNU/Linux 7.0 (wheezy)
o Ubuntu 12.10 (quantal)
Virtual Hosts: Browsing of mirrored wikis is done via virtual hosts
with names like <http://simple.wikipedia.site/> and
<http://simple.wiktionary.site/>. Simply take the URL that WMF
offers, and replace `.org' with `.site'.
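
The renaming rule above can be written down as a small Python sketch. The function name `mirror_url` is hypothetical and is not part of WP-MIRROR. It assumes the WMF host name ends in `.org`.

```python
from urllib.parse import urlsplit, urlunsplit


def mirror_url(wmf_url):
    """Map a WMF URL to the matching local virtual host (.org -> .site)."""
    parts = urlsplit(wmf_url)
    host = parts.hostname or ""
    if not host.endswith(".org"):
        raise ValueError("expected a host name ending in .org: %r" % wmf_url)
    new_host = host[:-len(".org")] + ".site"
    # Keep an explicit port, if the original URL had one.
    netloc = new_host if parts.port is None else "%s:%d" % (new_host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


if __name__ == "__main__":
    print(mirror_url("http://simple.wikipedia.org/"))
```

Only the host name is rewritten. Path, query and fragment pass through unchanged, so a `.org` that happens to appear elsewhere in the URL is left alone.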
Project Home Page: <http://www.nongnu.org/wp-mirror/> has been updated.
Feedback is welcome.
Dr. Kent L. Miller
Some time ago, I generated a DEB package named
`mwxml2sql_0.0.2-1_amd64.deb'. It works very well, and was the source
of the patches that I submitted upstream. Now that I have learned
that you welcome such patches, it occurs to me that you might want the
DEB package as well. If so, then there are a number of things we
Your other DEB packages have names like:
For naming consistency, would you like the `mwxml2sql' package to be
renamed something like
Debian policy requires that new packages first be announced with an
Intent-To-Package (ITP) bug report. Then a `Debian Developer' may or
may not step forward to sponsor the package for inclusion in a Debian
Do you have someone in-house, who is serving as a `Debian Maintainer'?
If so, could you introduce us?
All my systems are AMD64. Since `mwxml2sql' contains C language
programs and Debian is a binary distribution, a set of
`mwxml2sql' DEB packages should be prepared, one for each
architecture. Do you have a way of generating DEB packages for other
I am helping the charity Volunteer Uganda set up an offline eLearning
computer system with 15 Raspberry Pis and a cheap desktop computer as the
server. Server stats:
- 2 TB disk
- 8 GB DDR3 RAM
- 3 GHz i5 quad core
I am trying to import enwiki-20130403-pages-articles-multistream.xml.bz2
using mwdumper-1.16.jar, but I have a few questions.
1. I was originally using a GUI version of mwdumper-1.16.jar, but that
errored out a few times with duplicate pages, so I decided to use the
pre-built one recommended on the MediaWiki page. Looking at the
stats on Wikipedia, I can see that there are roughly 30 million pages;
however, I found this morning that mwdumper-1.16.jar had finished (no
errors) with roughly 13.3 million pages. Since there were no errors, I
assumed that it had finished, but I appear to be 17 million pages short?
2. The pages that have been imported are missing templates. Is there another
XML file that I can import which will add the missing templates? As the
screenshot below shows, the pages are almost unreadable without them.
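
A way to cross-check the importer's page count, sketched here on the assumption that Python 3 is available on the server: stream the dump and count its page elements directly. The function name `count_pages` is hypothetical and not part of mwdumper.

```python
import bz2
import xml.etree.ElementTree as ET


def count_pages(dump_path):
    """Count <page> elements in a (possibly multistream) .xml.bz2 dump."""
    count = 0
    # bz2.open reads all concatenated streams of a multistream file in order.
    with bz2.open(dump_path, "rb") as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        _event, root = next(context)  # first event is the start of the root
        for event, elem in context:
            # Tags carry the export namespace ("{...}page"); compare the
            # local name only.
            if event == "end" and elem.tag.rsplit("}", 1)[-1] == "page":
                count += 1
                root.clear()  # drop finished pages so memory stays flat
    return count


if __name__ == "__main__":
    import sys
    print(count_pages(sys.argv[1]))
```

If the number printed for the dump file matches what mwdumper reported, the import of that particular file is complete, and the difference from the site statistics lies in what the file contains rather than in the import.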
Many thanks in advance for your help.
[image: Inline images 2]
Richard Ive • Metafour UK Ltd • 2 Berghem Mews, London W14 0HN •
registered in England: 01528556
tel: +4420 7912 2000 • direct: +4420 7912 2006 • mobile: +447854
569 205 • website: www.metafour.com
Dear list members,
I would like some advice on how to submit a `mediawiki` related DEB
package. Jeremy Baron recommended that I contact this mailing list.
0) New utilities
Ariel T. Glenn at WMF wrote a set of utilities, `mwxml2sql', that help
convert XML dump files into a format that can be readily loaded into
the database for a local instance of MediaWiki. These utilities are
written in C and offer some performance advantage over
existing utilities such as `importDump.php'.
The upstream source code may be found at
1) Reason for packaging
I wrote `wp-mirror' which is a free utility for mirroring any desired
set of WMF wikis. This I distribute as a DEB package. My next
release, wp-mirror-0.6, is focused on performance improvement; and,
among other things, will make use of Ariel's utilities.
To facilitate the handling of dependencies, I decided to package
2) DEB package
I prepared a DEB package which is now named
`mediawiki-mwxml2sql_0.0.2-1_amd64.deb'. It builds correctly with
`debuild' and with `pbuilder'. `Lintian' only complains that it does
not close any ITP bug.
I patched Ariel's source code and Makefile, so that man pages could be
generated using `help2man'. I submitted the patch upstream, and Ariel
graciously applied it. One more patch is under review (a few typos).
I submitted an Intent-To-Package (ITP) bug to Debian, but have not yet
received the bug number.
Do you know anyone who would like to sponsor the package?
On 5/28/13, Jeremy Baron <jeremy(a)tuxmachine.com> wrote:
> On May 28, 2013 12:34 AM, "Ariel T. Glenn" <ariel(a)wikimedia.org> wrote:
>> On 27-05-2013, Mon, at 21:00 -0400, wp mirror wrote:
>> looking at http://packages.debian.org/sid/mediawiki-extensions-base it
>> seems we want to get in contact with Romain Beauxis or Thorsten Glaser
>> and see how to proceed.
> pkg-mediawiki-devel(a)lists.alioth.debian.org is the place to mail.
>> Hmm I really have no idea what will happen to some of these on a 32-bit
>> system, I should check that out in a vm sometime...
> sounds like you just need tests in the Debian package and then Debian can
> run those for you on all archs/ports.