Hi,
don't know if this issue came up already - in case it did and has been
dismissed, I beg your pardon. In case it didn't...
I hereby propose that pbzip2 (https://launchpad.net/pbzip2) be used
to compress the xml dumps instead of bzip2. Why? Because its sibling
(pbunzip2) has a bug that bunzip2 doesn't have. :-)
Strange? Read on.
A few hours ago, I filed a bug report for pbzip2 (see
https://bugs.launchpad.net/pbzip2/+bug/922804) together with some test
results obtained a few hours before that.
The results indicate that:
bzip2 and pbzip2 are mutually compatible: each one can create
archives that the other one can read. But when it comes to
uncompressing, only pbzip2-compressed archives are good for pbunzip2.
I propose compressing the archives with pbzip2 for the following
reasons:
1) If your archiving machines are SMP systems, this could lead to
better usage of system resources (i.e. faster compression).
2) Compression with pbzip2 is harmless for regular users of bunzip2,
so everything should run for these people as usual.
3) pbzip2-compressed archives can be uncompressed with pbunzip2 with a
speedup that scales nearly linearly with the number of CPUs in the
host.
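The compatibility claim in point 2 can be illustrated with a minimal
sketch. It assumes that a pbzip2 archive has the general shape of several
independent bzip2 streams written back to back; the helper
compress_in_chunks is hypothetical and only mimics that shape, it is not
pbzip2's code. Python's bz2 module stands in for a reader that accepts
concatenated streams.

```python
import bz2


def compress_in_chunks(data: bytes, chunk_size: int) -> bytes:
    """Compress each chunk as its own bzip2 stream and concatenate them.

    Assumption: this mimics the general shape of a multi-stream archive;
    it is a hypothetical helper, not pbzip2's implementation.
    """
    streams = [
        bz2.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]
    return b"".join(streams)


sample = b"<page>example</page>\n" * 1000
archive = compress_in_chunks(sample, chunk_size=4096)

# bz2.decompress accepts multi-stream input (Python 3.3 and later),
# so the concatenated streams decompress back to the original bytes.
restored = bz2.decompress(archive)
```

Each chunk can be compressed or decompressed without looking at the
others, which is what makes spreading the work over several CPUs
possible.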
So to sum up: it's a no-lose, two-win situation if you migrate to
pbzip2. And that just because pbunzip2 is slightly buggy. Isn't that
interesting? :-)
cheers,
--
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com Geschäftsführer: Richard Jelinek
Human Language Technology Experts Sitz der Gesellschaft: Fürth
69216618 Mind Units Registergericht: AG Fürth, HRB-9201
hi,
I run www.ameisenwiki.de and I want to create dumps for WikiTaxi. I
need the pages-articles.xml.bz2 format.
Currently I try this with php dumpBackup.php --full >
/var/www/wiki/dump/pager-articles.xml and create the .bz2 file
afterwards.
WikiTaxi is not able to import it: parser error. If I use
dumpgenerator.py I also get an incompatible XML. Which tool is used to
create these exports?
You can have a look at the dumps here:
http://www.ameisenwiki.de/dump/
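One thing worth ruling out with a parser error is that the file is not
well-formed XML before it gets compressed. The following is a minimal
sketch of that check followed by the compression step. The function
names and the tiny sample file are made up for illustration, and it says
nothing about what WikiTaxi itself expects beyond well-formed XML.

```python
import bz2
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET


def is_well_formed(xml_path: str) -> bool:
    """Stream through the file; return False on the first parser error."""
    try:
        for _event, elem in ET.iterparse(xml_path):
            elem.clear()  # drop element content to limit memory use
        return True
    except ET.ParseError:
        return False


def compress_to_bz2(xml_path: str, bz2_path: str) -> None:
    """Write the contents of xml_path into a .bz2 file."""
    with open(xml_path, "rb") as src, bz2.open(bz2_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Demonstration on a tiny, made-up export file.
with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "pages-articles.xml")
    bz2_path = xml_path + ".bz2"
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write("<mediawiki><page><title>Ameise</title></page></mediawiki>")

    ok = is_well_formed(xml_path)
    if ok:
        compress_to_bz2(xml_path, bz2_path)
        with bz2.open(bz2_path, "rt", encoding="utf-8") as f:
            round_trip = f.read()
```

If is_well_formed returns False for the real dump, the problem is in
the XML itself and not in the compression step.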
As usual I am running a couple of steps manually, which is why it looks
like nothing is happening and that the dump has failed. Don't worry, in
a day or two the last bits will magically show up.
Ariel
Dear list members,
I am pleased to announce the release of WP-MIRROR 0.6.
The main design objective was this: PERFORMANCE. WP-MIRROR 0.6 now
builds the `enwiki' (which is the most demanding case) in 80% less
time and with 75% less memory than v0.5.
Feature: One new feature was added. WP-MIRROR 0.6 can now mirror
wikis from most other WMF projects (e.g. wikibooks, wiktionary, etc).
Reliability: Downloads are now performed with the aid of `wget' which
has an automatic restart feature. This virtually eliminates the
problem of partial downloads.
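The idea behind resuming a partial download can be sketched as follows.
This is not WP-MIRROR's code and not wget's; it only shows the HTTP
mechanism of asking for the bytes that are not yet on disk. The function
name, the URL and the file names are hypothetical, and no network
request is actually sent.

```python
import os
import tempfile
import urllib.request


def build_resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """Return a request that asks only for the bytes not yet on disk."""
    request = urllib.request.Request(url)
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    if have > 0:
        # Ask the server to start at the first missing byte.
        request.add_header("Range", f"bytes={have}-")
    return request


# Demonstration with a made-up URL and a 10-byte partial file.
with tempfile.TemporaryDirectory() as tmp:
    partial = os.path.join(tmp, "dump.xml.bz2.part")
    with open(partial, "wb") as f:
        f.write(b"\0" * 10)
    resume = build_resume_request("http://example.org/dump.xml.bz2", partial)
    fresh = build_resume_request(
        "http://example.org/dump.xml.bz2", os.path.join(tmp, "missing.part")
    )

range_header = resume.get_header("Range")
```

A server that honours the header answers with 206 Partial Content; one
that ignores it sends the whole file again with 200, so a real client
has to check the status before appending.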
Images: WP-MIRROR 0.6 makes use of image dump tarballs found at
<http://ftpmirror.your.org/>. WP-MIRROR 0.6 then does a thorough job
of identifying image files missing from the tarballs, and downloads
them efficiently using HTTP/1.1 persistent connections.
Packaging: The DEB package for WP-MIRROR 0.6 should work
`out-of-the-box' with no user configuration for the following
distributions:
o Debian GNU/Linux 7.0 (wheezy)
o Ubuntu 12.10 (quantal)
Virtual Hosts: Browsing of mirrored wikis is done via virtual hosts
with names like <http://simple.wikipedia.site/> and
<http://simple.wiktionary.site/>. Simply take the URL that WMF
offers, and replace `.org' with `.site'.
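The naming rule for virtual hosts can be written down as a small
function. This is only a sketch of the rule as stated above; the
function name to_mirror_url is hypothetical and not part of WP-MIRROR.

```python
from urllib.parse import urlsplit, urlunsplit


def to_mirror_url(wmf_url: str) -> str:
    """Replace the `.org' at the end of the host name with `.site'."""
    parts = urlsplit(wmf_url)
    host = parts.hostname or ""
    if not host.endswith(".org"):
        raise ValueError(f"not a .org host: {wmf_url!r}")
    mirror_host = host[: -len(".org")] + ".site"
    # Keep an explicit port if the original URL had one.
    netloc = mirror_host if parts.port is None else f"{mirror_host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


mirror = to_mirror_url("http://simple.wikipedia.org/")
```

Only the host name is rewritten, so a path that happens to contain
`.org' stays untouched.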
Project Home Page: <http://www.nongnu.org/wp-mirror/> has been updated.
Feedback is welcome.
Sincerely Yours,
Dr. Kent L. Miller
Dear Ariel,
Some time ago, I generated a DEB package named
`mwxml2sql_0.0.2-1_amd64.deb'. It works very well, and was the source
of the patches that I submitted upstream. Now that I have learned
that you welcome such patches, it occurs to me that you might want the
DEB package as well. If so, then there are a number of things we
should discuss.
0) Naming
Your other DEB packages have names like:
mediawiki_1.19.6-1_all.deb
mediawiki-extensions-base_3.3_all.deb
mediawiki-math_1.0+git20120528-7_amd64.deb
For naming consistency, would you like the `mwxml2sql' package to be
renamed something like
mediawiki-mwxml2sql_0.0.2-1_amd64.deb
1) ITP
Debian policy requires that new packages first be announced with an
Intent-To-Package (ITP) bug report. Then a `Debian Developer' may or
may not step forward to sponsor the package for inclusion in a Debian
distribution.
Do you have someone in-house, who is serving as a `Debian Maintainer'?
If so, could you introduce us?
2) Architectures
All my systems are AMD64. Whereas `mwxml2sql' contains C language
programs, and whereas Debian is a binary distribution, a set of
`mwxml2sql' DEB packages should be prepared, one for each
architecture. Do you have a way of generating DEB packages for other
architectures?
Sincerely Yours,
Kent
Hi all,
I am helping the charity Volunteer Uganda set up an offline eLearning
computer system with 15 Raspberry Pis and a cheap desktop computer as a
server. Server stats:
- 2 TB disk
- 8 GB DDR3 RAM
- 3 GHz i5 quad core.
I am trying to import enwiki-20130403-pages-articles-multistream.xml.bz2
using mwdumper-1.16.jar, but I have a few questions.
1. I was originally using a GUI version of mwdumper-1.16.jar, but that
errored out a few times with duplicate pages, so I decided to use the
pre-built one recommended on the MediaWiki page. Having looked at the
stats on Wikipedia I can see that there are roughly 30 million pages,
however I have found this morning that mwdumper-1.16.jar has finished (no
errors) with roughly 13.3 million pages. Without any errors I assumed that
it had finished, but I appear to be 17 million pages short?
2. The pages that have imported are missing templates. Is there another
XML file that I can import which will add the missing templates? As the
screen shot below shows, it is almost unreadable without them.
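Both questions can be checked against the dump file itself by counting
its <page> elements per namespace (in MediaWiki, namespace 0 is the
main article namespace and 10 is Template). The following is a minimal
sketch, not a replacement for mwdumper. The function names and the tiny
sample are made up, and the sample is split into two bzip2 streams only
to imitate a multistream file.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip a leading `{namespace-uri}' from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_pages_by_namespace(stream) -> Counter:
    """Count <page> elements, keyed by the text of their <ns> child."""
    counts = Counter()
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == "page":
            ns = next(
                (c.text for c in elem if local_name(c.tag) == "ns"), None
            )
            counts[ns] += 1
            root.clear()  # discard finished pages to limit memory use
    return counts


# Made-up sample: two articles and one template.
sample_xml = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">'
    "<page><title>Ant</title><ns>0</ns></page>"
    "<page><title>Bee</title><ns>0</ns></page>"
    "<page><title>Template:Infobox</title><ns>10</ns></page>"
    "</mediawiki>"
).encode("utf-8")
half = len(sample_xml) // 2
archive = bz2.compress(sample_xml[:half]) + bz2.compress(sample_xml[half:])

with bz2.open(io.BytesIO(archive), "rb") as stream:
    counts = count_pages_by_namespace(stream)
```

On a real dump, pass the path of the .xml.bz2 file to bz2.open instead
of the in-memory sample. The total can then be compared with the number
of pages imported, and the count for namespace "10" shows whether the
file contains templates at all.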
Many thanks in advance for your help.
Kind regards,
Richard Ive
[image: Inline images 2]
--
Richard Ive • Metafour UK Ltd • 2 Berghem Mews, London W14 0HN •
registered in England: 01528556
tel: +4420 7912 2000 • direct: +4420 7912 2006 • mobile: +447854
569 205 • website: www.metafour.com
This email is private & confidential; if you received it in error, please
notify us and delete it from your system
Dear list members,
I would like some advice on how to submit a `mediawiki' related DEB
package. Jeremy Baron recommended that I contact this mailing list.
0) New utilities
Ariel T. Glenn at WMF wrote a set of utilities, `mwxml2sql', that help
convert XML dump files into a format that can be readily loaded into
the database for a local instance of MediaWiki. These utilities are
written in C, and offer some performance advantage over existing
utilities such as `importDump.php'.
The upstream source code may be found at
<https://gerrit.wikimedia.org/r/#/admin/projects/operations/dumps>.
1) Reason for packaging
I wrote `wp-mirror' which is a free utility for mirroring any desired
set of WMF wikis. This I distribute as a DEB package. My next
release, wp-mirror-0.6, is focused on performance improvement; and,
among other things, will make use of Ariel's utilities.
To facilitate the handling of dependencies, I decided to package
Ariel's utilities.
2) DEB package
I prepared a DEB package which is now named
`mediawiki-mwxml2sql_0.0.2-1_amd64.deb'. It builds correctly with
`debuild' and with `pbuilder'. `Lintian' only complains that it does
not close any ITP bug.
3) Patches
I patched Ariel's source code and Makefile, so that man pages could be
generated using `help2man'. I submitted the patch upstream, and Ariel
graciously applied it. One more patch is under review (a few typos).
4) ITP
I submitted an Intent-To-Package (ITP) bug to Debian, but have not yet
received the bug number.
Do you know anyone who would like to sponsor the package?
Sincerely Yours,
Kent
On 5/28/13, Jeremy Baron <jeremy(a)tuxmachine.com> wrote:
> On May 28, 2013 12:34 AM, "Ariel T. Glenn" <ariel(a)wikimedia.org> wrote:
>> On 27-05-2013, Mon, at 21:00 -0400, wp mirror wrote:
>> From
>> looking at http://packages.debian.org/sid/mediawiki-extensions-base it
>> seems we want to get in contact with Romain Beauxis or Thorsten Glaser
>> and see how to proceed.
>
> pkg-mediawiki-devel(a)lists.alioth.debian.org is the place to mail.
>
>> Hmm I really have no idea what will happen to some of these on a 32-bit
>> system, I should check that out in a vm sometime...
>
> sounds like you just need tests in the Debian package and then Debian can
> run those for you on all archs/ports.
>
> -Jeremy
>