Hi,
I don't know whether this issue has come up already. If it has and was
dismissed, I beg your pardon. If it hasn't...
I hereby propose that pbzip2 (https://launchpad.net/pbzip2) be used
to compress the XML dumps instead of bzip2. Why? Because its sibling
(pbunzip2) has a bug that bunzip2 doesn't have. :-)
Strange? Read on.
A few hours ago, I filed a bug report for pbzip2 (see
https://bugs.launchpad.net/pbzip2/+bug/922804) together with some test
results obtained a few hours before that.
The results indicate that:
bzip2 and pbzip2 are mutually compatible: each can create archives
that the other can read. For decompression with pbunzip2, however,
only pbzip2-compressed archives are suitable.
I propose compressing the archives with pbzip2 for the following
reasons:
1) If your archiving machines are SMP systems, this could lead to
better use of system resources (i.e. faster compression).
2) Compression with pbzip2 is harmless for regular users of bunzip2,
so everything should run for these people as usual.
3) pbzip2-compressed archives can be uncompressed with pbunzip2 with a
speedup that scales nearly linearly with the number of CPUs in the
host.
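
To illustrate why points 2) and 3) can both hold, here is a minimal Python
sketch. It assumes (this is not stated in this mail) that a pbzip2 archive
is laid out as a series of independent bzip2 streams written one after
another. The helper names and the chunk size are made up for illustration,
and Python's bz2 module stands in for the command-line tools.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def compress_in_chunks(data, chunk_size):
    """Compress each chunk as its own complete bzip2 stream (hypothetical helper)."""
    return [
        bz2.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def decompress_parallel(streams, workers=4):
    """Decompress independent streams concurrently and join them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the chunks come back in sequence.
        return b"".join(pool.map(bz2.decompress, streams))


# Stand-in payload; a real dump would be read from disk instead.
data = b"wikipedia dump line\n" * 50_000
streams = compress_in_chunks(data, 256 * 1024)

# Concatenating the streams yields one file-like byte string.
archive = b"".join(streams)

# An ordinary, serial decompressor can read the whole thing
# (bz2.decompress accepts multiple concatenated streams since Python 3.3).
serial = bz2.decompress(archive)

# Because every stream is self-contained, they can also be handled in parallel.
parallel = decompress_parallel(streams)
```

Under that assumption, a regular bunzip2 user sees nothing unusual, while a
parallel decompressor has independent units of work to spread across CPUs.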
So to sum up: it's a no-lose, two-win situation if you migrate to
pbzip2. And that is just because pbunzip2 is slightly buggy. Isn't that
interesting? :-)
cheers,
--
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com Geschäftsführer: Richard Jelinek
Human Language Technology Experts Sitz der Gesellschaft: Fürth
69216618 Mind Units Registergericht: AG Fürth, HRB-9201
Dear Kevin,
Thanks for your help. I can now see the index.html file (see 1
below). However, I can no longer see the other files in that directory
(see 2 below).
1) index.html
The following
(shell)$ curl --ipv4 --verbose http://ftpmirror.your.org/pub/wikimedia/dumps/
(shell)$ curl --ipv6 --verbose http://ftpmirror.your.org/pub/wikimedia/dumps/
now both report
HTTP/1.1 301 Moved Permanently
Location: http://dumps.wikimedia.your.org/
But the following
(shell)$ curl --ipv4 --verbose --location http://ftpmirror.your.org/pub/wikimedia/dumps/
(shell)$ curl --ipv6 --verbose --location http://ftpmirror.your.org/pub/wikimedia/dumps/
follow the redirect and report
HTTP/1.1 200 OK
2) Files
Now, however, I cannot get other files from that directory. For
example, the following
(shell)$ curl --ipv4 --verbose http://ftpmirror.your.org/pub/wikimedia/dumps/rsync-dirlist-last-1-good.txt
(shell)$ curl --ipv6 --verbose http://ftpmirror.your.org/pub/wikimedia/dumps/rsync-dirlist-last-1-good.txt
both report
HTTP/1.1 301 Moved Permanently
Location: http://dumps.wikimedia.your.org/r
The rather odd value for "Location" makes it impossible to follow the
redirect. Thus
(shell)$ curl --ipv4 --verbose --location http://ftpmirror.your.org/pub/wikimedia/dumps/rsync-dirlist-last-1-good.txt
(shell)$ curl --ipv6 --verbose --location http://ftpmirror.your.org/pub/wikimedia/dumps/rsync-dirlist-last-1-good.txt
report
HTTP/1.1 404 Not Found
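
The mismatch can be stated as a small check. This is only a sketch in
Python: it assumes that the redirect is meant to keep everything after the
old directory prefix and append it to the new hostname, which is my guess at
the intended behaviour and not something the server configuration confirms.
The function names are made up for illustration.

```python
# Prefixes taken from the requests and responses quoted above.
OLD_PREFIX = "http://ftpmirror.your.org/pub/wikimedia/dumps/"
NEW_PREFIX = "http://dumps.wikimedia.your.org/"


def expected_location(url):
    """Return the redirect target, ASSUMING the rest of the path is preserved."""
    if not url.startswith(OLD_PREFIX):
        raise ValueError("URL is not under the old dumps directory: " + url)
    return NEW_PREFIX + url[len(OLD_PREFIX):]


def location_is_consistent(url, location):
    """Compare an observed Location header with the assumed target."""
    return location == expected_location(url)


directory_url = OLD_PREFIX
file_url = OLD_PREFIX + "rsync-dirlist-last-1-good.txt"

# The directory redirect from 1) matches the assumed rule.
directory_ok = location_is_consistent(
    directory_url, "http://dumps.wikimedia.your.org/")

# The file redirect from 2) does not: only the first character survives.
file_ok = location_is_consistent(
    file_url, "http://dumps.wikimedia.your.org/r")
```

With the values reported above, the directory request passes the check and
the file request fails it, which matches the 200 and 404 results.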
Sincerely Yours,
Kent
+------------------------------------------------------------------------+
pub 1024D/359E5142 2008-09-01 GPG key available on pgpkeys.mit.edu
Key fingerprint = 8D4F 4485 7F7D 5406 230C 9749 B821 2572 359E 5142
uid Dr. Kent L. Miller <kent.l.miller(a)alumni.cmu.edu>
+------------------------------------------------------------------------+
On Sat, 16 Nov 2013, Kevin Day wrote:
> Nice catch!
>
> That should be fixed now, and sending a 301 redirect on both ipv4 and ipv6.
>
> Several of the pages on the “dumps” collection have hardcoded links that require them to be in the / (root) directory, so they require their own hostname.
>
> — Kevin
>
> On Nov 16, 2013, at 2:32 PM, Dr. Kent L. Miller <kent.l.miller(a)alumni.cmu.edu> wrote:
>
>> Dear Ariel,
>>
>> When I compare the following two commands
>>
>> (shell)$ curl --ipv4 --verbose http://ftpmirror.your.org/pub/wikimedia/dumps/
>> (shell)$ curl --ipv6 --verbose http://ftpmirror.your.org/pub/wikimedia/dumps/
>>
>> The first reports "HTTP/1.1 301 Moved Permanently"
>> The second reports "HTTP/1.1 200 OK"
>>
>> Has anyone else noticed this?
>>
>> Sincerely Yours,
>> Kent
>>
>>
>> +------------------------------------------------------------------------+
>> pub 1024D/359E5142 2008-09-01 GPG key available on pgpkeys.mit.edu
>> Key fingerprint = 8D4F 4485 7F7D 5406 230C 9749 B821 2572 359E 5142
>> uid Dr. Kent L. Miller <kent.l.miller(a)alumni.cmu.edu>
>> +------------------------------------------------------------------------+
Dear Ariel,
When I compare the following two commands
(shell)$ curl --ipv4 --verbose http://ftpmirror.your.org/pub/wikimedia/dumps/
(shell)$ curl --ipv6 --verbose http://ftpmirror.your.org/pub/wikimedia/dumps/
The first reports "HTTP/1.1 301 Moved Permanently"
The second reports "HTTP/1.1 200 OK"
Has anyone else noticed this?
Sincerely Yours,
Kent
+------------------------------------------------------------------------+
pub 1024D/359E5142 2008-09-01 GPG key available on pgpkeys.mit.edu
Key fingerprint = 8D4F 4485 7F7D 5406 230C 9749 B821 2572 359E 5142
uid Dr. Kent L. Miller <kent.l.miller(a)alumni.cmu.edu>
+------------------------------------------------------------------------+
Yannick Guigui, 08/11/2013 12:22:
> I'm Cameroonian. I built a web app that allows students to consult
> Wikipedia articles
So you don't need the originals, only thumbs.
> without internet connectivity. Many schools have accepted
> the application; it is hosted on a server and shared over
> Wi-Fi at each school.
Nice! It seems you may want to use this existing software solution:
http://kiwix.org/wiki/Kiwix-serve
That way, you can use the available ZIM files with no need to generate
or download (and compress) the thumbnails yourself.
Example: http://www.wikimedia.fr/afripedia
Your help developing the software would be very useful and you could
avoid doing yourself what you don't have the resources (bandwidth) to do.
> I have all the other dumps of Wikipedia articles in
> French and English, but I don't have any images because they are too
> large for me to download (3 TB) and I have low bandwidth (40
> kB/s when it's fast).
>
> The web app runs in a browser, and I don't know whether the ZIM format can
> be decompressed to get small images (JPEG, PNG, SVG...).
Kiwix is a browser, you can save anything you want AFAIK.
Nemo
>
> This is the video demo (3 min, in French) of the web app:
> https://www.youtube.com/watch?v=0f-HJhOw1-U
>
> If I can get small images in French and English to load into the app, my
> problem will be solved.
>
> Thanks a lot, Federico
>
>
>
> On Friday, 8 November 2013, Federico Leva (Nemo) wrote:
>
> Yannick Guigui, 08/11/2013 10:11:
>
> Please, I want to get all images of French and English Wikipedia.
> How much
> would it cost to get them on a hard disk? I can't download them because
> I don't
> have enough bandwidth in my country.
>
>
> What do you need them for?
> Originals would be about 2+1 TB and anyone can download and ship
> them for you:
> http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/20121201/
> Otherwise there are the ZIM files with thumbnails compressed,
> fr.wiki is 14 GB but en.wiki is not available yet.
> http://download.kiwix.org/zim/0.9/
>
> Nemo
>
Please, I want to get all images of French and English Wikipedia. How much
would it cost to get them on a hard disk? I can't download them because I
don't have enough bandwidth in my country.
Thanks, I really need it.
Hello all,
(My previous mail failed to send, sorry.)
I'm kisoong.jang from Korea.
Lately I've become interested in Wikidata.
I'm currently trying to set up a Wikidata clone on my localhost.
I've installed MediaWiki, the Wikibase client and the Wikibase repository successfully,
and I downloaded all the dump files from
http://dumps.wikimedia.org/wikidatawiki/latest/
there are many files in http://dumps.wikimedia.org/wikidatawiki/latest/