I'm running a basic check (parsing every page in the wiki) using the
toolserver's 6-1 copy. I'll let you know if I see any issues.
John
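A check like this can be done without decompressing to disk first. Below is a minimal sketch, not John's actual code. It assumes a MediaWiki XML export is streamed in, for example from `7z e -so`, and the helper names are hypothetical. It uses ElementTree's `iterparse` to count `<page>` and `<revision>` elements, and it clears finished elements so memory stays bounded even on a full-history dump.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://...}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_pages_and_revisions(stream):
    """Stream-parse a MediaWiki XML export; return (pages, revisions).

    Hypothetical helper: `stream` is any binary file object, e.g.
    sys.stdin.buffer when the dump is piped in from `7z e -so`.
    """
    pages = revisions = 0
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event: start of the <mediawiki> root
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "revision":
            revisions += 1
            elem.clear()  # free the revision text as soon as it is counted
        elif name == "page":
            pages += 1
            root.clear()  # drop finished pages so memory stays bounded
    return pages, revisions
```

For example, with `7z e -so frwiki-20120430-pages-meta-history.xml.7z | python3 check.py` (where `check.py` is a hypothetical script), the script would call `count_pages_and_revisions(sys.stdin.buffer)`. A truncated or corrupted stream would surface as an `xml.etree.ElementTree.ParseError` instead of a silent stall.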
On Thu, Jun 7, 2012 at 2:45 PM, Felipe Ortega <glimmer_phoenix(a)yahoo.es> wrote:
From: Platonides <platonides(a)gmail.com>
To: Felipe Ortega <glimmer_phoenix(a)yahoo.es>
CC: "xmldatadumps-l(a)lists.wikimedia.org" <xmldatadumps-l(a)lists.wikimedia.org>
Sent: Thursday, June 7, 2012 18:52
Subject: Re: [Xmldatadumps-l] Problems with frwiki dumps
On 06/06/12 20:22, Felipe Ortega wrote:
Hello.
I'm finding strange issues when trying to decompress the 7z version of
this dump for the French Wikipedia:

http://dumps.wikimedia.org/frwiki/20120430/

At some point around 3M revisions the 7z process stalls. After a long
time (a few hours) it resumes normal execution, but then stalls again
around 55M revisions and never recovers.
Maybe there are some issues with the frwiki dumps, since I can see that
the subsequent dump processes (in May and June) are showing failures.
I'm now checking with the previous dump
(http://dumps.wikimedia.org/frwiki/20120404/). I'll let you know in case
I find any more problems.
Best,
Felipe.
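To see exactly where such a stall happens, one option is to drain 7z's standard output from a small script and log the cumulative byte count over time. The sketch below is hypothetical: `monitor_stream` is an illustrative name, not part of 7-Zip or any library, and the command is only an example. Progress is logged only when data arrives, so a stall in the producer shows up as a long gap between log lines. 7z's own messages on stderr are left on the terminal.

```python
import subprocess
import sys
import time


def monitor_stream(cmd, chunk_size=1 << 20, interval=60.0,
                   log=lambda msg: print(msg, file=sys.stderr)):
    """Run `cmd`, discard its stdout, and log progress every `interval` seconds.

    Hypothetical helper. Returns (total_bytes, exit_code).
    """
    total = 0
    start = last_report = time.monotonic()
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            # read1() returns whatever is available (up to chunk_size)
            # and returns b"" only at end of stream.
            chunk = proc.stdout.read1(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            now = time.monotonic()
            if now - last_report >= interval:
                elapsed = max(now - start, 1e-9)  # avoid division by zero
                log(f"{elapsed:10.0f}s {total:>18,d} bytes "
                    f"{total / elapsed / 1e6:8.1f} MB/s")
                last_report = now
        exit_code = proc.wait()
    return total, exit_code
```

For example, the call `monitor_stream(["7z", "e", "-so", "frwiki-20120430-pages-meta-history.xml.7z"])` would print one line per minute while 7z is producing output. Comparing the byte offset at the last line before a gap with the same run on another machine would help separate a damaged archive from a local scheduling or I/O problem.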
It apparently decompresses OK.

time md5sum frwiki-20120430-pages-meta-history.xml.7z && ( time 7z e -so frwiki-20120430-pages-meta-history.xml.7z > /dev/null )

78eda06a57ea738a2e21697e31e52128  frwiki-20120430-pages-meta-history.xml.7z

real 25m55.503s
user 0m28.549s
sys 0m19.489s

7-Zip 4.55 beta Copyright (c) 1999-2007 Igor Pavlov 2007-09-05
p7zip Version 4.55 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,8 CPUs)

Processing archive: frwiki-20120430-pages-meta-history.xml.7z

Extracting frwiki-20120430-pages-meta-history.xml

Everything is Ok
Total:
Folders: 0
Files: 1
Size: 1249323572065
Compressed: 7526979951

real 163m59.124s
user 138m30.290s
sys 0m29.328s

The content might be completely bogus, though. It'd need further checks.

Thanks, Platonides.
That's strange, then. It might be something related to process
scheduling (on Ubuntu Server 12.04), but I haven't had any issues with
other languages (including the many files for English).
So the last alternative would be to decompress it first and then parse
the XML (I see the size is ~1.25 TB).
Best,
Felipe.
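Two quick sanity checks on the figures in this thread are sketched below in Python. The helper names are hypothetical. The first, `md5_of_file`, hashes a file in chunks as `md5sum` does. Its result can be compared with the checksum published alongside the dump, assuming one is available. A match only shows that the archive was downloaded intact, not that its XML content is valid. The second, `dump_stats`, works out the size, compression ratio and throughput from the 7-Zip summary ("Size", "Compressed") and the `real` time of the `7z e -so` run. These numbers indicate how much disk space decompressing the file first would need.

```python
import hashlib


def md5_of_file(path, chunk_size=8 << 20):
    """Return the MD5 hex digest of `path`, read in chunks (like `md5sum`)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dump_stats(uncompressed_bytes, compressed_bytes, wall_seconds):
    """Summarize a 7-Zip extraction: sizes, compression ratio, throughput."""
    return {
        "uncompressed_tb": uncompressed_bytes / 1e12,
        "uncompressed_tib": uncompressed_bytes / 2**40,
        "compressed_gb": compressed_bytes / 1e9,
        "ratio": uncompressed_bytes / compressed_bytes,
        "throughput_mb_s": uncompressed_bytes / wall_seconds / 1e6,
    }


# Figures from the 7-Zip summary and the `time` output of `7z e -so` above.
stats = dump_stats(
    uncompressed_bytes=1_249_323_572_065,
    compressed_bytes=7_526_979_951,
    wall_seconds=163 * 60 + 59.124,  # real 163m59.124s
)
for name, value in stats.items():
    print(f"{name:>17}: {value:,.2f}")
```

The uncompressed XML comes to roughly 1.25 TB (about 1.14 TiB). The compression ratio is about 166:1, and 7z produced output at roughly 127 MB/s. Decompressing to disk first therefore needs well over a terabyte of free space.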
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l