I'm running a basic check (parsing every page in the wiki) using the toolserver's 6-1 copy. I'll let you know if I see any issues.
John
> From: Platonides <platonides@gmail.com>
> To: Felipe Ortega <glimmer_phoenix@yahoo.es>
> CC: "xmldatadumps-l@lists.wikimedia.org" <xmldatadumps-l@lists.wikimedia.org>
> Sent: Thursday, June 7, 2012 18:52
> Subject: Re: [Xmldatadumps-l] Problems with frwiki dumps
Thanks, Platonides.
>
> On 06/06/12 20:22, Felipe Ortega wrote:
>> Hello.
>>
>> I'm finding strange issues when trying to decompress the 7z version of
>> this dump for the French Wikipedia:
>>
>> http://dumps.wikimedia.org/frwiki/20120430/
>>
>> At some point around 3M revisions the 7z process stalls. After a long time
>> (a few hours) it resumes normal execution, but then it stalls again around
>> 55M revisions and never recovers.
>>
>> Maybe there are some issues with frwiki dumps, since I can see that
>> subsequent processes are experiencing failures (in May and June).
>>
>> I'm now checking with the previous dump
>> (http://dumps.wikimedia.org/frwiki/20120404/). I'll let you know in case I
>> find any more problems.
>>
>> Best,
>> Felipe.
>
>
> It apparently decompresses ok.
>> time md5sum frwiki-20120430-pages-meta-history.xml.7z && ( time 7z e -so frwiki-20120430-pages-meta-history.xml.7z > /dev/null )
>> 78eda06a57ea738a2e21697e31e52128 frwiki-20120430-pages-meta-history.xml.7z
>>
>> real 25m55.503s
>> user 0m28.549s
>> sys 0m19.489s
>>
>> 7-Zip 4.55 beta Copyright (c) 1999-2007 Igor Pavlov 2007-09-05
>> p7zip Version 4.55 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,8 CPUs)
>>
>> Processing archive: frwiki-20120430-pages-meta-history.xml.7z
>>
>> Extracting frwiki-20120430-pages-meta-history.xml
>>
>> Everything is Ok
>>
That's strange. It might then be something related to process scheduling (on Ubuntu Server 12.04), but I haven't had any issues with other languages (including the many files for English).
So the last alternative would be to decompress it first and then parse the XML (from the reported size of 1249323572065 bytes, that is ~1.25 TB).
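A minimal sketch of that decompress-then-parse route, in Python. It is not taken from this thread: the function names are hypothetical, and it assumes the usual MediaWiki export layout (a root element containing <page> elements, each with <revision> children). It walks the XML incrementally and discards each element once counted, so the decompressed file is never held in memory, and the resulting counts can serve as a basic sanity check on the content.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_pages_and_revisions(stream):
    """Stream-parse a MediaWiki XML export; return (pages, revisions).

    `stream` is any binary file-like object, e.g. the decompressed dump
    opened with open(path, "rb").
    """
    pages = 0
    revisions = 0
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "revision":
            revisions += 1
            elem.clear()  # drop the revision text once it has been counted
        elif name == "page":
            pages += 1
            root.clear()  # detach finished pages so memory use stays flat
    return pages, revisions
```

On the decompressed file this would be called roughly as:

    with open("frwiki-20120430-pages-meta-history.xml", "rb") as f:
        print(count_pages_and_revisions(f))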
Best,
Felipe.
> ----- Original message -----
>>
>> Total:
>> Folders: 0
>> Files: 1
>> Size: 1249323572065
>> Compressed: 7526979951
>>
>> real 163m59.124s
>> user 138m30.290s
>> sys 0m29.328s
>
>
> The content might be completely bogus, though. It'd need further checks.
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l