I'm running a basic check (parsing every page in the wiki) using the toolserver's 6-1 copy. I'll let you know if I see any issues.

John

On Thu, Jun 7, 2012 at 2:45 PM, Felipe Ortega <glimmer_phoenix@yahoo.es> wrote:
> From: Platonides <platonides@gmail.com>
> To: Felipe Ortega <glimmer_phoenix@yahoo.es>
> CC: "xmldatadumps-l@lists.wikimedia.org" <xmldatadumps-l@lists.wikimedia.org>
> Sent: Thursday, June 7, 2012 18:52
> Subject: Re: [Xmldatadumps-l] Problems with frwiki dumps
>
> On 06/06/12 20:22, Felipe Ortega wrote:
>>  Hello.
>>
>>  I'm finding strange issues when trying to decompress the 7z version of
>>  this dump for the French Wikipedia:
>>
>>  http://dumps.wikimedia.org/frwiki/20120430/
>>
>>  At some point around 3M revisions the 7z process stalls. After a long time
>>  (a few hours) it resumes normal execution, but then stalls again around 55M
>>  revisions and never recovers.
>>
>>  Maybe there are some issues with the frwiki dumps, since I can see that
>>  subsequent processes are experiencing failures (in May and June).
>>
>>  I'm now checking with the previous dump
>>  (http://dumps.wikimedia.org/frwiki/20120404/). I'll let you know in case I
>>  find any more problems.
>>
>>  Best,
>>  Felipe.
>
>
> It apparently decompresses ok.
>>  time md5sum frwiki-20120430-pages-meta-history.xml.7z && ( time 7z e -so frwiki-20120430-pages-meta-history.xml.7z > /dev/null )
>>  78eda06a57ea738a2e21697e31e52128  frwiki-20120430-pages-meta-history.xml.7z
>>
>>  real    25m55.503s
>>  user    0m28.549s
>>  sys     0m19.489s
>>
>>  7-Zip 4.55 beta  Copyright (c) 1999-2007 Igor Pavlov  2007-09-05
>>  p7zip Version 4.55 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,8 CPUs)
>>
>>  Processing archive: frwiki-20120430-pages-meta-history.xml.7z
>>
>>  Extracting  frwiki-20120430-pages-meta-history.xml
>>
>>  Everything is Ok
>>

Thanks, Platonides.

That's strange. It might then be something related to process scheduling (on Ubuntu Server 12.04), but I haven't had any issues with other languages (including the many files for English).

So the last alternative would be to decompress it first and then parse the XML (I see the uncompressed size is ~1.25 TB).
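
For the parsing step, something along these lines might do as a first sanity check. It is only a rough sketch, not my actual parser: the function name, the sample document and the export-0.6 namespace URI in it are made up for illustration. It streams the XML with the standard library's iterparse, counts pages and revisions, and frees each page once it has been read so memory stays bounded. A truncated or malformed file should end in a ParseError.

```python
import io
import xml.etree.ElementTree as ET


def count_pages_and_revisions(stream):
    """Stream-parse a MediaWiki XML export; return (pages, revisions).

    `stream` is any binary file-like object. The name of this function is
    hypothetical, chosen for this example only.
    """
    pages = 0
    revisions = 0
    context = ET.iterparse(stream, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it
    # so finished pages can be detached from the tree.
    _event, root = next(context)
    for event, elem in context:
        if event != "end":
            continue
        # Drop the XML namespace prefix: "{uri}page" -> "page".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "revision":
            revisions += 1
            elem.clear()  # discard the revision text already read
        elif tag == "page":
            pages += 1
            root.clear()  # detach finished pages to keep memory bounded
    return pages, revisions


# Tiny illustrative document (assumed shape and namespace, not real dump data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page><title>A</title><revision><id>1</id></revision><revision><id>2</id></revision></page>
  <page><title>B</title><revision><id>3</id></revision></page>
</mediawiki>"""
```

The same function could read the decompressed file opened in binary mode, or sys.stdin.buffer when fed from "7z e -so", although piping is exactly the path that stalls for me.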

Best,
Felipe.

>>
>>  Total:
>>  Folders: 0
>>  Files: 1
>>  Size: 1249323572065
>>  Compressed: 7526979951
>>
>>  real    163m59.124s
>>  user    138m30.290s
>>  sys     0m29.328s
>
>
> The content might be completely bogus, though. It'd need further checks.
> ----- Original Message -----


_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l