Hi. I think there may still be problems with the 2016-04-07 English Wikipedia dump. It's missing many articles in the Module namespace.

Here are some details:
* I downloaded https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-pages-articles.xml.bz2 and got an XML file of 10.8 GB, so it does not look severely truncated.
* I ran the following grep commands. Note that the grep for Module:Hatnote returns nothing. I ran the last grep to show that the search pattern itself is correct.
root~> grep "<title>Earth</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
    <title>Earth</title>
root~> grep "<title>Template:About</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
    <title>Template:About</title>
root~> grep "<title>Module:Hatnote</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
root~> grep "<title>Module:" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
    <title>Module:Location map/data/Croatia/doc</title>
    <title>Module:Location map/data/USA Alabama/doc</title>
    ...
* The following modules appear to be missing from the 2016-04-07 dump:
Module:Use_mdy_dates
Module:Pp-move-indef
Module:Protection_banner
Module:Unsubst
* By my count, there were 2,970 articles in the Module namespace in the 2016-03-05 dump. In contrast, there are only 652 in the 2016-04-07 dump.
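
In case it helps anyone reproduce the counts, here is a rough Python sketch of the same check as the last grep: it scans a dump line by line and counts <title> elements that start with a namespace prefix. The function names (iter_titles, count_namespace, open_dump) are made up for illustration, and it assumes each <title> element sits on a single line, as in the grep output above. It is not the script I used for the numbers in this mail.

```python
import bz2
import re
import sys
from typing import Iterable, Iterator

# Matches one <title>...</title> element on a single line (assumption:
# the dump never splits a title across lines).
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def iter_titles(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of every <title> element found in the given lines."""
    for line in lines:
        match = TITLE_RE.search(line)
        if match:
            yield match.group(1)


def count_namespace(lines: Iterable[str], prefix: str = "Module:") -> int:
    """Count titles that start with the given namespace prefix."""
    return sum(1 for title in iter_titles(lines) if title.startswith(prefix))


def open_dump(path: str):
    """Open a dump as text, whether it is bz2-compressed or already unpacked."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_modules.py <dump file>
    with open_dump(sys.argv[1]) as dump:
        print(count_namespace(dump))
```

Running it against both the 2016-03-05 and 2016-04-07 files should give two numbers that can be compared directly. Reading the .bz2 file directly avoids unpacking it first, at the cost of a slower scan.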

Let me know if you need any other information. I believe that anyone can verify the above, but I'd be happy to provide more detail.

Thanks.




On Thu, Apr 14, 2016 at 8:49 AM, Ariel Glenn WMF <ariel@wikimedia.org> wrote:
It hasn't failed.  It's still running but the jobs that previously failed have been left in that status until they get rerun.  That's standard behavior.  Don't worry, be happy! :-)

Ariel

On Thu, Apr 14, 2016 at 2:15 PM, Nicolas Vervelle <nvervelle@gmail.com> wrote:
But at least, pages-articles worked, so it's ok for me.

On Thu, Apr 14, 2016 at 1:13 PM, Nicolas Vervelle <nvervelle@gmail.com> wrote:
Well, enwiki failed again today...

On Wed, Apr 13, 2016 at 4:37 PM, Ariel Glenn WMF <ariel@wikimedia.org> wrote:
You are right. Two jobs were competing for enwiki since I allocated one more lousy core to the host that runs them. I've fixed the config to avoid that. It will resume in a few hours with cron.

Ariel

On Wed, Apr 13, 2016 at 4:37 PM, Nicolas Vervelle <nvervelle@gmail.com> wrote:
Thanks Ariel,

It seems to have worked for some dumps (frwiki, for example), but other dumps are still failing (enwiki, for example).

Nico

On Tue, Apr 12, 2016 at 11:04 AM, Ariel Glenn WMF <ariel@wikimedia.org> wrote:
Hi Nicolas,

These will be picked up on reruns, which will happen over the next day or so.  The failure was caused by an obscure hhvm bug which only triggers under certain circumstances.  For more information about that, see: https://phabricator.wikimedia.org/T94277 

This morning I cleaned up the jobs and switched the dump jobs back to php5, and the dumps have restarted.

Ariel

On Tue, Apr 12, 2016 at 11:25 AM, Nicolas Vervelle <nvervelle@gmail.com> wrote:
Hi,

Is anyone working on the failed dumps for April? (enwiki, frwiki, ruwiki, itwiki, ...)

Nico

_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l







