---------- Forwarded message ----------
From: Ariel Glenn WMF <ariel@wikimedia.org>
Date: Fri, Apr 22, 2016 at 9:21 PM
Subject: Re: [Xmldatadumps-l] Failed dumps
To: InfoSports <al@infosports.com>
Cc: gnosygnu <gnosygnu@gmail.com>, Ariel Glenn WMF <ariel@wikimedia.org>


I've been out ill this week.  First day back today.  I'm tracking this issue here, including any reruns: https://phabricator.wikimedia.org/T133416

Ariel

On Fri, Apr 15, 2016 at 6:26 PM, InfoSports <al@infosports.com> wrote:
I noticed the decreased size as well.

Also, there are many duplicates in the download. Example article titles:

Rainbow, California
Murrieta, California
Fallbrook, California
Temecula, California
Wildomar, California
Sedeco Hills, California
Palomar Mountain
...and many more
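
One way to check for duplicates like these is to stream the compressed dump and count how often each <title> occurs. The snippet below is only a rough sketch under some assumptions: the function names are made up for illustration, it assumes each <title> element sits on a single line (as in the grep output quoted further down), and it keeps every title in memory, which may be heavy for a full enwiki dump.

```python
import bz2
import re
from collections import Counter

# Assumes each <title>...</title> element sits on one line of the dump.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def iter_titles(lines):
    """Yield the text of every <title> element found in an iterable of lines."""
    for line in lines:
        match = TITLE_RE.search(line)
        if match:
            yield match.group(1)


def duplicate_titles(lines):
    """Return {title: count} for every title that occurs more than once."""
    counts = Counter(iter_titles(lines))
    return {title: n for title, n in counts.items() if n > 1}


def duplicates_in_dump(path):
    """Stream a .xml.bz2 dump (path is supplied by the caller) and report duplicates."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        return duplicate_titles(dump)
```

Calling duplicates_in_dump() on the downloaded pages-articles file should list titles such as the ones above if they really appear more than once.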


Please re-run the process. There are too many errors in this one to be usable.

Thank you in advance.

-Al


> On Apr 14, 2016, at 8:56 PM, gnosygnu <gnosygnu@gmail.com> wrote:
>
> Hi. I think there may still be problems with the 2016-04-07 English Wikipedia dump. It's missing many articles in the Module namespace.
>
> Here are some details:
> * I downloaded https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-pages-articles.xml.bz2 . I got an XML file that was 10.8 GB (i.e., it does not look severely truncated).
> * I ran the following grep commands. Note that the grep for Module:Hatnote returns nothing. I ran the last grep to show that the search criteria should be correct.
> root~> grep "<title>Earth</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
>     <title>Earth</title>
> root~> grep "<title>Template:About</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
>     <title>Template:About</title>
> root~> grep "<title>Module:Hatnote</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
> root~> grep "<title>Module:" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
>     <title>Module:Location map/data/Croatia/doc</title>
>     <title>Module:Location map/data/USA Alabama/doc</title>
>     ...
> * The following Modules appear to be missing in the 2016-04-07 dump
> Module:Use_mdy_dates
> Module:Pp-move-indef
> Module:Protection_banner
> Module:Unsubst
> * By my count, there were 2,970 articles in the Module namespace in the 2016-03-05 dump. In contrast, there are only 652 in the 2016-04-07 dump.
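
The comparison between the two dumps could also be scripted instead of grepping by hand. What follows is a minimal sketch, not the method used for the counts above: the function names and the idea of passing both dump paths are assumptions for illustration, and it matches on the literal "Module:" title prefix, assuming one <title> element per line.

```python
import bz2
import re

# Assumes each <title>...</title> element sits on one line of the dump.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def titles_with_prefix(lines, prefix):
    """Collect page titles starting with `prefix` from an iterable of lines."""
    found = set()
    for line in lines:
        match = TITLE_RE.search(line)
        if match and match.group(1).startswith(prefix):
            found.add(match.group(1))
    return found


def missing_titles(old_lines, new_lines, prefix="Module:"):
    """Titles with `prefix` that are in the older dump but not in the newer one."""
    old_titles = titles_with_prefix(old_lines, prefix)
    new_titles = titles_with_prefix(new_lines, prefix)
    return sorted(old_titles - new_titles)


def compare_dumps(old_path, new_path, prefix="Module:"):
    """Stream two .xml.bz2 dumps (paths supplied by the caller) and diff their titles."""
    with bz2.open(old_path, "rt", encoding="utf-8") as old_dump, \
         bz2.open(new_path, "rt", encoding="utf-8") as new_dump:
        return missing_titles(old_dump, new_dump, prefix)
```

Running compare_dumps() with the 2016-03-05 and 2016-04-07 pages-articles files would give the list of Module pages that dropped out, and len() of each title set gives the per-dump counts.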
>
> Let me know if you need any other information. I believe that the above can be verified by anyone else, but I'd be happy to provide more detail.
>
> Thanks.
>
> On Thu, Apr 14, 2016 at 8:49 AM, Ariel Glenn WMF <ariel@wikimedia.org> wrote:
> It hasn't failed.  It's still running but the jobs that previously failed have been left in that status until they get rerun.  That's standard behavior.  Don't worry, be happy! :-)
>
> Ariel
>
> On Thu, Apr 14, 2016 at 2:15 PM, Nicolas Vervelle <nvervelle@gmail.com> wrote:
> But at least, pages-articles worked, so it's ok for me.
>
> On Thu, Apr 14, 2016 at 1:13 PM, Nicolas Vervelle <nvervelle@gmail.com> wrote:
> Well, enwiki failed again today...
>
> On Wed, Apr 13, 2016 at 4:37 PM, Ariel Glenn WMF <ariel@wikimedia.org> wrote:
> You are right. Two jobs were competing for enwiki since I allocated one more lousy core to the host that runs them. I've fixed the config to avoid that. It will resume in a few hours with cron.
>
> Ariel
>
> On Wed, Apr 13, 2016 at 4:37 PM, Nicolas Vervelle <nvervelle@gmail.com> wrote:
> Thanks Ariel,
>
> It seems to have worked for some dumps (frwiki, for example), but other dumps are still failing (enwiki, for example).
>
> Nico
>
> On Tue, Apr 12, 2016 at 11:04 AM, Ariel Glenn WMF <ariel@wikimedia.org> wrote:
> Hi Nicolas,
>
> These will be picked up on reruns, which will happen over the next day or so.  The failure was caused by an obscure hhvm bug which only triggers under certain circumstances.  For more information about that, see: https://phabricator.wikimedia.org/T94277
>
> This morning I cleaned up the jobs and switched the dump jobs back to php5, and the dumps have restarted.
>
> Ariel
>
> On Tue, Apr 12, 2016 at 11:25 AM, Nicolas Vervelle <nvervelle@gmail.com> wrote:
> Hi,
>
> Is anyone working on the failed dumps for April? (enwiki, frwiki, ruwiki, itwiki, ...)
>
> Nico
>
> _______________________________________________
> Xmldatadumps-l mailing list
> Xmldatadumps-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l