Hi. I think there may still be problems with the 2016-04-07 English
Wikipedia dump. It's missing many articles in the Module namespace.
Here are some details:
* I downloaded
https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-pages-articles.…
and got a 10.8 GB XML file, so it does not look severely truncated.
* I ran the following grep commands. Note that the grep for Module:Hatnote
returns nothing. I ran the last grep to show that the search pattern itself
is correct.
root~> grep "<title>Earth</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
<title>Earth</title>
root~> grep "<title>Template:About</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
<title>Template:About</title>
root~> grep "<title>Module:Hatnote</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
root~> grep "<title>Module:" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
<title>Module:Location map/data/Croatia/doc</title>
<title>Module:Location map/data/USA Alabama/doc</title>
...
* The following modules appear to be missing from the 2016-04-07 dump:
Module:Use_mdy_dates
Module:Pp-move-indef
Module:Protection_banner
Module:Unsubst
* By my count, there were 2,970 pages in the Module namespace in the
2016-03-05 dump. In contrast, there are only 652 in the 2016-04-07 dump.
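
For anyone who wants to repeat the checks above, they can be sketched as two
small shell functions. This is only a rough sketch: the function names
(count_ns_titles, has_title) are made up for illustration, and converting
underscores to spaces is an assumption based on the grep output above, where
titles appear with spaces.

```shell
#!/bin/sh
# Hypothetical helpers for checking a pages-articles XML dump.
# Names and behavior are illustrative, not part of any dump tooling.

# Print how many <title> lines start with the given namespace prefix.
# $1 = path to the XML dump, $2 = namespace name (e.g. Module)
count_ns_titles() {
    # grep -c exits with status 1 when the count is 0; treat that as success,
    # but still fail on real errors such as a missing file (status 2).
    grep -cF -- "<title>$2:" "$1" || [ $? -eq 1 ]
}

# Exit 0 only if the dump contains a page with exactly this title.
# $1 = path to the XML dump, $2 = full page title
has_title() {
    # Assumption: the dump stores titles with spaces, so underscores in the
    # requested title are converted before searching.
    title=$(printf '%s' "$2" | tr '_' ' ')
    grep -qF -- "<title>$title</title>" "$1"
}
```

Example usage, with the dump path from the commands above:

count_ns_titles /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml Module
has_title /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml Module:Hatnote || echo "missing"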
Let me know if you need any other information. I believe that the above can
be verified by anyone else, but I'd be happy to provide more detail.
Thanks.
On Thu, Apr 14, 2016 at 8:49 AM, Ariel Glenn WMF <ariel(a)wikimedia.org>
wrote:
It hasn't failed. It's still running, but the jobs that previously failed
have been left in that status until they get rerun. That's standard
behavior. Don't worry, be happy! :-)
Ariel
On Thu, Apr 14, 2016 at 2:15 PM, Nicolas Vervelle <nvervelle(a)gmail.com>
wrote:
But at least pages-articles worked, so it's OK for me.
On Thu, Apr 14, 2016 at 1:13 PM, Nicolas Vervelle <nvervelle(a)gmail.com>
wrote:
Well, enwiki failed again today...
On Wed, Apr 13, 2016 at 4:37 PM, Ariel Glenn WMF <ariel(a)wikimedia.org>
wrote:
You are right. Two jobs were competing for enwiki since I allocated one
more lousy core to the host that runs them. I've fixed the config to avoid
that. It will resume in a few hours with cron.
Ariel
On Wed, Apr 13, 2016 at 4:37 PM, Nicolas Vervelle <nvervelle(a)gmail.com>
wrote:
> Thanks Ariel,
>
> It seems to have worked for some dumps (frwiki for example), but other
> dumps are still failing (enwiki for example)
>
> Nico
>
> On Tue, Apr 12, 2016 at 11:04 AM, Ariel Glenn WMF <ariel(a)wikimedia.org>
> wrote:
>
>> Hi Nicolas,
>>
>> These will be picked up on reruns, which will happen over the next
>> day or so. The failure was caused by an obscure hhvm bug which only
>> triggers under certain circumstances. For more information about that,
>> see: https://phabricator.wikimedia.org/T94277
>>
>> This morning I did jobs cleanup, switched the dump jobs to use php5
>> again and the dumps have restarted.
>>
>> Ariel
>>
>> On Tue, Apr 12, 2016 at 11:25 AM, Nicolas Vervelle <
>> nvervelle(a)gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is anyone working on the failed dumps for April? (enwiki, frwiki,
>>> ruwiki, itwiki, ...)
>>>
>>> Nico
>>>
>>> _______________________________________________
>>> Xmldatadumps-l mailing list
>>> Xmldatadumps-l(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>>
>>>
>>
>