---------- Forwarded message ----------
From: Ariel Glenn WMF <ariel(a)wikimedia.org>
Date: Fri, Apr 22, 2016 at 9:21 PM
Subject: Re: [Xmldatadumps-l] Failed dumps
To: InfoSports <al(a)infosports.com>
Cc: gnosygnu <gnosygnu(a)gmail.com>, Ariel Glenn WMF <ariel(a)wikimedia.org>
I've been out ill this week. First day back today. I'm tracking this
issue here, including any reruns:
Ariel
On Fri, Apr 15, 2016 at 6:26 PM, InfoSports <al(a)infosports.com> wrote:
I noticed the decreased size as well.
Also, there are many duplicates in the download. Example article titles…
Rainbow, California
Murrieta, California
Fallbrook, California
Temecula, California
Wildomar, California
Sedco Hills, California
Palomar Mountain
...and many more
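A duplicate check of this kind can be sketched in shell as follows. This is only a minimal sketch and not the command used for the report above: the function name duplicate_titles is hypothetical, and it assumes each <title> element sits on a single line of the dump, as in the grep output quoted further down in this thread.

```shell
# Print each page title that occurs more than once in a dump file.
# Hypothetical helper; assumes one <title>...</title> element per line.
duplicate_titles() {
    dump_file=$1
    # -o prints only the matching <title> element of each line;
    # sorting brings identical titles together so that uniq -d can
    # print one copy of every title that appears at least twice.
    grep -o '<title>[^<]*</title>' "$dump_file" | LC_ALL=C sort | uniq -d
}

# Example usage (the path is a placeholder):
#   duplicate_titles /path/to/pages-articles.xml
```

On a full enwiki dump the sort step handles millions of titles, so it may need a fair amount of temporary disk space.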
Please re-run the process. There are too many errors in this one to be usable.
Thank you in advance.
-Al
On Apr 14, 2016, at 8:56 PM, gnosygnu <gnosygnu(a)gmail.com> wrote:
Hi. I think there may still be problems with the 2016-04-07 English
Wikipedia dump. It's missing many articles in the Module namespace.
Here are some details:
* I downloaded
https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-pages-articles.…
and got an XML file that was 10.8 GB (i.e., it does not look severely
truncated).
* I ran the following grep commands. Note that the grep for
Module:Hatnote returns nothing. I ran the last grep to show that the
search criteria should be correct.
root~> grep "<title>Earth</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
<title>Earth</title>
root~> grep "<title>Template:About</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
<title>Template:About</title>
root~> grep "<title>Module:Hatnote</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
root~> grep "<title>Module:" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
<title>Module:Location map/data/Croatia/doc</title>
<title>Module:Location map/data/USA Alabama/doc</title>
...
* The following Modules appear to be missing from the 2016-04-07 dump:
Module:Use_mdy_dates
Module:Pp-move-indef
Module:Protection_banner
Module:Unsubst
* By my count, there were 2,970 articles in the Module namespace in the
2016-03-05 dump. In contrast, there are only 652 in the 2016-04-07 dump.
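A per-namespace count like the one above could be reproduced with something along these lines. This is a minimal sketch rather than the exact method used for the numbers quoted: the helper name count_titles_with_prefix and the example paths are hypothetical, and it assumes one <title> element per line of the dump.

```shell
# Count the <title> lines in a dump whose title starts with a given
# namespace prefix, for example "Module:".
# Hypothetical helper; assumes one <title> element per line.
count_titles_with_prefix() {
    dump_file=$1
    prefix=$2
    # -F treats the pattern as a fixed string and -c prints the number
    # of matching lines. grep exits non-zero when nothing matches, so
    # "|| true" keeps a count of 0 from looking like a failure (it also
    # hides other grep errors, such as a missing file).
    grep -c -F -- "<title>${prefix}" "$dump_file" || true
}

# Example usage (paths are placeholders):
#   count_titles_with_prefix /path/to/older-dump.xml "Module:"
#   count_titles_with_prefix /path/to/newer-dump.xml "Module:"
```

Running it against two dumps and comparing the numbers gives a quick check on whether a namespace has shrunk between runs.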
Let me know if you need any other information. I believe that the above
can be verified by anyone else, but I'd be happy to provide more detail.
Thanks.
On Thu, Apr 14, 2016 at 8:49 AM, Ariel Glenn WMF <ariel(a)wikimedia.org> wrote:
It hasn't failed. It's still running, but the jobs that previously
failed have been left in that status until they get rerun. That's
standard behavior. Don't worry, be happy! :-)
Ariel
On Thu, Apr 14, 2016 at 2:15 PM, Nicolas Vervelle <nvervelle(a)gmail.com> wrote:
But at least pages-articles worked, so it's OK for me.
On Thu, Apr 14, 2016 at 1:13 PM, Nicolas Vervelle <nvervelle(a)gmail.com> wrote:
Well, enwiki failed again today...
On Wed, Apr 13, 2016 at 4:37 PM, Ariel Glenn WMF <ariel(a)wikimedia.org> wrote:
You are right. Two jobs were competing for enwiki since I allocated one
more lousy core to the host that runs them. I've fixed the config to
avoid that. It will resume in a few hours with cron.
Ariel
On Wed, Apr 13, 2016 at 4:37 PM, Nicolas Vervelle <nvervelle(a)gmail.com> wrote:
Thanks Ariel,
It seems to have worked for some dumps (frwiki for example), but other
dumps are still failing (enwiki for example).
Nico
On Tue, Apr 12, 2016 at 11:04 AM, Ariel Glenn WMF <ariel(a)wikimedia.org> wrote:
Hi Nicolas,
These will be picked up on reruns, which will happen over the next day
or so. The failure was caused by an obscure hhvm bug which only triggers
under certain circumstances. For more information about that, see:
https://phabricator.wikimedia.org/T94277
This morning I did jobs cleanup, switched the dump jobs to use php5
again, and the dumps have restarted.
Ariel
On Tue, Apr 12, 2016 at 11:25 AM, Nicolas Vervelle <nvervelle(a)gmail.com> wrote:
Hi,
Is anyone working on the failed dumps for April? (enwiki, frwiki,
ruwiki, itwiki, ...)
Nico
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l