---------- Forwarded message ----------
From: Ariel Glenn WMF ariel@wikimedia.org
Date: Fri, Apr 22, 2016 at 9:21 PM
Subject: Re: [Xmldatadumps-l] Failed dumps
To: InfoSports al@infosports.com
Cc: gnosygnu gnosygnu@gmail.com, Ariel Glenn WMF ariel@wikimedia.org
I've been out ill this week. First day back today. I'm tracking this issue here, including any reruns: https://phabricator.wikimedia.org/T133416
Ariel
On Fri, Apr 15, 2016 at 6:26 PM, InfoSports al@infosports.com wrote:
I noticed the decreased size as well.
Also, there are many duplicates in the download. Example article titles:
Rainbow, California
Murrieta, California
Fallbrook, California
Temecula, California
Wildomar, California
Sedeco Hills, California
Palomar Mountain
...and many more
Please re-run the process. There are too many errors in this one to be usable.
Thank you in advance.
-Al
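Duplicate titles like the ones Al lists can be checked directly against the dump. The following is a minimal Python sketch, not a tool the dump maintainers provide: it assumes an uncompressed pages-articles XML file, and `duplicate_titles` and the `SAMPLE` data are hypothetical. It streams the file with `xml.etree.ElementTree.iterparse`, counts every `<title>`, and clears finished pages so their revision text does not pile up. The title counter itself still keeps one entry per page, so a full English Wikipedia dump needs a lot of RAM. For the compressed file, `bz2.open(path, "rb")` can be passed in place of a path.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_titles(source):
    """Return {title: count} for every page title that occurs more than once.

    `source` is a path or binary file object for an uncompressed
    pages-articles XML dump (hypothetical helper, for illustration only).
    """
    counts = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the <mediawiki> root element
    for event, elem in context:
        if event != "end":
            continue
        # Tags carry the export-schema namespace, e.g. "{http://...}title".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "title":
            counts[elem.text or ""] += 1
        elif tag == "page":
            root.clear()  # drop finished pages (and their wikitext)
    return {title: n for title, n in counts.items() if n > 1}


# Hypothetical three-page sample; the namespace URI is an assumption.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Rainbow, California</title><ns>0</ns><id>1</id></page>
  <page><title>Earth</title><ns>0</ns><id>2</id></page>
  <page><title>Rainbow, California</title><ns>0</ns><id>3</id></page>
</mediawiki>"""

print(duplicate_titles(io.BytesIO(SAMPLE)))  # {'Rainbow, California': 2}
```

On the real file, `duplicate_titles("/home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml")` would return each repeated title with its count.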
On Apr 14, 2016, at 8:56 PM, gnosygnu gnosygnu@gmail.com wrote:
Hi. I think there may still be problems with the 2016-04-07 English
Wikipedia dump. It's missing many articles in the Module namespace.
Here are some details:
- I downloaded https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-pages-articles.x... and got an XML file of 10.8 GB (i.e., it does not look severely truncated).
- I ran the following grep commands. Note that the search for Module:Hatnote returns nothing. I ran the last grep to show that the search criteria themselves are correct, since other Module: titles do match.
root~> grep "<title>Earth</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
<title>Earth</title>
root~> grep "<title>Template:About</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
<title>Template:About</title>
root~> grep "<title>Module:Hatnote</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
root~> grep "<title>Module:" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
<title>Module:Location map/data/Croatia/doc</title>
<title>Module:Location map/data/USA Alabama/doc</title>
...
- The following Modules appear to be missing in the 2016-04-07 dump: Module:Use_mdy_dates, Module:Pp-move-indef, Module:Protection_banner, Module:Unsubst
- By my count, there were 2,970 articles in the Module namespace in the
2016-03-05 dump. In contrast, there are only 652 in the 2016-04-07 dump.
Let me know if you need any other information. I believe that the above can be verified by anyone else, but I'd be happy to provide more detail.
Thanks.
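The checks in gnosygnu's list (exact-title lookups such as Module:Hatnote, and counting or diffing the Module namespace between the 2016-03-05 and 2016-04-07 dumps) can be sketched in Python as follows. This is an illustration under stated assumptions, not the method gnosygnu used: `titles_with_prefix`, `missing_titles`, and the miniature `OLD`/`NEW` dumps are hypothetical, and, like the grep commands, the sketch matches on the `Module:` title prefix rather than on the page's namespace number. Titles inside the dump use spaces, so Module:Use_mdy_dates appears there as `Module:Use mdy dates`.

```python
import io
import xml.etree.ElementTree as ET


def titles_with_prefix(source, prefix="Module:"):
    """Return the set of <title> values that start with `prefix`.

    `source` is a path or binary file object for an uncompressed
    pages-articles XML dump. Like grep "<title>Module:", this matches
    the title text, so /doc subpages are included.
    """
    found = set()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the <mediawiki> root element
    for event, elem in context:
        if event != "end":
            continue
        tag = elem.tag.rsplit("}", 1)[-1]  # strip the "{...}" schema namespace
        if tag == "title" and (elem.text or "").startswith(prefix):
            found.add(elem.text)
        elif tag == "page":
            root.clear()  # discard finished pages so memory does not grow
    return found


def missing_titles(old_source, new_source, prefix="Module:"):
    """Sorted titles with `prefix` present in the old dump but not the new one."""
    old = titles_with_prefix(old_source, prefix)
    new = titles_with_prefix(new_source, prefix)
    return sorted(old - new)


# Hypothetical miniature dumps; the namespace URI is an assumption.
OLD = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Module:Hatnote</title><ns>828</ns></page>
  <page><title>Module:Unsubst</title><ns>828</ns></page>
  <page><title>Earth</title><ns>0</ns></page>
</mediawiki>"""
NEW = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Module:Unsubst</title><ns>828</ns></page>
  <page><title>Earth</title><ns>0</ns></page>
</mediawiki>"""

print(len(titles_with_prefix(io.BytesIO(OLD))))  # 2
print(missing_titles(io.BytesIO(OLD), io.BytesIO(NEW)))  # ['Module:Hatnote']
```

With paths to two real dumps, `len(titles_with_prefix(path))` gives the kind of per-dump count quoted above, and `"Module:Hatnote" in titles_with_prefix(path)` is the equivalent of the exact-title grep. A set fits here because only membership and difference matter, and a few thousand module titles take little memory.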
On Thu, Apr 14, 2016 at 8:49 AM, Ariel Glenn WMF ariel@wikimedia.org wrote:
It hasn't failed. It's still running, but the jobs that previously failed have been left in that status until they get rerun. That's standard behavior. Don't worry, be happy! :-)
Ariel
On Thu, Apr 14, 2016 at 2:15 PM, Nicolas Vervelle nvervelle@gmail.com wrote:
But at least pages-articles worked, so it's OK for me.
On Thu, Apr 14, 2016 at 1:13 PM, Nicolas Vervelle nvervelle@gmail.com wrote:
Well, enwiki failed again today...
On Wed, Apr 13, 2016 at 4:37 PM, Ariel Glenn WMF ariel@wikimedia.org wrote:
You are right. Two jobs were competing for enwiki since I allocated one more lousy core to the host that runs them. I've fixed the config to avoid that. It will resume in a few hours with cron.
Ariel
On Wed, Apr 13, 2016 at 4:37 PM, Nicolas Vervelle nvervelle@gmail.com wrote:
Thanks Ariel,
It seems to have worked for some dumps (frwiki, for example), but other dumps are still failing (enwiki, for example).
Nico
On Tue, Apr 12, 2016 at 11:04 AM, Ariel Glenn WMF ariel@wikimedia.org wrote:
Hi Nicolas,
These will be picked up on reruns, which will happen over the next day or so. The failure was caused by an obscure HHVM bug that only triggers under certain circumstances. For more information about that, see: https://phabricator.wikimedia.org/T94277
This morning I cleaned up the jobs and switched the dump jobs back to php5, and the dumps have restarted.
Ariel
On Tue, Apr 12, 2016 at 11:25 AM, Nicolas Vervelle nvervelle@gmail.com wrote:
Hi,
Is anyone working on the failed April dumps (enwiki, frwiki, ruwiki, itwiki, ...)?
Nico
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l