The importing of the 20060911 XML dumps produces a lot of garbage in the output of importDump.php in MediaWiki 1.7.1 (in addition to running about 50 times slower than 1.6.8 during XML dump imports: 1.6.8 averages 56 pages/second, 1.7.1 averages 0.8 pages/second):
100 (1.2361754306537 pages/sec 1.2361754306537 revs/sec)
This is dvips(k) 5.95a Copyright 2005 Radical Eye Software (www.radicaleye.com)
' TeX output 2006.09.17:2322' -> <tex.pro><texps.pro>. <cmmi12.pfb><cmr12.pfb>[1]
This is dvips(k) 5.95a Copyright 2005 Radical Eye Software (www.radicaleye.com)
' TeX output 2006.09.17:2322' -> <tex.pro><texps.pro>. <cmsy10.pfb><cmr8.pfb><cmex10.pfb><cmmi8.pfb> <cmmi12.pfb><cmr12.pfb>[1]
This is dvips(k) 5.95a Copyright 2005 Radical Eye Software (www.radicaleye.com)
' TeX output 2006.09.17:2322' -> <tex.pro><texps.pro>. <cmr12.pfb><cmr8.pfb><cmmi12.pfb>[1]
This is dvips(k) 5.95a Copyright 2005 Radical Eye Software (www.radicaleye.com)
' TeX output 2006.09.17:2322' -> <tex.pro><texps.pro>. <cmr12.pfb><cmr8.pfb><cmmi12.pfb>[1]
200 (0.86948243858959 pages/sec 0.86948243858959 revs/sec)
This is dvips(k) 5.95a Copyright 2005 Radical Eye Software (www.radicaleye.com)
' TeX output 2006.09.17:2324' -> <tex.pro><texps.pro>. <cmr12.pfb><cmmi12.pfb>[1]
300 (0.77232476934321 pages/sec 0.77232476934321 revs/sec)
This is dvips(k) 5.95a Copyright 2005 Radical Eye Software (www.radicaleye.com)
' TeX output 2006.09.17:2325' -> <tex.pro><texps.pro>. <cmmi12.pfb><cmr12.pfb>[1]
This is dvips(k) 5.95a Copyright 2005 Radical Eye Software (www.radicaleye.com)
' TeX output 2006.09.17:2325' -> <tex.pro><texps.pro>. <cmmi12.pfb><cmr12.pfb>[1]
This is dvips(k) 5.95a Copyright 2005 Radical Eye Software (www.radicaleye.com)
' TeX output 2006.09.17:2326' -> <tex.pro><texps.pro>. <cmsy10.pfb><cmr12.pfb><cmmi12.pfb>[1]
This is dvips(k) 5.95a Copyright 2005 Radical Eye Software (www.radicaleye.com)
' TeX output 2006.09.17:2326' -> <tex.pro><texps.pro>. <cmsy10.pfb><cmr12.pfb><cmmi12.pfb>[1]
400 (0.80568079080928 pages/sec 0.80568079080928 revs/sec)
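The progress reports are easy to pick out of the dvips noise. Below is a minimal Python sketch that extracts them and checks the quoted averages. The function names `progress_reports` and `slowdown` are hypothetical, and the regex is inferred from the output lines above, not taken from importDump.php's source. On the quoted averages, 56 / 0.8 works out to roughly a 70x slowdown, somewhat more than the 50x estimate.

```python
import re

# Progress lines as they appear in the output above, e.g.
# "100 (1.2361754306537 pages/sec 1.2361754306537 revs/sec)".
# This pattern is an assumption inferred from that output.
PROGRESS = re.compile(r"(\d+) \(([\d.]+) pages/sec ([\d.]+) revs/sec\)")


def progress_reports(output):
    """Extract (pages_done, pages_per_sec, revs_per_sec) tuples from raw
    importDump.php output, skipping everything else (e.g. dvips banners)."""
    return [(int(n), float(p), float(r)) for n, p, r in PROGRESS.findall(output)]


def slowdown(baseline_rate, observed_rate):
    """How many times slower the observed rate is than the baseline rate."""
    return baseline_rate / observed_rate


if __name__ == "__main__":
    log = (
        "100 (1.2361754306537 pages/sec 1.2361754306537 revs/sec) "
        "This is dvips(k) 5.95a ... <cmmi12.pfb><cmr12.pfb>[1] "
        "200 (0.86948243858959 pages/sec 0.86948243858959 revs/sec)"
    )
    for pages, rate, _ in progress_reports(log):
        print(f"{pages} pages imported, {rate:.2f} pages/sec")
    # Averages quoted in the message: 56 pages/sec (1.6.8) vs 0.8 pages/sec (1.7.1).
    print(f"slowdown: about {slowdown(56, 0.8):.0f}x")
```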
Jeff
Jeffrey V. Merkey wrote:
The importing of the 20060911 XML dumps produces a lot of garbage in the output of importDump.php in MediaWiki 1.7.1
That's normal.
(in addition to running about 50 times slower than 1.6.8 during XML dump imports: 1.6.8 averages 56 pages/second, 1.7.1 averages 0.8 pages/second):
importDump renders pages in order to update link tables. This makes it much slower, but it means the import is complete, with the link tables filled in.
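Below is a deliberately toy Python sketch of what "updating link tables" involves. It pulls [[wikilink]] targets out of a page and emits (source, target) rows of the kind a links table stores. The names are hypothetical, and the regex ignores templates, namespaces, interwiki links and most other wikitext rules. The real importer gets these rows by running the full MediaWiki parser, which is what makes it slow. On a wiki with $wgUseTeX enabled, that parse also renders <math> markup through external LaTeX tools, which is presumably where the dvips banners in the output come from.

```python
import re

# Hypothetical, heavily simplified: captures the target of [[Target]] or
# [[Target|label]], stopping at '|', '#' or ']'. Pure in-page anchors like
# [[#Section]] do not match because the target must be non-empty.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def normalize_title(title):
    """Rough title normalization: underscores to spaces, first letter upper-cased."""
    title = title.strip().replace("_", " ")
    return title[:1].upper() + title[1:]


def link_rows(page_title, wikitext):
    """Return sorted, de-duplicated (source, target) pairs for one page."""
    targets = {normalize_title(t) for t in WIKILINK.findall(wikitext)}
    targets.discard("")
    return sorted((page_title, t) for t in targets)


if __name__ == "__main__":
    text = "See [[pi]], [[Leonhard_Euler|Euler]] and [[pi#Digits]]."
    for source, target in link_rows("Euler", text):
        print(source, "->", target)
```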
If you are doing a clean import of a large, complete wiki, importDump.php is of course very inefficient. Use mwdumper for large, fast imports of a complete wiki's page data and import the other SQL tables alongside it.
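mwdumper's README documents the pipeline `java -jar mwdumper.jar --format=sql:1.5 pages_full.xml.bz2 | mysql -u <username> -p <databasename>`. The Python sketch below drives that pipeline. It assumes that mwdumper.jar sits in the current directory and that the target database already has an empty MediaWiki schema. The helper names `mwdumper_commands` and `run_import` are hypothetical.

```python
import subprocess


def mwdumper_commands(dump_path, db_name, db_user, jar="mwdumper.jar"):
    """Build the two halves of the pipeline:
    java -jar mwdumper.jar --format=sql:1.5 <dump> | mysql -u <user> -p <db>"""
    dumper = ["java", "-jar", jar, "--format=sql:1.5", dump_path]
    loader = ["mysql", "-u", db_user, "-p", db_name]  # -p makes mysql prompt for the password
    return dumper, loader


def run_import(dump_path, db_name, db_user, jar="mwdumper.jar"):
    """Stream mwdumper's SQL output straight into mysql and return both exit codes."""
    dumper_cmd, loader_cmd = mwdumper_commands(dump_path, db_name, db_user, jar)
    dumper = subprocess.Popen(dumper_cmd, stdout=subprocess.PIPE)
    loader = subprocess.Popen(loader_cmd, stdin=dumper.stdout)
    dumper.stdout.close()  # so mwdumper sees a broken pipe if mysql exits early
    loader_rc = loader.wait()
    dumper_rc = dumper.wait()
    return dumper_rc, loader_rc
```

For example, `run_import("pages_full.xml.bz2", "wikidb", "wikiuser")` would load the page, revision and text data. The other tables (link tables and so on) are typically distributed as separate SQL dumps and would be loaded with mysql on their own.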
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org