An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.12alpha (r26389).
Reading tests from "maintenance/parserTests.txt"... Reading tests from "extensions/Cite/citeParserTests.txt"... Reading tests from "extensions/Poem/poemParserTests.txt"... Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-April/022293.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 527 of 544 tests (96.88%)... 17 tests failed!
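For anyone who wants to reproduce the run locally: the harness lives in the maintenance directory and is invoked from the MediaWiki root. A minimal sketch (exact options vary between versions, so check --help on your checkout):

    # Run the core parser tests from the MediaWiki root
    php maintenance/parserTests.php

    # Run only tests whose names match a pattern, e.g. the bug 5497 list tests
    php maintenance/parserTests.php --regex='bug 5497'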
> The 10,218,632 number includes redirects, if that helps. There must
> be more than 2.6 million pages by now, as just articles account for
> 2 million, so my guess is 5,654,236 is the number of non-redirect
> pages.

Oddly, with my (slightly-modified) mwdumper, I get exactly the number of rows that I expect with the latest enwiki input. It takes about 30 minutes to import the whole thing, starting from a cold "DROP TABLE IF EXISTS" on the relevant tables.
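For reference, my pipeline is essentially the stock mwdumper-into-mysql pipe (plus my local modifications); a sketch of it, where the jar path, dump file, user, and database names are placeholders:

    # Stream the dump straight into MySQL; sql:1.5 targets the
    # MediaWiki 1.5+ schema (page/revision/text tables)
    java -jar mwdumper.jar --format=sql:1.5 pages-articles.xml.bz2 \
      | mysql -u wikiuser -p wikidb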
Perhaps Matt is running out of some resource on his machine? A MySQL limit? A RAM limit? Something else?
> The server has 8 gigs of RAM, so I don't think that can be it.
I was actually mistaken about the 10 million rows - that's the count for pages-meta-current.xml, but I'm inserting pages-articles.xml. The number of rows that file is supposed to produce is 5,654,236, which matches the number mwdumper says it inserted.
But the actual MySQL text table shows only 2.615 million rows. (Note that the MD5 checksum of the downloaded file is correct, so the dump isn't corrupt or anything.) I've redone the import multiple times, and it's always 2.615 million rows in the text table.
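A rough way to cross-check how many <page> elements the dump itself contains - assuming the usual one-tag-per-line dump layout, and with the filename as a placeholder - would be:

    # Count opening <page> tags in the compressed dump (approximate,
    # but should be close to the expected row count of the page table)
    bzcat pages-articles.xml.bz2 | grep -c '<page>'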
Another issue I noticed is that the number of rows (and the cardinality of the indexes) in the page and revision tables keeps changing every time I look - the count goes up and down by thousands, sometimes varying by over 100,000. It might go down one time, then up another time. The number of rows in the text table stays constant. I can't think of any reason for this. Note that the table sizes don't seem to change - the page table is 581,696 KiB and the revision table is 1,046 MiB.
Also, if I go to the end of each table in phpMyAdmin, both the page and revision tables always show a total of 2,614,000 rows. But the row count that SHOW TABLE STATUS reports for these tables is often greater than that.
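One thing I've read - so take it as an assumption rather than gospel - is that for InnoDB tables the Rows column of SHOW TABLE STATUS is only an estimate and can swing around between runs, whereas COUNT(*) does a full scan and is exact. Comparing the two, with the standard MediaWiki table names:

    -- Fast, but for InnoDB the Rows column is only an estimate
    SHOW TABLE STATUS LIKE 'page';

    -- Slow full scans, but exact
    SELECT COUNT(*) FROM page;
    SELECT COUNT(*) FROM revision;
    SELECT COUNT(*) FROM text;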
Does anyone know whether 2.615 million rows is the right number for enwiki, and why MySQL would keep changing its mind about how many rows the page and revision tables have? Here's my /etc/my.cnf file, if that helps:
[mysqld]
set-variable = max_connections=1000
safe-show-database
log_slow_queries
long_query_time=5
max_allowed_packet=64M
ft_min_word_len=3
query_cache_limit=2M
query_cache_size=64M
default-collation=UTF8_general_ci
default-character-set=UTF8
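(In case the running server doesn't match the file, the live values can be double-checked after restarting mysqld; for example, from a mysql session:

    mysql> SHOW VARIABLES LIKE 'max_allowed_packet';
    mysql> SHOW VARIABLES LIKE 'query_cache_size';

Both matched the file here.)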
Thanks.
wikitech-l@lists.wikimedia.org