download.wikimedia.org says enwiki 20070908 has 10,218,632 pages, and when I run it, mwdumper says it inserted 5,654,236 pages into the database. But MySQL is only showing about 2.6 million rows. I ran mwdumper twice: the first run gave 2,639,569/2,678,371/2,615,000 rows for the page/revision/text tables, respectively, and the second gave 2,583,864/2,510,365/2,615,000.
Which number of rows is correct - 10 million, 5 million, or 2.6 million? Did something go wrong with the DB insert? Note that this is on a Linux machine with MySQL 4.1.22, and there is plenty of space on the hard drive.
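(In case the method matters: are plain COUNT(*) queries like the ones below the right way to get authoritative numbers, or could part of the discrepancy just be InnoDB's SHOW TABLE STATUS estimate, which I understand is only approximate and can shift between runs?

    -- Exact row counts (slow on big InnoDB tables, but authoritative):
    SELECT COUNT(*) FROM page;
    SELECT COUNT(*) FROM revision;
    SELECT COUNT(*) FROM text;

    -- Versus the approximate Rows column from:
    SHOW TABLE STATUS LIKE 'page';
)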
Thanks.
Saqib Kadri wrote:
> download.wikimedia.org says enwiki 20070908 has 10,218,632 pages, and when I run it, mwdumper says it inserted 5,654,236 pages into the database. But MySQL is only showing about 2.6 million rows. [...]
> Which number of rows is correct - 10 million, 5 million, or 2.6 million? Did something go wrong with the DB insert?
The 10,218,632 number includes redirects, if that helps. There must be more than 2.6 million pages by now, as articles alone account for 2 million, so my guess is that 5,654,236 is the number of non-redirect pages.
None of which explains why there are only half as many rows in the tables, of course.
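One way to check, assuming mwdumper produced the standard MediaWiki schema (page_is_redirect and page_namespace columns on the page table):

    -- Split the imported pages into redirects and regular pages:
    SELECT page_is_redirect, COUNT(*) FROM page GROUP BY page_is_redirect;

    -- Articles proper: main namespace, non-redirect:
    SELECT COUNT(*) FROM page
      WHERE page_namespace = 0 AND page_is_redirect = 0;

If hardly any rows come back with page_is_redirect = 1, that would suggest the redirects were dropped somewhere along the way.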
-Gurch
Gurch wrote:
> The 10,218,632 number includes redirects, if that helps. There must be more than 2.6 million pages by now, as articles alone account for 2 million, so my guess is that 5,654,236 is the number of non-redirect pages.
Oddly, with my (slightly modified) mwdumper, I get exactly the number of rows I expect from the latest enwiki input. It takes about 30 minutes to import the whole thing, starting from a cold DROP TABLE IF EXISTS on the relevant tables.
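Concretely, the cold start I mean looks roughly like this (the mwdumper invocation is the standard sql:1.5 pipeline; the file, user, and database names are placeholders):

    -- Start from genuinely empty tables so leftovers can't mask a partial import:
    DROP TABLE IF EXISTS page, revision, text;
    -- Recreate them from MediaWiki's maintenance/tables.sql, then pipe in
    -- the dump from a shell, e.g.:
    --   java -jar mwdumper.jar --format=sql:1.5 enwiki-pages.xml.bz2 | mysql -u USER -p DBNAME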
Perhaps Matt is running out of some resource on his machine? MySQL limit? RAM limit? Something else?
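One MySQL limit worth ruling out is max_allowed_packet: mwdumper batches many rows into each INSERT, and a statement over that limit drops the connection ("MySQL server has gone away"), which would silently truncate a piped import partway through. For example:

    -- Check the current limit; mwdumper's batched INSERTs can exceed small values:
    SHOW VARIABLES LIKE 'max_allowed_packet';

    -- Raise it if needed (requires SUPER; 16 MB here is just an example value,
    -- and only new connections pick up the new global value):
    SET GLOBAL max_allowed_packet = 16777216;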