Saqib Kadri wrote:
download.wikimedia.org says the enwiki 20070908 dump has 10,218,632 pages, and when I run mwdumper on it, mwdumper reports that it inserted 5,654,236 pages into the database. But MySQL is only showing about 2.6 million rows. I ran mwdumper twice: the first run gave 2,639,569/2,678,371/2,615,000 rows for the page/revision/text tables, respectively, and the second run gave 2,583,864/2,510,365/2,615,000 rows.
Which number of rows is correct - 10 million, 5 million, or 2.6 million? Did something go wrong with the DB insert? Note that this is on a Linux machine with MySQL 4.1.22, and there is plenty of space on the hard drive.
Thanks.
The 10,218,632 number includes redirects, if that helps. There must be more than 2.6 million pages by now, since articles alone account for 2 million, so my guess is that 5,654,236 is the number of non-redirect pages.
None of which explains why there are only half as many rows in the tables, of course.
-Gurch
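
One way to check the guess above would be to count the <page> elements in the dump file directly and see how many of them are redirects. What follows is only a rough sketch, not how mwdumper itself counts: the function name count_pages is made up for illustration, and it assumes that a redirect can be recognised either by a <redirect> element or by revision text starting with "#REDIRECT", which may not match the exact rule behind the published statistics.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_pages(stream):
    """Return (total_pages, redirect_pages) for a MediaWiki XML dump stream.

    Assumption: a page counts as a redirect if it contains a <redirect>
    element or any revision text beginning with '#redirect' (any case).
    """
    pages = redirects = 0
    root = None
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if root is None:
            root = elem  # the first 'start' event is the document root
        if event != "end" or local(elem.tag) != "page":
            continue
        pages += 1
        is_redirect = False
        for child in elem.iter():
            name = local(child.tag)
            if name == "redirect":
                is_redirect = True
            elif name == "text":
                text = (child.text or "").lstrip().lower()
                if text.startswith("#redirect"):
                    is_redirect = True
        if is_redirect:
            redirects += 1
        # Drop finished pages so memory stays flat on a multi-gigabyte dump.
        root.clear()
    return pages, redirects
```

To run it on a compressed dump, pass in a binary file object, for example the one returned by bz2.open(path, "rb") from the standard library, where path is wherever the dump was saved. The total minus the redirect count can then be compared with the 5,654,236 figure reported by mwdumper and with the MySQL row counts.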