On 23-08-2013 17:45, A B wrote:
> Apart from the problems I have with the speed of inserting page table data from the SQL dump, there is something I don't understand. Why are there about 30 million "INSERT INTO page" rows in page.sql but only about 10 million in the page-articles.sql dump?
The page.sql file has a table row for each page in the wiki.
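To illustrate, here is a minimal sketch of reading such rows. The INSERT statement below is a hypothetical one-row sample (real dumps pack thousands of rows per statement), and the column order is an assumption; check it against the CREATE TABLE statement at the top of your page.sql before trusting any field index:

```python
# Hypothetical one-row sample of a page.sql INSERT statement. The real file
# has many rows per statement; the column layout (page_len assumed last here)
# must be verified against the CREATE TABLE at the top of the dump.
SQL_SAMPLE = ("INSERT INTO `page` VALUES "
              "(12,0,'Anarchism','',5252,0,0,0.78,'20130801000000',123456,183472);")

def rows(statement):
    """Split the VALUES part of an INSERT statement into per-row field lists.

    A tiny state machine, so commas inside quoted titles do not break fields.
    """
    values = statement.split("VALUES", 1)[1]
    row, field = [], []
    in_str = esc = False
    for ch in values:
        if esc:                      # character escaped by a backslash
            field.append(ch); esc = False
        elif in_str:                 # inside a '...' string literal
            if ch == "\\":
                esc = True
            elif ch == "'":
                in_str = False
            else:
                field.append(ch)
        elif ch == "'":
            in_str = True
        elif ch == ",":
            row.append("".join(field)); field = []
        elif ch == "(":
            row, field = [], []      # start of a new row tuple
        elif ch == ")":
            row.append("".join(field))
            yield row                # end of a row tuple
            row, field = [], []
        elif ch not in " ;\n":
            field.append(ch)

for r in rows(SQL_SAMPLE):
    title, page_len = r[2], int(r[-1])   # page_len assumed to be the last column
    print(title, page_len)
```

With this you can pull out just the fields you need instead of loading the whole dump into MySQL first.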
I do not know of any page-articles.sql file, but perhaps you mean the pages-articles.xml file, which has information about pages, including the actual page content, in XML format, but only for pages in certain namespaces (articles and some others). The pages-meta-current.xml file is similar and contains information about all pages.
> My goal is to have all pages with their correct page_len.
Then it is easiest to take the page_len field from the page.sql file. The much larger pages-meta-current.xml file, which includes all page content, can also be used, but you will have to count the lengths yourself.
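A minimal sketch of that second approach, streaming the XML so the multi-gigabyte file never has to fit in memory. The sample below is a tiny hypothetical stand-in for the real dump, and it assumes page_len equals the byte length of the latest revision's wikitext; the `{*}` namespace wildcard needs Python 3.8+:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-in for a pages-meta-current.xml dump.
XML_SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <revision><text>Hello, world!</text></revision>
  </page>
</mediawiki>"""

def page_lengths(stream):
    """Yield (title, byte length of the revision text) for each <page>."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]        # drop the export-format namespace
        if tag == "page":
            title = elem.findtext("{*}title")
            text = elem.findtext("{*}revision/{*}text") or ""
            yield title, len(text.encode("utf-8"))
            elem.clear()                         # keep memory flat on huge dumps

for title, length in page_lengths(io.BytesIO(XML_SAMPLE)):
    print(title, length)
```

For the real dump, pass an open file (or a bz2 stream) instead of the BytesIO sample.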
Regards, - Byrial