On 16-05-2013, Thu, at 11:03 +0200, Michael Tsikerdekis wrote:
Hi everyone,
I am trying to restore the revision table from Wikipedia dumps. I understand that the file that I need is probably enwiki-XX-pages-logging.xml.gz
Actually you want one of the XML files with page content: enwiki-20130503-pages-articles.xml.bz2, enwiki-20130503-pages-meta-current.xml.bz2, or the various meta-history bz2 or 7z files. Which one depends on whether you want the current revision of articles and related namespace pages only, the current revision of all pages, or all revisions of all pages.
Mwdumper will generate SQL from these files to populate the revision, text and page tables. The statements for all three tables are intermingled in a single output file, so you'll want to grab just the SQL statements pertaining to the revision table if that's all you want to recreate. I don't know if it works well with the latest dumps.
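If it helps, here is a minimal Python sketch of pulling only the revision statements out of that intermingled output. It rests on assumptions I have not checked against the current mwdumper output: that each statement sits on a single line and starts with "INSERT INTO revision" (optionally with backticks around the table name), and that no table prefix is in use. The function name revision_statements is just something made up for illustration.

```python
import re
from typing import Iterable, Iterator

# Assumption: one SQL statement per line, starting with
# "INSERT INTO revision" (table name optionally backtick-quoted).
# The \b keeps similarly named tables such as "revision_extra" out.
_REVISION_INSERT = re.compile(r"^\s*INSERT\s+INTO\s+`?revision\b", re.IGNORECASE)


def revision_statements(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that insert into the revision table."""
    for line in lines:
        if _REVISION_INSERT.match(line):
            yield line
```

Since it works line by line, it can be fed an open file object for the SQL dump and the result written straight to a second file, without holding the whole dump in memory. If the statements turn out to span several lines, the filter would need to track statement boundaries instead.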
The pages-logging XML file could conceivably be used for repopulating the logging table and part of the user table (poorly); I imagine most folks use it for research purposes rather than for importing data.
PS: I've also tried to build mwdumper: git clone https://gerrit.wikimedia.org/r/p/mediawiki/tools/mwdumper.git mwdumper
However, I couldn't use make or ant since there was no build.xml or makefile in the repository.
You can backtrack a couple revisions in git to get one that's buildable. I'm cc-ing Chad on this since he knows about the build setup.
Ariel