...another dump. August is done, July 7z are done, the last of the May
history and 7z are done. That brings us up to date.
I expect to test new code that produces many small files, as
previously discussed on this list, starting within the next few days.
This test will be for en wikipedia only, as that's the dump that's
hardest to run to completion. The results might be a perfectly good
dump, or not. Even if they are, I do not plan to try running en
wikipedia dumps twice a month, so don't get your hopes up. (Who would
process all that data every two weeks anyways?)
I hope I'm not disturbing you too much. I have the following question:
I'm considering downloading enwiki-latest-pages-articles.xml, but I need to know whether it contains enough information to rebuild the category structure (parent categories, subcategories, including Category:Contents, etc.). Does the dump include the category pages or only the articles?
Thank you very much,
Hello, are there any plans to combine all of the pages-meta-history XML dumps from the 7/22 dump into one file? This is useful for importing into JWPL.
Diane M. Napolitano
Associate Research Engineer
Educational Testing Service
Turnbull Hall R-239
Princeton, New Jersey 08540
A new month, another couple of en wikipedia dumps...
It looks like the various upgrade issues are all straightened out. The
June files that were truncated have all been rerun and are ready for
download. In the meantime the July dumps are ready, for those willing
to grab the bz2 files. 7z files should be available in another couple
of days, barring any site issues. If the July files look ok to folks,
I'll do the last of our OS upgrades so that all our dump servers will be
up to date.