The conversion script now includes old revisions of articles. I've tried it on the Esperanto and Polish databases, which I had handy, and so far it seems to work. Please test on other languages if you dare.
It shouldn't be too hard to modify this code to extract just the old versions from the old English Wikipedia and drop them into the current database there as well.
Notes:
* Since user accounts are not transferred, there is no numerical user ID to put in the old_user field. Currently this results in the wiki treating the user name in old_user_text as an IP address: it tries to mask the last 3 digits and does not make links to user pages in the history lists. The digit masking is definitely wrong; not making the links, however, is arguably correct behavior.
* The most recent revision still has its user, comment, and timestamp wiped and replaced with "conversion script", "automatic conversion", and time of conversion. Would it not be nicer to keep the previous user, comment, and timestamp, as is done with the older revisions?
* We might, however, still want to add a note that conversion took place, so there's an obvious cutoff in the history list.
* Do we want to run fixLinks() on the old page versions? (This changes /subpage links into Page:subpage links.) Right now I do so to preserve link functionality, but this may not be appropriate, as it changes the content of previous versions slightly. The purpose of keeping old versions is to see what changed, so we might prefer to have the unchanged (and no longer working) /subpage links. Comments?
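For the digit-masking problem in the first note, one possible guard is to mask only names that actually parse as IPv4 addresses, and to show everything else as typed. A minimal sketch in Python, not the wiki's own code: looks_like_ip and display_name are made-up names, and the ".xxx" masking format is an assumption for illustration.

```python
import re

# Four dot-separated groups of 1-3 digits; range is checked separately.
_IPV4 = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})")


def looks_like_ip(text):
    """Return True only if text is a dotted-quad IPv4 address."""
    m = _IPV4.fullmatch(text)
    return bool(m) and all(int(part) <= 255 for part in m.groups())


def display_name(old_user, old_user_text):
    """Return the name to show in a history list (hypothetical helper).

    old_user is the numeric user ID (0 when no account was transferred);
    old_user_text is the stored user name or IP address.
    """
    if old_user == 0 and looks_like_ip(old_user_text):
        # Anonymous edit: hide the last octet (assumed masking format).
        return old_user_text.rsplit(".", 1)[0] + ".xxx"
    # Converted user names have no ID but are not IPs: leave them alone.
    return old_user_text
```

With this, display_name(0, "Brion VIBBER") comes back unchanged, while display_name(0, "192.168.1.42") is masked. Whether such names should also get user-page links is a separate question.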
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org