Di (rut) wrote:
> Dear all, especially Anthony and Platonides,
> I hope the heroic duo will give their blessing to this post.
> I'm not techy - so why hasn't it been possible to produce a non-corrupt dump (one that includes history) in such a long time? A professor of mine asked whether the problem could be a lack of man(person)-power, and whether it would be interesting/useful for the university to help with a programmer to make the dump happen.
In my opinion, it would be a lot easier to generate a full dump if it was split into multiple XML files for each wiki. Then the job could be checkpointed on the file level. Checkpoint/resume is quite difficult with the current single-file architecture.
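Roughly, something like the following sketch (the chunk layout, file names and helper functions here are made up for illustration, not taken from the actual dump scripts): each chunk is written to a temporary file and renamed only once it is complete, so a restarted job can simply skip the chunks that already exist.

# Minimal sketch of file-level checkpoint/resume. The chunk layout and
# function names are hypothetical; write_chunk would wrap the real export
# logic for one slice of a wiki.
import os

def dump_chunk(chunk_id, out_dir, write_chunk):
    """Write one XML chunk atomically so a restart can tell done from partial."""
    final = os.path.join(out_dir, "chunk-%04d.xml" % chunk_id)
    if os.path.exists(final):
        return                      # completed in an earlier run: skip on resume
    tmp = final + ".tmp"
    with open(tmp, "wb") as f:
        write_chunk(chunk_id, f)    # caller supplies the actual export logic
    os.rename(tmp, final)           # only complete chunks ever get the final name

def dump_all(num_chunks, out_dir, write_chunk):
    for chunk_id in range(num_chunks):
        dump_chunk(chunk_id, out_dir, write_chunk)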
Tolerant parsers on the client side would help a bit. A dump shouldn't be considered "failed" just because it has a region of garbage and some unclosed tags in the middle of the file.
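As a rough illustration of the client-side approach (not existing tooling), a reader could scan for well-formed <page>...</page> blocks and skip whatever lies between them, rather than giving up on the whole file:

# Rough sketch of a tolerant reader: extract whatever well-formed
# <page>...</page> blocks can be found and skip garbage between them.
# Reads the whole file into memory for brevity; a real tool would stream.
import re
import xml.etree.ElementTree as ET

PAGE_RE = re.compile(rb"<page>.*?</page>", re.DOTALL)

def iter_pages(path):
    with open(path, "rb") as f:
        data = f.read()
    for match in PAGE_RE.finditer(data):
        try:
            yield ET.fromstring(match.group(0))   # parse each page on its own
        except ET.ParseError:
            continue                              # truncated or mangled page: skip it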
> Also - I now have a file from 2006, but I still wonder whether there is any place where one can access old dumps - these could be very important research-wise.
The Foundation does not host old dumps. Maybe someone else has one.
> And last but not least - if the dumps don't work, then it is very important to be able to export some articles with their full histories in some other fashion. I repeat my plea - do you know who imposed the limit so that export only allows 100 revisions? Is there any way around that? Would it be possible to make an exception to get the data for a research study?
There's an offset parameter which lets you request specific revisions or revision ranges. Read the relevant code in includes/SpecialExport.php before using it; it's a bit counterintuitive (buggy?).
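For what it's worth, a client-side loop could page through a full history along these lines. This is only a sketch, assuming that offset takes a revision timestamp, that limit caps the revisions returned per request, and that action=submit triggers the export on a GET request - check SpecialExport.php as above, since none of that is guaranteed here.

# Rough sketch only: the offset/limit/action semantics assumed below must be
# verified against includes/SpecialExport.php.
import re
import urllib.parse
import urllib.request

TS_RE = re.compile(r"<timestamp>([^<]+)</timestamp>")

def fetch_history(title, base="https://en.wikipedia.org/w/index.php", limit=100):
    offset = "1"                       # assumed to mean "start from the beginning"
    chunks = []
    while True:
        url = base + "?" + urllib.parse.urlencode({
            "title": "Special:Export", "pages": title,
            "offset": offset, "limit": limit, "action": "submit",
        })
        xml = urllib.request.urlopen(url).read().decode("utf-8")
        timestamps = TS_RE.findall(xml)
        if not timestamps or timestamps[-1] == offset:
            break                      # nothing new came back; stop
        chunks.append(xml)
        offset = timestamps[-1]        # continue from the last revision seen
    return chunks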
-- Tim Starling