On 11/21/12 1:54 PM, vitalif@yourcmc.ru wrote:
While working on my improvements to MediaWiki Import&Export, I've discovered a feature that is completely new to me: the two-phase backup dump. That is, a first-pass dumper creates an XML file without page texts, and a second-pass dumper then adds the page texts.
I have several questions about it: what is it intended for? Is it a sort of optimisation for large databases, and if so, why was this particular method chosen?
Also, does anyone use it? (does Wikimedia use it?)
I'm not sure if this is the reason it was created, but one useful outcome is that Wikimedia can make the output of both passes available at dumps.wikimedia.org. This can be useful for researchers (myself included), because the metadata-only (pass 1) dump is sufficient for doing some kinds of analyses, while being *much* smaller than the full dump.
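For reference, the two passes are driven by scripts in MediaWiki's maintenance/ directory; roughly like this (exact flag names may vary between MediaWiki versions, so treat this as a sketch rather than a verified invocation):

```shell
# Pass 1: write a "stub" dump containing page and revision metadata
# only, with no revision text.
php maintenance/dumpBackup.php --full --stub --output=file:stub.xml

# Pass 2: read the stub and fill in the revision texts, producing
# the complete dump. Fetching text separately lets this pass reuse
# text from a previous dump instead of hitting the text storage
# for every revision.
php maintenance/dumpTextPass.php --stub=file:stub.xml --output=file:full.xml
```

The stub.xml from pass 1 is the metadata-only file Mark describes; on a large wiki it is a small fraction of the size of full.xml.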
-Mark