Page history structure isn't quite immutable; revisions may be added or deleted, pages may be renamed, and so on.
Shelling out to an external process means that when the process dies (due to a dead database connection, etc.), we can restart it cleanly.
Brion, thanks for the clarification.
Also, I'd like to ask you and the other developers about the idea of packing the export XML file, along with all exported uploads, into a ZIP archive (instead of embedding the uploads in the XML as base64). What do you think about it? We use this in our MediaWiki installations ("mediawiki4intranet") and find it quite convenient. Actually, ZIP was Tim Starling's idea; before ZIP we used rather strange "multipart/related" archives (I don't know why we did that :))
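To illustrate the idea, here is a minimal sketch in Python (file names and the helper are hypothetical, not the actual mediawiki4intranet code): the export XML goes into the archive alongside the raw upload files, so nothing needs base64 encoding.

```python
import os
import zipfile

def pack_export(xml_path, upload_paths, archive_path):
    """Pack an export XML file and the raw upload files into one ZIP.

    Hypothetical illustration: the archive layout (Revisions.xml at the
    root, uploads under uploads/) is an assumption, not the real format.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The XML dump is stored as-is at the archive root.
        zf.write(xml_path, "Revisions.xml")
        for path in upload_paths:
            # Uploads are stored as plain files under an uploads/ prefix,
            # instead of being embedded in the XML as base64 blobs.
            zf.write(path, "uploads/" + os.path.basename(path))
```

The upside is that binary files stay binary (no ~33% base64 overhead) and the XML stays small enough to parse comfortably.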
I'd like to finally get this change reviewed. What do you think about it?
Other improvements include advanced page selection (based on namespaces, categories, dates, imagelinks, templatelinks and pagelinks) and an advanced import report (including a form of "conflict detection"). Should I split these into separate patches in Gerrit for ease of review?
Also, do all the archiving methods (7z) really need to be built into Export.php as dump filters, especially when using ZIP? With simple XML dumps you could just pipe the output to the compressor. Or are the filters really needed to save temporary disk space during export? I ask because my version of import/export does not build the archive on the fly: it puts all the contents into a temporary directory and then archives that directory as a whole. Is that an acceptable approach?
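The temp-directory approach can be sketched like this (a minimal Python illustration under my own assumptions, not the actual patch code): the caller fills a temporary directory, the whole directory is archived in one pass, and the temporary space is freed afterwards.

```python
import os
import shutil
import tempfile

def export_via_tempdir(write_contents, archive_base):
    """Archive an export by staging it in a temporary directory.

    write_contents: a caller-supplied callback (hypothetical name) that
    writes the XML dump and uploads into the directory it is given.
    archive_base: the archive path without its extension.
    Returns the path of the created archive, e.g. archive_base + ".zip".
    """
    tmp = tempfile.mkdtemp(prefix="mwexport")
    try:
        write_contents(tmp)  # caller fills the staging directory
        # Archive the whole directory in one pass (not on-the-fly).
        return shutil.make_archive(archive_base, "zip", root_dir=tmp)
    finally:
        shutil.rmtree(tmp)   # temporary disk space is reclaimed here
```

The trade-off versus a streaming dump filter is exactly the temporary disk space: the staging directory briefly holds an uncompressed copy of the whole export.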
--
With best regards,
Vitaliy Filippov