This sounds really interesting to me (as in, I would seriously consider applying for this project).
A few questions: Do you think most of this should be written in PHP (since dumpBackup.php is currently in PHP)? Or could it be written in another language (most likely Python)?
The description talks about "smart choice for compression of multiple items together". How would that work with deleting items? Especially with history dumps, I think it would make a lot of sense to use some kind of delta compression (like git's pack files do). But this would cause problems when deleting a revision that other revisions use as the base for their delta (though certainly not unsolvable problems). I guess figuring this out would be a part of the project.
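To make the deletion problem concrete, here is a rough sketch in Python of one possible way to handle it: before a revision is removed, every revision that uses it as a delta base is re-encoded against the deleted revision's own base (or stored in full if there is none). Everything here is hypothetical and only for illustration: `DeltaStore`, `make_delta` and `apply_delta` are made-up names, the delta encoding is a toy built on `difflib` rather than anything git or the dumps actually use, and a real format would also need to compress and index the entries.

```python
import difflib


def make_delta(base, target):
    """Encode target as copy/insert operations against base (toy encoding)."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no operation: that span of base is simply not copied
    return ops


def apply_delta(base, ops):
    """Rebuild the target text from base and the operations from make_delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(base[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


class DeltaStore:
    """Hypothetical store: each revision is full text or a delta against a base."""

    def __init__(self):
        # rev_id -> (base_id, payload); payload is full text when base_id is None
        self.entries = {}

    def add(self, rev_id, text, base_id=None):
        if base_id is None:
            self.entries[rev_id] = (None, text)
        else:
            delta = make_delta(self.get(base_id), text)
            self.entries[rev_id] = (base_id, delta)

    def get(self, rev_id):
        base_id, payload = self.entries[rev_id]
        if base_id is None:
            return payload
        return apply_delta(self.get(base_id), payload)

    def delete(self, rev_id):
        # Reconstruct every direct dependent while its base still exists.
        dependents = [r for r, (b, _) in self.entries.items() if b == rev_id]
        texts = {r: self.get(r) for r in dependents}
        new_base, _ = self.entries[rev_id]
        del self.entries[rev_id]
        # Re-encode dependents against the deleted revision's base, if any.
        for r, text in texts.items():
            self.add(r, text, new_base)
```

The cost of a deletion in this sketch is proportional to the number of direct dependents, since revisions further down a chain still refer to a base that exists. The linear scan for dependents is only there to keep the example short.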
Petr Onderka [[en:User:Svick]]
On Mon, Mar 25, 2013 at 12:22 PM, Ariel T. Glenn <ariel@wikimedia.org> wrote:
So I was thinking about things I can't undertake, and one of those things is the 'dumps 2.0' which has been rolling around in the back of my mind. The TL;DR version is: sparse compressed archive format that allows folks to add/subtract changes to it random-access (including during generation).
See here:
https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#XML_dum...
What do folks think? Workable? Nuts? Low priority? Interested?
Ariel
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l