This sounds really interesting to me (as in, I would seriously
consider applying for this project).
A few questions:
Do you think most of this should be written in PHP (since
dumpBackup.php is currently in PHP)?
Or could it be written in another language (most likely Python)?
The description talks about "smart choice for compression of multiple
items together". How would that work with deleting items?
Especially with history dumps, I think it would make a lot of sense to
use some kind of delta compression (like git's pack files do).
But this would cause problems with deleting revisions that other
revisions use as a base for their delta (though certainly not
unsolvable problems).
I guess figuring this out would be a part of the project.
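
To make the deletion problem concrete, here is a rough Python sketch of
one possible approach: when a revision is deleted, every revision that
used it as a delta base is re-encoded against the deleted revision's own
base (or stored in full if there is none). Everything here (DeltaStore,
make_delta, apply_delta, the copy/insert operation format) is made up
for illustration. It is not how git's pack files or dumpBackup.php work,
and a real format would also need compression and on-disk layout.

```python
import difflib


def make_delta(base, target):
    """Encode target as copy/insert operations against base."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no operation: that range of base is just skipped
    return ops


def apply_delta(base, ops):
    """Rebuild the target text from base and the operations."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(base[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


class DeltaStore:
    """Hypothetical in-memory store: rev_id -> (base_id or None, payload)."""

    def __init__(self):
        self.entries = {}

    def add(self, rev_id, text, base_id=None):
        if base_id is None:
            self.entries[rev_id] = (None, text)  # stored in full
        else:
            delta = make_delta(self.get(base_id), text)
            self.entries[rev_id] = (base_id, delta)

    def get(self, rev_id):
        base_id, payload = self.entries[rev_id]
        if base_id is None:
            return payload
        return apply_delta(self.get(base_id), payload)

    def delete(self, rev_id):
        base_id, _ = self.entries[rev_id]
        dependents = [r for r, (b, _) in self.entries.items() if b == rev_id]
        # Materialize dependents while their base still exists.
        texts = {r: self.get(r) for r in dependents}
        del self.entries[rev_id]
        # Re-base each dependent onto the deleted revision's own base.
        for r, text in texts.items():
            self.add(r, text, base_id)


store = DeltaStore()
store.add(1, "Hello world")
store.add(2, "Hello brave world", base_id=1)
store.add(3, "Hello brave new world", base_id=2)
store.delete(2)  # revision 3 is re-encoded against revision 1
```

The cost is that a delete has to touch every direct dependent of the
deleted revision, so a real design would probably want to limit how many
revisions can share one base.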
Petr Onderka
[[en:User:Svick]]
On Mon, Mar 25, 2013 at 12:22 PM, Ariel T. Glenn <ariel(a)wikimedia.org> wrote:
So I was thinking about things I can't undertake, and one of those
things is the 'dumps 2.0' which has been rolling around in the back of
my mind. The TL;DR version is: sparse compressed archive format that
allows folks to add/subtract changes to it random-access (including
during generation).
See here:
https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#XML_du…
What do folks think? Workable? Nuts? Low priority? Interested?
Ariel
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l