I think this would work well only for the use case where you're always looking through the whole history of all pages.
How would you find the current revision of a specific page? Or all revisions of a page? What if you don't want the whole history, just the current versions of all pages? And don't forget about deletions (and undeletions).
You could partially solve some of these problems (e.g. by adding indexes), but I don't think you can solve all of them.
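To make the lookup problem concrete, here is a minimal sketch over the day-split layout proposed below. It assumes, hypothetically, that each daily file is a simplified, namespace-free XML document of `<page>` elements, each with a `<title>` and `<revision><id>` children in chronological order (real dumps declare an XML namespace and carry much more). The files are kept in a dict keyed by file name so the example stays self-contained; all function names are illustrative only.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def _revision_ids(xml_text: str, title: str) -> list[int]:
    """Revision ids of `title` in one daily file, in file order."""
    root = ET.fromstring(xml_text)
    ids: list[int] = []
    for page in root.iter("page"):
        if page.findtext("title") == title:
            ids.extend(int(rev.findtext("id")) for rev in page.iter("revision"))
    return ids


def current_revision(dumps: dict[str, str], title: str) -> tuple[str, int] | None:
    # "YYYY.MM.DD.xml" names sort chronologically as plain strings,
    # so walk the files newest-first and stop at the first file mentioning the page.
    for name in sorted(dumps, reverse=True):
        ids = _revision_ids(dumps[name], title)
        if ids:
            return name, ids[-1]  # assumes revisions appear in time order
    return None  # no file contains an edit to this page


def all_revisions(dumps: dict[str, str], title: str) -> list[int]:
    # Collecting the full history of a single page means parsing every file.
    ids: list[int] = []
    for name in sorted(dumps):
        ids.extend(_revision_ids(dumps[name], title))
    return ids


if __name__ == "__main__":
    dumps = {
        "2013.06.30.xml": "<mediawiki><page><title>Foo</title>"
                          "<revision><id>1</id></revision></page></mediawiki>",
        "2013.07.01.xml": "<mediawiki><page><title>Bar</title>"
                          "<revision><id>2</id></revision></page>"
                          "<page><title>Foo</title><revision><id>3</id></revision>"
                          "<revision><id>4</id></revision></page></mediawiki>",
    }
    print(current_revision(dumps, "Foo"))  # ('2013.07.01.xml', 4)
    print(all_revisions(dumps, "Foo"))     # [1, 3, 4]
```

Finding the current revision stops at the first file that mentions the page, but a page last edited years ago still forces a scan of almost every file, and listing all revisions always reads everything. Deletions are not modeled at all: older files would still contain revisions of a page deleted later, so some extra index or deletion log would presumably be needed on top of the daily files.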
Petr Onderka
On Mon, Jul 1, 2013 at 9:13 PM, Dmitriy Sintsov <questpc@rambler.ru> wrote:
On 01.07.2013 22:56, Tyler Romeo wrote:
Petr is right on this one. The purpose of this version 2 of the dumps is to allow protocol-specific incremental updating of the dump, which would be significantly more difficult in a non-binary format.
Why can't the dumps just be split into daily or weekly XML files (optionally compressed)? That way, seeking would be done by simply opening the YYYY.MM.DD.xml file. It is so much simpler than going for binary git-like formats, which would take a bit less space but are more prone to bugs and impossible to extract and analyze/edit with text/XML processing utilities.
Dmitriy
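A minimal sketch of the date-based seeking described here, assuming one file per UTC day named YYYY.MM.DD.xml; both helper names are hypothetical. It shows how a timestamp maps to a file and which files a mirror would fetch to catch up after its last sync.

```python
from datetime import date, datetime, timedelta


def daily_file_name(ts: datetime) -> str:
    """Name of the hypothetical daily file holding edits made at `ts` (assumed UTC)."""
    return ts.strftime("%Y.%m.%d") + ".xml"


def files_since(last_sync: date, today: date) -> list:
    """Daily files to fetch after `last_sync`, up to and including `today`."""
    names = []
    day = last_sync + timedelta(days=1)
    while day <= today:
        names.append(day.strftime("%Y.%m.%d") + ".xml")
        day += timedelta(days=1)
    return names


if __name__ == "__main__":
    print(daily_file_name(datetime(2013, 7, 1, 21, 13)))  # 2013.07.01.xml
    print(files_since(date(2013, 6, 29), date(2013, 7, 1)))
```

Seeking by date is trivial in this layout; what the file name cannot tell you is which file holds the latest revision of a given page.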
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l