It already works that way on the backend, pretty much.
We can't make the old increments available forever because of content
we are obligated to stop distributing, so incremental dumps would not
be that useful to users.
On 10/19/07, Lars Aronsson <lars(a)aronsson.se> wrote:
In recent weeks I have been following the database dumps of
some languages of Wikipedia. I download and analyze a dump, make
various improvements, and then wait for the next dump to become
available for a new analysis. There are 2 or 3 weeks between
dumps. There appear to be two parallel dump processes running
continuously,
http://download.wikimedia.org/backup-index.html
What takes most time in each dump is the large file with complete
version history, pages-meta-history.xml.bz2 and
pages-meta-history.xml.7z
This is the largest file in compressed format, but since it
contains every version of every article it is also very highly
compressed, and expands to become enormous. I guess that very few
people find use for this file. In addition, only a very small
portion of its contents is changed between two dumps. So we spend
a lot of time and effort (and delay of other things) in order to
create very little for very few users.
I think that this dump should be made incremental. Every week,
only that week's additional versions need to be dumped. This can
then be added to the dump of the previous week, the week before
that, etc., which hasn't really changed. This way, the dump
process could be made much faster, and the two parallel dump
processes would complete the cycle in less time, so new dumps of
the same project could be made available more frequently.
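A minimal sketch of how such a weekly cycle might look, assuming
revisions carry a timestamp we can filter on (the function and file
names below are made up for illustration; this is not the actual
dump code):

import shutil

def dump_new_revisions(revisions, since, out_path):
    """Write only the revisions newer than `since`, the previous dump's cut-off."""
    with open(out_path, "w", encoding="utf-8") as out:
        for rev in revisions:  # each rev: {'timestamp': ..., 'xml': ...}
            if rev["timestamp"] > since:
                out.write(rev["xml"])

def append_increment(cumulative_path, increment_path):
    """Concatenate this week's slice onto the running full-history file."""
    with open(cumulative_path, "ab") as cum, open(increment_path, "rb") as inc:
        shutil.copyfileobj(inc, cum)

if __name__ == "__main__":
    # Toy data standing in for the revision table.
    revisions = [
        {"timestamp": "2007-10-10T00:00:00Z", "xml": "<revision>old</revision>\n"},
        {"timestamp": "2007-10-18T00:00:00Z", "xml": "<revision>new</revision>\n"},
    ]
    dump_new_revisions(revisions, since="2007-10-12T00:00:00Z",
                       out_path="increment-2007-10-19.xml")
    append_increment("pages-meta-history-cumulative.xml",
                     "increment-2007-10-19.xml")

One practical wrinkle is compression: concatenated bzip2 files are
still valid multi-stream archives, so in principle the cumulative
.bz2 could be extended the same way, while the .7z would have to be
rebuilt each time.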
Or is it already done this way, behind the scenes, only that it
isn't visible from the outside?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik -
http://aronsson.se