On 9/30/07, Thomas Dalton <thomas.dalton(a)gmail.com> wrote:
> > Can Brion or Tim give us more detail on why the dumps are failing?
>
> That's the key question. Without knowing exactly what the problem is,
> it's very hard to come up with solutions.
Yes, surely, to fix the problem of the failing dumps we need to know
the details and, if possible, the cause.
But what Luca proposed may be interesting for other reasons too.
The idea of having different parts merged together raised a question
for me: would it be possible, in this way, to avoid doing the full
dump every time, and instead use previous dumps (or, more reasonably,
parts of them) to create the new one?
Unfortunately I do not know much about the dump process, so what
follows are only scattered considerations.
The easiest case is that of pages that have not been modified. Can the
old dump be reused for them? The greatest advantage would come if the
pages were grouped into sets, each set dumped to a separate file, and
we knew (how?) that all the pages of a particular set were unmodified:
in that case the old file could be reused for the whole set.
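One possible answer to the "(how?)", as a rough Python sketch: assume
(hypothetically) that pages are grouped into sets by page_id range and
that the id of each page's latest revision is known. Hashing the
(page_id, latest revision id) pairs of each set gives a fingerprint; if
a set's fingerprint matches the one recorded at the previous dump, its
old file could be copied as-is. SET_SIZE, the function names and the
grouping scheme are my assumptions, not how the current dump scripts
work.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Set

# Hypothetical: pages are grouped into sets of SET_SIZE consecutive page ids,
# and each set is dumped to its own file.
SET_SIZE = 1000


def set_fingerprints(latest_rev: Dict[int, int], set_size: int = SET_SIZE) -> Dict[int, str]:
    """Map each set index to a hash of its (page_id, latest_rev_id) pairs.

    latest_rev maps page_id -> id of the page's latest revision.
    A new edit, a new page or a deleted page changes the hash of its set.
    """
    members = defaultdict(list)
    for page_id, rev_id in latest_rev.items():
        members[page_id // set_size].append((page_id, rev_id))
    return {
        idx: hashlib.sha1(repr(sorted(pairs)).encode("utf-8")).hexdigest()
        for idx, pairs in members.items()
    }


def reusable_sets(old: Dict[int, str], new: Dict[int, str]) -> Set[int]:
    """Indices of sets whose fingerprint is unchanged since the previous dump."""
    return {idx for idx, fp in new.items() if old.get(idx) == fp}
```

For example, if page 1500 gets a new revision while pages 1 and 2 are
untouched, only set 0 (page ids 0 to 999) is reported as reusable. This
only looks at the latest revision of each page, so it would not notice
a change such as the deletion of an older revision in a page's history.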
But even if a page was edited, could its old revisions be taken from
an old dump (or from a partial file of a previous dump)?
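For an edited page, a sketch of this second idea, assuming that a
revision's text never changes once saved, so any revision already
present in the old dump can be copied from there and only newer ones
need to be fetched from the database. fetch_text stands in for a
hypothetical database lookup; revisions that no longer exist (for
example deleted ones) are simply dropped.

```python
from typing import Callable, Dict, Iterable


def rebuild_page_history(
    old_dump_revs: Dict[int, str],
    current_rev_ids: Iterable[int],
    fetch_text: Callable[[int], str],
) -> Dict[int, str]:
    """Rebuild one page's full history for the new dump.

    old_dump_revs: rev_id -> text, as read from the previous dump.
    current_rev_ids: ids of the revisions the page has now.
    fetch_text: hypothetical database lookup, used only for revisions
    not already present in the old dump.
    Revisions missing from current_rev_ids (e.g. deleted) are omitted.
    """
    history: Dict[int, str] = {}
    for rev_id in sorted(current_rev_ids):
        if rev_id in old_dump_revs:
            history[rev_id] = old_dump_revs[rev_id]  # reused, no DB access
        else:
            history[rev_id] = fetch_text(rev_id)  # new since the last dump
    return history
```

If a revision's text could be hidden or altered without its id
disappearing, this shortcut would copy stale text, so the assumption
above matters.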
The reason this made me curious is not just to improve and speed up
the dumping process on the wiki server, but also to find a way to
reduce the size of what users have to download.
So far, the discussion has considered creating partial dumps and then
merging them into a full dump; one could also look for a way for users
to download just the modified sets instead of the full dump and, going
further along this path, perhaps just the diff.
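On the download side, a sketch under the assumption that each dump is
published as a set of per-set files together with a manifest of
checksums (a hypothetical format; the file names in the example are
made up). A user who already has the previous dump would then fetch
only the files whose checksum changed. Downloading a true diff of a
changed file would be a further step and is not shown here.

```python
from typing import Dict, List, Tuple


def plan_update(local: Dict[str, str], remote: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare per-file checksums of a local copy with a new dump's manifest.

    Both arguments map file name -> checksum. Returns the files that must be
    downloaded (new or changed) and the local files that no longer exist
    in the new dump and can be removed.
    """
    to_download = sorted(name for name, digest in remote.items() if local.get(name) != digest)
    to_delete = sorted(name for name in local if name not in remote)
    return to_download, to_delete
```

For instance, with a local copy of pages-0000.xml.bz2 and
pages-0001.xml.bz2, and a new manifest where pages-0001 changed and
pages-0002 was added, only those two files would be downloaded and
nothing would be deleted.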
And I am not speaking only of the full "All pages with complete edit
history" dump. Other dumps can also be rather large to download. For
instance, the English Wikipedia dump of the current versions of
"Articles, templates, image descriptions, and primary meta-pages" is
now 2.8 GB: not a small file at all, especially for someone with
limited internet access.
Of course I fully understand that none of this is easy to implement
(for instance, dealing with deleted revisions may be tricky), but
before discussing the difficulties, I would like to know whether you
consider these objectives interesting.
Regards
AnyFile