Lars Aronsson wrote:
> Or is it already done this way, behind the scenes, only that it
> isn't visible from the outside?
No.
AFAIK it is done as follows:
Precondition: The last full dump (if not present, treat as empty).
1- Take a snapshot of the wiki state (page table?) and create
stub-meta-history.
2- Read stub-meta-history and fill in each revision's text from the
last dump's contents. If a revision's text is not in the previous dump,
fetch it from external storage with a blocking call.
Result: a bzip2-compressed full-history dump.
The bzip2 dump is then decompressed and recompressed with 7zip.
If a call to external storage fails, the process cannot be resumed and
the whole dump fails.
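
In rough terms, the text pass looks like this (a minimal Python sketch
only; the revision record, fetch_from_external_storage and the
tab-separated output are stand-ins I made up, not the actual dump
scripts):

import bz2
from dataclasses import dataclass

@dataclass
class StubRevision:
    rev_id: int
    page_title: str

def fetch_from_external_storage(rev_id):
    # Stand-in for the blocking call to external storage.  In the real
    # process an error here cannot be skipped, so the whole dump fails.
    return "text of revision %d" % rev_id

def fill_text_pass(stub_revisions, prev_dump_texts, out_path):
    # Walk stub-meta-history in order and attach a text to every revision.
    with bz2.open(out_path, "wt") as out:
        for rev in stub_revisions:
            text = prev_dump_texts.get(rev.rev_id)              # reuse last full dump
            if text is None:
                text = fetch_from_external_storage(rev.rev_id)  # blocking
            out.write("%d\t%s\t%s\n" % (rev.rev_id, rev.page_title, text))

# Toy usage: two revisions already in the previous dump, one new.
stubs = [StubRevision(1, "Foo"), StubRevision(2, "Foo"), StubRevision(3, "Bar")]
fill_text_pass(stubs, {1: "text 1", 2: "text 2"}, "pages-meta-history.bz2")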
I have been thinking about this recently, and I think it could be done
like this:
Precondition: The last full dump (if not present, treat as empty) and
its greatest revid.
1a- Take a snapshot of the wiki state (page table?) and create
stub-meta-history.
1b- While reading the revisions, if a revid is greater than the last
dump's greatest revid (LDGR), add it to one of N list files (one file
per M revisions).
2- Run N processes that fetch those page texts and store them in a
new-format dump (the external-storage equivalent), one output file per
revid list file. If one process fails, just rerun it.
3- Read stub-meta-history and fill in each revision's text from the
last dump's contents. If a text is not in the previous dump and its
revid > LDGR, take it from the corresponding list-file dump; otherwise
fetch it from external storage, saving it to a separate file.
Revisions present in neither the last dump nor the incremental dumps
should only occur for restored pages. They can still block the process,
but since there are far fewer of them, a failure is much less likely.
4- Save the new dump's LDGR together with the new bzip2 dump.
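
To make the scheme concrete, here is a rough Python sketch of steps
1b-3. Everything in it is illustrative: M, LDGR, the list-file layout
and fetch_from_external_storage are invented names, and in practice
step 2 would be N independent, rerunnable processes rather than a
single loop.

import bz2, json

M = 1000             # revisions per incremental list file (assumption)
LDGR = 1_000_000     # greatest rev_id covered by the previous full dump

def fetch_from_external_storage(rev_id):
    return "text of revision %d" % rev_id      # stand-in

def write_revid_lists(new_rev_ids, prefix):
    # Step 1b: split the revids newer than LDGR into files of M each.
    paths = []
    for i in range(0, len(new_rev_ids), M):
        path = "%s-%05d.json" % (prefix, i // M)
        with open(path, "w") as f:
            json.dump(new_rev_ids[i:i + M], f)
        paths.append(path)
    return paths

def fetch_list(list_path, out_path):
    # Step 2: fetch the texts for one list file.  Each list is handled by
    # its own process; if it fails, only this one file has to be rerun.
    with open(list_path) as f:
        rev_ids = json.load(f)
    with bz2.open(out_path, "wt") as out:
        for rev_id in rev_ids:
            out.write("%d\t%s\n" % (rev_id, fetch_from_external_storage(rev_id)))

def merge_pass(stub_rev_ids, prev_dump_texts, incr_texts, out_path, extra_path):
    # Step 3: previous dump first, then the incremental files, and only as
    # a last resort (restored pages, etc.) a blocking external-storage call,
    # whose result is saved separately so it never has to be fetched twice.
    with bz2.open(out_path, "wt") as out, bz2.open(extra_path, "wt") as extra:
        for rev_id in stub_rev_ids:
            text = prev_dump_texts.get(rev_id)
            if text is None and rev_id > LDGR:
                text = incr_texts.get(rev_id)
            if text is None:
                text = fetch_from_external_storage(rev_id)
                extra.write("%d\t%s\n" % (rev_id, text))
            out.write("%d\t%s\n" % (rev_id, text))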
By making the N incremental dumps available, together with the much
smaller stub-meta-history, the latest full dump can be recreated from
the previous one (= less download size).
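
From the downloader's side that would look roughly like this (again
only a sketch, with invented file names and the toy tab-separated
format from the snippets above):

import bz2

def load_texts(path):
    # Read rev_id -> text pairs from one of the sketch-format dumps.
    texts = {}
    with bz2.open(path, "rt") as f:
        for line in f:
            rev_id, text = line.rstrip("\n").split("\t", 1)
            texts[int(rev_id)] = text
    return texts

def rebuild_texts(prev_dump_path, incr_paths):
    # Combine the previous full dump (the one big download) with the small
    # incremental downloads; the result is then fed to a step-3 style merge
    # pass over the new stub-meta-history, as sketched above.
    texts = load_texts(prev_dump_path)
    for path in incr_paths:
        texts.update(load_texts(path))
    return texts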
Wikimedia would still provide the full dumps, but you would only need
to download one the first time.
Comments?