Bleh. Someone pulling increments couldn't build a point-in-time snapshot; they would always need to pull the full dump. And we want people using point-in-time versions of the site, not mangled mixes.
Also, I expect that once 7zipped the increments will not be much smaller than the full dump, especially if partitioned by revid.
On 10/19/07, Platonides Platonides@gmail.com wrote:
Lars Aronsson wrote:
Or is it already done this way, behind the scenes, only that it isn't visible from the outside?
No.
AFAIK it is done as follows:
Precondition: the last full dump (if not present, treat as empty).
1- Take a snapshot of the wiki status (page table?) and create stub-meta-history.
2- Read stub-meta-history and fill in the page content from the last dump. If a page's content is not in the previous dump, get it from external storage with a blocking call.
Result: a bzip2ed full-history dump. The bzip2 dump is then uncompressed and 7zipped.
If a call to external storage fails, the process can't be resumed and the dump fails.
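The current flow described above can be sketched roughly as follows. This is a hypothetical illustration, not the real dump scripts: `stub_revisions`, `prev_dump` and `external_storage` are stand-in data structures, not MediaWiki's actual APIs.

```python
import bz2

def build_full_dump(stub_revisions, prev_dump, external_storage, out_path):
    """Fill each stub revision with text from the previous dump,
    falling back to external storage (a blocking call)."""
    with bz2.open(out_path, "wt", encoding="utf-8") as out:
        for rev in stub_revisions:
            text = prev_dump.get(rev["rev_id"])
            if text is None:
                # Blocking fetch: one failure here aborts the whole dump,
                # which is exactly the problem described above.
                text = external_storage[rev["rev_id"]]
            out.write(f"<revision id='{rev['rev_id']}'>{text}</revision>\n")
```

The point of the sketch is the single sequential loop: every revision missing from the previous dump goes through the blocking fallback, so the whole run is only as reliable as its worst external-storage call.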
I had recently been thinking about it, and I think it could be done like this:
Precondition: the last full dump (if not present, treat as empty) and its greatest revid.
1a- Take a snapshot of the wiki status (page table?) and create stub-meta-history.
1b- While reading the revisions, if a revid is greater than the last dump's greatest revid (LDGR), add it to one of N list files (one file per M revisions).
2- Run N processes grabbing those page contents. Store them in a new-format dump (the external-storage equivalent), one per revid list file. If one fails, just rerun it.
3- Read stub-meta-history and fill in the page content from the last dump. If a page text is not in the previous dump, grab it from the list file if its revid > LDGR; otherwise get it from external storage, saving it to a separate file.
Revisions present in neither the last dump nor the incremental dumps will only occur on restored pages. They can still block the process, but since there are far fewer of them, a failure is much less likely.
4- Save the new LDGR along with the new bzip2ed dump.
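Steps 1b and 2 above can be sketched like this. Again a hypothetical illustration with made-up names (`partition_new_revisions`, `fetch_batch`), not actual dump code; the essential property is that each batch is independent, so a failed fetch only requires rerunning that one batch.

```python
def partition_new_revisions(rev_ids, ldgr, m):
    """Step 1b: group revids greater than LDGR into batches of at most M,
    i.e. the N list files."""
    new_revs = [r for r in rev_ids if r > ldgr]
    return [new_revs[i:i + m] for i in range(0, len(new_revs), m)]

def fetch_batch(batch, external_storage):
    """Step 2: fetch one batch of page texts. If this raises,
    only this batch needs rerunning, not the whole dump."""
    return {rev_id: external_storage[rev_id] for rev_id in batch}
```

The N resulting batches could then be handed to N worker processes; retrying is idempotent because each batch writes its own output file.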
By making the N incremental dumps available, together with the smaller stub-meta-history, the latest full dump can be recreated from the previous one (= less download size).
Wikimedia would still provide the full dumps, but you would only need one the first time.
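The client-side recreation described above could look something like this. A minimal sketch under the same assumptions as before (in-memory dicts standing in for dump files):

```python
def recreate_full_dump(stub_rev_ids, prev_dump, incremental_batches):
    """Rebuild the latest full dump locally: resolve each stub revision
    against the previous full dump, falling back to the incremental
    batch files downloaded from the server."""
    increments = {}
    for batch in incremental_batches:
        increments.update(batch)
    return {
        rev_id: prev_dump.get(rev_id, increments.get(rev_id))
        for rev_id in stub_rev_ids
    }
```

A downloader that already has the previous full dump only needs the stub-meta-history plus the incremental batches, which is the bandwidth saving the proposal is after.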
Comments?
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l