On 24/11/11 11:00, Ariel T. Glenn wrote:
So this has been running all of one day now, and I expect it to break in
wild and crazy ways over the next period while we get the bugs out.
But, throwing caution to the winds...
I'm generating dumps each day for each non-closed non-private project,
of revisions added since the previous day. It uses the standard xml
format, writing out stubs and history files.
This is a sort of poor person's incremental dump. What do I mean by
that? Well... It doesn't contain a list of deletions, page moves,
undeletes. It just dumps the metadata and text for every revision
between X1 (last revision dumped the day before) and X2 (last revision
in db as of the time it's dumped). The reason for that? Dumping a
range of revisions is relatively easy. Accounting for page deletions,
moves etc. since the previous dump is hard, so that is an exercise left
to the reader :-P
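
A minimal sketch of the range selection described above, for anyone who wants to see it spelled out. It is not the actual dump code: the `Revision` record and the `revisions_to_dump` helper are hypothetical names, and it assumes X1 is exclusive (it was already dumped the day before) and X2 is inclusive.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    # Hypothetical, simplified record; real dumps carry much more metadata.
    rev_id: int
    page_id: int
    text: str


def revisions_to_dump(revisions, x1, x2):
    """Return revisions with x1 < rev_id <= x2, ordered by rev_id.

    x1: last revision id dumped the day before (assumed already written out).
    x2: last revision id in the db at dump time.
    """
    selected = [r for r in revisions if x1 < r.rev_id <= x2]
    return sorted(selected, key=lambda r: r.rev_id)


if __name__ == "__main__":
    revs = [Revision(rev_id=i, page_id=1, text="...") for i in range(1, 8)]
    for r in revisions_to_dump(revs, x1=3, x2=6):
        print(r.rev_id)
```

Note what the sketch cannot tell you: a page deleted or moved since the previous day leaves no trace in a pure revision-id range, which is exactly the limitation described above.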
Even with these limitations I'm hoping the data will be useful to folks.
Looks good. I think this can be very useful for the toolserver, where we
have all the db metadata except the text. I think I'll make up
something. (downloading now)
I may well be patching things tomorrow at this time for jobs that failed
to run, so feel free to point out issues, but also don't be surprised by
The md5sum file lacks the filenames. :)
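
For reference, a rough sketch of what I would expect each line to look like, assuming the usual `md5sum` layout of hex digest, two spaces, then the file name so that `md5sum -c` can check it. The helper name `md5sum_line` is made up for illustration and is not taken from the dump scripts.

```python
import hashlib
import os


def md5sum_line(path, chunk_size=1 << 20):
    """Return one line in the '<hexdigest>  <filename>' layout used by md5sum."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large dump files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {os.path.basename(path)}"
```

Writing one such line per dump file would let downloaders verify everything in one pass.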
The .dblist files in the branch are dummy.
What are the checkforbz2footer and writeuptopageid tools? (Which seem