[Xmldatadumps-l] "add/changes" dumps... highly experimental, you have been warned

Platonides platonides at gmail.com
Thu Nov 24 22:47:02 UTC 2011


On 24/11/11 11:00, Ariel T. Glenn wrote:
> Hello folks,
> 
> So this has been running all of one day now, and I expect it to break in
> wild and crazy ways over the next period while we get the bugs out.
> But, throwing caution to the winds...
> 
> I'm generating dumps each day for each non-closed non-private project,
> of revisions added since the previous day.  It uses the standard xml
> format, writing out stubs and history files.
> 
> This is a sort of poor person's incremental dump.  What do I mean by
> that? Well...  It doesn't contain a list of deletions, page moves,
> undeletes.  It just dumps the metadata and text for every revision
> between X1 (last revision dumped the day before) and X2 (last revision
> in db as of the time it's dumped).  The reason for that?  Dumping a
> range of revisions is relatively easy.  Accounting for page deletions,
> moves etc. since the previous dump is hard, so that is an exercise left
> to the reader :-P
> 
> Even with these limitations I'm hoping the data will be useful to folks.

Looks good. I think this can be very useful for the toolserver, where we
have all the db metadata except the text. I think I'll put something
together. (downloading now)
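
A rough sketch of the kind of consumer I have in mind, not tested against
the real files: stream a history file and yield the text for each revision
id, so it can be joined against the metadata we already have. It assumes
the usual export layout (page > title, page > revision > id/text); the
names local() and iter_revisions() are made up for this example.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(source):
    """Yield (page_title, rev_id, text) for each <revision> in a dump.

    `source` is a filename or a binary file object. Assumption: <title>
    precedes the <revision> elements inside each <page>, as in the
    standard export format.
    """
    page_title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            page_title = elem.text
        elif name == "revision":
            rev_id = None
            text = None
            # Only look at direct children, so <contributor><id> is ignored.
            for child in elem:
                child_name = local(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text = child.text or ""
            yield page_title, rev_id, text
            elem.clear()  # free the revision text once it has been consumed
```

For a bz2 file, passing bz2.open(path, "rb") as the source should work the
same way.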

> I may well be patching things tomorrow at this time for jobs that failed
> to run, so feel free to point out issues, but also don't be surprised by
> frequent outages.

The md5sum file lacks the filenames. :)
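
Without the filename, "md5sum -c" has nothing to match each digest against,
so for now the check has to be done by hand. A minimal sketch, assuming the
file holds one bare hex digest per line; md5_of_file() and
matches_bare_digest() are names invented for this example:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_bare_digest(path, digest_line):
    """Compare a file against a line containing a digest and no filename.

    Only the first whitespace-separated token of the line is used, so the
    same function keeps working if filenames are added later.
    """
    fields = digest_line.split()
    if not fields:
        return False
    return md5_of_file(path) == fields[0].lower()
```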

> The code is in my branch in svn, see 
> http://svn.wikimedia.org/viewvc/mediawiki/branches/ariel/xmldumps-backup/incrementals/

The .dblist files in the branch are dummies.
What are the checkforbz2footer and writeuptopageid tools? (They seem to be
unused, btw)



