To catch everyone who would stop reading right after the updates, let me
put the question first.
Who uses the abstract dumps? Anyone here? Anyone you know? Please
forward this to other lists where there might be users of these dumps.
We're trying to figure out if we need to keep generating them or not.
Now the updates.
We got more space for the dumps server, which means we don't need to
reduce the number of dumps kept for some time. You'll also see other
items showing up there soon-ish, not part of the XML dumps.
We've long had a request to run stubs early on in the dumps process so
that stats can be produced right away, and we finally have that going.
As of this month all dump runs will be done in stages: stubs first, then
tables, then page logs, and then the rest. I'm open to negotiation
about the order of jobs after the stubs, if folks have other
preferences.
We've worked around the eternal PHP memory leak(s), which now lets us
run 7 workers for small wikis at once. This means we'll get through
those dumps quicker.
Nemo_bis did some testing with a 7zip option that gives much faster
compression with a relatively small increase in size. I've adopted that
everywhere and we should see the difference, primarily in the big wikis,
this month and on.
New code brings new bugs. This month's stub and page log runs for
smaller wikis may have a duplicate entry at the end, the last item
appearing twice. This has been fixed for all future runs. It shouldn't
have a real impact on stats but folks importing from these dumps should
be aware.
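For anyone importing from this month's runs, here is a minimal sketch of
skipping a repeated item. It assumes you have already parsed the entries
into records carrying some id; the record shape and the function name
are made up for illustration and are not part of the dump tools.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def drop_repeated_neighbours(
    items: Iterable[T], key: Callable[[T], Hashable]
) -> Iterator[T]:
    """Yield items, skipping any whose key equals the previous item's key."""
    previous = object()  # sentinel that never equals a real key
    for item in items:
        current = key(item)
        if current == previous:
            continue  # same key as the item just before it: skip
        previous = current
        yield item


# Hypothetical records standing in for parsed stub or page log entries.
entries = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 3}]
cleaned = list(drop_repeated_neighbours(entries, key=lambda e: e["id"]))
```

It only compares each item with the one right before it, so it streams
through a large file without holding everything in memory, and it
catches the doubled last item without touching anything else.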
Happy June,
Ariel