On Wed, Jan 19, 2011 at 4:15 PM, Anthony <wikimail(a)inbox.org> wrote:
No, the question is why the relevant code is totally unrelated.
Well, you might ask why we don't just (selectively) dump the page,
revision, and text tables instead of doing XML dumps -- it seems like
it would be much simpler -- but I have no idea. Perhaps it's to ease
processing with non-MediaWiki tools, but I'm not sure why that's a
design goal compared to the simplicity of SQL dumps. Surely it
wouldn't be too hard to write a maintenance/ tool that just fetches
the revision text for a particular article at a particular point,
using only those three tables without any MediaWiki framework so it
can be used standalone. Not to mention, the text table is immutable,
so creating and publishing text table dumps incrementally should be
trivial.
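To make that a bit more concrete, here is a rough sketch in Python of what such a standalone lookup could look like, plus the "everything added since last time" query an incremental text dump would rest on. It is only a sketch under assumptions: it assumes the current schema (page, revision with rev_text_id, text with old_text and old_flags), it uses an in-memory SQLite database purely so the example runs on its own, and the names fetch_revision_text, text_rows_since and build_demo_db are made up for illustration. It does not handle text kept in external storage, and it does nothing about hidden or oversighted revisions.

```python
import sqlite3
import zlib


def fetch_revision_text(conn, namespace, title, timestamp):
    """Return the page text as of `timestamp` (MediaWiki's 14-character
    YYYYMMDDHHMMSS form), or None if the page had no revision by then."""
    row = conn.execute(
        """
        SELECT t.old_text, t.old_flags
        FROM page AS p
        JOIN revision AS r ON r.rev_page = p.page_id
        JOIN "text" AS t ON t.old_id = r.rev_text_id
        WHERE p.page_namespace = ? AND p.page_title = ?
          AND r.rev_timestamp <= ?
        ORDER BY r.rev_timestamp DESC, r.rev_id DESC
        LIMIT 1
        """,
        (namespace, title, timestamp),
    ).fetchone()
    if row is None:
        return None
    blob, flags = row
    if isinstance(flags, bytes):
        flags = flags.decode("ascii")
    flag_set = set((flags or "").split(","))
    if "external" in flag_set:
        # Assumption: external storage is out of scope for this sketch.
        raise NotImplementedError("text is kept in external storage")
    if isinstance(blob, str):
        blob = blob.encode("utf-8")
    if "gzip" in flag_set:
        # Raw deflate stream (no zlib header), hence wbits=-15.
        blob = zlib.decompress(blob, -15)
    return blob.decode("utf-8")


def text_rows_since(conn, last_old_id):
    """Return text rows with old_id greater than `last_old_id`.

    Relies on the assumption that text rows are only ever appended."""
    return conn.execute(
        'SELECT old_id, old_text, old_flags FROM "text" '
        "WHERE old_id > ? ORDER BY old_id",
        (last_old_id,),
    ).fetchall()


def build_demo_db():
    """Create a tiny in-memory database with one page and two revisions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                               rev_text_id INTEGER, rev_timestamp TEXT);
        CREATE TABLE "text" (old_id INTEGER PRIMARY KEY,
                             old_text BLOB, old_flags TEXT);
        """
    )
    deflater = zlib.compressobj(wbits=-15)
    packed = deflater.compress("second draft".encode("utf-8")) + deflater.flush()
    conn.execute("INSERT INTO page VALUES (1, 0, 'Example')")
    conn.executemany(
        'INSERT INTO "text" VALUES (?, ?, ?)',
        [(10, b"first draft", "utf-8"), (11, packed, "utf-8,gzip")],
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?, ?)",
        [(100, 1, 10, "20110101000000"), (101, 1, 11, "20110115000000")],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(fetch_revision_text(demo, 0, "Example", "20110110000000"))
    print(len(text_rows_since(demo, 10)), "new text row(s) since old_id 10")
```

The incremental part is nothing more than remembering the highest old_id already published and dumping the rows above it, which only holds as long as existing rows really are never rewritten.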
But I'm not going to criticize anyone from the peanut gallery here. I
don't actually know much about how the dumps work. Happy-melon is correct
to point out that it might not be trivial to snip private info (even
oversighted revisions) from the text table, depending on how it's
constructed. There might be other concerns too.
And there are lots of lower-priority things that are being done. And
lots of dollars sitting on the sidelines doing nothing.
That's a discussion for foundation-l, not wikitech-l.
On Thu, Jan 20, 2011 at 4:04 AM, Anthony <wikimail(a)inbox.org> wrote:
It wouldn't be trivial, but it wouldn't be particularly hard either.
Most of the work is already being done. It's just being done
inefficiently.
I'm glad to see you know what you're talking about here. Presumably
you've examined the relevant code closely and determined exactly how
you'd implement the necessary changes in order to evaluate the
difficulty. Needless to say, patches are welcome.