On Wed, Jan 19, 2011 at 4:15 PM, Anthony <wikimail@inbox.org> wrote:
> No, the question is why the relevant code is totally unrelated.
Well, you might ask why we don't just (selectively) dump the page, revision, and text tables instead of doing XML dumps -- it seems like that would be much simpler -- but I have no idea. Perhaps it's to ease processing with non-MediaWiki tools, but I'm not sure why that's a design goal compared to the simplicity of SQL dumps. Surely it wouldn't be too hard to write a tool in maintenance/ that fetches the revision text of a particular article as of a particular point in time, using only those three tables and no MediaWiki framework, so that it could be used standalone. Not to mention that the text table is immutable, so creating and publishing text table dumps incrementally should be trivial.
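For illustration, a minimal sketch of the kind of standalone lookup described above, assuming a heavily simplified version of the legacy MediaWiki schema (page, revision, and text tables with columns such as page_title, rev_text_id, rev_timestamp, old_text, and old_flags) loaded into SQLite. The function names `build_demo_db` and `revision_text_at` are hypothetical, and the sketch deliberately ignores compressed or externally stored text and revision deletion, both of which a real tool would have to handle.

```python
import sqlite3

# Hypothetical, simplified stand-in for the page/revision/text tables.
# Real MediaWiki tables have many more columns.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                       rev_text_id INTEGER, rev_timestamp TEXT);
CREATE TABLE "text" (old_id INTEGER PRIMARY KEY, old_text TEXT, old_flags TEXT);
"""


def build_demo_db():
    """Create an in-memory database with one page and two revisions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO page VALUES (1, 0, 'Foo')")
    conn.executemany('INSERT INTO "text" VALUES (?, ?, ?)',
                     [(10, "first", "utf-8"), (11, "second", "utf-8")])
    # Timestamps are 14-character YYYYMMDDHHMMSS strings, so string
    # comparison orders them chronologically.
    conn.executemany("INSERT INTO revision VALUES (?, ?, ?, ?)",
                     [(100, 1, 10, "20110101000000"),
                      (101, 1, 11, "20110115000000")])
    return conn


def revision_text_at(conn, namespace, title, timestamp):
    """Return the text of the latest revision of a page at or before timestamp."""
    row = conn.execute(
        """
        SELECT t.old_text, t.old_flags
        FROM page p
        JOIN revision r ON r.rev_page = p.page_id
        JOIN "text" t ON t.old_id = r.rev_text_id
        WHERE p.page_namespace = ? AND p.page_title = ? AND r.rev_timestamp <= ?
        ORDER BY r.rev_timestamp DESC, r.rev_id DESC
        LIMIT 1
        """,
        (namespace, title, timestamp),
    ).fetchone()
    if row is None:
        return None  # page missing, or no revision existed yet
    old_text, old_flags = row
    flags = set(old_flags.split(",")) if old_flags else set()
    # Only plain text is handled; gzip/external/object storage is out of scope.
    if not flags <= {"utf-8"}:
        raise NotImplementedError(f"unsupported old_flags: {old_flags}")
    return old_text


if __name__ == "__main__":
    db = build_demo_db()
    print(revision_text_at(db, 0, "Foo", "20110110000000"))
```

The ORDER BY ... LIMIT 1 pattern simply picks the newest revision not after the requested time; a real tool would also need an index on (rev_page, rev_timestamp) to make that cheap.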
But I'm not going to criticize anyone from the peanut gallery here. I don't actually know much about the dumps work. Happy-melon is correct to point out that it might not be trivial to snip private info (even oversighted revisions) from the text table, depending on how it's constructed. There might be other concerns too.
> And there are lots of lower-priority things that are being done. And lots of dollars sitting on the sidelines doing nothing.
That's a discussion for foundation-l, not wikitech-l.
On Thu, Jan 20, 2011 at 4:04 AM, Anthony <wikimail@inbox.org> wrote:
> It wouldn't be trivial, but it wouldn't be particularly hard either. Most of the work is already being done. It's just being done inefficiently.
I'm glad to see you know what you're talking about here. Presumably you've examined the relevant code closely and determined exactly how you'd implement the necessary changes in order to evaluate the difficulty. Needless to say, patches are welcome.