This would triple the disk space requirements for the data dumps (quadruple after the next major upgrade, quintuple the time after that...)
Surely it should only double the disk space requirements? XML format dumps I would say are the same size, or possibly even slightly larger, than SQL dumps. After all, the main content is the article text, and that's the same in both (apart from some extra slashes in SQL), but you lose the overhead of the XML gumpf. I'd be surprised if it wasn't a wash, or close enough as makes no real-world difference.
Besides, most people, I think, probably don't want every revision ever, nor do they probably want talk pages. In other words, one extra file, namely the SQL version of pages_public.xml.gz, whose size is going to be almost the same. For EN, the largest of all, that's only ~900 megs. For 900 megs it stops people whining.
you can transform to whatever local format you need. (And we provide software for you to do that if you like.)
What most people need is to get it into a database for further work, and the fact that there's software for this at all shows there's demand for it.
And what's the point of every user who wants an SQL dump downloading the XML version, downloading mwdumper, downloading mono, setting up mono, running mwdumper, and creating the dump? Wouldn't it make more sense to run the conversion software as part of a general fortnightly database dump cron job that did all the XML stuff, then took the XML file, converted it to SQL, and compressed it? That way the problem is solved once, in one place, forever, for all users who want SQL format.
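Roughly what I have in mind, as a sketch only: the filenames and paths below are illustrative guesses on my part, and I'm showing the Java build of mwdumper rather than the mono one, though the --format=sql:1.5 output mode is the documented one.

    #!/usr/bin/env python
    # Sketch of the fortnightly conversion step: take the finished XML dump,
    # run it through mwdumper to get SQL, and gzip the result.
    # Filenames are illustrative, not the actual backup script's.
    import subprocess

    XML_DUMP = "pages_public.xml.gz"   # produced by the existing dump job
    SQL_DUMP = "pages_public.sql.gz"   # the extra file being proposed

    def convert_xml_to_sql(xml_path, sql_path):
        """Pipe the XML dump through mwdumper and compress the SQL output."""
        with open(sql_path, "wb") as out:
            dumper = subprocess.Popen(
                ["java", "-jar", "mwdumper.jar", "--format=sql:1.5", xml_path],
                stdout=subprocess.PIPE)
            gz = subprocess.Popen(["gzip", "-c"], stdin=dumper.stdout, stdout=out)
            dumper.stdout.close()  # so gzip sees EOF when mwdumper finishes
            gz.communicate()
            if dumper.wait() != 0 or gz.returncode != 0:
                raise RuntimeError("XML-to-SQL conversion failed")

    if __name__ == "__main__":
        convert_xml_to_sql(XML_DUMP, SQL_DUMP)

(The same thing would obviously work as a shell one-liner in the cron job; the point is just that it runs once, server-side, after the XML dump finishes.)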
and maybe a couple people might use some of them every once in a while
Au contraire - most people who want dumps will use them all the time!
Tell you what: if you don't believe me, try making one, uploading it, and then with the _next_ dump adding a README that says "SQL dumps have been discontinued due to a lack of interest and demand from users. If you disagree, please address your comments to Brion on the wikitech-l mailing list (email: wikitech-l@wikimedia.org)". And then see what happens. :-)
Also, can we please have back the "is_redirect" field in the XML (and
Hmm, can probably do that yeah.
Sounds great, thank you!
All the best, Nick.
Nick Jenkins wrote:
This would triple the disk space requirements for the data dumps (quadruple after the next major upgrade, quintuple the time after that...)
Surely it should only double the disk space requirements? XML format dumps I would say are the same size, or possibly even slightly larger, than SQL dumps.
Don't forget that 1.4 and 1.5 have completely different page database layouts; one XML dump plus an SQL dump for each layout makes 1 + 2 = 3.
Besides, most people, I think, probably don't want every revision ever, nor do they probably want talk pages. In other words, one extra file, namely the SQL version of pages_public.xml.gz, whose size is going to be almost the same. For EN, the largest of all, that's only ~900 megs. For 900 megs it stops people whining.
If you can narrow down the request a bit that makes it more likely we'll slip something into the backup script. :)
Is that actually what people want? In which version? What about everyone who wants something slightly different?
And what's the point of every user who wants an SQL dump downloading the XML version, downloading mwdumper, downloading mono, setting up mono, running mwdumper, and creating the dump?
Why do that when MediaWiki comes with an import tool built-in? ;)
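For reference, the built-in route is just feeding the XML dump to maintenance/importDump.php on stdin; a rough sketch, where the MediaWiki directory and dump filename are hypothetical:

    #!/usr/bin/env python
    # Rough sketch of importing the XML dump with MediaWiki's built-in
    # maintenance/importDump.php, which reads the dump on standard input.
    # The MediaWiki directory and dump filename here are hypothetical.
    import gzip
    import subprocess

    def import_xml_dump(dump_path="pages_public.xml.gz",
                        mediawiki_dir="/var/www/mediawiki"):
        proc = subprocess.Popen(["php", "maintenance/importDump.php"],
                                stdin=subprocess.PIPE, cwd=mediawiki_dir)
        with gzip.open(dump_path, "rb") as dump:
            for chunk in iter(lambda: dump.read(1 << 20), b""):
                proc.stdin.write(chunk)
        proc.stdin.close()
        if proc.wait() != 0:
            raise RuntimeError("importDump.php failed")

    if __name__ == "__main__":
        import_xml_dump()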
Which version of MySQL do you want that dump for? (Or did you want it for PostgreSQL? Or Oracle?)
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
If you can narrow down the request a bit that makes it more likely we'll slip something into the backup script. :)
Anyway, there's a dump running currently, and it does not yet include such changes. The next one can, though, if we can agree on what to add. (Also the SQL dump output needs to actually be tested before we dedicate a few gigs to it.)
-- brion vibber (brion @ pobox.com)