This would triple the disk space requirements for the data dumps
(quadruple after the next major upgrade, quintuple the time after that...)
Surely it should only double the disk space requirements? I would say
XML-format dumps are about the same size as, or possibly even slightly
larger than, SQL dumps. After all, the main content is the article text,
and that's the same in both (apart from some extra escaping backslashes
in SQL), but with SQL you lose the overhead of the XML gumpf. I'd be
surprised if it wasn't a wash, or close enough to make no real-world
difference.
Besides, I think most people probably don't want every revision ever,
nor do they want talk pages. In other words, it's one extra file,
namely the SQL version of pages_public.xml.gz, which is going to be
almost the same size. For EN, the largest of all, that's only ~900 megs.
For 900 megs, it stops people whining.
you can transform to whatever local format you
need. (And we provide software for you to do that if you like.)
What most people need is to get it into a database for further work, and
the fact that there's software for this at all shows there's demand for it.
And what's the point of every user who wants an SQL dump downloading
the XML version, downloading mwdumper, downloading Mono, setting up
Mono, running mwdumper, and creating the dump themselves? Wouldn't it
make more sense to run the conversion software as part of a general
fortnightly database dump cron job that did all the XML stuff, then took
the XML file, converted it to SQL, and compressed it? That way the
problem is solved once, in one place, forever, for all users who want
SQL format.
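For what it's worth, the convert-and-compress step could be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not mwdumper and not anything Wikimedia actually runs: it writes to a hypothetical two-column table `page_text(title, text)` rather than the real page/revision/text schema, keeps only the last `<text>` seen in each `<page>` (fine for a current-revisions dump like pages_public.xml.gz), and uses MySQL-style backslash escaping. It streams the dump with `iterparse`, so even the EN file never has to fit in memory.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree reports namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def sql_quote(value):
    # MySQL-style string literal: escape backslashes first, then quotes.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def xml_to_sql(xml_stream, sql_stream):
    """Stream <page> elements from an XML dump and write one INSERT per page.

    Hypothetical target table: page_text(title, text). If a page has several
    revisions, only the last <text> seen is kept.
    """
    count = 0
    title, text = "", ""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            sql_stream.write(
                "INSERT INTO page_text (title, text) VALUES (%s, %s);\n"
                % (sql_quote(title), sql_quote(text))
            )
            count += 1
            title, text = "", ""
            elem.clear()  # drop the finished page's subtree to bound memory
    return count


def convert_file(xml_gz_path, sql_gz_path):
    # Read the compressed XML dump and write a compressed SQL file in one pass.
    with gzip.open(xml_gz_path, "rb") as src, \
            gzip.open(sql_gz_path, "wt", encoding="utf-8") as dst:
        return xml_to_sql(src, dst)


def convert_bytes(xml_bytes):
    # Small in-memory helper, handy for trying it out on a toy dump.
    out = io.StringIO()
    xml_to_sql(io.BytesIO(xml_bytes), out)
    return out.getvalue()
```

A dump cron job would then just call something like `convert_file("pages_public.xml.gz", "pages_public.sql.gz")` after the XML dump finishes (the output file name here is made up for illustration).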
and maybe a couple people might use some of them every once in
Au contraire - most people who want dumps will use them all the time!
Tell you what: If you don't believe me, try making one, uploading it,
and then _next_ dump add a README that says "SQL dumps have been
discontinued due to a lack of interest and demand from users. If you
disagree, please address your comments to Brion on the wikitech-l
mailing list (email: wikitech-l(a)wikimedia.org)". And then see what
happens. :-)
Also, can we please have back the "is_redirect" field in the XML (and
Hmm, can probably do that yeah.
Sounds great, thank you!
All the best,
Nick.