If you can narrow down the request a bit that makes it more likely
we'll slip something into the backup script. :)
[...snip...]
Is that actually what people want?
Well, given the disk space limitation, we know straight off the bat
that people can't have everything they want. In particular, I suspect
the all-revisions version in SQL would be way too large (an extra
40 gig).
That leaves only SQL versions of pages_current.xml.gz and
pages_public.xml.gz. If there is space for both then that would be
ideal, but I understand that may be asking too much.
Personally, although I don't want the talk pages, others might, so if
there's only space for one of these two, then I think a SQL version of
pages_current.xml.gz is the way to go (i.e. current revisions of all
pages), because it would be applicable for the widest possible
audience.
Why do that when MediaWiki comes with an import tool built-in? ;)
Because I'm not importing it into MediaWiki. I'm importing it into a
database, to then run non-MediaWiki software analysing the data - in
particular: looking for bad wiki syntax (
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Wiki_Syntax ),
suggesting new useful redirects and disambigs (
http://en.wikipedia.org/wiki/User:Nickj/Redirects ), and searching for
good potential wiki-links (
http://en.wikipedia.org/wiki/User:LinkBot
- [although for this one I haven't worked out how to get those
suggestions out to page authors in a really satisfactory way]). I want
to improve the Wikipedia, not mirror it.
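
To give a rough idea of the kind of analysis meant by "looking for bad
wiki syntax", here is a minimal sketch in Python. It is only an
illustration, not the actual checking software: the function name
unbalanced_link_lines and the sample text are made up for this example,
and it assumes the page text has already been pulled out of the
database as a plain string.

```python
def unbalanced_link_lines(wikitext):
    """Return (line_number, line) pairs where [[ and ]] counts differ.

    Hypothetical helper for illustration only: a crude per-line check
    for wiki-links that were opened but never closed (or vice versa).
    """
    problems = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        # str.count() counts non-overlapping occurrences.
        if line.count("[[") != line.count("]]"):
            problems.append((number, line))
    return problems


# Made-up sample text: the second line has a link that is never closed.
sample = "A [[good link]] here.\nA [[broken link] here.\nPlain text."

for number, line in unbalanced_link_lines(sample):
    print(number, line)
```

A per-line count like this will miss links that legitimately span
lines, so treat it as a starting point rather than a real checker.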
In which version?
Ideally something that works with MySQL 3.23.49, but if that's too old
then something that works with MySQL 4.0.24 instead.
What about everyone who wants something slightly different?
But the database dumps have never tried to be all things to all people.
Rather they've been snapshots of the various Wikipedias at
semi-regular intervals of time, which you can load into a database
(specifically MySQL, but if you can get it to work in another RDBMS,
then more power to you).
I'm not asking for something entirely new, rather I'm asking for an
equivalent replacement for what we already had.
(Also the SQL dump output needs to actually be tested before we
dedicate a few gigs to it.)
I'm happy to be a guinea pig. Just tell me where to get it from,
and I'll leave it importing overnight and report back any
errors/problems and whether it worked or not.
All the best,
Nick.