This may or may not be appropriate for this list -- this is where I
found most of the discussions on the matter, so I am posting here.
From reading the past couple of weeks of messages, I surmise that
there isn't a way to get a current data dump (for enwiki) while the
server is fubar.
I have the 20100312 dump, which seems to be more recent than those
available from archive.org, Amazon EC2, and elsewhere. However, even
this dump is significantly behind the current article revisions on
en.wikipedia.org.
I pulled 333 semi-random articles from the live API -- of those, 329
have significant content changes since the 20100312 dump.
Thus, my question:
What is the current preference/recommendation regarding pulling
significant quantities of articles (roughly 250k) from the live API
until the dumps are available again?
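For concreteness, the kind of pull I have in mind would look roughly
like the sketch below. This is only an illustration, not a sanctioned
approach: the batch size of 50 titles, the one-second pause, the
maxlag value of 5, and the User-Agent string are all assumptions, and
the helper names (batches, build_params, fetch) are made up for this
example.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
BATCH_SIZE = 50  # assumed per-request title limit for ordinary clients
USER_AGENT = "article-sync-sketch/0.1 (contact: you@example.org)"  # placeholder


def batches(titles, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` titles."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def build_params(batch):
    """Query parameters asking for the latest revision of each title."""
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids|timestamp|content",
        "titles": "|".join(batch),
        "format": "json",
        "maxlag": "5",  # ask the API to refuse the request when servers lag
    }


def fetch(batch, pause=1.0, retries=3):
    """Fetch one batch serially, backing off if the API reports lag."""
    url = API_URL + "?" + urllib.parse.urlencode(build_params(batch))
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    for _ in range(retries):
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        if data.get("error", {}).get("code") != "maxlag":
            time.sleep(pause)  # stay polite between successful requests
            return data
        time.sleep(5)  # assumed back-off interval while servers catch up
    raise RuntimeError("API still lagged after %d attempts" % retries)
```

At 50 titles per request, 250k articles works out to about 5,000
requests, issued one at a time rather than in parallel.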
Sidenote 1: I'm in the process of uploading the 20100312 dump to a
public web location, in case it is helpful to others.
Sidenote 2: Is there any discussion regarding ensuring current dumps
are mirrored in the future, say with archive.org?
--------------------------------------
James Linden
kodekrash(a)gmail.com
--------------------------------------