This may or may not be appropriate to this list -- this is where I found most of the discussions on the matter, so posting here.
From reading the past couple of weeks of messages, I surmise that there isn't a way to get a current data dump (for enwiki) while the server is fubar.
I have the 20100312 dump, which seems to be more recent than those available from archive.org, Amazon EC2, and elsewhere. However, even this dump is significantly behind the current article revisions on en.wikipedia.org.
I pulled 333 semi-random articles from the live API -- of those, 329 have significant content changes since the 20100312 dump.
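For anyone wanting to repeat this kind of spot check, here is a rough sketch of one way it could be scripted. It is not the script I actually used: the helper names, the batch size of 50 titles per request, and the maxlag value are assumptions for illustration. It also only flags pages whose latest revision is newer than the dump date, which is a weaker test than "significant content changes". The network call itself is left out; the functions only build the query URL and interpret a decoded JSON response.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed values for illustration only.
API_URL = "https://en.wikipedia.org/w/api.php"
DUMP_DATE = datetime(2010, 3, 12, tzinfo=timezone.utc)
BATCH_SIZE = 50  # assumed number of titles per request


def chunked(titles, size=BATCH_SIZE):
    """Yield successive batches of titles, at most `size` per batch."""
    for i in range(0, len(titles), size):
        yield titles[i:i + size]


def build_query_url(titles):
    """Build a query URL asking for the latest revision of each title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids|timestamp",
        "titles": "|".join(titles),
        "format": "json",
        "maxlag": 5,  # ask the API to refuse the request when replication lag is high
    }
    return API_URL + "?" + urlencode(params)


def changed_since_dump(response, dump_date=DUMP_DATE):
    """Return sorted titles whose latest revision is newer than dump_date.

    `response` is the decoded JSON body of one query request.
    """
    changed = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions = page.get("revisions")
        if not revisions:
            continue  # missing or invalid title, nothing to compare
        stamp = datetime.strptime(
            revisions[0]["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if stamp > dump_date:
            changed.append(page["title"])
    return sorted(changed)
```

Each URL from build_query_url would be fetched one at a time, with a descriptive User-Agent and a pause between requests, and the decoded JSON passed to changed_since_dump.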
Thus, my question:
What is the current preference/recommendation regarding pulling significant quantities of articles (250k/ish) from the live API, until the dumps are available again?
Sidenote 1: I'm in the process of uploading the 20100312 dump to a public web location, in case it is helpful to others.
Sidenote 2: Is there any discussion regarding ensuring current dumps are mirrored in the future, say with archive.org?
-------------------------------------- James Linden kodekrash@gmail.com --------------------------------------
2010/12/10 James Linden kodekrash@gmail.com
> This may or may not be appropriate to this list -- this is where I found most of the discussions on the matter, so posting here.
> From reading the past couple of weeks of messages, I surmise that there isn't a way to get a current data dump (for enwiki) while the server is fubar.
> I have the 20100312 dump, which seems to be more recent than those available from archive.org, Amazon EC2, and elsewhere. However, even this dump is significantly behind the current article revisions on en.wikipedia.org.
> I pulled 333 semi-random articles from the live API -- of those, 329 have significant content changes since the 20100312 dump.
> Thus, my question:
> What is the current preference/recommendation regarding pulling significant quantities of articles (250k/ish) from the live API, until the dumps are available again?
> Sidenote 1: I'm in the process of uploading the 20100312 dump to a public web location, in case it is helpful to others.
Thanks
> Sidenote 2: Is there any discussion regarding ensuring current dumps are mirrored in the future, say with archive.org?
http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
> James Linden kodekrash@gmail.com
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l