On Mon, 2007-10-29 at 12:47 +0200, Osnat Etgar wrote:
> I don't want all the history. I just want the current articles, so I am downloading pages-meta-current.xml.bz2 and pages-articles.xml.bz2
You don't want the history, but you want all of the discussion and user pages? Are you sure?
I'm testing a download of pages-meta-current.xml.bz2 right now to see whether it does indeed work, but it will take half a day to get it all. I'll post back and let you know what happens.
> Where else can I get the pages-meta-current? The previous dump? When I look for the previous one, I can only find a status.html file. Maybe I don't really need the pages-meta-current if I only want the current articles?
The server claims to have the right number of bytes, so let's see what happens when my download completes:
Server: Wikimedia dump service 20050523 (lighttpd)
Content-Length: 5780471837
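
For anyone who wants to run the same check on their own copy, here is a minimal sketch in Python of comparing the size the server advertises against the size of the file on disk. The function names and the example file path are my own placeholders, not anything the dump service provides, and the sketch assumes the server answers a HEAD request with a Content-Length header. A matching size only shows the download was not cut short; it says nothing about whether the contents are intact.

```python
import os
import urllib.request


def remote_content_length(url):
    """Return the Content-Length the server reports for url, or None if absent.

    Sends a HEAD request, so the file body itself is not downloaded.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


def download_is_complete(local_path, expected_bytes):
    """Return True if the file on disk is exactly expected_bytes long."""
    return os.path.getsize(local_path) == expected_bytes
```

With the header above, the check would be something like download_is_complete("pages-meta-current.xml.bz2", 5780471837), where the local file name is whatever you saved the dump as.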