Thanks for your reply. I am a newbie in this field.
1. I only want the articles. No history, no user information, no
I do want articles, lists and disambiguation.
Maybe I understood it wrong and I don't need the pages-meta-current? Only
2. I have some data from pages-meta-current in my database
2.1. I got an error in the middle, after over a million pages where
Exception in thread "main" java.io.IOException: An invalid XML
character (Unicode: 0x2) was found in the element content of the
at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
at org.mediawiki.dumper.Dumper.main(Unknown Source)
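
For anyone hitting the same exception: one possible workaround (only a sketch, not something confirmed for this dump) is to strip the characters that XML 1.0 forbids, such as 0x2, before the importer reads the file. The function names and file paths below are hypothetical, and the sketch assumes the dump is UTF-8 text compressed with bzip2. It removes the C0 control characters other than tab, LF and CR, plus U+FFFE and U+FFFF; it does not deal with unpaired surrogates.

```python
import bz2
import re

# Characters that XML 1.0 does not allow in element content:
# C0 controls except tab (0x9), LF (0xA) and CR (0xD), plus U+FFFE and U+FFFF.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with the forbidden characters removed."""
    return _INVALID_XML_CHARS.sub("", text)


def clean_dump(src_path: str, dst_path: str) -> None:
    """Read a .bz2 dump line by line and write a cleaned, uncompressed copy.

    Both paths are placeholders chosen by the caller. newline="" keeps the
    original line endings untouched.
    """
    with bz2.open(src_path, "rt", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(strip_invalid_xml_chars(line))
```

Calling something like `clean_dump("pages-articles.xml.bz2", "pages-articles.clean.xml")` would produce an uncompressed copy that could then be given to the importer. Processing line by line keeps memory use small, but the cleaned file is much larger than the compressed one.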
While we are here, I'd like to ask some more questions, if you don't
1. How do I read the data from MySQL? I don't understand how entries are
connected to one another and how I should read it.
2. Do I have to clean up the MySQL tables every time I want to insert
another dump? Either an update file or a totally different one?
3. Is there a way to get only the delta file instead of the whole dump
4. How do I add .sql.gz files to MySQL?
Thanks a lot for your answers
[mailto:firstname.lastname@example.org] On Behalf Of David A.
Sent: Monday, October 29, 2007 3:03 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] Dump is small
On Mon, 2007-10-29 at 12:47 +0200, Osnat Etgar wrote:
I don't want all the history. I just want the current articles, so I
am downloading pages-meta-current.xml.bz2 and pages-articles.xml.bz2
You don't want the history, but you want all of the discussion and user
pages? Are you sure?
I'm testing a download of the -meta-current.xml.bz2 right now to see if
it does indeed work, but it will take 1/2 day to get it all. I'll post
back and let you know what happens.
Where else can I get the pages-meta-current? The previous dump? When I
look for the previous one, I can only find a status.html file.
Maybe I don't really need the pages-meta-current if I only want the
The server claims to have the right number of bytes, so let's see what
happens when my download completes:
Server: Wikimedia dump service 20050523 (lighttpd)
David A. Desrosiers
Wikitech-l mailing list