Hi,
I would like to access the Wiki dumps for Wikipedia
and Wikitravel.
Essentially, I am looking to get dumps for some US cities from both these
sources for product research. However, it is not very clear from
http://dumps.wikimedia.org/backup-index.html which files are the relevant
ones to pick up.
Which Wikipedia do you want? What data do you want?
If you want the dumps of the English Wikipedia, find “enwiki” in the
index page, which will link to the most recent dump files (currently
http://dumps.wikimedia.org/enwiki/20120211/). If you want current text
of articles, you want the pages-articles dump (currently
http://dumps.wikimedia.org/enwiki/20120211/enwiki-20120211-pages-articles.x…,
7.6 GB).
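In case it helps: the pages-articles dump is a single bzip2-compressed XML file, so you can stream it without decompressing it to disk first. A minimal sketch in Python (the file name at the bottom is just a placeholder for whichever dump you downloaded):

```python
import bz2
import xml.etree.ElementTree as ET

def iter_page_titles(path):
    """Stream page titles from a bzip2-compressed pages-articles dump.

    MediaWiki exports use a namespaced XML schema whose version varies
    between dumps, so we match on the local tag name instead of
    hard-coding the namespace.
    """
    with bz2.open(path, "rb") as f:
        for event, elem in ET.iterparse(f, events=("end",)):
            if elem.tag.endswith("}title") or elem.tag == "title":
                yield elem.text
            # Free each finished <page> element; the dump is far too
            # large to hold in memory at once.
            if elem.tag.endswith("}page") or elem.tag == "page":
                elem.clear()

# Example usage (hypothetical local file name):
# for title in iter_page_titles("enwiki-pages-articles.xml.bz2"):
#     print(title)
```

The same pattern works for the `<text>` elements if you want the article wikitext rather than just the titles.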
Wikitravel is not run by Wikimedia, so you won't find their dump at
the same page. You can find information about their dump at
http://wikitravel.org/en/Wikitravel:Database_dump.
The other question I had was about the level of granularity at which data
is available in the dumps. The web API lets us retrieve a wiki entry, but
the result is not easily parsed into sections or any more granular form.
I was wondering if the dumps solve this problem.
No, the dumps contain the article text in the same form as what you
can edit as a user or what you can get through the API. So, if you want
to extract any useful information from it, you have to parse it
yourself.
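For what it's worth, if a rough split into sections is all that's needed, the wikitext heading syntax (== Heading ==) is regular enough to handle with a simple line scan. This is only a crude approximation, not a real wikitext parser; it ignores templates, <nowiki> blocks, and the many other corner cases:

```python
import re

# Matches wikitext headings such as "== History ==" or "=== Climate ===".
HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")

def split_sections(wikitext):
    """Split article wikitext into (heading, body) pairs.

    Text before the first heading is returned under the heading None.
    Crude by design: headings inside templates or tags are not handled.
    """
    sections = []
    heading, body = None, []
    for line in wikitext.splitlines():
        m = HEADING_RE.match(line)
        if m:
            sections.append((heading, "\n".join(body)))
            heading, body = m.group(2), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body)))
    return sections
```

For anything beyond this kind of rough slicing you would want a proper wikitext parser library rather than regular expressions.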
Petr Onderka
[[en:User:Svick]]