On 15/04/13 15:03, Imran Latif wrote:
And if I download all the SQL and XML files and populate my
tables using some utility, is the whole of Wikipedia's data then configured?
I mean to ask: does this dump provide me the whole data of Wikipedia, including content,
revision history, etc.? Or do I need something more?
Yes. Installing MediaWiki and its extensions to match what is
installed at Wikipedia is a separate step, of course.
What you won't get is:
- Deleted content
- User information (user list, preferences, watchlists, passwords, ips...)
Thanks all, but my question is still there :) Let me
rephrase. Suppose
we have the following dumps list:
http://dumps.wikimedia.org/enwiki/20130403/
There are many files there with names like
enwiki-20130403-pages-meta-current27.xml-p029625001p039009132.bz2
<http://dumps.wikimedia.org/enwiki/20130403/enwiki-20130403-pages-meta-current27.xml-p029625001p039009132.bz2>
of the same data, as well as different SQL files for different tables.
Q1: The SQL files clearly tell the mapping to database tables, but the ".bz2"
files do not say anything about the database mapping.
.bz2 means it's compressed with bzip2. Look at the extension before it
(.xml here) to see what the file contains.
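As a rough illustration of reading the name that way, here is a small Python sketch. The helper name describe is made up for this example, and the .gz entry is an assumption on my part (it is how the SQL table dumps are usually compressed, but that is not stated in this thread).

```python
import os

# Compression suffixes: .bz2 and .7z appear in this thread;
# .gz is an assumption (commonly used for the SQL table dumps).
COMPRESSION_SUFFIXES = {".bz2": "bzip2", ".7z": "7-zip", ".gz": "gzip"}


def describe(filename):
    """Guess (compression, content type) purely from the file name."""
    base, ext = os.path.splitext(filename)
    compression = COMPRESSION_SUFFIXES.get(ext)
    if compression is None:
        # Unknown or missing compression suffix: inspect the whole name.
        base = filename
    # XML pieces look like "...history27.xml-p038204154p039009132",
    # so ".xml" is not necessarily at the very end of the base name.
    if ".xml" in base:
        content = "xml"
    elif base.endswith(".sql"):
        content = "sql"
    else:
        content = "unknown"
    return compression, content


print(describe("enwiki-20130403-pages-meta-current27.xml-p029625001p039009132.bz2"))
```

So the first file is an XML export that happens to be bzip2-compressed; the compression suffix says nothing about which tables it feeds.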
Q2: There are multiple ".bz2"
files with the same name; should we take the
largest one?
No. There are no multiple .bz2 files with the same name.
You have for instance
enwiki-20130403-pages-meta-history27.xml-p038204154p039009132.bz2 and
enwiki-20130403-pages-meta-history27.xml-p038204154p039009132.7z, which
is the same file compressed in two different ways (bzip2 and 7-zip), but
that is different from eg.
enwiki-20130403-pages-meta-history27.xml-p038204154p039009132.bz2 as
it's a different piece of the content history.
In summary, you need all of them.
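To make that concrete, here is a sketch that groups a file listing by piece, ignoring the compression suffix. The regular expression is my own guess at the naming pattern based on the names quoted above, and I am assuming the pNNNpNNN part is the first and last page ID covered by the piece, which is not spelled out in this thread. group_pieces is a hypothetical helper, not part of any dump tooling.

```python
import re
from collections import defaultdict

# Assumed layout: <wiki>-<date>-<kind><part>.xml-p<first>p<last>.<compression>
PIECE_RE = re.compile(
    r"^(?P<wiki>[a-z_]+)-(?P<date>\d{8})-(?P<kind>[a-z-]+?)(?P<part>\d+)"
    r"\.xml-p(?P<first>\d+)p(?P<last>\d+)\.(?P<compression>bz2|7z)$"
)


def group_pieces(filenames):
    """Map (kind, part, first, last) to the set of compressions available."""
    pieces = defaultdict(set)
    for name in filenames:
        m = PIECE_RE.match(name)
        if m is None:
            continue  # not a split XML piece (e.g. an SQL table dump)
        key = (m["kind"], int(m["part"]), int(m["first"]), int(m["last"]))
        pieces[key].add(m["compression"])
    return dict(pieces)


listing = [
    "enwiki-20130403-pages-meta-history27.xml-p038204154p039009132.bz2",
    "enwiki-20130403-pages-meta-history27.xml-p038204154p039009132.7z",
    "enwiki-20130403-pages-meta-current27.xml-p029625001p039009132.bz2",
]
for key, compressions in sorted(group_pieces(listing).items()):
    print(key, sorted(compressions))
```

Each key is one piece of content. Where a piece shows up with both bz2 and 7z you only need one of the two, but you still need every distinct key of the dump type you chose.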
On the other hand, there's a bit of redundancy among the files.
pages-meta-history contains everything from pages-meta-current, which
itself contains everything in pages-articles. And the stub-* files
contain less than the corresponding full versions. You are unlikely to need
abstract.xml, etc.
In fact, you probably don't even need the meta files; pages-articles is
probably enough for you.
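If you just want to peek inside one of those files before importing anything, a sketch along these lines can stream page titles straight out of the compressed file without unpacking it to disk. It assumes the usual MediaWiki export layout of page elements with a title child; iter_titles and local_name are names I made up for the example, and the file name in the usage note is only a placeholder.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(path):
    """Yield the title of every <page> in a bzip2-compressed XML dump."""
    with bz2.open(path, "rb") as stream:
        # iterparse reads incrementally, so the dump is never fully in memory.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            for child in elem:
                if local_name(child.tag) == "title":
                    yield child.text
                    break
            # Drop the page's children once handled to keep memory use flat.
            elem.clear()
```

Usage would be something like `for title in iter_titles("some-pages-articles-file.xml.bz2"): print(title)`. This only inspects the XML; loading it into the MediaWiki tables still needs a proper import tool.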