On 15/04/13 15:03, Imran Latif wrote:
And if I download all the SQL and XML files and populate my tables using some utility, is the whole of Wikipedia's data then set up? I mean, does this dump give me all of Wikipedia's data, including content, revision history, etc., or do I need something more?
Yes. Installing MediaWiki and its extensions to match what is installed at Wikipedia is a separate step, of course. What you won't get is:
- Deleted content
- User information (user list, preferences, watchlists, passwords, IPs...)
Thanks all, but my question is still there :) Let me rephrase: suppose we have the following list of dumps: http://dumps.wikimedia.org/enwiki/20130403/
There are many files there, all of the same data, with names like enwiki-20130403-pages-meta-current27.xml-p029625001p039009132.bz2 (http://dumps.wikimedia.org/enwiki/20130403/enwiki-20130403-pages-meta-current27.xml-p029625001p039009132.bz2), as well as different SQL files for different tables.
Q1: The SQL files clearly indicate which database table they map to, but the ".bz2" files say nothing about the database mapping.
.bz2 means it's compressed with bzip2. Use the previous extension.
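As an illustration of "use the previous extension": once the bzip2 layer is removed, what is left is plain XML, so it can be read as a stream without unpacking the whole file first. This is only a rough sketch with the Python standard library; the function name `iter_titles` is made up for the example, and it assumes the usual dump layout of <page> elements that each contain a <title>.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_titles(path):
    """Yield page titles from a bzip2-compressed XML dump, one at a time."""
    # bz2.open decompresses on the fly, so the XML is never fully in memory.
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Tags are namespace-qualified ("{...}title"); keep the local name.
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "title":
                yield elem.text
            elif tag == "page":
                # Assumption: titles live inside <page>; drop the finished
                # page's children to keep memory use low.
                elem.clear()
```

Something like `for title in iter_titles("enwiki-20130403-pages-meta-current27.xml-p029625001p039009132.bz2"): print(title)` would then list the pages in that one piece.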
Q2: There are multiple ".bz2" files with the same name; should we take the largest one?
No. There are no multiple .bz2 files with the same name. You have, for instance, enwiki-20130403-pages-meta-history27.xml-p038204154p039009132.bz2 and enwiki-20130403-pages-meta-history27.xml-p038204154p039009132.7z, which are the same file compressed in two different ways (bzip2 and 7-zip), but that is different from, e.g., a .bz2 file with a different page range (the pNNNpNNN part) in its name, as that is a different piece of the content history. In summary, you need all of the pieces.
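To check which files in a listing are the same piece in two compressions, one option is to drop the compression suffix and group on what remains. A minimal sketch, not an official tool: the regex and the name `group_by_piece` are my own, and it assumes names of the form seen above (.xml or .sql, an optional -pNNNpNNN range, then an optional .bz2, .7z or .gz).

```python
import re
from collections import defaultdict

# Assumed name layout: <stem>.(xml|sql)[-p<first>p<last>][.<compression>]
NAME_RE = re.compile(
    r"^(?P<piece>.+?\.(?:xml|sql)(?:-p\d+p\d+)?)"
    r"(?:\.(?P<compression>bz2|7z|gz))?$"
)


def group_by_piece(filenames):
    """Map each piece (name without compression suffix) to its compressions."""
    groups = defaultdict(set)
    for name in filenames:
        match = NAME_RE.match(name)
        if match:  # names that do not fit the assumed layout are skipped
            groups[match.group("piece")].add(match.group("compression"))
    return dict(groups)
```

Given the two history27 names above, this returns a single key with the set {"bz2", "7z"}, so only one of the two needs downloading. Files with a different page range end up under a different key, and each of those is needed.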
On the other hand, there's a bit of redundancy among the files. pages-meta-history contains everything from pages-meta-current, which itself contains everything in pages-articles. And the stub-whatever files contain less than the corresponding whatever versions. You are unlikely to need the abstract.xml, etc.
In fact, you probably don't even need the meta files; pages-articles is probably enough for you.