New subject: Importing Wikipedia XML Dumps into MediaWiki

8 Mar 2009


      --- El dom, 8/3/09, O. O. olson_ot@yahoo.com escribió:
...
I thought that the
pages-articles.xml.bz2 (i.e. the XML Dump) contains 
the templates – but I did not find a way to do install it
separately.
No, it only contains a dump of the current version of each article (involving the page, revision and text tables in the DB).
...
Another thing I noticed (with the Portuguese Wiki which is
a much 
smaller dump than the English Wiki) is that the number of
pages imported 
by importDump.php and MWDumper differ i.e. importDump.php
had much more 
pages than MWDumper. That is way I would have preferred to
do this using 
  importDump.php.
On download.wikimedia.org/your_lang_here you can check how many pages were supposed to be included in each dump.
You also have other parsers you may want to check (in my experience, my parser was slightly faster than mwdumper):
http://meta.wikimedia.org/wiki/WikiXRay_Python_parser
...
Also in a previous post, you mentioned about taking care
about the 
“secondary link tables”. How do I do that? Does
“secondary links” refer 
to language links, external links, template links, image
links, category 
links, page links or something else?
On the same page for downloads you have a list of additional dumps in SQL format (then compressed with gzip). I guess you may also want to import them (but of course, you don't need a parser for them, they can be directly loaded in the DB).
Best,
F.
...
Thanks for your patience
O.O.

Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Importing Wikipedia XML Dumps into MediaWiki