--- On Sun, 8/3/09, O. O. <olson_ot(a)yahoo.com> wrote:
I thought that the pages-articles.xml.bz2 (i.e. the XML dump) contains
the templates, but I did not find a way to install them separately.
No, it only contains a dump of the current version of each article (involving the page,
revision and text tables in the DB).
Another thing I noticed (with the Portuguese Wiki, which is a much
smaller dump than the English Wiki) is that the number of pages imported
by importDump.php and MWDumper differ, i.e. importDump.php imported many
more pages than MWDumper. That is why I would have preferred to do this
using importDump.php.
On download.wikimedia.org/your_lang_here you can check how many pages were supposed to be
included in each dump.
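To compare that published figure with what actually ended up in your DB, a quick Python sketch can count the `<page>` elements in the compressed dump itself. It assumes the usual dump layout with each `<page>` opening tag on its own line; `count_pages` is a hypothetical helper name, and this is a line-based count rather than a full XML parse.

```python
import bz2


def count_pages(dump_path):
    """Count <page> elements in a bz2-compressed MediaWiki XML dump.

    Assumption: every <page> opening tag sits alone on its own line, as in
    the standard dump layout. This streams the file line by line, so the
    dump never has to be decompressed to disk or held in memory.
    """
    count = 0
    with bz2.open(dump_path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip() == "<page>":
                count += 1
    return count
```

For example, `count_pages("ptwiki-pages-articles.xml.bz2")` (hypothetical file name) gives a number you can set against the page table row counts left by importDump.php and MWDumper.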
There are also other parsers you may want to check (in my experience, my parser was
slightly faster than mwdumper):
http://meta.wikimedia.org/wiki/WikiXRay_Python_parser
Also, in a previous post you mentioned taking care of the
“secondary link tables”. How do I do that? Does “secondary links” refer
to language links, external links, template links, image links, category
links, page links, or something else?
The same download page also lists additional dumps in SQL format (compressed with
gzip). I guess you may also want to import them (of course, you don't need a parser
for them; they can be loaded directly into the DB).
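A minimal Python sketch of loading one of those gzipped SQL dumps (for instance the one for the pagelinks or categorylinks table) without decompressing it to disk first. It assumes a MySQL backend whose `mysql` client picks up credentials from an option file such as `~/.my.cnf`; the helper name `load_sql_gz`, the file name and the database name `wikidb` are illustrative assumptions.

```python
import gzip
import shutil
import subprocess


def load_sql_gz(sql_gz_path, client_cmd):
    """Stream a gzip-compressed SQL dump into a database client's stdin.

    client_cmd is the client command line, e.g. ["mysql", "wikidb"]
    (hypothetical database name; credentials assumed to come from an
    option file). Returns the client's exit code (0 on success).
    """
    with gzip.open(sql_gz_path, "rb") as src:
        proc = subprocess.Popen(client_cmd, stdin=subprocess.PIPE)
        # Copy decompressed SQL into the client in chunks.
        shutil.copyfileobj(src, proc.stdin)
        proc.stdin.close()  # signal end of input to the client
        return proc.wait()
```

Usage might look like `load_sql_gz("ptwiki-pagelinks.sql.gz", ["mysql", "wikidb"])` (hypothetical names). It does the same job as piping `gunzip -c` into the client from the shell.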
Best,
F.
Thanks for your patience
O.O.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l