for future reference
http://business.zimzaz.com/wordpress/2011/10/how-to-clone-wikipedia-mirror-a...
On 21/10/11 17:42, Fred Zimmerman wrote:
> for future reference
> http://business.zimzaz.com/wordpress/2011/10/how-to-clone-wikipedia-mirror-a...
Thanks for sharing this.
Some notes:

* Useless cats
  You seem to prefer
    cat enwiki-<date>.xml | mwimport > enwiki.sql
  to
    mwimport < enwiki-<date>.xml > enwiki.sql
  Both commands do the same thing, and the overhead of cat can usually be ignored, but given that it's a gigantic file I would still remove that extra process.
* Temporary enwiki.sql file
  You only seem to use enwiki.sql once, so I think you could have done
    mwimport < enwiki-<date>.xml | mysql -f ...
  and saved another 33 GB (assuming the mysql command doesn't fail :)
* Redirecting instead of piping
    cat enwiki.sql > mysql -f ...
  should have been
    cat enwiki.sql | mysql -f ...
  The redirect writes to a file named mysql instead of running the mysql client.
* SQL import instead of rebuildall.php
  Step 8 could have been skipped if you had instead imported the *links dumps (although they are not completely in sync with the XML).
* Blog entry to your inbox
  Not related to the entry above, but your entry
  http://business.zimzaz.com/wordpress/2011/10/wikitech-l-5-zimzaz-wfzgmail-co...
  contains just a link to your Gmail inbox:
  https://mail.google.com/mail/?shva=1&zx=wear6iy60ej8#label/Wikitech-l
  (so it's useless for anyone else).
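
For the three shell notes (useless cat, temporary file, redirect versus pipe), here is a small sketch that can be run without a dump or a database. Everything in it is a stand-in I made up for illustration: dump.xml is a two-line fake dump, fake_mwimport is just tr, and fake_mysql is just grep -c. It only shows how the shell wires the commands together, not what mwimport or mysql actually do.

```shell
#!/bin/sh
# Sketch with hypothetical stand-ins; no real mwimport or mysql is involved.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical tiny "dump" in place of enwiki-<date>.xml.
printf 'page one\npage two\n' > dump.xml

# Stand-in for mwimport: a filter reading stdin and writing stdout.
fake_mwimport() { tr 'a-z' 'A-Z'; }
# Stand-in for mysql: a consumer of stdin (here it counts matching lines).
fake_mysql() { grep -c 'PAGE'; }

# 1. Useless cat: both forms hand the same bytes to the filter,
#    but the second one starts one process less.
with_cat=$(cat dump.xml | fake_mwimport)
with_redirect=$(fake_mwimport < dump.xml)

# 2. No temporary file: the filter output goes straight to the consumer.
piped_count=$(fake_mwimport < dump.xml | fake_mysql)

# 3. '>' instead of '|': this does NOT run the consumer. It creates a
#    regular file called fake_mysql in the current directory.
fake_mwimport < dump.xml > fake_mysql

echo "same output with and without cat: $([ "$with_cat" = "$with_redirect" ] && echo yes || echo no)"
echo "lines seen by the consumer through the pipe: $piped_count"
echo "file created by the mistaken redirect: $(ls fake_mysql)"
```

With a real import it may also be worth checking the exit status of each stage of the pipe, since a failure in the first command does not stop the second one by default.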
xmldatadumps-l@lists.wikimedia.org