[Foundation-l] Archiving wikis

emijrp emijrp at gmail.com
Thu Apr 14 10:52:40 UTC 2011


Hi all;

We know that websites are fragile and that broken links are common.
Wikimedia (and some other communities, like Wikia) publish dumps of their
wikis, but that is not common. Most wiki communities don't publish any
backups, so their users have no recourse when a disaster occurs (data loss,
an attack) or when they want to fork. Of course they can use Special:Export,
but that requires a huge manual effort, and the images are not downloaded.

I'm working on WikiTeam,[1] a group inside Archive Team, where we want to
archive wikis, from Wikipedia to the tiniest ones. As I said, Wikipedia
publishes backups, so there is no problem there. For the rest, I have
developed a script that downloads all the pages of a wiki (using
Special:Export), merges them into a single XML file (like the pages-history
dumps) and downloads all the images (if you enable that option). That is
great if you want to keep a backup of your favorite wiki, clone a defunct
wiki (abandoned by its administrator), or move your wiki from a free
wikifarm to personal paid hosting.
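
To make the download-and-merge idea concrete, here is a minimal sketch. It
is not the WikiTeam script itself: the function names are hypothetical, and
the request parameters (title, pages, action, history) are an assumption
about how a typical MediaWiki Special:Export form is submitted. It only
builds the request and merges already-downloaded export documents, so it
does no networking and ignores images.

```python
import re
from urllib.parse import urlencode

# Matches one complete <page>...</page> element in an export document.
PAGE_RE = re.compile(r"<page>.*?</page>", re.DOTALL)


def export_request(index_url, title):
    """Return (url, body) for a POST asking Special:Export for one page.

    Assumption: the wiki accepts these form fields; some wikis cap or
    disable full-history export.
    """
    params = {
        "title": "Special:Export",
        "pages": title,
        "action": "submit",
        "history": "1",
    }
    return index_url, urlencode(params).encode("utf-8")


def merge_exports(exports):
    """Merge several Special:Export documents into one XML string.

    The header (everything before the first <page>) is taken from the
    first document only; every <page> element from every document is
    then appended, and the root element is closed once at the end.
    """
    header = None
    pages = []
    for xml in exports:
        if header is None:
            head, sep, _ = xml.partition("<page>")
            # If the first document has no pages, keep all of it up to
            # the closing root tag as the header.
            header = head if sep else xml.split("</mediawiki>")[0]
        pages.extend(PAGE_RE.findall(xml))
    if header is None:
        raise ValueError("no export documents given")
    return header + "\n".join(pages) + "\n</mediawiki>\n"
```

A real run would POST each request, collect the responses, and pass them to
merge_exports() to get one file shaped like a pages-history dump.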

Also, of course, you can use this script to retrieve the full histories of a
wiki and research them, just as you would with a Wikipedia dump.
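
As one example of that kind of research, the sketch below counts revisions
per page in a full-history XML file. It assumes the usual export layout
(<page> elements containing a <title> and one or more <revision> elements);
the function name is hypothetical, and it streams the file so large dumps
do not have to fit in memory.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def revisions_per_page(source):
    """Count <revision> elements under each <page>, keyed by page title.

    `source` is a filename or a binary file object.
    """
    counts = Counter()
    title = None
    revs = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            revs += 1
            elem.clear()  # free the revision text once it is counted
        elif name == "page":
            counts[title] = revs
            title, revs = None, 0
            elem.clear()
    return counts


if __name__ == "__main__":
    sample = (
        b"<mediawiki><page><title>A</title>"
        b"<revision><id>1</id></revision><revision><id>2</id></revision>"
        b"</page></mediawiki>"
    )
    print(revisions_per_page(io.BytesIO(sample)))
```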

We are running this script on several wikis and uploading the complete
histories to the download section,[2] building a little wiki library. Don't
be fooled by their sizes: they are 7-Zip archives, which usually expand to
many MB.

I hope you enjoy this script, and that you use it to back up your favorite
wikis and research them.

Regards,
emijrp

[1] http://code.google.com/p/wikiteam/
[2] http://code.google.com/p/wikiteam/downloads/list?can=1

