Thank you as always for this work. It is enormously helpful, for casual analysis as well as deep research.

SJ

On Feb 6, 2015 12:37 AM, "Federico Leva (Nemo)" nemowiki@gmail.com wrote:
I just published https://archive.org/details/wikia_dump_20141219 :
Snapshot of all the known Wikia dumps. Where a Wikia public dump was missing, we produced one ourselves. 9 broken wikis, as well as lyricswikia and some wikis for which dumpgenerator.py failed, are still missing; some Wikia XML files are incorrectly terminated and probably incomplete.
In detail, this item contains dumps for 268 902 wikis in total: 21 636 full dumps produced by Wikia and 247 266 full XML dumps produced by us. It also includes 5 610 image dumps produced by Wikia. Up to 60 752 wikis are missing. Nonetheless, this is the most complete Wikia dump ever produced.
We appreciate help to:
- verify the quality of the data (for Wikia dumps I only checked valid
gzipping; for WikiTeam dumps only XML well-formedness https://github.com/WikiTeam/wikiteam/issues/214 );
- figure out what's going on for those 60k missing wikis
https://github.com/WikiTeam/wikiteam/commit/a1921f0919c7b44cfef967f5d07ea4953b0a736d ;
- improve dumpgenerator.py management of huge XML files
https://github.com/WikiTeam/wikiteam/issues/8 ;
- fix anything else! https://github.com/WikiTeam/wikiteam/issues
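
For anyone who wants to repeat the two checks mentioned in the first item (valid gzipping, XML well-formedness), here is a minimal Python sketch. It is not the procedure actually used for this item: the function names and the chunk size are made up for illustration, and it relies only on the standard library. The XML check parses the file as a stream, so it should also cope with very large dumps.

```python
import gzip
import xml.etree.ElementTree as ET
import zlib


def gzip_is_valid(path, chunk_size=1 << 20):
    """Return True if the whole gzip file decompresses without error.

    Hypothetical helper: reads the stream in chunks and discards the data.
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers/CRC, EOFError covers truncated files.
        return False


def xml_is_well_formed(fileobj):
    """Return True if the XML read from fileobj parses to the end.

    Hypothetical helper: each element is cleared once it is closed,
    which keeps memory use low on large dumps.
    """
    try:
        for _event, elem in ET.iterparse(fileobj, events=("end",)):
            elem.clear()
        return True
    except ET.ParseError:
        # Raised for malformed markup and for incorrectly terminated files.
        return False
```

For a gzipped XML dump the two can be combined, e.g. `xml_is_well_formed(gzip.open(path, "rb"))`; an incorrectly terminated XML file makes the second function return False. Neither check says anything about whether a dump is complete.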
For all updates on Wikia dumps, please watchlist/subscribe to the feed of: http://archiveteam.org/index.php?title=Wikia (notable update: future Wikia dumps will be 7z).
Nemo
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l