Thank you as always for this work.
It is enormously helpful for casual analysis as well as for deep research.
SJ

On Feb 6, 2015 12:37 AM, "Federico Leva (Nemo)" <nemowiki@gmail.com> wrote:
I just published https://archive.org/details/wikia_dump_20141219 :

----

Snapshot of all the known Wikia dumps. Where a Wikia public dump was missing, we produced one ourselves. Nine broken wikis are still missing, as are lyricswikia and some wikis for which dumpgenerator.py failed. Some Wikia XML files are incorrectly terminated and probably incomplete.

In detail, this item contains dumps for 268 902 wikis in total: 21 636 full dumps produced by Wikia and 247 266 full XML dumps produced by us, plus 5610 image dumps produced by Wikia. Up to 60 752 wikis are still missing. Nonetheless, this is the most complete Wikia dump ever produced.

----

We would appreciate help to:
* verify the quality of the data (for Wikia dumps I only checked that the gzip files are valid; for WikiTeam dumps, only XML well-formedness: https://github.com/WikiTeam/wikiteam/issues/214 );
* figure out what's going on with the ~60k missing wikis https://github.com/WikiTeam/wikiteam/commit/a1921f0919c7b44cfef967f5d07ea4953b0a736d ;
* improve how dumpgenerator.py handles huge XML files https://github.com/WikiTeam/wikiteam/issues/8 ;
* fix anything else! https://github.com/WikiTeam/wikiteam/issues
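For anyone picking up the first item, here is a minimal sketch of the two checks mentioned above (gzip validity for Wikia dumps, XML well-formedness for WikiTeam dumps). It is not part of the WikiTeam tools: the script name check_dumps.py, the helper names, and the assumption that compressed dumps end in ".gz" are hypothetical. The XML check streams with iterparse so a huge file is never loaded whole, and an incorrectly terminated file (e.g. one missing its closing </mediawiki> tag) fails it. 7z archives need a third-party library and are left out here.

```python
import gzip
import sys
import zlib
import xml.etree.ElementTree as ET


def gzip_is_valid(path, chunk_size=1 << 20):
    """Decompress the whole file in chunks. The gzip module verifies the
    CRC-32 and length trailer at end of stream, so corruption or truncation
    surfaces as an exception."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile; EOFError means a truncated stream.
        return False


def xml_is_well_formed(path, opener=open):
    """Stream-parse the file so memory use stays far below the file size.
    An unterminated document raises ParseError at end of input."""
    try:
        with opener(path, "rb") as f:
            for _event, elem in ET.iterparse(f, events=("end",)):
                elem.clear()  # drop text and children of finished elements
        return True
    except (ET.ParseError, OSError, EOFError, zlib.error):
        return False


if __name__ == "__main__":
    # Hypothetical usage: python check_dumps.py FILE [FILE ...]
    for path in sys.argv[1:]:
        compressed = path.endswith(".gz")  # assumption about file naming
        ok_gzip = gzip_is_valid(path) if compressed else True
        ok_xml = ok_gzip and xml_is_well_formed(
            path, opener=gzip.open if compressed else open)
        print(f"{path}\tgzip={'ok' if ok_gzip else 'BAD'}\t"
              f"xml={'ok' if ok_xml else 'BAD'}")
```

For example, running it over a directory of *.xml.gz files prints one tab-separated line per file; image dumps are not XML and would need a different check.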

For all updates on Wikia dumps, please watchlist or subscribe to the feed of http://archiveteam.org/index.php?title=Wikia (notable update: future Wikia dumps will be in 7z format).

Nemo

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l