Thank you as always for this work.
It is enormously helpful, for casual analysis as well as deep research. SJ
On Feb 6, 2015 12:37 AM, "Federico Leva (Nemo)" <nemowiki(a)gmail.com>
wrote:
I just published
https://archive.org/details/wikia_dump_20141219 :
----
Snapshot of all the known Wikia dumps. Where a Wikia public dump was
missing, we produced one ourselves. 9 broken wikis, as well as lyricswikia
and some wikis for which dumpgenerator.py failed, are still missing; some
Wikia XML files are incorrectly terminated and probably incomplete.
In detail, this item contains dumps for 268 902 wikis in total, of which
21 636 are full dumps produced by Wikia and 247 266 are full XML dumps
produced by us; it also includes 5 610 image dumps produced by Wikia. Up to
60 752 wikis are missing.
Nonetheless, this is the most complete Wikia dump ever produced.
----
We would appreciate help to:
* verify the quality of the data (for Wikia dumps I only checked valid
gzipping; for WikiTeam dumps only XML well-formedness
https://github.com/WikiTeam/wikiteam/issues/214 );
* figure out what's going on for those 60k missing wikis
https://github.com/WikiTeam/wikiteam/commit/a1921f0919c7b44cfef967f5d07ea4953b0a736d ;
* improve dumpgenerator.py management of huge XML files
https://github.com/WikiTeam/wikiteam/issues/8 ;
* fix anything else!
https://github.com/WikiTeam/wikiteam/issues
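For anyone who wants to repeat the two checks mentioned in the first item (valid gzipping and XML well-formedness), here is a minimal Python sketch. It is not what dumpgenerator.py or the WikiTeam scripts do; the function names are made up for illustration, and it assumes the dump is a single gzip-compressed XML file. It reads the data in a streaming fashion, so a huge dump should not need to fit in memory, and a file that is cut off before its closing root tag is reported as not well-formed.

```python
import gzip
import xml.etree.ElementTree as ET
import zlib


def check_gzip(source, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error.

    `source` may be a file name or a binary file object.
    """
    try:
        with gzip.open(source, "rb") as fh:
            # Read to the end so the trailing CRC and length are verified.
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


def check_xml_wellformed(stream):
    """Return True if `stream` (a binary file object) is well-formed XML.

    The document is parsed incrementally; each element's contents are
    dropped once its end tag has been seen, to keep memory use low.
    """
    try:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            elem.clear()
        return True
    except ET.ParseError:
        return False


def check_gzipped_xml(source):
    """Hypothetical helper: run both checks on one gzip-compressed XML dump."""
    try:
        with gzip.open(source, "rb") as fh:
            return check_xml_wellformed(fh)
    except (OSError, EOFError, zlib.error):
        return False
```

A call such as check_gzipped_xml("somewiki_pages_full.xml.gz") (a made-up file name) returns True only if the file both decompresses cleanly and parses to the end. Well-formedness says nothing about whether all pages or revisions are actually present.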
For all updates on Wikia dumps, please watchlist/subscribe to the feed of:
http://archiveteam.org/index.php?title=Wikia (notable update: future
Wikia dumps will be 7z).
Nemo
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l