Benj. Mako Hill, 29/05/2014 18:27:
Without question, the current dumps put together by WikiTeam are an awesome resource for folks wanting to do Wikia research.
Thanks. I hope someone will use them. :-)
That said, they are a strange sample and it's not clear how representative they are of other Wikia wikis. This makes it hard to use the sample to confidently answer a question like Piotr's.
Earlier dumps are basically random, but the one we made last winter should include (save some errors) all the biggest wikis.
Basically, logged-in users have to "request" every dump individually and by hand. Once a dump is requested, it will be created and put in S3 and then seems to be kept around for at least several months. I've found some shockingly big and important wikis without dumps and 14k is a tiny proportion of all wikis! :-(
Wikia has some 400k wikis, but at least 350k of them have only one ns0 page. Some of the "shockingly big" wikis may be excluded from dumps for copyright reasons (the biggest example is lyricswiki).
If I can help or provide resources to help get a new comprehensive set of Wikia dumps, let me know.
Other than bugfixes for wikiteam [1], what we'd like to have is an up-to-date list of all relevant (or non-empty) Wikia wikis, say the 20-30k biggest. The list I used was given to me by an unnamed person a few years ago and I've always been too lazy to update it. It doesn't take much work if you're not afraid of hitting the Wikia APIs a bit. ;-)
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=59943
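For what it's worth, since every Wikia wiki runs MediaWiki, the standard api.php siteinfo statistics are enough to filter a candidate list down to non-empty wikis. A minimal sketch in Python — the *.wikia.com domain pattern, the candidate names, and the article-count threshold are my assumptions; the full candidate list would still have to come from Wikia itself:

```python
# Sketch: filter candidate Wikia wikis by content-page count, using the
# standard MediaWiki API (action=query, meta=siteinfo, siprop=statistics).
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def is_relevant(siteinfo_json: str, min_articles: int = 2) -> bool:
    """True if the wiki has at least min_articles content (ns0) pages."""
    stats = json.loads(siteinfo_json)["query"]["statistics"]
    return stats.get("articles", 0) >= min_articles


def fetch_statistics(api_url: str) -> str:
    """Fetch the siteinfo statistics from a MediaWiki api.php endpoint."""
    params = urlencode({
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    })
    with urlopen(f"{api_url}?{params}") as resp:
        return resp.read().decode("utf-8")


def list_relevant(subdomains):
    """Yield the subdomains whose wiki passes the non-empty check.
    The domain pattern below is an assumption for illustration."""
    for name in subdomains:
        api = f"http://{name}.wikia.com/api.php"
        if is_relevant(fetch_statistics(api)):
            yield name

# Usage (hypothetical candidates, not run here):
#   for name in list_relevant(["muppet", "starwars"]):
#       print(name)
```

Throttling and error handling are left out; in practice you'd want to sleep between requests and skip wikis whose api.php is unreachable.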
Nemo