I wanted to cite a statistic on whether vandalism at Wikia is higher or lower than on Wikipedia, but couldn't find anything. Is anyone familiar with research that I may want to check out? I am drawing almost nothing for studies of Wikia, outside the recent paper by Aaron Shaw and Benjamin Mako Hill (CC-ed), which did not however focus on vandalism. Wikia (the largest wiki farm?) appears to be drastically under-researched...
Piotr Konieczny, 29/05/2014 05:56:
Wikia (the largest wiki farm?) appears to be drastically under-researched...
Part of the reason may be that they don't offer regular data dumps. But WikiTeam has remedied and recovered dumps for most of their top 14k wikis (as well as all images): https://archive.org/details/wikia_dump_20140125 https://archive.org/search.php?query=wikia_dump
It's possible to release updates if needed; just tell us well in advance, because it takes weeks or months due to aggressive throttling and blocking policies.
Nemo
That's intriguing, any idea why Wikia is being so unfriendly with that? Are they doing the usual corporation "our data is ours/secrecy is good/we don't need your research as it may reveal things we don't want the world/competitors to know about" shtick?
--
Piotr Konieczny, PhD http://hanyang.academia.edu/PiotrKonieczny http://scholar.google.com/citations?user=gdV8_AEAAAAJ http://en.wikipedia.org/wiki/User:Piotrus
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Piotr Konieczny, 29/05/2014 12:22:
That's intriguing, any idea why Wikia is being so unfriendly with that? Are they doing the usual corporation "our data is ours/secrecy is good/we don't need your research as it may reveal things we don't want the world/competitors to know about" shtick?
Nothing like that: they consistently reply that dumps are wonderful and that, in their opinion, they do all they should. http://archiveteam.org/index.php?title=Wikia#Download When you explain to them that it's not enough, they don't disagree, but passive-aggressively refer you to someone else in the chain of command (I think I covered it all by now). Their current excuse is that they're not sure they have enough disk space on http://s3.amazonaws.com/wikia_xml_dumps/*
Nemo
<quote who="Federico Leva (Nemo)" date="Thu, May 29, 2014 at 08:40:16AM +0200">
Piotr Konieczny, 29/05/2014 05:56:
Wikia (the largest wiki farm?) appears to be drastically under-researched...
Part of the reason may be that they don't offer regular data dumps. But WikiTeam has remedied and recovered dumps for most of their top 14k wikis (as well as all images): https://archive.org/details/wikia_dump_20140125 https://archive.org/search.php?query=wikia_dump
Wikia published comprehensive dumps for all of their wikis until sometime in 2010. This is how Kittur and Kraut could write the paper they did.
Without question, the current dumps put together by WikiTeam are an awesome resource for folks wanting to do Wikia research. That said, they are a strange sample and it's not clear how they are representative of other Wikia wikis. This makes it hard to use the sample to confidently answer a question like Piotr's.
Basically, logged-in users have to "request" every dump individually and by hand. Once a dump is requested, it will be created and put in S3 and then seems to be kept around for at least several months. I've found some shockingly big and important wikis without dumps and 14k is a tiny proportion of all wikis! :-(
If I can help or provide resources to help get a new comprehensive set of Wikia dumps, let me know.
Regards, Mako
Benj. Mako Hill, 29/05/2014 18:27:
Without question, the current dumps put together by WikiTeam are an awesome resource for folks wanting to do Wikia research.
Thanks. I hope someone will use them. :-)
That said, they are a strange sample and it's not clear how they are representative of other Wikia wikis. This makes it hard to use the sample to confidently answer a question like Piotr's.
Earlier dumps are basically random, but the one we made last winter should include (save some errors) all the biggest wikis.
Basically, logged-in users have to "request" every dump individually and by hand. Once a dump is requested, it will be created and put in S3 and then seems to be kept around for at least several months. I've found some shockingly big and important wikis without dumps and 14k is a tiny proportion of all wikis! :-(
Wikia has some 400k wikis, but at least 350k of them have only one ns0 page. Some of the "shockingly big" wikis may be excluded from dumps for copyright reasons (the biggest example is lyricswiki).
If I can help or provide resources to help get a new comprehensive set of Wikia dumps, let me know.
Other than bugfixes for wikiteam [1], what we'd like to have is an up-to-date list of all relevant (or non-empty) Wikia wikis, say the 20-30k biggest. The list I used was given to me by an unnamed person a few years ago and I've always been too lazy to update it. It doesn't take much if you're not afraid of hitting Wikia APIs a bit. ;-) https://bugzilla.wikimedia.org/show_bug.cgi?id=59943
Nemo
<quote who="Piotr Konieczny" date="Thu, May 29, 2014 at 12:56:25PM +0900">
I wanted to cite a statistic on whether vandalism at Wikia is higher or lower than on Wikipedia, but couldn't find anything. Is anyone familiar with research that I may want to check out? I am drawing almost nothing for studies of Wikia, outside the recent paper by Aaron Shaw and Benjamin Mako Hill (CC-ed), which did not however focus on vandalism.
This is the closest article I know of. It doesn't talk about "vandalism" per se, but it does look at levels of reverts and compares them to levels in Wikipedia. I think it's actually pretty close to what you are looking for:
Kittur, Aniket, and Robert E. Kraut. “Beyond Wikipedia: Coordination and Conflict in Online Production Groups.” In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, 215–24. Savannah, Georgia, USA: ACM, 2010. doi:10.1145/1718918.1718959.
Wikia (the largest wiki farm?) appears to be drastically under-researched...
I agree with this completely! Part of the reason that there is so much research about Wikipedia is that the WMF goes to incredible lengths to make things easy for researchers by providing datasets, etc. Basically everybody else, including Wikia, does less.
Regards, Mako