Hi,
I think this is the first time a full XML dump of Citizendium is publicly available.[1] (CZ offers dumps, but only the last revision of each article,[2] and our previous efforts produced corrupted and incomplete dumps.) It contains 168,262 pages and 753,651 revisions (9 GB of XML, 99 MB compressed with 7z). I think it may be useful for researchers, including for quality analysis.
It was generated using the WikiTeam tools.[3] This is part of our task force's effort to back up thousands of wikis around the Internet.[4]
Regards, emijrp
[1] http://archive.org/details/wiki-encitizendiumorg
[2] http://en.citizendium.org/wiki/CZ:Downloads
[3] http://code.google.com/p/wikiteam/
[4] http://code.google.com/p/wikiteam/wiki/AvailableBackups
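A minimal Python sketch of how the page and revision counts above could be re-checked from the dump itself, assuming the standard MediaWiki XML export format and that the 7z archive has already been extracted (the file name below is a placeholder, not the actual name in the Archive.org item):

    import xml.etree.ElementTree as ET

    def count_pages_and_revisions(dump_path):
        """Stream a MediaWiki XML export and count <page> and <revision> elements."""
        pages = revisions = 0
        # iterparse streams the file, so even a multi-gigabyte dump fits in modest memory
        for _, elem in ET.iterparse(dump_path, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # drop the export-schema XML namespace
            if tag == "revision":
                revisions += 1
            elif tag == "page":
                pages += 1
                elem.clear()  # free the finished subtree
        return pages, revisions

    # "citizendium-history.xml" is a hypothetical name for the extracted dump file
    pages, revisions = count_pages_and_revisions("citizendium-history.xml")
    print(pages, "pages,", revisions, "revisions")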
Just wow... Thank you WikiTeam and task force! Is scraperwiki involved? SJ
Hi SJ,
ScraperWiki is about playing with data (like a cool Excel), while WikiTeam extracts full page histories and images, so the two are unrelated.
We surpassed 3,000 preserved wikis yesterday (http://code.google.com/p/wikiteam/wiki/AvailableBackups), and the number is growing quickly. We upload the dumps to the Internet Archive; those folks know a bit about long-term preservation.
Wiki preservation is part of my research on wikis, and later I want to compare these wiki communities with Wikipedia. I'm open to suggestions.
Regards, emijrp
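One rough illustration of the kind of cross-wiki comparison mentioned above is a per-dump summary of community activity. This again assumes the standard MediaWiki export format; anonymous edits (which carry an <ip> element instead of <username>) are simply ignored, and the file name is a placeholder:

    import xml.etree.ElementTree as ET
    from collections import Counter

    def community_summary(dump_path):
        """Return (pages, revisions, edits-per-username Counter) for one dump."""
        pages = revisions = 0
        edits_by_user = Counter()
        for _, elem in ET.iterparse(dump_path, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "revision":
                revisions += 1
            elif tag == "username":
                # one <username> per non-anonymous revision's <contributor>
                edits_by_user[elem.text] += 1
            elif tag == "page":
                pages += 1
                elem.clear()
        return pages, revisions, edits_by_user

    pages, revisions, by_user = community_summary("some-wiki-history.xml")
    print(pages, "pages /", revisions, "revisions /", len(by_user), "named contributors")
    print("revisions per page:", round(revisions / max(pages, 1), 2))

The same few numbers computed over many dumps would give a first, very coarse basis for comparing these communities with Wikipedia.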
Dear emijrp,
I have worked on comparing special-purpose wikis in the past (please have a look at http://www.hucompute.org/data/pdf/mehler_2008_a.pdf) and am interested in collaborating on the structural classification of such wikis based on their author and text networks. What do you think about a cooperation?
Best wishes,
Alexander
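To make the author-network idea concrete, here is a small, purely illustrative sketch (not the method of the paper linked above) that links two named contributors whenever they have edited the same page of a MediaWiki XML dump; the element names come from the standard export schema and the file name is a placeholder:

    import itertools
    import xml.etree.ElementTree as ET
    from collections import defaultdict

    def coauthor_network(dump_path):
        """Count, for each pair of usernames, how many pages both have edited."""
        edge_weight = defaultdict(int)
        authors_on_page = set()
        for _, elem in ET.iterparse(dump_path, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "username" and elem.text:
                authors_on_page.add(elem.text)
            elif tag == "page":
                # connect every pair of named contributors who touched this page
                for a, b in itertools.combinations(sorted(authors_on_page), 2):
                    edge_weight[(a, b)] += 1
                authors_on_page.clear()
                elem.clear()
        return edge_weight

    edges = coauthor_network("some-wiki-history.xml")
    top_pairs = sorted(edges.items(), key=lambda kv: kv[1], reverse=True)[:10]
    print("ten most frequent co-author pairs:", top_pairs)

Simple graph statistics of such networks (degree distributions, clustering) could then serve as one set of features for the kind of structural classification described above.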