Hi Sj
ScraperWiki is about playing with data (like a cool Excel), while WikiTeam extracts full page histories and images; the two projects are unrelated.
We surpassed 3,000 preserved wikis yesterday http://code.google.com/p/wikiteam/wiki/AvailableBackups and the list is growing quickly. We upload the dumps to the Internet Archive, whose folks know a bit about long-term preservation.
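For anyone curious about the mechanics: the WikiTeam tools talk to the standard MediaWiki API and page through every revision of every article. The snippet below is only a rough sketch of that idea, not the actual WikiTeam code; the API URL and page title are placeholder assumptions.

    # Illustrative sketch (not WikiTeam's implementation): fetch the full
    # revision history of one page via the standard MediaWiki API.
    import requests

    API = "http://en.citizendium.org/api.php"  # assumed API endpoint

    def fetch_history(title):
        """Yield every revision of `title`, oldest first, following rvcontinue."""
        params = {
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvprop": "ids|timestamp|user|comment|content",
            "rvlimit": "max",
            "rvdir": "newer",
            "format": "json",
        }
        while True:
            data = requests.get(API, params=params).json()
            page = next(iter(data["query"]["pages"].values()))
            for rev in page.get("revisions", []):
                yield rev
            if "continue" not in data:
                break
            params.update(data["continue"])  # continue from the last revision

    for rev in fetch_history("Biology"):  # "Biology" is just an example title
        print(rev["timestamp"], rev.get("user"))

The real tools do much more (namespace discovery, image downloads, resuming interrupted dumps, writing XML in the Special:Export format), but the paging loop above is the core of grabbing a complete history.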
Wiki preservation is part of my research on wikis, and later I want to compare these wiki communities with Wikipedia. I'm open to suggestions.
Regards, emijrp
2012/8/10 Samuel Klein meta.sj@gmail.com
Just wow... Thank you WikiTeam and task force! Is scraperwiki involved? SJ
On Tue, Aug 7, 2012 at 5:18 AM, emijrp emijrp@gmail.com wrote:
Hi;
I think this is the first time a full XML dump of Citizendium is publicly available[1] (CZ offers dumps, but only with the last revision of each article[2], and our previous attempts produced corrupted or incomplete dumps). It contains 168,262 pages and 753,651 revisions (9 GB; 99 MB compressed in 7z). I think it may be useful for researchers, for example for quality analysis.
It was generated using WikiTeam tools.[3] This is part of our task force to make backups of thousands of wikis around the Internet.[4]
Regards, emijrp
[1] http://archive.org/details/wiki-encitizendiumorg
[2] http://en.citizendium.org/wiki/CZ:Downloads
[3] http://code.google.com/p/wikiteam/
[4] http://code.google.com/p/wikiteam/wiki/AvailableBackups
--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT http://code.google.com/p/avbot/ | StatMediaWiki http://statmediawiki.forja.rediris.es | WikiEvidens http://code.google.com/p/wikievidens/ | WikiPapers http://wikipapers.referata.com | WikiTeam http://code.google.com/p/wikiteam/
Personal website: https://sites.google.com/site/emijrp/
-- Samuel Klein @metasj w:user:sj +1 617 529 4266