Re: [Xmldatadumps-l] [Wiki-research-l] Wikipedia dumps downloader

28 Jun 2011


      emijrp wrote:
...
Hi;
@Derrick: I don't trust Amazon.
I disagree. Note that we only need them to keep a redundant copy of a 
file. If they tried to tamper with the file, we could detect it with the 
hashes (which should be properly secured, that's no problem).
I'd like to have hashes of the uncompressed XML dump content rather than 
of the compressed file, though, so that a dump could be stored with better 
compression without weakening the integrity check.
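
As a rough illustration of hashing the content rather than the compressed file, here is a minimal Python sketch. The function name content_digest, the choice of SHA-256, and the suffix-based format detection are all assumptions made for the example, not anything the dump process currently does.

```python
import bz2
import gzip
import hashlib
import lzma

# Map file suffixes to the standard-library opener that decompresses them.
_OPENERS = {".bz2": bz2.open, ".gz": gzip.open, ".xz": lzma.open}


def content_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the decompressed bytes stored at `path`.

    Hypothetical helper: files with an unrecognised suffix are read as-is.
    """
    opener = open
    for suffix, candidate in _OPENERS.items():
        if path.endswith(suffix):
            opener = candidate
            break

    digest = hashlib.new(algorithm)
    with opener(path, "rb") as stream:
        # Read in fixed-size chunks so multi-GB dumps never sit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With this approach the same XML recompressed from bzip2 to xz (or 7z, given a suitable reader) keeps the same digest, so the published hash stays valid whatever container a mirror chooses.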
...
Really, I don't trust Wikimedia
Foundation either. They can't and/or they don't want to provide image
dumps (which is worse?).
Wikimedia Foundation has provided image dumps several times in the past, 
and also rsync3 access to some individuals so that they could clone it.
It's like the enwiki history dump. An image dump is complex, and even 
less useful.
...
The community donates images to Commons, the community
donates money every year, and now the community needs to develop software
to extract all the images and pack them,
There's no *need* for that. In fact, such a script would be trivial to 
write on the toolserver.
...
and of course, host them in a permanent way. Crazy, right?
WMF also tries hard not to lose images. We want to provide some 
redundancy on our own. That's perfectly fine, but it's not a 
requirement. Consider that WMF could be automatically deleting page 
history older than a month, or images not used on any article. *That* 
would be a real problem.
...
@Milos: Instead of splitting the image dump using the first letter of
filenames, I thought about splitting it by upload date (YYYY-MM-DD).
So the first chunks (2005-01-01) will be tiny, and recent ones will be
several GB each (a single day).
Regards,
emijrp
I like that idea since it means the dumps are static. They could be 
written to tape, kept in a safe, and never taken out unless data loss 
occurs.
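
To make the date-based split concrete, here is a small Python sketch of the grouping step. It assumes each upload is available as a (filename, timestamp) pair with an ISO 8601 UTC timestamp such as 2005-01-01T12:00:00Z. The function name chunk_by_upload_date and the sample filenames are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def chunk_by_upload_date(uploads):
    """Group (filename, timestamp) pairs into per-day chunks.

    Returns a dict mapping 'YYYY-MM-DD' to a sorted list of filenames.
    The timestamp format is an assumption made for this example.
    """
    chunks = defaultdict(list)
    for filename, timestamp in uploads:
        day = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m-%d")
        chunks[day].append(filename)
    # Sort days and filenames so every run yields an identical layout.
    return {day: sorted(names) for day, names in sorted(chunks.items())}


# Hypothetical sample data, for illustration only.
sample = [
    ("Example_b.jpg", "2011-06-27T09:15:00Z"),
    ("Example_a.png", "2005-01-01T12:00:00Z"),
    ("Example_c.svg", "2011-06-27T08:00:00Z"),
]
for day, names in chunk_by_upload_date(sample).items():
    print(day, names)
```

A chunk for a past day never gains files, which is what would let it be archived once and left alone. Later deletions or re-uploads under the same name are not handled in this sketch.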
