On 11/12/06, Gregory Maxwell gmaxwell@gmail.com wrote:
On 11/12/06, Anthony wikilegal@inbox.org wrote:
At worst, backups should require twice as much space as the images themselves. If Wikimedia's backups require much more, Brion is doing something really, really wrong.
Note that I didn't even say hard drive space was cheap. I just said it's cheaper than education.
*sigh*. If only things were ever that simple.
Simply having 2x disks in the same chassis isn't a backup, and the total costs for all complex things are non-linear.
Rotating tapes offsite is a backup. Transferring everything to two other data centers is a backup (put those data centers on different continents and it's a local cache too). Backup isn't a complex thing. And unless you're doing something dumb, the cost of backup is most certainly linear (compared to the cost of the initial storage).
Put the image dumps in gigabyte chunks on a super-seeding BitTorrent server, and you could probably get backups nearly free (just the cost of transferring the images once, if you can convince enough others to act as seeds, which you probably could). Of course, now I'm talking about introducing a bit of design into things. Really stupid easy backups like the ones in my first message are still linear.
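To make the "gigabyte chunks" idea concrete, here is a rough sketch in Python of how a dump file might be cut into fixed-size pieces with a checksum per piece, so that seeds can verify what they hold. It is only an illustration under assumptions: the function name `iter_chunks`, the 1 GiB default, and the choice of SHA-1 are mine, not anything Wikimedia actually runs, and it produces a manifest rather than a real .torrent file.

```python
import hashlib
from typing import Iterator, Tuple

# Hypothetical chunk size: 1 GiB per piece of the dump.
CHUNK_SIZE = 1024 ** 3


def iter_chunks(
    path: str,
    chunk_size: int = CHUNK_SIZE,
    block_size: int = 1024 * 1024,
) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (index, offset, length, sha1_hex) for each chunk of the file.

    The file is read in small blocks, so a whole chunk is never held
    in memory at once.
    """
    with open(path, "rb") as f:
        index = 0
        offset = 0
        while True:
            digest = hashlib.sha1()
            length = 0
            # Read up to chunk_size bytes, block by block.
            while length < chunk_size:
                block = f.read(min(block_size, chunk_size - length))
                if not block:
                    break  # end of file
                digest.update(block)
                length += len(block)
            if length == 0:
                break  # nothing left to read
            yield index, offset, length, digest.hexdigest()
            index += 1
            offset += length
```

The number of chunks, and so the amount of data each seed has to carry, grows in direct proportion to the size of the dump, which is the linear part of the argument.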
More importantly: categorization, verification, search, etc. are not cheap. Nor is the time of the users we serve. We'd do a great disservice by allowing Commons to become a disordered dumping ground.
You contradict yourself. Being a disordered dumping ground doesn't require categorization, verification, or search.
No I don't. I suspect you've been confused by my befuddled English.
The avoidance of being a disordered dumping ground requires non-trivial *per image* work for categorization, verification, etc. "Upload all your trash" doesn't scale and will ensure that we are never able to become well ordered... which is an outcome which would diminish our value to the public.
I'd say that "explain[ing to] people that not every ''shitty image'' they produce is worth publishing on Wikimedia projects" doesn't scale either, and that an image repository with some parts which are organized and some parts which aren't has an equal or even higher value than an image repository without those disorganized parts.
Of course, space *is* a consideration, and it wouldn't make sense to outright advertise "dump all your trash here". By all means rules should be in place that say that useless crap will be deleted. And sure, if space gets tight that rule might need to be enforced to a greater extent than when it isn't. But trying to explain to people what should be obvious, that I'd say is a waste of resources.
Anthony