Samuel Hoover, 20/03/19 21:24:
Does that mean Commons is currently culling its content, and that it
makes most sense to wait for a post-2016 dump until after housecleaning?
No, I just mean that it takes time to identify copyright violations and
so on. Most deletions happen for content uploaded in the last few
months, so I generally respected an "embargo" of at least six months
(the Internet Archive items are supposed to be durable).
Even with fiber connection/torrents, downloading will take time. Does
any organization sell terabyte drives containing the Commons dump? Or
can one travel to a physical location and connect several terabyte
drives to quickly copy over?
Your best chance is probably to find some machines connected to the
CENIC/Internet2 network or "nearby" and download the Internet Archive
torrents from there. Hopefully you get 5-10 MiB/s per item and if you do
all of them concurrently you should manage in a day or two. Internet
Archive also routinely provides researcher access, but I'm not sure
whether that's for private items only.
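
As a rough check on the "day or two" estimate: if all items download
concurrently, the wall-clock time is set by the largest item. The sketch
below works that out for the 5-10 MiB/s range mentioned above. The 500 GiB
item size is purely a hypothetical placeholder, not a figure for any actual
Internet Archive item, and transfer_days is just an illustrative helper.

```python
SECONDS_PER_DAY = 86400

def transfer_days(item_gib: float, rate_mib_s: float) -> float:
    """Days needed to download one item of item_gib GiB at rate_mib_s MiB/s."""
    seconds = item_gib * 1024 / rate_mib_s  # 1 GiB = 1024 MiB
    return seconds / SECONDS_PER_DAY

# Hypothetical largest item size; replace with the real size of the
# biggest torrent you plan to fetch.
LARGEST_ITEM_GIB = 500

for rate in (5, 10):
    days = transfer_days(LARGEST_ITEM_GIB, rate)
    print(f"{rate} MiB/s: {days:.1f} days")
```

Under that assumed size, the largest item lands at roughly half a day to a
bit over a day, which is in line with the estimate above; a larger item or a
slower link scales the result proportionally.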
Wikimedia Foundation used to provide data feeds for some companies back
in the day. If there is a significant need I suppose they could arrange
for someone to have rsync access or something, but it's not going to