Samuel Hoover, 20/03/19 21:24:
Does that mean Commons is currently culling its content, and that it makes most sense to wait for a post-2016 dump until after housecleaning is complete?
No, I just mean that it takes time to identify copyright violations and so on. Most deletions happen for content uploaded in the last few months, so I generally respected an "embargo" of at least six months (the Internet Archive items are supposed to be durable).
Even with a fiber connection and torrents, downloading will take time. Does any organization sell terabyte drives containing the Commons dump? Or can one travel to a physical location and connect several terabyte drives to copy it over quickly?
Your best chance is probably to find some machines connected to the CENIC/Internet2 network or "nearby" and download the Internet Archive torrents from there. Hopefully you get 5-10 MiB/s per item and if you do all of them concurrently you should manage in a day or two. Internet Archive also routinely provides researcher access, but I'm not sure whether that's for private items only.
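As a rough sanity check on the "day or two" estimate, here is a minimal Python sketch. The item sizes are hypothetical placeholders (I am not quoting the real sizes of the Internet Archive items), and it assumes every item really does sustain the per-item rate at the same time, i.e. that your total bandwidth and disks are not the bottleneck.

```python
def transfer_hours(item_sizes_gib, mib_per_s):
    """Hours to finish when all items download concurrently at the same per-item rate."""
    # With everything running in parallel, the largest item decides when we are done.
    largest_mib = max(item_sizes_gib) * 1024
    return largest_mib / mib_per_s / 3600


# Hypothetical item sizes in GiB; replace with the real sizes of the items you fetch.
sizes = [400, 650, 900]

for rate in (5, 10):  # per-item throughput in MiB/s, the range mentioned above
    print(f"{rate} MiB/s per item: {transfer_hours(sizes, rate):.1f} hours")
```

If the items cannot all run at full speed together, the total size divided by your aggregate throughput is the better estimate.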
Wikimedia Foundation used to provide data feeds for some companies back in the day. If there is a significant need I suppose they could arrange for someone to have rsync access or something, but it's not going to happen overnight.
Federico
xmldatadumps-l@lists.wikimedia.org