Most of these questions are answered on, or via links from, this page:
https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
You are especially welcome to reseed the Wikimedia Commons tarballs.
These are probably also of interest to you:
* https://commons.wikimedia.org/wiki/Commons:Structured_data
* http://elog.io
Nemo