Thanks, Paolo.
I hadn't added archive.org because it requires us to upload the files, and the files cannot be deleted later, so uploading the latest dumps (updated every month or so) could be a waste of resources for them. I was thinking of mirrors that run wget to slurp all the files from download.wikimedia.org, and the next month delete the previous ones.
Of course, we can contact archive.org to ask them about the wget idea.
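A minimal sketch of that wget-and-rotate idea, assuming wget is installed; the local mirror root and the month labels are hypothetical:

#!/usr/bin/env python3
# Sketch of the wget-and-rotate mirror: fetch the current dumps into a
# per-month directory, then drop the previous month's copy. The mirror
# root and month labels below are assumptions for illustration.
import shutil
import subprocess
from pathlib import Path

BASE_URL = "https://download.wikimedia.org/"   # entry point named above
MIRROR_ROOT = Path("/srv/dump-mirror")         # hypothetical local root

def mirror_month(label: str) -> Path:
    """Recursively slurp the dump tree into MIRROR_ROOT/<label>."""
    target = MIRROR_ROOT / label
    target.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["wget", "--mirror", "--no-parent", "--no-host-directories",
         "--directory-prefix", str(target), BASE_URL],
        check=True,  # fail loudly if wget reports an error
    )
    return target

def delete_previous(label: str) -> None:
    """Remove last month's copy once the new one has finished."""
    old = MIRROR_ROOT / label
    if old.exists():
        shutil.rmtree(old)

if __name__ == "__main__":
    mirror_month("2010-12")      # this month's snapshot
    delete_previous("2010-11")   # the previous one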
Regards, emijrp
2010/11/16 paolo massa paolo@gnuband.org
I've added archive.org ;)
On Tue, Nov 16, 2010 at 12:05 PM, emijrp emijrp@gmail.com wrote:
Hi all,
I have started a new page on Meta [1] for coordinating efforts to mirror the Wikimedia project XML dumps. Some days ago I asked iBiblio whether they were interested in this, but they replied: "Unfortunately, we do not have the resources to provide a mirror of wikipedia. Best of luck!"
I think that we must work on this, so all help is welcome. If you know of universities, archives, etc., that could be interested in getting a copy of the XML files for backup or research purposes, please add them to the list and we can send them a letter.
We are compiling all of human knowledge! That deserves being mirrored ad nauseam!
Regards, emijrp
[1]
https://secure.wikimedia.org/wikipedia/meta/wiki/Mirroring_Wikimedia_project...
--
Paolo Massa Email: paolo AT gnuband DOT org Blog: http://gnuband.org
Emijrp:
the next month delete the previous ones
It seems prudent to keep several copies, to guard against incomplete or otherwise invalid dumps.
On the other hand, keeping files for all time would leave privacy-sensitive data available forever, even after Wikipedia has deleted it from the online database post-mirroring.
I suggest we follow a middle road here.
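As a sketch of what that middle road could look like (the retention count and the per-month directory layout are assumptions, matching the earlier sketch):

#!/usr/bin/env python3
# Sketch of a "middle road" retention policy: keep the N most recent
# monthly copies to guard against an incomplete or invalid dump, and
# prune everything older so deleted material does not live on forever.
# Assumes one directory per month under MIRROR_ROOT, named so that a
# lexicographic sort matches date order (e.g. 2010-11, 2010-12).
import shutil
from pathlib import Path

MIRROR_ROOT = Path("/srv/dump-mirror")  # hypothetical local root
KEEP = 3                                # assumed retention count

def prune_old_copies(root: Path = MIRROR_ROOT, keep: int = KEEP) -> None:
    if not root.is_dir():
        return  # nothing mirrored yet
    months = sorted(d for d in root.iterdir() if d.is_dir())
    for old in months[:-keep]:  # everything but the newest `keep` copies
        shutil.rmtree(old)

if __name__ == "__main__":
    prune_old_copies()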
Erik Zachte