We now have three mirror sites, yay! The full list is linked to from http://dumps.wikimedia.org/ and is also available at http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current...
Summarizing, we have:
C3L (Brazil) with the last 5 known good dumps,
Masaryk University (Czech Republic) with the last 5 known good dumps,
Your.org (USA) with the complete archive of dumps, and
for the latest version of uploaded media, Your.org with http/ftp/rsync access.
Thanks to Carlos, Kevin and Yenya respectively at the above sites for volunteering space, time and effort to make this happen.
As people noticed earlier, a series of per-project media tarballs (excluding Commons) is being generated. As soon as the first run of these is complete we'll announce their location and start generating them on a semi-regular basis.
As we've been getting the bugs out of the mirroring setup, it is getting easier to add new locations. Know anyone interested? Please let us know; we would love to have them.
Ariel
Good work. We are finally approaching an indestructible corpus of knowledge.
2012/5/17 Ariel T. Glenn ariel@wikimedia.org
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi, I am thinking about how to collect articles deleted under the "not notable" criterion. Is there any way we can extract them from the MySQL binlogs? How do these mirrors work? I would be interested in setting up a mirror of deleted data, at least the part that is not spam/vandalism according to its tags.
Mike
There are a few other reasons articles get deleted: copyright issues, personally identifying data, etc. This makes maintaining the sort of mirror you propose problematic, although a similar mirror is here: http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
The dumps contain only data publicly available at the time of the run, without deleted data.
The articles aren't permanently deleted, of course. The revision texts live on in the database, so a query on toolserver, for example, could be used to get at them, but that would need to be for research purposes.
Ariel
On 17/05/12 14:23, Ariel T. Glenn wrote:
> The articles aren't permanently deleted of course.
And that is a much better way to retrieve them than the binlogs (which are only kept for a short time anyway).
> The revision texts live on in the database, so a query on toolserver, for example, could be used to get at them, but that would need to be for research purposes.
Not really. You could get a list of deleted titles/authors from the toolserver, but not the page contents, which for some strange reason are not replicated there (not even available to the roots).
You can create a script that uses Special:Export to export all articles in the deletion categories just before they are deleted.
Then import them into your "Deletionpedia".
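A rough sketch of what such a script might look like, in Python with only the standard library. It assumes the wiki exposes the usual MediaWiki `api.php` and `index.php` entry points; the endpoint URLs, the category name and the User-Agent string below are placeholders, not something given in this thread. It lists the members of one category through the API's `list=categorymembers` query and then asks Special:Export for the current revision of each title.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoints and names; adjust them for the wiki you are archiving.
API_URL = "https://en.wikipedia.org/w/api.php"
INDEX_URL = "https://en.wikipedia.org/w/index.php"
DELETION_CATEGORY = "Category:Articles for deletion"  # hypothetical choice
USER_AGENT = "deletion-export-sketch/0.1 (example only)"


def category_query_params(category, limit=50):
    """Parameters for the MediaWiki API list=categorymembers query."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": str(limit),
        "format": "json",
    }


def titles_from_response(payload):
    """Pull page titles out of a decoded categorymembers response."""
    members = payload.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]


def export_request(titles, index_url=INDEX_URL):
    """Build a POST request asking Special:Export for current revisions."""
    form = {
        "pages": "\n".join(titles),  # Special:Export expects one title per line
        "curonly": "1",              # current revision only, no history
    }
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        index_url + "?title=Special:Export",
        data=data,
        headers={"User-Agent": USER_AGENT},
    )


def fetch_json(url, params):
    """GET a JSON document from the API (performs a network call)."""
    query = urllib.parse.urlencode(params)
    req = urllib.request.Request(url + "?" + query,
                                 headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def export_category(category=DELETION_CATEGORY):
    """Return export XML for the first batch of pages in a category.

    This does not follow API continuation, so only the first `cmlimit`
    members are exported.
    """
    payload = fetch_json(API_URL, category_query_params(category))
    titles = titles_from_response(payload)
    with urllib.request.urlopen(export_request(titles)) as resp:
        return resp.read().decode("utf-8")
```

Calling `export_category()` on a schedule and saving the returned XML would give files that could then be fed to Special:Import (or `importDump.php`) on the receiving wiki. How often it needs to run to catch pages before deletion, and which categories to watch, depends on the wiki's own deletion process.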
2012/5/17 Mike Dupont jamesmikedupont@googlemail.com
-- James Michael DuPont Member of Free Libre Open Source Software Kosova http://flossk.org Contributor FOSM, the CC-BY-SA map of the world http://fosm.org Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
xmldatadumps-l@lists.wikimedia.org