This is quite nice, though the item's metadata is rather sparse :)
On Tue, May 29, 2012 at 3:40 AM, Mike Dupont <jamesmikedupont@googlemail.com> wrote:
The first version of the script is ready: https://github.com/h4ck3rm1k3/pywikipediabot/blob/master/export_deleted.py
It gets the versions, puts them in a zip and uploads that to archive.org.
Here is an example output: http://archive.org/details/wikipedia-delete-2012-05
http://ia601203.us.archive.org/24/items/wikipedia-delete-2012-05/archive2012...
I will cron this, and it should give us a start at saving deleted data. Articles will be exported once a day, even if they were already exported the day before, as long as they are in one of the categories.
mike
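A minimal sketch of the packaging step, assuming each page has already been exported to an XML string. The monthly identifier is modelled on the example item name `wikipedia-delete-2012-05`; `item_identifier`, `safe_filename` and `build_daily_zip` are hypothetical helpers, not functions from export_deleted.py, and the archive.org upload itself is left out. Entries are prefixed with the run date so that a page re-exported on the following day does not overwrite the previous copy.

```python
import io
import urllib.parse
import zipfile
from datetime import date
from typing import Dict


def item_identifier(day: date) -> str:
    """Monthly archive.org item name (naming scheme assumed from the example item)."""
    return f"wikipedia-delete-{day:%Y-%m}"


def safe_filename(title: str) -> str:
    """Percent-encode a page title so it is a reversible, collision-free entry name."""
    return urllib.parse.quote(title, safe="") + ".xml"


def build_daily_zip(exports: Dict[str, str], day: date) -> bytes:
    """Pack one Special:Export XML document per page into an in-memory zip.

    Each entry lives under a YYYY-MM-DD/ prefix for the run date.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for title, xml in exports.items():
            # writestr encodes str data as UTF-8
            zf.writestr(f"{day.isoformat()}/{safe_filename(title)}", xml)
    # the with-block has written the central directory, so the bytes are complete
    return buf.getvalue()
```

For example, `build_daily_zip({"Foo Bar": xml_text}, date(2012, 5, 29))` yields a zip with the single entry `2012-05-29/Foo%20Bar.xml`, destined for the item `wikipedia-delete-2012-05`.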
On Mon, May 21, 2012 at 7:21 PM, Mike Dupont jamesmikedupont@googlemail.com wrote:
Thanks! And run that once per day; they don't get deleted that quickly.
mike
On Mon, May 21, 2012 at 9:11 PM, emijrp emijrp@gmail.com wrote:
Create a script that makes a request to Special:Export using this category as a feed: https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion
More info: https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
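A minimal sketch of that approach, assuming the modern MediaWiki Action API (`list=categorymembers` with `continue`-style paging) for reading the category, and a POST to Special:Export with `pages`, `history`/`curonly` and `action=submit` as in the Manual's curl example. The `USER_AGENT` string and all function names are hypothetical; the server may also cap how many revisions it exports per request.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

API = "https://en.wikipedia.org/w/api.php"
# POST target used in the Manual:Parameters_to_Special:Export curl example
EXPORT = "https://en.wikipedia.org/w/index.php?title=Special:Export"
CATEGORY = "Category:Candidates_for_speedy_deletion"
# Hypothetical identifier; Wikimedia asks clients to send a descriptive User-Agent
USER_AGENT = "speedy-deletion-export-sketch/0.1 (contact: example@example.org)"


def members_query(category: str, cont: Optional[dict] = None) -> dict:
    """Action API parameters for one batch of category members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    if cont:  # "continue" object returned by the previous response
        params.update(cont)
    return params


def fetch_members(category: str) -> List[str]:
    """Follow API continuation until every member title has been collected."""
    titles: List[str] = []
    cont: Optional[dict] = None
    while True:
        url = API + "?" + urllib.parse.urlencode(members_query(category, cont))
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        titles += [m["title"] for m in data["query"]["categorymembers"]]
        if "continue" not in data:
            return titles
        cont = data["continue"]


def export_form(titles: List[str], full_history: bool = True) -> bytes:
    """URL-encoded POST body for Special:Export, one title per line in `pages`."""
    form = {"pages": "\n".join(titles), "action": "submit"}
    if full_history:
        form["history"] = "1"
    else:
        form["curonly"] = "1"
    return urllib.parse.urlencode(form).encode("ascii")


def export_xml(titles: List[str]) -> str:
    """POST the titles to Special:Export and return the XML dump as text."""
    req = urllib.request.Request(EXPORT, data=export_form(titles),
                                 headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

A daily run would then be `export_xml(fetch_members(CATEGORY))`, which performs live network requests. Splitting query construction from I/O keeps the request-building parts testable offline.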
2012/5/21 Mike Dupont jamesmikedupont@googlemail.com
Well, I would be happy with items like this: http://en.wikipedia.org/wiki/Template:Db-a7
Would it be possible to extract them easily?
mike
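For a single tag like Db-a7, rather than the whole speedy-deletion category, one option is to ask the MediaWiki Action API which pages transclude the template (`list=embeddedin`). This is a minimal sketch that only builds the request URL, assuming the modern API's `continue` paging; `embeddedin_url` is a hypothetical helper, and the resulting titles could be fed to Special:Export in the same way as category members.

```python
import urllib.parse
from typing import Optional

API = "https://en.wikipedia.org/w/api.php"


def embeddedin_url(template: str, namespace: int = 0,
                   cont: Optional[dict] = None) -> str:
    """Build an Action API URL listing pages that transclude `template`.

    namespace=0 keeps only articles; `cont` is the "continue" object from the
    previous JSON response, passed back verbatim to fetch the next batch.
    """
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "einamespace": str(namespace),
        "eilimit": "500",
        "format": "json",
    }
    if cont:
        params.update(cont)
    return API + "?" + urllib.parse.urlencode(params)


print(embeddedin_url("Template:Db-a7"))
```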
On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn ariel@wikimedia.org wrote:
There are a few other reasons articles get deleted: copyright issues, personally identifying data, etc. This makes maintaining the sort of mirror you propose problematic, although a similar mirror exists here: http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
The dumps contain only data that was publicly available at the time of the run, without deleted data.
The articles aren't permanently deleted, of course. The revision texts live on in the database, so a query on the toolserver, for example, could be used to get at them, but that would need to be for research purposes.
Ariel
On Thu, 17-05-2012 at 13:30 +0200, Mike Dupont wrote:
Hi, I am thinking about how to collect articles deleted under the "not notable" criteria. Is there any way we can extract them from the MySQL binlogs? How do these mirrors work? I would be interested in setting up a mirror of deleted data, at least the part that is not spam/vandalism, judging by the tags.
mike
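On the English Wikipedia, the speedy-deletion criterion closest to "not notable" is A7 (strictly, it covers articles that make no credible claim of importance), while G3 covers vandalism and hoaxes and G11 covers unambiguous advertising. A minimal sketch of filtering on those tags, assuming the page's current wikitext is available; the tag sets and `worth_mirroring` are hypothetical, deliberately incomplete, and ignore alias templates that redirect to these criteria.

```python
import re

# Hypothetical, incomplete tag sets: G3 = vandalism/hoax, G11 = advertising (spam),
# A7 = no indication of importance. Alias/redirect templates are not handled.
SKIP = {"db-g3", "db-g11"}
KEEP = {"db-a7"}

# Matches the name of any {{db-...}} template, tolerating spaces and capitals
TEMPLATE_RE = re.compile(r"\{\{\s*(db-[A-Za-z0-9]+)", re.IGNORECASE)


def deletion_tags(wikitext: str) -> set:
    """Return the lower-cased db-* template names used in the wikitext."""
    return {m.group(1).lower() for m in TEMPLATE_RE.finditer(wikitext)}


def worth_mirroring(wikitext: str) -> bool:
    """Keep A7-tagged pages unless they also carry a spam/vandalism tag."""
    tags = deletion_tags(wikitext)
    return bool(tags & KEEP) and not (tags & SKIP)
```

Requiring a KEEP tag, rather than merely excluding SKIP tags, keeps pages deleted for copyright or privacy reasons out of the mirror by default, which matches the concerns raised in the reply above.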
On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn <ariel@wikimedia.org> wrote:
> We now have three mirror sites, yay! The full list is linked to from
> http://dumps.wikimedia.org/ and is also available at
>
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current...
>
> Summarizing, we have:
>
> C3L (Brazil) with the last 5 known good dumps,
> Masaryk University (Czech Republic) with the last 5 known good dumps,
> Your.org (USA) with the complete archive of dumps, and
>
> for the latest version of uploaded media, Your.org with http/ftp/rsync
> access.
>
> Thanks to Carlos, Kevin and Yenya respectively at the above sites for
> volunteering space, time and effort to make this happen.
>
> As people noticed earlier, a series of media tarballs per-project
> (excluding commons) is being generated. As soon as the first run of
> these is complete we'll announce its location and start generating them
> on a semi-regular basis.
>
> As we've been getting the bugs out of the mirroring setup, it is getting
> easier to add new locations. Know anyone interested? Please let us
> know; we would love to have them.
>
> Ariel
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
-- James Michael DuPont Member of Free Libre Open Source Software Kosova http://flossk.org Contributor FOSM, the CC-BY-SA map of the world http://fosm.org Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
-- Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com Pre-doctoral student at the University of Cádiz (Spain) Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam Personal website: https://sites.google.com/site/emijrp/
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l