The code is here: https://github.com/h4ck3rm1k3/wikiteam
On Wed, May 30, 2012 at 6:26 AM, Mike Dupont jamesmikedupont@googlemail.com wrote:
OK, I merged the code from wikiteam and now have a full-history dump script that uploads to archive.org. The next step is to fix the bucket metadata in the script.
mike
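For illustration, here is a minimal Python sketch of how item metadata can be passed when uploading through archive.org's S3-like API, where metadata travels as `x-archive-meta-*` request headers. This is not the code from the wikiteam fork; the function name, the metadata values, and the placeholder keys are assumptions made for the example.

```python
def ia_metadata_headers(metadata, access_key, secret_key):
    """Build request headers for a PUT to archive.org's S3-like endpoint."""
    headers = {
        # archive.org's S3-like API uses "LOW accesskey:secret" authorization.
        "authorization": f"LOW {access_key}:{secret_key}",
        # Ask archive.org to create the item (bucket) if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
    }
    # Each metadata field becomes one x-archive-meta-<field> header.
    for key, value in metadata.items():
        headers[f"x-archive-meta-{key}"] = str(value)
    return headers


# Hypothetical metadata values, for illustration only.
example_headers = ia_metadata_headers(
    {
        "title": "Wikipedia deletion candidates, 2012-05",
        "mediatype": "web",
        "description": "XML export of pages tagged for speedy deletion.",
        "subject": "wikipedia; deleted articles; xml dump",
    },
    access_key="ACCESS",
    secret_key="SECRET",
)
```

Keeping the metadata in a plain dictionary makes it easy to add fields such as a description or a pointer to a readme later without touching the upload code.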
On Tue, May 29, 2012 at 3:08 AM, Mike Dupont jamesmikedupont@googlemail.com wrote:
Well, I have now updated the script to include the XML dump in raw format. I will have to add more information to the archive.org item, at least a basic readme. The other thing is that pywikipediabot does not seem to support the full history, so I will have to move over to the wikiteam version and rework it. I just spent 2 hours on this, so I am pretty happy with the first version.
mike
On Tue, May 29, 2012 at 1:52 AM, Hydriz Wikipedia admin@alphacorp.tk wrote:
This is quite nice, though the item has too little metadata :)
On Tue, May 29, 2012 at 3:40 AM, Mike Dupont jamesmikedupont@googlemail.com wrote:
The first version of the script is ready. It fetches the page versions, puts them in a zip, and puts that on archive.org: https://github.com/h4ck3rm1k3/pywikipediabot/blob/master/export_deleted.py
Here is an example output: http://archive.org/details/wikipedia-delete-2012-05
http://ia601203.us.archive.org/24/items/wikipedia-delete-2012-05/archive2012...
I will run this from cron, which should be a start at saving deleted data. Articles will be exported once a day, even if they were exported yesterday, as long as they are still in one of the categories.
mike
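As a rough sketch of the packaging step described above (not the actual export_deleted.py), the following Python packs exported page XML into a zip in memory. The helper names, the per-page `.xml` member names, and the sample data are hypothetical; the item identifier pattern is taken from the example item linked above.

```python
import datetime
import io
import zipfile


def item_identifier(day):
    """Monthly item name, following the example item: wikipedia-delete-YYYY-MM."""
    return f"wikipedia-delete-{day:%Y-%m}"


def build_archive(pages):
    """Pack a {title: xml_text} mapping into a zip and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for title, xml_text in sorted(pages.items()):
            # Replace "/" so subpage titles do not create directories in the zip.
            member_name = title.replace("/", "_") + ".xml"
            archive.writestr(member_name, xml_text)
    return buffer.getvalue()


# Hypothetical sample data standing in for exported pages.
sample_zip = build_archive({"Foo/Bar": "<page/>", "Baz": "<page/>"})
sample_item = item_identifier(datetime.date(2012, 5, 29))
```

Because a page is exported again on every daily run while it stays in a category, successive archives in the same monthly item can contain the same title more than once.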
On Mon, May 21, 2012 at 7:21 PM, Mike Dupont jamesmikedupont@googlemail.com wrote:
Thanks! I will run that once per day; they don't get deleted that quickly.
mike
On Mon, May 21, 2012 at 9:11 PM, emijrp emijrp@gmail.com wrote:
Create a script that makes a request to Special:Export using this category as feed: https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion
More info: https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
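A minimal Python sketch of that suggestion, under some assumptions: the category members are listed through the MediaWiki API (`list=categorymembers`), and the titles are then posted to Special:Export using the `pages`, `curonly`, and `history` parameters from the manual linked above. The function names and the example titles are hypothetical, and no request is actually sent here.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
EXPORT_URL = "https://en.wikipedia.org/wiki/Special:Export"
CATEGORY = "Category:Candidates_for_speedy_deletion"


def category_members_query(category, limit=500):
    """Query-string parameters that list the pages in a category."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": str(limit),
        "format": "json",
    }


def export_form(titles, full_history=False):
    """POST form fields for Special:Export; titles are newline-separated."""
    form = {"pages": "\n".join(titles), "action": "submit"}
    # "history" asks for all revisions, "curonly" for the latest one only.
    form["history" if full_history else "curonly"] = "1"
    return form


# URL that would list the category, and a POST body for two example titles.
list_url = API_URL + "?" + urlencode(category_members_query(CATEGORY))
post_body = urlencode(export_form(["Example A", "Example B"])).encode("utf-8")
```

The form body would be sent to `EXPORT_URL` as a POST request; Special:Export appears to fall back to current revisions only on GET requests, so a full-history export likely needs POST.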
2012/5/21 Mike Dupont jamesmikedupont@googlemail.com
>
> Well, I would be happy for items like this:
> http://en.wikipedia.org/wiki/Template:Db-a7
> Would it be possible to extract them easily?
> mike
>
> On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn ariel@wikimedia.org wrote:
> > There are a few other reasons articles get deleted: copyright issues,
> > personal identifying data, etc. This makes maintaining the sort of
> > mirror you propose problematic, although a similar mirror is here:
> > http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
> >
> > The dumps contain only data publicly available at the time of the run,
> > without deleted data.
> >
> > The articles aren't permanently deleted, of course. The revision texts
> > live on in the database, so a query on toolserver, for example, could be
> > used to get at them, but that would need to be for research purposes.
> >
> > Ariel
> >
> > On Thu, 17-05-2012, at 13:30 +0200, Mike Dupont wrote:
> >> Hi,
> >> I am thinking about how to collect articles deleted based on the "not
> >> notable" criteria.
> >> Is there any way we can extract them from the MySQL binlogs? How are
> >> these mirrors working? I would be interested in setting up a mirror of
> >> deleted data, at least that which is not spam/vandalism based on tags.
> >> mike
> >>
> >> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn <ariel@wikimedia.org> wrote:
> >> > We now have three mirror sites, yay! The full list is linked to from
> >> > http://dumps.wikimedia.org/ and is also available at
> >> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current...
> >> >
> >> > Summarizing, we have:
> >> >
> >> > C3L (Brazil) with the last 5 known good dumps,
> >> > Masaryk University (Czech Republic) with the last 5 known good dumps,
> >> > Your.org (USA) with the complete archive of dumps, and
> >> >
> >> > for the latest version of uploaded media, Your.org with http/ftp/rsync
> >> > access.
> >> >
> >> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
> >> > volunteering space, time and effort to make this happen.
> >> >
> >> > As people noticed earlier, a series of media tarballs per-project
> >> > (excluding commons) is being generated. As soon as the first run of
> >> > these is complete we'll announce its location and start generating them
> >> > on a semi-regular basis.
> >> >
> >> > As we've been getting the bugs out of the mirroring setup, it is getting
> >> > easier to add new locations. Know anyone interested? Please let us
> >> > know; we would love to have them.
> >> >
> >> > Ariel
> >> >
> >> > _______________________________________________
> >> > Wikitech-l mailing list
> >> > Wikitech-l@lists.wikimedia.org
> >> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> --
> James Michael DuPont
> Member of Free Libre Open Source Software Kosova http://flossk.org
> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
Personal website: https://sites.google.com/site/emijrp/

Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
-- Regards, Hydriz