OK, I merged the code from wikiteam and now have a full-history dump
script that uploads to archive.org. The next step is to fix the bucket
metadata in the script.
mike
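For the bucket metadata mentioned above, here is a minimal sketch of how item metadata could be attached at upload time. It assumes the script talks to archive.org's S3-like API, where metadata is passed as `x-archive-meta-*` request headers. The function name `build_ia_headers`, the metadata values, and the key placeholders are hypothetical and not taken from the actual script.

```python
def build_ia_headers(access_key, secret_key, metadata):
    """Build request headers for an archive.org S3-style PUT.

    Assumption: metadata travels as x-archive-meta-<field> headers and
    the item ("bucket") is created automatically on first upload.
    """
    headers = {
        "authorization": "LOW %s:%s" % (access_key, secret_key),
        "x-archive-auto-make-bucket": "1",
    }
    for field, value in sorted(metadata.items()):
        if isinstance(value, (list, tuple)):
            # Assumed convention for repeated fields: numbered headers,
            # e.g. x-archive-meta01-subject, x-archive-meta02-subject.
            for i, item in enumerate(value, start=1):
                headers["x-archive-meta%02d-%s" % (i, field)] = str(item)
        else:
            headers["x-archive-meta-%s" % field] = str(value)
    return headers


# Hypothetical values, for illustration only.
example = build_ia_headers(
    "ACCESS",
    "SECRET",
    {
        "title": "Wikipedia deletion candidates, 2012-05",
        "mediatype": "web",
        "subject": ["wikipedia", "deleted articles"],
    },
)
```

The resulting dict would be sent as the headers of the PUT request that uploads the zip to the item.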
On Tue, May 29, 2012 at 3:08 AM, Mike Dupont
<jamesmikedupont(a)googlemail.com> wrote:
Well, I have now updated the script to include the XML dump in raw
format. I will have to add more information to the archive.org item, at
least a basic readme.
The other thing is that pywikipediabot does not seem to support the full
history, so I will have to move over to the wikiteam version and rework
it. I just spent 2 hours on this, so I am pretty happy with the first
version.
mike
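As a rough illustration of the full-history point, here is a sketch of the POST body one could send to Special:Export, using the `pages`, `history` and `curonly` parameters described in the Manual linked further down the thread. The helper name `build_export_query` and the page titles are hypothetical, and a wiki's configuration may cap or disable full-history export, so treat this as a starting point rather than what either script actually does.

```python
from urllib.parse import urlencode


def build_export_query(titles, full_history=True):
    """Return a urlencoded POST body for a Special:Export request."""
    params = {
        "title": "Special:Export",
        "action": "submit",
        # Special:Export expects one page title per line.
        "pages": "\n".join(titles),
    }
    if full_history:
        # Ask for every revision instead of only the latest one.
        params["history"] = "1"
    else:
        params["curonly"] = "1"
    return urlencode(params)


# Hypothetical page titles, for illustration only.
body = build_export_query(["Example page", "Another page"])
```

The body would be POSTed to the wiki's `index.php`, and the response is the XML dump for the listed pages.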
On Tue, May 29, 2012 at 1:52 AM, Hydriz Wikipedia <admin(a)alphacorp.tk> wrote:
> This is quite nice, though the item's metadata is too little :)
>
> On Tue, May 29, 2012 at 3:40 AM, Mike Dupont
> <jamesmikedupont(a)googlemail.com> wrote:
>
>> The first version of the script is ready. It gets the versions, puts
>> them in a zip, and puts that on archive.org:
>> https://github.com/h4ck3rm1k3/pywikipediabot/blob/master/export_deleted.py
>>
>> Here is an example output:
>> http://archive.org/details/wikipedia-delete-2012-05
>> http://ia601203.us.archive.org/24/items/wikipedia-delete-2012-05/archive2012-05-28T21:34:02.302183.zip
>>
>> I will cron this, and it should give a start at saving deleted data.
>> Articles will be exported once a day, even if they were exported
>> yesterday, as long as they are in one of the categories.
>>
>> mike
>>
>> On Mon, May 21, 2012 at 7:21 PM, Mike Dupont
>> <jamesmikedupont(a)googlemail.com> wrote:
>> > Thanks! And run that once per day; they don't get deleted that quickly.
>> > mike
>> >
>> > On Mon, May 21, 2012 at 9:11 PM, emijrp <emijrp(a)gmail.com> wrote:
>> >> Create a script that makes a request to Special:Export using this
>> >> category as feed:
>> >> https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion
>> >>
>> >> More info:
>> >> https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
>> >>
>> >>
>> >> 2012/5/21 Mike Dupont <jamesmikedupont(a)googlemail.com>
>> >>>
>> >>> Well, I would be happy with items like this:
>> >>> http://en.wikipedia.org/wiki/Template:Db-a7
>> >>> Would it be possible to extract them easily?
>> >>> mike
>> >>>
>> >>> On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn
>> >>> <ariel(a)wikimedia.org> wrote:
>> >>> > There are a few other reasons articles get deleted: copyright
>> >>> > issues, personal identifying data, etc. This makes
>> >>> > maintaining the sort of mirror you propose problematic, although
>> >>> > a similar mirror is here:
>> >>> > http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
>> >>> >
>> >>> > The dumps contain only data publicly available at the time
>> >>> > of the run, without deleted data.
>> >>> >
>> >>> > The articles aren't permanently deleted, of course. The
>> >>> > revision texts live on in the database, so a query on
>> >>> > toolserver, for example, could be used to get at them, but
>> >>> > that would need to be for research purposes.
>> >>> >
>> >>> > Ariel
>> >>> >
>> >>> > On Thu, 17-05-2012, at 13:30 +0200, Mike Dupont wrote:
>> >>> >> Hi,
>> >>> >> I am thinking about how to collect articles deleted based
>> >>> >> on the "not notable" criteria. Is there any way we can
>> >>> >> extract them from the mysql binlogs? How are these mirrors
>> >>> >> working? I would be interested in setting up a mirror of
>> >>> >> deleted data, at least that which is not spam/vandalism
>> >>> >> based on tags.
>> >>> >> mike
>> >>> >>
>> >>> >> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn
>> >>> >> <ariel(a)wikimedia.org> wrote:
>> >>> >> > We now have three mirror sites, yay! The full list is
>> >>> >> > linked to from http://dumps.wikimedia.org/ and is also
>> >>> >> > available at
>> >>> >> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>> >>> >> >
>> >>> >> > Summarizing, we have:
>> >>> >> >
>> >>> >> > C3L (Brazil) with the last 5 known good dumps,
>> >>> >> > Masaryk University (Czech Republic) with the last 5 known good dumps,
>> >>> >> > Your.org (USA) with the complete archive of dumps, and
>> >>> >> >
>> >>> >> > for the latest version of uploaded media, Your.org with
>> >>> >> > http/ftp/rsync access.
>> >>> >> >
>> >>> >> > Thanks to Carlos, Kevin and Yenya respectively at the
>> >>> >> > above sites for volunteering space, time and effort to
>> >>> >> > make this happen.
>> >>> >> >
>> >>> >> > As people noticed earlier, a series of media tarballs
>> >>> >> > per-project (excluding commons) is being generated. As
>> >>> >> > soon as the first run of these is complete we'll announce
>> >>> >> > its location and start generating them on a semi-regular
>> >>> >> > basis.
>> >>> >> >
>> >>> >> > As we've been getting the bugs out of the mirroring
>> >>> >> > setup, it is getting easier to add new locations. Know
>> >>> >> > anyone interested? Please let us know; we would love to
>> >>> >> > have them.
>> >>> >> >
>> >>> >> > Ariel
>> >>> >> >
>> >>> >> >
>> >>> >> > _______________________________________________
>> >>> >> > Wikitech-l mailing list
>> >>> >> > Wikitech-l(a)lists.wikimedia.org
>> >>> >> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> James Michael DuPont
>> >>> Member of Free Libre Open Source Software Kosova http://flossk.org
>> >>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
>> >>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
>> >> Pre-doctoral student at the University of Cádiz (Spain)
>> >> Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
>> >> Personal website: https://sites.google.com/site/emijrp/
>> >>
>> >>
>> >> _______________________________________________
>> >> Xmldatadumps-l mailing list
>> >> Xmldatadumps-l(a)lists.wikimedia.org
>> >> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>> >>
>> >
>> >
>> >
>>
>>
>>
>>
>
>
>
> --
> Regards,
> Hydriz
>
> We've created the greatest collection of shared knowledge in
> history. Help protect Wikipedia. Donate now: http://donate.wikimedia.org
--
James Michael DuPont
Member of Free Libre Open Source Software Kosova
http://flossk.org
Contributor FOSM, the CC-BY-SA map of the world
http://fosm.org
Mozilla Rep
https://reps.mozilla.org/u/h4ck3rm1k3