Dear Ariel,
Consider that the people who most need torrents are the ones who cannot host
a mirror; this is a situation of the little guy being asked to do the
heavy lifting.
Serving torrents would save the WMF significant resources, and it would be
more efficient than rsync. Doing this outside the WMF infrastructure does
not make sense (authenticity, automation), and that is the reason the use
of torrents has traditionally failed. If the WMF does this, it should be
possible for users to leverage all the mirrors simultaneously, which is why
torrents are the preferred form of transport for Linux distributions.
Installing a torrent server should not significantly impact workload.
The main problem, as I see it, is to write a maintenance script that creates
the magnet links/.torrent files once the dumps are generated and publishes
them on the dump servers.
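
As a rough illustration of the maintenance script described above, here is a minimal Python sketch that builds a single-file .torrent and its info hash using only the standard library. The tracker URL, mirror URL, file name and the `build_torrent` helper are hypothetical placeholders, not part of the dump tooling; the `url-list` key is the web-seed convention that lets clients also pull from the existing HTTP mirrors.

```python
import hashlib
import io


def bencode(value):
    """Encode ints, strings/bytes, lists and dicts in BitTorrent bencoding."""
    if isinstance(value, int) and not isinstance(value, bool):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def build_torrent(stream, name, tracker_url, web_seed_urls, piece_length=2 ** 18):
    """Return (torrent_bytes, info_hash_hex) for a single-file torrent.

    The stream is read piece by piece, so a multi-gigabyte dump never has
    to fit in memory.
    """
    piece_hashes = []
    total_length = 0
    while True:
        piece = stream.read(piece_length)
        if not piece:
            break
        total_length += len(piece)
        piece_hashes.append(hashlib.sha1(piece).digest())
    info = {
        "name": name,
        "length": total_length,
        "piece length": piece_length,
        "pieces": b"".join(piece_hashes),
    }
    metainfo = {
        "announce": tracker_url,           # hypothetical tracker
        "info": info,
        "url-list": list(web_seed_urls),   # web seeds: the HTTP mirrors
    }
    # The info hash (used in magnet links) is the SHA-1 of the bencoded info dict.
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    return bencode(metainfo), info_hash


if __name__ == "__main__":
    fake_dump = io.BytesIO(b"x" * 600000)  # stand-in for a real dump file
    torrent, info_hash = build_torrent(
        fake_dump,
        "example-dump.xml.bz2",
        "http://tracker.example.org/announce",
        ["http://dumps.example.org/example-dump.xml.bz2"],
    )
    print(len(torrent), info_hash)
```

In a real release process the stream would be the finished dump file opened in binary mode, and the resulting bytes would be written next to it as a .torrent file.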
With your blessing, I would try to help with it in the context of, say,
Labs, provided it would be integrated into the dump release process.
Thanks for the great job with the dumps!
Oren Bochman
On Tue, Jun 5, 2012 at 3:15 PM, Ariel T. Glenn <ariel(a)wikimedia.org> wrote:
This is a place where volunteers can step in and make it happen without
the need for Wikimedia's infrastructure. (This means I can concentrate
on my already very full plate of things too.)
http://meta.wikimedia.org/wiki/Data_dump_torrents
Have at!
Ariel
On Tue, 05-06-2012, at 08:57 -0400, Derric Atzrott wrote:
I second this idea. Large archives should always be available using
BitTorrent. I would actually suggest posting magnet links for them, though,
instead of .torrent files. This way you can leverage the acceptable source
feature of magnet links.
https://en.wikipedia.org/wiki/Magnet_URI_scheme#Web_links_to_the_file
This way we get the best of both worlds: the constant availability of
direct downloads, and the reduction in load that p2p filesharing provides.
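
To make the acceptable-source idea concrete, here is a small Python sketch that assembles a magnet link with `as=` parameters pointing at direct-download URLs. The `magnet_link` helper, the info hash and the mirror URL are made-up placeholders; the parameter names (`xt`, `dn`, `as`, `tr`) are the ones described on the Magnet URI scheme page linked above.

```python
from urllib.parse import quote


def magnet_link(info_hash_hex, display_name, acceptable_sources=(), trackers=()):
    """Build a BitTorrent magnet URI with optional direct-download fallbacks.

    info_hash_hex      -- 40 hex characters, the SHA-1 of the torrent's info dict
    acceptable_sources -- HTTP(S) URLs where the same file can be fetched directly
    trackers           -- optional tracker announce URLs
    """
    if len(info_hash_hex) != 40:
        raise ValueError("expected a 40-character hex info hash")
    int(info_hash_hex, 16)  # raises ValueError if the hash is not valid hex

    parts = [
        "xt=urn:btih:" + info_hash_hex.lower(),
        "dn=" + quote(display_name, safe=""),
    ]
    # Each acceptable source is a plain web link to the same file.
    parts += ["as=" + quote(url, safe="") for url in acceptable_sources]
    parts += ["tr=" + quote(url, safe="") for url in trackers]
    return "magnet:?" + "&".join(parts)


if __name__ == "__main__":
    print(magnet_link(
        "ABCDEF0123456789ABCDEF0123456789ABCDEF01",   # placeholder hash
        "example-dump.xml.bz2",
        acceptable_sources=["http://dumps.example.org/example-dump.xml.bz2"],
    ))
```

Whether a given client actually falls back to the `as=` URL varies, so this is best treated as a convenience on top of the normal peer swarm rather than a guarantee.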
Thank you,
Derric Atzrott
-----Original Message-----
From: wikitech-l-bounces(a)lists.wikimedia.org [mailto:wikitech-l-bounces(a)lists.wikimedia.org] On Behalf Of Oren Bochman
Sent: 05 June 2012 08:44
To: 'Wikimedia developers'
Subject: Re: [Wikitech-l] [Xmldatadumps-l] XML dumps/Media mirrors update
Any chance that these archives can be served via BitTorrent, so that
even partial downloaders can become servers, leveraging p2p to reduce
overall bandwidth load on the servers and improve download speeds?
-----Original Message-----
From: wikitech-l-bounces(a)lists.wikimedia.org [mailto:wikitech-l-bounces(a)lists.wikimedia.org] On Behalf Of Mike Dupont
Sent: Saturday, June 02, 2012 1:28 AM
To: Wikimedia developers; wikiteam-discuss(a)googlegroups.com
Subject: Re: [Wikitech-l] [Xmldatadumps-l] XML dumps/Media mirrors update
I now run the archiving from cron every 30 minutes:
http://ia700802.us.archive.org/34/items/wikipedia-delete-2012-06/
It is amazing how fast stuff gets deleted on Wikipedia.
What about the proposed deletions, are there categories for those?
thanks
mike
On Wed, May 30, 2012 at 6:26 AM, Mike Dupont
<jamesmikedupont(a)googlemail.com> wrote:
> Code here: https://github.com/h4ck3rm1k3/wikiteam
>
> On Wed, May 30, 2012 at 6:26 AM, Mike Dupont
> <jamesmikedupont(a)googlemail.com> wrote:
>> OK, I merged the code from wikiteam and have a full-history dump
>> script that uploads to archive.org. The next step is to fix the bucket
>> metadata in the script.
>> mike
>>
>> On Tue, May 29, 2012 at 3:08 AM, Mike Dupont
>> <jamesmikedupont(a)googlemail.com> wrote:
>>> Well, I have now updated the script to include the XML dump in raw
>>> format. I will have to add more information to the archive.org item,
>>> at least a basic readme.
>>> The other thing is that pywikipediabot does not seem to support full
>>> history, so I will have to move over to the wikiteam version
>>> and rework it. I just spent 2 hours on this, so I am pretty happy with
>>> the first version.
>>>
>>> mike
>>>
>>> On Tue, May 29, 2012 at 1:52 AM, Hydriz Wikipedia <admin(a)alphacorp.tk> wrote:
>>>> This is quite nice, though the item's metadata is too little :)
>>>>
>>>> On Tue, May 29, 2012 at 3:40 AM, Mike Dupont
>>>> <jamesmikedupont(a)googlemail.com> wrote:
>>>>
>>>>> The first version of the script is ready: it gets the versions, puts
>>>>> them in a zip, and puts that on archive.org.
>>>>> https://github.com/h4ck3rm1k3/pywikipediabot/blob/master/export_deleted.py
>>>>>
>>>>> Here is an example output:
>>>>> http://archive.org/details/wikipedia-delete-2012-05
>>>>>
>>>>> http://ia601203.us.archive.org/24/items/wikipedia-delete-2012-05/archive2012-05-28T21:34:02.302183.zip
>>>>>
>>>>> I will cron this, and it should give a start on saving deleted data.
>>>>> Articles will be exported once a day, even if they were
>>>>> exported yesterday, as long as they are in one of the categories.
>>>>>
>>>>> mike
>>>>>
>>>>> On Mon, May 21, 2012 at 7:21 PM, Mike Dupont
>>>>> <jamesmikedupont(a)googlemail.com> wrote:
>>>>> > Thanks! And run that once per day; they don't get deleted that quickly.
>>>>> > mike
>>>>> >
>>>>> > On Mon, May 21, 2012 at 9:11 PM, emijrp <emijrp(a)gmail.com> wrote:
>>>>> >> Create a script that makes a request to Special:Export using
>>>>> >> this category as feed:
>>>>> >> https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion
>>>>> >>
>>>>> >> More info:
>>>>> >> https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
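
A sketch of what such a script could look like in Python, under a few assumptions that are not in the thread: the category members are listed through the MediaWiki API (`list=categorymembers`) rather than through the export form itself, the titles are then POSTed to Special:Export with `curonly` set, and the User-Agent string is a made-up placeholder. Only one batch of up to 500 titles is fetched; continuation is left out for brevity.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"
INDEX_URL = "https://en.wikipedia.org/w/index.php"
USER_AGENT = "deleted-article-archiver-sketch/0.1 (hypothetical)"


def category_members_url(category, limit=500):
    """URL that lists the pages in a category via the MediaWiki API."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def export_request(titles, current_only=True):
    """POST request asking Special:Export for the given page titles."""
    fields = {
        "title": "Special:Export",
        "action": "submit",
        "pages": "\n".join(titles),  # one title per line
    }
    if current_only:
        fields["curonly"] = "1"      # latest revision only, no history
    return Request(
        INDEX_URL,
        data=urlencode(fields).encode("utf-8"),
        headers={"User-Agent": USER_AGENT},
    )


def export_category(category):
    """Fetch the category listing, then return the export XML as bytes."""
    listing = Request(category_members_url(category),
                      headers={"User-Agent": USER_AGENT})
    with urlopen(listing) as response:
        members = json.load(response)["query"]["categorymembers"]
    titles = [member["title"] for member in members]
    with urlopen(export_request(titles)) as response:
        return response.read()


if __name__ == "__main__":
    xml = export_category("Category:Candidates_for_speedy_deletion")
    with open("speedy-deletion-export.xml", "wb") as out:
        out.write(xml)
```

Run from cron, each invocation would write one snapshot of whatever is in the category at that moment.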
>>>>> >>
>>>>> >>
>>>>> >> 2012/5/21 Mike Dupont <jamesmikedupont(a)googlemail.com>
>>>>> >>>
>>>>> >>> Well, I would be happy with items like this:
>>>>> >>> http://en.wikipedia.org/wiki/Template:Db-a7
>>>>> >>> Would it be possible to extract them easily?
>>>>> >>> mike
>>>>> >>>
>>>>> >>> On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn
>>>>> >>> <ariel(a)wikimedia.org>
>>>>> >>> wrote:
>>>>> >>> > There are a few other reasons articles get deleted: copyright
>>>>> >>> > issues, personally identifying data, etc. This makes
>>>>> >>> > maintaining the sort of mirror you propose problematic,
>>>>> >>> > although a similar mirror is here:
>>>>> >>> > http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
>>>>> >>> >
>>>>> >>> > The dumps contain only data publicly available at the time
>>>>> >>> > of the run, without deleted data.
>>>>> >>> >
>>>>> >>> > The articles aren't permanently deleted, of course. The
>>>>> >>> > revision texts live on in the database, so a query on
>>>>> >>> > toolserver, for example, could be used to get at them, but
>>>>> >>> > that would need to be for research purposes.
>>> >>> >
>>> >>> > Ariel
>>> >>> >
>>> >>> > On Thu, 17-05-2012, at 13:30 +0200, Mike Dupont wrote:
>>> >>> >> Hi,
>>> >>> >> I am thinking about how to collect articles deleted based
>>> >>> >> on the "not notable" criteria.
>>> >>> >> Is there any way we can extract them from the MySQL
>>> >>> >> binlogs? How are these mirrors working? I would be
>>> >>> >> interested in setting up a mirror of deleted data, at
>>> >>> >> least that which is not spam/vandalism, based on tags.
>>> >>> >> mike
>>> >>> >>
>>> >>> >> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn
>>> >>> >> <ariel(a)wikimedia.org> wrote:
>>> >>> >> > We now have three mirror sites, yay! The full list is
>>> >>> >> > linked to from http://dumps.wikimedia.org/ and is also
>>> >>> >> > available at
>>> >>> >> >
>>> >>> >> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>>> >>> >> >
>>> >>> >> > Summarizing, we have:
>>> >>> >> >
>>> >>> >> > C3L (Brazil) with the last 5 known good dumps,
>>> >>> >> > Masaryk University (Czech Republic) with the last 5 known
>>> >>> >> > good dumps,
>>> >>> >> > Your.org (USA) with the complete archive of dumps, and
>>> >>> >> >
>>> >>> >> > for the latest version of uploaded media, Your.org with
>>> >>> >> > http/ftp/rsync access.
>>> >>> >> >
>>> >>> >> > Thanks to Carlos, Kevin and Yenya respectively at the
>>> >>> >> > above sites for volunteering space, time and effort to
>>> >>> >> > make this happen.
>>> >>> >> >
>>> >>> >> > As people noticed earlier, a series of media tarballs
>>> >>> >> > per-project (excluding commons) is being generated. As
>>> >>> >> > soon as the first run of these is complete we'll announce
>>> >>> >> > its location and start generating them on a semi-regular
>>> >>> >> > basis.
>>> >>> >> >
>>> >>> >> > As we've been getting the bugs out of the mirroring
>>> >>> >> > setup, it is getting easier to add new locations. Know
>>> >>> >> > anyone interested? Please let us know; we would love to
>>> >>> >> > have them.
>>> >>> >> >
>>> >>> >> > Ariel
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > _______________________________________________
>>> >>> >> > Wikitech-l mailing list
>>> >>> >> > Wikitech-l(a)lists.wikimedia.org
>>> >>> >> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> James Michael DuPont
>>> >>> Member of Free Libre Open Source Software Kosova http://flossk.org
>>> >>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
>>> >>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
>>> >> Pre-doctoral student at the University of Cádiz (Spain)
>>> >> Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
>>> >> Personal website: https://sites.google.com/site/emijrp/
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> Xmldatadumps-l mailing list
>>> >> Xmldatadumps-l(a)lists.wikimedia.org
>>> >> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>> >>
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Regards,
>> Hydriz
>>
>> We've created the greatest collection of shared knowledge in
>> history. Help protect Wikipedia. Donate now:
>> http://donate.wikimedia.org
>
>
>
--
Oren Bochman
Office tel. 061 4921492
Mobile +36 30 866 6706
skype id: orenbochman
e-mail: oren(a)romai-horizon.com
site