On Fri, Nov 11, 2011 at 11:18 PM, emijrp emijrp@gmail.com wrote:
Forwarding...
---------- Forwarded message ----------
From: emijrp emijrp@gmail.com
Date: 2011/11/11
Subject: Old English Wikipedia image dump from 2005
To: wikiteam-discuss@googlegroups.com
Hi all;
I want to share with you this Archive Team link[1]. It is an old English Wikipedia image dump from 2005, probably one of the last before the Wikimedia Foundation stopped publishing image dumps. Enjoy.
Regards, emijrp
[1] http://www.archive.org/details/wikimedia-image-dump-2005-11
People interested in image dumps may also be interested in my post relating to the GFDL requirements, which I think mean images need to be included in the dumps.
https://meta.wikimedia.org/w/index.php?title=Talk:Terms_of_use&diff=prev...
excerpt:
"..the [GFDL] license requires that someone can download a ''complete'' Transparent copy for one year after the last Opaque copy is distributed. As a result, I believe the BoT needs to ensure that the dumps are available ''and'' that they can be available for one year after WMF turns of the lights on the core servers (it allows 'agents' to provide this service). As Wikipedia contains images, the images are required to be included. .."
discussion continues ..
https://meta.wikimedia.org/wiki/Talk:Terms_of_use#Right_to_Fork
On Sat, 12-11-2011 at 00:31 +1100, John Vandenberg wrote:
People interested in image dumps may also be interested in my post relating to the GFDL requirements, which I think mean images need to be included in the dumps.
I would read this as requiring access to the images to remain available, not necessarily in dump form.
Unrelated to that, I would still like it if there were a mirror of the Commons images; I guess this would avoid the intellectual property rights issues that we could run into with non-free (i.e. fair use) images. In the past we have talked about how this could happen, most likely copying right off the disks. The consensus however was always that we would only be willing to do that if there was a guarantee from the institution or group wanting the copy that they would maintain a publicly accessible mirror.
Providing multiple terabyte sized files for download doesn't make any kind of sense to me. However, if we get concrete proposals for categories of Commons images people really want and would use, we can put those together. I think this has been said before on wikitech-l if not here.
So, can we find some institutions interested in 17T or so of data, growing every day? Educational institutions or organizations involved in open content would seem a good starting point.
Ariel
Ariel:
Providing multiple terabyte sized files for download doesn't make any kind of sense to me. However, if we get concrete proposals for categories of Commons images people really want and would use, we can put those together. I think this has been said before on wikitech-l if not here.
There is another way to cut down on download size, which would serve a whole class of content re-users, e.g. offline readers. For offline readers it is not so important to have pictures of 20 MB each; what matters is having pictures at all, preferably tens of KB in size. A download of all images, scaled down to say 600x600 max, would be quite appropriate for many uses. Maps and diagrams would not survive this scale-down (illegible text), but they are very compact already. In fact, the compression ratio of each image is a very reliable predictor of the type of content.
In 2005 I distributed a DVD [1] with all unabridged texts of the English Wikipedia and all 320,000 images, to be loaded onto a 4 GB CF card for a handheld. Now we have 10 million images on Commons, so even scaled-down images would need some filtering, but any such collection would still be 100-1000 times smaller in size.
Erik Zachte
[1] http://www.infodisiac.com/Wikipedia/
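For illustration only, a minimal Python sketch of what the 600x600 bounding box plus compression-ratio heuristic could look like. This is not anything Erik or WMF produced; it assumes the originals are already on local disk, and the 0.05 threshold and directory names are invented placeholders.

# A rough sketch of the 600x600 idea; originals assumed to be on local disk,
# and the compression-ratio threshold is a placeholder to be tuned on real data.
from pathlib import Path
from PIL import Image

MAX_SIDE = 600  # bounding box suggested above

def shrink(src: Path, dst: Path) -> None:
    """Downscale one image so neither side exceeds MAX_SIDE, keeping aspect ratio."""
    with Image.open(src) as im:
        im.thumbnail((MAX_SIDE, MAX_SIDE))  # no-op if the image is already smaller
        im.save(dst)

def looks_like_diagram(path: Path) -> bool:
    """Heuristic from the mail: maps and diagrams compress far better than photos."""
    with Image.open(path) as im:
        raw = im.width * im.height * len(im.getbands())  # rough uncompressed size
    return path.stat().st_size / raw < 0.05  # placeholder threshold

if __name__ == "__main__":
    out = Path("thumbs")
    out.mkdir(exist_ok=True)
    for f in Path("originals").glob("*.jpg"):
        if looks_like_diagram(f):
            continue  # already tiny, ship it untouched
        shrink(f, out / f.name)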
I had a quick look and it turns out that the English-language Wikipedia uses over 2.8 million images today. So, as you point out, an offline reader that just used thumbnails would still have to be selective about its image use.
In any case, putting together collections of thumbs doesn't resolve the need for a mirror of the originals, which I would really like to see happen.
Ariel
People can't mirror Commons if there is no public image dump. As there is no public image dump, people don't care about mirroring. And so on...
You could offer monthly incremental image dumps.[1] Until mid-2008, monthly uploads are under 100 GB; recently they are in the 200-300 GB range. People are mirroring Domas's visit logs at the Internet Archive; OK, a month of Commons is roughly 10x that size, but it is not impossible. Archive Team has mirrored GeoCities (0.9 TB), Yahoo! Video (20 TB), Jamendo (2.5 TB) and other huge sites. So, if you put those image dumps online, people are going to download them all.
You could start by offering full-resolution monthly dumps up to 2007 or so. But we have to restart this sooner or later.
[1] http://archiveteam.org/index.php?title=Wikimedia_Commons#Size_stats
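As a sketch of how such month-sized chunks could at least be enumerated, here is an illustration using the public MediaWiki API. This is not an existing dump tool; the month boundaries are placeholders and the requests library is assumed.

# Sketch only: list one month of Commons uploads via list=allimages,
# which is one way the monthly incremental chunks proposed above could be built.
import requests

API = "https://commons.wikimedia.org/w/api.php"

def uploads_between(start, end):
    """Yield (name, url, size) for files uploaded in [start, end), ISO 8601 timestamps."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisort": "timestamp",
        "aistart": start,
        "aiend": end,
        "aiprop": "url|size|timestamp",
        "ailimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        for img in data["query"]["allimages"]:
            yield img["name"], img["url"], img["size"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow API continuation

if __name__ == "__main__":
    total = sum(size for _, _, size in
                uploads_between("2005-06-01T00:00:00Z", "2005-07-01T00:00:00Z"))
    print("June 2005 uploads: about %.1f GB" % (total / 1e9))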
As I said earlier, providing multiterabyte dumps does not seem reasonable to me. Monthly incrementals don't provide a workaround, unless you are suggesting that we put dumps online for every month since the beginning of the project. I think that a much more workable way to jump-start a mirror is to copy directly to disks in the datacenter, for an organization which will provide public access to its copy. This requires three things: 1) an organization that wants to host such a mirror, 2) them sending us disks, 3) me clearing it with Rob and with our datacenter tech, but he's agreed to this in principle in the past.
Ariel
2011/11/18 Ariel T. Glenn ariel@wikimedia.org
As I said earlier, providing multiterabyte dumps does not seem reasonable to me.
What is the problem? Bandwidth? Disk space?
Monthly incrementals don't provide a workaround, unless you are suggesting that we put dumps online for every month since the beginning of the project.
Yes, indeed.
I think that a much more workable way to jump-start a mirror is to copy directly to disks in the datacenter, for an organization which will provide public access to its copy. This requires three things: 1) an organization that wants to host such a mirror, 2) them sending us disks, 3) me clearing it with Rob and with our datacenter tech, but he's agreed to this in principle in the past.
Ariel
Erik Zachte wrote:
Providing multiple terabyte sized files for download doesn't make any kind of sense to me. However, if we get concrete proposals for categories of Commons images people really want and would use, we can put those together. I think this has been said before on wikitech-l if not here.
There is another way to cut down on download size, which would serve a whole class of content re-users, e.g. offline readers. For offline readers it is not so important to have pictures of 20 MB each, rather to have pictures at all, preferably tens of KB in size. A download of all images, scaled down to say 600x600 max would be quite appropriate (...)
I made this tool last month, precisely to allow easy downloading of all images from a given category (inspired by WLM needs). http://toolserver.org/~platonides/catdown/catdown.php
The download is just a tiny script with the list of URLs to fetch, but that is enough to do it without further manual intervention. There is also a nice estimate of how much space you will need to complete the download.
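For illustration, a hypothetical stand-in for what such a generated script amounts to (not the actual catdown output; the file names and destination directory are assumptions): estimate the total size with HEAD requests, then fetch every URL unattended.

# Hypothetical stand-in, not the real tool's output: read a plain list of file
# URLs, report how much space the batch needs, then fetch everything unattended.
import os
import requests

def fetch_all(url_list="urls.txt", dest="downloads"):
    os.makedirs(dest, exist_ok=True)
    with open(url_list) as f:
        urls = [line.strip() for line in f if line.strip()]

    # Space estimate up front, via HEAD requests.
    total = sum(int(requests.head(u, allow_redirects=True)
                    .headers.get("Content-Length", 0)) for u in urls)
    print("%d files, about %.2f GiB needed" % (len(urls), total / 2**30))

    for u in urls:
        target = os.path.join(dest, u.rsplit("/", 1)[-1])
        with requests.get(u, stream=True) as r:
            r.raise_for_status()
            with open(target, "wb") as out:
                for chunk in r.iter_content(1 << 20):
                    out.write(chunk)

if __name__ == "__main__":
    fetch_all()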
On Thu, Nov 17, 2011 at 6:40 AM, Ariel T. Glenn ariel@wikimedia.org wrote:
I would read this as requiring access to the images to remain available, not necessarily in dump form.
I don't believe that is the case. The GFDL, like the GPL, requires that it is possible to rebuild the product from the distributed source, minus any separately distributed dependencies.
It is necessary to provide a simple mechanism for reliably downloading the images used on each project and incorporating all of the dumps needed to regenerate a replica of each project.
The 'source' can be broken into chunks, but it would be obviously contrary to the spirit of the license to require that each and every image be downloaded individually.
_and_ it needs to be possible for any consumer to perform the task of obtaining the source. Does the WMF block people who attempt to mirror the project content one item at a time? IMO blocking them is very sane, but if that is the only way to obtain the source then it would again be breaking the licence.
InstantCommons means that those images don't need to be redistributed in order for the projects to be compliant with the GFDL.
-- John Vandenberg
On Fri, 18-11-2011 at 20:33 +1100, John Vandenberg wrote:
I don't believe that is the case. The GFDL, like the GPL, requires that it is possible to rebuild the product from the distributed source, minus any separately distributed dependencies.
It is necessary to provide a simple mechanism for reliably downloading the images used on each project and incorporating all of the dumps needed to regenerate a replica of each project.
The 'source' can be broken into chunks, but it would be obviously contrary to the spirit of the license to require that each and every image be downloaded individually.
There are scripts to download all media used on a project ( http://meta.wikimedia.org/wiki/Wikix ). As long as the end user runs one command, it doesn't matter what's happening on the back end.
_and_ it needs to be possible for any consumer to perform the task of obtaining the source. Does the WMF block people who attempt to mirror the project content one item at a time? IMO blocking them is very sane, but if that is the only way to obtain the source then it would again be breaking the licence.
AFAIK we do not block folks that are making serial requests, even if they crawl the entire media space. Serial requests don't incur a big cost on our servers.
InstantCommons means that those images don't need to be redistributed in order for the projects to be compliant with the GFDL.
-- John Vandenberg
However I would be happier if we had full media mirrors hosted by other folks (and they could provide packages of groups of files for download too).
Ariel
On Fri, 18-11-2011 at 11:49 +0200, Ariel T. Glenn wrote:
AFAIK we do not block folks that are making serial requests, even if they crawl the entire media space. Serial requests don't incur a big cost on our servers.
I should clarify this.
Crawling the media server and requesting all images one at a time (as long as a pile of people aren't doing it at once) is fine. Requesting all images in a specific or several thumb sizes is not; in the first case we serve files that already exist, while in the second case the files may need to be generated and put someplace. And we simply don't have space to keep generated thumbs of every image on Commons in various arbitrary sizes at the moment. So folks that *do* want to crawl the media server and request thumbs for all of them should check in with me so we can figure out how to get you the data you need.
Ariel
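To make the originals-vs-thumbs distinction concrete, a hedged sketch of the polite serial approach described above: look up each file's original URL through the API (never a thumbnail size) and download one file at a time. The title list, User-Agent and delay are placeholders, not a recommended configuration.

# Sketch of the "serial requests for originals" pattern: resolve each file's
# original upload URL via prop=imageinfo and fetch files one at a time.
import time
import requests

API = "https://commons.wikimedia.org/w/api.php"
HEADERS = {"User-Agent": "example-media-mirror/0.1 (contact: someone@example.org)"}

def original_url(title):
    """Return the URL of the original upload for File:<title>."""
    r = requests.get(API, headers=HEADERS, params={
        "action": "query",
        "titles": "File:" + title,
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
    })
    page = next(iter(r.json()["query"]["pages"].values()))
    return page["imageinfo"][0]["url"]

if __name__ == "__main__":
    for title in ["Example.jpg"]:  # placeholder list of filenames
        url = original_url(title)
        with open(title, "wb") as out:
            out.write(requests.get(url, headers=HEADERS).content)
        time.sleep(1)  # one request at a time, spaced out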
I see you are working on this https://wikitech.wikimedia.org/view/Dumps/Image_dumps
I don't have an account there (how can I request one?). Why don't you offer incremental image backups in one-day chunks, from 2004-09-07 to (today - 1 year), to leave enough time to remove copyvios?
2012/1/31 emijrp emijrp@gmail.com:
I see you are working on this https://wikitech.wikimedia.org/view/Dumps/Image_dumps
I don't have an account there (how can I request one?). Why don't you offer incremental image backups in one-day chunks, from 2004-09-07 to (today - 1 year), to leave enough time to remove copyvios?
I'd also like an account there; read-only would be OK so I can watch the progress!
On Tue, 31-01-2012 at 14:54 +1100, John Vandenberg wrote:
I'd also like an account there; read-only would be OK so I can watch the progress!
You don't need an account to read the content, only to edit.
Ariel
On Tue, Jan 31, 2012 at 6:13 PM, Ariel T. Glenn ariel@wikimedia.org wrote:
You don't need an account to read the content, only to edit.
Ariel
I believe they mean watchlisting (so they get email notifs)
(If email alerts are even activated over there)
On Tue, 31-01-2012 at 19:17 +1000, K. Peachey wrote:
I believe they mean watchlisting (so they get email notifs)
(If email alerts are even activated over there)
I never use that feature, so it never even crossed my mind... Yes, it's enabled on wikitech. So email me off list with the username and the email address you want, and I'll set you up (John and anyone else).
Ariel
K. Peachey, 31/01/2012 10:17:
I believe they mean watchlisting (so they get email notifs)
(If email alerts are even activated over there)
You can also use Atom feeds anyway.
Nemo
I don't plan to do dailies any time soon. We don't even have real incrementals for the text revisions, which people have been begging for forever; cleaning up the current adds/changes dumps and making them more useful (and making them stable) has to come first. For the images, we need to get the main bulk of the images out of here and into other folks' hands first. Dailies, if they were to happen, would be quite some time down the road, which is why I haven't written them into the plan. Note that we don't have a place to keep a second copy of everything from 2004 until now, which is another reason I can't go that route right now.
To get an account on wikitech, please give me a user name you want and an email address you prefer and I'll set you up.
Ariel