On Fri, Nov 11, 2011 at 11:18 PM, emijrp emijrp@gmail.com wrote:
Forwarding...
---------- Forwarded message ----------
From: emijrp emijrp@gmail.com
Date: 2011/11/11
Subject: Old English Wikipedia image dump from 2005
To: wikiteam-discuss@googlegroups.com
Hi all;
I want to share with you this Archive Team link[1]. It is an old English Wikipedia image dump from 2005, probably one of the last before the Wikimedia Foundation stopped publishing image dumps. Enjoy.
Regards, emijrp
[1] http://www.archive.org/details/wikimedia-image-dump-2005-11
People interested in image dumps may also be interested in my post relating to the GFDL requirements, which I think mean images need to be included in the dumps.
https://meta.wikimedia.org/w/index.php?title=Talk:Terms_of_use&diff=prev...
excerpt:
"..the [GFDL] license requires that someone can download a ''complete'' Transparent copy for one year after the last Opaque copy is distributed. As a result, I believe the BoT needs to ensure that the dumps are available ''and'' that they can be available for one year after WMF turns of the lights on the core servers (it allows 'agents' to provide this service). As Wikipedia contains images, the images are required to be included. .."
discussion continues ..
https://meta.wikimedia.org/wiki/Talk:Terms_of_use#Right_to_Fork
On Sat, 12-11-2011 at 00:31 +1100, John Vandenberg wrote:
People interested in image dumps may also be interested in my post relating to the GFDL requirements, which I think mean images need to be included in the dumps.
I would read this as requiring access to the images to remain available, not necessarily in dump form.
Unrelated to that, I would still like it if there were a mirror of the Commons images; I guess this would avoid the intellectual property rights issues that we could run into with non-free (i.e. fair use) images. In the past we have talked about how this could happen, most likely copying right off the disks. The consensus however was always that we would only be willing to do that if there was a guarantee from the institution or group wanting the copy that they would maintain a publicly accessible mirror.
Providing multiple terabyte sized files for download doesn't make any kind of sense to me. However, if we get concrete proposals for categories of Commons images people really want and would use, we can put those together. I think this has been said before on wikitech-l if not here.
So, can we find some institutions interested in 17T or so of data, growing every day? Educational institutions or organizations involved in open content would seem a good starting point.
Ariel
Ariel:
Providing multiple terabyte sized files for download doesn't make any kind of sense to me. However, if we get concrete proposals for categories of Commons images people really want and would use, we can put those together. I think this has been said before on wikitech-l if not here.
There is another way to cut down on download size, which would serve a whole class of content re-users, e.g. offline readers. For offline readers it is not so important to have pictures of 20 MB each; what matters is having pictures at all, preferably tens of KB in size. A download of all images, scaled down to say 600x600 max, would be quite appropriate for many uses. Maps and diagrams would not survive this scale-down (illegible text), but they are very compact already. In fact, the compression ratio of each image is a very reliable predictor of the type of content.
In 2005 I distributed a DVD [1] with all unabridged texts of the English Wikipedia and all 320,000 images, to be loaded onto a 4 GB CF card for a handheld. Now we have 10 million images on Commons, so even scaled-down images would need some filtering, but any such collection would still be 100-1000 times smaller in size.
Erik Zachte
[1] http://www.infodisiac.com/Wikipedia/
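For illustration only, a minimal Python sketch of what the 600x600 bounding box plus compression-ratio heuristic could look like. This is not anything Erik or WMF produced; it assumes the originals are already on local disk, and the 0.05 threshold and directory names are invented placeholders.

# A rough sketch of the 600x600 idea; originals assumed to be on local disk,
# and the compression-ratio threshold is a placeholder to be tuned on real data.
from pathlib import Path
from PIL import Image

MAX_SIDE = 600  # bounding box suggested above

def shrink(src: Path, dst: Path) -> None:
    """Downscale one image so neither side exceeds MAX_SIDE, keeping aspect ratio."""
    with Image.open(src) as im:
        im.thumbnail((MAX_SIDE, MAX_SIDE))  # no-op if the image is already smaller
        im.save(dst)

def looks_like_diagram(path: Path) -> bool:
    """Heuristic from the mail: maps and diagrams compress far better than photos."""
    with Image.open(path) as im:
        raw = im.width * im.height * len(im.getbands())  # rough uncompressed size
    return path.stat().st_size / raw < 0.05  # placeholder threshold

if __name__ == "__main__":
    out = Path("thumbs")
    out.mkdir(exist_ok=True)
    for f in Path("originals").glob("*.jpg"):
        if looks_like_diagram(f):
            continue  # already tiny, ship it untouched
        shrink(f, out / f.name)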
I had a quick look and it turns out that the English-language Wikipedia uses over 2.8 million images today. So, as you point out, an offline reader that just used thumbnails would still have to be selective about its image use.
In any case, putting together collections of thumbs doesn't resolve the need for a mirror of the originals, which I would really like to see happen.
Ariel
People can't mirror Commons if there is no public image dump. As there is no public image dump, people don't care about mirroring. And so on...
You could offer monthly incremental image dumps.[1] Until mid-2008, monthly uploads are under 100 GB; recently they are in the 200-300 GB range. People are mirroring Domas's visit logs at the Internet Archive; OK, a month of Commons is roughly 10x that size, but it is not impossible. Archive Team has mirrored GeoCities (0.9 TB), Yahoo! Video (20 TB), Jamendo (2.5 TB) and other huge sites. So, if you put those image dumps online, people are going to download them all.
You could start by offering full-resolution monthly dumps up to 2007 or so. But we have to restart this sooner or later.
[1] http://archiveteam.org/index.php?title=Wikimedia_Commons#Size_stats
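As a sketch of how such month-sized chunks could at least be enumerated, here is an illustration using the public MediaWiki API. This is not an existing dump tool; the month boundaries are placeholders and the requests library is assumed.

# Sketch only: list one month of Commons uploads via list=allimages,
# which is one way the monthly incremental chunks proposed above could be built.
import requests

API = "https://commons.wikimedia.org/w/api.php"

def uploads_between(start, end):
    """Yield (name, url, size) for files uploaded in [start, end), ISO 8601 timestamps."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisort": "timestamp",
        "aistart": start,
        "aiend": end,
        "aiprop": "url|size|timestamp",
        "ailimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        for img in data["query"]["allimages"]:
            yield img["name"], img["url"], img["size"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow API continuation

if __name__ == "__main__":
    total = sum(size for _, _, size in
                uploads_between("2005-06-01T00:00:00Z", "2005-07-01T00:00:00Z"))
    print("June 2005 uploads: about %.1f GB" % (total / 1e9))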
As I said earlier, providing multiterabyte dumps does not seem reasonable to me. Monthly incrementals don't provide a workaround, unless you are suggesting that we put dumps online for every month since the beginning of the project. I think that a much more workable way to jump-start a mirror is to copy directly to disks in the datacenter, for an organization which will provide public access to its copy. This requires three things: 1) an organization that wants to host such a mirror, 2) them sending us disks, 3) me clearing it with Rob and with our datacenter tech, but he's agreed to this in principle in the past.
Ariel
2011/11/18 Ariel T. Glenn ariel@wikimedia.org
As I said earlier, providing multiterabyte dumps does not seem reasonable to me.
What is the problem? Bandwidth? Disk space?
Monthly incrementals don't provide a workaround, unless you are suggesting that we put dumps online for every month since the beginning of the project.
Yes, indeed.
I think that a much more workable way to jump-start a mirror is to copy directly to disks in the datacenter, for an organization which will provide public access to its copy. This requires three things: 1) an organization that wants to host such a mirror, 2) them sending us disks, 3) me clearing it with Rob and with our datacenter tech, but he's agreed to this in principle in the past.
Ariel
Erik Zachte wrote:
Providing multiple terabyte sized files for download doesn't make any kind of sense to me. However, if we get concrete proposals for categories of Commons images people really want and would use, we can put those together. I think this has been said before on wikitech-l if not here.
There is another way to cut down on download size, which would serve a whole class of content re-users, e.g. offline readers. For offline readers it is not so important to have pictures of 20 MB each, rather to have pictures at all, preferably tens of KB in size. A download of all images, scaled down to say 600x600 max would be quite appropriate (...)
I made this tool last month, precisely to allow easy downloading of all images from a given category (inspired by WLM needs). http://toolserver.org/~platonides/catdown/catdown.php
The download is just a tiny script with the list of URLs to fetch, but that is enough to do it without further manual intervention. There is also a nice estimate of how much space you will need to complete the download.
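For illustration, a hypothetical stand-in for what such a generated script amounts to (not the actual catdown output; the file names and destination directory are assumptions): estimate the total size with HEAD requests, then fetch every URL unattended.

# Hypothetical stand-in, not the real tool's output: read a plain list of file
# URLs, report how much space the batch needs, then fetch everything unattended.
import os
import requests

def fetch_all(url_list="urls.txt", dest="downloads"):
    os.makedirs(dest, exist_ok=True)
    with open(url_list) as f:
        urls = [line.strip() for line in f if line.strip()]

    # Space estimate up front, via HEAD requests.
    total = sum(int(requests.head(u, allow_redirects=True)
                    .headers.get("Content-Length", 0)) for u in urls)
    print("%d files, about %.2f GiB needed" % (len(urls), total / 2**30))

    for u in urls:
        target = os.path.join(dest, u.rsplit("/", 1)[-1])
        with requests.get(u, stream=True) as r:
            r.raise_for_status()
            with open(target, "wb") as out:
                for chunk in r.iter_content(1 << 20):
                    out.write(chunk)

if __name__ == "__main__":
    fetch_all()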
On Thu, Nov 17, 2011 at 6:40 AM, Ariel T. Glenn ariel@wikimedia.org wrote:
I would read this as requiring access to the images to remain available, not necessarily in dump form.
I don't believe that is the case. The GFDL, like the GPL, requires that it is possible to rebuild the product from the distributed source, minus any separately distributed dependencies.
It is necessary to provide a simple mechanism for reliably downloading the images used on each project and incorporating all of the dumps needed to regenerate a replica of each project.
The 'source' can be broken into chunks, but it would be obviously contrary to the spirit of the license to require that each and every image be downloaded individually.
_and_ it needs to be possible for any consumer to perform the task of obtaining the source. Does the WMF block people who attempt to mirror the project content one item at a time? IMO blocking them is very sane, but if that is the only way to obtain the source then it would again be breaking the licence.
InstantCommons means that those images don't need to be redistributed in order for the projects to be compliant with the GFDL.
-- John Vandenberg
On Fri, 18-11-2011 at 20:33 +1100, John Vandenberg wrote:
I don't believe that is the case. The GFDL, like the GPL, requires that it is possible to rebuild the product from the distributed source, minus any separately distributed dependencies.
It is necessary to provide a simple mechanism for reliably downloading the images used on each project and incorporating all of the dumps needed to regenerate a replica of each project.
The 'source' can be broken into chunks, but it would be obviously contrary to the spirit of the license to require that each and every image be downloaded individually.
There are scripts to download all media used on a project ( http://meta.wikimedia.org/wiki/Wikix ). As long as the end user runs one command, it doesn't matter what's happening on the back end.
_and_ it needs to be possible for any consumer to perform the task of obtaining the source. Does the WMF block people who attempt to mirror the project content one item at a time? IMO blocking them is very sane, but if that is the only way to obtain the source then it would again be breaking the licence.
AFAIK we do not block folks that are making serial requests, even if they crawl the entire media space. Serial requests don't incur a big cost on our servers.
InstantCommons means that those images don't need to be redistributed in order for the projects to be compliant with the GFDL.
-- John Vandenberg
However I would be happier if we had full media mirrors hosted by other folks (and they could provide packages of groups of files for download too).
Ariel
On Fri, 18-11-2011 at 11:49 +0200, Ariel T. Glenn wrote:
AFAIK we do not block folks that are making serial requests, even if they crawl the entire media space. Serial requests don't incur a big cost on our servers.
I should clarify this.
Crawling the media server and requesting all images one at a time (as long as a pile of people aren't doing it at once) is fine. Requesting all images in a specific or several thumb sizes is not; in the first case we serve files that already exist, while in the second case the files may need to be generated and put someplace. And we simply don't have space to keep generated thumbs of every image on Commons in various arbitrary sizes at the moment. So folks that *do* want to crawl the media server and request thumbs for all of them should check in with me so we can figure out how to get you the data you need.
Ariel
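To make the originals-vs-thumbs distinction concrete, a hedged sketch of the polite serial approach described above: look up each file's original URL through the API (never a thumbnail size) and download one file at a time. The title list, User-Agent and delay are placeholders, not a recommended configuration.

# Sketch of the "serial requests for originals" pattern: resolve each file's
# original upload URL via prop=imageinfo and fetch files one at a time.
import time
import requests

API = "https://commons.wikimedia.org/w/api.php"
HEADERS = {"User-Agent": "example-media-mirror/0.1 (contact: someone@example.org)"}

def original_url(title):
    """Return the URL of the original upload for File:<title>."""
    r = requests.get(API, headers=HEADERS, params={
        "action": "query",
        "titles": "File:" + title,
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
    })
    page = next(iter(r.json()["query"]["pages"].values()))
    return page["imageinfo"][0]["url"]

if __name__ == "__main__":
    for title in ["Example.jpg"]:  # placeholder list of filenames
        url = original_url(title)
        with open(title, "wb") as out:
            out.write(requests.get(url, headers=HEADERS).content)
        time.sleep(1)  # one request at a time, spaced out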
I see you are working on this https://wikitech.wikimedia.org/view/Dumps/Image_dumps
I don't have an account there (how can I request one?). Why don't you offer incremental image backups in one-day chunks, from 2004-09-07 to (today - 1 year), to leave enough time to remove copyvios?
2012/1/31 emijrp emijrp@gmail.com:
I see you are working on this https://wikitech.wikimedia.org/view/Dumps/Image_dumps
I don't have an account there (how can I request one?). Why don't you offer incremental image backups in one-day chunks, from 2004-09-07 to (today - 1 year), to leave enough time to remove copyvios?
I'd also like an account there; read-only would be OK so I can watch the progress!
On Tue, 31-01-2012 at 14:54 +1100, John Vandenberg wrote:
I'd also like an account there; read-only would be OK so I can watch the progress!
You don't need an account to read the content, only to edit.
Ariel
On Tue, Jan 31, 2012 at 6:13 PM, Ariel T. Glenn ariel@wikimedia.org wrote:
You don't need an account to read the content, only to edit.
Ariel
I believe they mean watchlisting (so they get email notifs)
(If email alerts are even activated over there)
On Tue, 31-01-2012 at 19:17 +1000, K. Peachey wrote:
I believe they mean watchlisting (so they get email notifs)
(If email alerts are even activated over there)
I never use that feature, so it never even crossed my mind... Yes, it's enabled on wikitech. So email me off list with the username and the email address you want, and I'll set you up (John and anyone else).
Ariel
K. Peachey, 31/01/2012 10:17:
I believe they mean watchlisting (so they get email notifs)
(If email alerts are even activated over there)
You can also use Atom feeds anyway.
Nemo
I don't plan to do dailies any time soon. We don't even have real incrementals for the text revisions, which people have been begging for forever; cleaning up the current adds/changes dumps and making them more useful (and making them stable) has to come first. For the images, we need to get the main bulk of the images out of here and into other folks' hands first. Dailies, if they were to happen, would be quite some time down the road, which is why I haven't written them into the plan. Note that we don't have a place to keep a second copy of everything from 2004 until now, which is another reason I can't go that route right now.
To get an account on wikitech, please give me a user name you want and an email address you prefer and I'll set you up.
Ariel