After some delays and bug-hunting my script for the HTML static versions is in acceptable shape.
Here you can see an example, built from a SQL file of some weeks ago: (Don't try the Search box!!! I explain below)
http://www.arcetri.astro.it/~puglisi/wiki/dump/ma/main_page.html
Please don't DOS the connection, it's not a very fast line. Interested parties can find the script here:
http://www.arcetri.astro.it/~puglisi/wiki/wiki2static.txt
(renamed to .txt due to some server misconfig). Use a wide terminal for this one. Everything (HTML code included) is in one single file. The whitespace may appear weird because I use 4-space tabs. There's no need to tell me you don't like the coding style, I already know :-)))
Some issues:
- the topbar links do not work (known bug :-). The Edit link goes to the online wikipedia site.
- interlanguage links are ignored
- some wiki markup is not recognized yet.
- no images are present (of course!)
- filenames should be OK for most filesystems not "8.3" limited (max 63 chars, only a-z, 0-9 and underscore)
- despite the two-letter subdirectories, some of them have over 4,000 files in them!
- Time: the script takes more than 2 hours on my 1.3 GHz Athlon...
- Size: this dump is about 800MB. (tar.gz is just 110MB). I think that I can bring it down to 600-650MB with a bit of trimming and eliminating unnecessary redirects. BUT, without some form of compression, the English wikipedia will soon overflow a single CD. Maybe we should target DVDs? :-)
- Images: no images are present here. AFAIK, each of them has a SQL record (that my script skips), but the actual image data is not included. How many megabytes of images do we have? I think it will be impossible to store the full images on a CD. Certainly it's possible on a DVD. Maybe a low-res version could be included on a CD.
- Search: I tried a javascript search that worked well for small databases: it's basically a big array of strings (article titles and filenames) with a few lines of code that do a regexp match against them. For full-sized databases like this one, the search page becomes an 8-megabyte monster that takes forever to process (IE grabs 100 MB of memory and stops there, Opera is even worse). I'll see if I can find a different solution.
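To make the filename and search points above concrete, here is a minimal Python sketch. It is not the actual wiki2static code, which is not shown in this message. It assumes that the 63-character limit applies to the name without the .html extension, that every disallowed character becomes an underscore, and that the two-letter subdirectory is simply the first two characters of the name. The names `static_path` and `search_titles` and the sample titles are made up for illustration, and name collisions are not handled.

```python
import re

MAX_STEM = 63  # limit quoted in the post; assumed here to exclude ".html"


def static_path(title: str) -> str:
    """Map an article title to 'xx/name.html' using only a-z, 0-9 and '_'."""
    stem = re.sub(r"[^a-z0-9]", "_", title.lower())[:MAX_STEM]
    # Assumption: pad very short names so the directory always has two chars.
    subdir = (stem + "__")[:2]
    return f"{subdir}/{stem}.html"


def search_titles(index, pattern):
    """Return (title, path) pairs whose title matches the regexp, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [(title, path) for title, path in index if rx.search(title)]


# Hypothetical sample data: the index is a flat list of (title, filename) pairs.
titles = ["Main Page", "Clark Gable", "GNU Free Documentation License"]
index = [(t, static_path(t)) for t in titles]
hits = search_titles(index, "gable")
```

Under these assumptions "Main Page" maps to ma/main_page.html, which matches the example URL above. The scan itself is a plain linear regexp match over the title list.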
Enough for now. While I carry on development, any input is welcome.
Ciao, Alfio
On Thursday 29 May 2003 02:15, Alfio Puglisi wrote:
- Images: no images are present here. AFAIK, each of them has a SQL
record (that my script skips), but the actual image data is not included.
Many images uploaded to Wikipedia are third-parties' copyrighted IP being used under the vague claim of "fair use". The project has not yet received legal advice about whether such materials can be redistributed under the terms of the GFDL; until that's resolved I at least have no intention of putting them all in one easy download, for which a significant use would be reuse and redistribution by people like yourself trying to reuse Wikipedia material.
IIRC the last word from Jimbo on the subject was: http://mail.wikipedia.org/pipermail/wikipedia-l/2002-November/007880.html
I know several prominent Wikipedians don't seem to care about this, but I think staying true to our all-reusable-all-redistributable license is a very important aspect of maintaining Wikipedia's credibility and fulfilling the project's goals. It may not always be _expedient_, but then copying every article straight out of EB or World Book or Encarta would have been a very expedient way to build an encyclopedia, and cracking Windows XP activation codes and binary search-and-replacing '(c) Microsoft' to 'Copyleft GPL' would be a very expedient way to put out a free operating system. :P
How many megabytes of images do we have?
On the English Wikipedia, the upload tree contains about 437 megs at present, not counting older replaced revisions of files. I'm not sure what portion is actually used or needed, and some portion of that is non-image material (sound clips, etc).
-- brion vibber (brion @ pobox.com)
(Brion Vibber brion@pobox.com):
Many images uploaded to Wikipedia are third-parties' copyrighted IP being used under the vague claim of "fair use". The project has not yet received legal advice about whether such materials can be redistributed under the terms of the GFDL; until that's resolved I at least have no intention of putting them all in one easy download, for which a significant use would be reuse and redistribution by people like yourself trying to reuse Wikipedia material.
I don't think there's anything unresolved; clearly we CAN'T redistribute copyrighted photos under GFDL. But the issue of whether or not we can use them in Wikipedia is more complicated. The supply of good public domain photos is negligible, and the usefulness of photos to an encyclopedia is critical enough to justify some hassle. So if copyrighted images, used with permission or under fair use, are attached to Wikipedia articles, and it is likely that most if not all redistributions of the Wikipedia would also qualify for the same fair use, I think that's a reasonable second choice, so long as (1) every image so used _is clearly identified_ as such, and has clear documentation of its source.
Images from unknown sources should certainly be removed, and any images that can be replaced with real GFDL images should be. Likewise, if we do any static mirror of Wikipedia, I think it's important that it retain (or point to) the image description pages that identify the source of each image and its status.
On Fri, 30 May 2003, Lee Daniel Crocker wrote:
critical enough to justify some hassle. So if copyrighted images, used with permission or under fair use, are attached to Wikipedia articles, and it is likely that most if not all redistributions of the Wikipedia would also qualify for the same fair use, I think that's a reasonable second choice, so long as (1) every image so used _is clearly identified_ as such, and has clear documentation of its source.
Images from unknown sources should certainly be removed, and any images that can be replaced with real GFDL images should be. Likewise, if we do any static mirror of Wikipedia, I think it's important that it retain (or point to) the image description pages that identify the source of each image and its status.
Maybe a new SQL field for images? Something like
copyright tinyint(2) NOT NULL
1 = public domain
2 = explicit GFDL
3 = fair use
4 = copyright holder permission
5 = don't redistribute
6 = stolen
7 = don't know
to be filled with a checkbox in the image upload page?
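As a rough illustration of how such a status field could be used by a dump script or a mirror, here is a minimal Python sketch using an in-memory SQLite database. The table name `image`, the column names `img_name` and `copyright`, and the sample rows are hypothetical and are not taken from the real Wikipedia schema; only the seven status codes come from the list above. Which statuses count as safe to redistribute is a policy decision, so the sketch takes the allowed set as a parameter.

```python
import sqlite3
from enum import IntEnum


class CopyrightStatus(IntEnum):
    """Status codes as listed in the proposal above."""
    PUBLIC_DOMAIN = 1
    EXPLICIT_GFDL = 2
    FAIR_USE = 3
    HOLDER_PERMISSION = 4
    DONT_REDISTRIBUTE = 5
    STOLEN = 6
    UNKNOWN = 7


def images_with_status(conn, allowed):
    """Return names of images whose status is in `allowed`, sorted by name."""
    marks = ",".join("?" for _ in allowed)
    rows = conn.execute(
        f"SELECT img_name FROM image WHERE copyright IN ({marks}) "
        "ORDER BY img_name",
        [int(status) for status in allowed],
    )
    return [name for (name,) in rows]


# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image (img_name TEXT PRIMARY KEY, copyright INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO image VALUES (?, ?)",
    [
        ("map.png", int(CopyrightStatus.PUBLIC_DOMAIN)),
        ("portrait.jpg", int(CopyrightStatus.FAIR_USE)),
        ("diagram.png", int(CopyrightStatus.EXPLICIT_GFDL)),
    ],
)
free_only = images_with_status(
    conn, [CopyrightStatus.PUBLIC_DOMAIN, CopyrightStatus.EXPLICIT_GFDL]
)
```

With a flag like this, a static dump could include only the rows that pass the filter and leave everything else out.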
Ciao, Alfio
On Fri, May 30, 2003 at 10:44:16AM +0200, Alfio Puglisi wrote:
On Fri, 30 May 2003, Lee Daniel Crocker wrote:
critical enough to justify some hassle. So if copyrighted images, used with permission or under fair use, are attached to Wikipedia articles, and it is likely that most if not all redistributions of the Wikipedia would also qualify for the same fair use, I think that's a reasonable second choice, so long as (1) every image so used _is clearly identified_ as such, and has clear documentation of its source.
Images from unknown sources should certainly be removed, and any images that can be replaced with real GFDL images should be. Likewise, if we do any static mirror of Wikipedia, I think it's important that it retain (or point to) the image description pages that identify the source of each image and its status.
Maybe a new SQL field for images? Something like
copyright tinyint(2) NOT NULL
1 = public domain 2 = explicit GFDL 3 = fair use 4 = copyright holder permission 5 = don't redistribute 6 = stolen 7 = don't know
to be filled with a checkbox in the image upload page?
No !!! Just delete all non-free images.
Tomasz-
No !!! Just delete all non-free images.
It would be silly not to allow the freedoms given to us by the law to the maximum possible extent. Obviously free images (PD/FDL) are always preferable to non-free ones, but so long as Wikipedia remains redistributable AND forkable, I see no problem with using fair use pictures or those under equivalent semi-free licenses. Keep in mind that the freedom to modify is, for images, not nearly as important as for text.[1] How do you propose acquiring a free photo of a prominent, recently deceased author or actor? A picture of an important historical event? This is unrealistic -- you won't, and with your attitude, we will simply have no image for that article, while Encarta et al. will sport a nice gallery of them. Unacceptable.
Fair use is acceptable as per the current consensus on [[Wikipedia:Image use policy]]. You may use any quasi-dictatorial powers you have on pl. to enforce your point of view, but this will not be possible on en.
Having a status flag in the database would be very helpful for forkers, but we should not encourage uploading photos under licenses that limit redistribution. But a little more Linus Torvalds style pragmatism and a little less Richard Stallman style zealotry wouldn't hurt either.
What we should allow:
---------------------
1) public domain
2) FDL
3) fair use
4) free for non-commercial use. This is similar to fair use, but less vague.
5) Creative Commons licenses (e.g. Attribution) and FDL-equivalent open content licenses

What we should not allow:
-------------------------
1) Copyrighted, no permission (duh)
2) No permission to redistribute other than for Wikipedia (prevents forking)
3) Advertising or prominent copyright notices in images that cannot be removed
4) Other restrictive licenses
Regards,
Erik
[1] And before someone starts mentioning all the important crop and resizing operations we do on images, these would not constitute license violations because they do not substantially alter the source image.
On 30 May 2003, Erik Moeller wrote:
Having a status flag in the database would be very helpful for forkers, but we should not encourage uploading photos under licenses that limit redistribution. But a little more Linus Torvalds style pragmatism and a little less Richard Stallman style zealotry wouldn't hurt either.
And end up being sued like Linus?
-- Daniel
Daniel-
On 30 May 2003, Erik Moeller wrote:
Having a status flag in the database would be very helpful for forkers, but we should not encourage uploading photos under licenses that limit redistribution. But a little more Linus Torvalds style pragmatism and a little less Richard Stallman style zealotry wouldn't hurt either.
And end up being sued like Linus?
Hardly an argument. You can always end up being sued. The question is whether the other side wins.
Regards,
Erik
On 30 May 2003, Erik Moeller wrote:
Having a status flag in the database would be very helpful for forkers, but we should not encourage uploading photos under licenses that limit redistribution. But a little more Linus Torvalds style pragmatism and a little less Richard Stallman style zealotry wouldn't hurt either.
And end up being sued like Linus?
Hardly an argument. You can always end up being sued. The question is whether the other side wins.
That's like saying seatbelts are hardly a safety improvement, because you might still end up dead.
Free software and free content projects are very vulnerable to corporate litigation simply because they don't have the financial resources to tackle it. The projects are generally run by a group of individuals, who make ideal targets for corporate threats about liability.
So speaking in practical terms (which should please you) the question isn't so much about who is right or wrong - but what the risks are. And the risks of using material that isn't 100% verified as free are enormous, as the SCO debacle clearly demonstrates.
FreeBSD is a project where they have learned from bitter experience that every last piece of code included or inherited must be double checked for copyright, patent and non-disclosure problems. Now, FreeBSD people are very far away "politically" from RMS, yet take these issues very seriously. So please don't turn this into a "practical vs. zealot" issue.
So should Wikipedia.
(Wikipedia also has the great advantage of incorporating hundreds, if not thousands of authors world wide. Cameras are cheap. If there is a serious dearth of pictures, we should organize a project to have more of our authors go out and take pictures, and work to acquire permission to use copyrighted pictures where this isn't possible.)
-- Daniel
Daniel-
Free software and free content projects are very vulnerable to corporate litigation simply because they don't have the financial resources to tackle it. The projects are generally run by a group of individuals, who make ideal targets for corporate threats about liability.
That's why having a non-profit organization is a good idea. It is one of the best arguments to join the GNU project when developing free software, for example -- still, most people are understandably irritated by their zealous nature. For Wikipedia, we will have the Nupedia Foundation to offer legal protection.
So speaking in practical terms (which should please you) the question isn't so much about who is right or wrong - but what the risks are. And the risks of using material that isn't 100% verified as free are enormous, as the SCO debacle clearly demonstrates.
Only that the "SCO debacle" happened in spite of assurances by everyone that only GNU GPL code was involved -- according to SCO, however, IBM screwed up. Aside from the fact that the entire lawsuit is a Microsoft FUD project, all this demonstrates is that in spite of your best intentions, you can still get sued -- exactly what I've been saying.
It makes no sense to ignore the freedoms offered to us by law -- and one of these freedoms, in the US and most other civilized nations, is fair use -- only because some people might interpret that law differently. Realistically, the only real threat to Wikipedia is that someone will send us a nasty letter to take this or that image down, which we obviously will do in most cases. What you are promoting is to give up freedom because the Large Evil Corporations have deeper pockets. I don't think we should give in that easily. See also: http://meta.wikipedia.org/wiki/avoid_copyright_paranoia
It makes sense to be careful, it makes sense to use open content whenever possible. But a purely ideological stance is counterproductive. I find it quite ironic that the most ardent advocates of open content also frequently adopt a police state mentality with regard to perceived violations.
FreeBSD is a project where they have learned from bitter experience that every last piece of code included or inherited must be double checked for copyright, patent and non-disclosure problems.
I'm all for double checking. I'm against saying "Everything but the GNU FDL or public domain is forbidden". I'm against removing images prematurely because they *might* be protected. There is no analogy to fair use in the free software world.
(Wikipedia also has the great advantage of incorporating hundreds, if not thousands of authors world wide. Cameras are cheap. If there is a serious dearth of pictures, we should organize a project to have more of our authors go out and take pictures, and work to acquire permission to use copyrighted pictures where this isn't possible.)
[Wikipedia-l] Wikipedia photo squad? Erik Moeller wikipedia-l@wikipedia.org Wed, 30 Oct 2002 15:51:29 +0100 (MET) http://mail.wikipedia.org/pipermail/wikipedia-l/2002-October/006804.html
As you can see, I'm all for collaborative content creation. However, it will be difficult to get a nice FDL photo of Clark Gable, because the guy's been dead for quite a while. Incidentally, the current article about Clark Gable uses a copyrighted photo, uploaded by Zoe, a long-time Wikipedian who is also very careful about finding and reporting copyright infringements. Fair use is a common practice on Wikipedia, and I don't think that should change.
Regards,
Erik
Erik Moeller wrote:
Daniel-
On 30 May 2003, Erik Moeller wrote:
Having a status flag in the database would be very helpful for forkers, but we should not encourage uploading photos under licenses that limit redistribution. But a little more Linus Torvalds style pragmatism and a little less Richard Stallman style zealotry wouldn't hurt either.
And end up being sued like Linus?
Hardly an argument. You can always end up being sued. The question is whether the other side wins.
Agreed, but there's still the problem of ambulance chasing lawyers earning contingency fees. Innocent defendants in such suits need to be able to collect contingency fees. :-D
Ec
Hr. Daniel Mikkelsen wrote:
On 30 May 2003, Erik Moeller wrote:
Having a status flag in the database would be very helpful for forkers, but we should not encourage uploading photos under licenses that limit redistribution. But a little more Linus Torvalds style pragmatism and a little less Richard Stallman style zealotry wouldn't hurt either.
And end up being sued like Linus?
I agree with Erik on this. Simply guessing that something is copyright doesn't make it so. It's all a question of whether you interpret copyright laws liberally in favour of the user or liberally in favour of the putative copyright owner. The fear of being sued is a red herring; we are very far from any possibility that such a thing might happen. With the safe harbor provisions of copyright law we would have ample opportunity to remove offending material before it went that far. Notification of copyright violation would have to come first from the copyright OWNER or his AUTHORIZED representative, not from some disinterested 3rd party do-gooder with unschooled guesses about what the law says.
Effectively violating copyright is very easy to do. I have hundreds of copyright books in my personal library that I could scan with OCR software, and add into a Wikipedia article without anyone EVER realizing it. That wouldn't make it right, but I'm confident that nobody would ever call my hand on it. If I can do it so can any other contributor, so let's not be naive about this.
Of course, if a copyright violation is flagrant and obvious it should be removed. Unfortunately, most alleged violations are not that clear, and we should give the contributor the benefit of the doubt without descending into copyright paranoia.
Eclecticology
On Fri, May 30, 2003 at 12:09:00PM +0200, Erik Moeller wrote:
Tomasz-
No !!! Just delete all non-free images.
It would be silly not to allow the freedoms given to us by the law to the maximum possible extent. Obviously free images (PD/FDL) are always preferable to non-free ones, but so long as Wikipedia remains redistributable AND forkable, I see no problem with using fair use pictures or those under equivalent semi-free licenses. Keep in mind that the freedom to modify is, for images, not nearly as important as for text.[1] How do you propose acquiring a free photo of a prominent, recently deceased author or actor? A picture of an important historical event? This is unrealistic -- you won't, and with your attitude, we will simply have no image for that article, while Encarta et al. will sport a nice gallery of them. Unacceptable.
This reasoning is exactly the greatest danger to Wikipedia - you want us to give up important freedoms to get a little more content. People who think like that are a much greater threat than all the Helgas and Lirs the project ever had.
Not only is it perfectly acceptable, it is THE ONLY RIGHT THING TO DO.
Fair use is acceptable as per the current consensus on [[Wikipedia:Image use policy]]. You may use any quasi-dictatorial powers you have on pl. to enforce your point of view, but this will not be possible on en.
There has never been any consensus about that. The upload page has always said that the copyright status of uploaded files must be clear.
Having a status flag in the database would be very helpful for forkers, but we should not encourage uploading photos under licenses that limit redistribution. But a little more Linus Torvalds style pragmatism and a little less Richard Stallman style zealotry wouldn't hurt either.
What you call "pragmatism" is in fact short-sightedness.
What we should allow:
- public domain
- FDL
- fair use
- free for non-commercial use. This is similar to fair use, but less vague.
- Creative Commons licenses (e.g. Attribution) and FDL-equivalent open content licenses
Of course Wikipedia may be used commercially.
Tomasz-
This reasoning is exactly the greatest danger to Wikipedia - you want us to give up important freedoms to get a little more content. People who think like that are a much greater threat than all the Helgas and Lirs the project ever had.
I could respond on the same level, but I won't. It's a matter of balance -- what freedoms do you lose? You might lose the freedom to take all of Wikipedia's images and sell them as the "Coca Cola photo collection" for 50 bucks. Big deal. As I said, having the respective flag in the "image" table would be sufficient for any third party to easily filter out images which contradict commercial use.
Not only is it perfectly acceptable, it is THE ONLY RIGHT THING TO DO.
Because you say so? I don't care about an ideologically defined "RIGHT THING TO DO". I care about results.
Fair use is acceptable as per the current consensus on [[Wikipedia:Image use policy]]. You may use any quasi-dictatorial powers you have on pl. to enforce your point of view, but this will not be possible on en.
There has never been any consensus about that.
I don't know about pl:, but policies on en: are developed through discussions. This is how the current image use policy was created. Jimbo has said that fair use, in limited scope, is acceptable. I agree: Fair use is the least desirable of all choices because it is so vaguely defined. It should only be used for very important images.
What you call "pragmatism" is in fact short-sightedness.
No, I am thinking very much about how Wikipedia will look 10 years from now.
Regards,
Erik
On Friday 30 May 2003 14:38, Erik Moeller wrote:
From Wikipedia FAQ:
"What is Wikipedia? Wikipedia is a project to produce a new kind of encyclopedia that is comprehensive and _free_."
[emphasis by me]
For me "freedom" is a very important part of the Wikipedia project. By allowing images under the "fair use" rules Wikipedia is not completely free anymore.
but I won't. It's a matter of balance -- what freedoms do you lose? You might lose the freedom to take all of Wikipedia's images and sell them as the "Coca Cola photo collection" for 50 bucks. Big deal. As I said, having the respective flag in the "image" table would be sufficient for any third party to easily filter out images which contradict commercial use.
No, you lose much more. You cannot easily combine the content of two "free" encyclopedias and get something that is "free". You cannot copy images from the English Wikipedia to the German Wikipedia anymore because the "fair use" right does not work this way in Germany.
And worst of all, you discourage people from contributing really free images--"because we already have the 'fair use' ones".
You can call this "stupid" - but I don't think that it is worth giving up the freedom of Wikipedia for a few more images.
How do you propose acquiring a free photo of a prominent, recently deceased author or actor? A picture of an important historical event? This is unrealistic -- you won't, and with your attitude, we will simply have no image for that article, while Encarta et al. will sport a nice gallery of them. Unacceptable.
So be it, then we have to image for that.
Linux had no 3D support for a long time--"unacceptable" some said, but nevertheless Linux survived and is now stronger than ever before, and more and more companies nowadays write GPL code for Linux.
I hope that we are some day in a similar position that companies are proud to add some images to the free encyclopedia. In my humble opinion we don't have to "win" against Encarta in the multimedia sector, Microsoft can win this one easily considering how many image and video rights they own.
[...] No, I am thinking very much about how Wikipedia will look 10 years from now.
I think it is safe to say, that we all do.
<sarcasm>Who removes the "free" from the FAQ?</sarcasm>
best regards, Marco
P.S. Erik, thanks a lot for your great articles you have written for heise. They are very well written and of great value even for someone like me who thought that he knows quite a lot about wikipedia.
On Friday 30 May 2003 18:38, Marco Krohn wrote:
How do you propose acquiring a free photo of a prominent, recently deceased author or actor? A picture of an important historical event? This is unrealistic -- you won't, and with your attitude, we will simply have no image for that article, while Encarta et al. will sport a nice gallery of them. Unacceptable.
So be it, then we have to image for that.
Obviously this should read: "... we have no image for that."
Marco
Marco Krohn wrote in part to Erik Moeller:
No, you lose much more. You cannot easily combine the content of two "free" encyclopedias and get something that is "free". You cannot copy images from the English Wikipedia to the German Wikipedia anymore because the "fair use" right does not work this way in Germany.
Maybe you and Erik can't, being in Germany and all as you are, but ''I'' can. Everybody is invited to drop such requests on [[en:User talk:Toby Bartels]]!
-- Toby
On Friday 30 May 2003 03:09, Erik Moeller wrote:
Having a status flag in the database would be very helpful for forkers, but we should not encourage uploading photos under licenses that limit redistribution. But a little more Linus Torvalds style pragmatism and a little less Richard Stallman style zealotry wouldn't hurt either.
I assume you're referring to the row over linking binary-only modules in the Linux kernel. For the uninitiated:
The GPL forbids linking GPL'd code with non-GPL'able code, excluding standard system libraries on non-free operating systems. This is to prevent fake "free" code that in fact is dependent on third-party non-free libraries in order to function.
The Free Software Foundation claims that this forbids _any_ linking, including dynamic run-time linking such as is common today in plug-in module/driver architectures. Linus doesn't interpret it this way, and says it's okay to distribute binary-only (non-free, no-source, limited distribution) driver modules that can be linked into the GPL'd Linux kernel at run-time.
Coupla notes:
* Said modules are distributed by their copyright owners with the supported hardware, not as part of the Linux kernel.
* The Linux kernel runs just fine without them. The modules provide additional support for certain hardware only.
* Said linking is done at run time on the user's machine; the combined result exists only in memory and is never redistributed.
So by analogy "Linus Torvalds style pragmatism" might provide for a third-party filter program that inserts non-FDL images into Wikipedia articles on a reader's computer as they're loaded. ;) But it's real questionable whether it could mean "sure, let's put a whole bunch of stuff that's not compatible with our license directly into our main distribution and embed them directly into pages served from our server. After all, we can tell people to remove them if they can't use them (even though we say it's absolutely vital that we include these images to have a legitimate encyclopedia)."
What we should allow:
- public domain
- FDL
Wonderful.
- fair use
- free for non-commercial use. This is similar to fair use, but less vague.
Can't redistribute these under the terms of the GFDL, so can't embed them in articles.
- Creative Commons licenses (e.g. Attribution) and FDL-equivalent open content licenses
This is a greyer area, as such licenses have much the same goals, but may or may not be letter-compatible. There's a fair chance that someone releasing material under such a license would be willing to dual-license if asked.
What we should not allow:
- Copyrighted, no permission (duh)
- No permission to redistribute other than for Wikipedia (prevents forking)
- Advertising or prominent copyright notices in images that cannot be removed
- Other restrictive licenses
How can these be shown to be distinct from claiming "fair use" on copyrighted material that hasn't given us *any* rights (the default being 'all rights reserved')?
Unless one of us is a lawyer familiar with the licenses and laws involved, I don't think this discussion is going to go anywhere useful at this point. :)
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
How can these be shown to be distinct from claiming "fair use" on copyrighted material that hasn't given us *any* rights (the default being 'all rights reserved')?
"All rights reserved" is equivalent to saying, "I don't know what my rights are, but whatever they are I'm claiming them. Figuring out what rights I don't have is your problem."
Unless one of us is a lawyer familiar with the licenses and laws involved, I don't think this discussion is going to go anywhere useful at this point. :)
Even if one of us IS a lawyer, I don't think it would go any further. ;-)
Ec
Brion Vibber wrote in part:
So by analogy "Linus Torvalds style pragmatism" might provide for a third-party filter program that inserts non-FDL images into Wikipedia articles on a reader's computer as they're loaded. ;)
Like a web browser? That's a third-party program that inserts Wikipedia's images into Wikipedia's articles on a reader's computer as they're loaded. The HTML document (the article) that we serve them has no image; we also serve them the image, which they request in reaction to our article (much as a module may be requested in reaction to the kernel's actions), but that's under fair use, not GFDL.
The problem, it seems to me, isn't that we use the images, but that we pretend that we're using them under the GFDL. They're already separated out a bit (not in the download, after all), and we should separate them (or the non-free ones) out further, rather than claiming in some places that we're entirely free and GFDL, while claiming in other places that we use some images through fair use. (I'd even support placing fair use images in a separate namespace and database from the free ones, just to make things ultraclear.) Our fair use images should be treated as *auxiliary*.
(even though we say it's absolutely vital that we include these images to have a legitimate encyclopedia).
It should ''never'' be vital to an article that we include an image, if we can help it, not only for distribution but also for accessibility. We should make our articles as good as possible without the images, whether or not we then also decide to send an image along with it. This is true even for free images, because we may have blind readers.
Unless one of us is a lawyer familiar with the licenses and laws involved, I don't think this discussion is going to go anywhere useful at this point. :)
Of course, IANAL either.
-- Toby
On Fri, 30 May 2003, Toby Bartels wrote:
Brion Vibber wrote in part:
So by analogy "Linus Torvalds style pragmatism" might provide for a third-party filter program that inserts non-FDL images into Wikipedia articles on a reader's computer as they're loaded. ;)
Like a web browser? That's a third-party program that inserts Wikipedia's images into Wikipedia's articles on a reader's computer as they're loaded.
...automatically upon the programmed instructions contained in the page, without permission of the copyright owner of the image, without instruction or intervention by the user.
The HTML document (the article) that we serve them has no image; we also serve them the image, which they request in reaction to our article (much as a module may be requested in reaction to the kernel's actions), but that's under fair use, not GFDL.
Compare again:
* the modules are distributed by their owners, *not* with the GPL kernel
* the images are distributed with the GFDL articles, *not* by their owners

* the modules are installed and loaded by deliberate action of the user on the user's machine, not automatically by the kernel
* the images are loaded automatically upon the instructions in the article page without user intervention

* the kernel-module combination is never redistributed
* the article-image combination is redistributed by local saving, hardcopy, mirroring, or format conversion of the articles unless careful effort is made to avoid including the image (which the article includes programmatic commands to include)
Most importantly, the paradigm of separate text and image files is simply a technical limitation of the HTML format used. If Wikipedia were distributed in PostScript, PDF, MS Word documents, RTF, or the printed page, there would be no such separation. ('Material printed in black ink is licensed under the GNU Free Documentation License and may be redistributed and modified under those terms. Material in cyan, magenta, and/or yellow inks which *just happens* to be on the same pages and *just happens* to be about things mentioned in the text and *just happens* to have been placed there by modifying the GFDL source text to include it is used without permission of the copyright holder and can't be reused except under very limited circumstances. These are actually separate documents, you see, having been printed by separate inks, but for convenience all four documents have been placed on the same page. They are not related in any way. It's just a fluke!')
As it is, the GFDL pages we distribute include commands with the explicit purpose of embedding particular non-GFDL material inside them, and correct loading of pages is expected to load the images and display them transparently without the user needing to notice that the data comes in several chunks with different filenames. To claim they are separate documents is at the least deceptive.
The problem, it seems to me, isn't that we use the images, but that we pretend that we're using them under the GFDL.
Either we are, or we're violating the license of every article that someone has modified by putting a non-GFDL picture into it.
They're already separated out a bit (not in the download, after all), and we should separate them (or the non-free ones) out further, rather than claiming in some places that we're entirely free and GFDL, while claiming in other places that we use some images through fair use. (I'd even support placing fair use images in a separate namespace and database from the free ones, just to make things ultraclear.) Our fair use images should be treated as *auxiliary*.
I'm all in favor of putting them on a separate server with a separate database and *not* embedding those images inline into articles, but allowing explicitly external links to those images which users would have to follow knowingly.
nonfree.wikipedia.org, anyone?
(even though we say it's absolutely vital that we include these images to have a legitimate encyclopedia).
It should ''never'' be vital to an article that we include an image, if we can help it, not only for distribution but also for accessibility.
Thanks! This removes the sense of urgency that drives some people to add non-free images knowing they are not compatible with the project's goals. Hopefully they'll stop immediately. :)
-- brion vibber (brion @ pobox.com)
(Brion Vibber vibber@aludra.usc.edu):
I'm all in favor of putting them on a separate server with a separate database and *not* embedding those images inline into articles, but allowing explicitly external links to those images which users would have to follow knowingly.
nonfree.wikipedia.org, anyone?
That's certainly an option to which I wouldn't object if it came to that. I still think it's a bit paranoid, but at least it would retain our ability to collect and describe useful information even if it did hamper its display a bit.
On Fri, 30 May 2003, Lee Daniel Crocker wrote:
(Brion Vibber vibber@aludra.usc.edu): I'm all in favor of putting them on a separate server with a separate database and *not* embedding those images inline into articles, but allowing explicitly external links to those images which users would have to follow knowingly.
nonfree.wikipedia.org, anyone?
That's certainly an option to which I wouldn't object if it came to that. I still think it's a bit paranoid,
You *will* submit to CopyrightParanoia... you *will* join the collective! :D
but at least it would retain our ability to collect and describe useful information even if it did hamper its display a bit.
Great. As an intermediate step, how about we go ahead and add a license-compatibility field or two to the image table and upload form (and at some point go looking through old images to mark the ones known to be PD or GFDL).
Then if we do decide we need to, we can slurp them out separately later or change their display style.
What I think I'd like to see instead of the blanket "the copyright holder has agreed to X" checkbox -- which encourages sloppiness -- is to have the following:
( ) I, the uploader, created this file and own the copyright.
( ) I got this file from somewhere else: [URL_or_citation_of_source___________] and
    ( ) This file is public domain (has fallen out of copyright, was never copyrighted, or has been explicitly put in PD by the author)
    ( ) This file is licensed under GFDL
    ( ) Copyright owner gives permission to reproduce for non-commercial/educational purposes only
        NOTE: [concerns over distribution, prefer free files]
    ( ) This file is used under 'fair use' claims without permission of the copyright holder.
        NOTE: [concerns over distribution, prefer free files]
Source is something that people are _supposed_ to put in the description, but it often doesn't get done. If we can reject uploads that don't have a source listed, this should encourage better documentation. (A true paranoic would require this of all text edits too, but people at least _seem_ to be less inclined to plagiarize text, and other people are a lot less tolerant of it when it's found, so I think our current procedures are sufficient for a good-faith effort in text-land.)
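To make the "reject uploads that don't have a source listed" idea concrete, here is a minimal sketch in Python. It is not the actual upload code: the `Upload` record, the `License` values, and both function names are assumptions made up for illustration, loosely following the form fields proposed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class License(Enum):
    # Hypothetical values mirroring the proposed radio buttons.
    PUBLIC_DOMAIN = "public domain"
    GFDL = "GFDL"
    NONCOMMERCIAL_ONLY = "non-commercial/educational permission only"
    FAIR_USE = "fair use, no permission"


# Licenses that can be redistributed alongside GFDL text without extra care.
FREE_LICENSES = {License.PUBLIC_DOMAIN, License.GFDL}


@dataclass
class Upload:
    filename: str
    own_work: bool          # "I, the uploader, created this file"
    source: Optional[str]   # URL or citation when the file came from elsewhere
    license: License


def validate_upload(upload: Upload) -> List[str]:
    """Return a list of problems; an empty list means the upload is accepted."""
    problems = []
    if not upload.own_work and not (upload.source or "").strip():
        problems.append("source is required for files obtained elsewhere")
    return problems


def needs_distribution_note(upload: Upload) -> bool:
    """True when the form should show the 'prefer free files' NOTE."""
    return upload.license not in FREE_LICENSES
```

Storing the license as a separate field, rather than only in free-form description text, is what would later allow the non-free files to be pulled out or displayed differently.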
-- brion vibber (brion @ pobox.com)
Brion-
Great. As an intermediate step, how about we go ahead and add a license-compatibility field or two to the image table and upload form (and at some point go looking through old images to mark the ones known to be PD or GFDL).
I'm for it. The resulting form contents should probably be saved in the image description, so they can be edited. I agree with the NOTE: ..., a relevant link to [[fair use]] should probably also be added.
Regards,
Erik
Erik Moeller wrote:
Great. As an intermediate step, how about we go ahead and add a license-compatibility field or two to the image table and upload form (and at some point go looking through old images to mark the ones known to be PD or GFDL).
I'm for it. The resulting form contents should probably be saved in the image description, so they can be edited. I agree with the NOTE: ..., a relevant link to [[fair use]] should probably also be added.
This seems like something that all parties to the "fair use" debate can agree -- to have a license-compatibility field or two in the image table and upload form, such that we can clearly describe why we think we can have such-and-such an image on the website, and giving guidance to re-users.
--Jimbo
On 5/30/03 9:29 PM, "Jimmy Wales" jwales@bomis.com wrote:
Erik Moeller wrote:
Great. As an intermediate step, how about we go ahead and add a license-compatibility field or two to the image table and upload form (and at some point go looking through old images to mark the ones known to be PD or GFDL).
I'm for it. The resulting form contents should probably be saved in the image description, so they can be edited. I agree with the NOTE: ..., a relevant link to [[fair use]] should probably also be added.
This seems like something that all parties to the "fair use" debate can agree -- to have a license-compatibility field or two in the image table and upload form, such that we can clearly describe why we think we can have such-and-such an image on the website, and giving guidance to re-users.
Fair use bad for Wikipedia...
I have trouble understanding why this discussion is on Wikitech.
On Sunday 01 June 2003 12:24, The Cunctator wrote:
I have trouble understanding why this discussion is on Wikitech.
I foolishly made a comment on why an images tarball is not yet available for converters to play with. From now on I'm keeping my mouth shut unless I have legal representation present... :)
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
I foolishly made a comment on why an images tarball is not yet available for converters to play with. From now on I'm keeping my mouth shut unless I have legal representation present... :)
Or, which is easier to handle, set a reply to: policy-l@wikipedia.org. In the other case you mentioned, I think we will never get a mail from you, because you will be waiting for your legal representation or working to pay him ;)
But to come to the subject: is there any information about the duration of the code freeze? I don't want to hurry anyone (where anyone means Lee or Brion), but I remember someone mentioning a week or so. It would be nice to get some information.
The Cunctator wrote:
Fair use bad for Wikipedia...
I have trouble understanding why this discussion is on Wikitech.
Honestly, I hadn't noticed that. I just hit reply when I want to respond to something, without paying any attention to which of the mailing lists I'm responding. I assume that whoever started the thread had some reason for putting it there. :-)
Ec
--- Brion Vibber vibber@aludra.usc.edu wrote:
( ) This file is public domain (has fallen out of copyright, was never copyrighted, or has been explicitly put in PD by the author)
( ) This file is licensed under GFDL
( ) Copyright owner gives permission to reproduce for non-commercial/educational purposes only
    NOTE: [concerns over distribution, prefer free files]
( ) This file is used under 'fair use' claims without permission of the copyright holder.
    NOTE: [concerns over distribution, prefer free files]
My concern is this: a rational copyright holder faced with these choices will always choose option 4 (permission for non-commercial distribution), thereby retaining maximal control. Yet option 4 is the one most detrimental to Wikipedia's goals, preventing modification as well as publication in a commercial book/CDROM version.
Axel
__________________________________ Do you Yahoo!? Yahoo! Calendar - Free online calendar with sync to Outlook(TM). http://calendar.yahoo.com
Brion Vibber wrote:
Toby Bartels wrote:
Brion Vibber wrote:
So by analogy "Linus Torvalds style pragmatism" might provide for a third-party filter program that inserts non-FDL images into Wikipedia articles on a reader's computer as they're loaded. ;)
Like a web browser?
[Brion then gives several objections that seem reasonable. But I'll comment on the ones that seem least so.]
- the kernel-module combination is never redistributed
- the article-image combination is redistributed by local saving, hardcopy, mirroring, or format conversion of the articles unless careful effort is made to avoid including the image (which the article includes programmatic commands to include)
So the article-image combination is never redistributed now (given that the images aren't included in the downloaded tarball -- I believe that this is what began the discussion this time around). When I say that we need to separate out the fair use pics from the free ones, I mean precisely to prepare for such redistribution in the future, where we may (in some cases) have to strip out the nonfree pics.
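As a rough illustration of what "stripping out the nonfree pics" could look like for a redistributed copy, here is a small Python sketch. It assumes a hypothetical lookup table mapping image names to a free/non-free flag (no such table exists yet in this discussion), and it only handles the plain `[[Image:name]]` form.

```python
import re
from typing import Dict

# Matches the plain [[Image:name]] form only; captions and options are not handled.
IMAGE_LINK = re.compile(r"\[\[Image:([^\]|]+)\]\]")


def strip_nonfree_images(wikitext: str, is_free: Dict[str, bool]) -> str:
    """Drop [[Image:...]] links whose file is not marked free.

    Files missing from the table are treated as non-free, the cautious default.
    """
    def keep_or_drop(match):
        name = match.group(1).strip()
        return match.group(0) if is_free.get(name, False) else ""
    return IMAGE_LINK.sub(keep_or_drop, wikitext)
```

The article text itself is left untouched, which fits the point that an article should still read well without its images.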
Most importantly, the paradigm of separate text and image files is simply a technical limitation of the HTML format used. If Wikipedia were distributed in PostScript, PDF, MS Word documents, RTF, or the printed page, there would be no such separation.
Right, so maybe we'd have to strip them out if it were distributed thus '_`.
The problem, it seems to me, isn't that we use the images, but that we pretend that we're using them under the GFDL.
Either we are, or we're violating the license of every article that someone has modified by putting a non-GFDL picture into it.
This I don't agree with at all. When I submit GFDL text to a page, and somebody creates a modified version of that with [[Image:Foo]] in it, then they're definitely not violating the GFDL license of what I wrote. They just said "[[Image:Foo]]" next to it! The "technical limitations" are quite relevant here -- they're actually one way that HTML is much better than the printed page!
Our fair use images should be treated as *auxiliary*.
I'm all in favor of putting them on a separate server with a separate database and *not* embedding those images inline into articles, but allowing explicitly external links to those images which users would have to follow knowingly.
All right, maybe that will make everybody happy. I can live with it, I think.
It should ''never'' be vital to an article that we include an image, if we can help it, not only for distribution but also for accessibility.
Thanks! This removes the sense of urgency that drives some people to add non-free images knowing they are not compatible with the project's goals. Hopefully they'll stop immediately. :)
That said, sometimes we can't help it. Consider all the arguments presented in recent debates for why [[Clitoris]] needs good photographs and drawings. ^_^
-- Toby
Brion-
Coupla notes:
- Said modules are distributed by their copyright owners with the supported hardware, not as part of the Linux kernel.
- The Linux kernel runs just fine without them. The modules provide additional support for certain hardware only.
- Said linking is done at run time on the user's machine; the combined result exists only in memory and is never redistributed.
So by analogy "Linus Torvalds style pragmatism" might provide for a third-party filter program that inserts non-FDL images into Wikipedia articles on a reader's computer as they're loaded. ;)
Since Wikipedia is a client/server application, you also have to look at what the server is doing. Image data is actually stored in a separate table; they have their own namespaces and are transcluded on demand as a page is fetched, so the analogy is applicable. Keep in mind that the FDL is primarily intended to cover printed documents, so it may well offer us a pretty decent loophole here.
The text [[Image:foo.png]] is, of course, under the FDL.
I'm sure if we pay a bunch of lawyers enough money, they can come up with plenty of reasons why using fair use content is perfectly acceptable. We may need to do so for a reason that has so far been ignored in the discussion:
Quotes of any length are fair use, too. It is quite clear that we cannot do without them. We could move them into a separate namespace and transclude them, though, which would actually be a very Xanadu-esque thing to do.
Regards,
Erik
Brion-
What we should not allow:
1) Copyrighted, no permission (duh)
2) No permission to redistribute other than for Wikipedia (prevents forking)
3) Advertising or prominent copyright notices in images that cannot be removed
4) Other restrictive licenses
How can these be shown to be distinct from claiming "fair use" on copyrighted material that hasn't given us *any* rights (the default being 'all rights reserved')?
See [[fair use]]. There are very narrow limits on the kind of materials and the extent to which we can use them.
Regards,
Erik
On Friday 30 May 2003 17:09, Erik Moeller wrote:
Brion-
How can these be shown to be distinct from claiming "fair use" on copyrighted material that hasn't given us *any* rights (the default being 'all rights reserved')?
See [[fair use]]. There are very narrow limits on the kind of materials and the extent to which we can use them.
Yes, but out of the category of images on which one might claim fair use, what portion of them are *not* in one of these categories:
1) Copyrighted, no permission (duh)
2) No permission to redistribute other than for Wikipedia (prevents forking)
3) Advertising or prominent copyright notices in images that cannot be removed
4) Other restrictive licenses
While not every possible item in those categories could be used by fair use, it seems to me that anything _not_ in those categories wouldn't even come up on the fair use radar: it would be either public domain or freely redistributable and modifiable.
Perhaps I just interpret "restrictive licenses" differently from you... :)
-- brion vibber (brion @ pobox.com)
Brion-
While not every possible item in those categories could be used by fair use, it seems to me that anything _not_ in those categories wouldn't even come up on the fair use radar
Sure. Fair use is a rare exception to the rule that we should not use copyrighted materials.
Regards,
Erik
Erik Moeller wrote:
Sure. Fair use is a rare exception to the rule that we should not use copyrighted materials.
And it is one which we should approach with great caution. I would be in favor of never relying on "fair use" at all, except that it would be silly and close to impossible to do so, particularly since ordinary quotes from copyrighted texts are "fair use". Shall we have a rule that we can't quote a few words from a copyrighted text? Absurd, I think.
Even so, caution is warranted, because re-users might not be in the same position vis a vis fair use as we are. And we should be sensitive to that.
--Jimbo
Jimmy Wales wrote in part:
Even so, caution is warranted, because re-users might not be in the same position vis a vis fair use as we are. And we should be sensitive to that.
In particular, we shouldn't rely on our non-profit, educational status. If that's necessary to render our work "fair use", then we're definitely not free by GNU standards.
-- Toby
--- Erik Moeller erik_moeller@gmx.de wrote:
How do you propose acquiring a free photo of a prominent, recently deceased author or actor? A picture of an important historical event?
First, use the extensive public domain archives, for instance the collections of the library of congress. Next, once our foundation has money, we can try to acquire the copyright of selected important images we need and cannot get in any other way.
Axel
Axel-
First, use the extensive public domain archives
Pictures are public domain if they 1) have been created before 1923 2) are produced by the US government.
Obviously that leaves a lot of essential material, especially given that Wikipedia is not a US-only project, and most other nations don't have an equivalent public domain rule.
Next, once our foundation has money, we can try to acquire the copyright of selected important images we need and cannot get in any other way.
The problem with that approach is that we need an exclusive world-wide license, since we need to allow unlimited redistribution and modification. This is *very* expensive for most professional photographs, and impossible for many.
Regards,
Erik
--- Erik Moeller erik_moeller@gmx.de wrote:
First, use the extensive public domain archives
Pictures are public domain if they
- have been created before 1923
- are produced by the US government.
...or have been placed in the public domain by the copyright owner. Check out for example the excellent collection of portrait photographs donated to the Library of Congress.
Next, once our foundation has money, we can try to acquire the copyright of selected important images we need and cannot get in any other way.
The problem with that approach is that we need an exclusive world-wide license, since we need to allow unlimited redistribution and modification.
*non-exclusive*
This is *very* expensive for most professional photographs, and impossible for many.
So instead you suggest to simply take them for free, hoping that fair use applies?
I think Brion's suggestion of simply linking to the external site containing the photograph is a win-win-win-win proposal:
* The copyright owner gets exposure
* Our readers get access to the educational content
* We are still able to burn a CD with all our material and put "GFDL" on the cover
* The inconvenience of the external link encourages contributors to hunt for free substitute photographs.
Axel
Axel-
Check out for example the excellent collection of portrait photographs donated to the Library of Congress.
Which one specifically?
Next, once our foundation has money, we can try to acquire the copyright of selected important images we need and cannot get in any other way.
The problem with that approach is that we need an exclusive world-wide license, since we need to allow unlimited redistribution and modification.
*non-exclusive*
Actually, what we need is probably a transfer of the copyright to us, or at the very least a contract that allows unlimited sublicensing. Remember, we want to allow forks, commercial re-use etc. This is what is so expensive, because it deviates from normal company licensing policies, where you are allowed to use a work *in your product*, but certainly not to give others the right to create and distribute derivative works.
This is *very* expensive for most professional photographs, and impossible for many.
So instead you suggest to simply take them for free, hoping that fair use applies?
My, what nice rhetoric. Next you are going to accuse me of "stealing" ;-). And this from someone who thinks intellectual property rights do not exist (your user page). See where this kind of mentality is taking us? Copyright paranoia.
What I do suggest is that we use common sense to determine the few cases where we think fair use law is applicable, instead of simply ignoring these few loopholes in the current restrictive copyright code. I would think that someone opposed to intellectual property would embrace the idea that we should defend and make use of our rights, instead of bowing to the pressure from copyright holders.
I think Brion's suggestion of simply linking to the external site containing the photograph is a win-win-win-win proposal:
- The copyright owner gets exposure
- Our readers get access to the educational content
- We are still able to burn a CD with all our material and put "GFDL" on the cover
- The inconvenience of the external link encourages contributors to hunt for free substitute photographs.
Alas, it also has several problems:
- When the website is down, the image is no longer available. Broken links often go unnoticed for longer periods of time because we have no way to systematically check them.
- The image is no longer embedded in the proper context. It becomes difficult to associate image content with image text.
- The reader is taken away from the Wikipedia navigational structure to a non-HTML image page. This is bad user interface design.
So the win/win/win/win situation becomes a win/win/win/win/lose/lose/lose situation, at which point I think it more convenient to refer to it as a suboptimal solution.
I do support this method of illustrating articles in cases where we believe fair use law not to apply, e.g. maps from Mapquest.com. We can agree that because of the risk of legal liability for Bomis.com
- fair use should be kept at a minimum,
- in other cases, this method should be used,
- the database should contain as much information about copyright status as possible.
But I do not agree with exclusively linking to copyrighted pictures.
Regards,
Erik
On Sunday 01 June 2003 07:02, Erik Moeller wrote:
Erik,
[external linking of "fair use" images]
Alas, it also has several problems:
- When the website is down, the image is no longer available. Broken links often go unnoticed for longer periods of time because we have no way to systematically check them.
- The image is no longer embedded in the proper context. It becomes difficult to associate image content with image text.
- The reader is taken away from the Wikipedia navigational structure to a non-HTML image page. This is bad user interface design.
So the win/win/win/win situation becomes a win/win/win/win/lose/lose/lose situation, at which point I think it more convenient to refer to it as a suboptimal solution.
I am more interested in the consequences of adding "fair use" images to the articles. As Axel pointed out we will violate the GFDL by embedding "fair use" images in our articles. This means that we have to change the license (at least for these articles) or even worse have to rewrite the whole article because such a license change is unlikely to be compatible with GFDL. Another thing is that we should remove the "free" from wikipedia.
Erik, I haven't seen a comment from you about this so far, but I would be very interested how you want to solve these problems.
best regards, Marco
Marco-
Erik, I haven't seen a comment from you about this so far, but I would be very interested how you want to solve these problems.
See my response to Brion: http://mail.wikipedia.org/pipermail/wikitech-l/2003-May/004200.html
Specifically note that textual quotes are already a form of fair use, and more closely linked to the actual article text than images. You don't want to forbid us to quote Martin Luther King, do you?
Regards,
Erik
On Sunday 01 June 2003 13:39, Erik Moeller wrote:
Erik, I haven't seen a comment from you about this so far, but I would be very interested how you want to solve these problems.
See my response to Brion: http://mail.wikipedia.org/pipermail/wikitech-l/2003-May/004200.html
thanks, I missed that.
Specifically note that textual quotes are already a form of fair use, and more closely linked to the actual article text than images. You don't want to forbid us to quote Martin Luther King, do you?
While I agree that quotes are necessary, I have my doubts about whether this is compliant with the GFDL. Actually I believe they violate the GFDL, and we should think about this sooner rather than later.
BUT, you cannot argue that because we already have potential unsolved problems with the GFDL we are free to violate the GFDL more than that.
In my humble opinion we should think about how we can solve the problems we already have with the quotes and not make our situation worse by allowing "fair use" images which clearly violate the GFDL.
best regards, Marco
Marco-
In my humble opinion we should think about how we can solve the problems we already have with the quotes and not make our situation worse by allowing "fair use" images which clearly violate the GFDL.
Let's be clear on one point: On en:, we are *already* allowing fair use of images. You would like to see this status *changed*. The legal case for doing so is no stronger than the case for prohibiting fair use of quotes or sounds (of which there are plenty). In fact, I will now demonstrate that it is weaker; that we need to worry more about quotes than about images.
The FDL was developed primarily for books and other printed works. As such, it does not contain any reference to "linking" of any kind. The relevant sections to our discussion are the following:
----------------------------
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.
....
7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.
----------------------------
What is now, in the context of Wikipedia, the work that is licensed under the FDL? It is not what the server generates, it is not what the web browser renders, for neither of these entities have the legal rights to claim a copyright. It is the document entered by the user in the Wikipedia article submission form, in its original, "transparent" wikitext source code form.
This document may contain a line like
[[Image:Britney_Spears.jpg]]
This line of text is licensed just like all other text in the document. It is implicitly understood by the user that this line will produce an image that is rendered in-line, but the copyright status of that image is not confirmed in the article -- it is confirmed during the upload of the image. The image data is stored in a separate row in the table, and can be referenced by *several pages*. The image pages even have their own history, both for the image and the image description. The Britney image is, for all intents and purposes, a separate work.
When the server creates the HTML page, it turns the above line into something like
<img src="http://www.wikipedia.org/upload/2/2e/Britney_Spears.jpg">
The user's web browser then produces the *aggregate work* consisting of the text and the image by fetching both from the server, from different locations.
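The server-side step described here can be sketched in a few lines of Python. This is only an illustration, not the real wiki code: the base URL is copied from the example above, and the two-level directory layout (derived here from an MD5 hash of the filename) is an assumption on my part, so the exact path printed for any given file should not be relied on.

```python
import hashlib
import html
import re

UPLOAD_BASE = "http://www.wikipedia.org/upload"  # base URL taken from the example above

# Matches the plain [[Image:name]] form only; captions and options are not handled.
IMAGE_LINK = re.compile(r"\[\[Image:([^\]|]+)\]\]")


def upload_path(filename: str) -> str:
    """Assumed layout: two directory levels taken from an MD5 hash of the name."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{filename}"


def render_images(wikitext: str) -> str:
    """Replace each [[Image:name]] link with an <img> tag pointing at the upload area."""
    def to_img(match):
        name = match.group(1).strip()
        url = f"{UPLOAD_BASE}/{upload_path(name)}"
        return f'<img src="{html.escape(url, quote=True)}">'
    return IMAGE_LINK.sub(to_img, wikitext)
```

Note that the output contains only a reference to the image; the image bytes are never copied into the generated page, which is the point being argued.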
The author did not create a "combined work" by combining the article with the line [[Image:Britney_Spears.jpg]], any more than he would by combining the article with the link [[Christina Aguilera]] or [http://www.britneyspears.com Britney official homepage].
Now what happens if I insert a Fair Use quote?
We do so in the text [[Martin Luther King]] under the heading "Views on anti-Zionism", for example. This quote is indented and inserted in the source text. Martin Luther King's estate holds the copyright on all his works. The resulting article is, for all intents and purposes, a *combined work*. This means that the work as a whole has to be FDL-licensed -- which we cannot do because we do not own the King copyrights.
Of course, you might argue that the quote is not copyrightable, but that is a very difficult stance to take given that even cell phone ringtones are considered "intellectual property".
Directly included quotes are a much more serious legal issue than client-side included images. As I previously suggested, that might be addressed by supporting transclusion of some type, so you could say
[[>Martin Luther King on Zionism]]
which would automatically fetch the quote from a separate namespace (which does not have to be specified because transclusion is only possible from that namespace -- hey, even my hypothetical features have to be user friendly) and insert it in the proper place. This would put quotes on the same legal level as image pages (with the distinction that the transclusion would happen server-side), and allow us to use the same quote in multiple places comfortably. Editing would get a bit trickier, though.
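A minimal Python sketch of how such server-side transclusion might work, purely as an illustration of the hypothetical feature: the `[[>Title]]` syntax is the one proposed above, while the in-memory dictionary standing in for the separate quote namespace and the function name are my own assumptions.

```python
import re
from typing import Dict

# Hypothetical syntax from the proposal: [[>Title]] pulls text from a quote namespace.
TRANSCLUDE = re.compile(r"\[\[>([^\]]+)\]\]")


def transclude_quotes(wikitext: str, quotes: Dict[str, str]) -> str:
    """Expand [[>Title]] markers using a separate quote store.

    Unknown titles are left untouched so a missing quote stays visible to editors.
    """
    def expand(match):
        title = match.group(1).strip()
        return quotes.get(title, match.group(0))
    return TRANSCLUDE.sub(expand, wikitext)
```

Because the article source only ever holds the marker, the quoted text lives in one place and can be reused, or withheld from a redistributed copy, without editing every article that refers to it.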
But in my understanding, it's not image pages that we have to worry about with regard to the FDL. Of course *we* do have to worry about what fair use means in each individual instance, but the same applies to text.
Regards,
Erik
Tomasz-
On Sun, Jun 01, 2003 at 03:05:00PM +0200, Erik Moeller wrote:
Let's be clear on one point: On en:, we are *already* allowing fair use of images.
No we're not !!!
Can you say denial?
http://www.wikipedia.org/wiki/Wikipedia%3AImage_use_policy
--- Image description page
....
Copyright status
* public domain: copyright expired
* placed in public domain by photographer
* released under the GFDL
* released under the GFDL - in response to the boilerplate request for permission, Fred Jones said "That'd be fine"
* copyrighted image - the author has given Wikipedia permission to use this image, but third parties may not use it without permission
* copyrighted image - may only be used under "fair use" rules

Images in these last two categories cannot be distributed under the GFDL license and as such their inclusion in Wikipedia is extremely dubious. It is recommended that you instead link to a page with the image hosted on another server if you cannot obtain explicit permission to redistribute it under a GFDL-compatible license. If you do upload "fair use" images here, keep in mind that they may all be removed at some point if the project gets a clear legal opinion on the matter. ---
Regards,
Erik
On Sun, Jun 01, 2003 at 04:00:00PM +0200, Erik Moeller wrote:
Tomasz-
On Sun, Jun 01, 2003 at 03:05:00PM +0200, Erik Moeller wrote:
Let's be clear on one point: On en:, we are *already* allowing fair use of images.
No we're not !!!
Can you say denial?
http://www.wikipedia.org/wiki/Wikipedia%3AImage_use_policy
Image description page
....
Copyright status
* public domain: copyright expired
* placed in public domain by photographer
* released under the GFDL
* released under the GFDL - in response to the boilerplate request for permission, Fred Jones said "That'd be fine"
* copyrighted image - the author has given Wikipedia permission to use this image, but third parties may not use it without permission
* copyrighted image - may only be used under "fair use" rules
Images in these last two categories cannot be distributed under the GFDL license and as such their inclusion in Wikipedia is extremely dubious. It is recommended that you instead link to a page with the image hosted on another server if you cannot obtain explicit permission to redistribute it under a GFDL-compatible license. If you do upload "fair use" images here, keep in mind that they may all be removed at some point if the project gets a clear legal opinion on the matter.
I don't know who put the last 2 categories there, but it has obviously been done without general consensus and IT BREAKS THE F***ING LAW - if you're putting such an image on a GFDL-ed article you're breaking the copyright of the authors of the article - they didn't give you permission to distribute non-GFDL derivatives of their work.
So even when it's legally ok to use such an image (usually it's not), it's not ok to put it in a GFDL article.
We also have to make it legal to distribute Wikipedia in Europe, which has different copyright laws, especially when it comes to what in the USA is called "fair use", and what is usually described explicitly by copyright laws.
In Poland public speeches and short quotes are not subject to copyright laws, but photos are, and you have to pay the author for using them.
Tomasz-
if you're putting such an image on a GFDL-ed article you're breaking the copyright of the authors of the article - they didn't give you permission to distribute non-GFDL derivatives of their work.
See my response to Marco.
We also have to make it legal to distribute Wikipedia in Europe, which has different copyright laws, especially when it comes to what in the USA is called "fair use", and what is usually described explicitly by copyright laws.
We should make reasonable efforts to do so, but fortunately, most countries have fair use equivalents. For example, in Germany the use of an image is called a "Großzitat", and currently allowed in scientific works (including popular scientific ones).
Our primary concern, however, especially regarding the English Wikipedia, is for US law.
Regards,
Erik
On Sun, Jun 01, 2003 at 07:18:00PM +0200, Erik Moeller wrote:
We also have to make it legal to distribute Wikipedia in Europe, which has different copyright laws, especially when it comes to what in the USA is called "fair use", and what is usually described explicitly by copyright laws.
We should make reasonable efforts to do so, but fortunately, most countries have fair use equivalents. For example, in Germany the use of an image is called a "Großzitat", and currently allowed in scientific works (including popular scientific ones).
Our primary concern, however, especially regarding the English Wikipedia, is for US law.
In Poland such exceptions apply only to research and, to a lesser extent, to education. Wikipedia is neither one nor the other.
Tomasz Wegrzanowski wrote:
In Poland such exceptions apply only to research and, to a lesser extent, to education. Wikipedia is neither one nor the other.
Are you suggesting that Wikipedia is not educational? Being educational is one of the pillars that helps to establish fair use. Education is not limited to the activities of recognized institutions, but has a much broader meaning.
Ec
On Sun, Jun 01, 2003 at 11:38:34AM -0700, Ray Saintonge wrote:
Tomasz Wegrzanowski wrote:
In Poland such exceptions apply only to research and, to a lesser extent, to education. Wikipedia is neither one nor the other.
Are you suggesting that Wikipedia is not educational? Being educational is one of the pillars that helps to establish fair use. Education is not limited to the activities of recognized institutions, but has a much broader meaning.
According to Polish law, it only applies to educational and research institutions and libraries. There aren't special rights for other entities just because a given activity is "educational". In particular, publishers of textbooks and encyclopedias don't have special rights of "fair use" (they can use others' stuff for a standard fee, but it doesn't help us much).
http://www.netlaw.pl/prawo_autorskie/upaip1994tj.html if you can read Polish ;) http://wiktionary.org/wiki/Polish_language if not, but you want to learn ;))
On 1 Jun 2003, Erik Moeller wrote:
Our primary concern, however, especially regarding the English Wikipedia, is for US law.
Is this policy? I think it's a problematic policy. I've heard Jimbo extol visions about having Wikipedia printed and sent out to the poor and the wretched, once we have enough articles and enough funding.
In the US, people can generally buy themselves their own encyclopedias. :)
-- Daniel
Hr. Daniel Mikkelsen wrote:
Our primary concern, however, especially regarding the English Wikipedia, is for US law.
Is this policy? I think it's a problematic policy. I've heard Jimbo extol visions about having Wikipedia printed and sent out to the poor and the wretched, once we have enough articles and enough funding.
It's not really policy, it's just a fact. If someone in a foreign country gets mad at me and wants to use the laws of that country against me, well, that's just tough for them, because I live in the United States, and the U.S. will not generally enforce that kind of claim against me here.
But, yes, it is also true that we want to make sure that what we are doing is distributable widely, and if we can comply with European copyright laws too, without too much trouble, we should.
--Jimbo
Tomasz Wegrzanowski wrote:
I don't know who put the last 2 categories there, but it has obviously been done without general consensus and IT BREAKS THE F***ING LAW - if you're putting such an image on a GFDL-ed article you're breaking the copyright of the authors of the article - they didn't give you permission to distribute non-GFDL derivatives of their work.
I don't think it's at all clear that including "fair use" content in a GFDL article has any material impact on GFDL compliance.
I have yet to see anyone who wants to take that argument to the logical conclusion, i.e. that no GFDL article can quote even one sentence from any copyrighted source. May an article on Tom Clancy quote a line from one of his books? May an article on Star Wars discuss the line "Use the force, Luke"?
Such quoting is normally done under the "fair use" convention.
We also have to make it legal to distribute Wikipedia in Europe, which has different copyright laws, especially when it comes to what in the USA is called "fair use", and what is usually described explicitly by copyright laws.
I do support making it easy for people in most countries to be able to redistribute Wikipedia, but of course it is not possible nor desirable to attempt to comply with *every* law of *every* country.
--Jimbo
On Sunday 01 June 2003 15:05, Erik Moeller wrote:
Erik,
Let's be clear on one point: On en:, we are *already* allowing fair use of images.
o.k., something I do not doubt, nevertheless the question remains if this is compatible with the GFDL.
You would like to see this status *changed*. The legal case for doing so is no stronger than the case for prohibiting fair use of quotes or sounds (of which there are plenty).
I agree that we have the same problems with sound and quotes and I agree that quotes are even more a problem than images.
Erik, up to now I read three things from your postings:
1. "fair use" quotes are more a problem than images
2. in this posting you tried to defend the usage of "fair use" images by saying that they are not directly part of the article but are referenced
3. you are not happy with the consequences
Please, tell me if you think that "fair use" content (images, quotes whatever) is compatible (if directly used within the article) with the GFDL or not (regardless of any consequences).
<snip>
What is now, in the context of Wikipedia, the work that is licensed under the FDL? It is not what the server generates, it is not what the web browser renders, for neither of these entities have the legal rights to claim a copyright. It is the document entered by the user in the Wikipedia article submission form, in its original, "transparent" wikitext source code form.
<snip>
Erik, these are technical details no user will ever understand and I doubt some lawyer buys this argument. Whatsoever, we then agree that we have to at least move the "free use" content away from the article source, right?
As a consequence we have to emphasise that _only_ the wikipedia article source is free, _not_ the whole article. This is something which nobody will expect if we call ourselves "Wikipedia, the free encyclopedia".
Citing from [[Wikipedia:Copyrights]]:
"The goal of Wikipedia is to create an information source in an encyclopedia format that is freely available. The license we use grants free access to our content in the same sense as free software is licensed freely. That is to say, Wikipedia content can be copied, modified, and redistributed so long as the new version grants the same freedoms to others and acknowledges Wikipedia as the source. Wikipedia articles therefore will remain free forever and can be used by anybody subject to certain restrictions, most of which serve to ensure that freedom. "
Let me repeat this: "The license we use grants free access to our content [...] Wikipedia content can be [...] modified". This is a statement about the "content" in general without mentioning any restrictions.
Erik, do you agree that we should change this statement? Should we say that only the source of the article is free, but not the "visible" article itself? Did I understand your position correctly?
best regards, Marco
Marco-
You would like to see this status *changed*. The legal case for doing so is no stronger than the case for prohibiting fair use of quotes or sounds (of which there are plenty).
I agree that we have the same problems with sound and quotes and I agree that quotes are even more a problem than images.
Good. Do you agree that removing quotes from Wikipedia is not an option?
Please, tell me if you think that "fair use" content (images, quotes whatever) is compatible (if directly used within the article) with the GFDL or not (regardless of any consequences).
I think they are compatible if they are separately licensed. They may also be compatible if they are combined, but that is not my understanding of the FDL and of fair use.
Erik, these are technical details no user will ever understand and I doubt some lawyer buys this argument.
Actually, I do not consider it very ambiguous at all. The FDL makes no reference to linked materials, period. Therefore linked materials are separate works.
Whatsoever, we then agree that we have to at least move the "free use" content away from the article source, right?
Yes.
As a consequence we have to emphasise that _only_ the wikipedia article source is free, _not_ the whole article. This is something which nobody will expect if we call ourselves "Wikipedia, the free encyclopedia".
What most people expect is that Wikipedia is free as in beer. We surprise them when we tell them that 90%+ of it are also free as in speech.
Erik, do you agree that we should change this statement? Should we say that only the source of the article is free, but not the "visible" article itself? Did I understand your position correctly?
We should clarify that we allow limited fair use of materials, and that the different components of an article (text, images, quotes, sounds) need to be checked separately if one intends to use Wikipedia materials in a way not allowed by fair use law or the local equivalent.
Regards,
Erik
On Sun, Jun 01, 2003 at 10:31:00PM +0200, Erik Moeller wrote:
As a consequence we have to emphasise that _only_ the wikipedia article source is free, _not_ the whole article. This is something which nobody will expect if we call ourselves "Wikipedia, the free encyclopedia".
What most people expect is that Wikipedia is free as in beer. We surprise them when we tell them that 90%+ of it are also free as in speech.
I officially propose that we ban him. It should be obvious to everyone now that he's much more dangerous than Helga.
If you want your own beer-free project, do everyone a favour, go ahead with it, and leave Wikipedia. Wikipedia is going to stay 100% speech-free project.
Tomasz wrote:
I officially propose that we ban him.
Hmhm. It should be here somewhere .. ah, there it is:
[ASCII art: a figure holding a sign that reads "Please do NOT feed the troll"]
Regards,
Erik
On Sunday 01 June 2003 22:31, Erik Moeller wrote:
I agree that we have the same problems with sound and quotes and I agree that quotes are even more a problem than images.
Good. Do you agree that removing quotes from Wikipedia is not an option?
No, if the quotes are not compatible with GFDL we _have to_ remove them from the article source.
I think they are compatible if they are separately licensed.
O.k. but then we have to make this very clear that the GFDL applies _not_ to all the content we provide.
Actually, I do not consider it very ambiguous at all. The FDL makes no reference to linked materials, period. Therefore linked materials are separate works.
You see it this way; we'll see what the lawyers from the FSF say about this.
Whatsoever, we then agree that we have to at least move the "free use" content away from the article source, right?
Yes.
Good, finally we agree :-) This of course includes "free use" quotes.
As a consequence we have to emphasise that _only_ the wikipedia article source is free, _not_ the whole article. This is something which nobody will expect if we call ourselves "Wikipedia, the free encyclopedia".
What most people expect is that Wikipedia is free as in beer. We surprise them when we tell them that 90%+ of it are also free as in speech.
This is a nice interpretation, but has nothing to do with what people that come from the free software world expect. _I_ expect a 100% free (in the sense of freedom) encyclopedia and to be honest I am very disappointed if this becomes official policy.
Do you agree if I change:
"The license we use grants free access to our content in the same sense as free software is licensed freely. That is to say, Wikipedia content can be copied, modified, and redistributed so long as the new version grants the same freedoms to others and acknowledges Wikipedia as the source. Wikipedia articles therefore will remain free forever and can be used by anybody subject to certain restrictions, most of which serve to ensure that freedom."
to:
"The license we use grants free access to our content. That is to say, most of Wikipedia content can be copied, modified, and redistributed so long as the new version grants the same freedoms to others and acknowledges Wikipedia as the source. Wikipedia articles (not all content) therefore will remain free forever and can be used by anybody subject to certain restrictions, most of which serve to ensure that freedom."
Erik, do you agree that we should change this statement? Should we say that only the source of the article is free, but not the "visible" article itself? Did I understand your position correctly?
We should clarify that we allow limited fair use of materials, and that the different components of an article (text, images, quotes, sounds) need to be checked separately if one intends to use Wikipedia materials in a way not allowed by fair use law or the local equivalent.
If this is or becomes official policy then Wikipedia is not free (in the sense of freedom) anymore. Then please also replace "Wikipedia the free encyclopedia" by something else, because it then becomes a lie :-(
Very disappointed, Marco
On Sun, 1 Jun 2003, Marco Krohn wrote:
I think they are compatible if they are separately licensed.
O.k. but then we have to make this very clear that the GFDL applies _not_ to all the content we provide.
Isn't this already too late? You can only dual license copyleft material if all copyright holders agree to it. The people who have posted stuff so far on Wikipedia have posted it under GFDL exclusively.
If we want to combine different licenses, we have to track down all contributors for each relevant article, and get their permission. Otherwise, we're breaking GFDL.
And the only way to combine copylefted material with restricted material is by dual licensing (ours would be "GFDL for everyone, Fair Use for wikipedia.org").
(As I understand it.)
-- Daniel
Hr. Daniel Mikkelsen wrote:
Isn't this already too late? You can only dual license copyleft material if all copyright holders agree to it. The people who have posted stuff so far on Wikipedia have posted it under GFDL exclusively.
I'm sure that most of them have never given any serious attention to the licence details.
If we want to combine different licenses, we have to track down all contributors for each relevant article, and get their permission. Otherwise, we're breaking GFDL.
Wouldn't that be just a little unrealistic? A more common sense solution would be better.
Ec
On Sun, 1 Jun 2003, Ray Saintonge wrote:
Isn't this already too late? You can only dual license copyleft material if all copyright holders agree to it. The people who have posted stuff so far on Wikipedia have posted it under GFDL exclusively.
I'm sure that most of them have never given any serious attention to the licence details.
I can't believe I'm hearing this. I don't know what to say.
I remember reading about how we're going to one day plug all history back into all articles, from the earlier software phases, because if we didn't we would be violating the license terms set by our contributors (GFDL).
And now we're just going to brush it all aside? I'm outraged.
If we want to combine different licenses, we have to track down all contributors for each relevant article, and get their permission. Otherwise, we're breaking GFDL.
Wouldn't that be just a little unrealistic? A more common sense solution would be better.
Yes, this is unrealistic. This is why I said that it is probably too late to begin dual licensing Wikipedia content (except in the case of new articles).
The more common sense solution (in that it is a realistic endeavour) is of course to remove all quotes. Frankly, there aren't that many there to begin with.
-- Daniel
On Monday 02 June 2003 00:48, Ray Saintonge wrote:
Hr. Daniel Mikkelsen wrote:
Isn't this already too late? You can only dual license copyleft material if all copyright holders agree to it. The people who have posted stuff so far on Wikipedia have posted it under GFDL exclusively.
I'm sure that most of them have never given any serious attention to the licence details.
Which doesn't make it better. If you want to relicense content you have to have permission of all copyright holders, this is what the license says. Sorry, but there is no way around it.
If we want to combine different licenses, we have to track down all contributors for each relevant article, and get their permission. Otherwise, we're breaking GFDL.
Wouldn't that be just a little unrealistic?
Yes indeed, this is far from being possible. There are only very few projects which went through all struggle in order to achieve a relicense, the biggest I am aware of was (related to?) the Mozilla project. For a project of the size of Wikipedia this simply won't work.
A more common sense solution would be better.
The "common sense" solution is to remove all content which makes license problems and this is all "fair use" stuff. If we don't remove this stuff we will again and again have license problems. Believe me, I have watched the [[KDE]] project for a long time and I am very well aware of years of license problems & battles and I really don't want to see the same mess here again.
If we have a license we have to take this seriously. We can't simply say "mmh, I don't like the consequences, let's forget about the license". Sorry.
best regards, Marco
On Sunday 01 June 2003 23:32, Hr. Daniel Mikkelsen wrote:
On Sun, 1 Jun 2003, Marco Krohn wrote:
I think they are compatible if they are separately licensed.
O.k. but then we have to make this very clear that the GFDL applies _not_ to all the content we provide.
Isn't this already too late? You can only dual license copyleft material if all copyright holders agree to it. The people who have posted stuff so far on Wikipedia have posted it under GFDL exclusively.
If we want to combine different licenses, we have to track down all contributors for each relevant article, and get their permission. Otherwise, we're breaking GFDL.
These are good questions.
I see it the same way as you do, but Erik claims (at least for images) that these are different pages with a different license that are added together by the server and by that we don't violate the GFDL. So he basically says that he found a loophole in the GFDL which allows mixing free / non-free content.
best regards, Marco
Am Mon, 2003-06-02 um 01.09 schrieb Marco Krohn:
I see it the same way as you do, but Erik claims (at least for images) that these are different pages with a different license that are added together by the server and by that we don't violate the GFDL. So he basically says that he found a loophole in the GFDL which allows mixing free / non-free content.
Like 99% of the people here, I am not lawyer, but here is my opinion - how I interpret GFDL, how I think the spirit of GFDL is:
IMHO articles should be seen as one work, and therefore it is 100% GFDL, or it violates the GFDL.
Personally, I would not like to see "my" work being combined with non-free content.
Another thing: Until now, I did not notice that the English Wikipedia says "This text is available ...", while the German Wikipedia says "Diese Seite..." ("This page ...") "... is available under the terms of the GNU FDL".
I also do think that this discussion belongs to another mailing list, wikipedia-l, and therefore cross-posted this message (sorry if I offend someone by this, but I think it is not fair to the Wikipedia users that general, important issues are discussed in a place where the subject is normally just software development ...)
Best regards, Zeno Gantner
Marco-
You see it this way; we'll see what the lawyers from the FSF say about this.
The lawyers from the FSF interpret the FDL in the favor of the FSF's ideology. They are not an independent authority. If you want a legal opinion on the FDL, hire a lawyer.
As a consequence we have to emphasise that _only_ the wikipedia article source is free, _not_ the whole article. This is something which nobody will expect if we call ourselves "Wikipedia, the free encyclopedia".
What most people expect is that Wikipedia is free as in beer. We surprise them when we tell them that 90%+ of it are also free as in speech.
This is a nice interpretation, but has nothing to do with what people that come from the free software world expect. _I_ expect a 100% free (in the sense of freedom) encyclopedia and to be honest I am very disappointed if this becomes official policy.
Well, do you prefer it if it happens without being reflected in all our policy documents? Because this is currently the case. You may say you are disappointed if Wikipedia is not 100% free in the Stallman-sense. I say I am disappointed if I can no longer quote articles, speeches, scientific papers, and so forth. I say I am disappointed if all the work that has been done in digitizing, converting and collecting small music samples will be abandoned. I say I am disappointed if the picture of the Rumsfeld/Hussein handshake will be removed because it is copyrighted. I say I am disappointed if the last little freedom that copyright law grants us is taken away not by the content industry, but by overzealous, paranoid Wikipedians. What's next? DRM to protect our content against FDL violations?
Wikipedia documents human knowledge. This is impossible without quoting. It is impossible without fair use. An encyclopedia that cannot cite directly what others say is not an encyclopedia.
I want Wikipedia to be a free learning resource. If small but important parts that cannot easily be replaced with open content are not freely modifiable because of copyright law, then that is acceptable to me. You, Axel and Brion see a slippery slope -- I and Ray do not see this, and Jimbo is in the middle.
"The license we use grants free access to our content. That is to say, most of Wikipedia content can be copied, modified, and redistributed so long as the new version grants the same freedoms to others and acknowledges Wikipedia as the source. Wikipedia articles (not all content) therefore will remain free forever and can be used by anybody subject to certain restrictions, most of which serve to ensure that freedom."
That is an improvement, but it is really better to refer specifically to fair use content that can be part of articles, or fair use of images.
If this is or becomes official policy then Wikipedia is not free (in the sense of freedom) anymore. Then please also replace "Wikipedia the free encyclopedia" by something else, because it then becomes a lie :-(
I'm sorry to say so, but Mr. Stallman does not have any rights to the interpretation of the term "free". I'm sure he is equally sorry about that.
Regards,
Erik
On Sunday 01 June 2003 23:43, Erik Moeller wrote:
You see it this way; we'll see what the lawyers from the FSF say about this.
The lawyers from the FSF interpret the FDL in the favor of the FSF's ideology. They are not an independent authority. If you want a legal opinion on the FDL, hire a lawyer.
Sorry, but I trust a lawyer's opinion more than non-lawyer statements, and if you don't seem to trust the FSF then I ask myself why the GFDL was chosen?
[...]
I say I am disappointed if the last little freedom that copyright law grants us is taken away not by the content industry, but by overzealous, paranoid Wikipedians. What's next? DRM to protect our content against FDL violations?
Erik, I have the highest respect for you and your work, but this is below your standard. Please, I try to understand your point of view and I expect the same from you. Even if you might not believe it - we all here want the best for the Wikipedia project and we all want to keep "evil" away from Wikipedia. Thanks.
Wikipedia documents human knowledge.
right
This is impossible without quoting.
Sorry, wrong conclusion. You can always rephrase sentences. It is not necessary to use quotes if you want to document human knowledge.
It is impossible without fair use. An encyclopedia that cannot cite directly what others say is not an encyclopedia.
Well, based on a wrong conclusion the statement does not get more true. Either you define an encyclopedia by saying that it includes citations (then your statement is trivial), or you say that an encyclopedia documents human knowledge, in which case your statement is wrong.
If this is or becomes official policy then Wikipedia is not free (in the sense of freedom) anymore. Then please also replace "Wikipedia the free encyclopedia" by something else, because it then becomes a lie :-(
I'm sorry to say so, but Mr. Stallman does not have any rights to the interpretation of the term "free". I'm sure he is equally sorry about that.
I am very well aware of the fact that "free" depends on the definition of the term "free". The Wikipedia FAQ claims that Wikipedia is free in this very definition:
"Free content (or open content) works are those other than software which are licensed freely in the same (freedom) sense as Free software is licensed freely, see Free software definition. That is to say, recipients are given permission to use the content for any purpose, copy it, modify it, and to redistribute modified versions."
and even the announcements for the 100.000 article says that the content of Wikipedia is under terms of GFDL.
Therefore it is quite safe to say that everyone reading this and coming from the free software world will expect that "free" means free in the GFDL sense, no? Of course you can redefine "free" now. With the same argument Microsoft could say they are writing "free" software because everyone is "free" to use it.
Can we nevertheless define "free" in this discussion in the canonical sense (as in free software)? It is widely accepted this way and otherwise it will be even more confusing.
It is the very first time that I have heard that the statements above (that Wikipedia is "free") are not true anymore. As far as I understood it the project started as a "free" project. I consider allowing non-free content ("fair use" content) a drastic policy change. Was there any agreement on this change or was that done silently by someone (I don't say that s/he had bad intentions)?
best regards, Marco
Marco-
Sorry, but I trust a lawyer's opinion more than non-lawyer statements, and if you don't seem to trust the FSF then I ask myself why the GFDL was chosen?
I didn't choose the FDL, and I wouldn't. If it wasn't so hard to change licenses, we might have already switched to Creative Commons style copyleft or something similar. I am increasingly coming to the conclusion that the FDL is unsuitable for online publications. Even the Debian project rejects it. For my own textual projects, I use the public domain. The GNU project should stick to software licenses.
Sadly, because of Wikipedia, the FDL will continue to enjoy popularity because people will choose it for their projects in order to be compatible with us.
[...]
I say I am disappointed if the last little freedom that copyright law grants us is taken away not by the content industry, but by overzealous, paranoid Wikipedians. What's next? DRM to protect our content against FDL violations?
Erik, I have the highest respect for you and your work, but this is below your standard. Please, I try to understand your point of view and I expect the same from you. Even if you might not believe it - we all here want the best for the Wikipedia project and we all want to keep "evil" away from Wikipedia. Thanks.
I stand by my words. *Limiting* or *modifying* fair use is debatable. Eliminating it entirely is not. It would be completely paranoid. I think Mr. Wegrzanowski's recent comments illustrate the connection quite well. And from that paranoia, it is only a small step to wanting the security of additional "enforcement". The GNU project with its cult-like ideology and its constant threats against supposed GPL violators reminds me a lot of those who *defend* "intellectual property". I'm not opposed to copyleft per se, but the danger of using copyright against copyright is the same as when playing with fire -- you might start to like it.
This is impossible without quoting.
Sorry, wrong conclusion. You can always rephrase sentences.
Sure, you can always rephrase sentences. But information and emotion tend to get lost on the way. You are no longer saying what a person said, you are saying how you understood that person.
"I have a dream." Martin Luther King said that he "had a dream". King emphasized that he had a dream. In the middle of his speech, King remarked that he had a dream. King spoke of a dream he had. It was a dream, King said, that motivated him .. It must have been at some point in his sleep that the idea came to him .. King claimed, without presenting evidence, to have a "dream". With tears in his voice and his hands shaking, King spoke valiantly of the dream he had ..
Citations become interpretations. That is acceptable in some contexts, especially where purely factual information is concerned. But as a standard for the whole project, it is unprofessional and non-encyclopedic.
It is not necessary to use quotes if you want to document human knowledge.
It is necessary if you want to create an encyclopedia.
It is impossible without fair use. An encyclopedia that cannot cite directly what others say is not an encyclopedia.
Well, a statement based on a wrong conclusion does not become more true. Either you define an encyclopedia by saying that it includes citations (then your statement is trivial), or you say that an encyclopedia documents human knowledge, in which case your statement is wrong.
An encyclopedia documents human knowledge as precisely and accurately as possible. By having a requirement to paraphrase all quotes you lose the ability to do this. I can't believe we're even discussing this. This must be some joke that I fail to understand.
I am very well aware of the fact that "free" depends on the definition of the term "free". The Wikipedia FAQ claims that Wikipedia is free in this very definition:
<snip>
My point is this: If you think that Wikipedia is no longer a "free" encyclopedia because there's some fair use content in there, then you're using an "ideologically pure" definition of free that is not identical to mine, and not identical to the current policy on the English Wikipedia.
I have already explained to you in detail why fair use is compatible with the FDL when properly separated. All that remains to be done is to modify the software in accordance with our discussion, and to clarify some policy pages. We may have to build the transclusion feature I mentioned, although I personally don't care enough to do so. "Intellectual property" does not exist, after all.
Regards,
Erik
On Monday 02 June 2003 01:19, Erik Moeller wrote:
I didn't choose the FDL, and I wouldn't. If it wasn't so hard to change licenses, we might have already switched to Creative Commons style copyleft or something similar. I am increasingly coming to the conclusion that the FDL is unsuitable for online publications. Even the Debian project rejects it. For my own textual projects, I use the public domain. The GNU project should stick to software licenses.
Just to avoid a misunderstanding: the Debian project rejects the GFDL mainly because of the "invariant section" (as far as I understood it). So GFDL is, according to them, "not free enough". FWIW,
Marco
On Mon, Jun 02, 2003 at 09:07:32AM +0200, Marco Krohn wrote:
On Monday 02 June 2003 01:19, Erik Moeller wrote:
I didn't choose the FDL, and I wouldn't. If it wasn't so hard to change licenses, we might have already switched to Creative Commons style copyleft or something similar. I am increasingly coming to the conclusion that the FDL is unsuitable for online publications. Even the Debian project rejects it. For my own textual projects, I use the public domain. The GNU project should stick to software licenses.
Just to avoid a misunderstanding: the Debian project rejects the GFDL mainly because of the "invariant section" (as far as I understood it). So GFDL is, according to them, "not free enough". FWIW,
The GFDL contains a basic licensing scheme which Debian fully accepts and some "extensions" that are obviously non-free - invariant sections, front cover texts and back cover texts. A GFDL document without those is perfectly OK for Debian.
Marco Krohn wrote:
I am more interested in the consequences of adding "fair use" images to the articles. As Axel pointed out we will violate the GFDL by embedding "fair use" images in our articles.
I don't think it is at all obvious that adding "fair use" content to an article makes it incompatible with the GFDL. If so, then we have a major problem with respect to even quoting copyrighted books.
Axel has said this, and you have echoed it, but I'd like to see a full analysis of it before we rely on it or accept it as true.
If it *is* true that nothing GFDL can contain any "fair use" content at all, then we will have to not quote from books or newspapers. But I think that's absurd, and I think that the GFDL is not incompatible with ordinary fair use.
--Jimbo
On Sunday 01 June 2003 14:51, Jimmy Wales wrote:
Marco Krohn wrote:
I am more interested in the consequences of adding "fair use" images to the articles. As Axel pointed out we will violate the GFDL by embedding "fair use" images in our articles.
I don't think it is at all obvious that adding "fair use" content to an article makes it incompatible with the GFDL.
_I_ think this is obvious (and please believe me that I am very happy with being wrong ;-)
"fair use" does not allow modifications. GFDL allows that.
adding "fair use" content to "fair use" content does not necessarily result in "fair use" content. GFDL has no such limitations.
If so, then we have a major problem with respect to even quoting copyrighted books.
We will see what FSF answers to that. I have sent an email to the GNU organization and asked them for help on this. As soon as I have their reply I'll post it here.
best regards, Marco
Marco Krohn wrote:
We will see what FSF answers to that. I have sent an email to the GNU organization and asked them for help on this. As soon as I have their reply I'll post it here.
I support you in this. Richard Stallman knows who I am, and so if he or his people need to ask more specific questions, please let them know that I'm available and eager to get this resolved.
--Jimbo
Erik Moeller wrote:
What I do suggest is that we use common sense to determine the few cases where we think fair use law is applicable, instead of simply ignoring these few loopholes in the current restrictive copyright code. I would think that someone opposed to intellectual property would embrace the idea that we should defend and make use of our rights, instead of bowing to the pressure from copyright holders.
I agree with this approach, but I don't see where we have had problems with the copyright holders themselves. It would be easy to agree to remove this material if the request came from the copyright holder himself. The pressure so far seems to be coming from people who imagine that something is copyright.
To me, an important question is, "Can a copyright exist without a copyright holder?" In a simple case suppose that the apparent copyright holder died intestate last year, with no apparent heir, and not enough in personal assets to warrant probate. Who now owns the copyright?
My inclination would be to use borderline material, with an appropriate warning that we will happily remove on request from a properly identified owner.
Eclecticology
On Sun, 1 Jun 2003, Ray Saintonge wrote:
My inclination would be to use borderline material, with an appropriate warning that we will happily remove on request from a properly identified owner.
It's harder to remove material from a printed edition of Wikipedia. This means any printed version must cut away a degree of the content.
This makes the problem with a compromise clearer: If we flag all "fair use" images and accept them, they will discourage the hunt for non-copyrighted similar material - the article already has a picture. (IMO, of course.)
-- Daniel
Hr. Daniel Mikkelsen wrote:
On Sun, 1 Jun 2003, Ray Saintonge wrote:
My inclination would be to use borderline material, with an appropriate warning that we will happily remove on request from a properly identified owner.
It's harder to remove material from a printed edition of Wikipedia. This means any printed version must cut away a degree of the content.
This makes the problem with a compromise clearer: If we flag all "fair use" images and accept them, they will discourage the hunt for non-copyrighted similar material - the article already has a picture. (IMO, of course.)
A printed edition (or even a CD edition) has been frequently mentioned on people's wish lists. Until people realistically start talking about how to fund such an undertaking, I have a hard time considering the idea as anything more than a pipe-dream.
Flagging fair use images is fine, as would flagging all possibly copyright impaired contributions where the situation is uncertain. In a printed edition, the flagged material could then be easily excluded.
Coding boxes :-) could work very well for that.
Ec
--- Ray Saintonge saintonge@telus.net wrote:
Hr. Daniel Mikkelsen wrote:
It's harder to remove material from a printed edition of Wikipedia. This means any printed version must cut away a degree of the content.
A printed edition (or even a CD edition) has been frequently mentioned on people's wish lists. Until people realistically start talking about how to fund such an undertaking, I have a hard time considering the idea as anything more than a pipe-dream.
But the idea of a freely distributable and modifiable encyclopedia is surely not a pipe-dream. With a hodge-podge of fair-use, exclusively-licenced-to-Wikipedia, only-for-noncommercial-use and only-for-educational-use materials, each with slightly different interpretations in different countries, that idea is dead.
Axel
__________________________________ Do you Yahoo!? Yahoo! Calendar - Free online calendar with sync to Outlook(TM). http://calendar.yahoo.com
On Sun, 1 Jun 2003, Ray Saintonge wrote:
A printed edition (or even a CD edition) has been frequently mentioned on people's wish lists. Until people realistically start talking about how to fund such an undertaking, I have a hard time considering the idea as anything more than a pipe-dream.
Wikipedia's age can be counted in months, dismissing long term goals as pipe dreams this early on is forgetting the spectacular success Wikipedia has already achieved - who would have believed?
As for funding, is it so implausible that some publisher decides to make a printed edition of the best of our articles and sell it for profit? It would be very cheap for him.
-- Daniel
On Sun, Jun 01, 2003 at 11:01:04AM -0700, Ray Saintonge wrote:
Hr. Daniel Mikkelsen wrote:
On Sun, 1 Jun 2003, Ray Saintonge wrote:
My inclination would be to use borderline material, with an appropriate warning that we will happily remove on request from a properly identified owner.
It's harder to remove material from a printed edition of Wikipedia. This means any printed version must cut away a degree of the content.
This makes the problem with a compromise clearer: If we flag all "fair use" images and accept them, they will discourage the hunt for non-copyrighted similar material - the article already has a picture. (IMO, of course.)
A printed edition (or even a CD edition) has been frequently mentioned on people's wish lists. Until people realistically start talking about how to fund such an undertaking, I have a hard time considering the idea as anything more than a pipe-dream.
Flagging fair use images is fine, as would flagging all possibly copyright impaired contributions where the situation is uncertain. In a printed edition, the flagged material could then be easily excluded.
Coding boxes :-) could work very well for that.
It's not a pipe dream, it's a software problem. In Phase 4, with high-quality PDF or TeX output, it will be possible to print content from Wikipedia (probably only sections, not the whole of Wikipedia). And aren't CD editions happening right now ?
On Sun, 1 Jun 2003, Tomasz Wegrzanowski wrote:
And aren't CD editions happening right now ?
No, not until my script (or someone else's) is running OK. And after that, you need some way to mass-produce CDs, or just let people download ISOs.
Of course, the CD cannot contain any image (the text articles totally fill it).
(hope it does not count as "feeding" :-)))
Ciao, Alfio
On Sunday 01 June 2003 18:13, Ray Saintonge wrote:
Erik Moeller wrote:
What I do suggest is that we use common sense to determine the few cases where we think fair use law is applicable, instead of simply ignoring these few loopholes in the current restrictive copyright code. I would think that someone opposed to intellectual property would embrace the idea that we should defend and make use of our rights, instead of bowing to the pressure from copyright holders.
I agree with this approach, but I don't see where we have had problems with the copyright holders themselves. It would be easy to agree to remove this material if the request came from the copyright holder himself. The pressure so far seems to be coming from people who imagine that something is copyright.
IMHO the problem is different: the question is _not_ whether we are allowed to use content under "fair use" rights, but whether it is allowed to use this material under the license we are using (GFDL). This is an important difference.
You are allowed to use "fair use" content in general (I think nobody doubts that), but some of us think that it is not legal to combine these images / quotes / whatever with GFDL content because it violates the GFDL.
best regards, Marco Krohn
On Sun, Jun 01, 2003 at 09:13:08AM -0700, Ray Saintonge wrote:
Erik Moeller wrote:
What I do suggest is that we use common sense to determine the few cases where we think fair use law is applicable, instead of simply ignoring these few loopholes in the current restrictive copyright code. I would think that someone opposed to intellectual property would embrace the idea that we should defend and make use of our rights, instead of bowing to the pressure from copyright holders.
I agree with this approach, but I don't see where we have had problems with the copyright holders themselves. It would be easy to agree to remove this material if the request came from the copyright holder himself. The pressure so far seems to be coming from people who imagine that something is copyright.
To me, an important question is, "Can a copyright exist without a copyright holder?" In a simple case suppose that the apparent copyright holder died intestate last year, with no apparent heir, and not enough in personal assets to warrant probate. Who now owns the copyright?
My inclination would be to use borderline material, with an appropriate warning that we will happily remove on request from a properly identified owner.
When the whole of Wikipedia is included in Linux distros, distributed on CDs, and articles from Wikipedia are used by hundreds of books and magazines and thousands of web sites, how are you going to do that? We'd better play safe here.
Anything that doesn't allow Debian and other distros to distribute Wikipedia in their main section is not acceptable. (That means DFSG and OSI compatibility, and "fair use" is clearly incompatible with these rules. By the way, the invariant sections, front and back cover texts, etc. of the GFDL are also incompatible with the DFSG; negotiations with the FSF are in progress now to fix that problem with the GFDL. Plain GFDL is DFSG-compatible.)
Tomasz Wegrzanowski wrote:
On Sun, Jun 01, 2003 at 09:13:08AM -0700, Ray Saintonge wrote:
Erik Moeller wrote:
What I do suggest is that we use common sense to determine the few cases where we think fair use law is applicable, instead of simply ignoring these few loopholes in the current restrictive copyright code. I would think that someone opposed to intellectual property would embrace the idea that we should defend and make use of our rights, instead of bowing to the pressure from copyright holders.
My inclination would be to use borderline material, with an appropriate warning that we will happily remove on request from a properly identified owner.
When the whole of Wikipedia is included in Linux distros, distributed on CDs, and articles from Wikipedia are used by hundreds of books and magazines and thousands of web sites, how are you going to do that? We'd better play safe here.
How others use the material is their problem, and their risk. We shouldn't have to baby-sit them. Whatever license or copyrights are applied to Wikipedia reflects a collective comfort level. The user is still responsible for his own due diligence, no matter how conservative we are on the matter.
Anything that doesn't allow Debian and other distros to distribute Wikipedia in their main section is not acceptable. (That means DFSG and OSI compatibility, and "fair use" is clearly incompatible with these rules. By the way, the invariant sections, front and back cover texts, etc. of the GFDL are also incompatible with the DFSG; negotiations with the FSF are in progress now to fix that problem with the GFDL. Plain GFDL is DFSG-compatible.)
At some point along the way I get lost in the subtle distinctions that separate this alphabet soup of licenses. I'm sure that the average contributor doesn't waste much time on it either. Similarly, when we install a new piece of software, how many of us really read and understand the legalese bafflegab of licences. We just click on yes, because if we don't the program won't work. When we contribute to Wikipedia we agree to the principle that we want our writing to be generally available to the public to use as it sees fit, and that we are sharing it with like-minded individuals. We don't worry about the contortions of armchair lawyers.
The practical principle that it is easier to get forgiveness than to get permission goes a long way as long as it is applied with a level of common sense whereby we restrain ourselves from flagrant abuse. The rule of law is a good thing, but too much law is simply ignored.
Ec
On Sun, Jun 01, 2003 at 11:33:05AM -0700, Ray Saintonge wrote:
Tomasz Wegrzanowski wrote:
When the whole of Wikipedia is included in Linux distros, distributed on CDs, and articles from Wikipedia are used by hundreds of books and magazines and thousands of web sites, how are you going to do that? We'd better play safe here.
How others use the material is their problem, and their risk. We shouldn't have to baby-sit them. Whatever license or copyrights are applied to Wikipedia reflects a collective comfort level. The user is still responsible for his own due diligence, no matter how conservative we are on the matter.
They're not "others", a lot of "them" are Wikipedians. If everyone had to consult a lawyer before distributing free software, it wouldn't be half as successful as it is now.
Anything that doesn't allow Debian and other distros to distribute Wikipedia in their main section is not acceptable. (That means DFSG and OSI compatibility, and "fair use" is clearly incompatible with these rules. By the way, the invariant sections, front and back cover texts, etc. of the GFDL are also incompatible with the DFSG; negotiations with the FSF are in progress now to fix that problem with the GFDL. Plain GFDL is DFSG-compatible.)
At some point along the way I get lost in the subtle distinctions that separate this alphabet soup of licenses. I'm sure that the average contributor doesn't waste much time on it either. Similarly, when we install a new piece of software, how many of us really read and understand the legalese bafflegab of licences. We just click on yes, because if we don't the program won't work. When we contribute to Wikipedia we agree to the principle that we want our writing to be generally available to the public to use as it sees fit, and that we are sharing it with like-minded individuals. We don't worry about the contortions of armchair lawyers.
The practical principle that it is easier to get forgiveness than to get permission goes a long way as long as it is applied with a level of common sense whereby we restrain ourselves from flagrant abuse. The rule of law is a good thing, but too much law is simply ignored.
All major open projects are very serious about legal issues, and none would accept "just a few lines of code" from a dubious source. Some are more paranoid (like FSF and Debian), some are less, but there is no single project that would accept code or other content under "fair use", as that's very likely to cause them, their users and their distributors a lot of serious legal problems.
Tomasz Wegrzanowski wrote:
On Sun, Jun 01, 2003 at 11:33:05AM -0700, Ray Saintonge wrote:
Tomasz Wegrzanowski wrote:
When the whole of Wikipedia is included in Linux distros, distributed on CDs, and articles from Wikipedia are used by hundreds of books and magazines and thousands of web sites, how are you going to do that? We'd better play safe here.
How others use the material is their problem, and their risk. We shouldn't have to baby-sit them. Whatever license or copyrights are applied to Wikipedia reflects a collective comfort level. The user is still responsible for his own due diligence, no matter how conservative we are on the matter.
They're not "others", a lot of "them" are Wikipedians. If everyone had to consult a lawyer before distributing free software, it wouldn't be half as successful as it is now.
Perhaps the others are Wikipedians, but that does not absolve them of a responsibility of due diligence when they contribute to an outside project. If anything, by being aware of the present debate they may become subject to a higher standard than a non-Wikipedian. One of the purposes of fair use and safe harbor provisions is to remove the need to consult lawyers every time you want to add something of uncertain copyright.
Anything that doesn't allow Debian and other distros to distribute Wikipedia in their main section is not acceptable. (That means DFSG and OSI compatibility, and "fair use" is clearly incompatible with these rules. By the way, the invariant sections, front and back cover texts, etc. of the GFDL are also incompatible with the DFSG; negotiations with the FSF are in progress now to fix that problem with the GFDL. Plain GFDL is DFSG-compatible.)
At some point along the way I get lost in the subtle distinctions that separate this alphabet soup of licenses. I'm sure that the average contributor doesn't waste much time on it either. Similarly, when we install a new piece of software, how many of us really read and understand the legalese bafflegab of licences. We just click on yes, because if we don't the program won't work. When we contribute to Wikipedia we agree to the principle that we want our writing to be generally available to the public to use as it sees fit, and that we are sharing it with like-minded individuals. We don't worry about the contortions of armchair lawyers.
The practical principle that it is easier to get forgiveness than to get permission goes a long way as long as it is applied with a level of common sense whereby we restrain ourselves from flagrant abuse. The rule of law is a good thing, but too much law is simply ignored.
All major open projects are very serious about legal issues, and none would accept "just a few lines of code" from a dubious source. Some are more paranoid (like FSF and Debian), some are less, but there is no single project that would accept code or other content under "fair use", as that's very likely to cause them, their users and their distributors a lot of serious legal problems.
But this is about content rather than code. I can easily see where an inappropriately used section of code could bring a whole project to a halt if it ever had to be withdrawn because of a copyright violation. With content, removal of a problem passage would not need to jeopardize the project to any significant extent. Published snapshots of the project can be in fairly short print runs that are consistent with short term demands for the product. If a copyright problem arises with some article it can be deleted in time for the next run.
I'm certainly aware of the ambiguities in the word "free". They overlap to a great extent. That being said, I'm less concerned about the "free beer" or "free ride" sense. Freedom of speech, and free access to information are a lot more interesting than just getting something for nothing.
Ec
[Note: This is posted to both <wikitech-l> and <wikipedia-l> to preserve continuity; replies should go to <wikipedia-l>.]
Tomasz Wegrzanowski wrote:
Eclecticology wrote:
How others use the material is their problem, and their risk. We shouldn't have to baby-sit them. Whatever license or copyrights are applied to Wikipedia reflects a collective comfort level. The user is still responsible for his own due diligence, no matter how conservative we are on the matter.
They're not "others", a lot of "them" are Wikipedians. If everyone had to consult a lawyer before distributing free software, it wouldn't be half as successful as it is now.
This isn't entirely true; the more paranoid must still check the less paranoid. But Wikipedia needs to make it easy by separating out the "fair use" pics and not claiming any longer that they are being distributed under the GFDL.
As for brief quotations, well, I knew that the GNU licences would come back to bite us someday, but I expected 50 years from now (hopefully *after* current copyright law became impossible to maintain). I never thought they would prove to be inadequate so soon! Surely RMS thought of brief quotations, one hopes?
-- Toby
[Note: This is posted to both <wikitech-l> and <wikipedia-l> to preserve continuity; replies should go to <wikipedia-l>.]
Eclecticology wrote in part:
I don't see where we have had problems with the copyright holders themselves. It would be easy to agree to remove this material if the request came from the copyright holder himself. The pressure so far seems to be coming from people who imagine that something is copyright.
My inclination would be to use borderline material, with an appropriate warning that we will happily remove on request from a properly identified owner.
This is a big difference between images and text. Our text is not only distributed -- it's modified in many ways. A copyright problem there can infect many articles later on. With images, we don't have nearly this sort of practical problem.
But we still need to clearly separate out the non-free images to aid later distributors, try not to rely on them for content, and replace them with free images when possible.
-- Toby
--- Erik Moeller erik_moeller@gmx.de wrote:
Axel-
Check out for example the excellent collection of portrait photographs donated to the Library of Congress.
Which one specifically?
http://memory.loc.gov/ammem/vvhtml/vvhome.html
Actually, what we need is probably a transfer of the copyright to us, or at the very least a contract that allows unlimited sublicensing.
No, the precise thing we need is that the copyright holder releases the picture under GFDL. They retain all rights, including the right, protected under GFDL, to be listed as author in perpetuity.
And if the New York Times doesn't want to put their newspaper under GFDL, they will still have to pay the copyright holder for the right to print the image.
See where this kind of mentality is taking us? Copyright paranoia.
Speaking of rhetoric: inventing a term with a negative connotation for a position you don't like does not advance your argument.
I would think that someone opposed to intellectual property would embrace the idea that we should defend and make use of our rights, instead of bowing to the pressure from copyright holders.
Yes, you might think that, and you would be wrong. In the end you end up with a Wikipedia that is such a mess copyright-wise that the goal "freely redistributable" becomes a farce; just another encyclopedia, without really advancing the global situation of "intellectual property".
I think Brion's suggestion of simply linking to the external site containing the photograph is a win-win-win-win proposal:
- The copyright owner gets exposure
- Our readers get access to the educational content
- We are still able to burn a CD with all our material and put "GFDL" on the cover
- The inconvenience of the external link encourages contributors to hunt for free substitute photographs.
Alas, it also has several problems:
- When the website is down, the image is no longer available. Broken links often go unnoticed for longer periods of time because we have no way to systematically check them.
- The image is no longer embedded in the proper context. It becomes difficult to associate image content with image text.
- The reader is taken away from the Wikipedia navigational structure to a non-HTML image page. This is bad user interface design.
All these I redefine as advantages with my fourth point above :-)
- fair use should be kept at a minimum,
In practice, this means: "first ask for GFDL or public domain; if they refuse, ask for special licensing; if they refuse, you most likely can use it anyway."
Axel
Axel-
Quite useful, thanks.
Actually, what we need is probably a transfer of the copyright to us, or at the very least a contract that allows unlimited sublicensing.
No, the precise thing we need is that the copyright holder releases the picture under GFDL.
Also fine, but again, expensive, because it deviates from usual licensing policies.
See where this kind of mentality is taking us? Copyright paranoia.
Speaking of rhetoric: inventing a term with a negative connotation for a position you don't like does not advance your argument.
It's called memetic engineering, and it works, as the adoption of the term by others shows. :-)
I would think that someone opposed to intellectual property would embrace the idea that we should defend and make use of our rights, instead of bowing to the pressure from copyright holders.
Yes, you might think that, and you would be wrong. In the end you end up with a Wikipedia that is such a mess copyright-wise that the goal "freely redistributable" becomes a farce; just another encyclopedia, without really advancing the global situation of "intellectual property".
I don't see the slippery slope, sorry. We can all agree on trying to find the right balance. A "fair use is not allowed" stance is neither balanced nor logical. I have not seen your response to my analysis that concluded that quotations are more problematic than images. Do you want to get rid of those, too? Why single out images?
- fair use should be kept at a minimum,
In practice, this means: "first ask for GFDL or public domain; if they refuse, ask for special licensing; if they refuse, you most likely can use it anyway."
There are no automatisms. Wikipedia is built by people. And people like you will make sure that fair use will remain the exception.
Regards,
Erik
--- Erik Moeller erik_moeller@gmx.de wrote:
No, the precise thing we need is that the copyright holder releases the picture under GFDL.
Also fine, but again, expensive, because it deviates from usual licensing policies.
I'm not sure we should assume this before having tried. One could argue that this should actually be pretty cheap:
- GFDL materials are useless for most commercial enterprises
- if they refuse to release under GFDL, we can always threaten to take it under fair use, which means that they neither get author recognition nor money.
I don't see the slippery slope, sorry.
I don't either. It's either freely distributable/modifiable or it isn't.
A "fair use is not allowed" stance is neither balanced nor logical. I have not seen your response to my analysis that concluded that quotations are more problematic than images.
If I understood correctly, you argue that quotes are embedded in the text while images are kept in separate files, thus GFDL is not inherited by the photo but is inherited by the quotes. This is incorrect. Derivative works are required to be under GFDL; what constitutes a derivative work is defined by copyright law. The technical detail that text and images are typically kept in separate files is irrelevant; illustrating an article by adding a picture is a classical case of a derivative work. Moving quotes out of the main text and then "including" them somehow is a technical gimmick that doesn't change anything: adding a quote also creates a derivative work.
So yes, fair use quotes are technically violations of GFDL, but completely harmless. Nobody wants to change quotes anyway, and fair use quotes are typically minute parts of the work they originate from. Even commercial redistributors can use those quotes under fair use. All three of the above are false for images.
Axel
Axel-
I'm not sure we should assume this before having tried. One could argue that this should actually be pretty cheap:
- GFDL materials are useless for most commercial enterprises
Which includes the people who sell them ;-). The problem is that by licensing under the GFDL, the company loses the ability to license to people like us in the future, i.e. those who want to produce freely copiable materials. So they will want to get as much money from the first GFDL licensee as they can, because they won't get another one. Furthermore, copyright holders are notoriously suspicious about anything new -- they usually have ready-made packages and will not alter their licensing policies just because we tell them to, simply because understanding the implications of the FDL requires legal consultation which costs money.
I obviously agree that it's worth trying. But I am not very optimistic.
I don't see the slippery slope, sorry.
I don't either. It's either freely distributable/modifiable or it isn't.
In its entirety, yes. But just because we allow fair use, Wikipedia will not automatically and gradually turn into a proprietary encyclopedia (slippery slope argument).
If I understood correctly, you argue that quotes are embedded in the text while images are kept in separate files, thus GFDL is not inherited by the photo but is inherited by the quotes. This is incorrect. Derivative works are required to be under GFDL;
So is the text
Bla
in combination with the text
[[Image:Bla.jpg]]
The technical detail that text and images are typically kept in separate files is irrelevant; illustrating an article by adding a picture is a classical case of a derivative work.
The wiki-author doesn't add a picture, he adds a reference to a picture. The web browser will automatically retrieve that image if so instructed (visit the page with lynx and there is no image). The result is an aggregated, not a combined work under the FDL. The author cannot even change the image in any way by editing the article. Now, if article and image were always compiled together (as they would be on paper), that would be a different matter. But they aren't.
Moving quotes out of the main text and then "including" them somehow is a technical gimmick that doesn't change anything: adding a quote also creates a derivative work.
You would no longer add a quote, but a reference to one, and the same logic as above would apply.
So yes, fair use quotes are technically violations of GFDL, but completely harmless.
Many copyright holders see things differently. Author Dan van der Vat, for example, was asked to pay 25 British pounds for quoting two sentences from Churchill's History of the Second World War in his book "The Atlantic Campaign". Sure: The legality is questionable. But don't kid yourself into believing that nobody would ever consider quotes infringing. Treating fair use of quotes and images entirely differently is hypocritical and wrong.
Regards,
Erik
--- Erik Moeller erik_moeller@gmx.de wrote:
The wiki-author doesn't add a picture, he adds a reference to a picture.
...with the intent and expectation that the image and the text be combined into a whole by the user's browser. It's a technical detail that this combining is done by the browser rather than by the server - had we used PDF rather than HTML as our distribution medium, then the combining would take place on the server.
So yes, fair use quotes are technically violations of GFDL, but completely harmless.
Many copyright holders see things differently. Author Dan van der Vat, for example, was asked to pay 25 British pounds for quoting two sentences from Churchill's History of the Second World War in his book "The Atlantic Campaign". Sure: The legality is questionable.
In other words: this would be laughed out of court.
But don't kid yourself into believing that nobody would ever consider quotes infringing. Treating fair use of quotes and images entirely differently is hypocritical and wrong.
It is neither, since short textual quotes are quite different from images in at least two respects relevant to fair use.
1) Quotes are typically a tiny fraction of the whole work, while images are typically 100% of the whole work.
2) There is no functioning market for the rights in short quotes, but there is a functioning market for the rights in images.
Now, I don't think Wikipedia is at any risk whatsoever: if somebody complains about an image, we simply take it down. We don't have money, so we won't get sued. The downstream users of our materials however may not share these luxuries, and in addition may have commercial interests which weakens their fair use defense considerably. In effect our fair use images shut out large classes of potential users of the encyclopedia.
Axel
Axel-
The wiki-author doesn't add a picture, he adds a reference to a picture.
...with the intent and expectation that the image and the text be combined into a whole by the user's browser.
What matters in law is the text of the FDL, because that's what we're dealing with. The FDL states that aggregation with "separate and independent" works is acceptable. The wiki-author creates a combined work of the previous work and the image reference -- with the expectation that they will be *aggregated* on *some* users' systems. Texts and images are separate works, stored separately, sometimes transferred together, sometimes not. They do not constitute a single, individual work only because some browsers display them together. This case is explicitly treated in the FDL. Do you accept that there is a distinction between aggregation and combination? Then where does combination end and aggregation begin?
The classical example for aggregation is a CD-ROM of several works. But just like a Wikipedia text with an image, the individual works are almost certainly tied together via references and identifiers. Would a CD-ROM that displays dynamically copyrighted, keyword-associated pictures when an article is shown infringe the FDL? Hardly.
Many copyright holders see things differently. Author Dan van der Vat, for example, was asked to pay 25 British pounds for quoting two sentences from Churchill's History of the Second World War in his book "The Atlantic Campaign". Sure: The legality is questionable.
In other words: this would be laughed out of court.
Maybe so, but would Wikipedia go to court?
It is neither, since short textual quotes are quite different from images in at least two respects relevant to fair use.
- Quotes are typically a tiny fraction of the whole work, while images
are typically 100% of the whole work.
Even when they are "combined" with Wikipedia articles? ;-) At least now you are interpreting the term "work" reasonably. The next step is to do so not only when it suits your argument ..
- There is no functioning market for the rights in short quotes, but
there is a functioning market for the rights in images.
Mostly true, and we should be very careful in dealing with images that are part of this market. This is the essence of a reasonable fair use doctrine for Wikipedia: Try to figure out if someone may be interested in stopping distribution of picture X, and if so, do not include it (possibly with some rare exceptions of high political/historical significance).
Now, I don't think Wikipedia is at any risk whatsoever: if somebody complains about an image, we simply take it down. We don't have money, so we won't get sued. The downstream users of our materials however may not share these luxuries,
That's why we will provide the flags in the image table.
and in addition may have commercial interests which weakens their fair use defense considerably.
People who want to make money with Wikipedia can be expected to do some manual work.
In effect our fair use images shut out large classes of potential users of the encyclopedia.
So does the FDL. But we can make fair use optional.
Regards,
Erik
--- Erik Moeller erik_moeller@gmx.de wrote:
Axel-
The wiki-author doesn't add a picture, he adds a reference to a picture.
...with the intent and expectation that the image and the text be combined into a whole by the user's browser.
What matters in law is the text of the FDL, because that's what we're dealing with.
Actually, what matters is the meaning of "derivative work", which is defined in copyright law. If illustrating an article with a picture creates a derivative work, GFDL applies automatically to the whole.
This is the essence of a reasonable fair use doctrine for Wikipedia: Try to figure out if someone may be interested in stopping distribution of picture X, and if so, do not include it (possibly with some rare exceptions of high political/historical significance).
That does not make good policy: Wikipedia contributors have zero incentive to do this research. They want to illustrate an article, and quick. Finding the copyright holder and sending an email to ask whether there is a functioning market for digital reproduction of this picture would be way too much hassle. Should all fair-use pictures without this proof be deleted?
and in addition may have commercial interests which weakens their fair use defense considerably.
People who want to make money with Wikipedia can be expected to do some manual work.
"The free encyclopedia", as in "freely modifiable and redistributable", not "free for non-commercial use".
Axel
[Replies should go only to <wikipedia-l>.]
Axel Boldt wrote in part:
Erik Moeller wrote:
The wiki-author doesn't add a picture, he adds a reference to a picture.
...with the intent and expectation that the image and the text be combined into a whole by the user's browser. It's a technical detail that this combining is done by the browser rather than by the server - had we used PDF rather than HTML as our distribution medium, then the combining would take place on the server.
I've decided that this is not at all a technical detail. It's part of the point of both our design and HTML's design that the same image can be dynamically combined with several different pieces of text. Indeed, if we used PDF and had to combine things on our server, *then* we might have an argument that this was merely a technical detail, in an attempt to wriggle out of the GFDL's restrictions. But with HTML, the technology is following the authors' intent precisely.
Author Dan van der Vat, for example, was asked to pay 25 British pounds for quoting two sentences from Churchill's History of the Second World War in his book "The Atlantic Campaign". Sure: The legality is questionable.
In other words: this would be laughed out of court.
Was it?
But don't kid yourself into believing that nobody would ever consider quotes infringing. Treating fair use of quotes and images entirely differently is hypocritical and wrong.
It is neither, since short textual quotes are quite different from images in at least two respects relevant to fair use.
- Quotes are typically a tiny fraction of the whole work, while images
are typically 100% of the whole work.
This, I think, is an important point. It came up long before in discussion of album covers. It seems doubtful that our usage of these images is truly "fair use" in the first place (a separate issue from whether it violates the GFDL). After all, the portion of *our* work that it constitutes is irrelevant (and that's still 100%, since the image is the entire work for us too); it's the portion of *their* work, and that's obviously 100%.
- There is no functioning market for the rights in short quotes, but
there is a functioning market for the rights in images.
Yes, this also affects "fair use" law in the US.
Now, I don't think Wikipedia is at any risk whatsoever: if somebody complains about an image, we simply take it down. We don't have money, so we won't get sued. The downstream users of our materials however may not share these luxuries, and in addition may have commercial interests which weakens their fair use defense considerably. In effect our fair use images shut out large classes of potential users of the encyclopedia.
It shuts out hardly any users, relatively speaking, since it doesn't shut out any readers or writers. What it shuts out is forkers, and others that would reproduce Wikipedia. This is why it's important that we not claim that all image files are covered under the GFDL, since many are no such thing. IOW, it's the separation of the free images from the proprietary ones that we need to be working on.
-- Toby
(This policy stuff should really be over on wikipedia-l, not here, so I am cc:'ing).
Toby Bartels wrote:
It came up long before in discussion of album covers. It seems doubtful that our usage of these images is truly "fair use" in the first place (a separate issue from whether it violates the GFDL).
It doesn't seem doubtful to me. See Kelly v. Arriba Soft on thumbnails: http://biotech.law.lsu.edu/cases/IP/copyright/kelly_v_arriba_soft.htm
And also: "On the other hand, in Nunez v. Caribbean International News Corp.,*fn22 the First Circuit found that copying a photograph that was intended to be used in a modeling portfolio and using it instead in a news article was a transformative use."
It's a very complicated issue, and although I've spent many hours reading court cases, I still can't speak with any certainty on lots of questions.
But the album cover example seems pretty squarely fair use.
It shuts out hardly any users, relatively speaking, since it doesn't shut out any readers or writers. What it shuts out is forkers, and others that would reproduce Wikipedia. This is why it's important that we not claim that all image files are covered under the GFDL, since many are no such thing. IOW, it's the separation of the free images from the proprietary ones that we need to be working on.
This I agree with completely.
Also, *where possible*, and I think this is more cases than people commonly realize, we should be replacing fair use images with pure GNU FDL images.
--Jimbo
Axel Boldt wrote:
Now, I don't think Wikipedia is at any risk whatsoever: if somebody complains about an image, we simply take it down. We don't have money, so we won't get sued. The downstream users of our materials however may not share these luxuries, and in addition may have commercial interests which weakens their fair use defense considerably. In effect our fair use images shut out large classes of potential users of the encyclopedia.
I agree with this assessment. I would add to this a general concern that our use of images under "fair use" prevents the rise of a stronger demand for really free sources of images.
Jimmy-
I agree with this assessment. I would add to this a general concern that our use of images under "fair use" prevents the rise of a stronger demand for really free sources of images.
I believe this argument is fallacious, since most of our instances of fair use are in cases where it isn't realistically possible to get a "really free" source. We're not going to put up an image of a famous building as fair use, because we can get a Wikipedian to make a photograph. But we can't make free photographs of deceased celebrities in any appealing state.
Of course, we can try to get permissions (and we can, and should, try to do so with all of our fair use images), but our main problem with doing so is that we effectively require the copyright holders to relicense their works under the FDL or to put them in the public domain. Most people aren't going to do either.
Regards,
Erik
[Note: This is crossposted to <wikitech-l> and <wikipedia-l> for continuity. Replies should go to <wikipedia-l>, since it's a policy discussion.]
Axel Boldt wrote on <wikitech-l>:
Erik Moeller wrote:
I have not seen your response to my analysis that concluded that quotations are more problematic than images.
If I understood correctly, you argue that quotes are embedded in the text while images are kept in separate files, thus GFDL is not inherited by the photo but is inherited by the quotes. This is incorrect. Derivative works are required to be under GFDL; what constitutes a derivative work is defined by copyright law. The technical detail that text and images are typically kept in separate files is irrelevant; illustrating an article by adding a picture is a classical case of a derivative work. Moving quotes out of the main text and then "including" them somehow is a technical gimmick that doesn't change anything: adding a quote also creates a derivative work.
Including quotations would indeed be a technical gimmick, and our server would provide a single HTML file, a derivative work. That images are separate, however, is more than a technical detail; it's an important feature of HTTP that's used in other ways. Our copyright notice at the bottom of the page even refers only to "text"; a result of this feature is that the text is easily separated.
-- Toby
Axel Boldt wrote:
Next, once our foundation has money, we can try to acquire the copyright of selected important images we need and cannot get in any other way.
I don't think that that's the way to go. Once we start paying for material, the nature of the project would change drastically.
Ec
--- Erik Moeller erik_moeller@gmx.de wrote:
Tomasz-
No !!! Just delete all non-free images.
It would be silly not to allow the freedoms given to us by the law to the maximum possible extent.
By law, we could include texts which are exclusively licensed to us. Yet we don't do it: the goal isn't to produce a good encyclopedia (there are enough already), but a good encyclopedia that's free in as many senses as possible.
Axel
Axel-
--- Erik Moeller erik_moeller@gmx.de wrote:
Tomasz-
No !!! Just delete all non-free images.
It would be silly not to allow the freedoms given to us by the law to the maximum possible extent.
By law, we could include texts which are exclusively licensed to us. Yet we don't do it: the goal isn't to produce a good encyclopedia (there are enough already), but a good encyclopedia that's free in as many senses as possible.
^reasonably
Regards,
Erik
On Thursday, 29 May 2003 22:18, Lee Daniel Crocker wrote:
I don't think there's anything unresolved; clearly we CAN'T redistribute copyrighted photos under GFDL. But the issue of whether or not we can use them in Wikipedia is more complicated.
Aye, there's the rub. Can we modify a GFDL-licensed work by integrating material we *know* we cannot distribute under the GFDL license and *distribute the result under the GFDL license*?
It sure looks to me like we're blatantly voiding our own license, and that makes me reeeeeeal uncomfortable.
-- brion vibber (brion @ pobox.com)
On Thu, May 29, 2003 at 09:46:53PM -0700, Brion Vibber wrote:
On Thursday, 29 May 2003 02:15, Alfio Puglisi wrote:
- Images: no images are present here. AFAIK, each of them has a SQL
record (that my script skips), but the actual image data is not included.
Many images uploaded to Wikipedia are third-parties' copyrighted IP being used under the vague claim of "fair use". The project has not yet received legal advice about whether such materials can be redistributed under the terms of the GFDL; until that's resolved I at least have no intention of putting them all in one easy download, for which a significant use would be reuse and redistribution by people like yourself trying to reuse Wikipedia material.
On Polish Wikipedia they would be immediately deleted. "Fair use" is well known to be incompatible with free software - it gives very limited rights.
Please delete all such images, so Wikipedia can stay free.
--- Tomasz Wegrzanowski taw@users.sourceforge.net wrote:
"Fair use" is well known to be incompatible with free software - it gives very limited rights.
Please delete all such images, so Wikipedia can stay free.
I agree. Illustrating an article by adding a photo creates a derivative work, therefore the whole has to be put under GFDL, therefore it cannot be fair use material (or anything besides public domain or GFDL).
Axel
Brion Vibber wrote:
I know several prominent Wikipedians don't seem to care about this, but I think staying true to our all-reusable-all-redistributable license is a very important aspect of maintaining Wikipedia's credibility and fulfilling the project's goals.
I agree completely. Whenever we have a tough decision between expedience and redistributability _in the long run_, we should choose redistributability.
I am of the opinion, though not to the point of pushing strongly for enforcement just now, that it would be best if we removed a great many images that are currently in wikipedia. They are o.k. for us to use, on account of so-called "fair use", but they are not GNU-free, and this should trouble us.
--Jimbo
Finally got a chance to look at Alfio's script, here's some non-copyright commentary. :)
On Thursday, 29 May 2003 02:15, Alfio Puglisi wrote:
http://www.arcetri.astro.it/~puglisi/wiki/dump/ma/main_page.html
Looks nice! Cleaner interface than we have, too. ;)
filenames should be OK for most filesystems not "8.3" limited (max 63 chars, only a-z, 0-9 and underscore)
despite the two-letter subdirectories, some of them have over 4,000
files in them!
Letters aren't distributed evenly, alas... If you want to even that out, consider using a binary hash as the basis of the divisions. (We use the first one and two hex digits of the md5hash of the title/filename for the uploads and the rendered page cache, for instance.) They're not pretty, though.
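A minimal Python sketch of the hash-based split described above. It assumes one directory level named by the first hex digit of the title's MD5 and a second named by the first two digits; the function name `shard_path` and the `.html` suffix are made up for illustration and are not taken from the MediaWiki code or from Alfio's script.

```python
import hashlib


def shard_path(title: str, levels: int = 2) -> str:
    """Return a relative path such as 'a/ab/main_page.html' for a page title.

    The directory names are prefixes of the MD5 hex digest of the title,
    so files spread roughly evenly no matter how titles are spelled.
    """
    digest = hashlib.md5(title.encode("utf-8")).hexdigest()
    # Level 1 uses the first hex digit, level 2 the first two, and so on.
    parts = [digest[:i] for i in range(1, levels + 1)]
    return "/".join(parts + [title + ".html"])


if __name__ == "__main__":
    for name in ("main_page", "roman_empire", "holy_roman_empire"):
        print(name, "->", shard_path(name))
```

With two levels this gives 256 leaf directories of similar size, at the cost of directory names that say nothing about their contents.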
- Time: the script takes more than 2 hours on my 1.3 Ghz Athlon...
Whee... I'm gonna have to get a faster cpu...
- Size: this dump is about 800MB. (tar.gz is just 110MB). I think
that I can bring it down to 600-650MB with a bit of trimming and eliminating unnecessary redirects. BUT, without some form of compression, the English wikipedia will soon overflow a single CD. Maybe we should target DVDs? :-)
Single CD would be preferable, of course, though a static HTML dump can target mirror sites which don't have that limitation as well.
Possibilities:
* Nearly all (all?) browsers these days can read files sent in gzip encoding. But can we do this on the filesystem reliably, where we have no opportunity to send an encoding header? Browsers seem to treat .html.gz files as application/gzip and send them to an external app, and don't internally recognize a gzipped file if just named as .html.
* Self-extracting JavaScript. :) I'm sure someone, somewhere has done this; if not it's worth it for the evil factor: rewrite gunzip in JavaScript, and have the content of the HTML files be a <script> tag with a big string and a call to the gunzip() function pulled in from a common .js file. Downsides are likely crappy performance and an inability to function in non-JavaScript browsers.
* Use transparent filesystem compression on the CD. I think only Linux supports this, and only if certain options are enabled in the kernel configuration; not portable, but might be nice for personal use.
* Ship a server. Java applet or light executable for many platforms which serves out the pages with appropriate encoding header. Downsides: hard to make portable, can't just browse the filesystem.
* Offline reader program with its own storage and display methods. Again, hard to make portable, can't just browse the filesystem. (cf http://meta.wikipedia.org/wiki/WINOR )
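As a rough illustration of the "ship a server" option, here is a Python sketch using only the standard library. It assumes the dump is stored as pre-compressed `.html.gz` files and that a request for `/ma/main_page.html` should be answered from `<root>/ma/main_page.html.gz`; the names `resolve_gz_path`, `make_handler` and `serve` are hypothetical, and this is a sketch under those assumptions, not a portable shipping solution.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_gz_path(root, url_path):
    """Map '/ma/main_page.html' to '<root>/ma/main_page.html.gz'.

    Returns None if the request tries to escape the root directory.
    """
    rel = os.path.normpath(url_path.split("?", 1)[0].lstrip("/"))
    if rel == ".." or rel.startswith(".." + os.sep) or os.path.isabs(rel):
        return None
    return os.path.join(root, rel + ".gz")


def make_handler(root):
    """Build a request handler class that serves gzipped pages from root."""

    class GzipPageHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            path = resolve_gz_path(root, self.path)
            if path is None or not os.path.isfile(path):
                self.send_error(404)
                return
            with open(path, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            # The file is sent still compressed; the browser decompresses it.
            self.send_header("Content-Encoding", "gzip")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return GzipPageHandler


def serve(root, port=8000):
    """Serve the dump on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), make_handler(root)).serve_forever()
```

Calling `serve("dump")` and pointing a browser at `http://127.0.0.1:8000/ma/main_page.html` would exercise it. The downside named in the list still applies: the files can no longer be browsed straight off the disc.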
- Search: I tried a javascript search that worked well for small
sized databases: it's basically a big array of strings (article titles and filenames) with some lines of code that do a regexp match against them. For full-sized databases like this one, the search page becomes an 8 megabytes monster that takes forever to process (IE grabs 100 MB of memory and stops there, Opera is even worse). I'll see if I can find a different solution.
Here are some thoughts, and since I'm not going to have time to implement this myself for a while take it or leave it as you like:
It may be better to go with something similar to MySQL's fulltext search index: break the titles into words, and associate words with lists of pages that contain them rather than full titles with their page names. Instead of regexping a hundred thousand strings, you'd only need to break the query into words, fetch the lists of pages for _just those words_, and intersect or union the results as desired.
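The word-index idea can be sketched in a few lines of Python (the real thing would have to be JavaScript running in the browser). The tokenizer, the function names and the three sample pages are assumptions made for illustration; this is not how MySQL's fulltext index is implemented internally.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each title word to the set of filenames whose title contains it.

    pages: dict of filename -> article title.
    """
    index = defaultdict(set)
    for filename, title in pages.items():
        for word in tokenize(title):
            index[word].add(filename)
    return index


def search(index, query, match_all=True):
    """Look up only the query's words, then intersect or union the hits."""
    words = tokenize(query)
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits) if match_all else set.union(*hits)


if __name__ == "__main__":
    pages = {
        "ma/main_page.html": "Main Page",
        "ro/roman_empire.html": "Roman Empire",
        "ho/holy_roman_empire.html": "Holy Roman Empire",
    }
    idx = build_index(pages)
    print(sorted(search(idx, "roman empire")))
    print(sorted(search(idx, "holy page", match_all=False)))
```

A query now costs one dictionary lookup per query word plus a set operation, instead of a regexp pass over every title.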
Space/memory could probably be saved using prefix codes of some sort...
Also, you could break up the index into several smaller files so not all strings need to be loaded into memory. I don't recall JavaScript having an include() command, but in the worst case you could pull some kind of <frameset> or <iframe> thing and bring up the necessary sub-scripts in another frame.
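One way to split the index, sketched in Python: group the words by their first character and write one small file per group, so a lookup only has to load the file for the word it needs. The JSON format, the `idx_<char>.json` naming and the function names are assumptions for illustration; loading the pieces inside a browser (frames or otherwise) is not shown. Words are assumed to be non-empty and lowercase alphanumeric.

```python
import json
import os
from collections import defaultdict


def split_index(index):
    """Group a word -> filenames index into shards keyed by first character."""
    shards = defaultdict(dict)
    for word, files in index.items():
        shards[word[0]][word] = sorted(files)
    return dict(shards)


def write_shards(shards, out_dir):
    """Write each shard to its own file, e.g. out_dir/idx_r.json."""
    os.makedirs(out_dir, exist_ok=True)
    for key, entries in shards.items():
        path = os.path.join(out_dir, "idx_%s.json" % key)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(entries, f)


def lookup(word, out_dir):
    """Load only the shard that can contain word and return its page list."""
    path = os.path.join(out_dir, "idx_%s.json" % word[0])
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f).get(word, [])
```

Splitting by first letter inherits the uneven letter distribution noted earlier in the thread; a hash prefix would give more even shard sizes if that matters.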
-- brion vibber (brion @ pobox.com)
On Sun, 1 Jun 2003, Brion Vibber wrote:
On Thursday, 29 May 2003 02:15, Alfio Puglisi wrote:
http://www.arcetri.astro.it/~puglisi/wiki/dump/ma/main_page.html
Looks nice! Cleaner interface than we have, too. ;)
Being static, lots of the dynamic stuff and special pages didn't need to be on the topbar :-)
Letters aren't distributed evenly, alas... If you want to even that out, consider using a binary hash as the basis of the divisions. (We use the first one and two hex digits of the md5hash of the title/filename for the uploads and the rendered page cache, for instance.) They're not pretty, though.
I'll save this for when/if it becomes a real problem
- Size: this dump is about 800MB. (tar.gz is just 110MB).
[...]
Single CD would be preferable, of course, though a static HTML dump can target mirror sites which don't have that limitation as well.
The new version of the script (not online yet) produces a dump that, according to Nero, can be written on a 650MB cd. The main reasons are a smaller html template and the elimination of redirects (but they are still present for searches).
- Self-extracting JavaScript. :) I'm sure someone, somewhere has done
this; if not it's worth it for the evil factor: rewrite gunzip in JavaScript, and have the content of the HTML files be a <script> tag with a big string and a call to the gunzip() function pulled in from a common .js file. Downsides are likely crappy performance and an inability to function in non-JavaScript browsers.
So I wasn't the only one thinking about this :) Some days ago I did a little Google search, but found nothing. I'm also *sure* that someone has already written this. Again, I'll postpone it to a future version.
I should point out that the main reason for size bloat is the proliferation of small files. Combined with the 2048-byte cluster size of CD-ROM filesystems, this means that each article occupies at least 2K, and bigger files waste an average of 1K each (half a cluster). Just counting bytes, the html version is around 490MB. So maybe some way to bundle files together (maybe using frames and #anchors, or now-you-see-it-now-you-don't effects in Javascript :) could pay off.
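The cluster arithmetic can be checked with a short Python sketch. It assumes every file is rounded up to a whole number of 2048-byte clusters and ignores directory and filesystem overhead; the sample sizes are invented, not measured from the dump.

```python
CLUSTER = 2048  # bytes per cluster, as stated above for CD-ROM filesystems


def on_disc_size(file_sizes, cluster=CLUSTER):
    """Total space used when each file is rounded up to whole clusters."""
    # -(-size // cluster) is ceiling division on integers.
    return sum(-(-size // cluster) * cluster for size in file_sizes)


def wasted_bytes(file_sizes, cluster=CLUSTER):
    """Slack space: on-disc size minus the raw byte count."""
    return on_disc_size(file_sizes, cluster) - sum(file_sizes)


if __name__ == "__main__":
    sizes = [100, 2048, 3000]  # made-up article sizes in bytes
    print(sum(sizes), on_disc_size(sizes), wasted_bytes(sizes))
```

For the three sample files, 5148 bytes of HTML occupy 8192 bytes on disc. Bundling many small articles into one file removes most of that slack, which is the point of the frames and #anchors suggestion.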
It may be better to go with something similar to MySQL's fulltext search index: break the titles into words, and associate words with lists of pages that contain them rather than full titles with their page names. Instead of regexping a hundred thousand strings, you'd only need to break the query into words, fetch the lists of pages for _just those words_, and intersect or union the results as desired.
Hmm, this seems neat. Now how many different words are there in the average Wikipedia dump? :)) And, a frameset would be necessary as in the next option, barring some black magic communication between Javascript pages.
Also, you could break up the index into several smaller files so not all strings need to be loaded into memory. I don't recall JavaScript having an include() command, but in the worst case you could pull some kind of
<frameset> or <iframe> thing and bring up the necessary sub-scripts in another frame.
This is what I was thinking of doing as the next step. The include() can be hacked up, but memory would just add up to the original size. Some neat frameset should do the trick.
Next version online soon :-)
Ciao, Alfio
wikitech-l@lists.wikimedia.org