Is Wikipedia the largest "free content" website? i.e. website consisting primarily of free content.
The only competitors that I can think of are
1. Project Gutenberg, however they have a few free-gratis etexts sprinkled through their collection.
2. Million Books Project http://www.ulib.org/
-- John Vandenberg
John Vandenberg, 08/07/2011 08:26:
- Project Gutenberg, however they have a few free-gratis etexts sprinkled through their collection.
- Million Books Project http://www.ulib.org/
What about the Internet Archive? They certainly have much more free content than us, if you just count bytes.
Nemo
On Fri, Jul 8, 2011 at 5:00 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
What about the Internet Archive? They certainly have much more free content than us, if you just count bytes.
The Internet Archive has many subprojects which are non-free content (wayback machine) and dubious content (Open source books).
Their Live Music Archive and Moving image collection may be bigger in terms of bytes.
I'm less confident in the Moving image collection, as they don't explain why the items are PD. e.g.
http://en.wikipedia.org/wiki/Wikipedia:Media_copyright_questions/Archive/201...
John Vandenberg, 08/07/2011 09:31:
The Internet Archive has many subprojects which are non-free content (wayback machine) and dubious content (Open source books).
They have 1.5 million books (texts) in the 1800-1922 range alone. http://www.archive.org/search.php?query=mediatype%3A%28texts%29%20AND%20date%3A[1800-01-01%20TO%201922-12-31]
Nemo
From having looked through Internet Archive's live music collection, I doubt much, if any, of it would be considered free by our definition. They only seem to care that the artist was fine with being recorded at the show, and there's certainly no release to do anything you want with the recording, as there would have to be with a CC-BY-SA release.
The music there is free as in beer, but you couldn't, say, use it commercially or sell albums of it without falling afoul of copyright law.
On Fri, Jul 8, 2011 at 3:31 AM, John Vandenberg jayvdb@gmail.com wrote:
On Fri, Jul 8, 2011 at 5:00 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
What about the Internet Archive? They certainly have much more free content than us, if you just count bytes.
The Internet Archive has many subprojects which are non-free content (wayback machine) and dubious content (Open source books).
Their Live Music Archive and Moving image collection may be bigger in terms of bytes.
I'm less confident in the Moving image collection, as they don't explain why the items are PD. e.g.
http://en.wikipedia.org/wiki/Wikipedia:Media_copyright_questions/Archive/201...
-- John Vandenberg
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
On Fri, Jul 8, 2011 at 08:26, John Vandenberg jayvdb@gmail.com wrote:
- Million Books Project http://www.ulib.org/
LOTS of copyrighted and dubious content, by random checking.
g
Yes, and I'm sure Wikipedia also has lots of copyrighted and dubious content, as hard as we try...
2011/7/8 Peter Gervai grinapo@gmail.com
On Fri, Jul 8, 2011 at 08:26, John Vandenberg jayvdb@gmail.com wrote:
- Million Books Project http://www.ulib.org/
LOTS of copyrighted and dubious content, by random checking.
g
On 8 July 2011 09:20, M. Williamson node.ue@gmail.com wrote:
Yes, and I'm sure Wikipedia also has lots of copyrighted and dubious content, as hard as we try...
We're reaching the stage of arguing category membership. This suggests stepping back:
John, what do you anticipate as the useful purpose for the answer to the original question of "largest free content site"?
(There may be no non-fuzzy answer.)
- d.
Well, I just think any repository that lets some non-free works slip through the cracks by accident can't suddenly be disqualified unless we're ready to disqualify Wikipedia too. So what category do we fit into that they do not?
2011/7/8 David Gerard dgerard@gmail.com
On 8 July 2011 09:20, M. Williamson node.ue@gmail.com wrote:
Yes, and I'm sure Wikipedia also has lots of copyrighted and dubious content, as hard as we try...
We're reaching the stage of arguing category membership. This suggests stepping back:
John, what do you anticipate as the useful purpose for the answer to the original question of "largest free content site"?
(There may be no non-fuzzy answer.)
- d.
On Fri, Jul 8, 2011 at 8:16 PM, M. Williamson node.ue@gmail.com wrote:
Well, I just think any repository that lets some non-free works slip through the cracks by accident can't suddenly be disqualified unless we're ready to disqualify Wikipedia too. So what category do we fit into that they do not?
I wouldn't disqualify any project which is effectively employing best efforts to remove non-free content.
My limited browsing of the Open Source Books project indicates it is mostly junk and has a very high percentage of non-free content, and it doesn't appear that they are staying on top of it.
The Million Book Project is a bit better, but they often don't include sufficient metadata, and I've seen many works with a post-1950 year of publication for which "pre-1923" is nonetheless given as the public domain justification.
Here are two that I noted as copyrighted back in May. http://www.archive.org/details/DeskWorkEnglishGrammer http://www.archive.org/details/PearsCyclopaedia
Here is one of a series that I noted as copyrighted back in 2008. http://www.archive.org/details/americasmusic030111mbp
-- John Vandenberg
On Fri, Jul 8, 2011 at 13:04, John Vandenberg jayvdb@gmail.com wrote:
The Million Book Project is a bit better, but they often don't include sufficient metadata, and I've seen many works with a post-1950 year of publication for which "pre-1923" is nonetheless given as the public domain justification.
Or they actually list that it's copyrighted and show only 15% of the content.
g
On Fri, Jul 8, 2011 at 7:38 PM, David Gerard dgerard@gmail.com wrote:
On 8 July 2011 09:20, M. Williamson node.ue@gmail.com wrote:
Yes, and I'm sure Wikipedia also has lots of copyrighted and dubious content, as hard as we try...
We're reaching the stage of arguing category membership. This suggests stepping back:
John, what do you anticipate as the useful purpose for the answer to the original question of "largest free content site"?
(There may be no non-fuzzy answer.)
By "free content", I mean [[free content]], which can include fair-use/fair-dealing material, and possibly a small proportion of non-free content if it is well described.
Wikipedia is the most popular free content website.
I'm wondering if it is also safe to say that Wikipedia is the largest free content website.
-- John Vandenberg
On 8 July 2011 07:26, John Vandenberg jayvdb@gmail.com wrote:
Is Wikipedia the largest "free content" website? i.e. website consisting primarily of free content.
The only competitors that I can think of are
- Project Gutenberg, however they have a few free-gratis etexts sprinkled through their collection.
- Million Books Project http://www.ulib.org/
In terms of raw data the answer is no. Wikimedia Commons is larger.
Other than that you are probably mostly looking at US government stuff. The patent office, for example.
Right. NARA has 5 billion pages of PD content online, as I learned this morning. Is it 'a website'?
The Internet Archive and many others include roughly 2 million PD books, just under 1 billion pages of text.
Flickr Commons has many times the number of free content photos of WM Commons.
What sets WM projects apart is the amount of curation and collection management that has gone into them: the vast majority of the work is well and consistently categorized, revised and improved where possible to remove duplicates and mistakes, and sifted to surface material that is both useful for general education and can be cross-connected or linked with other such material [via both internal links and citations].
In that sense, Wikimedia is the largest online project I know of.
But if we wanted to make Wikisource a repository for all free-content-licensed material available anywhere online, it would become 100x to 1000x the size of all other projects combined.
SJ
On Fri, Jul 8, 2011 at 3:19 PM, geni geniice@gmail.com wrote:
On 8 July 2011 07:26, John Vandenberg jayvdb@gmail.com wrote:
Is Wikipedia the largest "free content" website? i.e. website consisting primarily of free content.
The only competitors that I can think of are
- Project Gutenberg, however they have a few free-gratis etexts sprinkled through their collection.
- Million Books Project http://www.ulib.org/
In terms of raw data the answer is no. Wikimedia Commons is larger.
Other than that you are probably mostly looking at US government stuff. The patent office, for example.
-- geni
On 8 July 2011 16:47, Samuel Klein meta.sj@gmail.com wrote:
Right. NARA has 5 billion pages of PD content online, as I learned this morning. Is it 'a website'?
Do you have a cite for that? Could probably be added to:
http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
Dominic - this is from the Archivist's speech today. Is there a handy cite?
S
On Fri, Jul 8, 2011 at 4:38 PM, geni geniice@gmail.com wrote:
On 8 July 2011 16:47, Samuel Klein meta.sj@gmail.com wrote:
Right. NARA has 5 billion pages of PD content online, as I learned this morning. Is it 'a website'?
Do you have a cite for that? Could probably be added to:
http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
-- geni
On 7/8/11 4:40 PM, Samuel Klein wrote:
Dominic - this is from the Archivist's speech today. Is there a handy cite?
S
On Fri, Jul 8, 2011 at 4:38 PM, geni geniice@gmail.com wrote:
On 8 July 2011 16:47, Samuel Klein meta.sj@gmail.com wrote:
Right. NARA has 5 billion pages of PD content online, as I learned this morning. Is it 'a website'?
Do you have a cite for that? Could probably be added to:
Actually, I think the point that David Ferriero was making, to give you a sense of the immensity of their digitization struggle, was that that is the size of their *holdings*. Their digital collections are not even in the millions yet; the current official number is 153,000 (documents, so the page count could still be much higher) digitized and described at the item level in the catalog, though there may be some thousands more in online exhibits that are not in the catalog. They do, of course, have an increasing number of born-digital documents as well. It's a huge undertaking. As I mentioned earlier today, only 68% of the holdings of the National Archives are even cataloged, and many of these are not even item-level descriptions, so they are not yet at the point where they know everything they have. Some statistics: http://www.archives.gov/research/arc/about-arc.html
Dominic
wikimedia-l@lists.wikimedia.org