Daniel Pink's WIRED article about Wikipedia, "the self-organizing, self-repairing, hyperaddictive library of the future," has hit the shelves.
Titled "The Book Stops Here", the six-page piece opens with a picture of Jimbo gazing levelly over a large stack of Britannica volumes and -- are those the 2001 Florida Statutes? It follows up with a set of beautiful sketches of six active Wikipedians (Angela, Bryan Derksen, Carptrash, Kingturtle, Ram-Man, and Raul654), whose stories are woven into the article.
Pink deals quite well with the nuances and motivations of the en: community, and the Wikipedia healing factor. However, he all but ignores other languages (the article's one real flaw), and makes no mention of Wikimedia, New York, or other gatherings. He also demonstrates a Pelligrinesque affection for the term "God-King" (the subject is a quote from the article).
More : http://blogs.law.harvard.edu/sj/2005/02/17#a797
Cheers, SJ
I think that's largely because English speakers as a group (not necessarily on an individual level) are generally unaccommodating of other languages (look at the complaints that were received after the wikipedia.org domain stopped redirecting to en:) and believe the world revolves around them and them alone. Thus "Wikipedia" to most English speakers means what I would call "en.wikipedia" or "The English Wikipedia".
On the other hand, I think most French speakers who are aware of the concept are referring to all versions when they say Wikipédia, at least much more often than are English speakers.
Thus, while it includes a few prolific editors of en:, it excludes entirely the big names on all other Wikipedias.
While the big players on the Icelandic Wikipedia would hardly go well into such an article, the big players on the German, French, Japanese, etc. Wikipedias would probably fit in very nicely.
I also notice there isn't much diversity: I would put all of those people together in a single group, as opposed to, for example, Ævar or those like him.
Mark
On Thu, 17 Feb 2005 16:07:56 -0500, Sj 2.718281828@gmail.com wrote:
Daniel Pink's WIRED article about Wikipedia, "the self-organizing, self-repairing, hyperaddictive library of the future," has hit the shelves.
Titled "The Book Stops Here", the six-page piece opens with a picture of Jimbo gazing levelly over a large stack of Britannica volumes and -- are those the 2001 Florida Statutes? It follows up with a set of beautiful sketches of six active Wikipedians (Angela, Bryan Derksen, Carptrash, Kingturtle, Ram-Man, and Raul654), whose stories are woven into the article.
Pink deals quite well with the nuances and motivations of the en: community, and the Wikipedia healing factor. However, he all but ignores other languages (the article's one real flaw), and makes no mention of Wikimedia, New York, or other gatherings. He also demonstrates a Pelligrinesque affection for the term "God-King" (the subject is a quote from the article).
More : http://blogs.law.harvard.edu/sj/2005/02/17#a797
Cheers, SJ
-- http://en.wikipedia.org/wiki/User:Sj _______________________________________________ Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
Sj wrote:
Whaddayamean, the God-King is fallible!?
As an interesting data point relating to quality vs EB, I've been going through 1911EB articles thought not to be in WP (about half just need a redir), and cross-checking with present-day EB at the same time - it's remarkable how some present-day EB articles are nearly word-for-word identical with their 1911 versions, except for being shortened by leaving out detail, references, and citations, tsk tsk.
My current favorite is Felix Faure, who in 1911EB "died of apoplexy", in present-day EB "died suddenly", while in WP we also find out just what he was doing at the moment of death, heh-heh, plus we have an article on the woman he was doing it with, she having a life story of her own.
Stan
Stan Shebs wrote:
As an interesting data point relating to quality vs EB, I've been going through 1911EB articles thought not to be in WP (about half just need a redir), and cross-checking with present-day EB at the same time - it's remarkable how some present-day EB articles are nearly word-for-word identical with their 1911 versions, except for being shortened by leaving out detail, references, and citations, tsk tsk.
I've long felt that the 1911EB should be carried in its entirety on Wikisource, as should the 1921 12th edition. (The 12th was just a repeat of the 11th edition with 3 supplementary volumes, which probably explains why it is so seldom mentioned.) That would be an enormous task. Aside from the first volume, which is in Project Gutenberg, the "Love to know" version that is now on line is bloody awful. The people who put that project together appear to have forgotten about proofreading. Their creative content to support any copyright claims lies primarily in their massive collection of OCR typos, and in their random selection of excluded pages.
It also needs to be remarked that the original 1911EB was full of illustrations which the present online version omits, and mathematical and chemical expressions that are totally garbled by treating them as ordinary text. The 1911EB also had many interesting maps, including many on fold-out pages which would be a particular challenge. This is in part because of the way that maps were drawn in 1911. Their way of showing mountainous territory combined with a desire to show as many small hamlets as possible often gives a cluttered appearance to these maps.
In brief, if we were ever to take this on seriously, we could have a much better product than what is already on line.
Ec
Ray Saintonge schrieb:
I've long felt that the 1911EB should be carried in its entirety on Wikisource, as should the 1921 12th edition. (The 12th was just a repeat of the 11th edition with 3 supplementary volumes, which probably explains why it is so seldom mentioned.) That would be an enormous task.
Maybe we can get a "real" copy of the 1921 edition to WikiMania, and have scanners/digital cameras manned with volunteers. Once we have everything as images, we can OCR/type/proofread online in a distributed fashion. We're kinda good at that :-)
Magnus
Alternatively, we could store it as images online. That would take a lot of bandwidth, but... it would preserve the original much much better.
Mark
Mark Williamson (node.ue@gmail.com) [050219 09:55]:
On Fri, 18 Feb 2005 23:49:19 +0100, Magnus Manske magnus.manske@web.de wrote:
Ray Saintonge schrieb:
I've long felt that the 1911EB should be carried in its entirety on Wikisource, as should the 1921 12th edition. (The 12th was just a repeat of the 11th edition with 3 supplementary volumes, which probably explains why it is so seldom mentioned.) That would be an enormous task.
Maybe we can get a "real" copy of the 1921 edition to WikiMania, and have scanners/digital cameras manned with volunteers. Once we have everything as images, we can OCR/type/proofread online in a distributed fashion. We're kinda good at that :-)
Alternatively, we could store it as images online. That would take a lot of bandwidth, but... it would preserve the original much much better.
An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many pages are there? Time to buy a 1000-stack of DVD-Rs ;-)
- d.
David Gerard wrote:
Mark Williamson (node.ue@gmail.com) [050219 09:55]:
On Fri, 18 Feb 2005 23:49:19 +0100, Magnus Manske magnus.manske@web.de wrote:
Ray Saintonge schrieb:
I've long felt that the 1911EB should be carried in its entirety on Wikisource, as should the 1921 12th edition. (The 12th was just a repeat of the 11th edition with 3 supplementary volumes, which probably explains why it is so seldom mentioned.) That would be an enormous task.
Maybe we can get a "real" copy of the 1921 edition to WikiMania, and have scanners/digital cameras manned with volunteers. Once we have everything as images, we can OCR/type/proofread online in a distributed fashion. We're kinda good at that :-)
Alternatively, we could store it as images online. That would take a lot of bandwidth, but... it would preserve the original much much better.
An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many pages are there? Time to buy a 1000-stack of DVD-Rs ;-)
The 1911EB has 29 volumes, of which the last is an index. Each volume has about 1,000 pages. Add three volumes for the supplements, and we have a mere 32,000 pages. David, was your estimate based on colour scanning? Wouldn't monochrome scanning take less space? There are very few colour pages.
Putting the images online is an interesting proposal. Not long ago on the list there was mention of a Swedish project that includes both a scanned and an OCR version of each page. A scanned version is helpful for maintaining the integrity of a text; a character-recognized version is better for applying search functions and annotations. The French Gallica collection in PDF can be a tremendous resource, but is difficult to use. There are some interesting points to be explored, such as how much the system can handle.
The idea of having a bunch of volunteers working away in public view at Wikimania to put something of the size of EB12 on line has great publicity appeal, especially if these volunteers are at it round the clock. Whether the EB should be the only work treated that way at Wikimania should remain an open question. Perhaps the scanned works should be in several languages. :-)
Ec
--- Ray Saintonge saintonge@telus.net wrote:
The idea of having a bunch of volunteers working away in public view at Wikimania to put something of the size of EB12 on line has great publicity appeal, especially if these volunteers are at it round the clock. Whether the EB should be the only work treated that way at Wikimania should remain an open question. Perhaps the scanned works should be in several languages. :-)
I think this is a wonderful idea. But to work we would need some software features that currently do not exist (scanned image side-by-side with wiki page; each in its own HTML frame).
-- mav
Daniel Mayer wrote:
I think this is a wonderful idea. But to work we would need some software features that currently do not exist (scanned image side-by-side with wiki page; each in its own HTML frame).
Doesn't the Gutenberg Distributed Proofreaders project already do this sort of thing?
http://www.pgdp.net/c/default.php
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Doesn't the Gutenberg Distributed Proofreaders project already do this sort of thing?
Yup, and we're actively working on EB11. At the moment, the following are posted (finished):
http://www.gutenberg.org/etext/200 -- Vol. 1 [*]
http://www.gutenberg.org/etext/13600 -- Vol. 2, Part 1, Slice 1
Various other parts of many volumes are in progress. If you'd like to help, visit http://www.pgdp.net/ and register a new account. You should probably start with some beginners projects to familiarise yourself with our Proofreading Guidelines, and then you're welcome to work on whatever you like, including the current EB volume being proofed: Vol 6 #3 (Chiton to Cincinnati). While you're there, you can join Team Wikipedia ;).
Mark Williamson wrote:
Alternatively, we could store it as images online. That would take a lot of bandwidth, but... it would preserve the original much much better.
When we're working on a particular section, the images are necessarily stored online. After a project is posted to Project Gutenberg, the scans are archived to a different server so that they can still be referred to for corrections. It's planned that One Day the archived images will be publicly accessible.
Michael
[*] Note: Volume 1 was posted as the Project Gutenberg Encyclopedia due to potential trademark issues with the use of the EB name that are now resolved.
Michael Ciesielski wrote:
Brion Vibber wrote:
Doesn't the Gutenberg Distributed Proofreaders project already do this sort of thing?
Yup, and we're actively working on EB11. At the moment, the following are posted (finished):
http://www.gutenberg.org/etext/200 -- Vol. 1 [*]
http://www.gutenberg.org/etext/13600 -- Vol. 2, Part 1, Slice 1
Various other parts of many volumes are in progress. If you'd like to help, visit http://www.pgdp.net/ and register a new account. You should probably start with some beginners projects to familiarise yourself with our Proofreading Guidelines, and then you're welcome to work on whatever you like, including the current EB volume being proofed: Vol 6 #3 (Chiton to Cincinnati). While you're there, you can join Team Wikipedia ;).
Another approach to proofreading is to have it done independently from the beginning by two different people and using a file compare function to compare the results. The benefit here is that it reduces the possibility that one person might be influenced by another's errors.
I looked at the volume 1 material, notably at "Algae", but the illustrations are not there. How does PG plan to deal with illustrations? How searchable is the PG version? Would I be able to easily find an article without downloading the whole volume? The Algae article includes a "q.v." to "Bryophyta". Does PG anticipate that I would be able to follow that link with a simple click of the mouse?
Mark Williamson wrote:
Alternatively, we could store it as images online. That would take a lot of bandwidth, but... it would preserve the original much much better.
When we're working on a particular section, the images are necessarily stored online. After a project is posted to Project Gutenberg, the scans are archived to a different server so that they can still be referred to for corrections. It's planned that One Day the archived images will be publicly accessible.
I would support something like that.
Ec
Ray Saintonge wrote:
Another approach to proofreading is to have it done independently from the beginning by two different people and using a file compare function to compare the results. The benefit here is that it reduces the possibility that one person might be influenced by another's errors.
Double-key entry works very well for typed material because the errors made by typists are random. The biggest problem with proofreading OCRed material isn't the errors introduced, but the errors already in the text that are missed ("he" instead of "be"; "arid" instead of "and"; etc.). Because of this, it makes the most sense to have the proofreading rounds build on the work of the previous round.
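To illustrate the difference, here is a minimal sketch in Python of the file-compare approach, using the standard library's difflib. The two sample transcriptions and the helper name `disagreements` are invented for this example and are not taken from DP's software. It flags words where two independent passes differ, but an OCR error that both proofreaders overlook ("arid" for "and") produces no difference and so goes undetected, which is the weakness described above.

```python
import difflib


def disagreements(first: str, second: str) -> list[tuple[str, str]]:
    """Return (first, second) word runs where two transcriptions differ."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample: both passes start from the same OCR output.
# Proofreader A corrects "he" to "be"; proofreader B misses it.
# Neither notices "arid", which should read "and".
pass_a = "It may be found in lakes arid rivers"
pass_b = "It may he found in lakes arid rivers"

# Only the "be"/"he" disagreement is reported; "arid" slips through.
print(disagreements(pass_a, pass_b))
```

The comparison works word by word, so it only surfaces places where the two people disagree; anything they agree on, right or wrong, is silently accepted.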
I looked at the volume 1 material, notably at "Algae", but the illustrations are not there. How does PG plan to deal with illustrations? How searchable is the PG version? Would I be able to easily find an article without downloading the whole volume? The Algae article includes a "q.v." to "Bryophyta". Does PG anticipate that I would be able to follow that link with a simple click of the mouse?
Volume 1 was not done by DP. The HTML version of Volume 2, http://snowy.arsc.alaska.edu/gutenberg/1/3/6/0/13600/13600-h/13600-h.htm is a better model for how the rest of the EB will be done. Illustrations are included, but linking q.v.'s won't be feasible until the entire encyclopedia is finished. Luckily, thanks to the wonders of modern technology, it's fairly easy to jump to the entry you want :)
Michael
Brion Vibber wrote:
Daniel Mayer wrote:
I think this is a wonderful idea. But to work we would need some software features that currently do not exist (scanned image side-by-side with wiki page; each in its own HTML frame).
Doesn't the Gutenberg Distributed Proofreaders project already do this sort of thing?
It does, but what does it do with the raw scanned version after the proofreading is done? For much of what we have now, what we and PG do is adequate, and it probably satisfies the needs of the average person who wants to read those texts. For literary works in Roman script and other highly textual material there should be little problem, but we would like the illustrations too in the scientific works.
The availability of the original scans would be our guarantee of the authenticity of our content. That's a significant factor in times when credibility is a big complaint about anything that appears on the net. When the proofreading for a book has been completed a text file should remain the primary avenue to a work, but we would give a reader the opportunity to see what the work really looked like when it was printed. The graphic version could probably be on a low traffic server.
Ec
Ray Saintonge (saintonge@telus.net) [050219 11:22]:
David Gerard wrote:
An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many pages are there? Time to buy a 1000-stack of DVD-Rs ;-)
The 1911EB has 29 volumes, of which the last is an index. Each volume has about 1,000 pages. Add three volumes for the supplements, and we have a mere 32,000 pages. David, was your estimate based on colour scanning? Wouldn't monochrome scanning take less space? There are very few colour pages.
Greyscale (one byte per pixel, 210 x 297 / 25.4 / 25.4 * 300 * 300 pixels, with the A4 dimensions in millimetres). You could probably reduce it to four bits per pixel. I wouldn't suggest going to three. I did this last year scanning in a pile of stuff.
Putting the images online is an interesting proposal. Not long ago on the list there was mention of a Swedish project that includes both a scanned and an OCR version of each page. A scanned version is helpful for maintaining the integrity of a text; a character-recognized version is better for applying search functions and annotations. The French Gallica collection in PDF can be a tremendous resource, but is difficult to use. There are some interesting points to be explored, such as how much the system can handle.
Project Gutenberg and Distributed Proofreaders have web-based software that apparently does the job well. Scan on one side, OCR on the other, correct. Do one page at a time, highly parallelisable.
The idea of having a bunch of volunteers working away in public view at Wikimania to put something of the size of EB12 on line has great publicity appeal, especially if these volunteers are at it round the clock. Whether the EB should be the only work treated that way at Wikimania should remain an open question. Perhaps the scanned works should be in several languages. :-)
Could be very nice. And very geeky. And get EB to say nasty things about us again ;-)
- d.
On Sat, Feb 19, 2005 at 12:23:08PM +1100, David Gerard wrote:
Ray Saintonge (saintonge@telus.net) [050219 11:22]:
David Gerard wrote:
An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many pages are there? Time to buy a 1000-stack of DVD-Rs ;-)
The 1911EB has 29 volumes, of which the last is an index. Each volume has about 1,000 pages. Add three volumes for the supplements, and we have a mere 32,000 pages. David, was your estimate based on colour scanning? Wouldn't monochrome scanning take less space? There are very few colour pages.
Greyscale (one byte per pixel, 210 x 297 / 25.4 / 25.4 * 300 * 300 pixels, with the A4 dimensions in millimetres). You could probably reduce it to four bits per pixel. I wouldn't suggest going to three. I did this last year scanning in a pile of stuff.
Why should it be kept uncompressed, or compressed by throwing away the lowest bits? There are pretty good formats for compressing scanned text, like DjVu.
Anyway, the 32000 pages in 8-bpp 600 dpi take just about 1 terabyte. That's not much.
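For anyone who wants to check the figures in this thread, here is a minimal Python sketch of the arithmetic. It assumes an A4 page (210 mm x 297 mm, converted at 25.4 mm per inch), uncompressed storage, and decimal units (1 MB = 10**6 bytes); the function name `page_bytes` and the constants are made up for illustration and do not come from any scanning tool.

```python
# Rough storage estimate for uncompressed page scans.
# Assumptions: A4 pages, decimal units, and the 32,000-page count
# quoted in the thread.

MM_PER_INCH = 25.4
A4_WIDTH_MM = 210
A4_HEIGHT_MM = 297


def page_bytes(dpi: int, bits_per_pixel: int = 8) -> float:
    """Uncompressed size in bytes of one A4 page scanned at `dpi`."""
    width_px = A4_WIDTH_MM / MM_PER_INCH * dpi
    height_px = A4_HEIGHT_MM / MM_PER_INCH * dpi
    return width_px * height_px * bits_per_pixel / 8


if __name__ == "__main__":
    pages = 32_000
    for dpi in (300, 600):
        per_page_mb = page_bytes(dpi) / 1e6
        total_tb = page_bytes(dpi) * pages / 1e12
        print(f"{dpi} dpi, 8 bpp: {per_page_mb:.1f} MB/page, {total_tb:.2f} TB total")
    # Four bits per pixel halves the 8 bpp figure.
    print(f"300 dpi, 4 bpp: {page_bytes(300, 4) / 1e6:.2f} MB/page")
```

Under these assumptions the sketch gives roughly 8.7 MB per page at 300 dpi and 34.8 MB at 600 dpi, and a little over 1.1 TB for 32,000 pages at 600 dpi, in line with the numbers quoted above. Any real compression would shrink these considerably.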
Ray Saintonge wrote:
An A4 page at 300dpi is 8.7 MB; at 600dpi it's 34.8 MB. How many pages are there? Time to buy a 1000-stack of DVD-Rs ;-)
The 1911EB has 29 volumes, of which the last is an index. Each volume has about 1,000 pages. Add three volumes for the supplements, and we have a mere 32,000 pages. David, was your estimate based on colour scanning? Wouldn't monochrome scanning take less space? There are very few colour pages.
Sorry, this discussion is painfully clueless. If you really want to, you can learn the basics of document imaging on your own in a week by using Google. Get a scanner, an OCR program, download some free software and start to play around.
Digitizing encyclopedias can be done. No magic. A typical volume of 800 pages might take 160 megabytes in images and 5 megabytes in plain text. You will want to cut the spine off the books and use a sheet-feeding scanner. No sweat. Old encyclopedias are cheap on eBay or in your local second-hand shop, since no sane person would buy one and the insane usually have less money.
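As a back-of-the-envelope check on those figures, here is a small Python sketch. It assumes the 160 MB and 5 MB per 800-page volume estimates hold across all 32,000 pages mentioned earlier in the thread, which is only a guess, and it uses decimal units; the variable names are invented for the example.

```python
# Assumed inputs: the per-volume estimates above and the 32,000-page
# count mentioned earlier in the thread (decimal megabytes).
IMAGE_MB_PER_VOLUME = 160
TEXT_MB_PER_VOLUME = 5
PAGES_PER_VOLUME = 800
TOTAL_PAGES = 32_000

# Number of equivalent 800-page volumes.
volumes = TOTAL_PAGES / PAGES_PER_VOLUME
image_kb_per_page = IMAGE_MB_PER_VOLUME * 1000 / PAGES_PER_VOLUME
total_image_gb = volumes * IMAGE_MB_PER_VOLUME / 1000
total_text_mb = volumes * TEXT_MB_PER_VOLUME

print(f"{image_kb_per_page:.0f} KB per page image")
print(f"{total_image_gb:.1f} GB of images, {total_text_mb:.0f} MB of text")
```

That works out to about 200 KB per page image and some 6.4 GB of images for the whole set, plus about 200 MB of plain text.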
However, I doubt that this should be a part of Wikipedia / Wikimedia. Distributed Proofreaders and the Internet Archive are already doing important parts of what you ask for. I think Wikipedia should consume the results, not produce them.
For example, while digitizing yet another year run (1893) of a Norwegian engineering journal the other day, I found a nice illustration http://runeberg.org/tekuke/1893/0161.html that I cut out and uploaded to the Wikimedia Commons, and used in http://en.wikipedia.org/wiki/Architect
Instead of inventing the wheel all over again, perhaps you should get involved in Distributed Proofreaders and help improve their system.
Putting the images online is an interesting proposal. Not long ago on the list there was mention of a Swedish project that includes both a scanned and an OCR version of each page.
Hi there!
Haven't read the article yet, but I thought "benevolent dictator" was the term of choice. Not that I think Jimbo isn't God-like.
Google:
site:mail.wikipedia.org benevolent dictator - 16 hits
site:mail.wikipedia.org god-king - 1 hit