On Fri, Jul 25, 2008 at 2:16 AM, Tim Starling tstarling@wikimedia.org wrote:
John Vandenberg wrote:
Hi,
I've been told that a large percentage of the EB1911 sits within the history of English Wikipedia, and during a recent discussion about EB1911 here, a few checks indicate that this is possibly true, and that the EB1911 text imported into Wikipedia is from a decent transcription. In the following very long discussion, there are two tables consisting of five Wikipedia articles starting with "A" and "B", a link to the Wikipedia revision consisting of the EB1911 text, a link to the copy now on Wikisource, and a link to the pagescan (set up by Tim Starling):
http://en.wikipedia.org/wiki/Wikipedia_talk:Plagiarism
I am interested in piecing together the history of the EB1911 import, because if this was as extensive as some claim, hidden in Wikipedia is possibly the best and most complete available transcription of EB1911, and I would like to work out a good algorithm to pull it out and put it on Wikisource, which has slowly been building an online copy that is true to the original. Or maybe we can find whoever imported it, and re-use the import files.
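As a rough starting point for such an algorithm, here is a minimal Python sketch. It assumes the page history has already been fetched into a list of dicts with the keys 'revid', 'comment' and 'text', oldest first; that layout, the SUMMARY_HINT pattern and the function names are hypothetical, not anything that exists on the wikis. If a trusted transcription is available it picks the revision closest to it; otherwise it falls back to the first edit summary that mentions 1911 or Britannica.

```python
import re
from difflib import SequenceMatcher

# Assumption: import edits mention the source in their edit summary.
SUMMARY_HINT = re.compile(r"1911|britannica", re.IGNORECASE)


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring whitespace differences."""
    a = " ".join(a.split())
    b = " ".join(b.split())
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_import_revision(revisions, reference_text=None):
    """Pick the revision most likely to hold the imported EB1911 text.

    revisions: list of dicts with 'revid', 'comment', 'text', oldest first
               (a hypothetical layout, not a real API response).
    reference_text: optional known-good transcription to compare against.
    """
    if not revisions:
        return None
    if reference_text is not None:
        # Closest match to the reference wins; ties go to the oldest revision.
        return max(revisions, key=lambda r: similarity(r["text"], reference_text))
    for rev in revisions:
        if SUMMARY_HINT.search(rev.get("comment", "")):
            return rev
    return None
```

Comparing every revision of a long article this way is slow, so in practice one would probably narrow the candidates by edit summary or date first and only then compare texts.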
This will benefit Wikipedia, as it will allow readers and editors to determine which parts of those Wikipedia articles have not been altered since 1911, which will act as a caution flag for readers and a to-do item for editors. There is a WikiProject to go back and verify all of the articles imported from EB1911; this task can be better distributed if the reader can see the original text without a degree in wiki-archeology.
http://en.wikipedia.org/wiki/WP:EB1911
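To illustrate how unaltered passages might be flagged once the original text is at hand, here is a small Python sketch. It is deliberately naive: the sentence splitter breaks on ".", "!" or "?" followed by whitespace, so abbreviations will confuse it, and a sentence counts as unaltered only if it matches exactly after lower-casing and whitespace collapsing. The function names are made up for this example.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalise(sentence):
    """Lower-case and collapse whitespace so trivial edits do not hide a match."""
    return " ".join(sentence.lower().split())


def unaltered_sentences(original_text, current_text):
    """Return the sentences of current_text that still appear in original_text."""
    original = {normalise(s) for s in split_sentences(original_text)}
    return [s for s in split_sentences(current_text) if normalise(s) in original]
```

The returned sentences are the ones that would deserve a caution flag; anything fuzzier (reordered clauses, modernised spelling) would need a proper diff rather than exact matching.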
The relevant Wikisource pages people may want to look at are:
http://en.wikisource.org/wiki/EB1911
and the "project page" for that effort is at
http://en.wikisource.org/wiki/WS:EB1911
and the complete set of scans in TIFF and PNG; I recommend installing the TIFF plugin, as those images are a joy to view and the plugin has a nice zoom interface.
The scan was never meant to just sit on my user page; Wikisource community members were meant to copy it to some relevant location and make links to it. They apparently didn't figure this out. I wrote the scanset extension for the benefit of all of Wikisource, not just for my user space.
You can find the details of the origin of the scan in my original mailing list post about the subject. The scan was made and distributed by a person who, for religious reasons, wanted to see this material disseminated as widely as possible. The scan was distributed as a CD set at low cost, and on the CD set, it was stated that there were no restrictions.
The contents of the CD were put up on a website, with the website's name discreetly overlaid in a corner of each TIFF image. A Wikipedian downloaded them and sent them to me. I made a script to blank out the website name and convert the images to PNG. The result is the version that we currently host.
There are now many incoming URL links to those pagescan sets. If we move the index pages, and keep the redirect, will those links continue to function?
It would be lovely if we could integrate the EB1911 scanset with the Wikisource side-by-side proofreading interface, the "Proofread Page" extension, e.g.
http://en.wikisource.org/wiki/Index:H.R._Rep._No._94-1476
http://en.wikisource.org/wiki/Index:Wind_in_the_Willows_%281913%29.djvu
http://en.wikisource.org/wiki/Index:A_Treatise_on_Electricity_and_Magnetism_...
-- John
wikisource-l@lists.wikimedia.org