Re: [Wikisource-l] [WikiEN-l] EB1911 in Wikipedia - Wikisource-l

24 Jul 2008

On Fri, Jul 25, 2008 at 2:16 AM, Tim Starling <tstarling(a)wikimedia.org> wrote:
...
  John Vandenberg wrote:
  Hi,

 I've been told that a large percentage of the EB1911 sits within the
 history of English Wikipedia, and during a recent discussion about
 EB1911 here, a few checks indicated that this is possibly true, and that
 the EB1911 text imported into Wikipedia came from a decent
 transcription.  In the following very long discussion, there are two
 tables of five Wikipedia articles starting with "A" and "B", each
 with a link to the Wikipedia revision containing the EB1911 text, a
 link to the copy now on Wikisource, and a link to the pagescan (set up
 by Tim Starling):

 http://en.wikipedia.org/wiki/Wikipedia_talk:Plagiarism

 I am interested in piecing together the history of the EB1911 import,
 because if this was as extensive as some claim, hidden in Wikipedia is
 possibly the best and most complete available transcription of EB1911,
 and I would like to work out a good algorithm to pull it out and put
 it on Wikisource, which has slowly been building an online copy that
 is true to the original.  Or maybe we can find whoever imported it,
 and re-use the import files.

 This will benefit Wikipedia, as it will allow readers and editors to
 determine which parts of those Wikipedia articles have not been altered
 since 1911, which will act as a caution flag for readers and a to-do
 item for editors.  There is a WikiProject to go back and verify all of
 the articles imported from EB1911; this task can be better distributed
 if the reader can see the original text without a degree in
 wiki-archeology.

 http://en.wikipedia.org/wiki/WP:EB1911

 The relevant Wikisource pages people may want to look at are:

 http://en.wikisource.org/wiki/EB1911

   and the "project page" for that effort is at

 http://en.wikisource.org/wiki/WS:EB1911

   and the complete set of scans, in TIFF and PNG, is here; I recommend
 installing the TIFF plugin, as those images are a joy to view and the
 plugin has a nice zoom interface.

 http://en.wikisource.org/wiki/User:Tim_Starling

 The scan was never meant to just sit on my user page. Wikisource community
 members were meant to copy it to some relevant location and make links to
 it, but they apparently didn't figure this out. I wrote the scanset extension
 for the benefit of all of Wikisource, not just for my user space.

 You can find the details of the origin of the scan in my original mailing
 list post about the subject. The scan was made and distributed by a person
 who, for religious reasons, wanted to see this material disseminated as
 widely as possible. The scan was distributed as a CD set at low cost, and
 on the CD set, it was stated that there were no restrictions.

 The contents of the CD were put up on a website, with the website's name
 discreetly overlaid in a corner of the TIFF image. A Wikipedian downloaded
 them and sent them to me. I made a script to blank out the website name
 and convert the images to PNG. The result is the version that we currently
 host.

There are now many incoming URL links to those pagescan sets.  If we
move the index pages and keep the redirects, will those links continue
to function?

It would be lovely if we could integrate the EB1911 scanset with the
Wikisource side-by-side proofreading interface, the "Proofread Page"
extension, for example:

http://en.wikisource.org/wiki/Index:H.R._Rep._No._94-1476
http://en.wikisource.org/wiki/Index:Wind_in_the_Willows_%281913%29.djvu
http://en.wikisource.org/wiki/Index:A_Treatise_on_Electricity_and_Magnetism…

--
John