Re: [Wikisource-l] [WikiEN-l] EB1911 in Wikipedia - Wikisource-l

25 Jul 2008


On Fri, Jul 25, 2008 at 2:16 AM, Tim Starling <tstarling@wikimedia.org> wrote:
John Vandenberg wrote:
Hi,
I've been told that a large percentage of the EB1911 sits within the
history of English Wikipedia, and during a recent discussion about
EB1911 here, a few checks indicated that this is possibly true, and that
the EB1911 text imported into Wikipedia is from a decent
transcription.  In the following very long discussion, there are two
tables consisting of five Wikipedia articles starting with "A" and
"B", a link to the Wikipedia revision consisting of the EB1911 text, a
link to the copy now on Wikisource, and a link to the pagescan (set up
by Tim Starling):
http://en.wikipedia.org/wiki/Wikipedia_talk:Plagiarism
I am interested in piecing together the history of the EB1911 import,
because if this was as extensive as some claim, hidden in Wikipedia is
possibly the best and most complete available transcription of EB1911,
and I would like to work out a good algorithm to pull it out and put
it on Wikisource, which has slowly been building an online copy that
is true to the original.  Or maybe we can find whoever imported it,
and re-use the import files.
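
As a starting point for such an algorithm, here is a minimal sketch,
not a tested tool.  It assumes the revision texts of one article have
already been fetched somehow, and that a trusted reference text for the
same entry is available (e.g. a proofread Wikisource page).  The
function name and the (revision id, text) input shape are made up for
illustration.  It simply picks the revision whose words are closest to
the reference:

```python
from difflib import SequenceMatcher


def best_matching_revision(revisions, reference):
    """Return (revision_id, ratio) for the revision closest to the reference.

    revisions: iterable of (revision_id, text) pairs (hypothetical input shape).
    reference: known EB1911 text for the same entry.
    """
    ref_words = reference.split()
    best_id, best_ratio = None, 0.0
    for rev_id, text in revisions:
        # Compare word sequences; autojunk=False keeps common words in play.
        matcher = SequenceMatcher(None, ref_words, text.split(), autojunk=False)
        ratio = matcher.ratio()
        if ratio > best_ratio:
            best_id, best_ratio = rev_id, ratio
    return best_id, best_ratio
```

A revision with a high ratio is only a candidate for the original
import; it would still need checking against the pagescan.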
This will benefit Wikipedia, as it will allow readers and editors to
determine what parts of those Wikipedia articles have not been altered
since 1911, which will act as a caution flag for readers, and a todo
item for editors.  There is a WikiProject to go back and verify all of
the articles imported from EB1911; this task can be better distributed
if the reader can see the original text without a degree
in wiki-archeology.
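
To illustrate the "not altered since 1911" check, here is a rough
sketch under the assumption that both the original EB1911 text and the
current article text are available as plain strings.  The function
name and the min_words threshold are my own invention, not part of any
existing tool.  It lists the runs of words that survive unchanged:

```python
from difflib import SequenceMatcher


def unchanged_passages(original, current, min_words=5):
    """Return word runs from `original` that appear unchanged in `current`.

    Runs shorter than min_words (an arbitrary threshold) are ignored, so
    that stray matches on short common phrases are not reported.
    """
    a, b = original.split(), current.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]
```

The comparison is exact and case-sensitive, so wiki markup and
reformatting would have to be stripped first for this to be useful on
real articles.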
http://en.wikipedia.org/wiki/WP:EB1911
The relevant Wikisource pages people may want to look at are:
http://en.wikisource.org/wiki/EB1911
and the "project page" for that effort is at
http://en.wikisource.org/wiki/WS:EB1911
and the complete set of scans, in TIFF and PNG, is at the link below.
I recommend installing the TIFF plugin, as those images are a joy to
view and the plugin has a nice zoom interface.
http://en.wikisource.org/wiki/User:Tim_Starling
The scan was never meant to just sit on my user page; Wikisource community
members were meant to copy it to some relevant location and make links to
it. They apparently didn't figure this out. I wrote the scanset extension
for the benefit of all of Wikisource, not just for my user space.
You can find the details of the origin of the scan in my original mailing
list post about the subject. The scan was made and distributed by a person
who, for religious reasons, wanted to see this material disseminated as
widely as possible. The scan was distributed as a CD set at low cost, and
on the CD set, it was stated that there were no restrictions.
The contents of the CD were put up on a website, with the website's name
discreetly overlaid in a corner of the TIFF image. A Wikipedian downloaded
them and sent them to me. I made a script to blank out the website name
and convert the images to PNG. The result is the version that we currently
host.
There are now many incoming URL links to those pagescan sets.  If we
move the index pages, and keep the redirect, will those links continue
to function?
It would be lovely if we could integrate the EB1911 scanset with the
Wikisource side-by-side proofreading interface, extension "Proofread
Page".  e.g.
http://en.wikisource.org/wiki/Index:H.R._Rep._No._94-1476
http://en.wikisource.org/wiki/Index:Wind_in_the_Willows_%281913%29.djvu
http://en.wikisource.org/wiki/Index:A_Treatise_on_Electricity_and_Magnetism_...
--
John