[Wikisource-l] On linking Wikisource with page images

Ray Saintonge saintonge at telus.net
Tue Jan 22 21:05:51 UTC 2008


Birgitte SB wrote:
> I think you are trying to
> do far too much with the same piece of text. 
> Perfectly readable/editable wikimarkup and exactly
> matching OCR text are not possible with the same text.
>   I suggest you find a way to hack having the text
> existing twice in the proofreading page.  Something
> like below:
>
> <!-- Here is text with OCR breaks and hyphens which
> matches the printed page-->
>
> Here is the wikimarkup text that is transcluded to the
> WS page
>
> Of course this means both sets of text need to be
> proofread, but I think a script should be able to
> highlight all the differences between them making it
> simple to proofread one from the other.  If you really
> want to have only one version of the text, it will
> have to have the exactness of OCR sacrificed. People
> will always go through the markup "fixing" the
> hyphens.
The whole proposal seems to come into the realm of biting off more than 
we can chew.  I can give ThomasV's approach to having all material 
backed up by page scans full marks for what it sets out to do, but that 
doesn't change the fact that some editors still find it more 
convenient to sub-optimally upload entire books from Project Gutenberg 
with little more additional effort than breaking off chapters into 
separate pages and adding headers.  Unless we can get real people to do 
tedious but relatively non-technical tasks such as proofreading, how can 
we ever convince them to remain consistent with technical tasks whose 
benefits are far from obvious?

Eighteenth-century scientific texts may have done well with only a 
single printing, but more popular works that had multiple editions 
present a challenge unless we can declare a particular printing to be 
canonical.  The best printing for this may not be easily or cheaply 
available.  As an example, I have an almost complete set of the Ticknor 
and Fields version of the works of Thomas De Quincey.  In the course of 
putting this together I ended up with apparently duplicate volumes.  In 
the case of the second volume of the "Theological Essays" I have both an 
1854 and an 1864 printing.  The 1854 edition goes to page 276 and the 
1864 edition to page 315.  The later edition adds an essay missing from 
the earlier. 

The first three lines of page 71 of the 1864 printing, which come from 
"Toilette of the Hebrew Lady" and end a paragraph, read:

    "the precious stones; and at other times, the pearls
    were strung two and two, and their beautiful white-
    ness relieved by the interposition of red coral."

In the 1854 printing the same text appears as lines 27-9 of page 69, 
except that "whiteness" now appears fully on the middle line without 
hyphenation.  Footnotes that were at the end of an essay in 1854 are 
moved to the proper page in 1864.
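
To make the point concrete, here is a rough sketch of the kind of 
comparison script Birgitte mentions, applied to the passage above.  It 
is only an illustration under stated assumptions: the function names 
are my own, the line breaks shown for the 1854 printing are hypothetical 
(I have only said that "whiteness" falls whole on the middle line), and 
treating every hyphen at the end of a line as a typesetting artifact 
will wrongly join genuine hyphenated compounds that happen to break 
there.

```python
import difflib
import re


def normalize(page_text: str) -> str:
    """Join words hyphenated across line breaks and collapse whitespace."""
    # Assumption: a hyphen immediately before a line break is a typesetting
    # artifact, not part of a genuine compound word.
    joined = re.sub(r"-\s*\n\s*", "", page_text)
    return " ".join(joined.split())


def differences(text_a: str, text_b: str) -> list[str]:
    """Return word-level differences between two texts after normalizing."""
    words_a = normalize(text_a).split()
    words_b = normalize(text_b).split()
    # ndiff marks removed words with "- " and added words with "+ ".
    return [d for d in difflib.ndiff(words_a, words_b) if d[0] in "+-"]


# Lines as quoted from page 71 of the 1864 printing.
printing_1864 = """the precious stones; and at other times, the pearls
were strung two and two, and their beautiful white-
ness relieved by the interposition of red coral."""

# Hypothetical line layout for the 1854 printing; only the unbroken
# "whiteness" on the middle line is taken from the description above.
printing_1854 = """the precious stones; and at other times, the pearls were
strung two and two, and their beautiful whiteness
relieved by the interposition of red coral."""

print(differences(printing_1864, printing_1854))
```

Once line-end hyphens and line breaks are normalized away, the two 
printings of this passage compare as the same words, which is exactly 
the information that a text matching one printed page line for line 
would have to preserve and a readable wiki text would have to discard.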

At one time, if a second printing was needed, it was easier and cheaper 
to reset the type, with all the attendant errors that one might imagine.  
Labour was cheap, and manufactured type very expensive.

Ec


