[Wikisource-l] Scanned texts

John Vandenberg jayvdb at gmail.com
Fri Jan 4 12:34:57 UTC 2008


On Jan 4, 2008 9:28 AM, Yann Forget <yann at forget-me.net> wrote:
> Hello,
>
> This is now off topic for foundation-l.
> Better to continue this on wikisource-l.
>
> Klaus Graf wrote:
> >> Hello,
> >>
> >> I agree with Ray here, and I think that Klaus' mail does not accurately
> >> reflect reality. The French Wikisource has the greatest number of
> >> scanned texts so far,
> >
> > Is there any proof of this claim?
>
> http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics
> lists 40,043 pages for fr.ws and 16,939 for de.ws.

I am surprised at how few page scans there are in the Page: namespace on en.ws:

4,883, of which 2,792 are the NSRW. Of the remaining 2,091:
* 620 : Maxwell's Treatise on Electricity and Magnetism
* 368 : H.R. Rep. No. 94-1476
* 289 : The How and Why Library
* 200 : History of Iowa/Volume 4 (Google Books; no images)
* 66 : Spirella Manual (1913)
* 59 : Faraday's Experimental Researches in Electricity
* 45 : Secret History of the French Court under Richelieu and Mazarin
  (Google Books; images hosted on Google)
* 444 : miscellaneous others

Even those numbers are inflated: I know that many of the pages in
Maxwell's Treatise are empty shells, created before the days of Index:
pages, that contain only a header.
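As a quick consistency check on the tally above (not part of the original count), the figures can simply be re-added. This minimal Python sketch uses only the numbers quoted in this message; the dictionary keys are labels for readability, not page titles fetched from the wiki.

```python
# Page counts quoted in this message for the en.ws Page: namespace (January 2008).
TOTAL_PAGES = 4883
NSRW_PAGES = 2792

# Per-work breakdown of the pages that are not part of the NSRW.
breakdown = {
    "Maxwell's Treatise on Electricity and Magnetism": 620,
    "H.R. Rep. No. 94-1476": 368,
    "The How and Why Library": 289,
    "History of Iowa/Volume 4": 200,
    "Spirella Manual (1913)": 66,
    "Faraday's Experimental Researches in Electricity": 59,
    "Secret History of the French Court under Richelieu and Mazarin": 45,
    "miscellaneous others": 444,
}

remaining = TOTAL_PAGES - NSRW_PAGES   # pages outside the NSRW
listed = sum(breakdown.values())       # sum of the itemised counts

print(remaining, listed)
```

Both values come out to 2,091, so the itemised list accounts for every non-NSRW page in the total.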

> http://fr.wikisource.org/wiki/Wikisource:Livres_disponibles_en_mode_page
> lists 62,326 scanned pages (not all of them OCRed and proofread yet, and I am
> not sure that this page is up to date).

That number must include images that do not yet have Page: entries, as
there are 40,046 pages in namespace 104 (Page:) on fr.wikisource.

http://tools.wikimedia.de/~st47/cgi-bin/pagesinns.pl?server=3&db=frwikisource&ns=104

If en.ws were to count those, all of the EB1911 images could be included :-)

> >> but does not make them mandatory for publishing a text there. It is
> >> only a suggestion, which many contributors follow.

40 thousand pages is a testament to the success of the fr.ws approach.

> >> I think that the important point is not scanned texts, but notation on
> >> whether and how the texts are proofread by editors, whatever means the
> >> editors use to proofread the texts.
> >
> > As an archival professional, I have been monitoring discussions on
> > digitization projects for years. It is standard practice to provide
> > not only e-texts but also scans. Wikisource demands no scans when a
> > permanent web address for the scans outside Commons (e.g. a library
> > project) is given.
> >
> > I think the average quality of other Wikisource branches is very poor.
> > In most cases no source is given: one cannot know which source was
> > used, and for scholarly purposes the e-text is worthless.

While indicating a source is critical for scholarly purposes where
many editions exist, it is often completely irrelevant when only one
edition is likely to exist. There are many newspaper and journal
articles that are _only_ available online by way of Wikisource, and
even when not proofread these are invaluable for scholarly purposes.

For examples of high-quality transcriptions that are proceeding without
page scans, see the Journal of Discourses and EB1911:

http://en.wikisource.org/wiki/Journal_of_Discourses
http://en.wikisource.org/wiki/EB1911

Even when many editions exist, a copy on Wikisource of poor provenance
is _not_ worthless; it is merely a work in progress. Anyone can improve
it by pinning it down to one edition, and because Wikisource is a WMF
project, any scholar knows without asking that the Wikisource community
will welcome their involvement in our project. As a result, poor copies
that put Wikisource into Google search results do bring in new and
valuable contributors.

Stipulating that imperfection is intolerable is inhumane.
Eradicating the imperfection is the path to perfection.

> I think we already have had this discussion earlier, but the
> misunderstanding continues.
> There are two different issues here. It is important not to mix them:
> 1. Scans provided alongside texts.
> 2. Notation of quality.
>
> Quality is not an absolute value. It is relative to the sources
> available for a given text. Quality does not have the same meaning for a
> text from 1920 and a text from the 15th century. So one should not talk
> about quality, but about notation of quality.
>
> I agree that giving the source is important and should be part of a
> quality notation. The most important is to have a clear notation so that
> readers know how and by whom the texts have been proofread. Scans alone
> are not proof of quality, but they help achieve better quality. They
> are not the only way to get good quality texts. Some texts may be
> proofread by several contributors, and so be of very good quality, but
> Wikisource might not be able to have scanned images if a public domain
> edition is not easily available.

Page scans are, for the most part, orthogonal to quality. Meticulous
transcriptions of page scans of a poorly collated collection, or of a
rushed translation, are nothing short of wasted effort.

On en.ws, we do need to increase the percentage of texts with page
scans.  Any suggestions on how to achieve this?

--
John


