Hello,
This is now off topic for foundation-l. Better to continue this on wikisource-l.
Klaus Graf wrote:
Hello,
I agree with Ray here, and I think that Klaus' mail does not accurately reflect reality. The French Wikisource has the greatest number of scanned texts so far,
Is there a proof for this claim?
http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics lists 40,043 pages for fr.ws and 16,939 for de.ws.
http://fr.wikisource.org/wiki/Wikisource:Livres_disponibles_en_mode_page lists 62,326 scanned pages (not yet all OCRed and proofread, and I am not sure that this page is up to date).
but does not make them mandatory for publishing a text there. It is only a suggestion, which many contributors follow.
I think that the important point is not scanned texts, but a notation of whether and how the texts have been proofread by editors, whatever means the editors use to proofread them.
I have been monitoring discussions on digitization projects as an archival professional for years. It is standard practice to provide not only e-texts but also scans. Wikisource does not demand scans when a permanent web address (e.g. a library project) is given for scans hosted outside Commons.
I think the average quality of other Wikisource branches is very poor. In most cases there is no source given: one cannot know which source is used, and for scholarly purposes the e-text is worthless.
I think we have already had this discussion, but the misunderstanding continues. There are two different issues here, and it is important not to mix them:
1. Scans provided alongside texts.
2. Notation of quality.
Quality is not an absolute value. It is relative to the sources available for a given text. Quality does not have the same meaning for a text from 1920 and a text from the 15th century. So one should not talk about quality, but about notation of quality.
I agree that giving the source is important and should be part of a quality notation. The most important thing is to have a clear notation so that readers know how and by whom the texts have been proofread. Scans alone are not proof of quality, but they help to achieve better quality. They are not the only way to get good-quality texts. Some texts may be proofread by several contributors, and so be of very good quality, while Wikisource may be unable to host scanned images if a public domain edition is not easily available.
Klaus Graf
Regards,
Yann
On Jan 4, 2008 9:28 AM, Yann Forget yann@forget-me.net wrote:
Hello,
This is now off topic for foundation-l. Better to continue this on wikisource-l.
Klaus Graf wrote:
Hello,
I agree with Ray here, and I think that Klaus' mail does not accurately reflect reality. The French Wikisource has the greatest number of scanned texts so far,
Is there a proof for this claim?
http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics lists 40,043 pages for fr.ws and 16,939 for de.ws.
I am surprised how few page scans are in the en namespace:
4883, of which 2792 are the NSRW. Of the remaining 2091:
* 620 : Maxwell's Treatise on Electricity and Magnetism
* 368 : H.R. Rep. No. 94-1476
* 289 : The How and Why Library
* 200 : [[History of Iowa/Volume 4]] (Google book; no images)
* 66 : Spirella Manual (1913)
* 59 : Faraday's Experimental researches in electricity
* 45 : Secret history of the French court under Richelieu and Mazarin (Google book; images on Google)
+ 444 misc. others
Even those numbers are inflated, as I know that many of the pages in Maxwell's treatise are empty shells that once contained only a header before the days of Index: pages.
http://fr.wikisource.org/wiki/Wikisource:Livres_disponibles_en_mode_page lists 62,326 scanned pages (not yet all OCRed and proofread, and I am not sure that this page is up to date).
That number must include images that do not have Page:s, as there are 40046 articles in ns104 on frwikisource.
http://tools.wikimedia.de/~st47/cgi-bin/pagesinns.pl?server=3&db=frwikis...
If en.ws was to include those, all of the EB1911 images could be included :-)
but does not make them mandatory for publishing a text there. It is only a suggestion, which many contributors follow.
40 thousand pages is a testament to the success of the fr.ws approach.
I think that the important point is not scanned texts, but a notation of whether and how the texts have been proofread by editors, whatever means the editors use to proofread them.
I have been monitoring discussions on digitization projects as an archival professional for years. It is standard practice to provide not only e-texts but also scans. Wikisource does not demand scans when a permanent web address (e.g. a library project) is given for scans hosted outside Commons.
I think the average quality of other Wikisource branches is very poor. In most cases there is no source given: one cannot know which source is used, and for scholarly purposes the e-text is worthless.
While indicating a source is critical for scholarly purposes where many editions exist, it is also often completely irrelevant when only one edition is likely to exist. There are many newspaper and journal articles that are _only_ available online by way of Wikisource, and even when not proof-read these are invaluable for scholarly purposes.
For examples of high-quality transcription that is occurring without page scans, see Journal of Discourses and EB1911.
http://en.wikisource.org/wiki/Journal_of_Discourses http://en.wikisource.org/wiki/EB1911
Even when many editions exist, a copy on Wikisource of poor provenance is _not_ worthless - it is merely a work in progress - anyone can improve it by pinning it down to one edition, and because Wikisource is a WMF project, any scholar knows without asking that the Wikisource community will welcome their involvement in our project. As a result, poor copies that put Wikisource into Google search results do bring in new and valuable contributors.
Stipulating that imperfection is intolerable is inhumane. Eradicating the imperfection is the path to perfection.
I think we have already had this discussion, but the misunderstanding continues. There are two different issues here, and it is important not to mix them:
- Scans provided alongside texts.
- Notation of quality.
Quality is not an absolute value. It is relative to the sources available for a given text. Quality does not have the same meaning for a text from 1920 and a text from the 15th century. So one should not talk about quality, but about notation of quality.
I agree that giving the source is important and should be part of a quality notation. The most important thing is to have a clear notation so that readers know how and by whom the texts have been proofread. Scans alone are not proof of quality, but they help to achieve better quality. They are not the only way to get good-quality texts. Some texts may be proofread by several contributors, and so be of very good quality, while Wikisource may be unable to host scanned images if a public domain edition is not easily available.
Page scans are, for the most part, orthogonal to quality. A meticulous transcription of page scans of a poorly collated collection, or of a rushed translation, is nothing short of wasted effort.
On en.ws, we do need to increase the percentage of texts with page scans. Any suggestions on how to achieve this?
-- John
The biggest stumbling block at en.WS is that we already had so many texts before the Page: feature. Many of the texts for which scans are most commonly available already exist here in a separate edition. I started to do some Shakespeare (Romeo and Juliet), for example. I found a nice DjVu file (my preferred format to work with) and uploaded it to Commons. Soon I realized the text was completely different. I started adding it as a separate title, but it is abandoned for the moment. Being able to check the endless IP revisions to Shakespeare was my primary motivation, and the scan turned out to be worthless for that.
So if we really want to prioritize scans on en.WS, we are naturally going to end up overwriting existing texts. This is not necessarily a problem as long as we go about it carefully, both with regard to people's attachment to past work and to ensure that we are not replacing texts with inferior editions or replacing highly proofread text with OCR.
The path of least resistance, which I always favor, would be to start with a quality collection of scans of prominent texts. I like .djvu files because you do not need to upload individual pages to Commons. Does anyone have suggestions for an existing collection of scans that we might want to start with?
Birgitte SB
Yann Forget wrote:
Hello,
This is now off topic for foundation-l. Better to continue this on wikisource-l.
Klaus Graf wrote:
Hello,
I agree with Ray here, and I think that Klaus' mail does not accurately reflect reality. The French Wikisource has the greatest number of scanned texts so far,
Is there a proof for this claim?
http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics lists 40,043 pages for fr.ws and 16,939 for de.ws.
This is completely false; the German Wikisource has a large number of texts with scans that are not counted because they are not in the Page: namespace. This is because they used a different tool before the Page: extension was available. Since the extension was activated on de.ws, they have been converting old texts to the new format, but they are far from finished. There are still plenty of pages on de.ws that contain several scanned pages (sometimes hundreds of them). These pages need to be split and moved into the Page namespace. Once this is finished, you will be able to make statements based on those figures.
In addition, it is fair to estimate that more than 50% of the scans on fr.ws have never been proofread by a Wikisource user. This contrasts with the 8,700 pages on de.ws that have been proofread once, and the 11,400 that have been proofread twice.
The German WS adopted a policy of making scans mandatory, which explains why they are more advanced. I believe that this is the only sensible policy for Wikisource, and it should be generalized to all subdomains. Without scans, a text on wikisource will always look suspicious, no matter how carefully it was proofread. Without scans, our work cannot be trusted, no matter how much fun, pain or pleasure we had formatting it.
The German wikisource adopted their policy early, which gave them a quality advantage. It is now time for other subdomains to catch up.
Thomas
I don't really think the German policy is the only way to go, although I appreciate their results. In English especially, a large number of scans are already available online under various terms of use or with copyright claimed. And if the en.WS contributor lives in the UK, that copyright is quite enforceable. en.WS does have texts which have been proofread against scanned editions by multiple people even though the scan itself is not hosted on en.WS or Commons, but rather linked to on the talk page. I don't know exactly how the German policy reads, but you all seem to be talking about scans which will eventually be in the Page namespace. en.WS also has texts which have been proofread through other organizations like Project Gutenberg. Although I have nothing against the Germans doing things this way, I would hardly support en.WS adopting this policy. It is not the only way to secure quality texts.
In all seriousness, en.WS will be lagging behind other wikis in quality for the near future. This is due to sheer numbers as well as the "dumping ground" origins of en.WS. I anticipate en.WS improving at a pace that doesn't burn out contributors, not necessarily "catching up" with anyone else. I congratulate de.WS on their success and hope they are able to translate it into breaking ground in other areas as well.
Birgitte SB
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikisource-l