[Wikisource-l] Scanned texts

ThomasV thomasV1 at gmx.de
Sun Jan 6 16:07:21 UTC 2008


Yann Forget wrote:
> Hello,
>
> This is now off topic for foundation-l.
> Better to continue this on wikisource-l.
>
> Klaus Graf wrote:
>   
>>> Hello,
>>>
>>> I agree with Ray here, and I think that Klaus' mail does not report
>>> exactly the reality. The French Wikisource has the greatest numbers of
>>> scanned texts so far,
>>>       
>> Is there a proof for this claim?
>>     
>
> http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics
> lists 40,043 pages for fr.ws and 16,939 for de.ws.
>
>   

This is completely false; the German Wikisource has a high
number of texts with scans that are not counted because they
are not in its Page: namespace. This is because de.ws used
a different tool before the Page: extension became available.
Since the activation of the extension on de.ws, they have been
converting old texts to the new format, but they are far from
having finished. There are still plenty of pages on de.ws that
contain several scanned pages (and sometimes hundreds
of them). These pages need to be split and moved into the Page
namespace. Once this is finished, you'll be able to make statements
based on those figures.

In addition, it is fair to estimate that more than 50% of the scans
on fr.ws have never been proofread by a Wikisource user. This
contrasts with the 8,700 pages on de.ws that have been proofread
once, and 11,400 that have been proofread twice.

The German WS adopted a policy of making scans mandatory, which
explains why they are more advanced. I believe that this is the only
sensible policy for Wikisource, and it should be generalized to all
subdomains. Without scans, a text on Wikisource will always look
suspicious, no matter how carefully it was proofread. Without scans,
our work cannot be trusted, no matter how much fun, pain or pleasure
we had formatting it.

The German Wikisource adopted its policy early, which gave
it a quality advantage. It is now time for other subdomains
to catch up.

Thomas


