ThomasV wrote:
> The new Ajax-OCR service does not use a robot to create pages;
> it is installed at 11 subdomains.
> See OCR.js at
> http://wikisource.org/wiki/Wikisource:Shared_Scripts
So, it is activated for the Norwegian Wikisource.
But when I tried it, only garbage comes out:
Expected: Rom, forstår ni? — he he! — Nå ja, der nere i det
Got: Rom, forstiir ni? — he he! ·— Néja, der nere idet,
Maybe the OCR is set for French, not for Norwegian, since
the output is full of accented é but no Norwegian æøå.
The idea is very nice, but how do we make this work for
Scandinavian and other languages? What OCR engine do you use?
> I believe there is a help page at en.ws that describes how
> to update the text layer of a DjVu file. Once you've done this,
> you just need to upload the modified DjVu as a new version
> of the file. The fact that image coordinates are lost in the
> process is not a problem for Wikisource.
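A minimal sketch of that workflow, assuming DjVuLibre's djvused tool is installed (the en.ws help page may well describe a different method). It writes the corrected text of a page as one page-level zone, which is exactly why the word coordinates get lost. The helper names and file names are hypothetical.

```python
import subprocess


def text_layer_sexpr(width: int, height: int, text: str) -> str:
    """Build a djvused hidden-text s-expression covering the whole page.

    The text becomes a single page-level zone, so per-word coordinates are lost.
    """
    # Escape backslashes first, then quotes and newlines, for djvused's
    # C-style string syntax (assumption: UTF-8 text such as æøå is accepted as-is).
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'(page 0 0 {width} {height} "{escaped}")'


def djvused_argv(djvu_path: str, page: int, txt_path: str) -> list[str]:
    """Hypothetical helper: argv that replaces one page's text layer and saves in place (-s)."""
    return ["djvused", djvu_path, "-e", f"select {page}; set-txt {txt_path}", "-s"]


def update_page_text(djvu_path: str, page: int, width: int, height: int,
                     text: str, txt_path: str = "page.txt") -> None:
    """Write the s-expression to a file and let djvused load it into the DjVu."""
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(text_layer_sexpr(width, height, text) + "\n")
    subprocess.run(djvused_argv(djvu_path, page, txt_path), check=True)
```

For example, `update_page_text("book.djvu", 12, 2550, 3300, corrected_text)` would rewrite page 12, after which the file is uploaded as a new version. The page size can be read beforehand with djvused's `size` command.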
For Wikisource, losing coordinates is not a problem.
But Wikisource does not need an updated DjVu in the first place.
Wikisource is fine with having the text in the Page:
namespace on Wikisource. It is others, external users,
who might want an updated DjVu file, and they might
care about searching for a word and finding it in the
right position of the image.
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik -
http://aronsson.se
Wikimedia Sverige - stöd fri kunskap -
http://wikimedia.se/