ThomasV wrote:
The new Ajax-OCR service does not use a robot to create pages;
it is installed on 11 subdomains. See OCR.js at http://wikisource.org/wiki/Wikisource:Shared_Scripts
So, it is activated for the Norwegian Wikisource. But when I tried it, only garbage came out:
Expected: Rom, forstår ni? — he he! — Nå ja, der nere i det
Got: Rom, forstiir ni? — he he! ·— Néja, der nere idet,
Maybe the OCR is set for French, not for Norwegian, since the output is full of accented é but no Norwegian æøå.
The idea is very nice, but how do we make this work for Scandinavian and other languages? What OCR engine do you use?
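The suspicion above (OCR set to the wrong language) can be checked roughly by listing letters in the output that fall outside the expected alphabet. This is a minimal sketch, not part of the OCR service: the function name and the alphabet set are assumptions made for illustration, and the heuristic is crude, since Norwegian does occasionally use é.

```python
# Hypothetical helper: flag letters that are not in the expected alphabet.
NORWEGIAN_LETTERS = set("abcdefghijklmnopqrstuvwxyzæøå")

def unexpected_letters(text, alphabet=NORWEGIAN_LETTERS):
    """Return the sorted letters in text that fall outside the alphabet."""
    return sorted({ch for ch in text.lower()
                   if ch.isalpha() and ch not in alphabet})

# The two sample lines quoted in the message above.
expected = "Rom, forstår ni? — he he! — Nå ja, der nere i det"
got = "Rom, forstiir ni? — he he! ·— Néja, der nere idet,"

print(unexpected_letters(expected))  # []
print(unexpected_letters(got))       # ['é']
```

A page of output with many flagged letters and no æ, ø or å at all would support the guess that the engine is running with another language model.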
I believe there is a help page at en.ws that describes how to update the text layer of a DjVu file. Once you've done this, you just need to upload the modified DjVu as a new version of the file. The fact that image coordinates are lost in the process is not a problem for Wikisource.
Agreed, losing coordinates is not a problem for Wikisource. But then Wikisource does not need the DjVu updated at all: it is fine with having the text in the Page: namespace. It is external users who might want an updated DjVu file, and they might care about searching for a word and finding it at the right position in the image.
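For reference, a text-layer update without coordinates could look roughly like the sketch below. It assumes the DjVuLibre tool djvused, whose hidden-text format is an s-expression such as (page xmin ymin xmax ymax "text"). The function names, file names and page size are hypothetical. Putting the whole page text in a single page-level box is exactly what discards the word positions that external users would want.

```python
def djvused_string(text):
    """Quote text as a djvused string, escaping non-ASCII bytes as octal."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in ('"', "\\"):
            out.append("\\" + ch)          # escape quote and backslash
        elif 32 <= byte < 127:
            out.append(ch)                 # printable ASCII passes through
        else:
            out.append("\\%03o" % byte)    # everything else as \ooo
    return '"' + "".join(out) + '"'

def page_text_sexpr(text, width, height):
    """Build a page-level hidden-text expression with no word coordinates."""
    return "(page 0 0 %d %d %s)" % (width, height, djvused_string(text))

# Hypothetical page size in pixels; the real size must come from the file.
print(page_text_sexpr("Nå ja", 2480, 3508))
```

Saved to a file such as page1.txt, the expression could then be applied with something like `djvused book.djvu -e 'select 1; set-txt page1.txt' -s`, after which the modified file is uploaded as a new version.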