Good to know. I checked the ABBYY website, and it says one option is an "Open license for local use on workstations", but unfortunately I guess that is not a FLOSS license.

By the way, what is the state of affairs regarding Indic languages?

Do we have a central page documenting the existing OCR pipelines used by the Wikisource community?

What should I say to a contributor who comes to me asking, "I have this old PD book in my personal library that I would like to digitize, share, and proofread on Wikisource; where should I start?" Do we have an online service, for example on Tool Labs, that lets one either upload a facsimile or simply enter its URL, and that then launches the OCR, for example backed by Tesseract?

Shouldn't we update our roadmap [1], or is there a more up-to-date document elsewhere?

[1] https://meta.wikimedia.org/wiki/Wikisource_roadmap


On 13/04/2018 at 08:28, Nahum Wengrov wrote:
I use ABBYY FineReader; I don't remember the exact version (probably 12 or 11). I bought it a few years ago and it works perfectly for my language (Hebrew).

On Fri, Apr 13, 2018 at 2:22 AM, mathieu stumpf guntz <psychoslave@culture-libre.org> wrote:

Thank you Nahum,

Could you indicate which OCR solution you are using?


On 26/03/2018 at 17:27, Nahum Wengrov wrote:
I frequently work offline on he.wikisource. I download the entire PDF file from Commons to my hard drive and OCR the page I need myself. One can also use the Wikisource OCR and download the text, I guess, page by page. Then I proof the text in a Word document open on the lower half of my screen, with the PDF open on the upper half, where I go to the page I need with Acrobat Reader, and I scroll both windows down or up as needed.

On Mon, Mar 26, 2018 at 11:21 AM, mathieu stumpf guntz <psychoslave@culture-libre.org> wrote:
On 24/03/2018 at 16:22, billinghurst wrote:
Though that would defeat the purpose of online proofreading with account verification. Some of the true value of our online process is that contribution builds a level of trust and knowledge and that is reflected in both our patrolling and the allocation of autopatrolled status.
How would providing tools for doing batch work offline interfere in any way with that? Once the work is done, it can be uploaded to Wikisource with whichever account the user wants.

Actually, to my mind, the main benefit of the online aspect is the peer-to-peer production model. Also, there is no need for a central node holding accounts in order to take into account the trust given to a particular contributor. There are digital signature technologies, such as GPG, for example. Having a central node with a web interface just makes things easier for most users; it doesn't improve the trustworthiness of the environment. On the contrary, with a single point of failure, we actually rely on a weaker solution in this regard.

Also, how would you have access to templates and components like that offline?
Well, that just shows how inefficient these tools are for continuing to contribute while offline. It's always possible to install MediaWiki and download the required templates, but currently this process seems way too complicated, doesn't it?


Also we generally cannot download the images separately as that is usually part of the later clean-up where people have the technical skills.
I'm afraid the term "image" misled your answer. It seems you interpreted it as picture elements within the files, while I was talking about the files themselves.

So yes, there is the capacity to have the text and proofread the text, that actual checking the text against the image is not the sole component of proofreading, and further it would not be at all helpful for validation.
There is nothing magic about working directly in a browser. People download and upload all the required material anyway, but on a page-by-page basis. The result is just as valid when the transactions are performed at the file-repository level.

Cheers

_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l



