Lars Aronsson, 12/03/19 22:27:
> Is spell checking (and a normal dictionary) the only useful tool?
I'm not sure it's the only tool or the most useful one, but it's definitely a common one.
> Would you count the number of spelling errors, or the ratio of errors to correct words? Has anyone done this?
It's routinely done by OCR software, and in fact, if I remember correctly, such information is stored in DjVu files (the uncertain words are marked).
In practice, such information is most useful when preparing files for upload to Wikisource: you can minimise your manual work by checking that the OCR was mostly successful and, if it wasn't, trying again with different settings. I thought we had documented this in https://en.wikisource.org/wiki/Category:File_creation_help but it seems we haven't.
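To make the "ratio of errors to words" idea concrete, here is a minimal sketch of a dictionary-based check on OCR output. It assumes you already have a word list loaded as a set of lowercase words (e.g. from a spell checker); the tokenisation is deliberately crude (runs of letters only, so "don't" splits into two tokens), and the 5% threshold in the hypothetical helper ocr_looks_ok is an arbitrary illustration, not an established cut-off.

```python
import re


def spelling_error_ratio(text, dictionary):
    """Return (errors, total_words, ratio) for words missing from `dictionary`.

    `dictionary` is assumed to be a set of lowercase words.
    """
    # Crude tokenisation: runs of Unicode letters (no digits, no underscores).
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return 0, 0, 0.0
    # Count every token that the word list does not recognise.
    errors = sum(1 for w in words if w.lower() not in dictionary)
    return errors, len(words), errors / len(words)


def ocr_looks_ok(text, dictionary, max_ratio=0.05):
    """Hypothetical helper: treat a page as 'mostly successful' below max_ratio."""
    _, total, ratio = spelling_error_ratio(text, dictionary)
    return total > 0 and ratio <= max_ratio


words = {"the", "quick", "brown", "fox"}
print(spelling_error_ratio("The qnick brown fox", words))
```

A page (or a whole file) that fails such a check is a candidate for re-running the OCR with different settings before upload; note that proper nouns and archaic spellings will inflate the ratio unless the word list covers them.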
We discussed this long ago, but I don't remember when. https://strategy.wikimedia.org/wiki/Proposal:Make_Wikisource_scale reminds me that https://www.nla.gov.au/australian-newspaper-plan had an impressive crowdsourcing effort a decade ago, but I can't tell whether it has since died.
OCR assessment is a well-researched topic, so you'll find resources such as https://www.digitisation.eu/glossary/ground-truth/ but not much about how to organise a transcription project: http://succeed-project.eu/wiki/index.php/TPDL_Tutorial_State-of-the-art_tools_for_text_digitisation#2._OCR_and_Post-correction claimed that VTL was the most promising method back then, but in 2015 we found that it had ended in 2014. https://lists.wikimedia.org/pipermail/wikisource-l/2015-October/002516.html
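When a ground-truth transcription is available, a common way to assess OCR is the character error rate: the edit (Levenshtein) distance between the OCR output and the ground truth, divided by the length of the ground truth. The sketch below is a plain-Python illustration of that metric under the assumption that both texts are already aligned page by page; it is not taken from the linked glossary, and real evaluations usually also normalise whitespace and Unicode first.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def character_error_rate(ocr_text, ground_truth):
    """Edit distance normalised by ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


print(character_error_rate("Wikisourse", "Wikisource"))
```

Unlike the dictionary check, this needs a reference transcription, so it is better suited to evaluating OCR engines or settings on a sample than to checking every page of a new upload.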
Federico