Lars Aronsson, 12/03/19 22:27:
Is spell checking (and a normal dictionary) the only useful tool?
I'm not sure it's the only or most useful, but it's definitely common.
Would you count the number of spelling errors, or the
ratio of errors to correct words? Has anyone done this?
It's routinely done by OCR software and, if I remember correctly, such
information is even stored in DjVu files (the uncertain words are marked).
In practice, such information is most useful when preparing the files
for upload to Wikisource, because you can minimise your manual work by
checking that the OCR was mostly successful and, if not, trying again with
different settings. I thought we had such information in
<https://en.wikisource.org/wiki/Category:File_creation_help>, but it seems not.
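For the dictionary approach, a minimal sketch of counting errors and computing the ratio might look like the following. It assumes you already have a wordlist loaded as a set of lowercase words; the function name and the tokenisation rule (split on whitespace, strip surrounding punctuation) are hypothetical choices for illustration, not what any particular OCR tool does.

```python
import string


def spelling_error_stats(text: str, dictionary: set[str]) -> tuple[int, int, float]:
    """Return (errors, total_words, error_ratio) for one page of OCR text.

    Assumption: `dictionary` is a set of lowercase known words, and any
    token not found in it is counted as a spelling (or OCR) error.
    """
    # Split on whitespace and strip leading/trailing punctuation, so that
    # OCR garbage such as "qu1ck" stays a single token and is counted once.
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    words = [t for t in tokens if t]  # drop tokens that were pure punctuation
    errors = sum(1 for w in words if w not in dictionary)
    total = len(words)
    ratio = errors / total if total else 0.0
    return errors, total, ratio
```

With a toy wordlist {"the", "quick", "brown", "fox"}, the line "The qu1ck brown f0x." gives 2 errors out of 4 words, a ratio of 0.5. The ratio is what makes pages or files of different length comparable; the raw count mostly tells you how much proofreading is left.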
We discussed this long ago, but I don't remember when.
<https://strategy.wikimedia.org/wiki/Proposal:Make_Wikisource_scale>
reminds me that <https://www.nla.gov.au/australian-newspaper-plan> had
an impressive crowdsourcing effort a decade ago, but I can't tell whether it has since died.
OCR assessment is a well-researched issue, so you'll find things like
<https://www.digitisation.eu/glossary/ground-truth/>, but not so much
about how to organise a transcription project:
<http://succeed-project.eu/wiki/index.php/TPDL_Tutorial_State-of-the-art_tools_for_text_digitisation#2._OCR_and_Post-correction>
claims that VTL was the most promising method back then, but in 2015 we
found that it had ended in 2014.
<https://lists.wikimedia.org/pipermail/wikisource-l/2015-October/002516.html>
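Where a ground truth exists (a manually verified transcription of the same page), the usual measure is the character error rate: the edit distance between the OCR output and the ground truth, divided by the length of the ground truth. A minimal sketch, assuming plain Levenshtein distance with unit costs and no text normalisation (the function names are illustrative only):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions, each cost 1)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def character_error_rate(ocr: str, ground_truth: str) -> float:
    """CER = edit distance / number of characters in the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return levenshtein(ocr, ground_truth) / len(ground_truth)
```

For example, "qu1ck" against the ground truth "quick" is one substitution over five characters, a CER of 0.2. Unlike the dictionary ratio, this also catches OCR errors that happen to produce valid words, but it needs the corrected text, which is exactly what a transcription project produces.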
Federico