Lars is right: (too) little has changed in several years; so let me say that
my opinion has not changed since 2009 when I wrote "Make Wikisource
scale" (which I dared to link from
https://meta.wikimedia.org/wiki/Role_of_Wikisource#footer ).
The one and only question worth asking is: can Wikisource, as a
concept, proofread a million books and involve half a million
volunteers? Because IMHO it must.
When I think of this, I agree that OCR is the main issue. But it's not
necessarily the one which worries me most, because tesseract is
something living outside the wiki which can be improved even if the wiki
has design issues. If we try really hard, we may face unsolvable
integration problems in the OCR<->DjVu<->Wikisource food chain; but so
far the issue is rather that we never tried seriously.[1]
What worries me most is something else: all the effort we spend making
perfectly faithful layouts with fragile templates, which are worth NOTHING
outside our wiki; all the effort we spend organising books scattered
across pages, to form a structure that not even MediaWiki knows
about,[2] let alone an ePub exporter[3], an OAI-PMH handle[4] or a
third-party user. I don't care if VisualEditor can make those templates
easier to use; I care about things like making Proofread Page understand
METS[5] or perhaps making sure what we're doing can end up in a DocBook[6].
We might discover that these things only require small adjustments, or
that they don't matter that much. Or we might discover that one of the
tools linked by Vigneron (which I haven't managed to try yet) requires a
fundamental shift. Either way, we need to reason about it to be
confident we're on the right track, and/or maybe pioneer some new way of
working in one subdomain.
However, in 5 years I've yet to find ONE person who says: yes, Nemo,
you're right, Wikisource should be 10 or 50 times as big as Wikipedia,
let's plan for that. Probably I'm wrong. :)
Nemo
[1]
https://www.mediawiki.org/wiki/CAPTCHA
[2] Will it ever?
https://meta.wikimedia.org/wiki/Book_management
[3] Despite the recently-trashed work by PediaPress, and all Tpt's
awesomeness with WSexport.
[4] Though,
https://www.mediawiki.org/wiki/Extension:Proofread_Page#OAI-PMH
[5]
https://lists.wikimedia.org/pipermail/wikisource-l/2014-September/002081.ht…
[6]
https://phabricator.wikimedia.org/T63047#679332