Bodhisattwa Mandal, 29/07/20 00:40:
Coming from a community with not much volunteer force, I actually want
any strategy which involves minimal human interference into the tagging
process, as we can't afford to spread our thin line.
Anything that goes into the wikitext has an implicit cost for humans.
Already the templates we use for mere formatting and layout make it
costly to do relatively simple things such as "give me a plain text
version of the book", although they sometimes manage to make other
things easier (such as making a decent HTML version which may also work
in EPUB).
If the purpose of linking "persons, places, creative works, events" to
Wikidata is to provide marginally faster information to the average
reader browsing the Wikisource website, then you can do it with a
JavaScript gadget similar to the various Wiktionary gadgets which we've
had for a while, and take a probabilistic approach. If the purpose is
*disambiguation* (and attendant features like structured search), then
it's quite a different matter.
We probably can't afford an approach like METS/ALTO for any significant
number of works. Nowadays people do all sorts of fancy things with IIIF
but I'm not sure about detailed tagging at scale. The advantage of
something like an IIIF manifest is that you can store it separately from
whatever we have now, and "just" overlay it on the images (merging with
the wikitext and HTML is going to be harder; compare efforts by
Alex_brollo with hOCR/DjVu transfers).
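To make "store it separately and overlay it" a bit more concrete, here is a minimal sketch of one such overlay record, assuming we used the W3C Web Annotation format that IIIF viewers generally understand. The canvas URI, the annotation id, the function name and the Q42 item are all placeholders of mine, not anything that exists on Wikisource today.

```python
import json

def make_region_annotation(canvas_uri, xywh, qid, anno_id):
    """Build a Web Annotation tying a rectangle on a page image to a Wikidata item."""
    x, y, w, h = xywh
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "identifying",
        # The body is simply the IRI of the Wikidata entity.
        "body": f"http://www.wikidata.org/entity/{qid}",
        # The target is the canvas plus a media fragment selecting the region.
        "target": f"{canvas_uri}#xywh={x},{y},{w},{h}",
    }

# Hypothetical canvas and annotation ids; Q42 is only a placeholder item.
anno = make_region_annotation(
    "https://example.org/iiif/book1/canvas/p12",
    (120, 340, 400, 60),
    "Q42",
    "https://example.org/iiif/book1/anno/p12-1",
)
print(json.dumps(anno, indent=2))
```

Nothing here touches the wikitext: the record only points at a region of the image, which is why it can live next to whatever we have now.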
You can probably imagine a relatively simple gadget to suggest possible
Wikidata items to connect to some parts of an image and let the user
confirm or not with a single click, then store the result as JSON on a
wiki page. If it's designed for the Page namespace, maybe it can even be
enabled by default on a willing subdomain without disturbing casual
users. If a specific focus is chosen (say, "depicts"-like statements for
illustrations in books), it might be possible to have some perceptible
progress with an edit drive à la Wikisource birthday prize, to attract
new users beyond the usual suspects and generate some enthusiasm.
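As a rough illustration of the data side of such a gadget (not the gadget itself, which would be JavaScript), the sketch below records one click per suggestion and serialises everything to JSON. The page title, user name, item ids and field names are hypothetical; where the JSON would actually be saved is left open.

```python
import json

def confirm_suggestion(store, page, region, qid, accepted, user):
    """Record one click: the user accepted or rejected a suggested item for a region."""
    entry = {
        "region": list(region),   # x, y, width, height on the page image
        "item": qid,              # suggested Wikidata item
        "accepted": bool(accepted),
        "user": user,
    }
    store.setdefault(page, []).append(entry)
    return entry

def confirmed_items(store, page):
    """Return the items that users accepted for a given page."""
    return [e["item"] for e in store.get(page, []) if e["accepted"]]

# Hypothetical page title, user name and item ids, for illustration only.
store = {}
confirm_suggestion(store, "Page:Example.djvu/12", (120, 340, 400, 60), "Q42", True, "ExampleUser")
confirm_suggestion(store, "Page:Example.djvu/12", (120, 420, 380, 60), "Q64", False, "ExampleUser")

# This string is what would be written to the wiki page.
text = json.dumps(store, indent=2, sort_keys=True)
print(text)
```

Keeping the rejected suggestions as well is a design choice: it would let the gadget avoid offering the same wrong item again.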
Federico