Hi,
Now that the Wikimedia Summit 2019 has ended, it would be great if the
representative of the Wikisource User Group could share their experience and
submit a report.
Regards
Bodhisattwa
Hi all,
Wikimania is in less than a month!
A lot of things related to Wikisource will happen there!
Here is a quick overview:
There will be a room with a full day of presentations about Transcription,
most of them about Wikisource (see
https://wikimania.wikimedia.org/wiki/2019:Program and
https://wikimania.wikimedia.org/wiki/2019:Transcription), and a meetup
(https://wikimania.wikimedia.org/wiki/2019:Meetups/Wikisource). Feel free to
sign up! If you have suggestions for topics to talk about, please add
them to the Agenda section.
I can't wait to see some of you there, and for the others we will share as
much as we can (at least a report).
Cheers,
~nicolas
I like the djvu file structure and the DjvuLibre routines, and I have
studied them as deeply as I can. For wikisource needs I appreciate pdf files
too, but I like them less because of their complexity. The proofreading
procedure is presently based on djvu or pdf files, but I see that another
approach could be used, relying only on simpler routines.
The proofreading procedure needs two inputs:
1. a set of good images of the page scans;
2. a good mapped text file, i.e. the text content matched with the images.
For the "mapped text" there are two alternatives, hOCR and xml; both can be
used to extract "unmapped raw text" when needed, at server level but also at
local level with jQuery. If the hOCR/xml of the page text could be accessed
quickly and simply from nsPage, I see interesting opportunities, e.g.
generalized highlighting of selected text on the nsPage image in both view
and edit mode; formatting suggestions from heuristic analysis of word
coordinates; and reorganization of high-level text structures, for instance
to fix a wrong column layout.
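To make the idea concrete, here is a minimal sketch in Python (standard
library only) of how both the word coordinates and the "unmapped raw text"
could be pulled out of an hOCR fragment. The sample markup, the class name
HocrWords and the helpers extract_words and raw_text are only illustrative
assumptions, not part of any existing Wikisource tool; the sketch also assumes
that each word is a span of class ocrx_word with a "bbox x0 y0 x1 y1" entry in
its title attribute and no nested spans.

```python
from html.parser import HTMLParser
import re

# Hypothetical hOCR fragment, invented for illustration only.
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 1000 1500'>
  <span class='ocr_line' title='bbox 100 200 520 240'>
    <span class='ocrx_word' title='bbox 100 200 220 240; x_wconf 95'>Hello</span>
    <span class='ocrx_word' title='bbox 240 200 520 240; x_wconf 91'>Wikisource</span>
  </span>
</div>
"""

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word currently being read, if any
        self._chunks = []   # text pieces of the word currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(n) for n in match.groups())
                self._chunks = []

    def handle_data(self, data):
        # Only keep text that sits inside a word span.
        if self._bbox is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        # Assumption: word spans do not contain nested spans.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._chunks).strip(), self._bbox))
            self._bbox = None


def extract_words(hocr):
    """Return the mapped text: a list of (word, bounding box) pairs."""
    parser = HocrWords()
    parser.feed(hocr)
    return parser.words


def raw_text(words):
    """Return the unmapped raw text, ignoring the coordinates."""
    return " ".join(text for text, _ in words)


if __name__ == "__main__":
    words = extract_words(SAMPLE_HOCR)
    for text, bbox in words:
        print(text, bbox)
    print(raw_text(words))
```

With the coordinates at hand, highlighting a selected word on the page image
is a matter of scaling its box to the displayed image size, and heuristics on
column layout could start from the x0 values of the words.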
Alex brollo (it.wikisource)