[Wikisource-l] Does really wikisource need djvu/pdf files?

6 Jul 2019


I like the djvu file structure and the DjvuLibre routines, and I have
studied them as deeply as I can. For wikisource needs I also appreciate pdf
files, but I like them less because of their complexity. The proofread
procedure is presently based on djvu or pdf files, but I see that another
approach could be used, relying only on simpler routines.
The proofreading procedure needs two inputs:
1. a set of good images of page scans;
2. a good mapped file of text content matched with images.
For the "mapped text" there are two alternatives, hOCR and xml; both can be
used to extract "unmapped raw text" when needed, at server level but also
at local level with jQuery. If the hOCR/xml of a page's text could be
accessed quickly and simply from nsPage, I see interesting opportunities,
e.g. generalized highlighting of selected text on the nsPage image, both in
view and in edit mode; formatting suggestions from heuristic analysis of
word coordinates; and a reorganization of high-level text structures, as in
the case of a wrong column layout.
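
To make the idea concrete, here is a minimal sketch of how "unmapped raw
text" and word coordinates could be pulled out of an hOCR page. It assumes
the common hOCR convention of word spans with class "ocrx_word" and a
"bbox x0 y0 x1 y1" entry in the title attribute; the names HocrWordParser,
hocr_words, raw_text and the SAMPLE page are hypothetical and not part of
any existing wikisource tool.

```python
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from hOCR word spans (class "ocrx_word")."""

    def __init__(self):
        super().__init__()
        self.words = []        # list of (text, (x0, y0, x1, y1) or None)
        self._in_word = False
        self._depth = 0        # nesting depth of inline tags inside a word
        self._bbox = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._in_word:
            # Inline markup inside a word, e.g. <strong>; just track nesting.
            self._depth += 1
            return
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            m = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", a.get("title") or "")
            self._bbox = tuple(int(g) for g in m.groups()) if m else None
            self._buf = []
            self._in_word = True

    def handle_endtag(self, tag):
        if not self._in_word:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # Closing tag of the word span itself: store the word if not empty.
        text = "".join(self._buf).strip()
        if text:
            self.words.append((text, self._bbox))
        self._in_word = False

    def handle_data(self, data):
        if self._in_word:
            self._buf.append(data)


def hocr_words(hocr):
    """Return the mapped text: a list of (word, bbox) pairs."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


def raw_text(hocr):
    """Return the unmapped raw text: the words joined by single spaces."""
    return " ".join(text for text, _ in hocr_words(hocr))


# Hypothetical one-line page, only for illustration.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 1000 1500'>
<span class='ocr_line' title='bbox 100 100 600 140'>
<span class='ocrx_word' title='bbox 100 100 220 140'>Hello</span>
<span class='ocrx_word' title='bbox 240 100 400 140; x_wconf 93'>world</span>
</span></div>"""

if __name__ == "__main__":
    print(raw_text(SAMPLE))
    print(hocr_words(SAMPLE))
```

The same word boxes could then be scaled to the displayed page image to
highlight a selected word, or compared with each other to guess at columns
and indentation.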
Alex brollo (it.wikisource)
