There's an interesting suggestion on the web about the difference between
DjVu and PDF. The question was: how can I get hOCR from the hidden text
layer of a PDF file? The reply: convert the PDF to DjVu, then everything
will be simple (more or less). This comes from the fact that everything in
a DjVu file is open and "simply" accessible, just as everything in a PDF is
difficult and obscure. DjVu is wiki, PDF isn't. I don't know of any other
open format that implements a searchable hidden text layer underlying the
page image.
But as a first step, the incredible opportunities DjVu offers should be
*actively explored and used*! If you use a car only as a hen-house and
never drive it, any ordinary, effective hen-house will seem just as good
to you, or even better.
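
To make the "convert, then read the text layer" route concrete, here is a
minimal sketch. It assumes the pdf2djvu and djvused (DjVuLibre) command-line
tools are installed; the function names and the file names are only
illustrative. What djvused prints is its own parenthesised text format with
zone coordinates, not hOCR, so turning it into hOCR would be a further step
that is not shown here.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_path, djvu_path):
    """Return two command lines: PDF -> DjVu, then dump the hidden text layer."""
    # Assumption: pdf2djvu is installed; -o names the output file.
    convert = ["pdf2djvu", "-o", str(djvu_path), str(pdf_path)]
    # Assumption: djvused (DjVuLibre) is installed; 'print-txt' prints the
    # hidden text together with its zones and coordinates.
    dump = ["djvused", "-e", "print-txt", str(djvu_path)]
    return convert, dump


def extract_text_layer(pdf_path):
    """Convert a PDF to DjVu next to it and return the dumped text layer."""
    pdf_path = Path(pdf_path)
    djvu_path = pdf_path.with_suffix(".djvu")
    convert, dump = build_commands(pdf_path, djvu_path)
    for tool in (convert[0], dump[0]):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    subprocess.run(convert, check=True)
    result = subprocess.run(dump, check=True, capture_output=True, text=True)
    return result.stdout
```

Calling extract_text_layer("book.pdf") would write book.djvu next to the PDF
and return the text layer as a string, provided the PDF actually carries a
hidden text layer.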
Alex
2018-04-06 15:45 GMT+02:00 Federico Leva (Nemo) <nemowiki(a)gmail.com>:
Peter Meyer, 06/04/2018 14:59:
Could we distill these issues online on a wiki page somewhere? Or is it
already done?
(1) what are the significant differences between pdf and djvu (or some
new version of djvu that we could imagine coming up with)
I agree this is important to outline. For instance, is there some
Wikisource where PDF files are actively discouraged in favour of DjVu, and
for what reasons?
Which DjVu features do we dream of using within 5 years that PDF doesn't
provide? Do we want a system where libraries can feed us with DjVu files,
the proofread text gets ingested back to the DjVu file and libraries can
reuse it? Do we want to use some of the low level features of the text
layer to widely deploy some dark magic, such as the captcha-based
proofreading we talked about many times or some other interaction between
MediaWiki and the scans? What "market" is there for such features?
DjVu became our favourite format back at the time when the upload size
limit was around 10 MiB, if I remember correctly, and compression was the
most important factor. I often find myself explaining why it's such a
useful format, but in the end if someone asks me "so, is it fine to just
upload a PDF at Wikisource?" I have a hard time giving an answer other than
"sure, don't worry, it will be the same".
Federico