I think it's time to bring this issue up:
how can we efficiently manage born-digital documents?
I think Wikisource has developed a pretty amazing workflow for dealing
with digitized documents, transcribing and proofreading them with the
Proofread Page extension.
But every time we use this extension on Wikisource to transcribe a PDF
back into text (theses, CC-BY-SA books, etc.), I feel something is wrong.
We don't have many tools (or, if we do, they are scattered across
different places) to automatically extract formatted text from these
files, and as far as I know we can't simply take a LaTeX source and
upload it to Wikisource.
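To make the LaTeX point concrete, here is a minimal sketch of what a converter could look like. It assumes we only need to map a handful of LaTeX commands (\section, \subsection, \textbf, \emph, \textit) onto wikitext; the function name latex_to_wikitext is hypothetical, and real sources with macros, math, floats and bibliographies would need a far more complete tool.

```python
import re

# Hypothetical rule table: each regex matches one LaTeX command with a
# brace-free argument and rewrites it as the equivalent wikitext markup.
_RULES = [
    (re.compile(r"\\section\*?\{([^{}]*)\}"), r"== \1 =="),
    (re.compile(r"\\subsection\*?\{([^{}]*)\}"), r"=== \1 ==="),
    (re.compile(r"\\textbf\{([^{}]*)\}"), r"'''\1'''"),
    (re.compile(r"\\(?:emph|textit)\{([^{}]*)\}"), r"''\1''"),
]


def latex_to_wikitext(source: str) -> str:
    """Apply each substitution rule in order and return the wikitext.

    Blank-line paragraph breaks are left untouched, since LaTeX and
    wikitext both use them to separate paragraphs.
    """
    text = source
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


print(latex_to_wikitext(r"\section{Introduction} This is \emph{important}."))
```

Because \textbf is handled before \emph, a nested \emph{\textbf{x}} still converts: the inner command is rewritten first, leaving a brace-free argument for the outer rule.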
What do you think? Could we discuss the issue?
Aubrey
2011/6/26 Samuel Klein <meta.sj(a)gmail.com>:
Here's an example of a remarkable publication that we should be able
to capture, both its individual elements and its final layout, to
support reuse and sharing in other kinds of documents:
http://skateistan.org/skateistan_blog/out-now-student-mag-arts-skateboarding
http://www.skateistan.org/PDFs/Bridge-Final.pdf
We need better automation for adding these sorts of things to
Wikisource: scripts to request and capture license information, and to
batch-upload PDFs, extracting the individual images and text from the
source files, uploading them separately, and approximating the original
layout.
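As a rough illustration of the "capture license information" step, the sketch below renders a file description page for each file in a batch and refuses to proceed when a license is missing. It assumes the Wikimedia Commons {{Information}} template and a license template name (e.g. cc-by-sa-3.0) confirmed by the rights holder; the SourceFile schema and the function names are hypothetical. Extracting images and text and the actual upload (e.g. through the MediaWiki API) are left out.

```python
from dataclasses import dataclass


@dataclass
class SourceFile:
    """Metadata captured for one file before a batch upload (hypothetical schema)."""
    filename: str
    description: str
    date: str
    source_url: str
    author: str
    license_template: str  # e.g. "cc-by-sa-3.0"; empty if not yet confirmed


def description_page(f: SourceFile) -> str:
    """Render a description page using the Commons {{Information}} template."""
    return (
        "== Summary ==\n"
        "{{Information\n"
        f"|description={f.description}\n"
        f"|date={f.date}\n"
        f"|source={f.source_url}\n"
        f"|author={f.author}\n"
        "|permission=\n"
        "}}\n\n"
        "== Licensing ==\n"
        f"{{{{{f.license_template}}}}}\n"  # renders as {{<license template>}}
    )


def prepare_batch(files: list[SourceFile]) -> dict[str, str]:
    """Map each filename to its description page; fail if any license is missing."""
    missing = [f.filename for f in files if not f.license_template]
    if missing:
        raise ValueError(f"no license recorded for: {', '.join(missing)}")
    return {f.filename: description_page(f) for f in files}


example = SourceFile("Example.pdf", "Example student magazine", "2011",
                     "http://example.org/Example.pdf", "Example author",
                     "cc-by-sa-3.0")
print(prepare_batch([example])["Example.pdf"])
```

Failing the whole batch on a single missing license is a deliberate choice: it is easier to chase one permission than to clean up files uploaded without one.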
Sam.
_______________________________________________
Wikisource-l mailing list
Wikisource-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l