You need to be cautious when talking about "PDF" documents: what matters is not the document presentation format but the source of the text. So I prefer to describe the source as either digitally prepared (not requiring validation, though it may require formatting) or OCR'd (requiring validation, and probably formatting).
If you are talking about how we represent digitally prepared text within the validation process, I would have no issue with the text being ripped and a bot running through and taking it straight to level 4 (green), and then redefining green to mean either validated, or digitally prepared text not requiring validation.
At the same time, if someone proposes and generates a fifth colour to represent digitally prepared text not requiring proofreading, then I will be happy with that. It may make someone happier as a truer representation, but to me it is a moot point. In the end, each of those is a local community decision, though one that should be made in consideration of how the other wikis interpret their processes.
Regards, Billinghurst
On Tue, 11 Jun 2013 15:12:41 -0400, David Cuenca dacuetu@gmail.com wrote:
@Billinghurst, I think Aubrey was referring mainly to PDF files, which sometimes have text and formatting but are not that easy to represent in Wikisource. The main problem is that our current workflow always assumes that we are going to proofread a text and have it stored as a web page.
@others: for me it doesn't matter much whether the metadata is represented by a template, an index page, or something different (maybe related to the new Extension:BookManager?). However, I think that from the user's point of view it is better to have a consistent system that can handle:
- representation of book/source metadata
- access to export/visualization options
I'm preparing a document with some ideas that we can discuss here.
Micru
On Tue, Jun 11, 2013 at 7:48 AM, billinghurst billinghurst@gmail.com wrote:
On Tue, 11 Jun 2013 12:16:54 +0530, "Aarti K. Dwivedi" ellydwivedi2093@gmail.com wrote:
A slightly off-topic question: even if we modify the extension to proofread books which do not have scans (I am assuming books that were born digital), against what will these books be proofread?
I am not sure why we are looking to proofread a digital-only file, unless of course it never had a text layer and it had to be OCR'd. Proofreading surely only relates to scanned images where there has been the need to proofread.
Regards, Billinghurst
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l