Hi everybody,

Here is my attempt at giving my point of view while trying to summarize the discussion:

1. I think the role of Index: pages should be to present the *source* of a work. This is true whether the source is a scanned edition (as is most often the case at the moment), or a digital PDF (that is, containing text and not images) as is the case for most "digital-born" documents. I think it is good to have a neat separation between the original source and how Wikisource presents the work in the main namespace. Indeed, even if Wikisource tries to be as true as possible to the original content, there are very often some changes in the way it is presented in the main namespace.

2. Ideally, the metadata about the source of a work (author, date of printing, etc.) should be located in Wikidata. But metadata related to proofreading (e.g. the proofreading level of each individual page), being specific to the mission of Wikisource, should be located in Wikisource. How to do this while keeping the interface simple (i.e. hide it from the user so that she doesn't have to go from Wikisource to Wikidata to Wikisource) is a valid and very important concern, but is also beyond my current understanding of Wikidata and its integration into Wikimedia projects.

3. The current system with 4 quality levels to represent the proofreading state of a page is not sufficient to represent the diversity of proofreading scenarios. Indeed, there is a distinction to make between the *correctness* of the text and its *formatting*. In the case of a scanned edition which has been OCRed, we do need several passes before reaching a satisfactory level of confidence about the correctness of the text as well as suitable formatting (proper use of the wikicode, etc.). For digital-born documents however, as billinghurst said, we can automatically assume that the extracted text is correct, but that still doesn't mean that the text is correctly formatted and ready to be transcluded in the main namespace. Maybe we should add another level meaning "text is correct, still needs formatting"? Ideally, we should have two scales of quality levels: one dealing with the correctness of the text, and one dealing with its formatting. This would probably be too heavy and confusing though...
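
To make the two-scale idea in point 3 a bit more concrete, here is a minimal sketch in Python. Every name in it (TextStatus, FormatStatus, PageQuality, ready_for_transclusion) and the exact set of levels are my own assumptions for illustration; they are not part of any existing Wikisource or MediaWiki API.

```python
from dataclasses import dataclass
from enum import Enum


class TextStatus(Enum):
    """Hypothetical scale for the correctness of the text."""
    UNCHECKED = "unchecked"        # raw OCR, nobody has read it yet
    PROOFREAD = "proofread"        # checked once against the source
    VALIDATED = "validated"        # confirmed by a second contributor
    BORN_DIGITAL = "born_digital"  # extracted from a digital source, assumed correct


class FormatStatus(Enum):
    """Hypothetical scale for the wiki formatting."""
    UNFORMATTED = "unformatted"
    FORMATTED = "formatted"


# Text states that we would trust without a further proofreading pass (assumption).
TRUSTED_TEXT = {TextStatus.VALIDATED, TextStatus.BORN_DIGITAL}


@dataclass
class PageQuality:
    text: TextStatus
    formatting: FormatStatus

    def ready_for_transclusion(self) -> bool:
        # A page is ready only when both scales are satisfied.
        return self.text in TRUSTED_TEXT and self.formatting is FormatStatus.FORMATTED


# A digital-born page: the text is trusted, but it still needs formatting.
page = PageQuality(TextStatus.BORN_DIGITAL, FormatStatus.UNFORMATTED)
print(page.ready_for_transclusion())
```

With this split, the "text is correct, still needs formatting" case is simply one combination of the two values rather than an extra colour on a single scale.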

Thibaut (user:Zaran on Wikisource)

On 06/12/2013 01:35 PM, Andrea Zanni wrote:

On Wed, Jun 12, 2013 at 1:32 PM, billinghurst <billinghurst@gmail.com> wrote:
If you are talking about how we represent digitally prepared text with the
validation process, I would have no issue with the text being ripped and
having a bot run through and take it straight to level 4 (green), and
then redefining green to say validated, or digitally prepared text not
requiring validation.

At the same time, if someone proposes and generates a fifth colour to
represent digitally prepared text not requiring proofreading, then I will
be happy with that. It may make someone happier in being a truer
representation, but in the end to me it is a moot point. In the end, each
of those is a local community decision, though one that should be made in
consideration of how the other wikis interpret their processes.

Thanks for clarifying this.
I agree with you, and would welcome both solutions.

But a lot of wikisourcerors don't think this way,
so better to discuss :-)

Aubrey





_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l