Hi everybody,

Here is my attempt at giving my point of view while trying to summarize the discussion:

1. I think the role of Index: pages should be to present the *source* of a work. This holds whether the source is a scanned edition (as is most often the case at the moment) or a digital PDF (that is, one containing text rather than images), as is the case for most "digital-born" documents. I think it is good to have a clean separation between the original source and how Wikisource presents the work in the main namespace. Indeed, even though Wikisource tries to be as faithful as possible to the original content, the presentation in the main namespace very often differs from it in some way.

2. Ideally, the metadata about the source of a work (author, date of printing, etc.) should be located in Wikidata. But metadata related to proofreading (e.g. the proofreading level of each individual page), being specific to the mission of Wikisource, should be located in Wikisource. How to do this while keeping the interface simple (i.e. hiding the split from the user so that she doesn't have to go from Wikisource to Wikidata and back to Wikisource) is a valid and very important concern, but it is also beyond my current understanding of Wikidata and its integration into Wikimedia projects.

3. The current system, with 4 quality levels to represent the proofreading state of a page, is not sufficient to represent the diversity of proofreading scenarios. Indeed, there is a distinction to make between the *correctness* of the text and its *formatting*. In the case of a scanned edition which has been OCRed, we do need several passes before reaching a satisfactory level of confidence about the correctness of the text, as well as suitable formatting (proper use of the wikicode, etc.). For digital-born documents, however, as billinghurst said, we can automatically assume that the extracted text is correct, but that still doesn't mean that the text is correctly formatted and ready to be transcluded in the main namespace. Maybe we should add another level meaning "text is correct, still needs formatting"? Ideally, we should have two scales of quality levels: one dealing with the correctness of the text, and one dealing with its formatting. This would probably be too heavy and confusing though...

Thibaut (user:Zaran on Wikisource)

On 06/12/2013 01:35 PM, Andrea Zanni wrote: