I think everything is doable; the problem is how to do it without cluttering the interface while keeping things simple.

Some levels might be redundant, and we could take this chance to consider whether they are really necessary.

Some proposed changes:
- Proofread page levels: "Unused", "Proofread", "Proofread with format", "Validated" (the "Unused" level would cover pages with no text, pages with only OCR text, and pages with irrelevant content).
- All pages would be created at the start with the extracted OCR text at the "Unused" level, so that search engines could also find our texts even if proofreading has not started yet.
- A checkbox list to tag pages: "damaged scan", "missing scan", "contains media" (image, score, etc.).
- Color codes: as now, plus orange for "Proofread with format". Tags would affect the color too: "damaged scan" would make the color half purple and half the color of the corresponding proofread level, and "contains media" could add a (black?) square around the page number.
- Proofread book levels should automatically match the lowest page level, plus two options: one to mark the book as "ready to export" and another to mark it as "digital source", which would bring all pages to the "Proofread" level.
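To make the book-level rule concrete, here is a minimal Python sketch. It assumes the four page levels form an ordered scale, and that marking a book as "digital source" raises every page to at least "Proofread" without lowering pages that are already higher. All class and method names are hypothetical, not existing ProofreadPage code.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class PageLevel(IntEnum):
    # Ordered so that min()/max() follow proofreading progress.
    UNUSED = 0
    PROOFREAD = 1
    PROOFREAD_WITH_FORMAT = 2
    VALIDATED = 3


@dataclass
class Page:
    number: int
    level: PageLevel = PageLevel.UNUSED
    # Checkbox tags such as "damaged scan", "missing scan", "contains media".
    tags: set[str] = field(default_factory=set)


@dataclass
class Book:
    pages: list[Page]
    ready_to_export: bool = False  # manual option, independent of page levels
    digital_source: bool = False

    def mark_digital_source(self) -> None:
        # Assumption: raise each page to at least PROOFREAD, never lower it.
        self.digital_source = True
        for page in self.pages:
            page.level = max(page.level, PageLevel.PROOFREAD)

    def level(self) -> PageLevel:
        # The book level is the lowest level among its pages
        # (an empty book is treated as UNUSED).
        return min((page.level for page in self.pages), default=PageLevel.UNUSED)


book = Book([Page(1, PageLevel.VALIDATED), Page(2, PageLevel.PROOFREAD), Page(3)])
print(book.level().name)  # UNUSED
book.mark_digital_source()
print(book.level().name)  # PROOFREAD
```

Taken literally, the minimum means a single blank page at "Unused" keeps the whole book at "Unused", so pages with no text or irrelevant content might need to be excluded from the calculation.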

As for the metadata interface, I keep thinking about it, and my impression is that we should start working from Template:Book [1] until we have a version that can be used across Commons, Index pages, and books without supporting scans (in the latter case it could be the same header template with an option to expand it to show the whole Template:Book).
That template might also need some coloring/reorganizing to reflect the Work/Edition distinction that Wikidata is bringing [2].
And if Lua makes it possible to read/write Wikidata, then a migration towards a Wikidata-powered Wikisource shouldn't be that far away.

Cheers,
Micru

[1] http://commons.wikimedia.org/wiki/Template:Book
[2] http://www.wikidata.org/wiki/Wikidata:Books_task_force


On Wed, Jun 12, 2013 at 8:48 AM, Andrea Zanni <zanni.andrea84@gmail.com> wrote:

On Wed, Jun 12, 2013 at 2:32 PM, Thibaut Horel <thibaut.horel@gmail.com> wrote:
3. The current system with 4 quality levels to represent the proofreading state of a page is not sufficient to represent the diversity of proofreading scenarios. Indeed, there is a distinction to make between the *correctness* of the text and its *formatting*. In the case of a scanned edition which has been OCRed, we do need several passes before reaching a satisfying level of confidence about the correctness of the text as well as a suitable formatting (proper use of the wikicode, etc.). For digital-born documents, however, as billinghurst said, we can automatically assume that the extracted text is correct, but that still doesn't mean that the text is correctly formatted and ready to be transcluded in the main namespace. Maybe we should add another level meaning "text is correct, still needs formatting"? Ideally, we should have two scales of quality levels: one dealing with the correctness of the text, and one dealing with its formatting. This would probably be too heavy and confusing, though...

I couldn't agree more. 
I think this could also be an opportunity to make tasks *smaller* and *clearer*
(in the direction of "microtasks", which are small, well-defined, simple contributions in crowdsourcing projects, e.g. GalaxyZoo, reCAPTCHA).

We could define some tasks as
* corrected the page
* proofread the text
* formatted the page
* validated the formatting
* (optional) added templates/links/annotations
* ...

We could even have qualifiers (all/part of the page, ...)

Is this idea crazy, or somewhat doable?

Aubrey

_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l




--
Etiamsi omnes, ego non