2014-11-24 19:51 GMT+01:00 Andrea Zanni <zanni.andrea84@gmail.com>:
Please keep up this good discussion :-)
We have the Wikisource contest on it.source right now,
so this mail is not going to be as long and detailed as I hoped.

I agree with Vigneron that the Survey report is a good start:
having written it myself, I'm well aware that it's not perfect, and that neither the questions nor the methodology were bulletproof.
Nonetheless, we worked hard on it, and many of the results are good and trustworthy.

I personally agree that VE integration with the Proofread extension is much needed:
if you think about it, Wikisource is the right place for the VE.
We could enormously simplify life for new proofreaders, since formatting on Wikisource is ten times more difficult than on Wikipedia.
I'm sure it's one of the best things to do right now.

At the same time, I agree with Lars (who always has great insights)
that we still need to make the big leap in digital libraries.
For me, one of the things Wikisource offers that nobody else does is *hypertextuality*,
and connections and integration with other projects such as Wikidata (hopefully) and Wikipedia.
I agree with him that algorithmic learning from Wikisource is an amazing idea: just think about having a Tesseract instance for every Wikisource, where Tesseract learns from every page the community proofreads... In a few years, we could even think about telling our Tesseract to distinguish between 12th-century and 19th-century Italian... We could have amazing open source OCRs to give to the world.

Another great accomplishment could be *giving back proofread OCR* to GLAMs: think about libraries (or the Internet Archive!) giving us ancient texts, and us giving them back a perfect DjVu or PDF with mapped text inside...
I'm sure we could have many GLAMs coming to us then :-)
Right now, we can give them back almost nothing apart from our HTML pages.


VE integration is important and could be very useful, but I'm not sure it's really that urgent for the Wikisources. In short: is VE really a priority?
On a Wikisource page there is far less formatting than in a Wikipedia article (but ‘touché’: the little formatting there is on Wikisource can be a pain in the a**).
VE still has some glitches and structural problems (my favorite: have you ever tried to put a ref with a template inside?). Should we wait before adapting it to Wikisource? (Or should we start right now, knowing it's a long road…)

A tool like the one Gallica (the website of the National Library of France) is testing seems more useful to me. You can test it here: https://ozalid.orange-labs.fr/ozviewer/

It's probably also worth looking further into a tool like http://tools.wmflabs.org/dicompte/ (which compares the Wikisource and Wiktionary dumps and lists the words in Wikisource that have no definition on Wiktionary), but working in real time and integrated into the edit interface.
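
To make the idea concrete, here is a rough sketch of the core check for a single page. It is not how dicompte is implemented; the function name `missing_words`, the regex, and the assumption that the Wiktionary entry titles are already loaded into a plain Python set are all mine, for illustration only.

```python
import re

# Runs of letters (accented ones included), allowing inner apostrophes or hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def missing_words(page_text, wiktionary_titles):
    """Return the words of page_text with no entry in wiktionary_titles.

    Hypothetical helper: wiktionary_titles is assumed to be a set of
    entry titles extracted beforehand (for example from a dump).
    Each word is reported once, in order of first appearance.
    """
    seen = set()
    missing = []
    for match in WORD_RE.finditer(page_text):
        word = match.group(0)
        key = word.lower()
        if key in seen:
            continue
        seen.add(key)
        # Accept the word as written or lowercased, so that a
        # sentence-initial "The" matches the entry "the".
        if word not in wiktionary_titles and key not in wiktionary_titles:
            missing.append(word)
    return missing


titles = {"the", "cat", "sat", "Paris"}
print(missing_words("The cat sat in Paris. The catte sat.", titles))
```

In the edit interface, something like this could run on the text of the page being proofread and highlight the returned words, which would be either OCR errors or words still missing from Wiktionary.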

Regards, ~nicolas