@Micru: of course, as you say, machine learning is the elephant in the room. I dream of something we could call "Wikisource as a platform": an environment with structured data and workflows, with APIs and tools to interact with humans and machines, both for input and for output. We could have OCR software that learns from our human proofreaders, and ideally we could even have OCR engines tailored to specific centuries or types of books. We could use machine learning to look for citations within books (for example, other cited books or authors).[1] This could greatly improve our library: on Internet Archive or Google Books there are millions of books just waiting for us to make them readable and accessible and, of course, to connect them to Wikipedia, to Wikidata, and to other Wikisource books.
IMHO, this is obviously important for GLAMs: we could be much more usable and easier to work with for libraries, archives and museums that want to upload their texts and books to Wikisource and make them part of our hyperlinked library. They could easily import into Wikisource, and export as well. Right now, this is impossible or at least very difficult.[2]
I'm not sure that all these features could go in just one project, but it's probably worth trying.
Aubrey
[1] I remember I explored the idea with Amir, but I couldn't follow up.
[2] To get all the data I needed from Wikisource books, I had to basically scrape the website.
On Mon, Mar 20, 2017 at 8:14 PM, Pine W wiki.pine@gmail.com wrote:
Glad to see this discussion. Pinging Alex Stinson in case he has any insights to add from a GLAM perspective.
Pine
On Mon, Mar 20, 2017 at 7:48 AM, David Cuenca Tudela dacuetu@gmail.com wrote:
On Sun, Mar 19, 2017 at 9:44 PM, Asaf Bartov abartov@wikimedia.org wrote:
what might be the significant role our unique advantage might play in 15 years?
There are some circumstantial aspects that might be relevant for Wikisource:
- With the emergence of machine learning, do volunteers really need to spend so much time formatting? Or will we be able to use our data to train a system to do some pre-formatting for us?
- With the existing flood of data, can we consider Wikisource as a relevancy setter? If a document has been transcribed/imported into Wikisource, is that enough to make the document relevant?
- Considering that not all libraries might have the resources to develop their own platform, can Wikisource be used as a neutral platform by external agents as a complement to their own infrastructure?
Regarding the 15-year time frame, it might be a good exercise to examine different scenarios. Yes, one could be to think big, to expect growth and a favorable environment. But what about the opposite? What if there are *fewer* people able to contribute?
Cheers, Micru
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l