Re: [Wikisource-l] Wikimedia Strategy

20 Mar 2017

@Micru: of course, as you say, machine learning is the elephant in the room.
I dream of something we could call "Wikisource as a platform":
an environment with structured data and workflows, with APIs
and tools for interacting with both humans and machines, for input as well
as output.
We could have OCR software that learns from our human proofreaders, and
ideally we could
even have OCR engines tailored to specific centuries or types of books.
We could use machine learning to look for citations within books (for
example other cited books or authors).[1]
This could greatly improve our library:
on Internet Archive or Google Books there are millions of books just
waiting for us to make them
readable and accessible and, of course, to connect them to Wikipedia, to
Wikidata, and to other Wikisource books.
IMHO, this is obviously important for GLAMs:
Wikisource could be much easier to use for libraries, archives and museums
that want to upload their texts and books and make them
part of our hyperlinked library.
They could easily import into Wikisource, and export from it as well.
Right now, this is impossible or at least very difficult.[2]
I'm not sure that all these features could go in just one project, but it's
probably worth trying.
Aubrey
[1] I remember I explored the idea with Amir, but I couldn't follow up.
[2] To get all the data I needed from Wikisource books, I had to basically
scrape the website.
On Mon, Mar 20, 2017 at 8:14 PM, Pine W wiki.pine@gmail.com wrote:
...
Glad to see this discussion. Pinging Alex Stinson for this discussion in
case he has any insights to add from a GLAM perspective.
Pine
On Mon, Mar 20, 2017 at 7:48 AM, David Cuenca Tudela dacuetu@gmail.com
wrote:
...
On Sun, Mar 19, 2017 at 9:44 PM, Asaf Bartov abartov@wikimedia.org
wrote:
...
what might be the significant role our unique advantage might play in 15
years?
There are some circumstantial aspects that might be relevant for
Wikisource:

With the emergence of machine learning, do volunteers really need to
spend so much time formatting? Or will we be able to use our data to train a
system to do some pre-formatting for us?

With the existing flood of data, can we consider Wikisource a relevancy
setter? If a document has been transcribed/imported into Wikisource, is
that enough to make the document relevant?

Considering that not all libraries might have the resources to develop
their own platform, can Wikisource be used as a neutral platform by
external agents as a complement to their own infrastructure?

Regarding the 15-year time frame, it might be a good exercise to examine
different scenarios. Yes, one could be to think big, to expect growth and a
favorable environment. But what about the opposite? What if there are
*fewer* people able to contribute?
Cheers,
Micru

Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
