Another thing I would be very happy to see in the future is greater, more systematic collaboration with the Internet Archive.
I'm convinced that it's a vital part of our ecosystem, because it makes easy a lot of things that would otherwise require skilled users (like creating a PDF/DjVu, OCR, etc.).
When I explain Wikisource, I always explain the Internet Archive first, teaching people to upload their files there, and then into Commons/Wikisource via the "IA Upload" tool.

This is why the Italian Wikisource community created a dedicated collection on IA:
https://archive.org/details/itwikisource

To create a collection, you need at least 50 items; then you can ask the Internet Archive to give you permission.
Right now, Alex brollo is writing some scripts that will allow better maintenance of the metadata;
we'll share them when they are ready.

If you create a collection, please tell us: we could even have a larger "Wikisource" collection that contains all the language collections.

Maybe this is a bit off-topic for the strategy, but I think it suggests ways to improve the collaboration between us and IA.

On Fri, Mar 24, 2017 at 10:50 AM, Andrea Zanni <zanni.andrea84@gmail.com> wrote:
Anyone else?
It would be very good to know the gist of the discussions/opinions you are having in your local Wikisource.

The Italian Wikisource for example is summing this up here:
https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Sources/Italian_Wikisource_Village_pump

For us, there is some disagreement about the idea and goal of being a "library" versus being a "typography" (a print shop): being a library is more focused on access and on services built upon texts (text analysis, text mining, searching, hyperlinking, annotation), while being a typography is focused on the transcribing/proofreading part, which needs a whole different level of tools and interface.

Maybe you are having a similar discussion?
Do you foresee a future "fork" of Wikisource into two different projects, or at least two different interfaces?

Aubrey

On Mon, Mar 20, 2017 at 10:54 PM, Andrea Zanni <zanni.andrea84@gmail.com> wrote:
@Micru: of course, as you say, machine learning is the elephant in the room.
I dream of something we could call "Wikisource as a platform":
meaning an environment with structured data and workflows where you can have APIs
and tools to interact with humans and machines, both for input and for output.
We could have OCR software that learns from our human proofreaders, and ideally we could
even have OCR tailored to specific centuries or types of books.
We could use machine learning to look for citations within books (for example, other cited books or authors).¹
This could greatly improve our library:
on Internet Archive or Google Books we have millions of books that are just waiting for us to make them
readable and accessible, and, of course, connect them to Wikipedia, to Wikidata, to other Wikisource books.

IMHO, this is obviously important for GLAMs:
we could become much more usable and accessible for libraries, archives and museums that want to upload their texts and books to Wikisource and make them part of our hyperlinked library.
They could easily import into Wikisource, and export as well.
Right now, this is impossible or at least very, very difficult.²

I'm not sure that all these features could go in just one project, but it's probably worth trying.

Aubrey

[1] I remember I explored the idea with Amir, but I couldn't follow up.
[2] To get all the data I needed from Wikisource books, I had to basically scrape the website.

On Mon, Mar 20, 2017 at 8:14 PM, Pine W <wiki.pine@gmail.com> wrote:
Glad to see this discussion. Pinging Alex Stinson in case he has any insights to add from a GLAM perspective.

Pine


On Mon, Mar 20, 2017 at 7:48 AM, David Cuenca Tudela <dacuetu@gmail.com> wrote:
On Sun, Mar 19, 2017 at 9:44 PM, Asaf Bartov <abartov@wikimedia.org> wrote:
what might be the significant role our unique advantage might play in 15 years?  

There are some circumstantial aspects that might be relevant for Wikisource:
- With the emergence of machine learning, do volunteers really need to spend so much time formatting? Or will we be able to use our data to train a system to do some pre-formatting for us?
- With the existing flood of data, can we consider Wikisource a relevance setter? If a document has been transcribed/imported into Wikisource, is that enough to make the document relevant?
- Considering that not all libraries might have the resources to develop their own platform, can Wikisource be used as a neutral platform by external agents as a complement to their own infrastructure?

Regarding the 15-year time frame, it might be a good exercise to examine different scenarios. Yes, one could be to think big, to expect growth and a favorable environment. But what about the opposite? What if there are *fewer* people able to contribute?

Cheers,
Micru


_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l


