Another thing I would be very happy to see in the future is a greater,
systematic collaboration with Internet Archive.
I'm convinced that it's a vital part of our ecosystem, because it makes
easy a lot of things that would otherwise have to be done by skilled users
(like creating a PDF/DjVu, running OCR, etc.).
When I explain Wikisource I always explain Internet Archive first,
teaching people to upload their files there, then into Commons/Wikisource
via the "IA Upload" tool.
This is why the Italian Wikisource community created a dedicated collection
on IA:
https://archive.org/details/itwikisource
To create a collection, you need at least 50 items, and then you can ask
Internet Archive to give you permission.
Right now, Alex Brollo is writing some scripts that will allow better
maintenance of the metadata;
we'll share them when they are ready.
If you create a collection, please tell us: we could even have a larger
"Wikisource" collection that contains all the language-specific collections.
Maybe this is a bit OT for the strategy, but I think it suggests a way to
improve the collaboration between us and IA.
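
To give a rough idea of where metadata maintenance on a collection like
itwikisource could start, here is a minimal sketch. It is NOT Alex's
script (those are not ready yet). It assumes the public Internet Archive
search endpoint at archive.org/advancedsearch.php with JSON output; the
function names and the list of "required" fields are my own choices for
illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Internet Archive search endpoint (JSON output assumed).
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(collection, fields=("identifier", "title"), rows=50, page=1):
    """Build a search URL listing the items of one IA collection."""
    params = [("q", f"collection:{collection}")]
    params += [("fl[]", field) for field in fields]  # one fl[] per returned field
    params += [("rows", rows), ("page", page), ("output", "json")]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def parse_docs(payload):
    """Extract the list of item records from a JSON search response."""
    return json.loads(payload)["response"]["docs"]


def missing_fields(docs, required=("title", "creator", "language")):
    """Map each item identifier to the required fields it lacks.

    The 'required' tuple is a hypothetical house rule, not an IA policy.
    """
    report = {}
    for doc in docs:
        absent = [field for field in required if not doc.get(field)]
        if absent:
            report[doc["identifier"]] = absent
    return report


def fetch_docs(collection, fields=("identifier", "title", "creator", "language")):
    """Download one page of results for a collection (needs network access)."""
    with urlopen(build_search_url(collection, fields=fields)) as response:
        return parse_docs(response.read().decode("utf-8"))
```

Calling missing_fields(fetch_docs("itwikisource")) would then give a
first list of items whose metadata needs attention. It only reads one
page of results and changes nothing on IA.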
On Fri, Mar 24, 2017 at 10:50 AM, Andrea Zanni <zanni.andrea84(a)gmail.com>
wrote:
Anyone else?
It would be very good to know the gist of the discussions/opinions you are
having in your local Wikisource.
The Italian Wikisource for example is summing this up here:
https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Sources/Italian_Wikisource_Village_pump
For us, there is a bit of a disagreement about the idea and goal of being
a "library" versus being a "typography": being a library is more focused
on access and on services built upon texts (text analysis, text mining,
searching, hyperlinking, annotation), whereas being a typography is about
the transcribing/proofreading part, which needs a whole different level of
tools and interface.
Maybe you are having a similar discussion?
Do you possibly see a "fork", in the future, of Wikisource into two
different projects, or at least two different interfaces?
Aubrey
On Mon, Mar 20, 2017 at 10:54 PM, Andrea Zanni <zanni.andrea84(a)gmail.com>
wrote:
@Micru: of course, as you say, machine learning is the elephant in the
room.
I dream of something we could call "Wikisource as a platform":
meaning an environment with structured data and workflows where you can
have APIs and tools to interact with humans and machines, both for input
and for output.
We could have OCR software that learns from our human proofreaders, and
ideally we could even have OCRs tailored for specific centuries or types
of books.
We could use machine learning to look for citations within books (for
example other cited books or authors). [1]
This could greatly improve our library:
on Internet Archive or Google Books there are millions of books just
waiting for us to make them readable and accessible and, of course, to
connect them to Wikipedia, to Wikidata, and to other Wikisource books.
IMHO, this is obviously important for GLAMs:
we could be much easier to use for libraries, archives and museums
that want to upload their texts and books into Wikisource, and make them
part of our hyperlinked library.
They could easily import into Wikisource, and export as well.
Right now, this is impossible or at least very, very difficult. [2]
I'm not sure that all these features could go in just one project, but
it's probably worth trying.
Aubrey
[1] I remember I explored the idea with Amir, but I couldn't follow up.
[2] To get all the data I needed from Wikisource books, I had to
basically scrape the website.
On Mon, Mar 20, 2017 at 8:14 PM, Pine W <wiki.pine(a)gmail.com> wrote:
Glad to see this discussion. Pinging Alex Stinson for this discussion in
case he has any insights to add from a GLAM perspective.
Pine
On Mon, Mar 20, 2017 at 7:48 AM, David Cuenca Tudela <dacuetu(a)gmail.com>
wrote:
On Sun, Mar 19, 2017 at 9:44 PM, Asaf Bartov <abartov(a)wikimedia.org>
wrote:
> what might be the significant role our unique advantage might play in
> 15 years?
>
There are some circumstantial aspects that might be relevant for
Wikisource:
- With the emergence of machine learning, do volunteers really need to
spend so much time formatting? Or will we be able to use our data to train a
system to do some pre-formatting for us?
- With the existing flood of data, can we consider Wikisource as a relevancy
setter? If a document has been transcribed/imported into Wikisource, is
that enough to make the document relevant?
- Considering that not all libraries might have the resources to
develop their own platform, can Wikisource be used as a neutral platform by
external agents as a complement to their own infrastructure?
Regarding the 15-year time frame, it might be a good exercise to
examine different scenarios. Yes, one could be to think big, to expect
growth and a favorable environment. But what about the opposite? What if
there are *fewer* people able to contribute?
Cheers,
Micru
_______________________________________________
Wikisource-l mailing list
Wikisource-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l