Hello,
I would like to know if any Wikisource community has moved forward to *automatically[1]* tag or annotate Wikisource texts or has any plans to do so.
Regards, Bodhisattwa
[1] (without manually adding annotation templates)
Hi Bodhi,
I'm interested to know the answer too. There is a lot of untapped potential there, but no real plans that I know of.
I'm cc-ing this to C. Scott Ananian, who gave a presentation on a related subject during the last Wikimania ( https://wikimania.wikimedia.org/wiki/2019:Transcription/A_general_annotation... not automated, though, but as far as I know this is the closest to your idea), and maybe you could provide some answers.
Cheers, ~nicolas
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Thanks Nicolas.
Nemo, for now, any persons, places, creative works, events, etc. that are mentioned in Wikisource texts and have Wikidata items.
Regards, Bodhisattwa
On Wed, Jul 29, 2020, 00:17 Federico Leva (Nemo) nemowiki@gmail.com wrote:
What kind of tagging and annotation do you have in mind?
Federico
Indeed a very interesting direction. I suggested it during the Wikisource Conference in 2016 (Vienna) as one distinguishing feature Wikisource *could* develop to differentiate itself from various other text repositories like Project Gutenberg, HathiTrust, etc. But WMF is not ready to allocate engineering resources to this, and there hasn't been a volunteer attempt, as far as I know.
A.
Asaf Bartov (he/him/his)
Senior Program Officer, Emerging Wikimedia Communities
Wikimedia Foundation https://wikimediafoundation.org/
Hi all,
Some tools exist outside Wikisource. For instance, I know of https://www.textrazor.com/demo, which finds the QIDs of words in a text (very good quality, but proprietary), and https://ordia.toolforge.org/text-to-lexemes (crude but open, based on SPARQL and limited to Lexemes); both can generate annotations on the fly. It's not easy (there are a lot of open questions), but I'm confident that some things are doable (at least as a proof of concept).
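To make the lookup step concrete, here is a minimal sketch of how a term could be matched to candidate Wikidata items. It is not how TextRazor or Ordia work internally; it only assumes the public Wikidata `wbsearchentities` API module. The helper names `build_search_url` and `parse_candidates` are hypothetical, and no network call is made here.

```python
from typing import List, Tuple
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for one term (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def parse_candidates(response: dict) -> List[Tuple[str, str]]:
    """Extract (QID, label) pairs from a decoded wbsearchentities response."""
    return [(hit["id"], hit.get("label", "")) for hit in response.get("search", [])]
```

Fetching the URL and decoding the JSON is left out on purpose. The hard part raised in this thread, choosing the right candidate among several, still needs either a human or a ranking heuristic on top of this.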
Cheers, ~nicolas
Do you mean that templates (or some other annotation syntax) will be added to wikitext, just not by humans?
Or suggested by software, and added to wikitext after being confirmed by humans?
Or not added to wikitext at all and stored separately somewhere?
Amir,
Coming from a community with few volunteers, I want a strategy that involves minimal human intervention in the tagging process, as we cannot afford to spread ourselves any thinner.
Your first option is closest to what I was trying to say. However, I understand that errors and ambiguities are possible, so some level of human checking will be needed anyway.
Personally, I would love the third option, but it looks like it requires more engineering than the other two; forgive me if I am wrong. So, considering the lack of initiatives in this area in the past, I would stick to the first one as the more practical approach for now.
Regards, Bodhisattwa
On Wed, Jul 29, 2020, 02:18 Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
Do you mean that templates (or some other annotation syntax) will be added to wikitext, just not by humans?
Or suggested by software, and added to wikitext after being confirmed by humans?
Or not added to wikitext at all and stored separately somewhere?
Bodhisattwa Mandal, 29/07/20 00:40:
Coming from a community with few volunteers, I want a strategy that involves minimal human intervention in the tagging process, as we cannot afford to spread ourselves any thinner.
Anything that goes into the wikitext has an implicit cost for humans. Already the templates we use for mere formatting and layout make it costly to do relatively simple things such as "give me a plain text version of the book", although they sometimes manage to make other things easier (such as making a decent HTML version which may also work in EPUB).
If the purpose of linking "persons, places, creative works, events" to Wikidata is to provide marginally faster information to the average reader browsing the Wikisource website, then you can do it with a JavaScript gadget similar to the various Wiktionary gadgets which we've had for a while, and take a probabilistic approach. If the purpose is *disambiguation* (and attendant features like structured search), then it's quite a different matter.
We probably can't afford an approach like METS/ALTO for any significant number of works. Nowadays people do all sorts of fancy things with IIIF but I'm not sure about detailed tagging at scale. The advantage of something like an IIIF manifest is that you can store it separately from whatever we have now, and "just" overlay it on the images (merging with the wikitext and HTML is going to be harder; compare efforts by Alex_brollo with hOCR/DjVu transfers).
You can probably imagine a relatively simple gadget to suggest possible Wikidata items to connect to some parts of an image and let the user confirm or not with a single click, then store the result in a JSON on a wiki page. If it's designed for the Page namespace, maybe it can even be enabled by default on a willing subdomain without disturbing casual users. If some focus is determined (say, "depicts"-like statements for illustrations in books), it might be possible to have some perceptible progress with an edit drive à la Wikisource birthday prize, to attract new users beyond the usual suspects and generate some enthusiasm.
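As a rough sketch of the suggest, confirm and store flow described above: the record layout, the field names and the `Suggestion`, `review` and `to_json` helpers are all hypothetical, and the page title and region values in any usage are made up. A real gadget would run in the browser and save the JSON to a wiki page; this only illustrates the data that would need to be kept.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Suggestion:
    page: str           # Page-namespace title the suggestion belongs to
    region: List[int]   # [x, y, width, height] of the image area, in pixels
    qid: str            # candidate Wikidata item proposed by the software
    status: str = "pending"  # "pending", "confirmed" or "rejected"


def review(suggestion: Suggestion, accepted: bool) -> Suggestion:
    """Record the user's single-click decision as a new Suggestion."""
    status = "confirmed" if accepted else "rejected"
    return Suggestion(suggestion.page, suggestion.region, suggestion.qid, status)


def to_json(suggestions: List[Suggestion]) -> str:
    """Serialize only the confirmed suggestions, ready to be stored as JSON."""
    confirmed = [asdict(s) for s in suggestions if s.status == "confirmed"]
    return json.dumps({"annotations": confirmed}, indent=2, sort_keys=True)
```

Keeping rejected suggestions out of the stored JSON is a design choice of this sketch. Storing them as well would let the software avoid proposing the same wrong item twice.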
Federico
On Tue, 28 Jul 2020 at 16:35, Bodhisattwa Mandal bodhisattwa.rgkmc@gmail.com wrote:
I would like to know if any Wikisource community has moved forward to automatically[1] tag or annotate Wikisource texts or has any plans to do so.
When I have suggested doing this - even manually - on en.Wikisource, I have been told that there is consensus against such annotation, and not to do it.
It is a great opportunity to miss :-(