Yes, you are absolutely right. Abstract Wikipedia will allow us to generate such corpora. The big question then will be whether these corpora are actually representative enough of the respective natural language. That I don't know, and that's something for researchers to figure out in 2023.
And yes, you touch on a very interesting point: Abstract Wikipedia could be used to create parallel corpora for language pairs where such corpora are currently missing. Their absence currently makes it harder to train machine translation for these languages or language pairs. It could even provide monolingual corpora, which can help with zero-shot and other neural machine translation approaches.
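As a toy sketch of the idea (this is not how Abstract Wikipedia or Wikifunctions actually represent or render content; the class, the sense IDs, the lexicons, and the function names below are all made up for illustration): if each statement is stored once in a language-independent form, rendering it into two languages yields an aligned sentence pair, and every word comes with its sense attached for free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractStatement:
    """Hypothetical language-independent statement made of sense IDs."""
    subject: str
    predicate: str
    obj: str


# Invented toy lexicons mapping sense IDs to surface words per language.
LEXICON = {
    "en": {"S_MARIE_CURIE": "Marie Curie", "S_DISCOVER": "discovered", "S_POLONIUM": "polonium"},
    "de": {"S_MARIE_CURIE": "Marie Curie", "S_DISCOVER": "entdeckte", "S_POLONIUM": "Polonium"},
}


def annotate(stmt, lang):
    """Return (word, sense_id) pairs: a sense-annotated rendering of the statement."""
    lex = LEXICON[lang]
    return [(lex[sense], sense) for sense in (stmt.subject, stmt.predicate, stmt.obj)]


def render(stmt, lang):
    """Render the statement as a plain sentence (fixed subject-verb-object order, an assumption)."""
    return " ".join(word for word, _ in annotate(stmt, lang)) + "."


def parallel_corpus(statements, src, tgt):
    """Pair up the renderings of the same statements in two languages."""
    return [(render(s, src), render(s, tgt)) for s in statements]


if __name__ == "__main__":
    stmt = AbstractStatement("S_MARIE_CURIE", "S_DISCOVER", "S_POLONIUM")
    print(parallel_corpus([stmt], "en", "de"))
    print(annotate(stmt, "de"))
```

Real renderers would of course need proper grammar (agreement, word order, morphology) per language; the fixed word order here only works because the example is trivial.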
Thus, the funny thing is that the mere existence of Abstract Wikipedia will likely shorten the time window in which Abstract Wikipedia is helpful, because it will likely speed up the development of more reliable machine translation solutions for many more languages. Abstract Wikipedia will potentially help make Abstract Wikipedia obsolete!
And that's a good thing.
(Note that this doesn't apply to Wikifunctions.)
On Mon, Jun 21, 2021 at 6:50 AM Andy borucki.andrzej@gmail.com wrote:
It is very hard to build a large or even medium-sized corpus of sentences in which each word is manually annotated with its sense. Abstract Wikipedia not only allows generating text in many languages from one source, but can also serve as a WSD (word-sense disambiguation) corpus, and in many languages at that. This would allow understanding natural text and operations like:
- translating from any natural language to a disambiguated form
- translating from this form to another natural language
and after the first step, this form would be very useful not only for translation.
I was interested in the Abstract Wikipedia project a year ago. Now I'm not up to date on the topic. At the Arctic Knot conference, will the project be looked at as a database of disambiguated knowledge?
_______________________________________________
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...