Yes, you are absolutely right. Abstract Wikipedia will allow us to generate
such corpora. The big question then will be whether these corpora are
actually representative enough of the respective natural languages. That I
don't know, and it's something for researchers to figure out in 2023.
And yes, you touch on a very interesting point: Abstract Wikipedia could be
used to create parallel corpora for language pairs where such corpora are
currently missing. The lack of these corpora currently makes it harder to
train machine translation for these languages and language pairs. It could
even provide monolingual corpora that help with zero-shot and other neural
machine translation approaches.
Thus, the funny thing is that the sheer existence of Abstract Wikipedia
will likely shrink the time window in which Abstract Wikipedia is helpful,
because it will likely speed up the development of more reliable machine
translation solutions for many more languages. In other words, Abstract
Wikipedia will help with potentially making Abstract Wikipedia unnecessary.
And that's a good thing.
(Note that this doesn't apply to Wikifunctions.)
On Mon, Jun 21, 2021 at 6:50 AM Andy <borucki.andrzej(a)gmail.com> wrote:
It is very hard to build a large, or even medium-sized, corpus of sentences
in which each word is manually annotated with its sense.
Abstract Wikipedia not only allows generating text in many languages from
one source, but can also serve as a WSD corpus, and moreover in many
languages.
This allows understanding natural text and supports operations like:
1) translation from any natural language into a disambiguated form
2) translation from this form into another natural language
And after step 1, this form will be very useful for much more than
translation.
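The two steps above can be sketched as pivot translation through a
language-independent form. This is only a toy illustration, not how Abstract
Wikipedia actually works: the sense identifiers, the lexicons, and the
word-by-word rendering below are all invented assumptions for the sake of the
example.

```python
# Hypothetical sketch: step 1 has already mapped a sentence to an abstract,
# disambiguated form (a list of invented sense IDs, loosely inspired by
# lexeme senses). Step 2 renders that form into any target language.

# Output of step 1: sense IDs instead of ambiguous surface words.
abstract_form = ["S:cat", "S:chase", "S:mouse"]

# Toy per-language lexicons mapping sense IDs to surface words.
LEXICONS = {
    "en": {"S:cat": "the cat", "S:chase": "chases", "S:mouse": "the mouse"},
    "de": {"S:cat": "die Katze", "S:chase": "jagt", "S:mouse": "die Maus"},
}

def render(abstract, lang):
    """Step 2: render the abstract form into one target language."""
    lexicon = LEXICONS[lang]
    return " ".join(lexicon[sense] for sense in abstract)

print(render(abstract_form, "en"))  # the cat chases the mouse
print(render(abstract_form, "de"))  # die Katze jagt die Maus
```

The point of the sketch: once the disambiguated form exists, adding a new
target language only requires a renderer for that language, not a separate
translation model per language pair.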
I was interested in the Abstract Wikipedia project one year ago. Now I'm
not up to date on the topic.
At the Arctic Knot conference, the project will be looked at as a database of
Abstract-Wikipedia mailing list -- abstract-wikipedia(a)lists.wikimedia.org