Hoi,
Sorry to state the obvious (for me): we datamine Wikipedias for
statements in Wikidata. Consequently, much information that could be /
should be in an article (in any and all languages) is reflected by
Wikidata. There is much that is not found in every language, and
information on some subjects can easily be provided from Wikidata as a
list (think awards, books published, etc.). The good news is that Wikidata
will provide lists for this purpose. For all other topics, such as date and
place of birth / death or where people studied, you have the benefit of
existing articles in a Wikipedia and the work done at Wikidata.
Hope this helps.
Thanks,
GerardM
On 24 August 2017 at 19:56, Leila Zia <leila(a)wikimedia.org> wrote:
Hi all,
==Question==
Do you know of a dataset we can use as ground truth for aligning
sections of one article in two languages? I'm thinking a tool such as
Content Translation may capture this data somewhere, or there may be
some other community initiative that has matched a subset of the
sections between two versions of one article in two languages. Any
insights/directions are appreciated. :) I'm not going to worry about
which language pairs we have this dataset for right now; the first
question is: do we have anything? :)
==Context==
As part of the research we are doing to build recommendation systems
that can recommend sections (or templates) for already existing
Wikipedia articles, we are looking at the problem of section alignment
between languages, i.e., given two languages x and y and two versions
of article a in these two languages, can an algorithm (with relatively
high accuracy) tell us which section of the article in language x
corresponds to which section of the article in language y?
Thanks,
Leila
--
Leila Zia
Senior Research Scientist
Wikimedia Foundation
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l