Re: [Wiki-research-l] ground truth for section alignment across languages

28 Aug 2017

Hoi,
Sorry to state the obvious (for me): we data-mine the Wikipedias for
statements in Wikidata. Consequently, much of the information that could or
should be in an article (in any and all languages) is reflected in
Wikidata. Much of it is not found in every language, and information
on some subjects can easily be provided from Wikidata as a list (think
awards, books published, etc.). The good news is that Wikidata will provide
lists for this purpose. For all other topics, like date and place of birth
or death, where people studied, etc., you have the benefit of
existing articles in a Wikipedia and the work done at Wikidata.

Hope this helps.
Thanks,
      GerardM

On 24 August 2017 at 19:56, Leila Zia <leila(a)wikimedia.org> wrote:

...
  Hi all,

 ==Question==
 Do you know of a dataset we can use as ground truth for aligning
 sections of one article in two languages? I'm thinking a tool such as
 Content Translation may capture this data somewhere, or there may be
 some other community initiative that has matched a subset of the
 sections between two versions of one article in two languages. Any
 insights/directions are appreciated. :) I'm not going to worry about
 which language pairs we have this dataset for right now; the first
 question is: do we have anything? :)

 ==Context==
 As part of the research we are doing to build recommendation systems
 that can recommend sections (or templates) for already existing
 Wikipedia articles, we are looking at the problem of section alignment
 between languages, i.e., given two languages x and y and two versions
 of article a in these two languages, can an algorithm (with relatively
 high accuracy) tell us which section of the article in language x
 corresponds to which section of the article in language y?

 Thanks,
 Leila

 --
 Leila Zia
 Senior Research Scientist
 Wikimedia Foundation
