[Wiki-research-l] ground truth for section alignment across languages

24 Aug 2017


Hi all,
==Question==
Do you know of a dataset we can use as ground truth for aligning
sections of one article in two languages? I'm thinking a tool such as
Content Translation may capture this data somewhere, or there may be
some other community initiative that has matched a subset of the
sections between two versions of one article in two languages. Any
insights or directions are appreciated. :) I'm not going to worry right
now about which language pairs such a dataset covers; the first
question is: do we have anything? :)
==Context==
As part of the research we are doing to build recommendation systems
that can recommend sections (or templates) for already existing
Wikipedia articles, we are looking at the problem of section alignment
between languages, i.e., given two languages x and y and two versions
of article a in these two languages, can an algorithm (with relatively
high accuracy) tell us which section of the article in language x
corresponds to which section of the article in language y?
Thanks,
Leila
--
Leila Zia
Senior Research Scientist
Wikimedia Foundation
