Hi,
I'm doing a project with the Content Translation team as part of my FOSS
OPW internship. Our goal is to understand what proportion of the pages in a
given wiki were translated from other Wikipedias.
As a first step, I need to create a list of articles in one language
(e.g. Hebrew, HE) that have corresponding articles in another language,
starting with English.
I wonder what the best way to create this list is. Possible approaches that
I thought of:
1. Using the MediaWiki API (as explored in the API Sandbox) with a script that calls it iteratively.
2. Using the Wikimedia Dumps (specifically wiki interlanguage link records).
3. Using the Wikidata dumps (specifically wikidatawiki-latest-langlinks).
Am I missing something? Which way is best for building the list, especially
when taking into account the possibility of inline interlanguage links?
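
To make approach 1 concrete, here is a rough sketch of the kind of iterative
script I have in mind. It assumes the standard MediaWiki query API with
generator=allpages and prop=langlinks, with lllang restricting the result to
a single target language. The endpoint URL, the User-Agent string and the
function names (build_params, extract_pairs, iter_pairs) are my own
placeholders, and I have not run this against a full wiki, so please treat
it as an illustration rather than a working tool.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Hebrew Wikipedia API endpoint; swap in another wiki as needed.
API_URL = "https://he.wikipedia.org/w/api.php"


def build_params(target_lang, cont=None):
    """Build query parameters for one batch of articles and their langlinks."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "allpages",
        "gapnamespace": "0",               # main (article) namespace only
        "gapfilterredir": "nonredirects",  # skip redirects
        "gaplimit": "500",
        "prop": "langlinks",
        "lllang": target_lang,             # only links to this language
        "lllimit": "500",
    }
    if cont:
        # Continuation values returned by the previous response.
        params.update(cont)
    return params


def extract_pairs(response):
    """Return (source title, target title) pairs found in one API response."""
    pairs = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages without a link to the target language have no "langlinks" key.
        for link in page.get("langlinks", []):
            pairs.append((page["title"], link["*"]))
    return pairs


def iter_pairs(target_lang="en"):
    """Walk all articles, yielding pairs until the API stops returning 'continue'."""
    cont = None
    while True:
        query = urllib.parse.urlencode(build_params(target_lang, cont))
        request = urllib.request.Request(
            API_URL + "?" + query,
            headers={"User-Agent": "langlink-list-sketch/0.1"},  # placeholder
        )
        with urllib.request.urlopen(request) as resp:
            data = json.load(resp)
        yield from extract_pairs(data)
        cont = data.get("continue")
        if not cont:
            break
```

Calling iter_pairs("en") should then yield (Hebrew title, English title)
pairs one batch at a time. I am not sure how this compares with the dumps in
speed for a whole wiki, which is part of why I am asking.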
Thanks for your help,
Neta