Hi,
I am working on the mass migration tools project
<https://www.mediawiki.org/wiki/Extension:Translate/Mass_migration_tools>
as part of Google Summer of Code. One part of the project is to
import old translations into the Translate extension.
We have finished a basic import that splits the old pages on double
newlines (\n\n) and does some further alignment based on h2 headers. We
are now looking at improving the alignment.
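
To illustrate the current approach, here is a simplified Python sketch. It
is not the actual project code: the function names (split_units,
group_by_h2, align) are made up for this example, and pairing units by
position within each h2 section is an assumption about how such an
alignment could work.

```python
import re
from itertools import zip_longest

# Wikitext level-2 header such as "== Title ==" (but not "=== Title ===").
H2_RE = re.compile(r"^==(?!=)(.+?)(?<!=)==$")


def split_units(text):
    """Split page text into units on double newlines, dropping empty ones."""
    return [u.strip() for u in text.split("\n\n") if u.strip()]


def group_by_h2(units):
    """Group units into sections; each h2 header starts a new section."""
    sections = [[]]
    for unit in units:
        first_line = unit.splitlines()[0].strip()
        if H2_RE.match(first_line):
            sections.append([])
        sections[-1].append(unit)
    return [s for s in sections if s]


def align(source_text, target_text):
    """Pair source and target units by position within each h2 section.

    Units without a counterpart are paired with None.
    """
    src = group_by_h2(split_units(source_text))
    tgt = group_by_h2(split_units(target_text))
    pairs = []
    for s_sec, t_sec in zip(src, tgt):
        pairs.extend(zip_longest(s_sec, t_sec))
    return pairs
```

With this sketch, a source section that has one more paragraph than its
translation ends with a (source_unit, None) pair, which is the kind of
mismatch a better alignment should handle.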
Has any work already been done on the subject mentioned? For each unit,
I would like to strip out all the linguistic content so that only the
bare markup is left. I could then compare the markup of the source and
target units and align them accordingly.
Are there any APIs available that already do this? Any guidance on how
to accomplish this task would be appreciated.
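
To make the idea concrete, here is a rough Python sketch of what I have in
mind. It is only an illustration under my own assumptions: the regular
expression covers a handful of common wikitext constructs (headers, list
markers, links, templates, bold/italic quotes, HTML-like tags) and is far
from a real parser, and the names skeleton, similarity and best_match are
hypothetical. The comparison uses SequenceMatcher from Python's standard
difflib module.

```python
import re
from difflib import SequenceMatcher

# Tokens treated as "bare markup"; everything else counts as linguistic content.
MARKUP_RE = re.compile(
    r"""
      ^=+ | =+$              # header delimiters at line start / line end
    | ^[*\#:;]+              # list and indentation markers at line start
    | \[\[ | \]\]            # internal link brackets
    | \{\{ | \}\}            # template braces
    | '{2,5}                 # italic / bold quote runs
    | </?[a-zA-Z][^>]*>      # HTML-like tags such as <ref> or <br />
    """,
    re.MULTILINE | re.VERBOSE,
)


def skeleton(unit):
    """Return the sequence of markup tokens in a unit, dropping the text."""
    return MARKUP_RE.findall(unit)


def similarity(source_unit, target_unit):
    """Score how closely two units' markup skeletons match (0.0 to 1.0)."""
    matcher = SequenceMatcher(None, skeleton(source_unit), skeleton(target_unit))
    return matcher.ratio()


def best_match(source_unit, target_units):
    """Return the index of the target unit whose markup is most similar."""
    scores = [similarity(source_unit, t) for t in target_units]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, "== History ==" and "== Histoire ==" reduce to the same
skeleton, so they score 1.0 against each other even though the text
differs. Units with no markup at all have empty skeletons and also match
each other perfectly, so a score like this would need to be combined with
position or length information to be useful.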
--
Warm Regards,
*Pratik Lahoti*
GSoC Intern | Wikimedia
User:BPositive <http://www.mediawiki.org/wiki/User:BPositive>