Hi,
I am working on the mass migration tools project as part of Google Summer of Code. One part of the project is to import old translations into the Translate Extension.
We have finished a basic import that splits the old pages on double newlines (\n\n) and does some further alignment based on h2 headers. We are now thinking of improving the alignment.
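To make the current approach concrete, here is a minimal sketch of that splitting step. It is in Python only for illustration and is not the project's actual code; the names split_units and group_by_h2 are hypothetical, and it assumes h2 headers are written as "== Title ==" on the first line of a unit.

```python
import re

# Matches a level-2 wikitext header such as "== Title ==" (but not "=== Sub ===").
H2 = re.compile(r"^==\s*([^=].*?)\s*==\s*$")

def split_units(wikitext):
    """Split page text into units on blank lines (two or more newlines)."""
    return [u.strip() for u in re.split(r"\n{2,}", wikitext) if u.strip()]

def group_by_h2(units):
    """Group units into sections, starting a new section at each h2 header."""
    sections = [[]]
    for unit in units:
        first_line = unit.splitlines()[0]
        if H2.match(first_line) and sections[-1]:
            sections.append([])
        sections[-1].append(unit)
    return [s for s in sections if s]
```

Grouping the source page and the old translation the same way gives section boundaries that can be matched up before aligning the units inside each section.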
Has any work already been done on the subject mentioned? For each unit, what I would like to do is strip out all the linguistic content and leave only the bare markup. Then I could compare the markup of the source and target units and align them accordingly.
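A rough sketch of what I have in mind, in Python only for illustration. The token pattern and the names skeleton and similarity are hypothetical and not part of any existing API; the pattern covers just a few common wikitext constructs, and it keeps whole HTML tags, so attribute text inside a tag is not stripped.

```python
import re
from difflib import SequenceMatcher

# Assumed, incomplete set of wikitext markup tokens: template and link
# brackets, bold/italic quotes, header markers, list prefixes at the
# start of a line, HTML-style tags, and pipes.
MARKUP = re.compile(
    r"\{\{|\}\}|\[\[|\]\]|'{2,5}|={2,6}|^[*#:;]+|</?[a-zA-Z][^>]*>|\|",
    re.MULTILINE,
)

def skeleton(unit):
    """Return the sequence of markup tokens in a unit, dropping all other text."""
    return MARKUP.findall(unit)

def similarity(source_unit, target_unit):
    """Score two units by how closely their markup skeletons match (0.0 to 1.0)."""
    matcher = SequenceMatcher(None, skeleton(source_unit), skeleton(target_unit))
    return matcher.ratio()
```

A source unit and its translation that use the same markup in the same order would score 1.0, so each target unit could be paired with the source unit that gives the highest score. Units with no markup at all also score 1.0 against each other, so plain paragraphs would still need another signal such as position or length.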
Are there any APIs available that already do this? Please guide me on how to accomplish this task.