I got it. o_O
No need for regex, lxml, pyquery or XSLT... the simplest Python parsing routines can read ABBYY XML and extract both the text and information about the text.
The goal was to get, with Python, both the plain text (the same text the wikisource server produces when it creates a new page from a djvu text layer) and some HTML formatting, in a format usable by VisualEditor. If you take a look at http://it.wikipedia.org/wiki/Utente:Alex_brollo/Sandbox you'll see in red only the words whose wordPenalty parameter is greater than 0 in the source ABBYY XML file.
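A minimal sketch of this kind of parsing, using only plain string methods, might look like the following. It is not the actual script behind the sandbox page. It assumes that each character sits in a charParams element, that wordPenalty is an integer attribute on that element, and that text lines are closed by </line>. The helper names (attr, chars, convert) and the red span markup are made up for illustration.

```python
from html import escape, unescape


def attr(tag, name):
    """Return the value of name="..." inside a start tag, or None."""
    key = " " + name + '="'
    start = tag.find(key)
    if start == -1:
        return None
    start += len(key)
    return tag[start:tag.find('"', start)]


def chars(xml):
    """Yield (start_tag, character) for each <charParams> element."""
    pos = 0
    while True:
        open_at = xml.find("<charParams", pos)
        if open_at == -1:
            return
        tag_end = xml.find(">", open_at)
        if tag_end == -1:
            return
        close_at = xml.find("</charParams>", tag_end)
        if close_at == -1:
            return
        yield xml[open_at:tag_end], unescape(xml[tag_end + 1:close_at])
        pos = close_at + len("</charParams>")


def convert(xml):
    """Return (plain_text, html); words with wordPenalty > 0 are red."""
    plain_lines, html_lines = [], []
    # Everything after the last </line> holds no text, so drop it.
    for chunk in xml.split("</line>")[:-1]:
        plain, html, word, suspect = [], [], "", False
        # The extra trailing space flushes the last word of the line.
        for tag, ch in list(chars(chunk)) + [("", " ")]:
            if ch.isspace():
                if word:
                    plain.append(word)
                    safe = escape(word)
                    if suspect:
                        safe = '<span style="color:red">' + safe + "</span>"
                    html.append(safe)
                word, suspect = "", False
            else:
                word += ch
                # Assumption: wordPenalty is an integer; missing means 0.
                suspect = suspect or int(attr(tag, "wordPenalty") or 0) > 0
        plain_lines.append(" ".join(plain))
        html_lines.append(" ".join(html))
    return "\n".join(plain_lines), "\n".join(html_lines)


# Hand-written fragment in the assumed layout, not real ABBYY output.
sample = (
    "<line><formatting>"
    '<charParams wordPenalty="0">H</charParams>'
    '<charParams wordPenalty="0">i</charParams>'
    "<charParams> </charParams>"
    '<charParams wordPenalty="36">w</charParams>'
    '<charParams wordPenalty="36">o</charParams>'
    "</formatting></line>"
)
plain, html = convert(sample)
print(plain)
print(html)
```

Splitting on </line> keeps the last word of one line from merging with the first word of the next, since the file may not carry a space character at the end of a line.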
Alex brollo (from it.wikisource)