Converting to UTF-code - Wikitech-l

3 Sep 2004


Hi,
After working out how to parse the text version of a GEMET list, I decided 
to also have a look at the HTML version, because the Russian, Bulgarian 
and Greek characters in the text version had become unreadable.
The HTML is readable, because those characters are written as numeric 
character references (e.g. &#1099;&#1096;) rather than as UTF-8. It can 
therefore be parsed, and eventually I could upload it to Wiktionary. The 
question is: how do I convert it to UTF-8?
A related question about UTF-8 conversion: is it possible to have a bot 
convert the non-UTF-8 content on en:wiktionary to UTF-8?
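
For illustration, here is a minimal Python sketch of the kind of conversion 
asked about above. It assumes the input contains decimal or hexadecimal 
numeric character references such as &#1099;, and it deliberately leaves 
named entities like &amp; untouched so that the surrounding HTML markup 
stays valid. The function name decode_numeric_refs is hypothetical and is 
not part of any existing bot or tool.

```python
import re

# Matches decimal (&#1099;) and hexadecimal (&#x44B;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_refs(text: str) -> str:
    """Replace numeric character references with the characters they name."""

    def _replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave surrogates and out-of-range values as they were, since
        # they cannot be encoded as UTF-8.
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)

    return _NUMERIC_REF.sub(_replace, text)


sample = "&#1099;&#1096;"
decoded = decode_numeric_refs(sample)   # Cyrillic text as a Python string
utf8_bytes = decoded.encode("utf-8")    # the same text as UTF-8 bytes
print(decoded, utf8_bytes)
```

If named entities should be decoded as well, the standard library function 
html.unescape handles both kinds of reference in one call.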
Thanks,
    GerardM