Gerard Meijssen wrote:
The HTML can be read, and the codes have been changed to be in the
pre-UTF format (e.g. ыш etc.). It can therefore be parsed, and eventually
I could upload it to Wiktionary. The question is: how do I convert it
to UTF-8?
You can use the software I wrote to convert some wikis to UTF-8:
http://mboquien.free.fr/wikiconvert-20040902.tar.gz
Don't rely on the "README" file, as it is completely outdated; I have to
rewrite it. The usage is simple:
./wikiconv -i input_file -o output_file -e encoding_of_the_input_file
It needs Qt and works fine under Linux. I have not tried it on other
operating systems.
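For anyone who cannot build a Qt program, the core of such a conversion can be sketched in a few lines of Python. This is not the wikiconv code, only an illustration of the same idea: decode the input with its original encoding, then write it back out as UTF-8. The function names convert_to_utf8 and convert_file are made up for this example, and the encoding name is assumed to be one that Python's codecs know (for instance "latin-1" or "cp1251").

```python
def convert_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the data is not valid in source_encoding,
    which is safer than silently writing damaged text.
    """
    text = data.decode(source_encoding)
    return text.encode("utf-8")


def convert_file(input_path: str, output_path: str, source_encoding: str) -> None:
    """Hypothetical file-level helper: read input_path, write UTF-8 to output_path."""
    with open(input_path, "rb") as source:
        data = source.read()
    with open(output_path, "wb") as target:
        target.write(convert_to_utf8(data, source_encoding))
```

A call such as convert_file("dump.html", "dump-utf8.html", "cp1251") would mirror the -i, -o and -e options above; the file names here are placeholders.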
A question about the UTF-8 conversion: is it possible to have a bot
convert the non-UTF-8 stuff to UTF-8 on en:wiktionary?
It is also possible with the program, but it would put the en wiktionary
in read-only mode for a few minutes; shaihulud is used to this kind of
operation. I use it quite often on fr:, but manually, to convert some
pages that contain many UTF-8 entities.
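As an illustration of the per-page case, here is a minimal Python sketch. It assumes that "entities" means HTML numeric character references such as &#1099; or &#x44B;, which is an interpretation and not something the program's documentation confirms. The function name replace_numeric_entities is hypothetical. Named entities such as &amp; are deliberately left alone.

```python
import re

# Matches decimal (&#1099;) and hexadecimal (&#x44B;) character references.
_NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def replace_numeric_entities(text: str) -> str:
    """Replace numeric character references with the characters they denote."""

    def _substitute(match: "re.Match[str]") -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave surrogates and out-of-range values untouched: they cannot
        # be encoded as valid UTF-8.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _NUMERIC_ENTITY.sub(_substitute, text)
```

Note that a reference such as &#60; would become a literal "<", which may change how the page's markup is interpreted, so the result is worth checking by hand before saving.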
Med