Dear Anivar:
There are four components
Thanks for the addendum - how important is the rendering engine in the scheme of things? Is work on that pretty much done or are there issues there too?
It is font-dependent. Conversion maps need to be prepared for each ASCII font in order to convert data encoded in that font to Unicode. Swathanthra Malayalam Computing's Payyans (http://wiki.smc.org.in/Payyans) is a tool developed for converting ASCII to Unicode easily for any Indic language by building a font map for each needed font. This tool helped Malayalam Wiktionary convert many copyright-expired books in non-standard encodings to Unicode. The popular Firefox extension Padma uses similar encoding conversion tables to display ASCII news websites in Unicode.
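To make the font-map idea concrete, here is a minimal sketch in Python. It is not Payyans' or Padma's actual code: the map entries below are invented for an imaginary ASCII font, and the function name ascii_to_unicode is only illustrative. The one design point it shows is that longer glyph sequences are matched before shorter ones, so a multi-character entry is not split into its parts.

```python
# Hypothetical font map for one imaginary ASCII font (entries are made up
# for illustration; a real map has to be built per font).
FONT_MAP = {
    "A": "\u0D05",         # MALAYALAM LETTER A
    "I": "\u0D15",         # MALAYALAM LETTER KA
    "e": "\u0D32",         # MALAYALAM LETTER LA
    "m": "\u0D3E",         # MALAYALAM VOWEL SIGN AA
    "Iv": "\u0D15\u0D4D",  # KA + VIRAMA, stored as a two-character sequence
}


def ascii_to_unicode(text: str, font_map: dict) -> str:
    """Convert font-encoded ASCII text to Unicode using a per-font map."""
    # Try longer keys first so "Iv" wins over "I".
    keys = sorted(font_map, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(font_map[key])
                i += len(key)
                break
        else:
            # No map entry: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(ascii_to_unicode("Iem", FONT_MAP))
```

A plain lookup like this is only the first step. Converters for Indic scripts typically also need to reorder some signs, since ASCII fonts tend to store glyphs in visual order rather than in the logical order Unicode expects; that step is left out of this sketch.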
So how do these work? Have they built a map from every single ASCII encoding/font pair (since this is some ugly hack) to the corresponding Unicode value? There must be thousands of ASCII encoding/font pairs, right? Is this even a viable option? Are there alternatives to this?
I don't think this will happen. There is a long history of lobbying CDAC for this from 2001 onwards, and nothing happened. CDAC made enough money by selling ASCII fonts (and still does), and they can't even think about giving them away under a FOSS license. And at frequent intervals they consume more government money to make yet another CD to ship with their FOSS project forks (such as Bhaathiya OO, IndiFox, etc.) plus these fonts. In the same way, most of the TDIL funding to CDAC for Indic language technology research either produces no output at all or the output is not released, even after TDIL's policy decision to release it under a FOSS license.
I can see the frustration of this - so in your opinion, an effort not worth undertaking? Assuming they were ready to use a FOSS license, are the fonts good enough to want to use?
Searching and sorting algorithms for Indic languages are in development and are not bug-free. Indic support is not yet available in most search solutions (including FOSS solutions like Lucene or Solr) because of the complex word-formation characteristics of these languages.
But if I understand correctly, this is *only* possible using Unicode encoding. Right?
Thank you, Anivar.
Best,
Gautam ________ http://social.prathambooks.org/