On 22 February 2011 22:29, Santhosh Thottingal santhosh.thottingal@gmail.com wrote:
I think you have some confusion on Unicode and Fonts. Let me try to clarify in simple words.
Yes - I did! And thank you for such a detailed response.
To see if I have understood this - there are three components:
1. Input. (Different types of keyboard layouts are used, but they are independent of the method of encoding - correct?)
2. Encoding and storing the input. (ASCII is the older method; I have heard of ISCII as well but do not know what that is. Unicode is the standard.)
3. Representing, visually for the human user, what has been input and encoded. (Fonts or typefaces, which are, to an extent, independent of the encoding method used.)
But I know that many people still use terms like "data in Unicode fonts", "data in xyz font", etc. This usage came into existence because, before Unicode was popular, most Indian publishers used a non-standard way of representing our data: storing English (or Latin/ASCII) data and changing the font's 'face' to Indian glyphs, a "fancy dress" hack. The letter "k" will be shown as Hindi "ka" with the help of a font, i.e. the data is still English, but what you "see" is Hindi.
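To make the "fancy dress" point concrete, here is a minimal Python sketch. It only inspects code points; the helper name `describe` is mine, and the choice of "k" for "ka" simply follows the example in the quoted text.

```python
import unicodedata

stored = "k"          # what a font-encoded document actually stores
intended = "\u0915"   # the Hindi letter the reader believes they are seeing

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe(stored))    # U+006B LATIN SMALL LETTER K
print(describe(intended))  # U+0915 DEVANAGARI LETTER KA
```

The font changes only the glyph drawn for U+006B; the stored character stays a Latin "k".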
So if I understand correctly, not only is the encoding in ASCII but the representation of that encoding is tied to a particular font (that was used for representation at entry?) and will only be represented properly when using that font? However, what I am trying to understand is whether there is consistency across the ASCII encoding? Will ka in Hindi be encoded in ASCII only one way or is there a linkage, that I do not understand, to the font used to represent it as well?
The reason I ask is because if ka in Hindi is always encoded the same way irrespective of the font used to represent it, then it should not be hard to build an ASCII to Unicode map of encoding that will only have to be done once for each language? Though something tells me I am way off on this assumption.
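A rough Python sketch of what such a map might look like. It assumes the worst case, where each font assigns Latin characters to Hindi letters in its own way; the font names and the assignments below are invented for illustration and are not taken from any real font.

```python
# Hypothetical per-font tables: names and assignments are made up.
FONT_MAPS = {
    "HypotheticalFontA": {"k": "\u0915", "K": "\u0916"},  # ka, kha
    "HypotheticalFontB": {"d": "\u0915", "[": "\u0916"},  # same letters, other keys
}

def to_unicode(text, font_name):
    """Convert font-encoded text to Unicode using that font's own table."""
    table = FONT_MAPS[font_name]
    # Characters with no entry in the table are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in text)

print(to_unicode("k", "HypotheticalFontA") == "\u0915")  # True
print(to_unicode("k", "HypotheticalFontB") == "\u0915")  # False: wrong table
```

If the assignments really do differ between fonts, the conversion needs one table per font rather than one per language, and it needs to know which font the data was typed in.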
This is true. Fonts exist for all scripts, but the variety and quality of the existing fonts vary. Availability of fonts under a FOSS-compatible license is also a problem. For a detailed list of Indic fonts with license info, see http://indlinux.org/wiki/index.php/IndicFontsList
Thanks, Santhosh. This is really useful. Also, are these screen or print ready fonts?
You are correct. I would say "fonts licensed under any FOSS license" instead of "free use/reuse".
Indeed. FOSS license is what I should have said.
In fact, the funds were spent (read: wasted) on the development of proprietary fonts by government agencies like CDAC. Fonts with free(dom) licenses were developed and maintained by FOSS developer communities.
*sigh* In your opinion, would there be any real benefit if they did license the ILDC series under a true FOSS license?
In Unicode encodings such as UTF-8, each Indic character is stored as multiple bytes, while an ASCII character is a single byte.
Ah. Okay. I understand now.
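A small Python sketch of the byte counts, assuming UTF-8 and UTF-16 as the Unicode encodings (the message above does not name one). The helper `byte_len` is mine.

```python
def byte_len(ch, encoding):
    """Number of bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))

ka = "\u0915"  # DEVANAGARI LETTER KA

print(byte_len("k", "ascii"))     # 1
print(byte_len("k", "utf-8"))     # 1
print(byte_len(ka, "utf-8"))      # 3
print(byte_len(ka, "utf-16-le"))  # 2
```

The same Hindi letter therefore takes two or three times the storage of a Latin letter, depending on the encoding chosen.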
This is not comparable, since search is not possible with the ASCII-font way of representing data. Since the data is not in Hindi, but we just "see" it as Hindi, one cannot do a search or any such data processing on that data.
If I understand, it is not possible to search within ASCII encoded text but this can be done in Unicode encoded text?
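A minimal Python sketch of the difference as I understand it. The legacy string "kml" is an assumption: it pretends an invented font draws k, m and l as the three letters of the Hindi word "kamal".

```python
unicode_text = "\u0915\u092E\u0932"  # the word "kamal" stored as Devanagari
legacy_text = "kml"                  # hypothetical font-encoded storage of the same word

query = "\u0915"  # search for the Hindi letter "ka"

found_in_unicode = query in unicode_text
found_in_legacy = query in legacy_text

print(found_in_unicode)  # True
print(found_in_legacy)   # False: the stored data contains only Latin letters
```

A search for the Hindi letter finds nothing in the font-encoded data, because no Hindi character was ever stored there.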
Thank you very much, Santhosh - I have learned a lot from this.
Best,
Gautam