Hey Everyone:
This one isn't directly connected with Wikimedia projects but is, IMHO, one element of any Wikimedia project in India - Unicode.
I'm trying to bring together some ideas as to why Unicode is important, what the upsides and downsides are. My initial thoughts:
1. While there are many ways to achieve a legal framework for inter-operable content (CC, GFDL, PD, or the Copyright Act amendment for the print impaired, etc.), there needs to be a technical framework for such interoperability as well.
2. Given that we publish in Indian languages, using Unicode fonts is the only way to achieve cross-platform interoperability, and it is a global standard.
3. Given India's push towards copyright reform for the print impaired, it is imperative that Unicode fonts be used in the creation of Indic content, because anything else is a huge barrier to conversion to print-friendly formats.
4. Unicode, being an open global standard, guarantees content accessibility in the future and ensures no proprietary font or vendor lock-in.
5. The limitation is the lack of high-quality and varied typefaces: OpenType Indic Unicode fonts that are optimised for both screen and print.
6. Given the importance of linguistic diversity to India's cultural heritage, it is imperative that greater attention is paid to the development of such fonts under licenses that allow for free re-use and for fixing issues in the fonts that might arise.
7. The Govt. should fund the open development of at least 5 such fonts for each of the 21 Constitutionally recognised languages and make these available not just for free, but under a free license to re-use and improve as well.
8. The GoI has recognised this and notified Unicode 5.1.0 as the de facto standard for all eGovernance projects. This standard needs to be adopted more widely, for all Government digital projects and for any software or content procurement as well.
Would love to hear your thoughts.
Thank you.
Best,
Gautam ________ http://social.prathambooks.org/
On 17 February 2011 11:29, Gautam John gautam@prathambooks.org wrote:
I'm trying to bring together some ideas as to why Unicode is important, what the upsides and downsides are. My initial thoughts:
A few other points that I read here: http://anandabazar-unicode.appspot.com/
- Data usage: Use of Unicode will significantly reduce bandwidth/storage
- Search (within a page/web search etc.)
Thank you.
Best,
Gautam ________ http://social.prathambooks.org/
On Thu, Feb 17, 2011 at 11:49 AM, Gautam John gautam@prathambooks.org wrote:
Data usage: Use of Unicode will significantly reduce bandwidth/storage
This is technically wrong if we are comparing with ASCII-encoded data. If you read the http://en.wikipedia.org/wiki/Unicode article, you can understand that the Indic letter അ (Unicode) requires more bytes than the single English letter "A". Each Unicode character is a multi-byte character, while in ASCII it is a single byte.
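To make this concrete, here is a quick Python check of the byte counts (a minimal sketch, assuming UTF-8, the most common Unicode encoding on the web; UTF-16 and UTF-32 give different counts):

print(len("A".encode("utf-8")))   # 1 byte for the ASCII letter
print(len("അ".encode("utf-8")))   # 3 bytes for U+0D05, MALAYALAM LETTER A

So Unicode Indic text does take more bytes per letter than legacy ASCII-hack data, though compression usually narrows the gap in practice.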
Search (within a page/web search etc.)
This is not comparable, since search is not possible with the ASCII-font way of representing data. Since the data is not in Hindi - we just "see" it as Hindi - one cannot do a search or any such data processing on that data.
-Santhosh
Hi Gautam,
I have some points to share, but got to go back to work now. Can I get back on this later?
- Sundar
"That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted." - George Boole, quoted in Iverson's Turing Award Lecture
On 17 February 2011 12:35, BalaSundaraRaman sundarbecse@yahoo.com wrote:
I have some points to share, but got to go back to work now. Can I get back on this later?
Sure, Sundar! No hurry.
Thank you.
Best,
Gautam ________ http://social.prathambooks.org/
On Thu, Feb 17, 2011 at 11:29 AM, Gautam John gautam@prathambooks.org wrote:
2. Given that we publish in Indian languages, using Unicode fonts is the only way to achieve cross-platform interoperability, and it is a global standard.
3. Given India's push towards copyright reform for the print impaired, it is imperative that Unicode fonts be used in the creation of Indic content, because anything else is a huge barrier to conversion to print-friendly formats.
4. Unicode, being an open global standard, guarantees content accessibility in the future and ensures no proprietary font or vendor lock-in.
I think you have some confusion between Unicode and fonts. Let me try to clarify in simple words. Unicode is an encoding standard: it says how a 'letter' is represented by a group of bits or bytes, and it ensures uniqueness for each of the letters across the thousands of languages in the world. Fonts are just "clothes" for this data - sometimes optimized for the web, sometimes for print, sometimes fancy. Data can exist without fonts too; the only thing is that one cannot "see" the data properly, or you see it naked (as question marks, squares, or raw code points, depending on your operating-system environment).
So if you say "using Unicode fonts for Indic content", it does not make sense: we cannot represent or "store" data in fonts. And when you say "Unicode fonts are the only way to achieve interoperability", it is wrong, since it is the encoding standard that makes interoperability possible.
Unicode data has no dependency on the font. The font is the user's choice, and it is on the reader's side.
But I know that many people still use the terms "data in Unicode fonts", "data in xyz font", etc. This usage came into existence because, before Unicode was popular, most Indian publishers used a non-standard way of representing our data: using English (or Latin/ASCII) data and changing the font's 'face' to Indian glyphs - a "fancy dress" hack. The letter "k" will be shown as the Hindi "ka" with the help of a font; i.e., the data is still English, but what you "see" is Hindi. Obviously the data cannot be presented to anybody without these "special clothes". If you get this data and don't have the associated font, what you see will be just some junk Latin characters. Many publishers created their own fonts with this technique, each in their own way. So to send some data to your friend, you need to tell him: hey, this data is in Sree font, this data is in Karthika font, etc. Even after Unicode became popular, only a very small percentage of publishers moved to Unicode, and the others still continue with ASCII font-dependent data.
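A small Python sketch to illustrate the hack (the byte sequence below is invented; every legacy font used its own arbitrary mapping):

# What actually sits in a legacy-font file is plain Latin text;
# only the special font makes it *look* like Hindi on screen.
legacy_data = "dsUnz"  # invented example of the stored bytes
# A search for real Devanagari text can never match, because the
# Devanagari codepoints simply are not present in the data:
print("\u0915" in legacy_data)  # False: DEVANAGARI LETTER KA is absent

This is also why search and other processing fail on such data, as noted earlier in the thread.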
If one uses Unicode, there is no need to mention the font. One can read the data using any good Unicode-compatible font of his/her choice.
So "data is in unicode encoding" is correct. "data is in unicode font" is wrong. "data can be viewed using any unicode compatible font" is correct. I hope it is clear.
5. The limitation is the lack of high-quality and varied typefaces: OpenType Indic Unicode fonts that are optimised for both screen and print.
This is true. Fonts exist for all scripts, but the variety and quality of the existing fonts varies. The availability of fonts licensed under FOSS-compatible licenses is also a problem. For a detailed list of Indic fonts with license info, see http://indlinux.org/wiki/index.php/IndicFontsList
6. Given the importance of linguistic diversity to India's cultural heritage, it is imperative that greater attention is paid to the development of such fonts under licenses that allow for free re-use and for fixing issues in the fonts that might arise.
You are correct. I would say "fonts licensed under any FOSS license" instead of "free use/reuse".
7. The Govt. should fund the open development of at least 5 such fonts for each of the 21 Constitutionally recognised languages and make these available not just for free, but under a free license to re-use and improve as well.
You got it. But history shows that such funding did not play much of a role in the development of the fonts listed here: http://indlinux.org/wiki/index.php/IndicFontsList. In fact, the funds were spent (read: wasted) on the development of proprietary fonts by government agencies like CDAC. Fonts with free(dom) licenses were developed and maintained by FOSS developer communities.
Thanks Santhosh Thottingal http://thottingal.in
On 22 February 2011 22:29, Santhosh Thottingal santhosh.thottingal@gmail.com wrote:
I think you have some confusion between Unicode and fonts. Let me try to clarify in simple words.
Yes - I did! And thank you for such a detailed response.
To see if I have understood this - there are three components:
1. Input (different types of keyboard layouts are used, but these are independent of the method of encoding - correct?)
2. Encoding and storing the input (ASCII is the older method - I have heard of ISCII as well, but do not know what that is - but Unicode is the standard).
3. Representing, visually, for the human user, what has been input and encoded (fonts or typefaces; these are, to an extent, independent of the encoding method used).
But I know that many people still use the terms "data in Unicode fonts", "data in xyz font", etc. This usage came into existence because, before Unicode was popular, most Indian publishers used a non-standard way of representing our data: using English (or Latin/ASCII) data and changing the font's 'face' to Indian glyphs - a "fancy dress" hack. The letter "k" will be shown as the Hindi "ka" with the help of a font; i.e., the data is still English, but what you "see" is Hindi.
So if I understand correctly, not only is the encoding in ASCII, but the representation of that encoding is tied to a particular font (the one used for representation at entry?) and will only be represented properly when using that font? However, what I am trying to understand is whether there is consistency across ASCII encodings. Will ka in Hindi be encoded in ASCII in only one way, or is there a linkage, that I do not understand, to the font used to represent it as well?
The reason I ask is because, if ka in Hindi is always encoded the same way irrespective of the font used to represent it, then it should not be hard to build an ASCII-to-Unicode encoding map that would only have to be done once for each language? Though something tells me I am way off on this assumption.
This is true. Fonts exist for all scripts, but the variety and quality of the existing fonts varies. The availability of fonts licensed under FOSS-compatible licenses is also a problem. For a detailed list of Indic fonts with license info, see http://indlinux.org/wiki/index.php/IndicFontsList
Thanks, Santhosh. This is really useful. Also, are these screen-ready or print-ready fonts?
You are correct. I would say "fonts licensed under any FOSS license" instead of "free use/reuse".
Indeed. FOSS license is what I should have said.
In fact, the funds were spent (read: wasted) on the development of proprietary fonts by government agencies like CDAC. Fonts with free(dom) licenses were developed and maintained by FOSS developer communities.
*sigh* In your opinion, would there be any real benefit if they did license the ILDC series under a true FOSS license?
Each Unicode character is a multi-byte character, while in ASCII it is a single byte.
Ah. Okay. I understand now.
This is not comparable, since search is not possible with the ASCII-font way of representing data. Since the data is not in Hindi - we just "see" it as Hindi - one cannot do a search or any such data processing on that data.
If I understand correctly, it is not possible to search within ASCII-encoded text, but this can be done in Unicode-encoded text?
Thank you very much, Santhosh - I have learned a lot from this.
Best,
Gautam
On 2/22/11, Gautam John gautam@prathambooks.org wrote:
On 22 February 2011 22:29, Santhosh Thottingal santhosh.thottingal@gmail.com wrote:
I think you have some confusion between Unicode and fonts. Let me try to clarify in simple words.
Yes - I did! And thank you for such a detailed response.
To see if I have understood this - there are three components:
1. Input (different types of keyboard layouts are used, but these are independent of the method of encoding - correct?)
2. Encoding and storing the input (ASCII is the older method - I have heard of ISCII as well, but do not know what that is - but Unicode is the standard).
3. Representing, visually, for the human user, what has been input and encoded (fonts or typefaces; these are, to an extent, independent of the encoding method used).
There are four components:

1. Input methods (the GoI-approved InScript layout, various popular layouts, transliteration keyboards, phonetic keyboards)
2. Encoding (Unicode)
3. Fonts (OpenType fonts, i.e. fonts supporting Unicode)
4. Rendering engines (these do the shaping of complex glyphs using the OpenType tables in the fonts, e.g. Pango in GNOME, HarfBuzz in KDE, ICU in OpenOffice and Java-based programs, Uniscribe in Windows, etc.)
But I know that many people still use the terms "data in Unicode fonts", "data in xyz font", etc. This usage came into existence because, before Unicode was popular, most Indian publishers used a non-standard way of representing our data: using English (or Latin/ASCII) data and changing the font's 'face' to Indian glyphs - a "fancy dress" hack. The letter "k" will be shown as the Hindi "ka" with the help of a font; i.e., the data is still English, but what you "see" is Hindi.
So if I understand correctly, not only is the encoding in ASCII, but the representation of that encoding is tied to a particular font (the one used for representation at entry?) and will only be represented properly when using that font? However, what I am trying to understand is whether there is consistency across ASCII encodings. Will ka in Hindi be encoded in ASCII in only one way, or is there a linkage, that I do not understand, to the font used to represent it as well?
ASCII is not like Unicode: it only covers Latin, not any other language's script. All over India, legacy, non-standard local-language "technologies" (ugly hacks) have gained deep roots. Local newspaper websites as well as publishing houses seem to use their own non-standard fonts. This means that documents and websites get tied to fonts. These fonts may or may not be freely available, and in some extreme cases may no longer be available at all. If you lose the font, you lose the content as well.
Ka in Hindi may be mapped to the position of A in one font and to the position of H in another, as per the convenience of the font developer.
The reason I ask is because, if ka in Hindi is always encoded the same way irrespective of the font used to represent it, then it should not be hard to build an ASCII-to-Unicode encoding map that would only have to be done once for each language? Though something tells me I am way off on this assumption.
It is font dependent. Conversion maps need to be prepared for each ASCII font to convert data encoded in it to Unicode. Swathanthra Malayalam Computing's Payyans (http://wiki.smc.org.in/Payyans) is a tool developed to convert ASCII to Unicode easily for any Indic language by building a font map for each needed font. This tool helped Malayalam Wiktionary convert many copyright-expired books in non-standard encodings to Unicode.
The popular Firefox extension Padma uses similar encoding-conversion tables to display ASCII news websites in Unicode.
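To make the mapping idea concrete, here is a minimal Python sketch of such a conversion (the glyph-to-codepoint pairs are invented for illustration; real tools like Payyans keep one table per legacy font and also handle matra reordering and conjuncts, which this character-by-character version ignores):

# Hypothetical legacy font: suppose it drew "d" as ക and "k" as ഖ.
FONT_MAP = {
    "d": "\u0D15",  # ക MALAYALAM LETTER KA
    "k": "\u0D16",  # ഖ MALAYALAM LETTER KHA
}

def to_unicode(legacy_text, font_map):
    # Replace each legacy character with its Unicode equivalent;
    # characters not in the map pass through unchanged.
    return "".join(font_map.get(ch, ch) for ch in legacy_text)

print(to_unicode("dk", FONT_MAP))  # -> കഖ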
This is true. Fonts exist for all scripts, but the variety and quality of the existing fonts varies. The availability of fonts licensed under FOSS-compatible licenses is also a problem. For a detailed list of Indic fonts with license info, see http://indlinux.org/wiki/index.php/IndicFontsList
Thanks, Santhosh. This is really useful. Also, are these screen-ready or print-ready fonts?
Each language community can answer this question best. In Malayalam we have both screen and print fonts, including one ornamental font.
You are correct. I would say "fonts licensed under any FOSS license" instead of "free use/reuse".
Indeed. FOSS license is what I should have said.
In fact, the funds were spent (read: wasted) on the development of proprietary fonts by government agencies like CDAC. Fonts with free(dom) licenses were developed and maintained by FOSS developer communities.
*sigh* In your opinion, would there be any real benefit if they did license the ILDC series under a true FOSS license?
I don't think this will happen. There is a long history of lobbying CDAC for this, from 2001 onwards, and nothing happened. CDAC made enough money by selling ASCII fonts (and still does), and they can't even think about giving them away under a FOSS license. And at frequent intervals they consume more government money to make yet another CD shipping their FOSS project forks (such as BharateeyaOO, IndiFox, etc.) plus these fonts. In the same way, most of the TDIL funding to CDAC for Indic language technology research produces no output at all, or the output never gets released, even after TDIL's policy decision to release such work under a FOSS license.
Each Unicode character is a multi-byte character, while in ASCII it is a single byte.
Ah. Okay. I understand now.
This is not comparable, since search is not possible with the ASCII-font way of representing data. Since the data is not in Hindi - we just "see" it as Hindi - one cannot do a search or any such data processing on that data.
If I understand correctly, it is not possible to search within ASCII-encoded text, but this can be done in Unicode-encoded text?
Searching and sorting algorithms for Indic languages are still in development and are not bug-free. Indic support is not yet available in most search solutions (including FOSS solutions like Lucene or Solr) because of the complex word-formation characteristics. Most existing applications try exact string-matching algorithms on Indic content, yielding only about 20% of results. Indic search should use language- and grammar-aware algorithms.
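One well-known piece of this problem is codepoint-level variation: visually identical text can be encoded in more than one way, so naive exact matching fails without Unicode normalization. A small Python illustration:

import unicodedata

# Two encodings of the same visible Devanagari letter क़ (QA):
precomposed = "\u0958"      # a single code point
combining = "\u0915\u093C"  # क plus the nukta sign
print(precomposed == combining)  # False: naive string match fails
print(unicodedata.normalize("NFC", precomposed) ==
      unicodedata.normalize("NFC", combining))  # True once normalized

And that is only the first layer; inflection and agglutination in Indic languages need the language-aware algorithms mentioned above.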
Thank you very much, Santhosh - I have learned a lot from this.
Best,
Gautam
Dear Anivar:
There are four components:
Thanks for the addendum - how important is the rendering engine in the scheme of things? Is work on that pretty much done or are there issues there too?
It is font dependent. Conversion maps need to be prepared for each ASCII font to convert data encoded in it to Unicode. Swathanthra Malayalam Computing's Payyans (http://wiki.smc.org.in/Payyans) is a tool developed to convert ASCII to Unicode easily for any Indic language by building a font map for each needed font. This tool helped Malayalam Wiktionary convert many copyright-expired books in non-standard encodings to Unicode. The popular Firefox extension Padma uses similar encoding-conversion tables to display ASCII news websites in Unicode.
So how do these work? They have built a map for every single ASCII encoding/font pair (since this is some ugly hack) and the corresponding Unicode value? There must be thousands of ASCII encoding/font pairs, right? Is this even a viable option? Are there alternatives to this?
I don't think this will happen. There is a long history of lobbying CDAC for this, from 2001 onwards, and nothing happened. CDAC made enough money by selling ASCII fonts (and still does), and they can't even think about giving them away under a FOSS license. And at frequent intervals they consume more government money to make yet another CD shipping their FOSS project forks (such as BharateeyaOO, IndiFox, etc.) plus these fonts. In the same way, most of the TDIL funding to CDAC for Indic language technology research produces no output at all, or the output never gets released, even after TDIL's policy decision to release such work under a FOSS license.
I can see the frustration in this - so, in your opinion, is this an effort not worth undertaking? Assuming they were ready to use a FOSS license, are the fonts good enough to want to use?
Searching and sorting algorithms for Indic languages are still in development and are not bug-free. Indic support is not yet available in most search solutions (including FOSS solutions like Lucene or Solr) because of the complex word-formation characteristics.
But if I understand correctly, this is *only* possible using Unicode encoding. Right?
Thank you, Anivar.
Best,
Gautam ________ http://social.prathambooks.org/
On 2/23/11, Gautam John gautam@prathambooks.org wrote:
Dear Anivar:
There are four components:
Thanks for the addendum - how important is the rendering engine in the scheme of things? Is work on that pretty much done or are there issues there too?
If your language has errors in complex glyph formation, it is a rendering-engine issue. You can find more here: http://en.wikipedia.org/wiki/Wikipedia:Enabling_complex_text_support_for_Ind...
Rendering engines like Pango evolved through more than 10 years of patching and correction by language communities, and they work pretty well in most Indic languages. HarfBuzz (http://www.freedesktop.org/wiki/Software/HarfBuzz) is a relatively new player in the field, taking code from Pango, Qt, and ICU. HarfBuzz-ng is used in the new Firefox 4 as its default rendering engine. The Uniscribe engine on Windows systems has supported Indic fonts from Windows XP SP2 onwards.
Let me give an example of why the rendering engine is important. For Latin-script wikis there is a PDF download option, and PediaPress can print them directly, but such options are not available for non-Latin wikis: character rendering is the block here. PediaPress's library fails to render non-Latin content because it does not make use of rendering engines.
If a teacher goes to an internet cafe to read a Wikipedia entry in an Indian language, she must ensure the following before reading/printing articles:

1. The operating system has Indic support
2. There is a font to display the content correctly
3. The browser renders it well

Only then can she read or print it in human-readable form. If there were a PDF export facility with server-side rendering, it would be easy for her to take it away or print it for her students.
Some time back, Santhosh posted his project pypdflib for testing on this list. It is a library for rendering PDFs from Indic-language wiki pages, and it uses Pango's functionality to generate the PDF. In short, rendering is a major roadblock in taking Wikipedia to the masses, and projects like Santhosh's are very important to fill this gap.
It is font dependent. Conversion maps need to be prepared for each ASCII font to convert data encoded in it to Unicode. Swathanthra Malayalam Computing's Payyans (http://wiki.smc.org.in/Payyans) is a tool developed to convert ASCII to Unicode easily for any Indic language by building a font map for each needed font. This tool helped Malayalam Wiktionary convert many copyright-expired books in non-standard encodings to Unicode. The popular Firefox extension Padma uses similar encoding-conversion tables to display ASCII news websites in Unicode.
So how do these work? They have built a map for every single ASCII encoding/font pair (since this is some ugly hack) and the corresponding Unicode value?
Yes. Payyans' wiki page has a howto for creating font maps.
There must be thousands of ASCII encoding/font pairs, right? Is this even a viable option? Are there alternatives to this?
This is the only viable option as of now. Most languages have around 10-20 popular fonts, and creating mapping tables for them is a big task, but if each language community contributes, it is manageable. The Padma project has already mapped many news-website fonts through the contributions of many people.
There is no other free alternative. By the way, document conversion is a big business, and many corporates are working in this area to provide solutions for companies and governments.
I don't think this will happen. There is a long history of lobbying CDAC for this, from 2001 onwards, and nothing happened. CDAC made enough money by selling ASCII fonts (and still does), and they can't even think about giving them away under a FOSS license. And at frequent intervals they consume more government money to make yet another CD shipping their FOSS project forks (such as BharateeyaOO, IndiFox, etc.) plus these fonts. In the same way, most of the TDIL funding to CDAC for Indic language technology research produces no output at all, or the output never gets released, even after TDIL's policy decision to release such work under a FOSS license.
I can see the frustration in this - so, in your opinion, is this an effort not worth undertaking? Assuming they were ready to use a FOSS license, are the fonts good enough to want to use?
In my opinion, efforts on this would be a waste of time and money. I don't believe in miracles from CDAC.
C-DAC Mumbai does have a history of GPL-licensing one font series, as part of their IndiX project: the Raghu series, by the late Prof. R.K. Joshi, a famous calligrapher and researcher in typefaces (http://en.wikipedia.org/wiki/R_K_Joshi). Rebranding his Jana series fonts as the Raghu series and GPLing them was his long-term effort from inside CDAC. But the fonts' OpenType tables need to be corrected to make them usable. We did this work for Malayalam, and Raghu-Malayalam is currently maintained by SMC. Anyway, it is an exceptional case.
Searching and sorting algorithms for Indic languages are still in development and are not bug-free. Indic support is not yet available in most search solutions (including FOSS solutions like Lucene or Solr) because of the complex word-formation characteristics.
But if I understand correctly, this is *only* possible using Unicode encoding. Right?
Yes. And problems and instability in the Unicode encoding itself also affect this. Some time back, GerardM's post was shared on this list: http://ultimategerardm.blogspot.com/2010/12/malayalam-enigma.html. Also read these thoughts on Unicode breakages from the Indic language communities: http://www.j4v4m4n.in/2009/11/07/unicode-or-malayalam/
Anivar
Thank you, Anivar.
Best,
Gautam ________ http://social.prathambooks.org/
Thank you, once again, Anivar.
Rendering engines like Pango evolved through more than 10 years of patching and correction by language communities, and they work pretty well in most Indic languages.
That's a relief!
For Latin-script wikis there is a PDF download option, and PediaPress can print them directly ... In short, rendering is a major roadblock in taking Wikipedia to the masses, and projects like Santhosh's are very important to fill this gap.
Indeed, and I now see how important it is to be able to abstract away OS dependency on this as well.
This is the only viable option as of now. Most languages have around 10-20 popular fonts, and creating mapping tables for them is a big task.
Wow! That is impressive. I was looking at it through a publishing lens - a print-publishing lens - and, from what little I know, publishers have used hundreds of fonts over the years. As important as online content is, there is also this mass of legacy content locked away that will never see the light of the internet unless such mapping tables are created.
There is no other free alternative. By the way, document conversion is a big business, and many corporates are working in this area to provide solutions for companies and governments.
Really? Would you be able to point me to any products or services that exist around this, please?
In my opinion, efforts on this would be a waste of time and money. I don't believe in miracles from CDAC.
*sigh*
C-DAC Mumbai does have a history of GPL-licensing one font series, as part of their IndiX project: the Raghu series, by the late Prof. R.K. Joshi, a famous calligrapher and researcher in typefaces.
I did not know this - thanks for pointing me to it.
Yes. And problems and instability in the Unicode encoding itself also affect this.
Thanks much Anivar. I really appreciate the answers and have learned much.
Best,
Gautam ________ http://social.prathambooks.org/
On Thu, Feb 24, 2011 at 8:51 AM, Anivar Aravind anivar.aravind@gmail.com wrote:
C-DAC Mumbai does have a history of GPL-licensing one font series, as part of their IndiX project: the Raghu series, by the late Prof. R.K. Joshi, a famous calligrapher and researcher in typefaces (http://en.wikipedia.org/wiki/R_K_Joshi). Rebranding his Jana series fonts as the Raghu series and GPLing them was his long-term effort from inside CDAC. But the fonts' OpenType tables need to be corrected to make them usable.
The 'GPL' that these fonts had was the 'General Public License', wasn't it? And not the GNU General Public License? I may be mistaken, though.
I have, in the past, been known to berate and sigh at C-DAC. In recent times I've arrived at the conclusion that there's no upside in thinking that TDIL/MinIT/C-DAC will eventually figure out that selling services around their products makes for a better business case than trying to hawk the products themselves. Or that LGPL-licensing their products might make it easier to build an application developer network around them.
Two things I meant to add:
1. The eGov standards body for India has recently notified Unicode 5.1.0 as the default standard for all eGov applications henceforth. (Sadly, their website is DoA - http://egovstandards.gov.in/) I am hopeful that this is the start of an initiative within Government that will spread.
A cache of their Approach Paper on Localization is here:
http://webcache.googleusercontent.com/search?q=cache:e28QCFBDI-cJ:egovstanda...
And a cache of their Character Encoding Standard For Indian Languages is here:
http://docs.google.com/viewer?a=v&q=cache:dYxnM6D7IMQJ:egovstandards.gov...
2. On input methods - is there any best practice, or even a Government notification, about an input standard?
Thank you.
Best,
Gautam ________ http://social.prathambooks.org/
Even though the Central Government has adopted Unicode as the encoding standard, the case is not the same with most State Governments. As far as I know, only a few state governments (Tamil Nadu, Punjab, Kerala, ...) have adopted the Unicode standard. Many are still in the ASCII era.
On input methods - is there any best practice, or even a Government notification, about an input standard?
I haven't seen any notification regarding this yet. But InScript is officially/unofficially adopted as the default input scheme. That is why it is part of the school syllabus in some states.
On input methods - is there any best practice, or even a Government notification, about an input standard?
In Tamil Nadu, the govt recommends and endorses the Tamil 99 keyboard layout.
On 24 February 2011 10:12, Shiju Alex shijualexonline@gmail.com wrote:
Even though the Central Government has adopted Unicode as the encoding standard, the case is not the same with most State Governments. As far as I know, only a few state governments (Tamil Nadu, Punjab, Kerala, ...) have adopted the Unicode standard. Many are still in the ASCII era.
Thank you, Shiju. A question - what makes Governments hesitant to move to Unicode as the encoding standard? Is it the tools they use? The workflow? A legacy issue - "we'll never be able to open our old files"?
I'm trying to map this space out - it's just that I am coming to see it as being really really important and want to try and do something here.
Also, the GoI is slowly making some noises about standards and openness etc., and I am hoping these are small points that can add up. For example, the TAGUP report: http://finmin.nic.in/reports/TAGUP_Report.pdf
From the Executive Summary:
"Chapter 6 points out some key design considerations for the solution architecture. The solution architecture should be designed to be flexible, reusable, extensible by stakeholders, and free of vendor lock-in. Given that many Government projects touch end-users such as citizens and firms, the Government should also play an active role in promoting banking and accessibility for all. This can form the basis of a platform for delivery of services. Chapter 7 addresses openness in implementation of Government IT projects. It describes the relevance of open standards, open data, and open source. The Government should not only be a consumer, but also strive to produce and facilitate open standards, open data, and open source. It also suggests the creation of an open source foundation for open sourcing software from Government projects.
Gives me a little hope.
Best,
Gautam ________ http://social.prathambooks.org/
In West Bengal there has been no Govt announcement regarding Unicode or a keyboard layout; our Govt is still in the ASCII era in every department. But it adopted Unicode through the Society for Natural Language Technology Research (NLTR) (http://www.nltr.org/), which released Baishakhi Linux 2.0 (http://www.nltr.org/SNLTR/index.php?option=com_content&task=view&id=118&Itemid=119), with built-in Unicode support for Indic languages like other Linux distros. The society has been seeded by the Govt. of West Bengal (Dept. of Information Technology) with initial funding and support. NLTR promotes Bengali computing through Unicode and the Baishakhi keyboard, which is quite similar to InScript Bengali.
But my personal experience is not very good: when I go to any govt office in West Bengal (Writers' Building, http://en.wikipedia.org/wiki/Writers%27_Building), they use Windows OS (pirated?) and ASCII Bengali interfaces like i-leap, Bijoy, etc. I don't know why they funded Baishakhi Linux 2.0 at all.
Dear Gautam,
Thanks for those links. I am aware of that but have not got enough time to read it yet. But are you sure it specified Unicode 5.1? I am curious because the new rupee symbol only got encoded in Unicode 6.0. Usually govt standards do not specify versions.
Anivar
On 24 February 2011 11:36, Anivar Aravind anivar.aravind@gmail.com wrote:
Thanks for those links. I am aware of that but have not got enough time to read it yet. But are you sure it specified Unicode 5.1? I am curious because the new rupee symbol only got encoded in Unicode 6.0. Usually govt standards do not specify versions.
Yep. What it states is:
"Unicode shall be the storage-encoding standard for all constitutionally recognised Indian Languages including English and other global languages as follows: Unicode 5.1.0 and its future up-gradation as reported by Unicode consortium from time to time."
Thank you.
Best,
Gautam ________ http://social.prathambooks.org/
Dear sankarshan
The initial license of the Raghu font series was confusing, but they later changed it to the GNU GPL, at the insistence of R.K. Joshi. The GNU GPL-licensed fonts were released as part of C-DAC Mumbai's IndiX project.
Anivar
Great discussion, but I wonder why I didn't see any real, easy, doable, inexpensive, quick-fix solution put forth that every Indian on the internet can begin using immediately to get around the Unicode vs. custom fonts issue.
So here's some from me:
1. Quick copy-paste, working with a net connection: http://www.google.com/transliterate/
2. Put a bookmarklet/favorite in your browser to type in an Indian language on any site; here too: http://t13n.googlecode.com/svn/trunk/blet/docs/help.html
3. Get these languages installed in 5 minutes on your machine so you can use them in any application from Notepad to chat: http://www.google.com/ime/transliteration/ - or sneak out the files for offline installation in your hometown using this neat hack: http://visibleblog.blogspot.com/2010/07/google-transliteration-ime-offline.h...
(I know our greatest angels won't care about this one because it only works on Evil Windows!)
4. An Indian-made alternative, both editor and input method: http://www.baraha.com/
Sincere apologies to the purists who might blow up like a volcano at either going to the Evil Google Lord for help, or daring to use transliteration instead of the so-easy-to-use-and-learn-if-only-you-spend-a-whole-day-on-it-and-get-an-indic-script-keyboard-from-God-knows-where-because-everyone-is-well-off-and-supposed-to-be-living-in-a-well-connected-metro-like-me.
If there is an open-source/cross-platform/creative-commons/kumbayaah solution where we don't have to mug up what to do when we forget what we are supposed to have mugged up - like the key combination for भ or त्र or ण or ळ - instead of just typing "bh" or "tra" or "na" or "l" and (if needed) backspacing twice to get a dropdown menu to choose what we truly want and moving on with our lives, or where we don't have to bend the laws of physics to get that elusive त्सा or perform computer साल्सा to have that split-letter stuff on our screen, then let's have it right here and right now, or let's get our hands dirty and make 'em, for the love of the Lord, instead of blasting the impure and corrupt Harijans who dare to take shortcuts for the sake of getting their work done on time.
(Disclaimer: only a little offense meant, with the hope of giving a kick and creating demand for real open-source solutions that can rival the private ones.)
Cheers, Nikhil Sheth +91-966-583-1250 Pune, India
Teach For India http://www.teachforindia.org/ Fellow, 2011-13
www.nikhilsheth.tk
Find me on: Twitter http://twitter.com/nikhiljs | Facebook http://www.facebook.com/nikjs | LinkedIn http://in.linkedin.com/in/nikhiljs | Google http://www.google.com/profiles/nikhil.js | RangDe http://www.rangde.org/investor/nikhilsheth
Join me on: Pune Documentary Club http://www.facebook.com/group.php?gid=138497769525636 | Let's Do it Pune http://www.facebook.com/pages/Lets-do-it-Pune/103857326346659 | Toastmasters in Pune http://www.facebook.com/pages/Toastmasters-in-Pune/148767611833746 | Wikipedia For Schools project http://education.wikia.com/wiki/Wikipedia_For_Schools_Offline_Edition
On Thu, Feb 24, 2011 at 9:15 AM, Nikhil Sheth nikhil.js@gmail.com wrote:
Great discussion, but I wonder why I didn't see any real, easy, doable, inexpensive, quick-fix solution put forth that every Indian on the internet can begin using immediately to get around the Unicode vs. custom fonts issue.
Getting things fixed at the 'plumbing' level is a hard climb, but it is worth it, since it would also ensure that offline devices can utilize what is technically correct (note that this does not necessarily imply that the above choices are 'incorrect'). Doing it using web technologies is one thing; doing it for the desktop, especially the offline desktop, is another part of the same coin.
We have come a long way since the days when one needed a recompiled Pango (the renderer) to even decently render Indic, or when input methods were flaky. Using standards, and developing code that complies with those standards, makes it easier for platforms across the spectrum to do Indic (and other complex scripts) well.
And, looking at all this discussion I now wish that I submitted a 'state of Indic' paper at some conference happening currently ;)
On 24 February 2011 09:21, sankarshan foss.mailinglists@gmail.com wrote:
And, looking at all this discussion I now wish that I submitted a 'state of Indic' paper at some conference happening currently ;)
Oh but you should! I would learn much from it and I am sure everyone else will learn something too!
Thank you.
Best,
Gautam ________ http://social.prathambooks.org/
This discussion is not at all about input methods. I do not know why a sudden comparison between transliteration and InScript came up here.
Looking at all the solutions you provided, let me ask one thing: have you actively contributed, or are you contributing, to any Indian-language Wikipedia? A survey of the input methods used by Indian Wikipedians would give a different answer.
Shiju
On 2/24/11, Nikhil Sheth nikhil.js@gmail.com wrote:
Great discussion, but I wonder why I didn't see any real, easy, doable, inexpensive, quick-fix solution put forth that every Indian on the internet can begin using immediately to get around the Unicode vs. custom fonts issue.
Hey, what you are mentioning is just transliteration input methods, and there are hundreds of such solutions, phonetic keyboards, etc. Transliteration keyboards existed years before Google and most of the solutions you pointed to. Take a look at the Firefox extensions and m17n-db to get a feel for it.
The discussion here was not only about input methods. It is about encoding, rendering, and fonts, which are the underlying technologies that enable input methods to work.
Also, just a friendly request to understand the thread first before knee-jerking with what you know.
Anivar Aravind
On Thu, Feb 24, 2011 at 9:36 AM, Anivar Aravind anivar.aravind@gmail.com wrote:
The discussion here was not only about input methods. It is about encoding, rendering, and fonts, which are the underlying technologies that enable input methods to work.
Also, just a friendly request to understand the thread first before knee-jerking with what you know.
The discussion started off with Unicode (Gautam was the OP, if I recall correctly), and then of course it progressed into a discussion of the various pieces that are complex or are works in progress towards a solution. Sometimes it isn't easy for everyone to see where it is going. That doesn't necessarily mean we cannot be excellent to each other.
On 24 February 2011 09:15, Nikhil Sheth nikhil.js@gmail.com wrote:
Great discussion, but I wonder why I didn't see any real, easy, doable, inexpensive, quick-fix solution put forth that every Indian on the internet can begin using immediately to get around the Unicode vs. custom fonts issue.
Sure - it's great to see that there are multiple input methods, some local and some on the Web, that allow for Unicode-encoded text, but I was actually coming at it from the legacy angle: there is a mass of 'digital' content that is not accessible - how do we make it accessible? - and there is great hesitancy in certain verticals to use Unicode on the basis of the 'lack of fonts' issue. I was trying to build a case for why Unicode is important and how we could increase the diversity of available fonts.
Thank you.
Best,
Gautam ________ http://social.prathambooks.org/
On Thu, February 24, 2011 9:15 am, Nikhil Sheth wrote:
Great discussion, but I wonder why I didn't see any real, easy, doable, inexpensive, quick-fix solution put forth that every Indian on the internet can begin using immediately to get around the Unicode vs. custom fonts issue.
Nikhil, all those tools you mentioned are transliteration-based input methods. And *proprietary* solutions.
But on any modern free desktop distribution - whether it is Ubuntu, Debian, Fedora, or any other distro - you have lots of input methods to choose from. All of these get installed in the default installation itself, and you can choose a transliteration-based input method, InScript, or whichever input method fits our "laziness-to-learn".
And if you are too lazy to learn any key combinations, for that too we have offline, desktop-based solutions which can make suggestions, give a drop-down list of alternatives, etc. (see http://thottingal.in/blog/2008/10/27/swanalekha-m17n-based-input-method-for-...). Just try the various input methods available for your language in a recent GNU/Linux distro. If none fits, let the FOSS developers know what exactly you are looking for - you will surely get a solution.
And it is worth spending some time to learn one standard keyboard layout for your language. You learned to write (using a pen) by spending some amount of time, right? In Kerala, students in the 7th standard learn typing in Malayalam. Once the language syllabi of other languages accept this model, I am sure every student will be good at writing and typing in their language.
Language proficiency will be measured as read+write+type+speak in future.
-Santhosh