[Wikimediaindia-l] Indic languages & unicode issues.
Santhosh Thottingal
santhosh.thottingal at gmail.com
Sun Dec 26 16:58:17 UTC 2010
On Sun, Dec 26, 2010 at 7:43 PM, CherianTinu Abraham
<tinucherian at gmail.com> wrote:
> Hi all,
> Happened to see Gerard's blog post on issues with Malayalam Wikipedia
> & Unicode upgrade to
> 5.1 http://ultimategerardm.blogspot.com/2010/12/malayalam-enigma.html
The issue is very complex. There were heated debates around this topic
on the Unicode Indic mailing list for years. In short, the issue is about
dual encoding: representing the same letter with two different sequences
of Unicode code points. Unicode's decision to bring the second encoding
into the standard was widely debated, and opposed mainly by the Malayalam
FOSS developer community. Unicode announced the dual encoding scheme,
without any canonical equivalence definition, in 2005, and reverted it
when scholars and developers opposed it.
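To make "two encodings for one letter" concrete, here is a minimal
Python sketch. It assumes the letters in question are the Malayalam
chillu letters, which Unicode 5.1 added as single code points (U+0D7A to
U+0D7F) alongside the older consonant + virama + ZWJ sequences. The
message itself does not list them, and the names CHILLU_PAIRS,
code_points and describe_pairs are purely illustrative.

```python
import unicodedata

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Illustrative table (an assumption, not taken from the message):
# atomic chillu code point (Unicode 5.1) -> base consonant of the
# older consonant + virama + ZWJ sequence.
CHILLU_PAIRS = {
    "\u0d7a": "\u0d23",  # CHILLU NN <- NNA
    "\u0d7b": "\u0d28",  # CHILLU N  <- NA
    "\u0d7c": "\u0d30",  # CHILLU RR <- RA
    "\u0d7d": "\u0d32",  # CHILLU L  <- LA
    "\u0d7e": "\u0d33",  # CHILLU LL <- LLA
    "\u0d7f": "\u0d15",  # CHILLU K  <- KA
}


def code_points(text):
    """Return the code points of text as a 'U+XXXX U+XXXX' string."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def describe_pairs():
    """Build one line per letter showing both encodings side by side."""
    lines = []
    for atomic, base in CHILLU_PAIRS.items():
        sequence = base + VIRAMA + ZWJ
        lines.append(
            f"{unicodedata.name(atomic)}: "
            f"{code_points(atomic)} vs {code_points(sequence)}"
        )
    return lines


print("\n".join(describe_pairs()))
```

Each printed line pairs one atomic code point with the three-code-point
sequence that a font may render identically, which is why the two are
easy to confuse on screen but distinct to software.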
The same proposal was later introduced again. The FOSS community and
language scholars protested it. The SMC community submitted a document
with 17 reasons why dual encoding should not be introduced; see
http://wiki.smc.org.in/images/2/23/SMC_Unicode_5.1.pdf
Similarly, a seminar conducted by the University of Kerala to discuss
the issue opposed the proposal; see
http://images2.wikia.nocookie.net/__cb20080131071131/fci/images/1/19/Report_of_Workshop.pdf
But the Unicode Technical Committee did not respond to either of these
reports and went ahead with the decision in Unicode 5.1. The dual
encoding scheme comes without any canonical equivalence definition.
Since the equivalence is not in the standard, I doubt whether operating
systems will implement it, let alone search engines.
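The practical effect of a missing canonical equivalence can be checked
directly. The following Python sketch assumes the two encodings are the
atomic chillu (here U+0D7B) and the older NA + virama + ZWJ sequence. It
checks whether any Unicode normalization form maps one onto the other,
and whether a plain substring search finds text typed the other way. The
helper fold_chillus and the FOLD table are hypothetical: one possible
application-level workaround, not something defined by the standard.

```python
import unicodedata

ATOMIC_N = "\u0d7b"                # MALAYALAM LETTER CHILLU N (Unicode 5.1)
SEQUENCE_N = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZWJ (older encoding)


def equal_under_normalization(a, b):
    """Return the normalization forms under which a and b compare equal."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return [
        form for form in forms
        if unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
    ]


# Hypothetical folding table: atomic chillu -> older sequence.
# Only one entry is shown; a real table would cover every chillu.
FOLD = {ATOMIC_N: SEQUENCE_N}


def fold_chillus(text):
    """Rewrite atomic chillus as sequences so both spellings compare equal."""
    return "".join(FOLD.get(ch, ch) for ch in text)


# The same word spelled both ways: A + VA + chillu N.
word_new = "\u0d05\u0d35" + ATOMIC_N
word_old = "\u0d05\u0d35" + SEQUENCE_N

# List of normalization forms that unify the two spellings.
print(equal_under_normalization(word_new, word_old))
# Whether a naive substring search for the old spelling finds the new one.
print(word_old in word_new)
# Whether the two spellings match after application-level folding.
print(fold_chillus(word_new) == fold_chillus(word_old))
```

Folding towards the older sequence is an arbitrary choice made here to
keep the table a simple per-character lookup; folding in the other
direction would serve equally well, as long as indexing and querying use
the same direction.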
Since the new encoding scheme was defined without backward
compatibility, that is, against Unicode's stability policy, the
Malayalam FOSS community decided not to implement it until the issues
are resolved, and is continuing with the Unicode 5.0 encoding. Malayalam
news portals also follow Unicode 5.0. Most of the tools from Google also
continue with the Unicode 5.0 based encoding. Malayalam Wikipedia
decided to go ahead with the latest version of Unicode. I resisted this
move on the discussion pages of Malayalam Wikipedia. The decision was
based on a vote by a small community of editors, not on proper
technical analysis.
Believe it or not, this is how the Malayalam wiki is rendered on a
Windows XP box in IE 8 with the OS default font:
http://thottingal.in/tmp/ml-wiki-winxp-IE8.png
I hope it gives some clue about the issue that Gerard mentioned.
Most of the discussion around the encoding issue happened in Malayalam
(on the Malayalam wiki or in blogs), but this English blog post might
summarize it:
http://www.j4v4m4n.in/2009/11/07/unicode-or-malayalam/
The discussion on Malayalam Wikipedia (content in the Malayalam
language): http://ml.wikipedia.org/wiki/വിക്കിപീഡിയ:പഞ്ചായത്ത്_(സാങ്കേതികം)/യൂണികോഡ്_5.1.0/ചർച്ച_(പഴയവ)
Thanks
Santhosh Thottingal
http://thottingal.in