Re: [Wikimediaindia-l] Indic languages & unicode issues.

29 Dec 2010

Ragib,
(copied Tamil Wiki list)
We've faced an issue similar to Bug #5948. Due to non-canonicalisation, there
are two articles with the same title in Tamil Wikipedia!
http://ta.wikipedia.org/wiki/%E0%AE%AA%E0%AF%87%E0%AE%9A%E0%AF%8D%E0%AE%9A%E...
(Tamil discussion)
- Sundar
"That language is an instrument of human reason, and not merely a medium for 
the expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture
----- Original Message ----
From: Ragib Hasan ragibhasan@gmail.com
To: Discussion list on Indian language projects of Wikimedia. 
wikimediaindia-l@lists.wikimedia.org
Sent: Wed, December 29, 2010 10:23:06 AM
Subject: Re: [Wikimediaindia-l] Indic languages & unicode issues.
I'm curious about the issue you are discussing ... is this similar to
a long-standing bug that affects the Bengali, Assamese, and Bishnupriya
Manipuri Wikipedias?
https://bugzilla.wikimedia.org/show_bug.cgi?id=5948
Ragib
User:Ragib on en and bn
--
Ragib Hasan, Ph.D
NSF Computing Innovation Fellow and
Assistant Research Scientist
Dept of Computer Science
Johns Hopkins University
3400 N Charles Street
Baltimore, MD 21218
Website:
http://www.ragibhasan.com
On Mon, Dec 27, 2010 at 1:29 AM, BalaSundaraRaman sundarbecse@yahoo.com wrote:
Unicode's decision to bring the second encoding into the
standard was widely debated and opposed, mainly by the Malayalam FOSS
developer community. Unicode announced the dual encoding scheme
without a canonical equivalence definition in 2005 and reverted it when
scholars and developers opposed it.
Sadly, you're not alone in this, Santhosh.
We have had canonical non-equivalence issues, and many more (similar to the
atomic chillu issue), in Tamil too. :(
Part of it was inherited from the umbrella-style ISCII model (done with good
intentions, I believe).
They put the abugidas of the Indo-Aryan languages and other systems like Tamil
(I haven't studied other writing systems enough to comment on them) into one
bucket, and we're still suffering for that. They cite stability when legitimate
changes are sought, but allow such breaking changes.
I'm sure you'll be working with the search engines to map the equivalent glyph
sequences. Also, please explore MediaWiki technical solutions such as adding
redirects or hidden text (though not ideal).
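As a rough illustration of what "mapping the equivalent sequences" could mean in practice, here is a minimal Python sketch. It assumes the Malayalam chillu case: each atomic chillu code point is rewritten as base consonant + VIRAMA + ZWJ before two strings are compared. The names `fold_chillus` and `CHILLU_TO_BASE` are hypothetical, not a MediaWiki or search engine API, and the table is my reading of the Unicode 5.1 chillu code points, so it should be checked against the code charts before any real use.

```python
import unicodedata

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Atomic chillu -> base consonant (assumed table; verify against the code chart).
CHILLU_TO_BASE = {
    "\u0d7a": "\u0d23",  # CHILLU NN -> NNA
    "\u0d7b": "\u0d28",  # CHILLU N  -> NA
    "\u0d7c": "\u0d30",  # CHILLU RR -> RA
    "\u0d7d": "\u0d32",  # CHILLU L  -> LA
    "\u0d7e": "\u0d33",  # CHILLU LL -> LLA
    "\u0d7f": "\u0d15",  # CHILLU K  -> KA
}


def fold_chillus(text):
    """Return a comparison key in which atomic chillus are spelled as sequences."""
    # NFC first, so canonically equivalent input is already unified.
    text = unicodedata.normalize("NFC", text)
    return "".join(
        CHILLU_TO_BASE[ch] + VIRAMA + ZWJ if ch in CHILLU_TO_BASE else ch
        for ch in text
    )


# The same word spelled with an atomic chillu N and with the sequence form.
atomic_spelling = "\u0d05\u0d35\u0d7b"
sequence_spelling = "\u0d05\u0d35\u0d28\u0d4d\u200d"

print(atomic_spelling == sequence_spelling)
print(fold_chillus(atomic_spelling) == fold_chillus(sequence_spelling))
```

A key like this would only be used for matching titles or search terms. The stored text stays as the editor typed it, which is why a redirect from one spelling to the other is the other obvious workaround.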

- Sundar
"That language is an instrument of human reason, and not merely a medium for
the expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture

----- Original Message ----
From: Santhosh Thottingal santhosh.thottingal@gmail.com
To: Discussion list on Indian language projects of Wikimedia.
wikimediaindia-l@lists.wikimedia.org
Sent: Sun, December 26, 2010 10:28:17 PM
Subject: Re: [Wikimediaindia-l] Indic languages & unicode issues.
On Sun, Dec 26, 2010 at 7:43 PM, CherianTinu Abraham
tinucherian@gmail.com wrote:
Hi all,
I happened to see Gerard's blog post on issues with Malayalam Wikipedia
and the upgrade to Unicode 5.1:
http://ultimategerardm.blogspot.com/2010/12/malayalam-enigma.html
The issue is very complex. There were heated debates on this topic
on the Unicode Indic mailing list for years. In short, the issue is
dual encoding: representing the same letter with two different sets of
Unicode character codes. Unicode's decision to bring the second encoding into
the standard was widely debated and opposed, mainly by the Malayalam FOSS
developer community. Unicode announced the dual encoding scheme
without a canonical equivalence definition in 2005 and reverted it when
scholars and developers opposed it.
The same proposal was then introduced again. The FOSS community and language
scholars protested. The SMC community submitted a document with 17
reasons why dual encoding should not be introduced; see
http://wiki.smc.org.in/images/2/23/SMC_Unicode_5.1.pdf
Similarly, a seminar conducted by the University of Kerala to discuss the
issue opposed the proposal; see
http://images2.wikia.nocookie.net/__cb20080131071131/fci/images/1/19/Report_...
But the Unicode Consortium did not bother to answer either of
these reports and went ahead with the decision in Unicode 5.1. The
dual encoding scheme has no canonical equivalence definition.
Since that is not in the standard, I doubt whether operating systems
will implement it, not to mention search engines.
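To make the missing canonical equivalence concrete, here is a minimal Python sketch (an illustration only, not taken from the Unicode or SMC documents). It assumes chillu N as the example: the atomic code point U+0D7B from Unicode 5.1 against the 5.0-style sequence NA + VIRAMA + ZWJ. For contrast it also checks the vowel sign O, which does have a canonical decomposition. The names `ATOMIC_CHILLU_N`, `SEQUENCE_CHILLU_N` and `equal_under` are made up for this sketch.

```python
import unicodedata

# Chillu N in the two encodings under discussion (illustrative constants).
ATOMIC_CHILLU_N = "\u0d7b"                # MALAYALAM LETTER CHILLU N (Unicode 5.1)
SEQUENCE_CHILLU_N = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZERO WIDTH JOINER (5.0 style)

# Contrast case: vowel sign O decomposes canonically into the E and AA signs.
KO_PRECOMPOSED = "\u0d15\u0d4a"           # KA + VOWEL SIGN O
KO_DECOMPOSED = "\u0d15\u0d46\u0d3e"      # KA + VOWEL SIGN E + VOWEL SIGN AA

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def equal_under(form, a, b):
    """Return True if a and b are identical after the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


chillu_results = {f: equal_under(f, ATOMIC_CHILLU_N, SEQUENCE_CHILLU_N) for f in FORMS}
vowel_results = {f: equal_under(f, KO_PRECOMPOSED, KO_DECOMPOSED) for f in FORMS}

print("chillu N pair:    ", chillu_results)
print("vowel sign O pair:", vowel_results)
```

Software that relies on standard normalization alone will unify the vowel sign pair but not the chillu pair, so matching the two chillu spellings has to be handled explicitly by whoever compares titles or search terms.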
Since the new encoding scheme is defined without backward
compatibility, that is, against Unicode's stability policy, the Malayalam FOSS
community decided not to implement it until the issues are resolved and
is continuing with the Unicode 5.0 encoding. Malayalam news portals also
follow Unicode 5.0. Most of Google's tools also continue with
the Unicode 5.0 based encoding. Malayalam Wikipedia decided to go ahead
with the latest version of Unicode. I had resisted this move on the
discussion pages of Malayalam Wikipedia. The decision was taken based
on voting by a small community of editors and not on proper
technical analysis.
Believe it or not, this is how Malayalam Wikipedia is rendered in IE 8 on a
Windows XP box with the OS default font:
http://thottingal.in/tmp/ml-wiki-winxp-IE8.png
I hope it gives some clue about the issue that Gerard mentioned.
Most of the discussion around the encoding issue happened in
Malayalam (on Malayalam Wikipedia or in blogs), but this English blog post
might summarize it:
http://www.j4v4m4n.in/2009/11/07/unicode-or-malayalam/
Discussions that happened on Malayalam Wikipedia (content in the Malayalam
language):
http://ml.wikipedia.org/wiki/%E0%B4%B5%E0%B4%BF%E0%B4%95%E0%B5%8D%E0%B4%95%E...)
Thanks
Santhosh Thottingal
http://thottingal.in

Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l

