Re: [Wikitech-l] Unicode equivalence

22 May 2010


Hi,
If you no longer have this thread, the background is that the
Malayalam projects want to use, and are using, Unicode 5.1 for five
characters that have composed code points in 5.1 but are decomposed
sequences in 5.0. The equivalences are:
Name         5.0 sequence        5.1 code point
CHILLU NN    0D23, 0D4D, 200D    0D7A
CHILLU N     0D28, 0D4D, 200D    0D7B
CHILLU RR    0D30, 0D4D, 200D    0D7C
CHILLU L     0D32, 0D4D, 200D    0D7D
CHILLU LL    0D33, 0D4D, 200D    0D7E
Somewhere in the server code, these are "normalized" to 5.1 for the ml
projects. Problem:
http://ml.wiktionary.org/w/index.php?title=%E0%B4%95%E0%B5%81%E0%B4%B1%E0%B5...
What you see happening is Interwicket trying to create the language
links. It adds the correct link(s), to the 5.0 forms on the other
wikts; then on the next scan of the language links tables it removes
the links as invalid, as the 5.1 titles don't exist on the other
wikts. This then repeats. (;-)
The problem is that it can't write the correct link, as the text
normalization "fixes" it.
The other direction isn't a problem: the links are to the 5.0 forms,
and when followed they are normalized to 5.1 in the title lookup, and
the page is found.
I'm not (yet) suggesting a particular solution, there are several
possibilities (from fairly decent to grotesque hackery ...). But would
someone tell me where in the server code this is done? I have not been
able to find it. Then I can understand a bit better, possibly just fix
it in the bot code somehow, or suggest a fix server-side.
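To make the bot-side option concrete, here is a rough Python sketch that
uses only the five equivalences in the table above. The function names
(to_51, to_50, same_title) are made up for illustration and are not
taken from Interwicket or from the server code; the idea is only that
the bot could compare titles in one canonical form, so that a link to
the 5.0 form is not judged invalid.

```python
# Explicit mapping taken from the table: 5.0 sequence -> 5.1 code point.
# All names below are hypothetical, for illustration only.
CHILLU_50_TO_51 = {
    "\u0D23\u0D4D\u200D": "\u0D7A",  # CHILLU NN
    "\u0D28\u0D4D\u200D": "\u0D7B",  # CHILLU N
    "\u0D30\u0D4D\u200D": "\u0D7C",  # CHILLU RR
    "\u0D32\u0D4D\u200D": "\u0D7D",  # CHILLU L
    "\u0D33\u0D4D\u200D": "\u0D7E",  # CHILLU LL
}

# Reverse mapping: 5.1 code point -> 5.0 sequence.
CHILLU_51_TO_50 = {atom: seq for seq, atom in CHILLU_50_TO_51.items()}


def to_51(title):
    """Replace each 5.0 chillu sequence with its 5.1 code point."""
    for seq, atom in CHILLU_50_TO_51.items():
        title = title.replace(seq, atom)
    return title


def to_50(title):
    """Replace each 5.1 chillu code point with its 5.0 sequence."""
    return title.translate(str.maketrans(CHILLU_51_TO_50))


def same_title(a, b):
    """Treat two titles as equal if they differ only in chillu encoding."""
    return to_51(a) == to_51(b)
```

With something like this, the scan of the language links tables could
call same_title() on the ml title and the title on the other wikt
instead of comparing the raw strings. It does nothing about the link
text itself being rewritten on save.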
Best Regards,
Robert
