On Sunday 30 March 2014 08:40 PM, Federico Leva (Nemo) wrote:
> As said, if you find <https://en.wiktionary.org/?oldid=23646739> or others offensive, please just edit and add {{context|vulgar}} or |obscene or whatever appropriate.
I'll try.
> Yes, all the problems you mention seem just to be consequences of this. The entry in question is <https://en.wiktionary.org/wiki/%E0%B4%BE>

Could you also check for rendering issues? In this image (image_077ebd23_d890a7083e967d92.png) the vowel sign appears after the letter, as മുംബ ൈ (without the space); the correct form is മുംബൈ. The images image_00896685_3f5db13f53a2f352.png, image_35628971_fbfc5b67d488e883.png, image_b5d2be0d_7223dc2282b35e15.png, etc. share a similar problem.


> Is there some generalisable learning here? Exclude letters? (Wiktionary experts should tell us if they're all tagged as such.) Only use "words" of at least two unicode characters?

Vowel signs should not start a captcha (or any of the words in a captcha), and no two vowel signs should appear side by side.

Vowel signs for Malayalam: ാ, ി, ീ, ു, ൂ, ൃ, െ, േ, ൈ, ൊ, ോ, ൗ, ൌ
Other signs (the same rules as above should be applied to these signs too): ്, ം, ഃ

Vowel letters should not be in the middle of a word (or a captcha).
Vowel letters: അ, ആ, ഇ, ഈ, ഉ, ഊ, ഋ, ഌ, എ, ഏ, ഐ, ഒ, ഓ, ഔ

(These rules possibly apply to other Indic languages as well, because their vowel letters and vowel signs behave very similarly to those of Malayalam.)
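
A rough sketch of the checks above in Python, in case it helps whoever implements the filter. The function and constant names are made up for illustration and are not taken from the captcha code. Two interpretations are assumed: "side by side" is checked within each group of signs separately (so a vowel sign followed by ം, as in മുംബൈ, is accepted), and a vowel letter is accepted only as the first character of a word. The vowel letters are taken as the assigned code points in U+0D05 to U+0D14.

```python
# Code points are spelled out so the combining signs stay readable.
VOWEL_SIGNS = {chr(cp) for cp in (
    0x0D3E, 0x0D3F, 0x0D40, 0x0D41, 0x0D42, 0x0D43,
    0x0D46, 0x0D47, 0x0D48, 0x0D4A, 0x0D4B, 0x0D4C, 0x0D57,
)}
# Virama, anusvara and visarga.
OTHER_SIGNS = {chr(cp) for cp in (0x0D4D, 0x0D02, 0x0D03)}
# Independent vowels; U+0D0D and U+0D11 are unassigned and skipped.
VOWEL_LETTERS = {
    chr(cp) for cp in range(0x0D05, 0x0D15) if cp not in (0x0D0D, 0x0D11)
}


def word_is_acceptable(word):
    """Apply the sign and vowel-letter rules to a single word."""
    if not word:
        return False
    # No sign of either group may start a word.
    if word[0] in VOWEL_SIGNS or word[0] in OTHER_SIGNS:
        return False
    for prev, cur in zip(word, word[1:]):
        # No two signs of the same group side by side (assumed reading).
        if prev in VOWEL_SIGNS and cur in VOWEL_SIGNS:
            return False
        if prev in OTHER_SIGNS and cur in OTHER_SIGNS:
            return False
        # Vowel letters only at the start of a word (assumed reading).
        if cur in VOWEL_LETTERS:
            return False
    return True


def captcha_is_acceptable(text):
    """A captcha passes only if every whitespace-separated word passes."""
    words = text.split()
    return bool(words) and all(word_is_acceptable(w) for w in words)
```

With this, the broken form from the image (a space before the vowel sign) is rejected because its second "word" starts with a vowel sign, while മുംബൈ passes.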

If possible, do not include Malayalam chillu characters [1] in captchas (at least for now), because since Unicode 5.1.0 they can be encoded in two ways. Normalization is enabled only on the ml wikis, and the bug to enable it on all Wikimedia wikis is still pending [2].

If possible, limit the Malayalam block to the range U+0D02 to U+0D57, because the other characters (except the chillu characters) are rarely used and probably not even mapped on keyboards. Within that range, U+0D3A, U+0D3D and U+0D4E should also be avoided, since they face similar uncertainty.
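
The character restrictions could be sketched like this in Python (again with hypothetical names, not the actual captcha code). It assumes every character of a word must fall in the allowed range, which also rejects ZERO WIDTH JOINER (U+200D) and so the older sequence encoding of chillus. As an extra assumption beyond what is said above, it uses the standard `unicodedata` module to skip unassigned code points and anything whose Unicode name contains "CHILLU"; the result of that part depends on the Unicode version shipped with the Python in use.

```python
import unicodedata

# Range suggested above, and the three code points to avoid inside it.
RANGE_START, RANGE_END = 0x0D02, 0x0D57
EXCLUDED = {0x0D3A, 0x0D3D, 0x0D4E}


def char_is_allowed(ch):
    """True if ch is an assigned, non-chillu character in the allowed range."""
    cp = ord(ch)
    if not (RANGE_START <= cp <= RANGE_END) or cp in EXCLUDED:
        return False
    name = unicodedata.name(ch, "")  # empty string for unassigned code points
    return bool(name) and "CHILLU" not in name


def word_uses_allowed_chars(word):
    """True if the word is non-empty and every character is allowed."""
    return bool(word) and all(char_is_allowed(ch) for ch in word)
```

A word containing an atomic chillu (U+0D7A to U+0D7F) fails the range check, and a word containing consonant + ് + ZWJ fails because of the ZWJ.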

[1]: http://unicode.org/versions/Unicode5.1.0/#Malayalam_Chillu_Characters
[2]: https://bugzilla.wikimedia.org/show_bug.cgi?id=45476