Hoi,
We do have languages that are not supported by CLDR locales. Does Unicode
on its own suffice?
Thanks.
GerardM
2009/1/10 Greg Hewgill <greg(a)hewgill.com>
2009/1/11 Gerard Meijssen <gerard.meijssen(a)gmail.com>:
How many characters are there according to your software in the word
Mbɔ́tɛ? The correct answer is 5.
Since I was working with the enwiki dump, I did not pay much attention
to internationalisation issues. I arbitrarily defined a "word" as the
Python regular expression: [\w\d]+
So, the answer to your question depends on how Python implements the
\w word-matching regular expression atom:
"When the LOCALE and UNICODE flags are not specified, matches any
alphanumeric character and the underscore; this is equivalent to the
set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus
whatever characters are defined as alphanumeric for the current
locale. If UNICODE is set, this will match the characters [0-9_] plus
whatever is classified as alphanumeric in the Unicode character
properties database."
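
As a rough illustration of what that means for the word in question, here
is a minimal sketch. It assumes Python 3, where str patterns are matched
with Unicode semantics by default and the re.ASCII flag restricts \w to
[a-zA-Z0-9_]. The helper names are made up for this example and are not
part of the script I ran on the dump, and skipping combining marks is only
an approximation of counting user-perceived characters.

```python
import re
import unicodedata

# The word from the question, spelled with escapes so the code points are
# unambiguous: U+0254 (open o) + U+0301 (combining acute accent), then
# "t" and U+025B (open e).
WORD = "Mb\u0254\u0301t\u025b"


def count_code_points(text):
    """Number of Unicode code points, which is what len() reports."""
    return len(text)


def count_base_characters(text):
    """Approximate count of user-perceived characters.

    Code points with a non-zero canonical combining class (such as
    U+0301) are treated as attached to the preceding base character.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


def find_words(text, ascii_only=False):
    """Split text into "words" using the regular expression [\\w\\d]+."""
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"[\w\d]+", text, flags)


if __name__ == "__main__":
    print(count_code_points(WORD))
    print(count_base_characters(WORD))
    print(find_words(WORD))
    print(find_words(WORD, ascii_only=True))
```

Under these assumptions len() reports 6 code points, while skipping the
combining accent gives the expected 5. Note also that \w does not match the
combining accent, so the pattern splits the word in two even with Unicode
matching.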
Greg Hewgill
http://hewgill.com
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l