On 6/9/05, Александр Сигачёв alexander.sigachov@gmail.com wrote:
Sal'
If the summary field text is in a multi-byte encoding and we take the first 150 characters of the comment, a strange character sometimes appears in the history.
Example: http://commons.wikimedia.org/w/index.php?title=Image:Venera-7_diagram.jpg&am...
(==Описание/Description== *ru:Межпланетная автоматическая станция «Венера-7»: 1 — панели солнечных батарей; 2 — датчик астроориентации; 3 — защитная �) ==============
I think it's the first byte of a truncated two-byte char. So we have to use "mb_substr" instead of "substr", don't we?
I haven't actually looked at the code, but it should be using the truncate() function from the Language class. However, the Language.php version of that function is not Unicode-aware, so this kind of thing will keep happening until bug 2069 is fixed (http://bugzilla.wikimedia.org/show_bug.cgi?id=2069).
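
To illustrate the effect described above, here is a minimal sketch in Python (not MediaWiki's PHP code, and not the actual truncate() implementation). It assumes the summary is stored as UTF-8 and is cut at a byte limit, which is what a byte-oriented function such as PHP's substr does. The helper names truncate_bytes_naive and truncate_utf8 are made up for this example.

```python
def truncate_bytes_naive(text: str, limit: int) -> bytes:
    """Cut the UTF-8 encoding at a byte offset, ignoring character boundaries."""
    return text.encode("utf-8")[:limit]


def truncate_utf8(text: str, limit: int) -> str:
    """Cut to at most `limit` bytes without splitting a multi-byte character."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    cut = limit
    # UTF-8 continuation bytes look like 10xxxxxx. If the first excluded byte
    # is a continuation byte, the cut falls inside a character, so back up
    # until the cut sits on a character boundary.
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut].decode("utf-8")


summary = "Венера-7"  # each Cyrillic letter takes two bytes in UTF-8

# Naive cut at 5 bytes keeps "Ве" plus the first byte of "н".
broken = truncate_bytes_naive(summary, 5).decode("utf-8", errors="replace")
# Boundary-aware cut drops the partial character instead.
clean = truncate_utf8(summary, 5)

print(repr(broken))
print(repr(clean))
```

The naive cut leaves a dangling lead byte, which is displayed as a replacement character, while the boundary-aware version simply returns a slightly shorter string. Truncating by characters instead of bytes (as mb_substr does in PHP) is the other option, at the cost of a variable byte length.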