Re: [Wikitech-l] OT: What language would you have done it in, and why?

1 Aug 2006


Chad Perrin wrote:
...
> PHP is supposedly planning to incorporate Python's ICU, which has some
> reasonable Unicode support for regexen, at some point in the future.

PHP already has Unicode regex support, because PCRE has had it for some 
time and PHP just bundles that. In fact, the simplest way to split a 
UTF-8 string by character in PHP 4 or 5 without mbstring is to do 
preg_match_all('/./u',...). MediaWiki uses this on occasion.
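For anyone who wants to try the preg_match_all trick, here is a minimal sketch. The helper name utf8_split_chars is made up for illustration and is not MediaWiki's code. The /s modifier is also an addition of mine: without it "." skips newlines, so they would silently drop out of the result.

```php
<?php
// Hypothetical helper (not from MediaWiki): split a UTF-8 string into an
// array of characters using only PCRE, with no mbstring dependency.
function utf8_split_chars($text)
{
    // /u tells PCRE to treat pattern and subject as UTF-8.
    // /s (my addition) lets "." match newlines as well.
    $count = preg_match_all('/./us', $text, $matches);
    if ($count === false) {
        // preg_match_all() fails on a subject that is not valid UTF-8.
        return false;
    }
    return $matches[0];
}

// "naïve 日本" written as explicit UTF-8 bytes so the file encoding does not matter.
$sample = "na\xC3\xAFve \xE6\x97\xA5\xE6\x9C\xAC";
$chars = utf8_split_chars($sample);
echo count($chars), "\n"; // 8 characters, although the string is 13 bytes long
```

Each array element is one code point, still encoded as UTF-8 bytes, so joining the array back together gives the original string.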
In PHP 6, they are moving to a 16-bit character type (not sure if it's 
UTF-16 or UCS-2), with a distinct binary string type. If "unicode 
semantics" are enabled, string literals will be Unicode by default, and 
all the usual string operations will be character-wise. I dare say this 
will cause some backwards compatibility problems for applications such 
as MediaWiki.
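To make the compatibility worry concrete, here is a rough sketch of how byte-wise and character-wise results differ for the same UTF-8 string. It uses only the byte-oriented functions of PHP 4 and 5 plus the PCRE trick from earlier in this message; it does not show any actual PHP 6 API, and the variable names are just for illustration.

```php
<?php
// "café" as explicit UTF-8 bytes; é is the two bytes C3 A9.
$word = "caf\xC3\xA9";

// Byte-wise view: what strlen() and substr() see today.
$byteLength = strlen($word);      // 5 bytes
$byteSlice  = substr($word, 0, 4); // "caf" plus half of é

// Character-wise view, emulated with the /u regex trick.
$charLength = preg_match_all('/./us', $word, $m); // 4 characters
$charSlice  = implode('', array_slice($m[0], 0, 4)); // the whole word

// The byte-wise slice is no longer valid UTF-8, so a /u match on it fails.
$byteSliceIsValid = (preg_match('//u', $byteSlice) === 1);

echo $byteLength, " bytes, ", $charLength, " characters\n";
echo $byteSliceIsValid ? "byte slice is valid\n" : "byte slice is broken UTF-8\n";
```

Code that relies on the byte-wise numbers (lengths, offsets, slices) would get different answers if the same calls became character-wise, which is the sort of breakage I would expect.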
PHP 6 requires ICU for its internal Unicode support, but I'm not sure to 
what extent they will be providing interfaces to ICU's more complex 
functions. Note that ICU is not "Python's ICU"; it's a library written 
by IBM, available natively for C, C++ and Java. There is a set of SWIG 
wrappers to bind the C++ API to Python.
-- Tim Starling
