Hmm, I don't think this thread is a good place for fighting language wars. Your POV is that PHP is a bad language; my POV is that PHP offers a reasonable trade-off between performance, standards support, cost, reliability, complexity and so on. However, I did some research, and it looks like the first prize in the category "Best support of Unicode in regular expressions" goes to Perl (Perl is cited as an example many times by Unicode.org). Unfortunately, PHP clearly sucks at this at the moment (even with the mbstring extension). Perhaps version 6 will change that. So it might make sense to rewrite the standalone component of a new diff engine in Perl.
Also, it looks like some people don't understand the punctuation issue. In Unicode, *standard* punctuation marks can have codes below 0xc0 as well as *above*. If you look at the code written by Tim Starling you'll see:

    // Punctuation and control characters
    if (ch < 0xc0) return false;

So basically the code above assumes that punctuation marks can only have codes below 0xc0, which is incorrect. On the other hand, if you type a left single quotation mark in MS Word, then a sequence of letters, then a right single quotation mark, only the sequence of letters will be spell-checked. Which is nice, and shows that the MS Word developers respect at least the Unicode standard. In other words, Word sees the difference between *all* Unicode punctuation marks and all Unicode letters. But you won't be able to repeat the same trick with MediaWiki. The current diff engine considers all punctuation marks with codes above 0xc0 to be letters and makes them part of a word.

Tim Starling says in his defence that high-numbered punctuation is rare and the fact that it is processed incorrectly won't do much damage. Well, to a certain extent it's a good defence, but if you accept it then you should also accept statements like "Opera is a rarely used browser, so if Wikipedia renders incorrectly in Opera it wouldn't do much damage" or "Supporting just IE and FF is sufficient". BTW, I noticed a few glitches in how Wikipedia is displayed in Opera.

Probably I've drunk too much open source Kool-Aid, but here is a good example of a proprietary product (manufactured by the so-much-hated Microsoft) obeying standards, and open source software that selectively supports standards. Someone suggested that I fix it. Well, I'm afraid I'm more on the bug-creating side of things :) In fact, I was expecting that "Unicode Nazis" would rush to fix it. Instead, all I got were "who cares" type responses.
I guess I should add more water to my Kool-Aid next time … Also, a small suggestion to all new participants in this thread – please state whether or not you like the feature in question (you can find a description in the original e-mail).
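P.S. To make the punctuation point above concrete, here is a rough Python sketch. It is not the diff engine's code: the function names (is_word_char_naive, is_word_char_unicode, split_words) are made up for illustration, and the way the naive version handles ASCII is my assumption, since only the `ch < 0xc0` line is quoted above. It compares a "everything from 0xc0 up is a letter" rule with a check based on Unicode general categories.

```python
import unicodedata


def is_word_char_naive(ch: str) -> bool:
    """Hypothetical stand-in for the threshold heuristic discussed above."""
    cp = ord(ch)
    if cp < 0xC0:
        # Assumption: below 0xC0 only ASCII letters and digits count as word characters.
        return cp < 0x80 and ch.isalnum()
    # Everything from 0xC0 upwards is treated as part of a word.
    return True


def is_word_char_unicode(ch: str) -> bool:
    """Treat letters (L*), marks (M*) and numbers (N*) as word characters."""
    return unicodedata.category(ch)[0] in ("L", "M", "N")


def split_words(text: str, is_word_char) -> list:
    """Group consecutive word characters into words, dropping everything else."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    # U+2018 and U+2019 are the left and right single quotation marks.
    sample = "\u2018hello\u2019 world"
    print(split_words(sample, is_word_char_naive))
    print(split_words(sample, is_word_char_unicode))
```

With the naive rule the quotation marks stay glued to "hello", so the quoted and the unquoted word look like two different words to a word-level diff. With the category-based rule they are split off, which is roughly the behaviour I described for Word's spell checker.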
On 10/06/06, Timwi timwi@gmx.net wrote:
Roman Nosov wrote:
I totally agree with Timwi – proper Unicode support is a requirement, not a feature. However, can someone tell me why PHP comes with no appropriate out-of-the-box support for such a vital feature in the 21st century?
Well, you see, that (the fact that PHP misses out on the most basic vital features, not just Unicode specifically) is kind of why no sensible 21st-century programmer would ever recommend PHP to start writing something new, and why the only people who choose or even recommend PHP are amateurs. Now obviously we are "stuck" with MediaWiki written in PHP, so we have to use it and live with its severe shortcomings...
For example, look no further than Wikipedia's current diff engine.
You have mentioned some properties of the current diff engine, but I'm afraid I don't see how any of them are in any way a problem or an issue.
Timwi
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l