On Mon, Jun 23, 2003 at 10:16:57PM -0700, Menchi Zh-En WP wrote:
From: "Menchi" ... 2) The following example is even more bizarre. No space was added and no character was changed, yet 3 sentences were reddened:
http://zh.wikipedia.org/w/wiki.phtml?title=Wikipedia:%E5%B8%B8%E8%A7%81%E9%9...
Such difficulties render what would be obvious vandalism in Western Wikipedias quite subtle in CJK Wikipedias.
Menchi Zh-En
OK, after six inspections of the Chinese wiki diff I gave you above (one copied and pasted into Notepad, another into Word), I finally found the difference. It is vandalism. I had honestly thought it was just a newbie wanting to perfect our Mandarin wording; I mentioned vandalism merely as a possibility.
The vandal changed the year given for the creation of the English WP from 2001 to 2006. It is either a moronic joke or Anglophobia.
Anyway, that just further proves the flaw in the current diff detection.
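It's word-based, so it doesn't work well with spaceless scripts: a whole sentence without spaces counts as one "word", so a one-character change marks the entire sentence as changed. The easiest thing would be to tell it that every CJK character is a word on its own. (There has to be some nice regular expression for CJK characters.)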
On the other hand, I'm not sure the Japanese and Koreans would be happy with that treatment of kana/hangul, so something more complex may be needed in the future.
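
To make the idea concrete, here is a minimal sketch in Python of "every CJK character is its own word". It is not the wiki's actual diff code. The names TOKEN_RE, tokenize and changed_tokens are made up for illustration, and the sample sentence in the usage note is invented rather than taken from the page in question. It assumes "CJK character" means Han ideographs in the ranges U+3400-U+4DBF and U+4E00-U+9FFF only; kana and hangul are left grouped like ordinary words, which sidesteps the concern above without settling it.

```python
import re
from difflib import SequenceMatcher

# Assumption: only Han ideographs (Extension A and the basic block) are
# split into single-character tokens. Everything else, including kana,
# hangul, digits and punctuation, is grouped into whitespace-delimited runs.
HAN = r"\u3400-\u4dbf\u4e00-\u9fff"
TOKEN_RE = re.compile(rf"[{HAN}]|[^\s{HAN}]+")

def tokenize(text):
    """Split text into tokens, one per Han ideograph or per non-Han run."""
    return TOKEN_RE.findall(text)

def changed_tokens(old, new):
    """Return (tag, old_tokens, new_tokens) for every non-equal diff region."""
    a, b = tokenize(old), tokenize(new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

With an invented pair of sentences such as changed_tokens("英文版成立于2001年。", "英文版成立于2006年。"), only the year token is reported as replaced, instead of the whole sentence.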