Why not just use Lucene (http://jakarta.apache.org/lucene/docs/index.html) or one of its many ports?
It's mature, stable, open source, actively developed, and widely considered a very fast, high-quality full-text indexing and search engine. It was started by an expert in full-text search (http://www.nutch.org/blog/cutting.html, http://lucene.sourceforge.net/publications.html).
And it does Unicode just fine.
Krzysztof Kowalczyk | http://blog.kowalczyk.info