Jaska Zedlik wrote:
Hi!
Several different apostrophe characters exist. Let's consider two of them:
U+0027 and U+2019. They have the same meaning, and both are acceptable
as apostrophes in English, for instance. The problem is that MediaWiki's
internal search distinguishes between these two apostrophes, so words
containing U+2019 can't be found with a query containing U+0027, and
vice versa.
Probably what we should be doing in this area is running text through
Unicode compatibility composition normalization as well as some other
character folding for punctuation forms where necessary.
(UtfNormal::toNFKC() will merge things like full-width Roman characters
but won't merge these related-but-not-quite-the-same punctuation forms.)
-- brion
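
To illustrate the point about normalization, here is a minimal sketch in Python (not MediaWiki's PHP; the standard unicodedata module stands in for UtfNormal::toNFKC(), and the helper names nfkc and fold_apostrophes are made up for this example). It shows that NFKC folds a full-width Latin letter but leaves U+2019 alone, so an explicit punctuation fold is still needed on top:

```python
import unicodedata

FULLWIDTH_A = "\uFF21"   # FULLWIDTH LATIN CAPITAL LETTER A
TYPEWRITER = "\u0027"    # APOSTROPHE
TYPOGRAPHIC = "\u2019"   # RIGHT SINGLE QUOTATION MARK


def nfkc(text):
    """Apply Unicode compatibility composition normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)


def fold_apostrophes(text):
    """NFKC plus an explicit, hand-picked fold of U+2019 to U+0027."""
    return nfkc(text).replace(TYPOGRAPHIC, TYPEWRITER)


print(nfkc(FULLWIDTH_A) == "A")                    # True: full-width form is merged
print(nfkc(TYPOGRAPHIC) == TYPEWRITER)             # False: NFKC keeps U+2019 as-is
print(fold_apostrophes("п\u2019ять") == "п'ять")   # True: explicit fold merges them
```

Which other punctuation forms would deserve the same treatment is left open here; the sketch folds only the single pair discussed in this thread.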
MediaWiki uses a search index for the internal search, and the index is
updated every time an article is saved. I have found that if we
override the function stripForSearch() in the language class with a
new function which replaces U+2019 with U+0027 for the search index,
the internal search begins to work properly regardless of which
apostrophe is given in the search query, U+0027 or U+2019. Admittedly,
the context is not highlighted if the apostrophe in the query differs
from the one in the result, but the search returns what is actually
needed.
The question is, if we override the stripForSearch() function in the
language class in such a way, won't this cause any problems?
The code of the override function is the following:
function stripForSearch( $string ) {
    // Fold U+2019 (UTF-8 bytes E2 80 99) into U+0027 before indexing.
    $s = str_replace( "\xE2\x80\x99", "'", $string );
    return parent::stripForSearch( $s );
}
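
Why such an override makes the search insensitive to the apostrophe variant can be shown with a toy model. The sketch below is in Python and is not how MediaWiki's search backend is implemented: fold, build_index and search are hypothetical names, and it assumes the same folding is applied both to the text being indexed and to the query.

```python
from collections import defaultdict


def fold(text):
    """Stand-in for the overridden stripForSearch(): map U+2019 to U+0027."""
    return text.replace("\u2019", "'").lower()


def build_index(pages):
    """Map every folded word to the set of page titles containing it."""
    index = defaultdict(set)
    for title, body in pages.items():
        for word in fold(body).split():
            index[word].add(title)
    return index


def search(index, query):
    """Return titles containing every folded word of the query."""
    words = fold(query).split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Two sample pages that differ only in the apostrophe they use.
pages = {
    "Page A": "сям\u2019я і дом",
    "Page B": "сям'я і сад",
}
index = build_index(pages)

# Either apostrophe in the query finds both pages.
print(search(index, "сям'я") == {"Page A", "Page B"})        # True
print(search(index, "сям\u2019я") == {"Page A", "Page B"})   # True
```

Because only the index keys and the query are folded, the stored page text keeps its original apostrophe, which is consistent with the highlighting mismatch mentioned above.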
We want to introduce this change for Belarusian, but I think Ukrainian
may have the same problem with the different apostrophes: U+0027 is
not considered a valid apostrophe there either, yet U+0027 (the
typewriter apostrophe) is the only one available on the majority of
Belarusian and Ukrainian keyboard layouts.
Thanks,
zedlik