On Fri, Aug 16, 2002 at 06:08:11PM -0700, lcrocker(a)nupedia.com wrote:
[...] Conversely, if you or others
want to customize the list itself, you can edit the file
"FulltextStoplist.php" in the code, and I can recompile MySQL
to use our customized one.
Actually, out of curiosity I wrote a small PHP script that fills a table
with information about which words are used how many times and in how many
articles. It needs some refining (it runs over all pages in the table cur and
uses the regexp \w+ to find words), but with it we could, for example,
determine the list of words that are used in more than 50% of the pages. Those are the
search words that MySQL ignores anyway. It would also give us a quick and
dirty way to determine the stopword list for the non-English Wikipedias.
I'm running it at the moment on the dump from May 20.
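
To make the idea concrete, here is a rough sketch of the counting step. It is written in Python rather than PHP and is not the actual script; the function names, the lowercasing of words, and the in-memory list of page texts (standing in for the rows of table cur) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Same word pattern as mentioned above: runs of "word" characters.
WORD_RE = re.compile(r"\w+")


def word_stats(pages):
    """Count, for every word, total uses and the number of pages using it.

    `pages` is any iterable of page texts (hypothetical stand-in for the
    rows of table cur). Returns (total_uses, article_counts, n_pages).
    """
    total_uses = Counter()      # word -> occurrences over all pages
    article_counts = Counter()  # word -> number of pages containing it
    n_pages = 0
    for text in pages:
        n_pages += 1
        # Lowercasing is an assumption; the original script may not do it.
        words = [w.lower() for w in WORD_RE.findall(text)]
        total_uses.update(words)
        article_counts.update(set(words))  # each page counts once per word
    return total_uses, article_counts, n_pages


def stopword_candidates(article_counts, n_pages, threshold=0.5):
    """Return the words that occur in more than `threshold` of all pages."""
    return sorted(
        word for word, n in article_counts.items()
        if n > threshold * n_pages
    )


if __name__ == "__main__":
    sample_pages = ["The cat sat on the mat", "the dog", "A bird"]
    total_uses, article_counts, n_pages = word_stats(sample_pages)
    print(stopword_candidates(article_counts, n_pages))
```

For a real dump the page texts would be read from the database one row at a time instead of from a list, and the two counters would be written back to a table.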
-- Jan Hidders