How did you determine the list?
Just an idea: Wouldn't zimwriter be the place to automatically
generate such a list during indexing or in a separate pass?
An additional parameter would specify the maximum size of that
list, the minimum number of occurrences of a word or maybe a
minimum percentage of articles the word occurs in, before it
gets marked as trivial.
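A minimal sketch of the proposed automatic selection, assuming the articles are already tokenized into word lists. The function name `select_trivial_words` and its parameters are hypothetical, not part of zimwriter. `max_size` caps the list, `min_occurrences` is the total-count threshold and `min_article_fraction` is the share of articles a word must appear in. Here both thresholds must hold; setting one to 0 disables it.

```python
from collections import Counter


def select_trivial_words(articles, max_size, min_occurrences, min_article_fraction):
    """Return up to max_size words considered trivial (hypothetical helper).

    articles: iterable of word lists, one list per article (assumed input).
    A word qualifies if it occurs at least min_occurrences times in total
    and appears in at least min_article_fraction of all articles.
    """
    total = Counter()     # total occurrences of each word across all articles
    doc_freq = Counter()  # number of articles that contain each word
    n_articles = 0
    for words in articles:
        n_articles += 1
        total.update(words)
        doc_freq.update(set(words))  # count each word once per article
    if n_articles == 0:
        return []
    candidates = [
        w for w, count in total.items()
        if count >= min_occurrences
        and doc_freq[w] / n_articles >= min_article_fraction
    ]
    # Most frequent first; ties broken alphabetically for a stable result.
    candidates.sort(key=lambda w: (-total[w], w))
    return candidates[:max_size]


articles = [["the", "cat", "sat"], ["the", "dog"], ["a", "cat", "and", "the", "dog"]]
print(select_trivial_words(articles, max_size=2, min_occurrences=2, min_article_fraction=0.5))
# → ['the', 'cat']
```

Counting per-article presence separately from total occurrences is what makes the percentage criterion possible in the same pass as indexing.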
I formerly had the word list in a database. From there I just counted the
number of occurrences of each word and sorted that list. This is a simple SQL
statement. Then I viewed the list manually and decided which words to skip.
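For illustration, a sketch of that kind of counting query, run through Python's `sqlite3` module. The table `words(article_id, word)` and its sample rows are hypothetical, assuming one row per word occurrence; the actual database layout was not described.

```python
import sqlite3

# Hypothetical schema: one row per word occurrence, tagged with its article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (article_id INTEGER, word TEXT)")
conn.executemany(
    "INSERT INTO words (article_id, word) VALUES (?, ?)",
    [(1, "the"), (1, "cat"), (2, "the"), (2, "dog"), (3, "the"), (3, "cat")],
)

# Count occurrences per word, most frequent first; the result is then
# reviewed by hand to decide which words to skip.
rows = conn.execute(
    "SELECT word, COUNT(*) AS occurrences "
    "FROM words GROUP BY word "
    "ORDER BY occurrences DESC, word"
).fetchall()

for word, occurrences in rows:
    print(word, occurrences)
```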
It would certainly be possible to determine the list automatically. It would
just take significant additional processing power, but it may be worth trying.
The list could be extracted quite easily from the resulting zim index file.
I like that idea. I will try that.
Tommi