Re: [openZIM dev-l] Kiwix and ZimReader/Writer Index Format

23 Aug 2009


      ...
How did you determine the list?
Just an idea: Wouldn't zimwriter be the place to automatically
generate such a list during indexing or in a separate pass?
An additional parameter would specify the maximum size of that
list, the minimum number of occurrences of a word or maybe a
minimum percentage of articles the word occurs in, before it
gets marked as trivial.
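As a rough illustration of the thresholds suggested above, here is a minimal Python sketch. It is not zimwriter code: the function name find_trivial_words and the parameters max_size, min_occurrences and min_article_fraction are hypothetical names chosen for this example, and the word splitting is a simple regular expression rather than whatever tokenizer the real indexer uses.

```python
import re
from collections import Counter


def find_trivial_words(articles, max_size=100, min_occurrences=1,
                       min_article_fraction=0.0):
    """Return candidate trivial words, most frequent first.

    All names and defaults here are assumptions for illustration only.
    """
    total = Counter()     # total occurrences of each word
    doc_freq = Counter()  # number of articles each word appears in
    for text in articles:
        words = re.findall(r"\w+", text.lower())
        total.update(words)
        doc_freq.update(set(words))

    n_articles = len(articles)
    if n_articles == 0:
        return []

    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    candidates = [
        word for word, count in ranked
        if count >= min_occurrences
        and doc_freq[word] / n_articles >= min_article_fraction
    ]
    # Cap the list at the requested maximum size.
    return candidates[:max_size]


sample_articles = [
    "the cat and the dog",
    "the bird",
    "a dog and the fish",
]
trivial = find_trivial_words(sample_articles, max_size=2,
                             min_occurrences=2, min_article_fraction=0.5)
print(trivial)
```

A word has to pass both thresholds before it is considered, and the size limit is applied last, so the limit only trims the tail of the ranked list.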
I formerly had the word list in a database. From there I just counted the
number of occurrences of each word and sorted that list. This is a simple SQL
statement. Then I reviewed that list manually and decided which words to skip.
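The counting step described here might look roughly like the following sketch. The original database schema is not given in this message, so the table word_occurrence with columns word and article_id is purely an assumption, as is the use of SQLite through Python's sqlite3 module.

```python
import sqlite3


def word_counts(conn):
    """Return (word, count) rows, most frequent word first.

    Assumes a hypothetical table word_occurrence(word, article_id)
    with one row per occurrence of a word in an article.
    """
    cur = conn.execute(
        "SELECT word, COUNT(*) AS n "
        "FROM word_occurrence "
        "GROUP BY word "
        "ORDER BY n DESC, word"
    )
    return cur.fetchall()


# Small in-memory database with made-up rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_occurrence (word TEXT, article_id INTEGER)")
conn.executemany(
    "INSERT INTO word_occurrence VALUES (?, ?)",
    [("the", 1), ("the", 2), ("zim", 1),
     ("the", 3), ("index", 2), ("index", 3)],
)

for word, n in word_counts(conn):
    print(word, n)
```

The sorted result is what would then be reviewed by hand to decide which words to skip.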
It would certainly be possible to determine the list automatically. It will
just take significant additional processing power. It may be worth trying. It
could be extracted quite easily from the resulting zim index file.
I like that idea. I will try that.
Tommi
