the search algorithm doesn't seem to be that good either:
Wikiwix: 9 results for 'dog'
Mayflower: 3,416 results for 'dog'
I get the same low results when searching for other common terms.
That doesn't mean anything by itself. The performance of a retrieval
algorithm is measured by precision and recall, and what counts as good
depends on the application. 9 results for 'dog' probably means low
recall, but if the precision is good the algorithm could still be
useful. Conversely, if Mayflower's 3,416 answers are all about
something other than dogs, they are not useful.
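To make the distinction concrete, here is a minimal sketch of how precision and recall are computed for a single query. The document ids, the set of relevant documents and the two result lists are invented for illustration; they are not the actual Wikiwix or Mayflower results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: 10 documents are really about dogs.
relevant = {f"d{i}" for i in range(10)}

# Hypothetical engine A: few results, all on topic.
engine_a = ["d0", "d1", "d2"]

# Hypothetical engine B: many results, mostly off topic.
engine_b = ["d0", "d1", "d2", "d3"] + [f"x{i}" for i in range(36)]

p_a, r_a = precision_recall(engine_a, relevant)
p_b, r_b = precision_recall(engine_b, relevant)

print(f"A: precision={p_a:.2f} recall={r_a:.2f}")
print(f"B: precision={p_b:.2f} recall={r_b:.2f}")
```

In this made-up case engine B has slightly higher recall but far lower precision, so the raw result count alone says little about which one is better.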
To evaluate these algorithms properly, we should create an annotated
sample of data, complete with queries and a reference set (the list of
correct answers for each query). Preferably we would have several
sets, so that we can do machine learning (if needed) on one and
evaluate on another, and we would use a standard or common format
such as TREC's, so as to inherit their tools.
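As a rough illustration of what such a reference set could look like, TREC-style relevance judgments ("qrels") are usually plain text with four whitespace-separated columns: query id, an iteration field that is generally ignored, document id, and a relevance judgment. The sketch below reads that layout into a per-query set of relevant documents; the query ids, document names and the parse_qrels helper are hypothetical.

```python
from collections import defaultdict


def parse_qrels(text):
    """Map each query id to the set of document ids judged relevant."""
    relevant = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query_id, _iteration, doc_id, judgment = parts
        if int(judgment) > 0:
            relevant[query_id].add(doc_id)
    return dict(relevant)


# Hypothetical judgments for two queries.
SAMPLE = """\
q1 0 doc_a 1
q1 0 doc_b 0
q1 0 doc_c 1
q2 0 doc_a 0
q2 0 doc_d 1
"""

qrels = parse_qrels(SAMPLE)
print(qrels)
```

Keeping the judgments in this layout would mean existing TREC tooling can read the same files, which is the point of adopting a common format.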
-- Rama