The search algorithm doesn't seem to be that good either:
Wikiwix: 9 results for 'dog'
Mayflower: 3,416 results for 'dog'
I get similarly low result counts when searching for other common terms.
That alone doesn't tell us much. The performance of a retrieval algorithm is measured in terms of precision and recall, and what counts as good depends on the application. Nine results for 'dog' probably means low recall, but if the precision is high the algorithm could still be useful. Conversely, if Mayflower's 3,416 results are all about something other than dogs, they are not useful.
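To make the precision/recall point concrete, here is a minimal sketch in Python. The function name and all the numbers are hypothetical, made up for illustration and not measured on Wikiwix or Mayflower: it assumes 100 pages are truly about dogs, that the small engine returns 9 of them, and that the large engine returns 50 of them buried among 3,366 irrelevant pages.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: list of document IDs returned by the search engine
    relevant:  set of document IDs judged relevant (the ground truth)
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # relevant documents actually returned
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data, not real measurements: 100 pages are truly about dogs.
relevant = {f"dog-{i}" for i in range(100)}

# Small engine: 9 results, all of them relevant.
small_engine = [f"dog-{i}" for i in range(9)]

# Large engine: 3,416 results, only 50 of them relevant.
big_engine = [f"dog-{i}" for i in range(50)] + [f"other-{i}" for i in range(3366)]

print(precision_recall(small_engine, relevant))
print(precision_recall(big_engine, relevant))
```

Under these assumptions the small engine has perfect precision but a recall of only 0.09, while the large engine reaches a recall of 0.5 with a precision below 0.02. Which one is "better" depends on whether the application cares more about missing pages or about wading through noise.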
To evaluate these algorithms properly, we should create an annotated data sample, complete with queries and a reference set (the list of correct answers for each query). Preferably we would have several such sets, so that some could be used for machine learning (if needed) and others for evaluation, and we would follow established standards such as TREC, so as to reuse their tools. -- Rama
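As a rough idea of what such an evaluation could look like, here is a sketch that assumes relevance judgments stored in the four-column TREC qrels layout (`query_id iteration doc_id relevance`, with relevance > 0 meaning relevant). The helper names `load_qrels` and `evaluate`, and the toy queries and documents, are hypothetical; in practice one would more likely feed TREC-format files to the existing `trec_eval` tool rather than reimplement it.

```python
from collections import defaultdict


def load_qrels(lines):
    """Parse TREC-style qrels lines: 'query_id iteration doc_id relevance'.

    Returns a dict mapping query_id -> set of relevant doc IDs.
    Queries with no relevant documents do not appear in the result.
    """
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, doc_id, rel = parts
        if int(rel) > 0:
            relevant[qid].add(doc_id)
    return dict(relevant)


def evaluate(run, qrels):
    """Macro-average precision and recall over all judged queries.

    run:   dict query_id -> list of doc IDs returned by the engine
    qrels: dict query_id -> set of relevant doc IDs
    """
    precisions, recalls = [], []
    for qid, relevant in qrels.items():
        retrieved = set(run.get(qid, []))  # a missing query counts as no results
        hits = len(retrieved & relevant)
        precisions.append(hits / len(retrieved) if retrieved else 0.0)
        recalls.append(hits / len(relevant))
    n = len(precisions)
    return (sum(precisions) / n, sum(recalls) / n) if n else (0.0, 0.0)


# Hypothetical toy judgments and engine output.
qrels_lines = ["q1 0 d1 1", "q1 0 d2 1", "q1 0 d3 0", "q2 0 d4 1"]
run = {"q1": ["d1", "d3"], "q2": ["d4", "d5"]}
print(evaluate(run, load_qrels(qrels_lines)))
```

Keeping separate qrels files for training and for evaluation, as suggested above, only means calling `evaluate` on a held-out set of judgments that was never used to tune the engine.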