On Saturday 19 July 2008 15:44:51 Brion Vibber wrote:
On Jul 18, 2008, at 16:03, Marc Cousin mcousin@sigma.fr
EXPLAIN ANALYZE SELECT page_id, page_namespace, page_title, old_text AS page_text, ts_rank(titlevector, to_tsquery('default','postgres')) AS rnk FROM page p, revision r, pagecontent c WHERE p.page_latest = r.rev_id AND r.rev_text_id = c.old_id AND textvector @@ to_tsquery('default','postgres') AND page_is_redirect = '0' AND page_namespace IN (0,9,11) ORDER BY rnk DESC, page_id DESC LIMIT 20 OFFSET 0;
Hmm; a fundamental problem here is that text.old_text is opaque to the database -- it may contain compressed text, text in an alternate encoding, or a reference to another table or database at the application level.
This is part of why we use a separate table for the search index for the MySQL search -- it's actually consistently readable to the db's search engine as well as providing a summary table so the index only has to cover live versions.
-- brion
Hi,
Thanks for the answer...
What do you mean by 'opaque' ?
Btw the separate table for mysql solves the problem I'm having with postgresql (except that there is not the myisam problem, so the fulltext index could be stored directly in the 'page' table with postgresql). For a wiki like mine, with a lot of versions per document, text search performance becomes very bad.
So can you explain me what you mean by opaque ?
Thanks a lot for your time.
Marc