[Mediawiki-l] Text search performance and postgresql

Greg Sabino Mullane greg at endpoint.com
Tue Jul 29 15:29:44 UTC 2008


Marc Cousin wrote:
> The plan is this one
[explain analyze plans]

Those are pretty bad plans - you might benefit from some simple tuning.
Try lowering random_page_cost and boosting effective_cache_size, or come
round to #postgresql on freenode for more free tuning help than you can
shake a stick at.
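
For a quick test of those two settings, something like this works at the
session level (the values are only illustrative starting points, not tuned
recommendations - adjust them for your hardware):

SET random_page_cost = 2.0;
SET effective_cache_size = '1GB';
-- then re-run EXPLAIN ANALYZE on the search query and compare the plan

If the plan improves, make the matching changes in postgresql.conf.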

The query you gave ran in 56 ms for me on what I'm guessing is
a larger wiki (~400K rows in pagecontent, and on a pretty busy box).

You should be seeing a Bitmap Index Scan on "ts2_page_text" in the plan.

If you find yourself using the same namespace restrictions a lot, you can
create some custom indexes as well, e.g.

CREATE INDEX page_index1 ON page(page_title) WHERE page_is_redirect = 0
AND page_namespace IN (0,9,11);
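
(Keep in mind that a partial index like that is only considered when the
query's WHERE clause matches the index predicate, so it helps most if the
search always applies the same namespace filter.)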

> Is there a reason we index every version of every content ?

We don't. Or at least, we were not intending to. You can grep for UPDATE
in SearchPostgres.php. I just changed (r38184) the code to double-check we
nullify *all* old revisions: this may explain part of what you were seeing
before. You can check individual pages and see whether everything but the
latest revision has a null textvector, like so:

SELECT CASE WHEN textvector IS NULL THEN 1 ELSE 0 END AS isnull, COUNT(*)
  FROM pagecontent
 WHERE old_id IN (SELECT rev_text_id FROM revision
                   WHERE rev_page = (SELECT page_id FROM page
                                      WHERE page_namespace = 0
                                        AND page_title = 'Foobar' LIMIT 1))
 GROUP BY 1;

(mostly for the benefit of other people reading this thread, since you
said you already manually updated the database)
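
If you do end up nulling things out by hand, a rough sketch (assuming the
standard Postgres schema, where page.page_latest points at revision.rev_id)
would be something like this - test it on a copy of the database first:

UPDATE pagecontent SET textvector = NULL
 WHERE old_id NOT IN (SELECT r.rev_text_id
                        FROM revision r
                        JOIN page p ON p.page_latest = r.rev_id);

This keeps a text search vector only on the text row behind each page's
latest revision.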

Brion Vibber wrote:
> Hmm; a fundamental problem here is that text.old_text is opaque to the
> database -- it may contain compressed text, text in an alternate
> encoding, or a reference to another table or database at the
> application level.

Right: $wgCompressRevisions is not recommended for a Postgres backend;
we should probably document that somewhere. Postgres compresses large
text values automatically anyway.

That is not to say we might not want to change the way things are done, but
the easiest solution to the original poster's problem is to get the query
optimized first, and to figure out why the old revisions are not being nulled.

Platonides wrote:

> The postgresql backend is not too well maintained; people wanting to
> use postgres appear from time to time and give it a kick.
> I don't remember who is currently supposed to be taking care of it, but
> I interpret the fact that he hasn't commented on this as meaning he
> doesn't oppose ;)

Wow, that's a bit harsh - not too well maintained? Hardly. :) Others and I
have put in a lot of work to get it working and keep it that way,
including paving the way for other database backends by finding and fixing
lots of MySQL-isms in the code. I apologize for not replying sooner: I
somehow got unsubscribed recently.

-- 
Greg Sabino Mullane greg at turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200807291126
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
