Thanks for your answers.
Here's what I see from my (narrow) understanding of how MediaWiki works.
- I already tried changing the parameters you mention, but as they were
already correct for my (pretty old) machine, this gave me even worse results
(which was expected). The bad plans come from the fact that we have a lot of
versions of page revisions indexed in the pagecontent (text) table.
Anyhow, I find it strange that we use the pagecontent table to store those
full text vectors when they are in fact the full text vectors for a page.
Wouldn't this textvector be better placed either in the page table itself, or
in a searchindex table with a 1:1 relationship to page, like in MySQL? That
would be much more logical for the planner: how could it guess that the
indexed records in the pagecontent table are indeed the latest revisions from
the page table? Whereas if the vector lives with the page table, the query
becomes a more classical join: we find the records matching the full text
criterion in the page table, then join to revision and pagecontent to
retrieve the text (a rough sketch of what I mean is further down in this mail).
- all revisions of all pages were indexed in the text table. Maybe because
we're still using 1.9.1? (there is no UPDATE in SearchPostgres.php here)
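For what it's worth, a quick way to see the scale of this (using only the
pagecontent and page tables already discussed here) is something like:

-- compare the number of rows carrying a text vector with the number of pages
SELECT (SELECT count(*) FROM pagecontent WHERE textvector IS NOT NULL) AS indexed_rows,
       (SELECT count(*) FROM page) AS pages;

If indexed_rows tracks the revision count rather than the page count, old
revisions are carrying vectors too.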
If I understand correctly what you explain below, the content of text is not
'opacified' for PostgreSQL, as there is no point in doing that given TOAST
compression, so working with triggers is the logical way to go with this backend.
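To make the above concrete, here is a very rough sketch of what I have in
mind. The searchindex table, the si_* columns, and the trigger/function names
are all invented for the example, and to_tsvector with the default
configuration stands in for whatever tsearch2 call MediaWiki really makes, so
treat this as an illustration of the idea rather than a patch:

-- one row per page, holding only the vector of its latest revision
CREATE TABLE searchindex (
    si_page   INTEGER PRIMARY KEY REFERENCES page(page_id),
    si_vector tsvector
);
-- gist to mirror the existing ts2_page_text index; gin is an option on newer versions
CREATE INDEX searchindex_vector ON searchindex USING gist (si_vector);

-- hypothetical trigger keeping the row in sync when a page's latest revision changes;
-- it assumes old_text is plain, uncompressed text (no $wgCompressRevisions, no
-- external storage) and glosses over the exact ordering of the page/revision/pagecontent writes
CREATE OR REPLACE FUNCTION searchindex_sync() RETURNS trigger AS $$
BEGIN
    DELETE FROM searchindex WHERE si_page = NEW.page_id;
    INSERT INTO searchindex (si_page, si_vector)
    SELECT NEW.page_id, to_tsvector(c.old_text)
      FROM revision r
      JOIN pagecontent c ON c.old_id = r.rev_text_id
     WHERE r.rev_id = NEW.page_latest;
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER searchindex_sync AFTER INSERT OR UPDATE ON page
    FOR EACH ROW EXECUTE PROCEDURE searchindex_sync();

-- the search then becomes an ordinary join driven by the vector index
SELECT p.page_namespace, p.page_title
  FROM page p
  JOIN searchindex s ON s.si_page = p.page_id
 WHERE s.si_vector @@ to_tsquery('foo & bar')
   AND p.page_is_redirect = 0
   AND p.page_namespace IN (0, 9, 11);

With only one row per page, the planner no longer has to guess which
pagecontent rows are current, which is the whole point.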
On Tuesday 29 July 2008 17:29:44 Greg Sabino Mullane wrote:
Marc Cousin wrote:
The plan is this one
[explain analyze plans]
Those are pretty bad plans - you might benefit from some simple tuning.
Try lowering random_page_cost and boosting effective_cache_size. Or come
round to #postgresql on freenode, for more free tuning help than you can
shake a stick at.
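If you want to experiment before editing postgresql.conf, you can do it per
session; the numbers here are only illustrative starting points, not
recommendations for your particular hardware:

-- lower the assumed cost of random I/O (default 4.0) if most of your data fits in cache
SET random_page_cost = 2.0;
-- tell the planner roughly how much RAM the OS has available for caching
SET effective_cache_size = '1GB';
-- then re-run EXPLAIN ANALYZE on the search query and compare the plans

Once a setting clearly helps, make it permanent in postgresql.conf and reload.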
The query you gave ran in 56 ms for me on what I'm guessing is
a larger wiki (~ 400K rows in pagecontent) (and a pretty busy box).
You should be seeing a Bitmap Index Scan on "ts2_page_text".
If you find yourself using the same namespace restrictions a lot, you can
create some custom indexes as well, e.g.
CREATE INDEX page_index1 ON page(page_title) WHERE page_is_redirect = 0
AND page_namespace IN (0,9,11);
Is there a reason we index every version of every page?
We don't. Or at least, we were not intending to. You can grep for UPDATE
in SearchPostgres.php. I just changed (r38184) the code to double-check we
nullify *all* old revisions: this may explain part of what you were seeing
before. You can check on individual pages and see if they are all
null-but-latest like so:
SELECT CASE WHEN textvector IS NULL THEN 1 ELSE 0 END AS isnull, COUNT(*)
  FROM pagecontent
 WHERE old_id IN (SELECT rev_text_id FROM revision
                   WHERE rev_page = (SELECT page_id FROM page
                                      WHERE page_namespace = 0
                                        AND page_title = 'Foobar' LIMIT 1))
 GROUP BY 1;
(mostly for the benefit of other people reading this thread, since you
said you already manually updated the database)
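If anyone else needs to clean up an existing database by hand, a rough
equivalent of what the code is meant to guarantee looks something like this;
it is a sketch (not the exact statement SearchPostgres.php issues) and assumes
the standard page.page_latest and revision.rev_text_id columns:

-- drop the search vector from every revision that is not some page's latest
UPDATE pagecontent
   SET textvector = NULL
 WHERE old_id IN (SELECT rev_text_id FROM revision)
   AND old_id NOT IN (SELECT r.rev_text_id
                        FROM revision r
                        JOIN page p ON p.page_latest = r.rev_id);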
Brion Vibber wrote:
Hmm; a fundamental problem here is that
text.old_text is opaque to the
database -- it may contain compressed text, text in an alternate
encoding, or a reference to another table or database at the
Right: $wgCompressRevisions is not recommended for a Postgres backend.
Should probably document that somewhere. Postgres will compress the text
itself via TOAST anyway.
Not to say that we might not want to change the way things are done, but
the easiest solution to the original poster's problem is to get the query
optimized first, and figure out why the old revisions are not being nulled.
The postgresql backend is not too well maintained; people wanting to use
postgres appear from time to time and give it a kick.
I don't remember who is currently supposed to be taking care of it, but since
he hasn't commented on this, I interpret that as meaning he doesn't oppose ;)
Wow, that's a bit harsh - not too well maintained? Hardly. :) Myself and
others have put in a lot of work to get it working and keep it so,
including paving the way for other database backends by discovering and
addressing lots of mysqlisms in the code. I apologize for not replying:
I somehow got recently unsubscribed.