Thanks for your answers.
Here's what I see from my (narrow) understanding of how MediaWiki works.
- I already tried changing the parameters you mention, but as they were already correct for my (pretty old) machine, this brought even worse results (which was expected). The bad plans come from the fact that we have a lot of versions of page revisions indexed in the pagecontent table.
Anyhow, I find it strange that we use the pagecontent table to store those full text vectors when they are in fact the full text vectors for a page. Wouldn't this textvector be better placed either in the page table or in a searchindex table with a 1:1 relationship, like in MySQL? That would be more logical for the planner: how could it guess that the indexed records from the pagecontent table are indeed the latest revisions from the page table? Whereas if the vector is in the page table, the query becomes a more classical join: we find the records matching the full text criterion from the page table, then join to revision and pagecontent to retrieve the associated text.
- All revisions of all pages were indexed in the text table. Maybe because we're still using 1.9.1? (There is no UPDATE in SearchPostgres.php here.)

If I understand correctly what you explain below, the content of text is not 'opacified' for PostgreSQL, as there is no point in doing that because of TOAST compression, so working with triggers is the logical way to go with PostgreSQL?
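A hypothetical sketch of the "more classical join" described above, assuming the textvector column were moved into the page table (this is a proposed schema, not the one MediaWiki actually uses):

```sql
-- Hypothetical: filter on a tsvector stored directly in page, then join
-- out to revision/pagecontent only for the few matching rows.
SELECT p.page_title, pc.old_text
FROM page p
JOIN revision    r  ON r.rev_id  = p.page_latest
JOIN pagecontent pc ON pc.old_id = r.rev_text_id
WHERE p.textvector @@ to_tsquery('searchterm');
```

With the vector on page, the planner no longer has to guess which pagecontent rows are current revisions; the restriction and the join order follow the usual statistics.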
On Tuesday 29 July 2008 17:29:44 Greg Sabino Mullane wrote:
Marc Cousin wrote:
The plan is this one
[explain analyze plans]
Those are pretty bad plans - you might benefit from some simple tuning. Try lowering random_page_cost and boosting effective_cache_size. Or come round to #postgresql on freenode, for more free tuning help than you can shake a stick at.
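As a hedged illustration of the advice above - the values below are guesses, and the right numbers depend on the machine's RAM and disk - both settings can be tried per-session before touching postgresql.conf:

```sql
-- Illustrative values only, not recommendations for any particular box.
SET random_page_cost = 2.0;         -- default at the time was 4.0
SET effective_cache_size = '512MB'; -- rough estimate of RAM used for caching

-- Then re-run the problem query with EXPLAIN ANALYZE to see whether
-- the plan improves before making the change permanent.
```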
The query you gave ran in 56 ms for me on what I'm guessing is a larger wiki (~400K rows in pagecontent) (and a pretty busy box). You should be seeing a Bitmap Index Scan on "ts2_page_text".
If you find yourself using the same namespace restrictions a lot, you can create some custom indexes as well, e.g.

CREATE INDEX page_index1 ON page(page_title)
  WHERE page_is_redirect = 0 AND page_namespace IN (0,9,11);
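A sketch of the kind of query such a partial index could serve (the title value is a placeholder):

```sql
-- The planner can use page_index1 only when the query's WHERE clause
-- implies the index predicate (same redirect and namespace restrictions).
SELECT page_id, page_title
FROM page
WHERE page_is_redirect = 0
  AND page_namespace IN (0,9,11)
  AND page_title = 'Some_title';
```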
Is there a reason we index every version of every content?
We don't. Or at least, we were not intending to. You can grep for UPDATE in SearchPostgres.php. I just changed (r38184) the code to double-check we nullify *all* old revisions: this may explain part of what you were seeing before. You can check on individual pages and see if they are all null-but-latest like so:

SELECT CASE WHEN textvector IS NULL THEN 1 ELSE 0 END AS isnull, COUNT(*)
FROM pagecontent
WHERE old_id IN
  (SELECT rev_text_id FROM revision
   WHERE rev_page =
     (SELECT page_id FROM page
      WHERE page_namespace = 0 AND page_title = 'Foobar' LIMIT 1))
GROUP BY 1;

(Mostly for the benefit of other people reading this thread, since you said you already manually updated the database.)
Brion Vibber wrote:
Hmm; a fundamental problem here is that
text.old_text is opaque to
the
database -- it may contain compressed text, text in an alternate
encoding, or a reference to another table or database at the
application level.
Right: $wgCompressRevisions is not recommended for a Postgres backend. Should probably document that somewhere. Postgres will compress automatically anyway.
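A hedged way to see that automatic (TOAST) compression is in fact happening, using standard Postgres size functions - the table and column names assume the MediaWiki Postgres schema discussed in this thread:

```sql
-- pg_column_size() reports the stored (possibly TOAST-compressed) size,
-- while octet_length() reports the full uncompressed byte length;
-- a stored size well below the raw size indicates compression kicked in.
SELECT old_id,
       pg_column_size(old_text) AS stored_bytes,
       octet_length(old_text)   AS raw_bytes
FROM pagecontent
ORDER BY old_id DESC
LIMIT 5;
```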
Not to say that we might not want to change the way things are done, but the easiest solution to the original poster's problem is to get the query optimized first, and figure out why the old revisions are not being nulled.
Platonides wrote:
The postgresql backend is not too well
maintained, people wanting to
postgre appear from time to time and give it a kick.
I don't remember who is currently supposed to be taking care of
it, but
i interpret that as he hasn't commented on this, he doesn't
oppose ;)
Wow, that's a bit harsh - not too well maintained? Hardly. :) Myself and others have put in a lot of work to get it working and keep it so, including paving the way for other database backends by discovering and addressing lots of mysqlisms in the code. I apologize for not replying: I somehow got recently unsubscribed.
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org