On Monday 28 July 2008 13:34:41 Platonides wrote:
Marc Cousin wrote:
I totally agree with you.
Is there a way of getting it into a future MediaWiki release (I can do the
work if necessary)?
Sure. Ask brion for a SVN account (you'll need to send him a public key).
The PostgreSQL backend is not too well maintained; people wanting to use
Postgres appear from time to time and give it a kick.
I don't remember who is currently supposed to be taking care of it, but
since he hasn't commented on this, I take it that he doesn't oppose ;)
Okay.
First I'd like to sum up what's to be done (please anybody tell me if I'm
wrong, I'll let a few days pass before I start working).
= Current state =
MySQL's way of doing full-text indexing:
- Article updates are written to text (pagecontent for PostgreSQL) and page
(I of course omit the rest of the schema, as it's the same as far as full-text
indexing is concerned).
- From time to time, an indexing job locks searchindex and puts the new page
contents into it (retrieving the contents from the text table). The pages
with new content are found through recentchanges.
For PostgreSQL, right now: the full-text data is stored in the pagecontent
(text) table itself, so many unnecessary versions of the full-text data are
kept.
What I'd like to know is the reason for this asynchronous indexing in MySQL:
is it because of the MyISAM table used for full text (to avoid locking during
inserts), or for other performance reasons?
If there is no reason other than locking, there is no point in having this
asynchronous job with PostgreSQL.
= Proposal =
- create the same searchindex table in PostgreSQL as in MySQL
- modify the search queries to use this new table (reuse and adapt the MySQL
code)
- maintain searchindex synchronously with a trigger. I think the best
place to put this trigger would be on the page table. The trigger should
update searchindex when page_latest is set to a nonzero value (I think
everything the trigger needs will already be in place when it fires).
- create a migration script for the database:
  - remove unnecessary columns and indexes on text table, and its trigger.
  - remove titlevector from title and its trigger.
  - add pagecontent and put all page records in it, then put GIN indexes (on
    title and on content)
  - put the trigger in place
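
To make the trigger part of the proposal concrete, here is a minimal sketch.
It is not PostgreSQL code: it uses SQLite through Python's sqlite3 module,
only because that runs without a server, so the trigger syntax differs from
PL/pgSQL and there is no tsvector or GIN index. The schema is cut down to the
few columns needed, the revision join and the save_revision helper are
illustrative assumptions, and I read the "put all page records in it" step as
populating searchindex, which is also an assumption.

```python
import sqlite3

# Hypothetical, heavily simplified schema: only the columns this sketch needs.
SCHEMA = """
CREATE TABLE pagecontent (old_id INTEGER PRIMARY KEY, old_text TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                       rev_text_id INTEGER);
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT,
                   page_latest INTEGER NOT NULL DEFAULT 0);
CREATE TABLE searchindex (si_page INTEGER PRIMARY KEY, si_title TEXT,
                          si_text TEXT);
"""

# Migration step: one searchindex row per existing page, latest text only.
BACKFILL = """
INSERT INTO searchindex (si_page, si_title, si_text)
SELECT p.page_id, p.page_title, pc.old_text
FROM page p
JOIN revision r ON r.rev_id = p.page_latest
JOIN pagecontent pc ON pc.old_id = r.rev_text_id
WHERE p.page_latest <> 0;
"""

# Synchronous maintenance: fires when page_latest is set to a nonzero value.
TRIGGER = """
CREATE TRIGGER page_searchindex AFTER UPDATE OF page_latest ON page
WHEN NEW.page_latest <> 0
BEGIN
    INSERT OR REPLACE INTO searchindex (si_page, si_title, si_text)
    SELECT NEW.page_id, NEW.page_title, pc.old_text
    FROM revision r
    JOIN pagecontent pc ON pc.old_id = r.rev_text_id
    WHERE r.rev_id = NEW.page_latest;
END;
"""


def save_revision(conn, page_id, title, text):
    """Mimic an article save: store text, add a revision, bump page_latest."""
    cur = conn.cursor()
    cur.execute("INSERT INTO pagecontent (old_text) VALUES (?)", (text,))
    text_id = cur.lastrowid
    cur.execute(
        "INSERT INTO revision (rev_page, rev_text_id) VALUES (?, ?)",
        (page_id, text_id),
    )
    rev_id = cur.lastrowid
    # A new page starts with page_latest = 0; existing pages are left alone.
    cur.execute(
        "INSERT OR IGNORE INTO page (page_id, page_title) VALUES (?, ?)",
        (page_id, title),
    )
    cur.execute(
        "UPDATE page SET page_latest = ? WHERE page_id = ?", (rev_id, page_id)
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_revision(conn, 1, "Main_Page", "first draft")   # saved before migration
conn.execute(BACKFILL)                               # migration: fill the index
conn.executescript(TRIGGER)                          # migration: add trigger
save_revision(conn, 1, "Main_Page", "second draft")  # trigger replaces the row
save_revision(conn, 2, "Sandbox", "hello")           # trigger adds a new row

rows = conn.execute(
    "SELECT si_page, si_title, si_text FROM searchindex ORDER BY si_page"
).fetchall()
text_versions = conn.execute("SELECT COUNT(*) FROM pagecontent").fetchone()[0]
print(rows)
print(text_versions)
```

The point of the sketch is that searchindex keeps exactly one row per page
while pagecontent keeps every text version, and that no separate indexing job
is needed once the trigger is in place.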
Comments on this, please?
Cheers
Marc