Ok, I now understand the meaning of opaque, even if that data isn't that
opaque, because the database and mediawiki are using it for full text
indexing.
I totally agree with you, that would be a really bad idea to try to mess with
the content of the page (that wasn't what I was talking about).
Currently, for postgresql there is a trigger updating the 'textvector' column
in pagecontent table when 'old_text' is modified, so that the text search
index is up to date.
The problem is that the textvector should only exist for the latest version of
the pagecontent. There is no point in indexing the old versions of the
contents of the page.
That's done for mysql because of the 1:1 relationship between page and
searchindex, but isn't done for postgresql.
For PostgreSQL, there are at least 2 solutions :
- Put the textvector into the page table
- Do the same as for mysql : create a searchindex table (even if the myisam
justification for this table doesn't hold for mysql)
Plus the temporary fix I was talking about in my mail : put an empty
textvector for all versions of a document except its last one, by changing
the trigger a bit. That divided the size of my fulltext index by 30 and made
the text search fast again.
What would be the preferred way to solve this problem ?
On Monday 21 July 2008 19:47:09 Daniel Friesen wrote:
By 'opaque' he means that the databases are
supposed to handle that
field transparently. They are supposed to send the data raw to the app
and act as if they have absolutely no clue how to handle the data inside
of that field.
The reason being the data inside of that column is inconstant and
letting the database modify or pretend it knows what is inside there can
end up with real unexpected results.