-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Marc Cousin wrote:
MySQL's way of doing text indexation:
- Articles updates are done on text (pagecontent for postgresql), and page (I
of course omit the rest of the schema as it's the same as far as full text
indexing is concerned)
- Sometimes, there is an indexation work that locks searchindex and puts the
new pages contents into it (retrieving the contents from the text table). New
page contents is retrieved thanks to recentchanges.
Well, no -- the search index is updated on page save, unless you go to a
lot of (usually unnecessary) trouble to disable it and do bulk updates.
Here's what happens normally:
Article::editUpdates() puts a SearchUpdate object on the post-edit
updates stack when the edit is complete. This gets run at the completion
of the request, along with many other updates (links tables, view
counts, etc).
SearchUpdate::doUpdate() takes the text, does some freaky markup
normalization, and sends it off to the search engine's
SearchEngine::update() or SeachEngine::updateTitle() methods.
The SearchEngine subclass for the relevant engine then updates its index
for the given page. In the case of the MySQL backend, this creates or
updates a row in the searchindex table.
What I'd like to know is the reason for this
asynchronous indexing in mysql :
is it because of the myisam table for full text (to avoid locking during
inserts), or for other performance reasons?
If you mean the non-default option to disable updates on edit, that
would be because MyISAM updates cause locking trouble if your site is
very busy, so you might prefer to do updates in bulk off-peak with the
search temporarily disabled.
However this is not the default.
- maintain searchindex synchronously with a trigger. I
would think the best
place to put this trigger would be on the page table. The trigger should
update searchindex when page_latest is put to a non zero value (I think the
trigger will have everything it needs when this trigger fires).
Can't do it -- the database has insufficient knowledge to interpret the
contents of page text in the general case.
* It doesn't know how to uncompress compressed revisions
* It doesn't know how to access text in external storage blobs
* It doesn't know the namespaces etc for proper markup normalization
Since the database can't read page text as a general case, this has to
be done from the application layer.
- -- brion
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla -
http://enigmail.mozdev.org
iEYEARECAAYFAkiOEnIACgkQwRnhpk1wk47lnACg2SApw9M6bxbcwsS9IhzFC/3Z
I1gAn3ZWFwnABt82JvNkJHQiax7RNRXW
=ES9a
-----END PGP SIGNATURE-----