Thomas Dalton wrote:
On 04/01/2008, Jack Eapen C wrote:
>
> By this time I have figured out something.
> Just playing around with MediaWiki and MySQL fulltext search, I have
> the following function:
>
> $wgHooks['ArticleSaveComplete'][] = 'getNormalTextfromWikiText';
>
> function getNormalTextfromWikiText(&$article, &$user, &$text)
> {
>     global $wgParser;
>     $result = $wgParser->parse($text, $wgParser->mTitle, $wgParser->mOptions);
>     $new_text = $result->getText();
>
>     $dbw =& wfGetDB( DB_MASTER );
>     $dbw->insert( 'searchable_text',
>         array(
>             'page_id' => $article->getID(),
>             'searchable_text' => $new_text
>         ) );
>     return true;
> }
You probably want to move that to the place where the page is
rendered. Doing it in the save hook has two problems:
a) You're parsing the page twice (and parsing is expensive).
b) Your table is not updated when the page changes without being edited
(e.g. when a template it uses changes).
Ah! I see. I was thinking about HTML tables and was completely
confused! Would a simple regexp that removes everything between < and >
do the trick? I've never really got the hang of regexps, so I won't
give you any code to try, but it should be a relatively easy one, I
imagine.
You shouldn't use a regex to strip HTML tags. Use the PHP function
strip_tags() instead.
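
For what it's worth, here is a rough sketch of what that could look like
on the rendered HTML before it goes into the search table. The helper
name htmlToSearchableText() is made up for this example (it is not a
MediaWiki function), and the entity decoding and whitespace cleanup are
my own assumptions about what a fulltext index would want, not something
from the original code:

```php
<?php
// Hypothetical helper for illustration only; not part of MediaWiki.
function htmlToSearchableText($html)
{
    // strip_tags() removes HTML tags and comments from the string.
    $text = strip_tags($html);

    // Assumption: the index should hold plain characters, so turn
    // entities such as &amp; back into "&".
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Collapse the whitespace runs left behind where tags were removed.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<p>Hello <b>wiki</b> &amp; <a href="/x">world</a></p>';
echo htmlToSearchableText($html), "\n";
```

One thing to watch: strip_tags() does not insert a space where a tag
used to be, so text from adjacent block elements with no whitespace
between them ends up joined together. If that matters for searching, you
would need to add the spacing yourself before stripping.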