[Mediawiki-l] Parsing raw text

Platonides Platonides at gmail.com
Sat Jan 5 16:15:50 UTC 2008

Thomas Dalton wrote:
> On 04/01/2008, Jack Eapen C wrote:
>> By this time I have figured out something.
>> Just playing around with MediaWiki and MySQL fulltext search, I have
>> the following function:
>> $wgHooks['ArticleSaveComplete'][] = 'getNormalTextfromWikiText';
>> function getNormalTextfromWikiText(&$article, &$user, &$text)
>> {
>>     global $wgParser;
>>     $result = $wgParser->parse($text, $wgParser->mTitle, $wgParser->mOptions);
>>     $new_text = $result->getText();
>>     $dbw =& wfGetDB( DB_MASTER );
>>     $dbw->insert( 'searchable_text',
>>         array(
>>             'page_id'         => $article->getID(),
>>             'searchable_text' => $new_text
>>         ) );
>>     return true;
>> }

You probably want to move that to the place where the page is rendered.
Doing it in the ArticleSaveComplete hook has two drawbacks:
a) You're parsing the page twice (and parsing is expensive).
b) Your table is not updated when the page changes without being edited
(e.g. when a template it uses changes).

> Ah! I see. I was thinking about HTML tables and was completely
> confused! Would a simple regexp that removes everything between < and >
> do the trick? I've never really got the hang of regexps, so I won't
> give you any code to try, but it should be a relatively easy one, I
> imagine.

You shouldn't use a regex to strip HTML tags. Use the PHP function
strip_tags() instead.
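
A rough sketch of what that could look like, assuming the input is the
HTML returned by the parser's getText(). The helper name
htmlToSearchableText is made up for illustration and is not part of
MediaWiki; the entity decoding and whitespace cleanup are extras that
may or may not suit your search table.

```php
<?php
// Hypothetical helper (not a MediaWiki function): reduce rendered HTML
// to plain text suitable for a fulltext index.
function htmlToSearchableText(string $html): string
{
    // strip_tags() removes HTML and PHP tags and keeps the text between them.
    $text = strip_tags($html);
    // Decode entities such as &amp; so the index stores literal characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse the runs of whitespace left behind where tags were removed.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo htmlToSearchableText("<p>Some <b>bold</b> &amp; <a href=\"/wiki/X\">linked</a> text</p>\n"), "\n";
// prints: Some bold & linked text
```

In the function quoted above, the result of $result->getText() would be
passed through such a helper before being stored.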
