Re: [Mediawiki-l] Parsing raw text

5 Jan 2008

      Thomas Dalton wrote:
...
On 04/01/2008, Jack Eapen C wrote:
...
By this time I have figured out something.
Just playing around with MediaWiki and MySQL fulltext search, I have
the following function:
// Run after every successful page save.
$wgHooks['ArticleSaveComplete'][] = 'getNormalTextfromWikiText';

function getNormalTextfromWikiText(&$article, &$user, &$text)
{
    global $wgParser;

    // Parse the saved wikitext to HTML.
    $result = $wgParser->parse($text, $wgParser->mTitle, $wgParser->mOptions);
    $new_text = $result->getText();

    // Store the rendered text in a custom table for fulltext search.
    $dbw =& wfGetDB( DB_MASTER );
    $dbw->insert( 'searchable_text',
        array(
            'page_id'         => $article->getID(),
            'searchable_text' => $new_text
        ) );

    // Returning true lets other hook handlers run.
    return true;
}
You probably want to move that to the place where the page is
rendered. Doing it at save time means:
a) You're parsing the text twice (and parsing is expensive).
b) Your table is not updated when the page changes without being edited
(e.g. via templates).
...
Ah! I see. I was thinking about HTML tables and was completely
confused! Would a simple regexp that removes everything between < and
...
do the trick? I've never really got the hang of regexps, so I won't
give you any code to try, but it should be a relatively easy one, I
imagine.
You shouldn't use a regex to remove HTML tags. Use the PHP function
strip_tags() instead.
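
For illustration, here is a minimal sketch of how strip_tags() might be used to turn rendered HTML into plain text for a search table. The helper name htmlToSearchableText() is hypothetical and not part of MediaWiki; the entity decoding and whitespace collapsing are extra steps assumed to be useful for indexing, not something suggested in the thread.

```php
<?php
// Hypothetical helper: reduce rendered HTML to plain text for a fulltext index.
function htmlToSearchableText(string $html): string
{
    // strip_tags() removes HTML tags but leaves character entities untouched.
    $text = strip_tags($html);

    // Decode entities such as &amp; so the stored text contains literal characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Collapse the whitespace runs left where block elements used to be.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// prints: Hello wiki & world item
echo htmlToSearchableText("<p>Hello <b>wiki</b> &amp; world</p>\n<ul><li>item</li></ul>"), "\n";
```

Note that strip_tags() does not insert spaces where tags were, so words from adjacent elements with no whitespace between them will run together.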
