Re: [Wikitech-l] [MediaWiki-CVS] phase3/includes Parser.php, 1.602, 1.603

26 Mar 2006

On 3/24/06, Gabriel Wicke <gabrielwicke(a)users.sourceforge.net> wrote:
...
  Update of /cvsroot/wikipedia/phase3/includes
 In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12319/includes

 Modified Files:
         Parser.php
 Log Message:
 Provide some cleanup if tidy is disabled:

 * fix invalid nesting of anchors and i/b
 * remove empty i/b tags
 * remove divs inside anchors

 Fixes several test cases

 Index: Parser.php
 ===================================================================
 RCS file: /cvsroot/wikipedia/phase3/includes/Parser.php,v
 retrieving revision 1.602
 retrieving revision 1.603
 diff -u -d -r1.602 -r1.603
 --- Parser.php  22 Mar 2006 04:57:14 -0000      1.602
 +++ Parser.php  24 Mar 2006 16:36:29 -0000      1.603
 @@ -250,6 +250,32 @@

                 if (($wgUseTidy and $this->mOptions->mTidy) or $wgAlwaysUseTidy) {
                         $text = Parser::tidy($text);
 +               } else {
 +                       # attempt to sanitize at least some nesting problems
 +                       # (bug #2702 and quite a few others)
 +                       $tidyregs = array(
 +                               # ''Something [http://www.cool.com cool''] -->
 +                               # <i>Something</i><a href="http://www.cool.com"..><i>cool></i></a>
 +                               '/(<([bi])>)(<([bi])>)?([^<]*)(<\/?a[^<]*>)([^<]*)(<\/\\4>)?(<\/\\2>)/' =>
 +                               '\\1\\3\\5\\8\\9\\6\\1\\3\\7\\8\\9',
 +                               # fix up an anchor inside another anchor, only
 +                               # at least for a single single nested link (bug 3695)
 +                               '/(<a[^>]+>)([^<]*)(<a[^>]+>[^<]*)<\/a>(.*)<\/a>/' =>
 +                               '\\1\\2</a>\\3</a>\\1\\4</a>',
 +                               # fix div inside inline elements- doBlockLevels won't wrap a line which
 +                               # contains a div, so fix it up here; replace
 +                               # div with escaped text
 +                               '/(<([aib]) [^>]+>)([^<]*)(<div([^>]*)>)(.*)(<\/div>)([^<]*)(<\/\\2>)/' =>
 +                               '\\1\\3&lt;div\\5&gt;\\6&lt;/div&gt;\\8\\9',
 +                               # remove empty italic or bold tag pairs, some
 +                               # introduced by rules above
 +                               '/<([bi])><\/\\1>/' => ''
 +                       );
 +
 +                       $text = preg_replace(
 +                               array_keys( $tidyregs ),
 +                               array_values( $tidyregs ),
 +                               $text );
                 }

                 wfRunHooks( 'ParserAfterTidy', array( &$this, &$text ) );

This fixes the "Bug 2702: Mismatched <i>, <b> and <a> tags are
invalid" test case, but it's not really an improvement. The test case
was supposed to demonstrate that we don't balance tags, and this
doesn't fix that; it merely hacks around a few very specific cases
with regular expressions. Those fail as soon as you insert more tags,
whereas a parser that balanced tags properly would handle them.
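To make that concrete, here is a rough Python port of the four $tidyregs rules from the diff. It assumes Python's re behaves like PCRE for these particular patterns, and that the third pattern has a space after ([aib]) where the archived mail wrapped the line. The first input is what I assume the parser emits for the bug 2702 wikitext before this cleanup runs; the second is a hypothetical variant with one extra <b> inside the link text. TIDY_REGS and no_tidy_cleanup are made-up names for this sketch.

```python
import re

# Assumed Python port of the $tidyregs array from Parser.php 1.603.
# PHP's '\\4' in a single-quoted string is the regex backreference \4,
# and '\/' is only an escaped delimiter, so both become plain \4 and /.
# Unmatched groups (e.g. \3, \8) expand to '' on Python 3.5+, as in preg_replace.
TIDY_REGS = [
    # <i>/<b> closed inside an <a>: split the formatting around the anchor tag
    (r'(<([bi])>)(<([bi])>)?([^<]*)(</?a[^<]*>)([^<]*)(</\4>)?(</\2>)',
     r'\1\3\5\8\9\6\1\3\7\8\9'),
    # one anchor nested inside another (bug 3695)
    (r'(<a[^>]+>)([^<]*)(<a[^>]+>[^<]*)</a>(.*)</a>',
     r'\1\2</a>\3</a>\1\4</a>'),
    # <div> inside an inline element: turn the div tags into escaped text
    (r'(<([aib]) [^>]+>)([^<]*)(<div([^>]*)>)(.*)(</div>)([^<]*)(</\2>)',
     r'\1\3&lt;div\5&gt;\6&lt;/div&gt;\8\9'),
    # empty <i></i> or <b></b> pairs, some introduced by the rules above
    (r'<([bi])></\1>', ''),
]


def no_tidy_cleanup(text: str) -> str:
    # Like preg_replace() with arrays: each rule runs on the previous rule's output.
    for pattern, replacement in TIDY_REGS:
        text = re.sub(pattern, replacement, text)
    return text


# Assumed pre-tidy HTML for ''Something [http://www.cool.com cool'']
simple = '<i>Something <a href="http://www.cool.com">cool</i></a>'
# Hypothetical variant: the same shape plus one bold span in the link text
nested = '<i>Something <a href="http://www.cool.com">cool <b>stuff</b></i></a>'

print(no_tidy_cleanup(simple))  # <i>Something </i><a href="http://www.cool.com"><i>cool</i></a>
print(no_tidy_cleanup(nested))  # returned unchanged, still closing </i> before </a>
```

The first string comes out correctly nested. The second passes through untouched because rule 1's [^<]* cannot step over the extra <b>...</b>, so the misnested </i></a> survives.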

I'm all for fixing the parser, but making the parser test cases we
have pass by writing a hack that targets just those tests, rather
than fixing the core issue, is not an improvement.
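For contrast, here is a toy sketch of what balancing tags means: keep a stack of open elements, and when a close tag arrives out of order, close everything above it and reopen any <b>/<i> that got closed implicitly. It only knows about <a>, <b> and <i>, and balance() is a hypothetical helper for illustration, not tidy's algorithm or anything in Parser.php.

```python
import re

# Hypothetical toy balancer that only understands <a>, <b> and <i>.
# A token is one of those tags, a stray '<', or a run of plain text.
TOKEN = re.compile(r'<(/?)(a|b|i)\b([^>]*)>|<|[^<]+')
REOPENABLE = {"b", "i"}  # formatting tags reopened after an implicit close


def balance(html: str) -> str:
    out = []
    stack = []  # open elements as (tag name, original opening tag)
    for m in TOKEN.finditer(html):
        name = m.group(2)
        if name is None:
            out.append(m.group(0))  # text, or a '<' that isn't one of our tags
            continue
        if m.group(1) != "/":
            stack.append((name, m.group(0)))  # opening tag
            out.append(m.group(0))
            continue
        if not any(open_name == name for open_name, _ in stack):
            continue  # close tag with no matching open tag: drop it
        reopen = []
        while True:
            top, open_tag = stack.pop()
            out.append(f"</{top}>")
            if top == name:
                break
            if top in REOPENABLE:
                reopen.append((top, open_tag))
        # Reopen the implicitly closed <b>/<i> in their original order.
        for top, open_tag in reversed(reopen):
            stack.append((top, open_tag))
            out.append(open_tag)
    # Close anything still open at the end of the input.
    while stack:
        out.append(f"</{stack.pop()[0]}>")
    return "".join(out)


print(balance('<i>Something <a href="http://www.cool.com">cool <b>stuff</b></i></a>'))
# prints <i>Something <a href="http://www.cool.com">cool <b>stuff</b></a></i>
```

The point is that the stack doesn't care how many tags are nested, which is exactly what the regex rules lack: the extra <b> that defeats rule 1 is handled like any other element.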
