On Thu, Jan 8, 2009 at 1:31 PM, Greg L <Greg_L_at_Wikipedia@comcast.net> wrote:
> I'm not a developer so it would be great if either of you (Aryeh or Mr.Z-man) could explain whether a character-counting parser function (or similar tool) is currently available (or could be made) for template authors to use.

Such tools are available, but none has been written well enough that it could be used on Wikimedia sites.

> As for "we currently have no plans to enable StringFunctions or any similar functionality on Wikimedia sites", why would that be a good plan?

Andrew was mistaken in that statement. Variables are currently off the table, but string functions aren't. They need someone to write a good version of them that isn't DOSable and handles things like strip markers acceptably.
On Thu, Jan 8, 2009 at 1:43 PM, Chad <innocentkiller@gmail.com> wrote:
> Brion outlined his concerns with StringFunctions--when it was merged with ParserFunctions and he reverted it--back in r39653. Mainly, the overall package is too memory intensive as currently written.
His exact comment might be more elucidating: "o_O These look like the least CPU- and memory-efficient implementations of strlen(), strpos() etc that could possibly be created..." For example, the {{#len:}} function was implemented as the return value of this:
    /**
     * Splits the string into its component parts using preg_match_all().
     * $chars is set to the resulting array of multibyte characters.
     * Returns count($chars).
     */
    function mwSplit ( &$parser, $str, &$chars ) {
        # Get marker prefix & suffix
        $prefix = preg_quote( $parser->mUniqPrefix );
        if( isset($parser->mMarkerSuffix) )
            $suffix = preg_quote( $parser->mMarkerSuffix );
        else if ( strcmp( MW_PARSER_VERSION, '1.6.1' ) > 0 )
            $suffix = 'QINU\x07';
        else
            $suffix = 'QINU';

        # Treat strip markers as single multibyte characters
        $count = preg_match_all('/' . $prefix . '.*?' . $suffix . '|./su', $str, $arr);
        $chars = $arr[0];
        return $count;
    }
Rather than, say, replacing strip markers using the appropriate Parser method, and then returning mb_strlen(). Or whatever would be appropriate. I'm not sure what would be, but I'm pretty sure it doesn't involve exploding the string into an array to calculate its length.
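To make the alternative concrete, here is a rough sketch of the "collapse the markers, then call mb_strlen()" idea. It is only an illustration under assumptions: lenWithMarkers() and $markerPattern are hypothetical names, not anything in MediaWiki or StringFunctions, the marker format in the demo is made up, and it uses a plain preg_replace() rather than whatever Parser method would actually be appropriate. It needs the mbstring extension.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): count characters in $str,
 * treating each strip marker as a single character.
 * $markerPattern is an assumed regex that matches one whole strip marker.
 */
function lenWithMarkers( string $str, string $markerPattern ): int {
    // Collapse every marker to one placeholder character so it counts as 1.
    $collapsed = preg_replace( $markerPattern, "\x7f", $str );
    if ( $collapsed === null ) {
        // preg_replace() returns null on error, e.g. invalid UTF-8 with /u.
        return 0;
    }
    // Count code points directly; no per-character array is built.
    return mb_strlen( $collapsed, 'UTF-8' );
}

// Made-up marker format for the demo; the real prefix and suffix
// would come from the parser, as in mwSplit() above.
$pattern = '/' . preg_quote( "\x07UNIQ", '/' ) . '.*?'
    . preg_quote( "QINU\x07", '/' ) . '/su';

// Two letters, one marker, one accented letter, one more letter.
echo lenWithMarkers( "ab\x07UNIQ-nowiki-1-QINU\x07\u{e7}d", $pattern ), "\n";
```

This still makes one copy of the string, but it avoids allocating an array entry per character just to count them. Whether replacing markers this way is the right treatment is exactly the part I'm not sure about.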