[Wikitech-l] PCRE unicode mode performances?

2 Jul 2013

Hello,

AbuseFilter does not match word boundaries in Devanagari script. This is
logged at https://bugzilla.wikimedia.org/46773 (some unit test results
are attached there).

The root cause is that the regex patterns are not compiled in Unicode mode
(the 'u' regexp flag), so \b does not treat Devanagari characters as word
characters.
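
To make the symptom concrete, here is a minimal sketch. It is written in
Python rather than PHP, and Python's re module is not PCRE, so treat it as
an analogy only: re.ASCII stands in for a pattern without the 'u' flag, and
the default str behaviour stands in for Unicode mode. The sample sentence
and word are made up for illustration and are not taken from the bug report.

```python
import re

# Hypothetical sample data: a short sentence and a word made of three
# Devanagari letters with no combining vowel signs.
TEXT = "यह कमल है"
PATTERN = r"\bकमल\b"

# re.ASCII restricts \w (and therefore \b) to [a-zA-Z0-9_], so the letters
# on both sides of the space count as non-word characters and no boundary
# is found.
ascii_match = re.search(PATTERN, TEXT, re.ASCII)

# Default str matching is Unicode-aware: Devanagari letters count as word
# characters, so there is a boundary between the space and the word.
unicode_match = re.search(PATTERN, TEXT)

print("ASCII-only \\b matches:", ascii_match is not None)
print("Unicode-aware \\b matches:", unicode_match is not None)
```

The word was chosen to avoid combining vowel signs, since regex engines
differ on whether those count as word characters; that is a separate
question from the missing flag.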

The fix would be to run the preg_match calls in AbuseFilter in Unicode mode,
but I am worried about the performance implications. I once wrote a
patch that used Unicode properties, and it made the parser
significantly slower.

Maybe the AbuseFilter code path is not that performance-critical :)
Any thoughts?

-- 
Antoine "hashar" Musso
