Perhaps. However, looking over the regex: /^(.+?)_*:_*(.*)$/S
Checking it with my regex tool, I notice that when it encounters a
_ that is not followed by a :, it ends up backtracking over it. Not to
mention that for each character it needs to check whether it's any
character, or an underscore followed by anything, and whether that's followed by a : .
I think that using a string function to find the first : in the string
(CPUs are best at simple incrementing, so that part costs nothing), and
then trimming, would be faster than using the regex.
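To make that concrete, here's a rough sketch of the find-and-trim idea in plain PHP. The function name splitTitle is mine for illustration, not an actual MediaWiki function, and the trim characters assume only underscores and spaces surround the colon, as the _* in the original regex suggests:

```php
<?php
// Hypothetical sketch: split a title on the first ':' with plain string
// functions instead of the /^(.+?)_*:_*(.*)$/S regex.
function splitTitle( $text ) {
    $pos = strpos( $text, ':' );
    if ( $pos === false ) {
        // No separator at all: the whole thing is the title.
        return array( null, $text );
    }
    // trim() strips the underscores/spaces the regex's _* would consume.
    $prefix = trim( substr( $text, 0, $pos ), '_ ' );
    $rest   = trim( substr( $text, $pos + 1 ), '_ ' );
    return array( $prefix, $rest );
}

list( $ns, $title ) = splitTitle( 'Help:_Contents' );
```

strpos/substr walk the string exactly once with no backtracking, which is the whole point of the comparison.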
Oh, yeah, one other thing to remember: with the new format of
normalization, the splitting should NOT trim whitespace the way the
current setup does. If it did, it would strip whitespace from the title
that someone's customized normalization may actually want to keep.
So an altered version of that regex to suit would be /^(.+?):(.*)$/S,
which is definitely nowhere near as efficient as a simple find-and-split.
I'll probably use list() and explode(), actually.
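The list()/explode() version might look something like this (splitTitleRaw is a name I'm making up for the sketch). Note the limit of 2 so only the first : splits, and no trimming, per the point above:

```php
<?php
// Sketch: split on the first ':' only and deliberately keep all
// surrounding whitespace intact for the normalization layer to handle.
function splitTitleRaw( $text ) {
    if ( strpos( $text, ':' ) === false ) {
        return array( null, $text );
    }
    // Limit of 2: everything after the first ':' stays in one piece,
    // even if the title itself contains more colons.
    list( $prefix, $rest ) = explode( ':', $text, 2 );
    return array( $prefix, $rest );
}

list( $ns, $title ) = splitTitleRaw( 'User talk: Foo' );
```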
On another note, I noticed something about the normalization. While : is
the standard separator, abstracting the normalization process like this
actually loosens the definition of what is what in a title, while
still keeping it stable. Honestly, if someone changed the methods used
to prefix things and altered the splitting sequence, they could
probably change MediaWiki to use something like :: as the separator
instead. With even more work, they could probably introduce a
special type of namespace in MediaWiki that uses a different kind
of prefix, or even restricts inclusion to only certain types of pages.
(Basically, a wiki like a card game wiki could force its package
redirects and card IDs into special namespaces dedicated to them.)
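As a purely hypothetical illustration of that abstraction: if the splitting sequence took the separator from one place instead of hard-coding :, switching a wiki to :: would be a one-line change. Neither $wgTitleSeparator nor splitOnSeparator exists in MediaWiki; they're just names for the sketch:

```php
<?php
// Hypothetical: the separator as a single point of configuration.
$wgTitleSeparator = '::';

// Generic split that works for a separator of any length, not just ':'.
function splitOnSeparator( $text, $sep ) {
    $pos = strpos( $text, $sep );
    if ( $pos === false ) {
        return array( null, $text );
    }
    return array(
        substr( $text, 0, $pos ),
        substr( $text, $pos + strlen( $sep ) )
    );
}

list( $ns, $title ) = splitOnSeparator( 'Card::Blue-Eyes', $wgTitleSeparator );
```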
Actually, in light of that, I might add another hook or two, or clean up
some of the title functions to properly abstract the prefixing to where
it belongs instead of scattering it all over the place.
~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)
Platonides wrote:
DanTMan wrote:
So to cut down on that, I'm going to try using normal string
functions to pull out the prefixes and trim them off. A strpos, substr,
and trim set together is much quicker than a full blown regex pattern match.
Not always. Remember that the PHP code surrounding those functions is
interpreted, while the regex call is run as compiled code.
I think some of the sysadmins remarked that the use of regex *improved*
the performance.
I'm not saying you shouldn't change it to traditional means, just that
the timing should be checked to be sure it's not slower.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l