Perhaps. However, looking over the regex: /^(.+?)_*:_*(.*)$/S Checking it with my regex tool, I notice that when it encounters a _ it ends up backtracking over it whenever that _ is not followed by a : . Not to mention that for each character it has to check whether it is any character, or an underscore followed by any character, and whether that is followed by a : . I think that using a string function to find the first : in the string (CPUs are best at incrementing, so that costs next to nothing) and then trimming would be faster than using the regex.
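To make the comparison concrete, here is a rough sketch of the two approaches side by side. The function names are made up for illustration and are not MediaWiki's actual functions. The string version only mirrors the regex for ordinary titles; edge cases such as a leading : or a prefix made up entirely of underscores are handled differently.

```php
<?php
// Hypothetical helper names for illustration; not MediaWiki's real API.

// Regex version, using the pattern quoted above.
// Returns [prefix, rest], or null when the pattern does not match.
function splitPrefixRegex(string $title): ?array {
    if (preg_match('/^(.+?)_*:_*(.*)$/S', $title, $m)) {
        return [$m[1], $m[2]];
    }
    return null;
}

// String-function version: find the first ':' and trim the
// underscores on either side of it.
function splitPrefixString(string $title): ?array {
    $pos = strpos($title, ':');
    if ($pos === false || $pos === 0) {
        return null; // no separator, or nothing in front of it
    }
    $prefix = rtrim(substr($title, 0, $pos), '_');
    $rest = ltrim(substr($title, $pos + 1), '_');
    if ($prefix === '') {
        return null; // prefix was only underscores (the regex would differ here)
    }
    return [$prefix, $rest];
}
```

Both functions split only at the first : so anything after it, including further colons, stays in the second element.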
Oh, yeah, there is also something to remember. With the new normalization format the splitting should NOT trim whitespace as the current setup does. If it did, it would eliminate whitespace from the title that someone's altered normalization may actually wish to keep. So an altered version of that regex to suit would be: /^(.+?):(.*)$/S which is most definitely nowhere near as efficient as a simple find-the-: and split. I'll probably use list() and explode() actually.
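A minimal sketch of what the list() and explode() version might look like, assuming a made-up helper name (splitTitle) and a made-up convention of returning null for the prefix when there is no : at all. Unlike /^(.+?):(.*)$/S, this will also return an empty prefix for a title that starts with : so the caller would need to decide what to do with that.

```php
<?php
// Hypothetical helper; the name and return convention are assumptions.
// Splits at the first ':' without trimming anything, so a later
// normalization step sees the whitespace exactly as it was typed.
function splitTitle(string $title): array {
    // The limit of 2 keeps any later ':' inside the second part.
    $parts = explode(':', $title, 2);
    if (count($parts) < 2) {
        return [null, $title]; // no separator: no prefix at all
    }
    list($prefix, $rest) = $parts;
    return [$prefix, $rest];
}
```

For example, splitTitle('User : Foo bar') keeps the space on both sides of the colon and leaves the trimming decision to whatever normalization runs afterwards.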
On another note, I noticed something with the normalization. While : is the standard separator, abstracting the normalization process like this actually loosens the definition of what is what in a title, while still keeping it stable. Honestly, if someone changed the methods used to prefix things and altered the splitting sequence, they could probably change MediaWiki to use something like :: as the separator instead. If they went to even more work, they could probably introduce a special type of namespace to MediaWiki which could use a different kind of prefix, or even restrict it to including only certain types of pages. (Basically, a wiki like a card game wiki could force its package redirects and card IDs into special namespaces dedicated to them.) Actually, in light of that, I might add another hook or two, or clean up some of the title functions to properly abstract the prefixing to where it should be, instead of having it mixed up all over the place.
~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com)
Platonides wrote:
DanTMan wrote:
So to cut down on that, I'm going to try using normal string functions to pull out the prefixes and trim them off. A strpos, substr, and trim set together is much quicker than a full blown regex pattern match.
Not always. Remember that the PHP code surrounding those functions is interpreted, while the regex call runs in compiled code. I think some of the sysadmins remarked that the use of regex *improved* the performance. I'm not saying you shouldn't change it to traditional means, just that the time should be measured to be sure it's not slower.
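One rough way to check the timing, as suggested above, is a small micro-benchmark. This is only a sketch: the timeIt helper, the sample title, and the iteration count are made up for illustration, and the numbers will vary with PHP version, opcode caching, and input, so it is not a substitute for profiling the real code path.

```php
<?php
// Hypothetical timing helper: runs $fn $iterations times and
// returns the elapsed wall-clock time in seconds.
function timeIt(callable $fn, int $iterations): float {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$title = 'User_talk_:_Some_page_title'; // sample input (assumption)
$n = 100000;                            // iteration count (assumption)

// The regex from the thread.
$regexTime = timeIt(function () use ($title) {
    preg_match('/^(.+?)_*:_*(.*)$/S', $title, $m);
}, $n);

// The strpos, substr, and trim combination.
$stringTime = timeIt(function () use ($title) {
    $pos = strpos($title, ':');
    if ($pos !== false) {
        $prefix = rtrim(substr($title, 0, $pos), '_');
        $rest = ltrim(substr($title, $pos + 1), '_');
    }
}, $n);

printf("regex:  %.4f s\nstring: %.4f s\n", $regexTime, $stringTime);
```

Running it several times, and with titles of different lengths and with no : at all, should give a better idea of whether the interpreted string calls really beat the single compiled regex call.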
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l