The logic for links lies in Parser::replaceInternalLinks in parser/Parser.php (it's actually Parser::replaceInternalLinks2 that you should look at now). TimStarling also made some recent changes, so you may also want to look at parser/LinkHolderArray.php.
For the preprocessor stuff, look at parser/Preprocessor_DOM.php; however, the logic there is considerably more complex than what we'll need.
I'm trying to find a way to get nested things to work right without ruining TimStarling's recent improvements to the memory use and speed of that area of the parser. I did a small benchmark between:
A) Recursive call: find [[, walk to the closing ]], and do a recursive call for the stuff in between. (This is similar to what we do now, though we limit it to a depth of 2.)
B) Markers and a single hashtable: as we find [['s we build a stack of offsets; when a ]] is found we pop the last offset, create a new token in the hashtable with the contents in between, and replace the text with a marker. (When expanding, though, we need to expand multiple times, because the content of a marker can have markers inside it as well.)
For a really flat setup A) and B) are similar, though A) does have a slightly lower footprint. (Note, though, that this test is plain string-replacement recursion: there are no link holders set up, and we don't run the setup that we would be running multiple times with the recursion, so an actual Parser implementation would likely be heavier.) However, when you get into an insane level of nested brackets, A) starts to take 10x the time that B) takes. This would be why we limit recursion to a depth of 2, but it may actually make using B) to create a tree possible. Of course, links would never be nested like that, but when you use a different order of parsing it does become necessary.
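To make approach B) concrete, here is a minimal sketch of the offset stack, the marker table, and the repeated expansion. It is written in Python purely for illustration and is not MediaWiki code: the marker format and the names collapse_links, expand_links, and render are all assumptions of mine, and none of the real link-holder setup is modelled.

```python
import re

# Hypothetical marker format; the real parser uses its own strip markers.
MARKER = "\x7fLINK-{}\x7f"
MARKER_RE = re.compile("\x7fLINK-(\\d+)\x7f")

def collapse_links(text):
    """Replace every closed [[...]] span, innermost first, with a marker."""
    table = []   # marker id -> inner text (which may itself hold markers)
    stack = []   # offsets into `out` of each still-open "[["
    out = ""
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "[[":
            stack.append(len(out))
            out += pair
            i += 2
        elif pair == "]]" and stack:
            start = stack.pop()
            table.append(out[start + 2:])            # contents in between
            out = out[:start] + MARKER.format(len(table) - 1)
            i += 2
        else:
            out += text[i]                           # unmatched text stays literal
            i += 1
    return out, table

def expand_links(text, table, render):
    """Expand markers repeatedly, since marker contents can hold markers."""
    while MARKER_RE.search(text):
        text = MARKER_RE.sub(lambda m: render(table[int(m.group(1))]), text)
    return text
```

For example, collapse_links("a [[b [[c]] d]] e") leaves a single marker in the text and two table entries, the second of which contains the marker for the first; expand_links then needs two passes to restore the nesting.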
I'm considering creating another parser (inheriting from Parser) in order to start experimenting and working on a different order. That would allow us to use the Parser_DiffTest to make sure that the syntax remains the same for all use cases. (It would also allow us to benchmark.)
~Daniel Friesen(Dantman, Nadir-Seen-Fire) of:
-The Nadir-Point Group (http://nadir-point.com)
--It's Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
--Games-G.P.S. (http://ggps.org)
-And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)
Rolf Lampa wrote:
Daniel Friesen skrev:
TimStarling commented that adding links into the preprocessor would change the way they are handled and would break cases like http://sandbox.wiki-tools.com/edit/FakeLink, changing the syntax of WikiText in an incompatible way.
Yes, that's a good example.
<...>However, my issue lies in the |: there is no strict handling of those, and making them "safe" is handled by parsing links inside of the links before the | is broken up. That works well for the parser, but not for anything you want to send to a callback.
A temporary hint for extension writers (until a final generic solution is available in the framework) is to count the brackets and count the pipes only while at the "main-link" level, that is:
- Start a loop examining the string or string fragment.
- Count UP on [ brackets. // $BracketsCnt++
- Count DOWN on ] brackets. // $BracketsCnt--
- Count | (pipes) ONLY when $BracketsCnt equals two. // if ( $BracketsCnt == 2 ) $PipesCnt++
- Break the loop if more than one pipe was found. // if ( $PipesCnt > 1 ) break;
This would detect the syntax error in the link "[[ | | ]]" as well as in "[[ | [[ | | ]] ]]" (on the second call, if called recursively).
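The counting steps above can be sketched roughly as follows. This is an illustrative Python version, not extension code for MediaWiki; the function name count_main_level_pipes is hypothetical, and the sketch assumes it is handed one link at a time, including its outer [[ and ]].

```python
def count_main_level_pipes(link):
    """Count pipes at the "main-link" level of a single [[...]] string.

    Stops early once a second pipe is seen, so the result is 0, 1 or 2,
    where 2 means "more than one pipe at the main level".
    """
    brackets = 0
    pipes = 0
    for ch in link:
        if ch == "[":
            brackets += 1
        elif ch == "]":
            brackets -= 1
        elif ch == "|" and brackets == 2:
            # Only pipes directly inside the outer [[ ... ]] are counted;
            # pipes of nested links sit at a bracket count of 4 or more.
            pipes += 1
            if pipes > 1:
                break
    return pipes
```

Called on "[[ | | ]]" this reports two pipes. Called on "[[ | [[ | | ]] ]]" it reports one, and the extra pipe only shows up when the inner "[[ | | ]]" is checked on its own, which is the "second call" case mentioned above.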
So the plan is to actually build an object tree similar to the Frames and Parts the preprocessor uses. This'll allow for better handling of things inside of callbacks.
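As a rough picture of what such a tree could look like for links, here is a small sketch. It is Python for illustration only and does not reflect the preprocessor's actual Frame or Part classes; LinkNode, its parts field, and build_tree are names I am making up, and splitting parts on every pipe inside an open link is a simplifying assumption.

```python
from dataclasses import dataclass, field

@dataclass
class LinkNode:
    # Hypothetical node: one list per pipe-separated part, each holding
    # plain strings and nested LinkNode objects in document order.
    parts: list = field(default_factory=lambda: [[]])

def build_tree(text):
    root = []    # top-level children: str or LinkNode
    stack = []   # links that are open but not yet closed

    def emit(item):
        target = stack[-1].parts[-1] if stack else root
        if isinstance(item, str) and target and isinstance(target[-1], str):
            target[-1] += item              # merge adjacent text
        else:
            target.append(item)

    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "[[":
            stack.append(LinkNode())
            i += 2
        elif pair == "]]" and stack:
            emit(stack.pop())               # attach finished link to its parent
            i += 2
        elif text[i] == "|" and stack:
            stack[-1].parts.append([])      # start the next part of this link
            i += 1
        else:
            emit(text[i])
            i += 1

    # Unclosed links fall back to literal text, innermost first.
    while stack:
        node = stack.pop()
        emit("[[")
        for n, part in enumerate(node.parts):
            if n:
                emit("|")
            for item in part:
                emit(item)
    return root
```

A callback could then receive a LinkNode and look at its parts directly, instead of having to re-split a string on pipes that may belong to a nested link.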
Which PHP file do you recommend I start looking at for this logic?
Regards,
// Rolf Lampa