The logic for links lies in parser/Parser.php's
Parser::replaceInternalLinks (actually, Parser::replaceInternalLinks2 is
now the one you should look at). TimStarling also made some recent
changes, so you may also want to look at parser/LinkHolderArray.php.
For the preprocessor stuff, see parser/Preprocessor_DOM.php; however,
the logic there is actually a fair bit more complex than what we'll even
need.
I'm trying to find a way to get nested things to work right, without
ruining TimStarling's recent improvements to the memory and speed of
that area of the parser.
I did a small benchmark between:
A) recursive call: find [[, walk to the closing ]], and do a recursive
call for the stuff in between. (This is similar to what we currently do,
though we limit to a depth of 2.)
B) markers and a single hashtable: as we find [['s we build a stack of
offsets; when a ]] is found we pop the last offset, create a new token
in the hashtable with the contents in between, and replace the text
with a marker. (When expanding, though, we need to expand multiple times
because the content of markers can have markers inside of it as well.)
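To make B) concrete, here is a rough sketch of the idea, written in Python rather than MediaWiki's PHP just to keep it short. It is not the benchmark code and not anything in Parser.php: the function names, the marker format and the `render` callback are all made up for illustration, and the "offsets" on the stack are positions in the output buffer rather than in the input string.

```python
import re

# Hypothetical marker format; real parser markers look different.
MARKER_RE = re.compile("\x7fLINK-\\d+\x7f")


def tokenize_links(text):
    """Replace every balanced [[...]] span with a marker, innermost first."""
    tokens = {}   # the single hashtable: marker -> contents between [[ and ]]
    out = []      # output pieces built so far
    stack = []    # for each open [[, its position in `out`
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "[[":
            stack.append(len(out))
            out.append(pair)
            i += 2
        elif pair == "]]" and stack:
            start = stack.pop()
            inner = "".join(out[start + 1:])   # skip the opening [[
            marker = "\x7fLINK-%d\x7f" % len(tokens)
            tokens[marker] = inner
            del out[start:]                    # drop [[ and its contents
            out.append(marker)
            i += 2
        else:
            out.append(text[i])
            i += 1
    # Any unclosed [[ simply stays in the output as literal text.
    return "".join(out), tokens


def expand(text, tokens, render):
    """Expand markers repeatedly, since token contents may hold markers too."""
    while True:
        text, count = MARKER_RE.subn(
            lambda m: render(tokens[m.group(0)]), text)
        if count == 0:
            return text
```

The input is walked exactly once no matter how deep the nesting goes; the price is the repeated passes in `expand`, one per nesting level.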
For a really flat setup A) and B) are similar, though A) does have a
slightly lower footprint. (But do note that this test is flat string
replacement recursion: there are no link holders set up, and we don't run
the setup that the recursion would have to run multiple times, so an
actual Parser implementation would likely be heavier.) However, when you
get into an insane level of nested brackets, A) starts to take 10x the
time that B) takes. That would be why we limit to a depth of 2
recursions, and it may actually make using B) to create a tree possible.
Of course, links would never be nested like that, but when you are using
a different order of parsing it does become necessary.
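For comparison, A) could be sketched roughly like this (again Python for brevity, not the benchmark code; `find_close`, `render_recursive` and the `render` callback are hypothetical names). One plausible reason for the slowdown is visible in the sketch: every nesting level is scanned once to find its closing ]] and then scanned again by the recursive call.

```python
def find_close(text, start):
    """Return the index of the ]] matching the [[ at `start`, or -1."""
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "[[":
            depth += 1
            i += 2
        elif pair == "]]":
            depth -= 1
            if depth == 0:
                return i
            i += 2
        else:
            i += 1
    return -1


def render_recursive(text, render, depth=0, max_depth=2):
    """Render [[...]] spans, recursing into their contents up to max_depth."""
    if depth >= max_depth:
        return text                     # too deep: leave the text untouched
    out = []
    pos = 0
    while True:
        start = text.find("[[", pos)
        if start == -1:
            break
        end = find_close(text, start)
        if end == -1:
            # Unclosed [[: keep it as literal text and keep looking.
            out.append(text[pos:start + 2])
            pos = start + 2
            continue
        out.append(text[pos:start])
        inner = render_recursive(text[start + 2:end], render,
                                 depth + 1, max_depth)
        out.append(render(inner))
        pos = end + 2
    out.append(text[pos:])
    return "".join(out)
```

With `render = lambda s: "<" + s + ">"`, the input `a [[b [[c]]]] d` comes out as `a <b <c>> d`, while a third level of nesting is left as raw text because of the depth limit.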
I'm considering creating another parser (inheriting from Parser) in
order to start experimenting and working on a different order. That
would allow us to use the Parser_DiffTest to make sure that for all use
cases syntax remains the same. (And also allow us to benchmark).
~Daniel Friesen(Dantman, Nadir-Seen-Fire) of:
-The Nadir-Point Group (http://nadir-point.com)
--It's Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
--Games-G.P.S. (http://ggps.org)
-And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)
Rolf Lampa wrote:
Daniel Friesen wrote:
TimStarling commented that adding links into the preprocessor would
change the way they are handled and would break cases like
http://sandbox.wiki-tools.com/edit/FakeLink, changing the syntax of
WikiText in an incompatible way.
Yes, that's a good example.
<...>However, my issue lies in the |: there is no strict handling of
those, and making them "safe" is handled by parsing links inside of the
links before the | is broken up. That works well for the parser, but not
for anything you want to send to a callback.
A temporary hint for extension writers (until a final generic solution
is available in the framework) is to count the brackets and count the
pipes only while at the "main-link" level, that is:
0. Start a loop examining the string or string fragment.
1. Count UP on [ brackets.          // $BracketsCnt++
2. Count DOWN on ] brackets.        // $BracketsCnt--
3. Count | (pipes) ONLY when
   $BracketsCnt equals two.         // if ( $BracketsCnt == 2 ) $PipesCnt++
4. Break the loop if more than one
   pipe was found.                  // if ( $PipesCnt > 1 ) break;
This would detect the syntax error in the link "[[ | | ]]" as well as in
"[[ | [[ | | ]] ]]" (on the second call, if called recursively).
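The counting steps above could be sketched like this. It is a minimal illustration in Python rather than extension-ready PHP, and the function names are made up; it only does what steps 0 to 4 describe.

```python
def count_main_level_pipes(fragment):
    """Count | characters that sit directly inside the outermost [[ ]]."""
    brackets = 0
    pipes = 0
    for ch in fragment:
        if ch == "[":
            brackets += 1            # step 1: count up on [
        elif ch == "]":
            brackets -= 1            # step 2: count down on ]
        elif ch == "|" and brackets == 2:
            pipes += 1               # step 3: only at the main-link level
            if pipes > 1:
                break                # step 4: stop once the error is known
    return pipes


def has_too_many_pipes(fragment):
    """True if the main-link level contains more than one pipe."""
    return count_main_level_pipes(fragment) > 1
```

For "[[ | | ]]" this reports an error straight away. For "[[ | [[ | | ]] ]]" the first call sees only one main-level pipe, and the error shows up when the inner "[[ | | ]]" is checked on its own.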
So the plan is to actually build an object tree similar to the Frames
and Parts the preprocessor uses. This'll allow for better handling of
things inside of callbacks.
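As a very rough picture of what such a tree might look like (purely hypothetical: `LinkNode`, `parse_links` and `flatten` are invented names, not the preprocessor's Frames and Parts, and this is Python rather than PHP), each link becomes a node whose parts are split on | only at its own level:

```python
from dataclasses import dataclass, field


@dataclass
class LinkNode:
    # Each part is a list of children: plain strings or nested LinkNodes.
    parts: list = field(default_factory=lambda: [[]])


def add_text(part, ch):
    """Append a character, merging it into a trailing string if present."""
    if part and isinstance(part[-1], str):
        part[-1] += ch
    else:
        part.append(ch)


def parse_links(text):
    """Build a tree of LinkNodes. Assumes balanced brackets: an unclosed
    [[ is simply treated as closed at the end of the input."""
    root = []     # top-level children
    stack = []    # open LinkNodes, innermost last
    i = 0
    while i < len(text):
        current = stack[-1].parts[-1] if stack else root
        pair = text[i:i + 2]
        if pair == "[[":
            node = LinkNode()
            current.append(node)
            stack.append(node)
            i += 2
        elif pair == "]]" and stack:
            stack.pop()
            i += 2
        elif text[i] == "|" and stack:
            stack[-1].parts.append([])   # new part of the innermost link
            i += 1
        else:
            add_text(current, text[i])
            i += 1
    return root


def flatten(node):
    """Turn a LinkNode back into wikitext."""
    return "[[" + "|".join(
        "".join(c if isinstance(c, str) else flatten(c) for c in part)
        for part in node.parts) + "]]"
```

A callback handed the node for `[[A|x [[B|c]] y]]` would see exactly two parts, with the nested link kept whole inside the second one, so the inner | never gets mistaken for a separator of the outer link.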
Which PHP file do you recommend I start looking at for this logic?
Regards,
// Rolf Lampa