Re: [Wikitech-l] New Project, Link Hooks... Needing some research

18 Aug 2008

The logic for links lives in parser/Parser.php's
Parser::replaceInternalLinks (actually, these days it's
Parser::replaceInternalLinks2 you'll want to read). TimStarling also made
some recent changes, so you may also want to look at
parser/LinkHolderArray.php.

For the preprocessor side, see parser/Preprocessor_DOM.php, though the
logic there is considerably more complex than anything we'll need.

I'm trying to find a way to get nested constructs to work correctly
without undoing TimStarling's recent memory and speed improvements to
that area of the parser.
I did a small benchmark comparing two approaches:
A) Recursive calls: find [[, walk until the closing ]], and make a
recursive call on the text in between. (This is similar to what we do
now, though we limit the depth to 2.)
B) Markers and a single hashtable: as we find [[s we push their offsets
onto a stack; when a ]] is found we pop the last offset, store the
contents in between as a new token in the hashtable, and replace that
text with a marker. (When expanding, though, we need to expand multiple
times, because the content of a marker can itself contain markers.)
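
Approach B can be sketched in Python roughly as follows. This is a minimal illustration under assumptions, not MediaWiki code: the marker format "\x7fLINK<n>\x7f" and the names `tokenize` and `expand` are hypothetical, and `render` stands in for whatever a link handler would output for one token.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical marker format; MediaWiki's real strip markers look different.
MARKER_RE = re.compile(r"\x7fLINK\d+\x7f")


def tokenize(text: str) -> Tuple[str, Dict[str, str]]:
    """Single pass: replace every matched [[...]] with a marker, innermost first."""
    buf: List[str] = []    # output pieces; a marker occupies one slot
    stack: List[int] = []  # slot index of each still-open "[["
    table: Dict[str, str] = {}
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "[[":
            stack.append(len(buf))
            buf.append("[[")
            i += 2
        elif pair == "]]" and stack:
            start = stack.pop()
            inner = "".join(buf[start + 1:])  # may already contain markers
            del buf[start:]
            key = "\x7fLINK%d\x7f" % len(table)
            table[key] = inner
            buf.append(key)
            i += 2
        else:
            # Anything else (including a stray "]]") is copied one char at a time.
            buf.append(text[i])
            i += 1
    return "".join(buf), table  # unclosed "[[" stay as literal text


def expand(text: str, table: Dict[str, str],
           render: Callable[[str], str]) -> Tuple[str, int]:
    """Replace markers until none remain; returns (text, number of passes)."""
    passes = 0
    while MARKER_RE.search(text):
        # re.sub does not rescan its replacements, so markers nested inside a
        # rendered token survive until the next pass.
        text = MARKER_RE.sub(lambda m: render(table[m.group(0)]), text)
        passes += 1
    return text, passes
```

Tokenizing "See [[A|[[B]] and [[C]]]] end" leaves a single outer marker in the text and three entries in the table; expanding with a `render` that re-wraps each token in [[...]] restores the original string after two passes, one per nesting level.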

For truly flat input, A) and B) perform similarly, and A) has a
slightly lower footprint. (Note, though, that this test is plain string
replacement with recursion: there is no link-holder setup, and it skips
the setup work that a real implementation would repeat on every
recursive call, so an actual Parser implementation would likely be
heavier.) However, once you get into an insane level of nested
brackets, A) starts to take 10x as long as B). That is probably why we
limit recursion to a depth of 2; B), on the other hand, may make it
practical to build a tree. Of course, links would never be nested like
that, but with a different order of parsing that nesting does become
necessary.
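
To make the cost of A) concrete, here is a hedged Python sketch of the recursive approach (not the actual Parser code; `parse_recursive`, `count_scans` and the `stats` counter are hypothetical). It counts the steps taken while scanning: every level walks its whole substring to find the matching ]], and then the recursive call walks the inside again, so fully nested input costs roughly the square of the nesting depth.

```python
from typing import Callable, Dict, List


def parse_recursive(text: str, render: Callable[[str], str],
                    stats: Dict[str, int]) -> str:
    """Approach A: find [[, walk to the matching ]], recurse on the inside."""
    out: List[str] = []
    i, n = 0, len(text)
    while i < n:
        stats["scanned"] += 1
        if not text.startswith("[[", i):
            out.append(text[i])
            i += 1
            continue
        # Walk forward to the matching "]]"; every step of the walk is counted.
        depth, j = 0, i
        while j < n:
            stats["scanned"] += 1
            if text.startswith("[[", j):
                depth += 1
                j += 2
            elif text.startswith("]]", j):
                depth -= 1
                j += 2
                if depth == 0:
                    break
            else:
                j += 1
        if depth != 0:
            # No matching "]]": keep the brackets as literal text.
            out.append("[[")
            i += 2
            continue
        # The inner text is walked again by the recursive call.
        out.append(render(parse_recursive(text[i + 2:j - 2], render, stats)))
        i = j
    return "".join(out)


def count_scans(text: str) -> int:
    stats = {"scanned": 0}
    parse_recursive(text, lambda inner: inner, stats)
    return stats["scanned"]
```

In this sketch, 50 flat links ("[[x]]" repeated 50 times) take 250 steps, while 50 levels of nesting around a single "x" take 2651 (k*k + 3k + 1 for depth k). These are counts from the toy model only, not a reproduction of the benchmark above.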

I'm considering creating another parser (inheriting from Parser) in
order to start experimenting with a different parsing order. That
would let us use Parser_DiffTest to make sure the syntax behaves the
same across all use cases (and would also let us benchmark).

~Daniel Friesen(Dantman, Nadir-Seen-Fire) of:
-The Nadir-Point Group (http://nadir-point.com)
--Its Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
--Games-G.P.S. (http://ggps.org)
-And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)

Rolf Lampa wrote:
...
  Daniel Friesen wrote:

  TimStarling commented that adding links into the preprocessor would
 change the way they are handled and would break cases like
 http://sandbox.wiki-tools.com/edit/FakeLink, changing the syntax of
 WikiText in an incompatible way.

 Yes, that's a good example.

  <...>However, my issue lies in the |. There is no
 strict handling of those, and making them "safe" is done by parsing
 links inside of the links before the | is broken up. That works well
 for the parser, but not for anything you want to send to a callback.

 A temporary hint for extension writers (until a final generic solution 
 is available in the framework) is to count the brackets and count the 
 pipes only while at the "main-link" level, that is:

 0. Start a loop examining the string or string fragment.
 1. Count UP on [ brackets.              // $BracketsCnt++;
 2. Count DOWN on ] brackets.            // $BracketsCnt--;
 3. Count | (pipes) ONLY when
     $BracketsCnt equals two             // if ( $BracketsCnt == 2 ) $PipesCnt++;
 4. Break the loop if more than one pipe // if ( $PipesCnt > 1 ) break;
     was found.
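
The hint translates into Python roughly as below. This is a sketch under the hint's own assumptions (at most one pipe per link, brackets counted one character at a time); the names `main_level_pipes` and `has_pipe_error` are hypothetical.

```python
def main_level_pipes(fragment: str, max_pipes: int = 1) -> int:
    """Count pipes at the "main-link" level (bracket depth 2).

    Stops as soon as more than max_pipes pipes have been seen (step 4).
    """
    brackets = 0  # $BracketsCnt in the hint
    pipes = 0     # $PipesCnt in the hint
    for ch in fragment:          # step 0: examine the fragment
        if ch == "[":
            brackets += 1        # step 1
        elif ch == "]":
            brackets -= 1        # step 2
        elif ch == "|" and brackets == 2:
            pipes += 1           # step 3: a comparison, not an assignment
            if pipes > max_pipes:
                break            # step 4
    return pipes


def has_pipe_error(fragment: str) -> bool:
    return main_level_pipes(fragment) > 1
```

With this, "[[ | | ]]" is flagged immediately, while "[[ | [[ | | ]] ]]" passes at the outer level because the inner pipes sit at depth 4; the inner error only shows up when the check is called again on the inner link, as described below.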

 This would detect the syntax error in the link
 "[[ | | ]]" as well as in "[[ | [[ | | ]] ]]" (on the second call, if
 called recursively).

  So the plan is to actually build an object tree similar to the Frames
 and Parts the preprocessor uses. This'll allow for better handling of
 things inside of callbacks.
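
A minimal Python sketch of such a tree, assuming a node type only loosely modeled on the preprocessor's Parts: each hypothetical `LinkNode` holds a list of parts split on its own top-level pipes, and each part is a list of plain strings and child nodes. The tree is built in one pass with a stack, so pipes inside nested links never split the outer link. This is not MediaWiki's Frame/Part API; the names and behaviors (splitting on every top-level pipe, turning an unclosed [[ back into text) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class LinkNode:
    # Each part is the content between this link's own top-level pipes.
    parts: List[List["Child"]] = field(default_factory=lambda: [[]])


Child = Union[str, LinkNode]


def build_tree(text: str) -> List[Child]:
    root: List[Child] = []
    stack: List[LinkNode] = []  # links opened but not yet closed

    def current() -> List[Child]:
        return stack[-1].parts[-1] if stack else root

    def emit(s: str) -> None:
        cur = current()
        if cur and isinstance(cur[-1], str):
            cur[-1] += s  # merge adjacent text
        else:
            cur.append(s)

    i = 0
    while i < len(text):
        if text.startswith("[[", i):
            stack.append(LinkNode())
            i += 2
        elif text.startswith("]]", i) and stack:
            node = stack.pop()
            current().append(node)  # attach to the parent once closed
            i += 2
        elif text[i] == "|" and stack:
            stack[-1].parts.append([])  # a pipe only splits the innermost link
            i += 1
        else:
            emit(text[i])
            i += 1
    # Unclosed links: splice their contents back into the parent as text.
    while stack:
        node = stack.pop()
        emit("[[")
        for k, part in enumerate(node.parts):
            if k:
                emit("|")
            for child in part:
                if isinstance(child, str):
                    emit(child)
                else:
                    current().append(child)
    return root


def to_wikitext(children: List[Child]) -> str:
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.append("[[" + "|".join(to_wikitext(p) for p in child.parts) + "]]")
    return "".join(out)
```

A callback handed a `LinkNode` can then treat `parts[0]` as the target and the remaining parts as label or options without re-scanning the text for pipes.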

 Which PHP file would you recommend I start looking at for this logic?

 Regards,

 // Rolf Lampa
