On 25/04/05, Tim Starling timstarling@users.sourceforge.net wrote:
Modified Files: Parser.php Log Message: Cleared $wgLinkHolders after they are used. This fixes bug 1770 and probably improves performance too.
This change has the unfortunate side-effect that one can no longer replace the link holders from just one part of the page, as this will cause all the link targets to be effectively thrown away (with some rather amusing effects).
The only place I know that does this is the hack to make links within image captions not disappear. This is very hacky at the moment anyway, as it uses the $wgParser global from Linker.php, but I'm just about done rewriting that code such that that part (i.e. the parsing part) will be in Parser.php anyway.
Would it be ugly to add a parameter to switch off this clearing out, so that that approach could still be used?
Rowan Collins wrote:
On 25/04/05, Tim Starling timstarling@users.sourceforge.net wrote:
Modified Files: Parser.php Log Message: Cleared $wgLinkHolders after they are used. This fixes bug 1770 and probably improves performance too.
This change has the unfortunate side-effect that one can no longer replace the link holders from just one part of the page, as this will cause all the link targets to be effectively thrown away (with some rather amusing effects).
The only place I know that does this is the hack to make links within image captions not disappear. This is very hacky at the moment anyway, as it uses the $wgParser global from Linker.php, but I'm just about done rewriting that code such that that part (i.e. the parsing part) will be in Parser.php anyway.
Would it be ugly to add a parameter to switch off this clearing out, so that that approach could still be used?
As I said on #mediawiki when I was whinging about bug 1770, if a solution involves globals, it's the wrong solution. I'm not surprised that my quick hack caused side-effects, but I really hate link table corruption so I wanted to fix that first.
Perhaps a better solution would be to break up Skin::makeLinkObj() to the point where a workalike function can be put in Parser with minimal code duplication. The function in Parser would add link holders to a Parser member variable, and the Skin::makeLinkObj() would just resolve links immediately. Skin::makeLinkObj() would mainly be called for navigation links and special pages.
Note that in 1.5 I have added a LinkBatch class, which makes it easy for special pages to add batches of links to the link cache with a single DB query.
-- Tim Starling
On 26/04/05, Tim Starling t.starling@physics.unimelb.edu.au wrote:
On 25/04/05, Tim Starling timstarling@users.sourceforge.net wrote:
As I said on #mediawiki when I was whinging about bug 1770, if a solution involves globals, it's the wrong solution. I'm not surprised that my quick hack caused side-effects, but I really hate link table corruption so I wanted to fix that first.
Well, as I say, in the code I'm working on (will upload to http://bugzilla.wikimedia.org/show_bug.cgi?id=689 soon; no, really, I finally finished what I'd started) - and I notice in your recent change also - that hack doesn't require a global, because it is calling a Parser function from within Parser.
But that was, of course, beside the point: the question is, should it be possible to replace the link-holders on only part of the text, or is this not an intended ability? The way the function was implemented (using a global array and a string of text as an input) suggested that it didn't operate on the article as a whole, but emptying the array of link-holders each call violated that.
However, changes since seem to have reverted to this behaving as expected (the link-holders are replaced in the provided string, and others links still work).
Perhaps a better solution would be to break up Skin::makeLinkObj() to the point where a workalike function can be put in Parser with minimal code duplication. The function in Parser would add link holders to a Parser member variable, and the Skin::makeLinkObj() would just resolve links immediately. Skin::makeLinkObj() would mainly be called for navigation links and special pages.
As far as I understand what you're suggesting here, it doesn't really solve the current issue, which is that within an image caption, the links need to be parsed recursively, but not replaced with link-holders - so that, when used in alt and/or title attributes, the HTML tags can be stripped back out and leave the text.
The current approach does seem kind of wasteful, in that it replaces them with link-holders and then immediately resolves them afterwards; but in order to not replace them at all, the whole of replaceInternalLinks() needs to run in a "resolve immediately" kind of way. Was this what the "postParseLinkColor" system (which I notice you've just removed) used to do, or was that something different?
In general, which approach do you think seems better? (A) $foo = $this->replaceInternalLinks($foo); $foo = $this->replaceLinkHolders($foo); (B) $this->resolveLinksStraightAway(true); # or $skin->... or whatever $foo = $this->replaceInternalLinks($foo); $this->resolveLinksStraightAway(false); # or $skin->... or whatever
I guess the third option is to only use placeholders for the part of the markup that depends on the target's existence, which is presumably just a CSS class or whatever (thus leaving other parts of the code to separate the HTML tags from the displayed text). The problem then being people who want the old-style "link-name?" rather than colour-distinctions, I guess.
Rowan Collins wrote:
On 26/04/05, Tim Starling t.starling@physics.unimelb.edu.au wrote:
On 25/04/05, Tim Starling timstarling@users.sourceforge.net wrote:
As I said on #mediawiki when I was whinging about bug 1770, if a solution involves globals, it's the wrong solution. I'm not surprised that my quick hack caused side-effects, but I really hate link table corruption so I wanted to fix that first.
Well, as I say, in the code I'm working on (will upload to http://bugzilla.wikimedia.org/show_bug.cgi?id=689 soon; no, really, I finally finished what I'd started) - and I notice in your recent change also - that hack doesn't require a global, because it is calling a Parser function from within Parser.
Interesting, you suggested splitting makeImageLinkObj(), which is exactly the idea I came up with independently, and implemented. Great minds think alike.
But that was, of course, beside the point: the question is, should it be possible to replace the link-holders on only part of the text, or is this not an intended ability? The way the function was implemented (using a global array and a string of text as an input) suggested that it didn't operate on the article as a whole, but emptying the array of link-holders each call violated that.
However, changes since seem to have reverted to this behaving as expected (the link-holders are replaced in the provided string, and others links still work).
My original idea was to replace link placeholders in the output buffer, not the parser output text. This is the scheme I helped X13 to implement. This unfortunately had serious problems, and it was soon moved to Parser.php (1.329) by Will Mahan. His CVS log entry was:
"Move replaceLinkHolders() from OutputPage to Parser, because it needs to happen before unstripNoWiki() and before tidy. This also makes the parser more self-contained, so there is no need to create an OutputPage object for the parser tester."
The use of a global variable to hold the links was an inconsistent holdover from before this move, the parser state is the obvious place to put this information.
Emptying the link array on each call was a misguided attempt to fix bug 1770, I'm not advocating or defending it. My more recent changes were mostly code refactoring, I was careful not to change behaviour significantly.
Perhaps a better solution would be to break up Skin::makeLinkObj() to the point where a workalike function can be put in Parser with minimal code duplication. The function in Parser would add link holders to a Parser member variable, and the Skin::makeLinkObj() would just resolve links immediately. Skin::makeLinkObj() would mainly be called for navigation links and special pages.
As far as I understand what you're suggesting here, it doesn't really solve the current issue, which is that within an image caption, the links need to be parsed recursively, but not replaced with link-holders - so that, when used in alt and/or title attributes, the HTML tags can be stripped back out and leave the text.
The current approach does seem kind of wasteful, in that it replaces them with link-holders and then immediately resolves them afterwards; but in order to not replace them at all, the whole of replaceInternalLinks() needs to run in a "resolve immediately" kind of way. Was this what the "postParseLinkColor" system (which I notice you've just removed) used to do, or was that something different?
I don't think a "resolve immediately" mode is a good way to make the parser faster. Optimisation of DB query count is the most important thing. I considered replacing <!--LINK--> tags in the alt text with <!--TLINK-->, which would be later replaced by stripped-down link text, the only thing in my way was Sanitizer::stripAllTags(), which was removing the comments.
The resolve immediately mode was removed from Linker because its function has been replaced by Linker::makeLinkObj(), which always resolves immediately, and Parser::makeLinkHolder(), which never does. This removes the need to bracket calls to Linker::makeLinkObj() with mode changes.
In general, which approach do you think seems better? (A) $foo = $this->replaceInternalLinks($foo); $foo = $this->replaceLinkHolders($foo); (B) $this->resolveLinksStraightAway(true); # or $skin->... or whatever $foo = $this->replaceInternalLinks($foo); $this->resolveLinksStraightAway(false); # or $skin->... or whatever
I think A is better, because it requires one query, not one per link. But if we can delay the call to replaceLinkHolders() to once per parse rather than once per image caption, that would be much better than either.
I guess the third option is to only use placeholders for the part of the markup that depends on the target's existence, which is presumably just a CSS class or whatever (thus leaving other parts of the code to separate the HTML tags from the displayed text). The problem then being people who want the old-style "link-name?" rather than colour-distinctions, I guess.
Yes, I think that's a good idea. It's not just the class, it's also the URL. You'd have to process the HTML at the end of the transformation, find the links, do the existence check query, and modify the tags. To handle question marks you'd have to find the matching </a> and add something after it. If that's too hard, remove the feature, few people will miss it. Least of all developers like me who are obsessed with parser cache hit rate.
-- Tim Starling
Rowan Collins wrote:
This change has the unfortunate side-effect that one can no longer replace the link holders from just one part of the page, as this will cause all the link targets to be effectively thrown away (with some rather amusing effects).
I've done another quick hack to fix this problem on the live site, it remains unfixed in CVS, see my previous post for why.
-- Tim Starling
wikitech-l@lists.wikimedia.org