I've noticed a growing number of extensions extending link syntax (namely SMW's annotations, and other extensions using Embed:, Video:, or theoretically even Audio: namespaces for embedding things).
However, all of the implementations have serious issues. We have internal parsing of links, but when an extension wants to do something with them it's customary to use a regex rather than duplicate a small part of the parser. This normally leads to either a limited syntax that falls short of what the parser does, or a regex so complex that it causes server errors when the syntax is slightly broken (e.g. a missing trailing ]]).
For that reason I'm looking into adding a new feature to the parser: Link Hooks. Basically, this would allow an extension to hook into link processing for a namespace or a pattern.
I plan to support a number of flags, so it should handle most cases: Link/Media callbacks (link modification vs. embedding); namespace/pattern (a namespace number, or a special pattern like SMW's ::); Multi-params (pipe-separated params rather than one display text); Recursive parameters (things like Image:, where links can be inside parameters); and Recursive link text (for patterns which break things up and may contain links).
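To make the flag idea a bit more concrete, here is a rough sketch of what registration could look like. It is written in Python purely for illustration; the names `LinkHookFlag`, `LinkHookRegistry`, `set_link_hook` and `lookup` are all hypothetical and are not an existing MediaWiki API.

```python
from enum import Flag, auto
from typing import Callable, Dict, Optional, Tuple, Union


class LinkHookFlag(Flag):
    """Hypothetical flags, one per option described above."""
    NONE = 0
    MEDIA = auto()             # embedding callback rather than link modification
    PATTERN = auto()           # key is a special pattern (e.g. "::"), not a namespace number
    MULTI_PARAMS = auto()      # pipe-separated params instead of one display text
    RECURSIVE_PARAMS = auto()  # links may appear inside parameters
    RECURSIVE_TEXT = auto()    # link text itself may contain links


Key = Union[int, str]


class LinkHookRegistry:
    """Maps a namespace number or a pattern string to (callback, flags)."""

    def __init__(self) -> None:
        self._hooks: Dict[Key, Tuple[Callable, LinkHookFlag]] = {}

    def set_link_hook(self, key: Key, callback: Callable,
                      flags: LinkHookFlag = LinkHookFlag.NONE) -> None:
        # A pattern hook must be keyed by a string, a namespace hook by a number.
        is_pattern = bool(flags & LinkHookFlag.PATTERN)
        if is_pattern != isinstance(key, str):
            raise TypeError("use a str key with PATTERN, an int key otherwise")
        self._hooks[key] = (callback, flags)

    def lookup(self, key: Key) -> Optional[Tuple[Callable, LinkHookFlag]]:
        return self._hooks.get(key)


# Usage: roughly how Image: (namespace 6) and SMW's :: might be registered.
registry = LinkHookRegistry()
registry.set_link_hook(
    6, lambda target, params: None,
    LinkHookFlag.MEDIA | LinkHookFlag.MULTI_PARAMS | LinkHookFlag.RECURSIVE_PARAMS)
registry.set_link_hook("::", lambda target, params: None, LinkHookFlag.PATTERN)
```

Combining the options as bit flags keeps a single registration call for every case; whether the real thing would take flags or separate arguments is an open design choice.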
Unfortunately, I hit a snag in the code when dealing with [[Embedablens:Page|Content with [[link|displaytext]] inside]]. I can't provide data to extensions in a sane way. Either plain text is sent to them and they work with that (albeit breaking things like usual), or I try to split on the |'s, which doesn't work with nested things, or I parse the nested links first, but then extensions get a hard-to-work-with mess passed to them as their data.
The nice way the preprocessor works with objects has made me realize that the best approach would probably be to recursively parse the text into link objects and then do our expansion, while also giving extensions access to the tree in special ways (extract as WikiText, HTML, or plain text).
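As a rough illustration of that idea (not the actual parser code), here is a small Python sketch that parses nested [[...]] into link objects in a single pass and lets a caller extract them again as wikitext or plain text. `LinkNode`, `parse_links`, `to_wikitext` and `to_plain_text` are hypothetical names, HTML extraction is left out, and real link parsing has many more rules than this.

```python
import re
from dataclasses import dataclass
from typing import List, Union

Node = Union[str, "LinkNode"]

# Tokens: "[[", "]]", "|", a run of ordinary text, or a lone bracket.
_TOKEN = re.compile(r"\[\[|\]\]|\||[^\[\]|]+|.")


def to_wikitext(nodes: List[Node]) -> str:
    return "".join(n if isinstance(n, str) else n.to_wikitext() for n in nodes)


def to_plain_text(nodes: List[Node]) -> str:
    return "".join(n if isinstance(n, str) else n.to_plain_text() for n in nodes)


@dataclass
class LinkNode:
    # parts[0] is the target; the rest are pipe-separated parameters.
    parts: List[List[Node]]

    def to_wikitext(self) -> str:
        return "[[" + "|".join(to_wikitext(p) for p in self.parts) + "]]"

    def to_plain_text(self) -> str:
        # Display text is the last parameter, or the target if there is none.
        return to_plain_text(self.parts[-1])


def parse_links(text: str) -> List[Node]:
    """Parse text into a list of strings and LinkNode objects."""
    root: List[List[Node]] = [[]]   # top-level pseudo-frame with one part
    stack = [root]
    for tok in _TOKEN.findall(text):
        frame = stack[-1]
        if tok == "[[":
            stack.append([[]])                    # open a new link
        elif tok == "]]" and len(stack) > 1:
            stack.pop()                           # close the innermost link
            stack[-1][-1].append(LinkNode(frame))
        elif tok == "|" and len(stack) > 1:
            frame.append([])                      # start the next parameter
        else:
            frame[-1].append(tok)                 # literal text
    # Links missing their trailing "]]" fall back to literal text.
    while len(stack) > 1:
        frame = stack.pop()
        parent = stack[-1][-1]
        parent.append("[[")
        for i, part in enumerate(frame):
            if i:
                parent.append("|")
            parent.extend(part)
    return root[0]


# Usage: an extension gets an object instead of a half-parsed string.
tree = parse_links("[[Embedablens:Page|Content with [[link|displaytext]] inside]]")
embed = tree[0]
print(embed.to_plain_text())
print(to_wikitext(embed.parts[1]))
```

Because pipes are only split at the depth where they occur, the nested [[link|displaytext]] stays intact inside the outer parameter, and an unclosed link degrades to plain text instead of blowing up.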
Doing some research into the way the parser handles links at first gave me good results: [[link [[inside of]] link]] nicely gives you a link to "inside of" with the outside text verbatim, just as the processor I have in mind would. However, I ran into an ugly, sticky mess with image embedding: http://dev.wiki-tools.com/wiki/LinkHook#Old_Tests (ignore the fact that my examples here don't have the frame option).
[[Image:File.ext|Caption]]
Renders as an image with "Caption".
[[Image:File.ext|[[Image:File.ext|Caption]]]]
Renders an image inside of another image that has a caption of "Caption".
[[Image:File.ext|[[Image:File.ext|[[link]]]]]]
Renders [[link]] as a link; the rest is completely verbatim.
Honestly, the syntax is inconsistent with itself. If we were trying to stop embeds inside of embeds, then the last one should render as an image, with a link to [[link]] and the other Image: verbatim as a caption.
I believe there is a bug filed about the second case; if anyone has it handy, I'd love a link. I hunted through Bugzilla but couldn't find it.
Some use cases, and what's expected for each, would be nice.
My issue is that Image links are functionally supposed to be the same as a setLinkHook using the Media, Multi-params, and Recursive parameters options (embed, but not with : at the start; pipe-separated parameters; and parameters can have links inside of them). However, any extension or anything else using setLinkHook with the Recursive parameters option would be expecting something different.
[[Embed:Title|[[Otherembed:Title]] and [[link]]]]
Would actually render as an embed with two links (since it's inside of another embed, the 'Otherembed' reverts to a link).
And:
[[Embed:Title|[[Otherembed:Title|[[link]]]]]]
Would actually render as an embed, with a link to [[link]] and the rest of the caption verbatim.
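To pin down the rendering I'd expect in those two cases, here is a small Python sketch of the rule applied to an already-parsed tree. Everything in it is an assumption for illustration: the `Link` class, the `EMBED_NAMESPACES` set, and the `<embed ...>` / `<a ...>` output, which is just a readable stand-in rather than real parser output.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical namespaces that have registered an embedding hook.
EMBED_NAMESPACES = {"Embed", "Otherembed"}


@dataclass
class Link:
    target: str
    caption: List[Union[str, "Link"]] = field(default_factory=list)


def is_embed(target: str) -> bool:
    namespace, sep, _ = target.partition(":")
    return bool(sep) and namespace in EMBED_NAMESPACES


def contains_link(nodes) -> bool:
    return any(isinstance(n, Link) for n in nodes)


def render(nodes, inside_embed: bool = False) -> str:
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
        elif is_embed(node.target) and not inside_embed:
            # Only an outermost embed is rendered as an embed.
            inner = render(node.caption, inside_embed=True)
            out.append(f"<embed {node.target}>{inner}</embed>")
        elif contains_link(node.caption):
            # A link wrapping other links stays verbatim; the inner links render.
            inner = render(node.caption, inside_embed)
            out.append(f"[[{node.target}|{inner}]]")
        else:
            # Plain link, including an embed that reverted to a link.
            text = render(node.caption, inside_embed) or node.target
            out.append(f"<a {node.target}>{text}</a>")
    return "".join(out)


# [[Embed:Title|[[Otherembed:Title]] and [[link]]]]
first = [Link("Embed:Title", [Link("Otherembed:Title"), " and ", Link("link")])]
# [[Embed:Title|[[Otherembed:Title|[[link]]]]]]
second = [Link("Embed:Title", [Link("Otherembed:Title", [Link("link")])])]

print(render(first))
print(render(second))
```

The first case comes out as one embed containing two links; the second as one embed whose caption is the Otherembed text verbatim with only [[link]] rendered.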