Seems like (at least) the API of #pos in ParserFunctions is different from the one in StringFunctions.
{{#pos: haystack|needle|offset}}
While the StringFunctions #pos in MediaWiki 1.14 returned an empty string when the needle was not found, the ParserFunctions implementation of #pos in svn now returns -1.
This is most unfortunate, since existing usage depends on the empty-string result. Example:
{{#if: {{#pos: abcd|b}} | found | not found }}
{{#if: {{#pos: abcd|x}} | found | not found }}
Now both of these examples will return "found"!
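The difference can be modelled outside MediaWiki. The following Python sketch is only an illustration, not the extensions' PHP code: `parser_if`, `pos_stringfunctions` and `pos_parserfunctions` are hypothetical names, and the only assumption made about #if is that it picks its first branch whenever the trimmed test string is non-empty.

```python
def parser_if(test: str, then: str, otherwise: str) -> str:
    """Model of {{#if:}}: any non-empty test string (after trimming) is true."""
    return then if test.strip() else otherwise


def pos_stringfunctions(haystack: str, needle: str, offset: int = 0) -> str:
    """Model of the 1.14 StringFunctions #pos: empty string when not found."""
    index = haystack.find(needle, offset)
    return "" if index < 0 else str(index)


def pos_parserfunctions(haystack: str, needle: str, offset: int = 0) -> str:
    """Model of the svn ParserFunctions #pos: "-1" when not found."""
    return str(haystack.find(needle, offset))


# Run the two examples from above against both conventions.
for pos in (pos_stringfunctions, pos_parserfunctions):
    hit = parser_if(pos("abcd", "b"), "found", "not found")
    miss = parser_if(pos("abcd", "x"), "found", "not found")
    print(f"{pos.__name__}: b -> {hit}, x -> {miss}")
```

Under these assumptions the string "-1" is non-empty, so the second convention takes the "found" branch even when the needle is absent.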
Usage scenario:
I am trying to use #pos in template calls to implement a sort-of-database functionality in MediaWiki.
I have a big template that contains data in named parameters. Those parameters get passed along to a template that can select "columns" by rendering some of the named parameters and ignoring others.
Now I want to implement "row selection" by passing along a parameter name and a substring that must occur in the value of that parameter for the data to be rendered.
Something like this:
{{#if: {{#pos: {{{ {{{selectionattribute}}} }}} | {{{selectionvalue}}} }} | render_row | render_nothing }}
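To make the intended "row selection" concrete, here is a rough Python sketch of the same idea. It is not wikitext and not part of either extension; `render_row` and the sample rows are made up for illustration, and the substring test stands in for a non-empty #pos result.

```python
def render_row(row: dict[str, str], columns: list[str],
               selection_attribute: str, selection_value: str) -> str:
    """Render the chosen columns of one row, or nothing if the row is filtered out."""
    value = row.get(selection_attribute, "")
    # Substring test, standing in for "#pos returned something non-empty".
    if selection_value not in value:
        return ""
    return " | ".join(row.get(column, "") for column in columns)


# Hypothetical sample data: each dict plays the role of one template call
# with named parameters.
rows = [
    {"name": "alpha", "city": "Berlin", "year": "2008"},
    {"name": "beta", "city": "Bern", "year": "2009"},
    {"name": "gamma", "city": "Oslo", "year": "2009"},
]

selected = [render_row(r, ["name", "year"], "city", "Ber") for r in rows]
print([line for line in selected if line])
```

Column selection and row selection stay independent here: the column list decides what is rendered, the attribute/value pair decides whether anything is rendered at all.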
If I want this to work in different MediaWiki installations I need to rely on the API of #pos.
Currently there seems to be no way to use #pos in a way that works on both 1.14 and 1.15-svn.
cheers -henrik
On Thu, Jun 04, 2009 at 03:55:38PM +0200, H. Langos wrote:
Seems like (at least) the API of #pos in ParserFunctions is different from the one in StringFunctions.
{{#pos: haystack|needle|offset}}
While the StringFunctions #pos in MediaWiki 1.14 returned an empty string when the needle was not found, the ParserFunctions implementation of #pos in svn now returns -1.
I forgot to ask THE question. Is it a bug or is there some good reason to break backward compatibility?
And no, programming language cosmetics is not a good reason. :-)
If something has the same interface, it should have the same behaviour. If the old semantics were too awful to bear, the new function should have been called #strpos or #fpos (for forward-#pos; #rpos always had the "-1 on not-found" behaviour).
cheers -henrik
On 04/06/2009, at 3:46 PM, H. Langos wrote:
I forgot to ask THE question. Is it a bug or is there some good reason to break backward compatibility?
And no, programming language cosmetics is not a good reason. :-)
If something has the same interface, it should have the same behaviour. If the old semantics were too awful to bear, the new function should have been called #strpos or #fpos (for forward-#pos; #rpos always had the "-1 on not-found" behaviour).
This should be left as a comment on the relevant revision in CodeReview. Note that it's likely irrelevant anyway, as, in all likelihood, the merge of String and Parser Functions will be reverted.
-- Andrew Garrett Contract Developer, Wikimedia Foundation agarrett@wikimedia.org http://werdn.us
On Thu, Jun 04, 2009 at 05:05:50PM +0100, Andrew Garrett wrote:
This should be left as a comment on the relevant revision in CodeReview. Note that it's likely irrelevant anyway, as, in all likelihood, the merge of String and Parser Functions will be reverted.
Sorry to bother you, but I am not a Wikimedia developer, so I wouldn't know where to start looking.
Could you point me to the right place/list/article? The svn revision with the String and Parser Functions merge was 50997.
cheers -henrik
On Thu, Jun 4, 2009 at 9:05 AM, Andrew Garrett agarrett@wikimedia.org wrote:
This should be left as a comment on the relevant revision in CodeReview. Note that it's likely irrelevant anyway, as, in all likelihood, the merge of String and Parser Functions will be reverted.
Two devs, who shall remain nameless unless they choose to take credit for it, explicitly encouraged the merge. Personally, I've always thought it made more sense to keep these as separate extensions but I went along with what they encouraged me to do.
Regardless of whether it is one extension or two, I do strongly feel that once a technically acceptable implementation of string functions exists then it should be enabled on WMF sites. (I agree though that the previous StringFunctions was rightly excluded due to implementation problems.)
-Robert Rohde
I was privy to a #mediawiki conversation between Brion and Tim in which Tim pointed out that at least one person plans to implement a Natural Language Processing parser for English using StringFunctions just as soon as they are enabled.
It's pretty obvious that you can implement all sorts of crazy algorithms using StringFunctions. They need to be limited so that this is not possible.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Thu, Jun 4, 2009 at 2:29 PM, Brian Brian.Mingus@colorado.edu wrote:
I was privy to a #mediawiki conversation between brion/tim where tim pointed out that at least one person plans to implement a Natural Language Processing parser for English using StringFunctions just as soon as they are enabled.
It's pretty obvious that you can implement all sorts of crazy algorithms using StringFunctions. They need to be limited so that is not possible.
Note, though, that some string manipulation is already possible to some extent. You can use the core padright/padleft functions to emulate a couple of the added functions. E.g.:
http://en.wikipedia.org/w/index.php?title=Template:Str_len&action=edit
The most template-heavy pages already tend to run close to the template limits, until they're cut down by users when they fail. It's not clear to me that allowing more functions would actually increase overall load or template complexity significantly. It might decrease it by allowing simpler and more efficient implementations of things that currently need to be worked around. It can't really increase it too much, theoretically -- that's what the template limits are for.
Werdna points out that Tim did say this morning in #mediawiki that he'd probably revert the change.
On Thu, Jun 4, 2009 at 11:52 AM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
Note, though, that there are some that are already possible to some extent. You can use the core padright/padleft functions to emulate a couple of the added functions. E.g.:
http://en.wikipedia.org/w/index.php?title=Template:Str_len&action=edit
<snip>
I would like to note for the record that Brion explicitly "endorsed" the padleft hack to the degree that he re-enabled it after Werdna had removed it. [1]
Maybe he'd change his mind after looking at how the "string manipulation templates" are actually getting used (now in >20,000 enwiki pages and counting), but for the moment he seems to have supported allowing some form of hacked-together string manipulation system into MediaWiki. To that end it makes more sense to have a real string implementation rather than the ridiculous templates we have now.
-Robert Rohde
[1] http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=47411
On 04/06/2009, at 8:03 PM, Robert Rohde wrote:
I would like to note for the record that Brion explicitly "endorsed" the padleft hack to the degree that he re-enabled it after Werdna had removed it. [1]
Maybe he'd change his mind after looking at how the "string manipulation templates" are actually getting used (now in >20,000 enwiki pages and counting), but for the moment he seems to have supported allowing some form of hacked together string manipulation system into Mediawiki. To that end it makes more sense to have a real string implementation rather than the ridiculous templates we have now.
I wouldn't read that into it. I think it's better characterised as reverting attempts to create an "arms race" over the hacks.
-- Andrew Garrett Contract Developer, Wikimedia Foundation agarrett@wikimedia.org http://werdn.us
On Thu, Jun 4, 2009 at 11:29 AM, Brian Brian.Mingus@colorado.edu wrote:
I was privy to a #mediawiki conversation between brion/tim where tim pointed out that at least one person plans to implement a Natural Language Processing parser for English using StringFunctions just as soon as they are enabled.
It's pretty obvious that you can implement all sorts of crazy algorithms using StringFunctions. They need to be limited so that is not possible.
If you are referring to the conversation I think you are, then my impression was Tim was speaking hypothetically about the issue rather than knowing someone that had this specific intent.
I'm fairly dubious about anyone actually trying natural language processing to any serious degree. Real natural language processing needs huge lookup tables to identify part of speech and relationships etc. Technically possible I suppose, but not easy to do.
I'm even more dubious that full fledged natural language processing -- in templates -- would find significant uses. It is more efficient and more practical to view templates as simple formatting macros rather than as a system for real natural language interaction. There are very useful things that can be done with simple string algorithms, such as detecting the "(bar)" when given a title like "Foo (bar)", but I wouldn't expect anyone to be answering queries with them or anything like that.
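As an illustration of the kind of simple string algorithm meant here, the following Python sketch pulls the "(bar)" part out of a title such as "Foo (bar)". It is a hedged example, not template code: `split_title` is a hypothetical helper, and searching for the last " (" with `rfind` is only loosely analogous to what an #rpos-based template would do.

```python
def split_title(title: str) -> tuple[str, str]:
    """Split "Foo (bar)" into ("Foo", "bar"); the second item is "" if absent."""
    title = title.strip()
    if title.endswith(")"):
        # Look for the last " (" so that earlier parentheses are left alone.
        open_index = title.rfind(" (")
        if open_index > 0:
            return title[:open_index], title[open_index + 2:-1]
    return title, ""


for example in ("Foo (bar)", "Foo", "Foo (a) (b)"):
    print(example, "->", split_title(example))
```

Nothing in this needs more than a reverse search and two substring operations, which is the scale of manipulation being argued for.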
When providing tools to content creators, flexibility is generally a positive design feature. We shouldn't go overboard with imposing limits in advance of actual problems.
The current implementation is, however, artificially limited to strings of 1000 characters or fewer, which does prevent huge manipulations.
-Robert Rohde
On Thu, Jun 4, 2009 at 12:05 PM, Andrew Garrett agarrett@wikimedia.org wrote:
Note that it's likely irrelevant anyway, as, in all likelihood, the merge of String and Parser Functions will be reverted.
Have Tim or Brion said this? https://bugzilla.wikimedia.org/show_bug.cgi?id=6455#c36 is the only clear statement I've seen by either of them that I can recall.
On Thu, Jun 4, 2009 at 6:55 AM, H. Langos henrik-mw@prak.org wrote:
Seems like (at least) the API of #pos in ParserFunctions is different from the one in StringFunctions.
{{#pos: haystack|needle|offset}}
While the StringFunctions #pos in MediaWiki 1.14 returned an empty string when the needle was not found, the ParserFunctions implementation of #pos in svn now returns -1.
<snip>
Prior to the merge, all of the StringFunctions functions were reimplemented, principally for performance and security reasons.
The short but uninspired answer to your question is that in doing that I didn't notice that #pos and #rpos had different default behavior. Given the way that #if works, returning empty string is a reasonable response to a string-not-found condition, and I am happy to change that back. I'll also recheck to make sure there aren't any other unexpected behavioral changes.
Though they don't have to have the same behavior, I'd be inclined to argue that #pos and #rpos really ought to have the same default behavior on usability grounds, i.e. either both giving -1 or both giving empty string when a match is not found. Though since that does create compatibility issues with existing StringFunctions users, I'll defer to others about whether consistency would be a good enough motivation in this case.
I should warn you though that there is an intentional behavioral change regarding the handling of strip markers. The pre-existing StringFunctions codebase reacted to strip markers in a way that was inefficient, hard for the end user to predict, and in specially crafted cases created security issues.
The following example is illustrative of the change.
Consider the string "ABC<nowiki>jkl</nowiki>DEF<nowiki>mno</nowiki>GHI"
In the new implementation this is treated internally as "ABCDEFGHI" by the string routines. Hence its length is 9 and its first five characters are ABCDE.
For complicated reasons the StringFunctions version says its length is 7 and the first "five" characters are ABCjklDEFmnoG.
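The new behaviour can be sketched in Python as follows. This is a simplified model, not the actual code: in MediaWiki the nowiki content is replaced by opaque strip markers before the string routines see it, whereas this sketch simply drops the `<nowiki>...</nowiki>` spans, and `visible_text`, `str_len` and `str_left` are hypothetical names.

```python
import re

# Non-greedy match so that each <nowiki>...</nowiki> span is removed separately.
NOWIKI = re.compile(r"<nowiki>.*?</nowiki>", re.DOTALL)


def visible_text(text: str) -> str:
    """Drop every <nowiki>...</nowiki> span, keeping only the plain text."""
    return NOWIKI.sub("", text)


def str_len(text: str) -> int:
    """Length as the new routines would count it, ignoring protected spans."""
    return len(visible_text(text))


def str_left(text: str, count: int) -> str:
    """First `count` characters, again ignoring protected spans."""
    return visible_text(text)[:count]


sample = "ABC<nowiki>jkl</nowiki>DEF<nowiki>mno</nowiki>GHI"
print(str_len(sample), str_left(sample, 5))
```

The old StringFunctions result is deliberately not modelled here, since the reasons for it are not spelled out above.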
-Robert Rohde
On Thu, Jun 04, 2009 at 09:11:31AM -0700, Robert Rohde wrote:
Prior to the merge 100% of the StringFunction function calls were reimplemented, principally for performance and security reasons.
The short but uninspired answer to your question is that in doing that I didn't notice that #pos and #rpos had different default behavior. Given the way that #if works, returning empty string is a reasonable response to a string-not-found condition, and I am happy to change that back. I'll also recheck to make sure there aren't any other unexpected behavioral changes.
That would be very much appreciated.
Though they don't have to have the same behavior, I'd be inclined to argue that #pos and #rpos really ought to have the same default behavior on usability grounds, i.e. either both giving -1 or both giving empty string when a match is not found. Though since that does create compatibility issues with existing StringFunctions users, I'll defer to others about whether consistency would be a good enough motivation in this case.
I'd argue that even though #pos and #rpos are very similar functions, their use cases are very dissimilar. I.e. as long as there is no (regex) match function, the #pos function is de facto its replacement.
I should warn you though that there is an intentional behavioral change regarding the handling of strip markers. The pre-existing StringFunctions codebase reacted to strip markers in a way that was inefficient, hard for the end user to predict, and in specially crafted cases created security issues.
The following example is illustrative of the change.
Consider the string "ABC<nowiki>jkl</nowiki>DEF<nowiki>mno</nowiki>GHI"
In the new implementation this is treated internally as "ABCDEFGHI" by the string routines. Hence its length is 9 and its first five characters are ABCDE.
For complicated reasons the StringFunctions version says its length is 7 and the first "five" characters are ABCjklDEFmnoG.
That change sounds more like a bugfix than a change to "intended" behaviour. :-)
BTW: Is there a way to find articles that use ParserFunctions, like there is a way to locate usage of templates? This would allow users to locate all the places they need to pay attention to when upgrading their MediaWiki installation.
cheers -henrik