userCan changes

List overview All Threads
Download

newer

older

enwiki mwdumper slows down, memory...

Squid, vary-cookie and Google...

Andrew Garrett

9 Jul 2007 9 Jul '07

7:12 a.m.

Hi all,

After a brief planning discussion with Tim a few days ago, I'm re-working the Title::userCan method. Once I'm done, and after code review, on failure, the method will return, instead of false (although that will still be returned for unexplainable events) an array consisting of a message name, followed by a number of parameters which are the arguments to the message. For example, being unable to edit due to a block will return:

array ( $block->mAuto ? 'autoblockedtext' : 'blockedtext', $id, $reason, $ip, $name, $blockid, $blockExpiry, $intended );

And being unable to edit due to cascading protection will return:

array( 'cascadeprotected', array_len( $cascadingSources ), $pages );

The purpose of these changes is to streamline the way access failures are reported to the client. Currently, if userCan returns false, a correct error message is guessed by the calling method. After these changes, the message will be generated when the access failure is detected. In addition, this method will be the only method that needs to be called in order to ascertain whether a user can perform a specific action on a specific page: block checks, readonly checks, and other stuff is being folded into it, too.

This alteration is part of an effort I'm putting in to clean up and streamline the permissions system.

Questions, comments, objections, abuse, and other feedback is welcome, and will be appreciated.

Andrew

Show replies by date

Christensen, Courtney

9 Jul 9 Jul

8:13 a.m.

Andrew said: ------------- After a brief planning discussion with Tim a few days ago, I'm re-working the Title::userCan method. Once I'm done, and after code review, on failure, the method will return, instead of false (although that will still be returned for unexplainable events) an array consisting of a message name, followed by a number of parameters which are the arguments to the message. For example, being unable to edit due to a block will return: -------------

How will this affect current behavior? Are you also going to be changing functions that expect to receive false from the old working of Title::userCan? When do you expect this change to be "official"?

I need to start checking my extensions and tagging anything that uses userCan.

Thanks! -Courtney

Andrew Garrett

8:18 a.m.

On 7/9/07, Christensen, Courtney ChristensenC@battelle.org wrote:

...

How will this affect current behavior? Are you also going to be changing functions that expect to receive false from the old working of Title::userCan?

This is half of the work I've done so far.. My changes currently affect about 40-odd files.

...

When do you expect this change to be "official"?

I don't know. I spoke to Tim about it, so I expect it's sane. However, it's still subject to the opinions of other developers, including Brion. The change has not yet been applied.

...

I need to start checking my extensions and tagging anything that uses userCan.

Thanks! -Courtney

Rob Church

8:27 a.m.

On 09/07/07, Andrew Garrett andrew@epstone.net wrote:

...

This is half of the work I've done so far.. My changes currently affect about 40-odd files.

Just a quick note; if you're adding any cleverness to the parser (section edit links, etc.), don't forget to maintain cache coherency.

Probably just paranoia, but it's caught me out before, so...

...

I don't know. I spoke to Tim about it, so I expect it's sane. However, it's still subject to the opinions of other developers, including Brion. The change has not yet been applied.

My gut says it might be saner to return a simple array( const, message ), where const is one of a set of constants defined somewhere, perhaps abusing a class for namespace purposes, which provides a simple meaningful check. You might as well render the message *for* the caller, since you're basically passing back all the pieces.

Rob Church

Andrew Garrett

8:32 a.m.

On 7/9/07, Rob Church robchur@gmail.com wrote:

...

Just a quick note; if you're adding any cleverness to the parser (section edit links, etc.), don't forget to maintain cache coherency.

I don't understand what you mean, here. I'm not adding any cleverness to the parser, though, so I suppose it's not really a big deal.

...

My gut says it might be saner to return a simple array( const, message ), where const is one of a set of constants defined somewhere, perhaps abusing a class for namespace purposes, which provides a simple meaningful check. You might as well render the message *for* the caller, since you're basically passing back all the pieces.

That's a really interesting, and good, idea. It makes interpreting the data for purposes other than simply displaying the message much easier (and it saves an issue I had with the cascading protection message. Thanks, Rob!

...

Rob Church

Andrew Garrett

8:34 a.m.

On 7/9/07, Andrew Garrett andrew@epstone.net wrote:

...

On 7/9/07, Rob Church robchur@gmail.com wrote:

...
My gut says it might be saner to return a simple array( const, message ), where const is one of a set of constants defined somewhere, perhaps abusing a class for namespace purposes, which provides a simple meaningful check. You might as well render the message *for* the caller, since you're basically passing back all the pieces.

That's a really interesting, and good, idea. It makes interpreting the data for purposes other than simply displaying the message much easier (and it saves an issue I had with the cascading protection message. Thanks, Rob!

On second thoughts, I wonder if this'd make it harder to take and use the parameters, which are passed back in the array currently, but you'd have to do some string parsing to get them with your suggestion...

Andrew

Rob Church

9:03 a.m.

On 09/07/07, Andrew Garrett andrew@epstone.net wrote:

...

On second thoughts, I wonder if this'd make it harder to take and use the parameters, which are passed back in the array currently, but you'd have to do some string parsing to get them with your suggestion...

Yes, I thought of that one a minute ago, too.

The main thing I'm concerned about here is that the first parameter be some sort of constant value, because message keys change for various reasons, and we want to be as free as possible to change them when needed without hunting down 49 odd files...

...although I suppose we could just do a s/Simetrical// ;)

Rob Church

Andrew Garrett

9:06 a.m.

On 7/9/07, Rob Church robchur@gmail.com wrote:

...

On 09/07/07, Andrew Garrett andrew@epstone.net wrote:

...
On second thoughts, I wonder if this'd make it harder to take and use the parameters, which are passed back in the array currently, but you'd have to do some string parsing to get them with your suggestion...

The main thing I'm concerned about here is that the first parameter be some sort of constant value, because message keys change for various reasons, and we want to be as free as possible to change them when needed without hunting down 49 odd files...

I suppose it wouldn't kill any small furry animals if we tacked a constant on the beginning and shifted it off before passing to call_user_func_array( 'wfMsg' )...

...

...although I suppose we could just do a s/Simetrical// ;)

Rob Church

Rob Church

9:11 a.m.

On 09/07/07, Andrew Garrett andrew@epstone.net wrote:

...

I suppose it wouldn't kill any small furry animals if we tacked a constant on the beginning and shifted it off before passing to call_user_func_array( 'wfMsg' )...

You could just use the message key as the constant value, of course, then you wouldn't need to shift it off at all.

Rob Church

Andrew Garrett

9:13 a.m.

On 7/9/07, Rob Church robchur@gmail.com wrote:

...

On 09/07/07, Andrew Garrett andrew@epstone.net wrote:

...
I suppose it wouldn't kill any small furry animals if we tacked a constant on the beginning and shifted it off before passing to call_user_func_array( 'wfMsg' )...

You could just use the message key as the constant value, of course, then you wouldn't need to shift it off at all.

That was my original intention. I was under the impression that your objection was that message keys can change, for various reasons, and that it would be better to have a separate constant that wasn't being used for anything else.

...

Rob Church

Andrew

Simetrical

9:08 a.m.

On 7/9/07, Rob Church robchur@gmail.com wrote:

...

The main thing I'm concerned about here is that the first parameter be some sort of constant value, because message keys change for various reasons, and we want to be as free as possible to change them when needed without hunting down 49 odd files...

...although I suppose we could just do a s/Simetrical// ;)

For what it's worth, I agree constants are a better way to go. :P

Andrew Garrett

9:47 a.m.

After a conversation with Brion, my intentions have changed. I intend, now, to maintain userCan in its current form, as a wrapper for a new function which will have the changes made to it.

Andrew

Jim Wilson

4:21 p.m.

All of this sounds very interesting, and I wonder if any of it will affect how hooking methods to the 'userCan' hook operate?

That is, will methods which hook 'userCan' be required (or allowed) to follow the same array-based convention? It might be nice from an extension-development standpoint to be able to specify why it is that someone is being denied in the same manner as the core.

-- Jim R. Wilson (jimbojw)

On 7/9/07, Andrew Garrett andrew@epstone.net wrote:

...

After a conversation with Brion, my intentions have changed. I intend, now, to maintain userCan in its current form, as a wrapper for a new function which will have the changes made to it.

Andrew

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l

Simetrical

4:27 p.m.

On 7/9/07, Jim Wilson wilson.jim.r@gmail.com wrote:

...

All of this sounds very interesting, and I wonder if any of it will affect how hooking methods to the 'userCan' hook operate?

That is, will methods which hook 'userCan' be required (or allowed) to follow the same array-based convention? It might be nice from an extension-development standpoint to be able to specify why it is that someone is being denied in the same manner as the core.

It *should* go without saying that extensions should be able to use the new version of the function in the same way as core can.

Andrew Garrett

10 Jul 10 Jul

1:05 a.m.

On 7/10/07, Simetrical Simetrical+wikilist@gmail.com wrote:

...

On 7/9/07, Jim Wilson wilson.jim.r@gmail.com wrote:

...
All of this sounds very interesting, and I wonder if any of it will affect how hooking methods to the 'userCan' hook operate?

That is, will methods which hook 'userCan' be required (or allowed) to follow the same array-based convention? It might be nice from an extension-development standpoint to be able to specify why it is that someone is being denied in the same manner as the core.

Hooks will be allowed, but not required, to follow the new convention. As always, possible responses will be 'true' for "yes", 'false' for "no", or 'null' for "no opinion". Of course, the hook will still be allowed to return an answer with the new array syntax.

...

It *should* go without saying that extensions should be able to use the new version of the function in the same way as core can.

Simetrical

1:16 a.m.

On 7/10/07, Andrew Garrett andrew@epstone.net wrote:

...

As always, possible responses will be 'true' for "yes", 'false' for "no", or 'null' for "no opinion".

Well, except that the last is no longer a possible response. You silly out-of-date runner-away.

Andrew Garrett

2:42 a.m.

On 7/10/07, Simetrical Simetrical+wikilist@gmail.com wrote:

...

On 7/10/07, Andrew Garrett andrew@epstone.net wrote:

...
As always, possible responses will be 'true' for "yes", 'false' for "no", or 'null' for "no opinion".

Well, except that the last is no longer a possible response. You silly out-of-date runner-away.

That seems a little silly. Shouldn't we allow userCan hooks to have a response "yes, if you pass the rest of the tests" (i.e. blocking, protection, cascading protection user css/js). My reading of the code is that that works.

Andrew

Rob Church

3:45 a.m.

On 10/07/07, Andrew Garrett andrew@epstone.net wrote:

...

Hooks will be allowed, but not required, to follow the new convention. As always, possible responses will be 'true' for "yes", 'false' for "no", or 'null' for "no opinion". Of course, the hook will still be allowed to return an answer with the new array syntax.

That isn't how hooks work, however; the userCan hook always has been somewhat broken in this respect.

A hook must return a boolean, which indicates whether or not the standard process to complete an operation should proceed; this boolean is also responsible for allowing the rest of the hook chain to complete. For example, an authentication hook would usually return false, to prevent later, more lenient hooks, from overriding it.

To pass more meaningful results into the caller, we generally use the good old-fashioned, C-style method of passing a reference (no pointers) to a variable which should contain the result. This could be a boolean, or it could be null, as you describe.

Rob Church

Andrew Garrett

8:06 a.m.

On 7/10/07, Rob Church robchur@gmail.com wrote:

...

On 10/07/07, Andrew Garrett andrew@epstone.net wrote:

...
Hooks will be allowed, but not required, to follow the new convention. As always, possible responses will be 'true' for "yes", 'false' for "no", or 'null' for "no opinion". Of course, the hook will still be allowed to return an answer with the new array syntax.

That isn't how hooks work, however; the userCan hook always has been somewhat broken in this respect.

A hook must return a boolean, which indicates whether or not the standard process to complete an operation should proceed; this boolean is also responsible for allowing the rest of the hook chain to complete. For example, an authentication hook would usually return false, to prevent later, more lenient hooks, from overriding it.

To pass more meaningful results into the caller, we generally use the good old-fashioned, C-style method of passing a reference (no pointers) to a variable which should contain the result. This could be a boolean, or it could be null, as you describe.

I stand corrected. What is your suggestion with respect to this, then? Break backwards-compatibility and require the hook to accept a variable reference to fill with its response (true, false, null or an array), create a new hook, retaining the previous one for the sake of backwards-compatibility, or something different?

...

Rob Church

-- Andrew Garrett

Jim Wilson

5:30 p.m.

...

What is your suggestion with respect to this, then?

My understanding of userCan (which may be erroneous) was that the return value must be a boolean that means semantically "continue hook processing?" and that the $result var was to contain the result (if any).

It's possible that an implementor of userCan may only be interested in certain titles, and so returning true with no $result is totally valid. Consider for example an extension which limits access to User pages - returning true and not touching &$result would be the correct behavior for non-User pages.

It would seem to me appropriate to allow $result to be either a boolean or an array depending on whether the hook is an old implementation or wants to leverage the new api.

-- Jim

On 7/10/07, Andrew Garrett andrew@epstone.net wrote:

...

On 7/10/07, Rob Church robchur@gmail.com wrote:

...
On 10/07/07, Andrew Garrett andrew@epstone.net wrote:

...
Hooks will be allowed, but not required, to follow the new convention. As always, possible responses will be 'true' for "yes", 'false' for "no", or 'null' for "no opinion". Of course, the hook will still be allowed to return an answer with the new array syntax.

That isn't how hooks work, however; the userCan hook always has been somewhat broken in this respect.

A hook must return a boolean, which indicates whether or not the standard process to complete an operation should proceed; this boolean is also responsible for allowing the rest of the hook chain to complete. For example, an authentication hook would usually return false, to prevent later, more lenient hooks, from overriding it.

To pass more meaningful results into the caller, we generally use the good old-fashioned, C-style method of passing a reference (no pointers) to a variable which should contain the result. This could be a boolean, or it could be null, as you describe.

I stand corrected. What is your suggestion with respect to this, then? Break backwards-compatibility and require the hook to accept a variable reference to fill with its response (true, false, null or an array), create a new hook, retaining the previous one for the sake of backwards-compatibility, or something different?

...
Rob Church

-- Andrew Garrett

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l

Andrew Garrett

7:13 p.m.

On 7/11/07, Jim Wilson wilson.jim.r@gmail.com wrote:

...

...
What is your suggestion with respect to this, then?

My understanding of userCan (which may be erroneous) was that the return value must be a boolean that means semantically "continue hook processing?" and that the $result var was to contain the result (if any).

It's possible that an implementor of userCan may only be interested in certain titles, and so returning true with no $result is totally valid. Consider for example an extension which limits access to User pages - returning true and not touching &$result would be the correct behavior for non-User pages.

It would seem to me appropriate to allow $result to be either a boolean or an array depending on whether the hook is an old implementation or wants to leverage the new api.

-- Jim

I've rewritten the hook call like so. Code review, et cetera welcomed:

$result = null; $hooked = false; $hooked = wfRunHooks( 'userCan', array( &$this, &$wgUser, $action, &$result ) ); if ( !$hooked ) { wfProfileOut( $fname ); return $result; }

-- Andrew

Yuri Astrakhan

7:54 p.m.

Two cents from the API dept:

In order to quickly and efficiently operate, the API must know what *read* rights the user has *before* performing db page queries.

Example: get 100 page titles Normal operations: API has to query with limit 101, add each to the result. If there is 101st row, more data is available, so add "from=nextTitle" to let the client know how to get more results.

* If "can read" rule is page-based, API has to check each page title before adding to the result. So the user may get < 100 titles, even nothing, but still get a "from=xxx" to continue paging. But what if that xxx is also non-visible? API may perform another query, try to find the next readable title... In our current scenario, this might mean the entire NS will have to be read until the user realizes that the whole NS is blocked for her.

Proposed solution: Only allow the entire NS to be hidden from a everybody/group/user. In such case, the API will refuse to get data for any pages in that NS. In case there is a white list allowing a specific page to be visible (like Main or Login), API ignores it as well, unless the page is asked specifically by name.

Simetrical

8:06 p.m.

On 7/10/07, Yuri Astrakhan yuriastrakhan@gmail.com wrote:

...

In case there is a white list allowing a specific page to be visible (like Main or Login), API ignores it as well, unless the page is asked specifically by name.

Why is that? Does it make the query less efficient to use WHERE page_namespace IN (.......) OR page_id IN (.......)?

Yuri Astrakhan

9:13 p.m.

The very important part is that no hook should be able to dynamically say that a page cannot be viewed. As for whitelists, I am not sure it's efficient to pass a list of whitelisted items in every query:

SELECT ... WHERE ... AND ( xx_namespace NOT IN (banned NSs) OR (xx_namespace IN (banned NSs) AND (xx_title IN (whitelist) )

Instead of just "xx_namespace NOT IN (bannedr NSs)". But this is a question for Domas.

This is only for reading. Modification is totally different.

--Yurik

On 7/10/07, Simetrical Simetrical+wikilist@gmail.com wrote:

...

On 7/10/07, Yuri Astrakhan yuriastrakhan@gmail.com wrote:

...
In case there is a white list allowing a specific page to be visible (like Main or Login), API ignores it as well, unless the page is asked specifically by name.

Why is that? Does it make the query less efficient to use WHERE page_namespace IN (.......) OR page_id IN (.......)?

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l

Simetrical

9:21 p.m.

On 7/10/07, Yuri Astrakhan yuriastrakhan@gmail.com wrote:

...

The very important part is that no hook should be able to dynamically say that a page cannot be viewed. As for whitelists, I am not sure it's efficient to pass a list of whitelisted items in every query:

SELECT ... WHERE ... AND ( xx_namespace NOT IN (banned NSs) OR (xx_namespace IN (banned NSs) AND (xx_title IN (whitelist) )

Instead of just "xx_namespace NOT IN (bannedr NSs)".

It would only need to be xx_namespace NOT IN (banned NSes) OR xx_id IN (whitelist). See, those logic classes are good for something:

!p || (p && q) <=> (!p || p) && (!p || q) <=> !p || q

I shouldn't imagine that would be any less efficient than what you currently do. You should probably look at EXPLAIN and see.

Yuri Astrakhan

9:37 p.m.

Good point, I don't need the second "xx_namespace IN (banned NSs)" :)

We still need to do it by the page title, not pageIDs for two reasons: 1) an extra query is needed to resolve whitelist into pageids (can be memcached, but why?) 2) only 'page' table has both the ns+title and the pageid, and queries against any other table need to filter by title (e.g. pagelinks - the fromid will be verified before querying, but the ns & title still need to be checked).

--Yurik

On 7/10/07, Simetrical Simetrical+wikilist@gmail.com wrote:

...

On 7/10/07, Yuri Astrakhan yuriastrakhan@gmail.com wrote:

...
The very important part is that no hook should be able to dynamically say that a page cannot be viewed. As for whitelists, I am not sure it's efficient to pass a list of whitelisted items in every query:

SELECT ... WHERE ... AND ( xx_namespace NOT IN (banned NSs) OR (xx_namespace IN (banned NSs) AND (xx_title IN (whitelist) )

Instead of just "xx_namespace NOT IN (bannedr NSs)".

It would only need to be xx_namespace NOT IN (banned NSes) OR xx_id IN (whitelist). See, those logic classes are good for something:

!p || (p && q) <=> (!p || p) && (!p || q) <=> !p || q

I shouldn't imagine that would be any less efficient than what you currently do. You should probably look at EXPLAIN and see.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l

Simetrical

10:42 p.m.

On 7/10/07, Yuri Astrakhan yuriastrakhan@gmail.com wrote:

...

We still need to do it by the page title, not pageIDs for two reasons:

Then you need to account for the namespaces too. *That's* probably not going to make MySQL happy: (xx_namespace = ... AND xx_title = ...) OR (xx_namespace = ... AND xx_title = ...) OR ... I don't think there's a way to do that with IN.

Yuri Astrakhan

10:59 p.m.

On 7/10/07, Simetrical Simetrical+wikilist@gmail.com wrote:

...

Then you need to account for the namespaces too. *That's* probably not going to make MySQL happy: (xx_namespace = ... AND xx_title = ...) OR (xx_namespace = ... AND xx_title = ...) OR ... I don't think there's a way to do that with IN.

We only have to do this if we have a blacklist for an otherwise readable namespace. I don't think we have such feature. On the other hand, if whitelists allow multiple namespaces, it can get fairly ugly:

WHERE ... AND ( xx_namespace NOT IN (banned NSs) OR (xx_namespace = banned_ns1 AND xx_title IN (ns1_whitelist) OR (xx_namespace = banned_ns2 AND xx_title IN (ns2_whitelist) .... )

In short - its easier to restrict API from performing "get all pages in a non-readable NS". API would still give the content of the whitelisted pages - individual page rights validation there.

If we allow individual page visibility to be controlled through a plugin, API could become very inefficient.

Thomas Dalton

11 Jul 11 Jul

8:43 a.m.

...

Then you need to account for the namespaces too. *That's* probably not going to make MySQL happy: (xx_namespace = ... AND xx_title = ...) OR (xx_namespace = ... AND xx_title = ...) OR ... I don't think there's a way to do that with IN.

(xx_namespace=... AND xx_title IN (whitelist1)) OR (xx_namespace=... AND xx_title IN (whitelist2)) OR ...

That reduces it to one term per namespace rather than one term per whitelist item. As long as you don't include namespaces for which the whitelist is empty, that should be an improvement.

Rob Church

3:36 a.m.

On 11/07/07, Andrew Garrett andrew@epstone.net wrote:

...

I've rewritten the hook call like so. Code review, et cetera welcomed:

$result = null; $hooked = false; $hooked = wfRunHooks( 'userCan', array( &$this, &$wgUser, $action, &$result ) ); if ( !$hooked ) { wfProfileOut( $fname ); return $result; }

1. Lose the temporary variable 2. Don't need to pass objects by reference any more

Rob Church

Andrew Garrett

3:51 a.m.

On 7/11/07, Rob Church robchur@gmail.com wrote:

...

Don't need to pass objects by reference any more

wfRunHooks( 'userCan', array( $this, $wgUser, $action, $result ) )?

...

Rob Church

-- Andrew Garrett

Rob Church

3:53 a.m.

On 11/07/07, Andrew Garrett andrew@epstone.net wrote:

...

wfRunHooks( 'userCan', array( $this, $wgUser, $action, $result ) )?

No, objects don't need to be passed by reference, but variables you're going to want the callback to change (like $result) do.

Rob Church

Andrew Garrett

3:55 a.m.

On 7/11/07, Rob Church robchur@gmail.com wrote:

...

On 11/07/07, Andrew Garrett andrew@epstone.net wrote:

...
wfRunHooks( 'userCan', array( $this, $wgUser, $action, $result ) )?

No, objects don't need to be passed by reference, but variables you're going to want the callback to change (like $result) do.

Oh, right.

I have a few changes to make, as suggested by Simetrical, and then I'll post the revised patch for review.

...

Rob Church

-- Andrew Garrett

Simetrical

9 Jul 9 Jul

8:38 a.m.

On 7/9/07, Andrew Garrett andrew@epstone.net wrote:

...

Hi all,

After a brief planning discussion with Tim a few days ago, I'm re-working the Title::userCan method. . . .

Definitely a good thing. But while you're at it, I have another proposal related to it:

[070706 14:52:03] <Simetrical> brion-office, by the way, I've decided I hate Title::userCan and want to scrap it in favor of User::hasPermission. So instead of $title->userCan( 'edit' ), $wgUser->hasPermission( 'edit', $title ). This reads better, allows it to be called on non-$wgUser users, and allows it to be called with different things for different permissions (e.g. $user1->hasPermission( 'block', $user2 ) ). [070706 14:52:07] <Simetrical> Sound good? [070706 14:53:26] <robchurch> Why not just pass a User to Title::userCan() ? [070706 14:53:36] <robchurch> User::hasPermission() sounds too much like User::isAllowed() [070706 14:53:47] <Simetrical> Er, that's what I meant. [070706 14:53:52] <Simetrical> Just add a second parameter to that. [070706 14:54:00] <Simetrical> I never could remember its bloody name. [070706 14:56:23] <robchurch> Simetrical: What about backwards compatibility with all the existing uses of Title::userCan(), specifically by hooks? [070706 14:56:33] <Simetrical> robchurch, obviously it's just going to be deprecated, not removed entirely. [070706 14:57:01] <robchurch> What specific use case is prompting you to want this change? [070706 14:57:14] <Simetrical> A discussion last night. [070706 14:57:23] <brion-office> Simetrical: hmmmmmmm [070706 14:57:35] <Simetrical> Also, I always hated the call syntax for that function, it's completely backwards and harder to understand at first glance. [070706 14:57:49] <Simetrical> You can't say that for $wgUser->isAllowed( 'edit', $title );

Note that on reflection, I figured it would be best to use a different function name to preserve reverse compatibility, although it's always a shame when a nice name like User::isAllowed has to be thrown out.

I had actually been pondering exactly the same sort of change to return values for errors as you propose, too, as part of the permissions cleanup. A couple of thoughts:

1) Returning something that evaluates to true on an error is sketchy. I thought about this and concluded it would probably be best, for readability, if a separate function were used to retrieve errors. Then the call syntax would be something like

if( !$user->isAllowed( 'edit' ) ) { $wgOut->addHtml( $user->errorMessage( 'edit' ) ); }

instead of the more inscrutable

if( ($error = $user->isAllowed( 'edit' ) ) !== true ) { $wgOut->addHtml( $error ); }

or whatever. (You can tell I like Python over Perl, can't you?) This also potentially saves callers who are, e.g., bots or maintenance scripts from having the error messages calculated when they might not need to read them.

2) I don't see that you're accounting for the fact that there may be multiple errors preventing the action. It's kind of annoying if you, e.g., try to edit and get told "Anonymous users don't have permission to edit, please log in", then log in and get told "Your IP address is blocked". This should be fixed while you're at it. A meta-message saying something like "You cannot do the action you have requested for multiple reasons. <ul><li>..." should be used when multiple errors occur.

Daniel Cannon

10 Jul 10 Jul

11:04 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Sorry for totally branching the thread here; I just wanted to add in some insight from a discussion that Simetrical and I had about reform of the access control in MediaWiki.

Firstly, everyone seems to agree that we need to change the behavior of Title::userCan and access control on MediaWiki in general. Along with this there is the general agreement that either the backward-compatibility of our current method should be maintained or that all modules, both in the mw core and extensions, should be updated to work with the new method.

Part of this reform may include Aaron's suggestion--that is, to allow us to determine *if* the user can do something and, if not, *why* he can't in as few invocations as possible and in a manner that reduces the duplication of effort. Mechanisms of doing this include Aaron's suggestion of changing the return type of Title:userCan to an appropriate failure message (this option I find sort of .. gross), using a method to *assert* a user's ability to do something and to throw an exception if he is unable to do (what is generally my preference, but certainly not everyone's cup of tea), or somehow storing data pertaining to why the access request failed that can later be accessed by methods that care. In any case, what needs to be done away with is our current circus of:

1. Method asks if the user can edit something. 2. userCan method cycles through a hundred reasons why he couldn't, finds one, and returns false. 3. Method loops through its (possibly smaller) set of reasons for why the user couldn't do something, finds one, and calls the appropriate OutputPage function. 4. OutputPage function loops through its hundred reasons, finds one, and displays the corresponding message.

The far more pressing issue here, however, and indeed the one that is most important to the developers of extensions and the API and similar frameworks, is the need for a single, reliable, complete, centralized access-checking mechanism. At present we have only basic centralized access-checking, such as Title::userCan, and for more complex cases we are forced to rely upon individual interpretation of what a certain right implies. This is quite dangerous. Take, for instance, Yuri's case of determining whether or not to include the title of a page in a list--it is not set in stone anywhere that User:Joe Blow can or cannot see the title, so how is the API to make this determination? We can mimic what the standard interface does, but what do we do when that changes? Another well-known problem with MediaWiki is that restricting read access is next to impossible, as all extensions and functions are free to use whatever methods they want (or none at all) to determine if a user can view a page--as such, even if userCan says no, Special:Export or action=diff might decide otherwise. It's not the fault of these features nor of their implementors; rather, it is the fault of the core code for making it such a PITA to have to check a whole slew of conditions on their own before doing anything. It's only natural that one or two conditions may be forgotten, and again we see an unnecessary duplication of effort

In my opinion, access-checking is performed best not independent of but rather as part of all accessors--that is, rather than relying upon extensions to check that the user can read or modify a bit of data before doing so, simply don't allow access to the data without first performing checks to verify that the user can access it. By this design, if Special:MySuperCoolSpecialPage calls Title::getPageContent() on a page the user cannot read, he gets an access exception. Implementation of this functionality is quite simple: (pseudocode ...)

/* In User.php */ function assertUserCan( $action, $title ) { if ( !$this->isAllowed( 'read' ) ) throw new MWAccessException( wfMsg( 'errormsgname' ) ); }

/* In Title.php */

function getPageContent() { assertUserCan( 'read', $this ); // .... Hand back the content. }

/* In MySuperExtension */

function execute() { global $wgOut; try { $this->fooBar(): } catch ( MWAccessException $e ) { $wgOut->addHTML( $e->getComment() ); } }

function fooBar() { global $wgOut; $title = Title::newFromText( 'Foobar' ); $wgOut->addWikiText( $title->getPageContent() ): }

Naturally, we can also catch these exceptions in wfRunHooks or elsewhere so as to prevent the writers of extensions from having to fuss with them, although they'll be free to deal with them themselves if they want. It also frees us up from having to use stuff like OutputPage::readOnlyPage() in that the message to use is determined by the assertUserCan method, and it saves us from a *multitude* of invocations--in fact (theoretically of course) we shouldn't have to do *any* access checking outside of that one method; our cost is only having to update accessors to include _one_ invocation of that method.

In any case, this is likely not a final solution, just some food for thought ...

Now to touch upon a few points that Simetrical, Yuri, Andrew, and I discussed (and then I assure you my blabber is coming to an end..). For one, we really need not one, but two userCan methods. One should be in Title and should check a user's access to a *page*. That is, can the user read this page? Can he edit it? Can he move, protect, delete it? The other should find itself in User.php and should take two arguments: an action and another User. This should be used to determine if the user can block, unblock, rename, or whatever else to the provided user. By separating out these two items and providing a mechanism for checking a user's access over another user we provide previously unavailable functionality to webmasters such as the ability to prevent sysops from blocking crats, etc., and are again able to centralize our access control, rather than using our current method of checking "Does this user have the 'block' right?" before blocking, etc.

One last suggestion, one that is likely further down the road if on the road at all, would be a migration or incorporation of user rights from our current internal system ($wgGroupPermissions, etc.) to a dynamic table that associates rights with pages or other entities, or to the simple addition of a couple of fields to the page table. Such a table could have the structure: - page_id (the id of the page) - page_name (the name of the page) - page_namespace (the pagenamespace) - edit_right (the right required to edit this page) - move_right (the right required to move this page) - read_right (the right required to read this page)

Naturally other structures are possible, and likely much, much better, than the above one; however, the basic idea is that the rights required to perform an action on a page are associated *somehow* with that page in the database. The table could be updated through an update script that would simply set the rights for each page according to what would make userCan happy. Naturally, the script would have to be run after *every* change to user groups or rights configuration.

Eventually it would be very nice to have all groups and associated permissions stored in the database itself and modifiable through an onsite UI. Such a mechanism is used by most webware, and this editing of a LocalSettings file is unique, to my knowledge, to MediaWiki and indeed is quite awkward to most users.

Anyway, consider this blabber at an end. Hopefully there are some ideas in here that can be of use.

- -- Daniel Cannon (AmiDaniel)

http://amidaniel.com cannon.danielc@gmail.com

...PGP SIGNATURE...

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFGlEi3FRAT5u/mSaMRAqu6AKCT+22XalTgYMLrslOQHeZKh1UwTQCfcc+t ZxzB90ngEfCKafHiWKO+qp8= =/5hq -----END PGP SIGNATURE-----

Simetrical

11 Jul 11 Jul

12:04 a.m.

On 7/10/07, Daniel Cannon cannon.danielc@gmail.com wrote:

...

Firstly, everyone seems to agree that we need to change the behavior of Title::userCan and access control on MediaWiki in general.

I wouldn't go that far. Change the specific implementation of some functions that are serviceable but a bit awkward, is how I'd put it. At least, for most cases: read permissions do need to be overhauled if they're meant to actually work, but we knew that anyway.

...

Part of this reform may include Aaron's suggestion

I think you mean Andrew? Also, it was agreed(/mandated) that userCan cannot change its return type or anything like that. We're making new methods.

...

The far more pressing issue here, however, and indeed the one that is most important to the developers of extensions and the API and similar frameworks, is the need for a single, reliable, complete, centralized access-checking mechanism. At present we have only basic centralized access-checking, such as Title::userCan, and for more complex cases we are forced to rely upon individual interpretation of what a certain right implies.

I don't see this at all. All rights are basically well-defined. You just have to make sure you check if the user is allowed to do them.

...

In my opinion, access-checking is performed best not independent of but rather as part of all accessors--that is, rather than relying upon extensions to check that the user can read or modify a bit of data before doing so, simply don't allow access to the data without first performing checks to verify that the user can access it. By this design, if Special:MySuperCoolSpecialPage calls Title::getPageContent() on a page the user cannot read, he gets an access exception.

Lovely in theory, but impossible in practice. You don't know that it's calling the page for the purpose of serving the content to any particular user. What if the special page is "E-mail this to a friend"? Then it's the friend whose read permissions need to be checked. Or is it? Maybe you want users to be able to send anyone pages that the sender can read. Or maybe you want them to only be able to send pages that both they can read *and* the recipient can read. Only the caller could even conceivably make such a decision.

Despite your citation of Yurik's unrelated troubles (which involve the fact that functions have to be used at all to check permissions), I see no usage case where this kind of change is actually helpful. The only possibility is if someone forgets a permissions check somewhere, but that strikes me as not nearly likely enough to merit such a complicated and error-prone structure to prevent it.

...

It also frees us up from having to use stuff like OutputPage::readOnlyPage() in that the message to use is determined by the assertUserCan method, and it saves us from a *multitude* of invocations--in fact (theoretically of course) we shouldn't have to do *any* access checking outside of that one method; our cost is only having to update accessors to include _one_ invocation of that method.

Werdna's implementation-in-progress that sparked this thread works fine for that.

...

Now to touch upon a few points that Simetrical, Yuri, Andrew, and I discussed (and then I assure you my blabber is coming to an end..). For one, we really need not one, but two userCan methods. One should be in Title . . . The other should find itself in User.php

I still think they both belong in User.php, as part of the same method. I don't view your counterproposal as a suitable alternative, because it's perfectly conceivable that some actions would fit best with two Titles (move), a Revision (oversight/rev_deleted), or something from an extension. Passing arrays of objects of arbitrary type is no different than we do for wfRunHooks, for instance.

...

One last suggestion, one that is likely further down the road if on the road at all, would be a migration or incorporation of user rights from our current internal system ($wgGroupPermissions, etc.) to a dynamic table that associates rights with pages or other entities, or to the simple addition of a couple of fields to the page table. Such a table could have the structure: - page_id (the id of the page) - page_name (the name of the page) - page_namespace (the pagenamespace) - edit_right (the right required to edit this page) - move_right (the right required to move this page) - read_right (the right required to read this page)

We have this table already. It's called page_restrictions. If you mean we should duplicate all group permissions in every row of the page table, good grief, that's what joins are for. It's wildly impractical to reproduce every group permission two million times, and there's nothing to be gained from it.

...

The table could be updated through an update script that would simply set the rights for each page according to what would make userCan happy.

userCan requires a User object, not just usergroups. You can't compartmentalize its current flexibility of output into the database. userCan hooks can prevent pages in the Project namespace from being moved by non-sysop autoconfirmed users in the hour after midnight if they like. We have to cut that out for read permissions, but I'm not sure whether or not it's fully necessary to do it for other permissions. It could certainly accelerate batch actions a fair bit if we did, but there are few enough of those that I'm not sure it's necessary.

Brion Vibber

10:13 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Daniel Cannon wrote:

...

Take, for instance, Yuri's case of determining whether or not to include the title of a page in a list--it is not set in stone anywhere that User:Joe Blow can or cannot see the title, so how is the API to make this determination?

Titles are always visible.

- -- brion vibber (brion @ wikimedia.org)

...PGP SIGNATURE...

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGlOWdwRnhpk1wk44RAjHwAJ9yakTkXGKW/hzQYTOPEJZJk96lWQCgwhtz 1dfUS7bjp7x3Oxj2Lq3suwA= =hTFv -----END PGP SIGNATURE-----

Yuri Astrakhan

12:19 p.m.

If only the title is visible (but the content is not), interesting problems arise:

You can perform "Show any pages that link to X" (Assuming X is visible). The result is a list of pageIDs, which can be resolved to titles.

This means that anyone can find all the links, templates, and images on a page whose content is not available.

I think per-namespace readability instead of per-page readability (implemented as part of the Lockdown extension - http://www.mediawiki.org/wiki/Extension:Lockdown ) are the best way to allow access rights while also maintaining robust implementation. Lockdown page discusses some other issues they faced to read-protect wiki.

On 7/11/07, Brion Vibber brion@wikimedia.org wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Daniel Cannon wrote:

...
Take, for instance, Yuri's case of determining whether or not to include the title of a page in a list--it is not set in stone anywhere that User:Joe Blow can or cannot see the title, so how is the API to make this determination?

Titles are always visible.

-- brion vibber (brion @ wikimedia.org)

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlOWdwRnhpk1wk44RAjHwAJ9yakTkXGKW/hzQYTOPEJZJk96lWQCgwhtz 1dfUS7bjp7x3Oxj2Lq3suwA= =hTFv -----END PGP SIGNATURE-----

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l

Rob Church

12:26 p.m.

On 11/07/07, Yuri Astrakhan yuriastrakhan@gmail.com wrote:

...

If only the title is visible (but the content is not), interesting problems arise:

You can perform "Show any pages that link to X" (Assuming X is visible). The result is a list of pageIDs, which can be resolved to titles.

This means that anyone can find all the links, templates, and images on a page whose content is not available.

Ultimately, it's this sort of subtle thing, that isn't obvious right off the bat, that means it's probably a very silly idea to try and modify MediaWiki to support this kind of complex read permissions. You'd be far better off rewriting the whole application from scratch, and there's no need to do that in light of other packages available.

Rob Church

Simetrical

2:26 p.m.

On 7/11/07, Rob Church robchur@gmail.com wrote:

...

Ultimately, it's this sort of subtle thing, that isn't obvious right off the bat, that means it's probably a very silly idea to try and modify MediaWiki to support this kind of complex read permissions. You'd be far better off rewriting the whole application from scratch, and there's no need to do that in light of other packages available.

I still think that this really overstates the complexity of making this change (allowing pages to even be hidden from lists of titles). I'm not volunteering to do it myself, but if we had someone willing to do it, you'd just have to have an extra couple of config options and conditional modifications to all the read queries in the software. A pain to do, undoubtedly, but hardly anything like a rewrite of the software, and not something that will affect anyone who doesn't enable the option.

Given that a) this is heavily requested and b) it's been established that this cannot be done in extensions since userCan is infeasible, it seems perfectly reasonable to allow this to be put in core if Duesentrieb or anyone wants to take full responsibility and be careful not to break anything when the options are off. As I've pointed out several times over the past day or two, we have basically the same setup for PostgreSQL, and that works out fine. But it's Brion's decision.

Andrew Garrett

12 Jul 12 Jul

11:13 a.m.

A proposed patch for these changes is available at http://wiki.epstone.net/~andrew/userCan-diff. It'll be a few days before it's committed (due to the pending release). Code review is most welcome.

-- Andrew Garrett

Andrew Garrett

1 Aug 1 Aug

7:46 a.m.

(Finally) committed. Thanks for all the input.

As I enumerated in the commit message, - None of the interface actually uses the new function. - A new function to generate the error-page, but not output it, is still required.

I'll get around to it soonish, but I have a lot on my plate at the moment, and may not get around to it.

Andrew

6200

Age (days ago)

6223

Last active (days ago)

wikitech-l@lists.wikimedia.org

41 comments

9 participants

tags (0)

participants (9)

Andrew Garrett
Brion Vibber
Christensen, Courtney
Daniel Cannon
Jim Wilson
Rob Church
Simetrical
Thomas Dalton
Yuri Astrakhan