[Mediawiki-l] Solving the GROUP BY problem

Greg Sabino Mullane greg at endpoint.com
Wed Apr 27 14:15:06 UTC 2011


Inside of Article.php, there is a query that joins the user table 
to the revisions table in the function getContributors. The 
important bits are:

$fields = array(
  "$userTable.*",
  'rev_user_text AS user_name',
  'MAX(rev_timestamp) AS timestamp',
);
...
'GROUP BY' => array( 'rev_user', 'rev_user_text' ),
...
$res = $dbr->select( $tables, $fields, $conds, __METHOD__, $options, $jconds );
return new UserArrayFromResult( $res );

This breaks for those databases that don't allow for a 
non-deterministic GROUP BY (yes, sometimes you can 
technically get away with leaving fields out of the GROUP BY, 
but this is not one of those cases :). One solution that 
has been raised to this general problem is to have MW 
recognize 'group by' queries and add fields as needed to the 
GROUP BY by parsing the SELECT list. Obviously this is 
a non-starter in this case, as we are doing a user.* in the 
SELECT clause. Which makes me wonder if we really need that 
here in the first place. We definitely don't need all those fields 
to display user contributions, but what's the minimum needed 
to satisfy UserArrayFromResult? This tracks back to loadFromRow 
in User.php. No doubt it is convenient having a complete user 
object ready to use anywhere, but do we really want to encourage 
pulling back user.* everywhere or is another approach better?

So I'm looking for comments on the best solution here, both 
in the specific and general case. As I see it, the options are:

1) Manually add in all the columns from the user table to the 
GROUP BY. Painful, and subject to immediate breakage when a 
column is added or removed from the user table.

2) Teach MediaWiki how to parse queries and expand the user.* 
into a list of columns to feed to the GROUP BY. Actually not 
as hard as it sounds, but not very elegant at all.

3) Rewrite the SELECT above to only pull the columns we need 
for that section of code. In this case, probably user_id, 
user_name, user_real_name, and user_email? This would mean 
that the user object returned would have unset fields for 
things like $this->mToken. The best short-term solution, but 
breaks the underlying promise that a user object is a complete 
and populated user object.

4) Something else. Return a new object that's a subset of the 
User table? Perhaps with the fields that are commonly used 
for displays of other users, but without the you-as-the-user 
specific fields such as the password, last touched, etc.? 
Seems silly to be slinging around all those attributes when 
we don't need to be.

-- 
Greg Sabino Mullane greg at turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201104271012
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
Url : http://lists.wikimedia.org/pipermail/mediawiki-l/attachments/20110427/4a0bfae8/attachment.pgp 


More information about the MediaWiki-l mailing list