On 8/27/07, Yuri Astrakhan yuriastrakhan@gmail.com wrote:
Ouch, thanks Simetrical. I guess "user_password" should not be allowed in any EXPLAIN queries. Any other risky ones out there?
Basically the entire user table needs to be considered private, although certain prescribed uses might be okay. Any query that involves only a single data item (e.g. only user_options but no other columns from user) or that involves multiple data items that are publicly associated with each other anyway (e.g. user_name plus user_real_name plus user_registration plus user_edit_count) could be okay.
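To make that concrete, here's one way such a rule could be encoded. This is purely illustrative; the actual allowed combinations would need review, and the column names are just the ones from the examples above.

    # Hypothetical whitelist of user-table column combinations that are
    # publicly associated with each other anyway.
    PUBLIC_USER_COMBOS = [
        {'user_options'},
        {'user_name', 'user_real_name', 'user_registration',
         'user_edit_count'},
    ]

    def user_columns_ok(requested):
        """True if the requested user-table columns fit one allowed set."""
        requested = set(requested)
        return any(requested <= combo for combo in PUBLIC_USER_COMBOS)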
As for other tables: the entire archive table. IP info in recentchanges. filearchive. These decisions have already been made for the toolserver; they could probably be copied over here. On the other hand, I'm not sure how useful this tool would be to the general public when developers can easily get toolserver access to run EXPLAINs on "public" columns.
In fact, someone could write an EXPLAIN engine and run it on the toolserver. But first you'd have to ensure there are no injection opportunities, which *should* be straightforward: I'm *pretty* sure that if you start off with "EXPLAIN" and only allow one statement (as the PHP API does), there's no way to inject any actual queries; they'll just be EXPLAINed. But personally, I'm not quite sure enough of that to try running such a script myself...
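If someone did want to try it, the skeleton might look something like this. A rough sketch only: the blacklist is hypothetical, the connection details are made up, and it leans on the driver refusing multi-statement queries, which MySQLdb does by default.

    import re
    import MySQLdb

    # Hypothetical blacklist; a real one would mirror the toolserver rules.
    PRIVATE = re.compile(
        r'\b(user_password|user_token|user_email|archive|filearchive)\b',
        re.I)

    def explain(sql):
        """EXPLAIN a single SELECT statement, or raise ValueError."""
        if ';' in sql:  # crude, but errs on the safe side
            raise ValueError('only one statement allowed')
        if PRIVATE.search(sql):
            raise ValueError('query touches a private table or column')
        conn = MySQLdb.connect(host='localhost', db='wiki', user='reader')
        try:
            cur = conn.cursor()
            # With multi-statements disabled (the default), anything an
            # attacker appends is still part of the one EXPLAINed query.
            cur.execute('EXPLAIN ' + sql)
            return cur.fetchall()
        finally:
            conn.close()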
On the other hand, this only introduces another way to attempt a login, same as going through the regular login page. The only security risk I can think of is automated running through a list of md5 hashes. We could implement a memcached solution so that no more than 10 queries run per minute.
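Roughly like this, maybe. Just a sketch: the key scheme, the limits, and the use of the python-memcached client are all my assumptions, not anything that exists yet.

    import time
    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])
    LIMIT = 10   # queries
    WINDOW = 60  # seconds

    def allow_query(user):
        # One counter per user per minute-long window. (A real version
        # would sanitize the user name before using it in a key.)
        key = 'explain-throttle:%s:%d' % (user, int(time.time()) // WINDOW)
        # add() only succeeds if the key doesn't exist yet, so the first
        # query in a window creates the counter atomically.
        if mc.add(key, 1, time=WINDOW):
            return True
        count = mc.incr(key)
        return count is not None and count <= LIMIT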
The regular login requires a captcha. And since other tables have more serious issues that can't reasonably be solved by captchas or anything else that merely stops automated checks, we may as well blacklist the password field entirely, along with the others, if this is ever offered in any form.
On 8/27/07, Gregory Maxwell gmaxwell@gmail.com wrote:
My view has long been that having this stuff external to MediaWiki is good for flexibility, good for development scaling, good for site performance (bots can't break caching), but bad for reliability (some bot author vanishes and the process dies). As a result I've considered it better for us to provide more robust APIs, facilities for running these tasks (toolserver), and standards for development and documentation so the tools can outlive the attention spans of their authors (which is where we suck the most). ... So this is what I've long thought, but I'm prepared to have my eyes opened.
I think there's a place for a robust and general workflow-management system in the software. I think Rob or someone started a page on MW.org or somewhere with specs for a software implementation of task handling, but I can't find it after about fifteen seconds of searching.