On 8/27/07, Yuri Astrakhan <yuriastrakhan(a)gmail.com> wrote:
Ouch, thanks Simetrical. I guess
"user_password" should not be allowed
in any "explain" queries. Any other risky ones out there?
Basically the entire user table needs to be considered private,
although certain prescribed uses might be okay. Any query that
involves only a single data item (e.g. only user_options but no other
columns from user) or that involves multiple data items that are
publicly associated with each other anyway (e.g. user_name plus
user_real_name plus user_registration plus user_edit_count) could be
okay.
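A minimal sketch of how that column policy might be enforced. The column names mirror the ones mentioned in this thread, but the blacklist and "publicly associated" sets here are illustrative, not an actual MediaWiki policy:

```python
# Illustrative policy check for which user-table columns may appear
# together in a public EXPLAIN. The sets below are assumptions based
# on this thread, not a real MediaWiki configuration.

# Columns that must never be exposed in any form.
BLACKLISTED = {"user_password", "user_newpassword", "user_token"}

# Columns whose values are already publicly associated with each other.
PUBLIC_USER_COLUMNS = {
    "user_name", "user_real_name", "user_registration", "user_edit_count",
}

def columns_allowed(requested):
    """Allow a single non-blacklisted column on its own, or any
    combination drawn entirely from the publicly-associated set."""
    requested = set(requested)
    if requested & BLACKLISTED:
        return False
    if len(requested) == 1:
        return True
    return requested <= PUBLIC_USER_COLUMNS
```

So a lone `user_options` passes, `user_name` plus `user_edit_count` passes, but anything touching `user_password` is rejected outright.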
As for other tables: the entire archive table. IP info in
recentchanges. filearchive. These decisions have already been made
for the toolserver; they could probably be copied over here. On
the other hand, I'm not sure of the usefulness of this tool for the
general public when developers can easily get toolserver access to run
EXPLAINs on "public" columns.
In fact, someone could write an EXPLAIN engine and run it on the
toolserver. But first it needs to be ensured that there are no
injection opportunities, which *should* be straightforward: I'm
*pretty* sure that if you start off with "EXPLAIN" and only allow one
statement (as the PHP API does), there's no way to inject any actual
queries, they'll just be EXPLAINed. But personally, I'm not quite
sure enough of that to try running such a script myself . . .
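The single-statement guard might look something like the following. This is a conservative sketch, not the PHP API's actual implementation; a real version would also rely on the driver's single-statement execution mode rather than string checks alone:

```python
import re

def safe_explain(user_sql):
    """Wrap an untrusted query in EXPLAIN, refusing anything that
    could smuggle in a second statement. A deliberately conservative
    sketch: semicolons and comment markers are rejected wholesale."""
    sql = user_sql.strip()
    # One statement only: no semicolons anywhere (even a trailing one).
    if ";" in sql:
        raise ValueError("multiple statements are not allowed")
    # No comment markers that could hide a statement terminator.
    if "--" in sql or "#" in sql or "/*" in sql:
        raise ValueError("comments are not allowed")
    # Only SELECTs make sense to EXPLAIN here.
    if not re.match(r"(?i)SELECT\b", sql):
        raise ValueError("only SELECT statements may be EXPLAINed")
    return "EXPLAIN " + sql
```

The point being that everything reaching the server is a single EXPLAINed SELECT, so an injected `; DROP TABLE ...` never parses as its own statement.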
On the other hand - this only introduces another possibility of a
login attempt, same as going through the regular login page. I can't
think how it would be a security risk beyond automated running through
a list of md5 hashes. We could implement a memcached solution so that
no more than 10 queries would run per minute.
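That memcached throttle could be as simple as a per-client counter with a one-minute expiry. Sketched here against a stand-in cache object, since the exact client API would depend on the deployment; `allow_query` and `FakeMemcache` are hypothetical names:

```python
import time

class FakeMemcache:
    """Stand-in for a memcached client exposing add/incr with expiry."""
    def __init__(self):
        self._store = {}

    def add(self, key, value, expiry):
        # Succeeds only if the key is absent or its window has expired,
        # mirroring memcached's atomic "add" semantics.
        now = time.time()
        if key in self._store and self._store[key][1] > now:
            return False
        self._store[key] = [value, now + expiry]
        return True

    def incr(self, key):
        entry = self._store.get(key)
        if entry is None or entry[1] <= time.time():
            return None  # memcached returns a miss for expired keys
        entry[0] += 1
        return entry[0]

LIMIT_PER_MINUTE = 10

def allow_query(cache, client_id):
    """Allow at most LIMIT_PER_MINUTE queries per client per minute."""
    key = "explain-rate:" + client_id
    if cache.add(key, 1, expiry=60):
        return True  # first query in this window
    count = cache.incr(key)
    # If the window expired between add and incr, just allow the query.
    return count is None or count <= LIMIT_PER_MINUTE
```

The eleventh request inside one window is refused; a new window starts once the key expires.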
The regular login requires a captcha. Since there are more serious
issues with other tables that can't reasonably be solved using
captchas or other things that merely stop automated checks, we may as
well blacklist the password field entirely along with the others if
this is ever offered in any way.
On 8/27/07, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:
My view has long been that having this stuff external to mediawiki is
good for flexibility, good for development scaling, good for site
performance (bots can't break caching), but bad for reliability (some
bot author vanishes and the process dies). As a result I've
considered it better for us to provide more robust APIs, facilities
for running these tasks (toolserver), and standards for development
and documentation so the tools can outlive the attention spans of
their authors (which is where we suck the most). ... So this is what
I've long thought, but I'm prepared to have my eyes opened.
I think that there's a place for a robust and general
workflow-management system in the software. I think Rob or someone
started a page on MW.org or somewhere that dealt with specs for a
software implementation of task handling, but I can't find it in about
a fifteen-second search.