SOA + Auth - MediaWiki-Core

30 Apr 2014

Hi all (sorry for the duplicate, Gabriel),

I was going to send this to wikitech, but I wanted to get an internal
opinion. We may have this documented already, but I haven't managed to find
it.

tl;dr: As I'm looking at authentication for whatever "SOA" plans we have,
the basic structure of how we design things will influence the best way to
do authn/z. If you have strong feelings one way or the other, speak up so
we can plan for it.

I wanted to get some clarity around what "SOA" means to us and our
community. During the SOA kickoff, Aaron asked something like, "Is everyone
comfortable with making the endpoints public?" I don't think there was much
discussion or consensus (or I may have missed it). When I talked with Owen
and Gabriel afterward, it seemed like there were two definitions of "SOA"
that we're dealing with:

* Hidden services: e.g., a user gets an edit page, and does an HTTP POST
with action=edit, a session cookie to identify themselves, and an anti-csrf
token. The wiki gets the request, then calls a service to check the edit
text for spam, gives the edit to a revision storage service that stores the
revision, maybe generates an event that a logging service picks up and
stores in the general and checkuser logs.

* Public services: A user gets an edit page from the wiki, then submits the
edit to the revision service directly. The revision service may call the
spam-detection service, or use a service for csrf tokens. The success /
failure is passed back to the user's browser, and the edit page lets the
user know the save happened, and probably shows the user the new version of
the page.
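
To make the difference concrete, here is a rough sketch of the two flows as
plain data, with a flag on each step that does an identification or
authorization check. The component names and step descriptions are only my
paraphrase of the two bullets above, not an actual design.

```python
# Each step is (component, action, is_auth_check). Names are illustrative
# assumptions taken from the two bullets above, not real service names.

def hidden_services_edit():
    # The browser only ever talks to the wiki; services sit behind it.
    return [
        ("wiki", "receive POST with session cookie and anti-csrf token", False),
        ("wiki", "check session and anti-csrf token", True),
        ("wiki", "call spam-check service with the edit text", False),
        ("wiki", "hand the edit to the revision storage service", False),
        ("wiki", "emit event for the logging service", False),
    ]


def public_services_edit():
    # The browser submits the edit to the revision service directly.
    return [
        ("wiki", "serve the edit page", False),
        ("revision service", "receive the edit from the browser", False),
        ("revision service", "check session and anti-csrf token", True),
        ("revision service", "call spam-check service", False),
        ("revision service", "store revision and return result to browser", False),
    ]


def auth_checkers(steps):
    """Return the set of components that have to do authn/z checks."""
    return {component for component, _action, is_auth in steps if is_auth}


print(auth_checkers(hidden_services_edit()))
print(auth_checkers(public_services_edit()))
```

In the hidden case only the wiki needs to know how to identify the user; in
the public case every browser-facing service does.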

Parsoid, iirc, is moving towards the latter definition. Wikia is mostly
looking at the first.

The answer I'm expecting is we need "both", and the distinction is probably
really only important when we decide how identification and authorization
work. But I want to make sure we make that decision consciously instead of
accidentally.

The first case has a more straightforward implementation for
authentication, and I would argue is easier to secure. HttpOnly cookies are
set and included with each call to identify the user's session. These can't
be read by javascript, so they can't be stolen via xss, and we can centrally
invalidate all sessions. We have one place to check if a nonce has been
used (e.g. with OAuth), and have common code for checking csrf tokens and
logging.
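
As a minimal sketch of what "one place" could look like (assuming an
in-memory store; the class and function names here are made up for
illustration and are not MediaWiki code):

```python
import hmac
import secrets
from http.cookies import SimpleCookie


def session_cookie_header(session_id):
    """Build a Set-Cookie value for an HttpOnly session cookie."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True  # not readable from javascript
    cookie["session"]["secure"] = True    # only sent over HTTPS
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


class CentralAuthState:
    """Hypothetical single place for sessions, csrf tokens and nonces."""

    def __init__(self):
        self.sessions = {}        # session id -> csrf token
        self.seen_nonces = set()  # nonces that have already been used

    def start_session(self):
        session_id = secrets.token_hex(16)
        self.sessions[session_id] = secrets.token_hex(16)
        return session_id

    def invalidate(self, session_id):
        # Central invalidation: the session stops working everywhere at once.
        self.sessions.pop(session_id, None)

    def csrf_ok(self, session_id, token):
        expected = self.sessions.get(session_id)
        # Constant-time comparison to avoid leaking the token via timing.
        return expected is not None and hmac.compare_digest(expected, token)

    def nonce_ok(self, nonce):
        # A nonce is accepted once; any replay is rejected.
        if nonce in self.seen_nonces:
            return False
        self.seen_nonces.add(nonce)
        return True
```

With public services, each service would either need access to this same
state or its own equivalent of it.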

Having a public service handle processing a user's edit would require the
service to check session invalidation and nonce use, and correctly handle
checking csrf tokens.

If we go the public route, we would need to decide if the various services
should run on the same domain (en.wikipedia.org/revision/whatever), same top
level domain (revision.wikipedia.org/whatever), or entirely different
domain (revision.wikimedia.org). The same top-level domain would only work
if all subdomains have the same idea of the user (at the WMF, that would
mean all users are CentralAuth/global users). A different domain means
either users have to go through a login process to get authentication
cookies on that domain (like we use for login.wikimedia.org), or the
authentication tokens have to be accessible to javascript (and vulnerable
to being stolen via xss attacks). So I have a strong preference for same
domain.
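
For reference, the reason the three layouts behave differently is the
cookie domain-matching rule (RFC 6265, section 5.1.3). A simplified sketch,
which ignores the RFC's special case for IP addresses and uses the example
hosts from above:

```python
def domain_match(request_host, cookie_domain):
    """Simplified RFC 6265 domain matching (no IP address handling).

    A cookie with Domain=cookie_domain is sent to request_host if the host
    equals the domain or is a subdomain of it.
    """
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


# Same domain: the wiki's own cookie is sent to the service path as well.
print(domain_match("en.wikipedia.org", "en.wikipedia.org"))

# Sibling subdomain: works only if the cookie is scoped to wikipedia.org,
# which also means every *.wikipedia.org host receives it.
print(domain_match("revision.wikipedia.org", "wikipedia.org"))
print(domain_match("revision.wikipedia.org", "en.wikipedia.org"))

# Entirely different domain: the browser never sends the cookie.
print(domain_match("revision.wikimedia.org", "wikipedia.org"))
```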

Does anyone have strong plans / ideas for the overall direction we want to
take here?