On Fri, Apr 11, 2014 at 3:06 PM, Sumana Harihareswara <sumanah(a)wikimedia.org>
wrote:
The way we do stuff now: within your code, you write your core or
extension code so that it interacts with various objects, subclasses
this and event-handles that and hooks into the other thing. And there
are several files and classes and globals that you kind of have to
interact with to do many different things, like index.php and $wgTitle
and $mediawiki.
A service-oriented architecture would change this; instead, code authors
would be able to read and write data via services. You sort of see this
now, in how there's a Notifications service and a Parsoid service that
new core or extensions code can push to or read from. In an SOA, we'd
extend that to include LOTS of functionality - basically any retrieval
or modification of data that a new feature might request or activate
that, in a more monolithic architecture, would require direct access to
the database or another permanent data store.
I'm not sure that distinction is entirely correct.
For example, database access itself is a service (provided by MySQL, or
PostgreSQL, or other SQL servers), although it's not often thought of that
way because we still interact with it via various objects, subclasses, and
so on. And it doesn't use REST, of course.
The same goes for the "caching" service, which is often thought of as
memcached even though it might be redis, or the database, or even nothing
more than a hash table in RAM that's not saved anywhere.
Search is another example. MediaWiki core has a search engine based on the
database, and WMF sites have been using a service provided by Lucene. Now
we're in the process of starting to use an ElasticSearch service. Not much
in core needed to change for this, because the details of accessing the
different services are already abstracted behind various classes.
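
As a rough sketch of that kind of abstraction (in Python rather than PHP, and
with made-up names: SearchBackend, DatabaseSearch, RemoteSearch, and the
"search_type" setting are all hypothetical, not MediaWiki's actual classes or
configuration), callers only ever see search(), and configuration decides
which service answers:

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Callable, Iterable


class SearchBackend(ABC):
    """Hypothetical interface; callers depend on this, not on a service."""

    @abstractmethod
    def search(self, term: str) -> list[str]:
        """Return the titles of pages matching `term`."""


class DatabaseSearch(SearchBackend):
    """Stands in for a search built on the wiki's own database."""

    def __init__(self, pages: dict[str, str]):
        self._pages = pages  # title -> page text

    def search(self, term: str) -> list[str]:
        needle = term.lower()
        return sorted(
            title for title, text in self._pages.items()
            if needle in text.lower()
        )


class RemoteSearch(SearchBackend):
    """Stands in for an external search service.

    `query` is any callable that talks to the service and returns titles;
    how it does that (HTTP, a client library, ...) is hidden from callers.
    """

    def __init__(self, query: Callable[[str], Iterable[str]]):
        self._query = query

    def search(self, term: str) -> list[str]:
        return sorted(self._query(term))


def make_search_backend(config: dict, pages: dict[str, str]) -> SearchBackend:
    """Pick a backend from configuration (the keys are assumptions)."""
    if config.get("search_type") == "remote":
        return RemoteSearch(config["query"])
    return DatabaseSearch(pages)
```

Swapping one search service for another then means adding a class and
changing a setting, while code that calls search() stays as it is.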
The same could be said for a "revision storage" service: most code would
probably still interact with it via the Revision object's getContent()
method, without having to care that Revision is calling out to something
else via REST versus accessing database tables or the like. In fact, now
that I look at it, it looks like Revision already does something like this
using the classes in the includes/externalstore/ directory for anything
besides storing it in the database alongside the rest of the revision data.
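
To illustrate the idea only (this is a Python sketch with invented names and
an invented "scheme://key" address format, not a description of how Revision
or includes/externalstore/ actually work), the caller's view could stay the
same whether the text is stored inline or fetched from somewhere else:

```python
from __future__ import annotations

from typing import Callable, Optional


class Revision:
    """Hypothetical revision object; only get_content() is public."""

    def __init__(
        self,
        rev_id: int,
        text: Optional[str] = None,
        address: Optional[str] = None,
        stores: Optional[dict[str, Callable[[str], str]]] = None,
    ):
        self.rev_id = rev_id
        self._text = text          # content stored alongside the revision row
        self._address = address    # e.g. "blob://abc" when stored elsewhere
        self._stores = stores or {}  # scheme -> function that fetches by key

    def get_content(self) -> str:
        # Inline content: nothing external is involved.
        if self._address is None:
            if self._text is None:
                raise LookupError(f"revision {self.rev_id} has no content")
            return self._text
        # External content: pick the fetcher by scheme. The fetcher might
        # read another database, call a REST service, or anything else.
        scheme, _, key = self._address.partition("://")
        return self._stores[scheme](key)
```

A caller writes rev.get_content() in both cases; moving revision storage
behind a REST service would only mean registering a different fetcher.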
I'm not familiar with how Parsoid works, but it's certainly possible that
someone could create an interface class of some sort that would allow
callers to not have to know whether wikitext was being rendered using
Parsoid or the PHP parser.
The details of the interfacing objects might need to change or new
interfacing objects might need to be created where things are poking at the
database directly now, but there's no reason that all code has to interact
with the services directly rather than via an object that manages the
details.
I think a more salient difference between the current situation and a more
SOA setup would be that the different services could in theory be used
outside of the rest of MediaWiki. This has both benefits and drawbacks. For
example, Parsoid (theoretically[1]) allows for parsing wikitext without
having to set up all of MediaWiki to be able to do so, which is good. But
on the other hand it means that "setting up MediaWiki" actually means
"setting up Parsoid, then setting up MediaWiki and configuring it to use
the Parsoid you just set up", and then having to upgrade and administer
both Parsoid and MediaWiki (including managing version incompatibilities
and so on), which all together makes for a more complicated system.
[1]: I say "theoretically" because I believe it currently calls back into
MediaWiki's web API to do various things. But it's not impossible that
Parsoid or some other program could be a standalone wikitext parser.
REST (Representational State Transfer) is a model for how those services
would work. Any given chunk of data is a resource. We have well-defined
verbs in HTTP for the service's client to use when either asking for a
representation of that resource (GET) or suggesting a new representation
so as to get the service to change the state of the resource (often POST).
That sounds correct. As I see it, the general idea of REST is that it's a
generic model for clients to access resources from a server ("server" being
"a thing that provides a service"). So every service doesn't have to invent
its own format for specifying which resource to act on, how to specify the
actions, and so on. And so every client doesn't have to implement code to
connect to the server over the network, send the action and
resource-identifier and such, wait for a response, and so on.
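
A toy version of that uniform interface, kept in memory so it runs without a
network (the handle() function and the "/page/Foo" path are made up for
illustration; a real service would sit behind an HTTP server), might look
like this:

```python
from __future__ import annotations

from typing import Optional


def handle(
    store: dict[str, str],
    method: str,
    path: str,
    body: Optional[str] = None,
) -> tuple[int, Optional[str]]:
    """Apply one request to `store` and return (HTTP status, representation).

    The resource is named by `path`, the action by `method`; nothing else
    about the request is specific to this particular service.
    """
    if method == "GET":
        # Ask for a representation of the resource.
        if path not in store:
            return 404, None
        return 200, store[path]
    if method == "POST":
        # Suggest a new representation, changing the resource's state.
        store[path] = body if body is not None else ""
        return 200, store[path]
    # Any other verb is not supported by this sketch.
    return 405, None


store: dict[str, str] = {}
print(handle(store, "GET", "/page/Foo"))
print(handle(store, "POST", "/page/Foo", "Some wikitext"))
print(handle(store, "GET", "/page/Foo"))
```

The same handle(method, path, body) shape would work for pages, revisions,
or notifications, which is the point: only the resource identifiers differ.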
--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation