On Fri, Apr 11, 2014 at 3:06 PM, Sumana Harihareswara <sumanah@wikimedia.org> wrote:
The way we do stuff now: within your code, you write your core or extension code so that it interacts with various objects, subclasses this and event-handles that and hooks into the other thing. And there are several files and classes and globals that you kind of have to interact with to do many different things, like index.php and $wgTitle and $mediawiki.
A service-oriented architecture would change this; instead, code authors would be able to read and write data via services. You sort of see this now, in how there's a Notifications service and a Parsoid service that new core or extensions code can push to or read from. In an SOA, we'd extend that to include LOTS of functionality - basically any retrieval or modification of data that a new feature might request or activate that, in a more monolithic architecture, would require direct database access or other permanent data store.
I'm not sure that distinction is entirely correct.
For example, database access itself is a service (provided by MySQL, or PostgreSQL, or other SQL servers), although it's not often thought of that way because we still interact with it via various objects, subclasses, and so on. And it doesn't use REST, of course.
The same goes for the "caching" service, which is often thought of as memcached even though it might be redis, or the database, or even nothing more than a hash table in RAM that's not saved anywhere.
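To make the caching point concrete, here is a minimal sketch in Python (not MediaWiki's actual PHP code). The names Cache, HashTableCache, NullCache, and cached_length are all made up for illustration; the point is only that the caller never learns which backend it got.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class Cache(ABC):
    """Hypothetical caching interface; callers depend only on this."""

    @abstractmethod
    def get(self, key: str) -> Optional[Any]:
        ...

    @abstractmethod
    def set(self, key: str, value: Any) -> None:
        ...


class HashTableCache(Cache):
    """Nothing more than a hash table in RAM that is not saved anywhere."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value


class NullCache(Cache):
    """A backend that stores nothing, so every lookup is a miss."""

    def get(self, key: str) -> Optional[Any]:
        return None

    def set(self, key: str, value: Any) -> None:
        pass


def cached_length(cache: Cache, text: str) -> int:
    """Example caller: it works the same whichever backend it is handed."""
    key = "len:" + text
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = len(text)
    cache.set(key, value)
    return value
```

A memcached- or redis-backed class would slot in the same way, as one more subclass of the hypothetical Cache, with no change to cached_length.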
Search is another example. MediaWiki core has a search engine based on the database, and WMF sites have been using a service provided by lucene. Now we're in the process of starting to use an ElasticSearch service. Not much in core needed to change for this, because the details of accessing the different services are already abstracted behind various classes.
The same could be said for a "revision storage" service: most code would probably still interact with it via the Revision object's getContent() method, without having to care that Revision is calling out to something else via REST versus accessing database tables or the like. In fact, now that I look at it, it looks like Revision already does something like this using the classes in the includes/externalstore/ directory for anything besides storing it in the database alongside the rest of the revision data.
I'm not familiar with how Parsoid works, but it's certainly possible that someone could create an interface class of some sort that would allow callers to not have to know whether wikitext was being rendered using Parsoid or the PHP parser.
The details of the interfacing objects might need to change or new interfacing objects might need to be created where things are poking at the database directly now, but there's no reason that all code has to interact with the services directly rather than via an object that manages the details.
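A rough sketch of such an interfacing object, again in Python and with entirely hypothetical names (RevisionText, TableRevisionText, RemoteRevisionText, and the /revision/<id> URL layout are assumptions, not anything MediaWiki defines). The remote backend takes a fetch callable so the sketch runs without a network; a real one would make an HTTP request there.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class RevisionText(ABC):
    """Hypothetical interface: the only thing calling code ever sees."""

    @abstractmethod
    def get_content(self, rev_id: int) -> str:
        ...


class TableRevisionText(RevisionText):
    """Reads from a dict standing in for a database table."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self._rows = rows

    def get_content(self, rev_id: int) -> str:
        return self._rows[rev_id]


class RemoteRevisionText(RevisionText):
    """Asks a separate storage service; fetch stands in for an HTTP GET."""

    def __init__(self, fetch: Callable[[str], str], base_url: str) -> None:
        self._fetch = fetch
        self._base_url = base_url.rstrip("/")

    def get_content(self, rev_id: int) -> str:
        # The URL layout is an assumption made up for this sketch.
        return self._fetch(f"{self._base_url}/revision/{rev_id}")


def word_count(store: RevisionText, rev_id: int) -> int:
    """Example caller: unaware of whether the text came from a table or a service."""
    return len(store.get_content(rev_id).split())
```

Only the object that manages the details changes when the storage moves behind a service; word_count and everything like it stays as it was.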
I think a more salient difference between the current situation and a more SOA setup would be that the different services could in theory be used outside of the rest of MediaWiki. This has both benefits and drawbacks. For example, Parsoid (theoretically[1]) allows for parsing wikitext without having to set up all of MediaWiki to be able to do so, which is good. But on the other hand it means that "setting up MediaWiki" actually means "setting up Parsoid, then setting up MediaWiki and configuring it to use the Parsoid you just set up", and then having to upgrade and administer both Parsoid and MediaWiki (including managing version incompatibilities and so on), which all together makes for a more complicated system.
[1]: I say "theoretically" because I believe it currently calls back into MediaWiki's web API to do various things. But it's not impossible that Parsoid or some other program could be a standalone wikitext parser.
REST (Representational State Transfer) is a model for how those services would work. Any given chunk of data is a resource. We have well-defined verbs in HTTP for the service's client to use when either asking for a representation of that resource (GET) or suggesting a new representation so as to get the service to change the state of the resource (often POST).
That sounds correct. As I see it, the general idea of REST is that it's a generic model for clients to access resources from a server ("server" being "a thing that provides a service"). So every service doesn't have to invent its own format for specifying which resource to act on, how to specify the actions, and so on. And so every client doesn't have to implement code to connect to the server over the network, send the action and resource-identifier and such, wait for a response, and so on.
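As a small illustration of that "one generic model" idea, the sketch below uses Python's standard urllib.request to build (but not send) requests for two made-up services. The helper build_request, the host names, and the resource paths are all hypothetical; the point is that the same resource-plus-verb shape serves both.

```python
import json
import urllib.request
from typing import Any, Optional


def build_request(base_url: str, resource: str, verb: str = "GET",
                  representation: Optional[Any] = None) -> urllib.request.Request:
    """Describe 'do <verb> to <resource>' the same way for any service."""
    url = base_url.rstrip("/") + "/" + resource.lstrip("/")
    data = None
    headers = {}
    if representation is not None:
        # A proposed new representation of the resource travels in the body.
        data = json.dumps(representation).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


# Asking a hypothetical revision service for a representation of a resource.
read_rev = build_request("http://revisions.example/", "/revision/42")

# Suggesting new state to a hypothetical notification service.
new_note = build_request("http://notifications.example", "notification",
                         "POST", {"text": "hi"})
```

Passing either object to urllib.request.urlopen would actually send it; the connection handling, waiting for the response, and so on come from the generic HTTP library rather than from code written per service.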