Since we're considering https://www.mediawiki.org/wiki/Requests_for_comment/Services_and_narrow_inte... and asking for discussion on https://www.mediawiki.org/wiki/Requests_for_comment/Content_API , I put some time into understanding what Service-Oriented Architecture (SOA) and Representational State Transfer (REST) mean in general, and what they'd mean for MediaWiki. Here's what I think we're talking about. Please correct me where I'm wrong!
So, just to clarify, this is NOT a discussion of overhauling the outward-facing MediaWiki web API -- that's taking place in https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap .
Instead, this is about refactoring how MediaWiki works *internally*, for everything from page editing to export to watchlists to permissions to stats. (And we've already sort of started doing this; see how Parsoid gives you parsing-as-a-service, for instance.)
The way we do stuff now: you write your core or extension code so that it interacts with various objects, subclasses this, event-handles that, and hooks into the other thing. And there are several files and classes and globals that you kind of have to interact with to do many different things, like index.php, $wgTitle, and $mediawiki.
A service-oriented architecture would change this; instead, code authors would be able to read and write data via services. You sort of see this now, in how there's a Notifications service and a Parsoid service that new core or extension code can push to or read from. In an SOA, we'd extend that to include LOTS of functionality - basically any retrieval or modification of data that a new feature might request or trigger, which, in a more monolithic architecture, would require direct access to the database or some other permanent data store.
REST (Representational State Transfer) is a model for how those services would work. Any given chunk of data is a resource. We have well-defined verbs in HTTP for the service's client to use when either asking for a representation of that resource (GET) or suggesting a new representation so as to get the service to change the state of the resource (often POST).
So the future might look like: the heart of MediaWiki core is PHP code that talks to the database and provides well-defined interfaces for other components to talk to by saying, e.g., GET /pages/123 or PUT /pages/123. There are a bunch of examples in https://www.mediawiki.org/wiki/Requests_for_comment/Storage_service_and_cont... although I don't quite understand the question marks in the URIs.
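To make that concrete (and tell me if the shape is wrong), here's a very rough sketch of how a new feature might talk to such a page-storage service -- the class name, endpoints, and host below are all made up, just to illustrate the GET/PUT idea:

<?php
// Rough sketch only: nothing here is actual MediaWiki code; the class,
// the endpoints, and the base URL are invented for illustration.
class PageStoreClient {
    private $baseUrl;

    public function __construct( $baseUrl ) {
        $this->baseUrl = $baseUrl;
    }

    // GET /pages/{id} -- ask the service for a representation of the page.
    public function getPage( $id ) {
        $json = file_get_contents( $this->baseUrl . '/pages/' . (int)$id );
        return json_decode( $json, true );
    }

    // PUT /pages/{id} -- send a new representation so the service changes
    // the state of that page resource.
    public function putPage( $id, array $page ) {
        $context = stream_context_create( array( 'http' => array(
            'method'  => 'PUT',
            'header'  => "Content-Type: application/json\r\n",
            'content' => json_encode( $page ),
        ) ) );
        file_get_contents( $this->baseUrl . '/pages/' . (int)$id, false, $context );
    }
}

$store = new PageStoreClient( 'http://storage.svc.internal' );  // made-up host
$page = $store->getPage( 123 );                                 // GET /pages/123
$page['wikitext'] = 'Some new wikitext';
$store->putPage( 123, $page );                                  // PUT /pages/123

The point being that the only vocabulary shared between the feature and the service is resources plus HTTP verbs.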
Is this about right?
---------------------
Sumana Harihareswara
Senior Technical Writer
Wikimedia Foundation
On Fri, Apr 11, 2014 at 3:06 PM, Sumana Harihareswara <sumanah@wikimedia.org> wrote:
The way we do stuff now: you write your core or extension code so that it interacts with various objects, subclasses this, event-handles that, and hooks into the other thing. And there are several files and classes and globals that you kind of have to interact with to do many different things, like index.php, $wgTitle, and $mediawiki.
A service-oriented architecture would change this; instead, code authors would be able to read and write data via services. You sort of see this now, in how there's a Notifications service and a Parsoid service that new core or extension code can push to or read from. In an SOA, we'd extend that to include LOTS of functionality - basically any retrieval or modification of data that a new feature might request or trigger, which, in a more monolithic architecture, would require direct access to the database or some other permanent data store.
I'm not sure that distinction is entirely correct.
For example, database access itself is a service (provided by MySQL, or PostgreSQL, or other SQL servers), although it's not often thought of that way because we still interact with it via various objects, subclasses, and so on. And it doesn't use REST, of course.
The same goes for the "caching" service, which is often thought of as memcached even though it might be redis, or the database, or even nothing more than a hash table in RAM that's not saved anywhere.
Search is another example. MediaWiki core has a search engine based on the database, and WMF sites have been using a service provided by Lucene. Now we're in the process of starting to use an ElasticSearch service. Not much in core needed to change for this, because the details of accessing the different services are already abstracted behind various classes.
The same could be said for a "revision storage" service: most code would probably still interact with it via the Revision object's getContent() method, without having to care whether Revision is calling out to something else via REST versus accessing database tables or the like. In fact, now that I look at it, it looks like Revision already does something like this, using the classes in the includes/externalstore/ directory for anything besides storing it in the database alongside the rest of the revision data.
I'm not familiar with how Parsoid works, but it's certainly possible that someone could create an interface class of some sort that would allow callers to not have to know whether wikitext was being rendered using Parsoid or the PHP parser.
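Something like this, say -- just a sketch of the shape, none of these are real MediaWiki or Parsoid class names:

<?php
// Sketch only: illustrating the interface idea, not real code.
interface WikitextRenderer {
    /** Render wikitext to an HTML string. */
    public function render( $wikitext );
}

// One implementation would POST the wikitext to a Parsoid service...
class ParsoidRenderer implements WikitextRenderer {
    private $serviceUrl;

    public function __construct( $serviceUrl ) {
        $this->serviceUrl = $serviceUrl;
    }

    public function render( $wikitext ) {
        // The real thing would POST $wikitext to $this->serviceUrl and
        // return the HTML the service sends back; stubbed for the sketch.
        return '<p>(HTML from Parsoid at ' . htmlspecialchars( $this->serviceUrl ) . ')</p>';
    }
}

// ...another would call the in-process PHP parser.
class PhpParserRenderer implements WikitextRenderer {
    public function render( $wikitext ) {
        // The real thing would hand $wikitext to the existing Parser class.
        return '<p>' . htmlspecialchars( $wikitext ) . '</p>';
    }
}

// Callers depend only on the interface, so swapping the PHP parser for
// Parsoid is a wiring/configuration change, not a change to every caller.
function renderForDisplay( WikitextRenderer $renderer, $wikitext ) {
    return $renderer->render( $wikitext );
}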
The details of the interfacing objects might need to change or new interfacing objects might need to be created where things are poking at the database directly now, but there's no reason that all code has to interact with the services directly rather than via an object that manages the details.
I think a more salient difference between the current situation and a more SOA setup would be that the different services could in theory be used outside of the rest of MediaWiki. This has both benefits and drawbacks. For example, Parsoid (theoretically[1]) allows for parsing wikitext without having to set up all of MediaWiki to be able to do so, which is good. But on the other hand it means that "setting up MediaWiki" actually means "setting up Parsoid, then setting up MediaWiki and configuring it to use the Parsoid you just set up", and then you have to upgrade and administer both Parsoid and MediaWiki (including managing version incompatibilities and so on), which all together makes for a more complicated system.
[1]: I say "theoretically" because I believe it currently calls back into MediaWiki's web API to do various things. But it's not impossible that Parsoid or some other program could be a standalone wikitext parser.
REST (Representational State Transfer) is a model for how those services would work. Any given chunk of data is a resource. We have well-defined verbs in HTTP for the service's client to use when either asking for a representation of that resource (GET) or suggesting a new representation so as to get the service to change the state of the resource (often POST).
That sounds correct. As I see it, the general idea of REST is that it's a generic model for clients to access resources from a server ("server" being "a thing that provides a service"). So every service doesn't have to invent its own format for specifying which resource to act on, how to specify the actions, and so on. And so every client doesn't have to implement code to connect to the server over the network, send the action and resource-identifier and such, wait for a response, and so on.
On Apr 11, 2014, at 1:29 PM, Brad Jorsch (Anomie) <bjorsch@wikimedia.org> wrote:
I think a more salient difference between the current situation and a more SOA setup would be that the different services could in theory be used outside of the rest of MediaWiki
This may be very noticeable, but I'm not sure it's the most important aspect. The drive for this isn't to separate out the current services of MediaWiki for outside-MediaWiki or 3rd-party use, but rather to untangle the current tight coupling[1] in the architecture itself. While the access interfaces between parts of MediaWiki are well-understood, those interfaces are broad enough to allow (and in some cases require) content coupling.
I think the main drive and importance of SOA is just to narrow the interfaces to the point where what sits behind them is a collection of well-defined, highly-cohesive[2] systems. In your example, the interfaces to the database and the caching system are narrow and service-oriented, but that doesn't necessarily imply a good architecture. As an example: say an object is a DataObject abstracted from a database row, but an independent part of the system modifies that row in the database directly, bypassing the DataObject, in order to take advantage of a side effect (you can modify the data without modifying the object in the current request); if that pattern is not just tolerated but actually encouraged, this is poor architecture, even when the design of the interface (SQL) is narrow.
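Roughly, the difference is between the two paths below (WikiPageRecord and the table layout are invented for illustration, not real MediaWiki code):

<?php
// Sketch of the coupling problem; nothing here is actual MediaWiki code.
class WikiPageRecord {
    private $db;
    private $id;
    private $title;

    public function __construct( PDO $db, $id ) {
        $this->db = $db;
        $this->id = (int)$id;
        $stmt = $db->prepare( 'SELECT title FROM page WHERE id = ?' );
        $stmt->execute( array( $this->id ) );
        $this->title = $stmt->fetchColumn();
    }

    // The narrow, cohesive path: state changes go through the object, so
    // the in-memory object and the stored row stay in sync.
    public function rename( $newTitle ) {
        $stmt = $this->db->prepare( 'UPDATE page SET title = ? WHERE id = ?' );
        $stmt->execute( array( $newTitle, $this->id ) );
        $this->title = $newTitle;
    }

    public function getTitle() {
        return $this->title;
    }
}

// The content-coupled path: an unrelated feature updates the row behind the
// object's back, relying on the side effect that an already-loaded
// WikiPageRecord keeps reporting the old title for the rest of the request.
// The SQL interface is "narrow", but the architecture is now tangled.
function sneakyRename( PDO $db, $pageId, $newTitle ) {
    $stmt = $db->prepare( 'UPDATE page SET title = ? WHERE id = ?' );
    $stmt->execute( array( $newTitle, (int)$pageId ) );
}

Both paths go through the same narrow SQL interface; only one of them keeps the system cohesive.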
(Where it is very noticeable is that 3rd-party or independent use of a service pushes the service toward a loosely-coupled, highly-cohesive architecture, since the service itself doesn't distinguish between an internal call and an external 3rd-party use. It also makes the service easier to test and mock.)
[1] http://en.wikipedia.org/wiki/Coupling_(computer_programming) [2] http://en.wikipedia.org/wiki/Cohesion_(computer_science)
This has both benefits and drawbacks. For example, Parsoid (theoretically[1]) allows for parsing wikitext without having to set up all of MediaWiki to be able to do so, which is good.
[1]: I say "theoretically" because I believe it currently calls back into MediaWiki's web API to do various things. But it's not impossible that Parsoid or some other program could be a standalone wikitext parser.
This is correct. But for wikitext processing, I feel it likely that the change would simply be that Parsoid would call a non-web-based, lower-overhead service for the pieces of the API it needs.
That sounds correct. As I see it, the general idea of REST is that it's a generic model for clients to access resources from a server ("server" being "a thing that provides a service"). So every service doesn't have to invent its own format for specifying which resource to act on, how to specify the actions, and so on. And so every client doesn't have to implement code to connect to the server over the network, send the action and resource-identifier and such, wait for a response, and so on.
Again, I feel a more important aspect of REST is that the interface is extremely narrow: basically a representation of a resource (URL) and a set of 4 CRUD commands (create read update delete = post get put delete). The fact that each resource is independent and each action is stateless allows it to be highly scalable. But you are correct that REST has the advantage over non-RESTful APIs in that the access language is defined naturally in the protocol, rather than by convention.
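A toy dispatcher shows just how small that vocabulary is (the handler and the in-memory store below are invented purely for illustration):

<?php
// Sketch of the narrow interface: one resource identifier plus four verbs.
function handleRequest( $method, $resourceId, $body, array &$store ) {
    switch ( $method ) {
        case 'POST':   // create
            $store[$resourceId] = $body;
            return array( 201, $body );
        case 'GET':    // read
            return isset( $store[$resourceId] )
                ? array( 200, $store[$resourceId] )
                : array( 404, null );
        case 'PUT':    // update/replace (send a full new representation)
            $store[$resourceId] = $body;
            return array( 200, $body );
        case 'DELETE': // delete
            unset( $store[$resourceId] );
            return array( 204, null );
    }
    return array( 405, null );
}

// Every request carries everything the server needs (resource + verb +
// representation), so no per-client session state has to be kept around,
// which is what makes it easy to scale out.
$store = array();
handleRequest( 'PUT', '/pages/123', 'Some wikitext', $store );
list( $status, $page ) = handleRequest( 'GET', '/pages/123', null, $store );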
Take care,
terry
On Wed, Apr 16, 2014 at 11:43 AM, Terry Chay <tchay@wikimedia.org> wrote:
Again, I feel a more important aspect of REST is that the interface is extremely narrow: basically a representation of a resource (URL) and a set of 4 CRUD commands (create read update delete = post get put delete). The fact that each resource is independent and each action is stateless allows it to be highly scalable. But you are correct that REST has the advantage over non-RESTful APIs in that the access language is defined naturally in the protocol, rather than by convention.
I'd like to point out that the REST API for the Socialtext wiki is very very well-designed: https://www.socialtext.net/st-rest-docs/
Every action possible in the Socialtext wiki UI is also possible via a call to a REST endpoint, and the REST endpoints are simple to manipulate. The entire application is a thin presentation layer on top of a killer wiki engine, all driven via the REST API. (Or it was when I was last involved with it, and given that Audrey Tang was the last to update the API docs, I'd guess that is still true.)
To the best of my knowledge this API is still in use, although there is no longer an open source version of the Socialtext wiki available. (Although if anyone really wants a Socialtext wiki to experiment with, I think I could get you one.)
-Chris
On 04/11/2014 12:06 PM, Sumana Harihareswara wrote:
So, just to clarify, this is NOT a discussion of overhauling the outward-facing MediaWiki web API -- that's taking place in https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap .
The discussion is not about replacing the existing PHP API. We do, however, plan to complement it with an outward-facing REST content API as sketched in https://www.mediawiki.org/wiki/Requests_for_comment/Content_API.
So the future might look like: the heart of MediaWiki core is PHP code that talks to the database
No, ideally the only code that directly talks to the database would live in a storage service, which exposes a REST API.
REST is very much about the definition of narrow interfaces, idempotence and statelessness. Its division of vocabulary into URL-addressed resources and orthogonal verbs also avoids the need to perform data access through specialized RPC-style objects [1]. It enforces the use of simple value objects, which in turn helps to keep interfaces narrow. Those values can -- but don't need to -- be embellished with wrapper objects or service classes for the consuming code's convenience.
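A quick sketch of what I mean by value objects vs. optional convenience wrappers -- PageValue and PageFacade are invented names, not part of any actual proposal:

<?php
// Sketch of the value-object idea; not real code from the RFC.

// What the REST interface hands back: a dumb bundle of data, nothing else.
class PageValue {
    public $id;
    public $title;
    public $html;

    public function __construct( $id, $title, $html ) {
        $this->id = $id;
        $this->title = $title;
        $this->html = $html;
    }
}

// Optional convenience wrapper for consuming code; the interface between
// services stays just the value, so it remains narrow and easy to mock.
class PageFacade {
    private $value;

    public function __construct( PageValue $value ) {
        $this->value = $value;
    }

    public function getDisplayTitle() {
        return htmlspecialchars( $this->value->title );
    }
}

$value = new PageValue( 123, 'Main Page', '<p>Hello</p>' );
$page = new PageFacade( $value );
echo $page->getDisplayTitle();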
Gabriel
[1]: https://www.mediawiki.org/wiki/Requests_for_comment/PHP_Virtual_REST_Service