Erik,
On 11/05/2014 10:07 AM, Erik Moeller wrote:
> I'm wondering if a lightweight service that satisfies the following
> requirements might be a good idea:
> - community-created schemas (similar to the EventLogging schemas on meta)
> - basic per-user authentication/authorization
> - basic namespacing (e.g. "WikiProject Medicine:Quality" refers to a
>   specific schema + specific permissions)
The need for storing different formats and metadata per revision was
actually one of the motivations for creating RESTBase [1]. Currently it is
set up to store HTML, wikitext, data-parsoid and data-mw per revision, with
each property being stored in its own bucket behind the scenes. New
revisioned buckets for new types of content can be added with a
simple PUT, and the plan is to have separate ACLs per bucket.
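To make the "simple PUT" a bit more concrete, here is a minimal sketch of how a client might build such a request. The base URL, the path layout, the bucket name and the JSON body fields are all assumptions made up for illustration, not RESTBase's actual API. The request is only constructed, never sent.

```python
import json
import urllib.request

# Hypothetical values: the base URL, path layout and body fields are
# illustrative assumptions, not the documented RESTBase API.
BASE_URL = "http://localhost:8080/v1/example.org"


def build_bucket_put(bucket_name, bucket_type="revisioned"):
    """Build (but do not send) a PUT request that would create a bucket."""
    body = json.dumps({"type": bucket_type}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{bucket_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Build a request for a hypothetical "quality" bucket and inspect it.
req = build_bucket_put("quality")
print(req.get_method(), req.full_url)
```

Per-bucket ACLs would presumably be checked on the server side for each such request, but that part is still only planned.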
What are the indexing requirements for this metadata? If fast access by
specific properties is needed, then using tables would make more sense, as
we'll then be able to leverage secondary indexing. Tables have the same
properties as buckets, and can also be created with a PUT of the schema.
Query results are returned as JSON.
One limitation of queries in RESTBase is that they can only use indexes
defined in the schema. If ad-hoc queries on arbitrary combinations of
attributes are needed, then Elasticsearch would be more suitable.
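The index restriction can be illustrated with a small in-memory sketch. The schema layout, the table and attribute names, and the rule that a query must match a leading prefix of an index are simplifying assumptions for illustration, not RESTBase's actual schema format or query planner.

```python
# Hypothetical table schema: field names and layout are illustrative
# assumptions, not RESTBase's actual schema format.
QUALITY_SCHEMA = {
    "table": "quality_assessments",
    "attributes": {
        "page": "string",
        "rev": "int",
        "rating": "string",
        "reviewer": "string",
    },
    "index": ["page", "rev"],  # primary index
    "secondary_indexes": {"by_rating": ["rating"]},
}


def usable_index(schema, query_attributes):
    """Return the name of an index that covers the query, or None.

    Assumption: a query can be served only if its attributes form a
    leading prefix of some index defined in the schema.
    """
    wanted = set(query_attributes)
    candidates = {"primary": schema["index"], **schema["secondary_indexes"]}
    for name, columns in candidates.items():
        prefix = columns[: len(wanted)]
        if len(prefix) == len(wanted) and set(prefix) == wanted:
            return name
    return None


print(usable_index(QUALITY_SCHEMA, ["page"]))
print(usable_index(QUALITY_SCHEMA, ["rating"]))
print(usable_index(QUALITY_SCHEMA, ["rating", "reviewer"]))
```

Under these assumptions, lookups by page or by rating are served from an index, while the combination of rating and reviewer has no matching index and would be the kind of ad-hoc query better handed to a search backend.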
> If such a service existed, community members, researchers and
> occasionally WMF itself could create their own tools/gadgets that use
> this service, perhaps with a lightweight global approval process.
>
> If this seems like a good idea, I'd be curious about implementation
> strategies -- are we blocked on something like SOA Auth [1] to
> implement this as a standalone service? My sense is that you'd want to
> pull this out of MediaWiki for maximum flexibility and simplicity.
It might be possible to improvise a bit, but we'll need basic SOA auth
fairly soon for other use cases too. I'm optimistic that we can start small
though, especially if this doesn't need to tie into browser-based SUL
straight away.
Gabriel
[1]: https://github.com/gwicke/restbase