Erik,
On 11/05/2014 10:07 AM, Erik Moeller wrote:
> I'm wondering if a lightweight service that satisfies the following requirements might be a good idea:
> - community-created schemas (similar to the EventLogging schemas on meta)
> - basic per-user authentication/authorization
> - basic namespacing (e.g. "WikiProject Medicine:Quality" refers to a
>   specific schema + specific permissions)
The need to store different formats and metadata per revision was actually one of the motivations for creating RESTBase [1]. It is currently set up to store HTML, wikitext, data-parsoid and data-mw per revision, with each property stored in its own bucket behind the scenes. New revisioned buckets for new types of content can be added with a simple PUT, and the plan is to have separate ACLs per bucket.
What are the indexing requirements for this metadata? If fast access by specific properties is needed, then using tables would make more sense, as we'll then be able to leverage secondary indexing. Tables have the same properties as buckets, and can also be created with a PUT of the schema. Query results are returned as JSON.
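As a rough illustration of creating a table with a PUT of its schema, a request could be built along these lines. The host, the path layout and the schema field names are all made up for this example and are not the actual RESTBase API; the request is only constructed, never sent.

```python
import json
from urllib.request import Request

# Hypothetical values: the host, the path layout and the schema field names
# are assumptions for illustration, not taken from the RESTBase API.
BASE_URL = "https://restbase.example.org/v1/en.wikipedia.org"


def build_create_table_request(table_name, schema):
    """Build (but do not send) a PUT request carrying a table schema as JSON."""
    body = json.dumps(schema).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{table_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical schema for per-revision quality ratings, with one
# secondary index so that rows can be looked up by rating.
quality_schema = {
    "attributes": {"rev": "int", "rating": "string", "reviewer": "string"},
    "index": ["rev"],
    "secondary_indexes": {"by_rating": ["rating"]},
}

request = build_create_table_request("WikiProject_Medicine.Quality", quality_schema)
print(request.get_method(), request.full_url)
```

The point of the sketch is only that the schema, including its indexes, travels in the body of a single PUT.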
One limitation of RESTBase queries is that they can only use indexes defined in the schema. If ad-hoc queries on arbitrary combinations of attributes are needed, ElasticSearch would be more suitable.
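The index restriction can be pictured with a toy in-memory model. This is not how RESTBase is implemented; the class and attribute names are hypothetical and only show the behaviour: queries on declared indexes succeed, anything else is rejected.

```python
class IndexedTable:
    """Toy table that only answers queries on attributes declared as indexed."""

    def __init__(self, indexed_attributes):
        self.indexed_attributes = set(indexed_attributes)
        self.rows = []

    def put(self, row):
        # Store a copy so later changes by the caller do not affect the table.
        self.rows.append(dict(row))

    def query(self, **conditions):
        # Reject any condition on an attribute that has no declared index.
        unindexed = set(conditions) - self.indexed_attributes
        if unindexed:
            raise ValueError(f"no index covers: {sorted(unindexed)}")
        return [
            row for row in self.rows
            if all(row.get(key) == value for key, value in conditions.items())
        ]


table = IndexedTable(indexed_attributes=["rev", "rating"])
table.put({"rev": 101, "rating": "B", "reviewer": "alice"})
table.put({"rev": 102, "rating": "C", "reviewer": "bob"})

print(table.query(rating="B"))      # allowed: "rating" is indexed

try:
    table.query(reviewer="alice")   # rejected: "reviewer" is not indexed
except ValueError as error:
    print(error)
```

Queries such as the rejected one are the ad-hoc case where a search backend is the better fit.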
> If such a service existed, community members, researchers and occasionally WMF itself could create their own tools/gadgets that use this service, perhaps with a lightweight global approval process.
> If this seems like a good idea, I'd be curious about implementation strategies -- are we blocked on something like SOA Auth [1] to implement this as a standalone service? My sense is that you'd want to pull this out of MediaWiki for maximum flexibility and simplicity.
It might be possible to improvise a bit, but we'll need basic SOA auth fairly soon for other use cases too. I'm optimistic that we can start small though, especially if this doesn't need to tie into browser-based SUL straight away.
Gabriel