On 27.03.2012 02:19, Daniel Friesen wrote:
> Non-wikitext data is supposed to give extensions the ability to do things beyond WikiText. The data is always going to be an opaque form controlled by the extension. I don't think that low level serialized data should be visible at all to clients. Even if they know it's there.
The serialized form of the data needs to be visible at least in the XML dump format. How else could we transfer non-wikitext content between wikis?
Using the serialized form may also make sense for editing via the web API, though I'm not sure yet what the best approach is here:
a) keep using the current general, text-based interface with the serialized form of the content
or b) require a specialized editing API for each content type.
Going with a) has the advantage that it will simply work with current API client code. However, if the client modifies the content and writes it back without being aware of the format, it may corrupt the data. So perhaps we should return an error when a client tries to edit a non-wikitext page "the old way".
The b) option is a bit annoying because it means that we have to define a potentially quite complex mapping between the content model and the API's result model (nested PHP arrays). This is easy enough for Wikidata, which uses a JSON-based internal model. But for, say, SVG... well, I guess the specialized mapping could still be "escaped XML as a string".
Note that if we allow a), we can still allow b) at the same time - for Wikidata, we will definitely implement a special purpose editing interface that supports stuff like "add value for language x to property y", etc.
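To make the safeguard for a) a bit more concrete, here is a minimal Python sketch of what "return an error when a client tries to edit a non-wikitext page the old way" could look like. This is an illustration only, not MediaWiki code: `Page`, `UnsupportedContentModel`, `edit_via_text_interface` and the model name `"example-data"` are all made up for the example.

```python
from dataclasses import dataclass


class UnsupportedContentModel(Exception):
    """Raised when the generic text interface is used on non-wikitext content."""


@dataclass
class Page:
    title: str
    content_model: str  # e.g. "wikitext" or "example-data" (hypothetical names)
    serialized: str     # the stored, serialized form of the content


def edit_via_text_interface(page: Page, new_text: str) -> Page:
    """Option a) with a guard: refuse blind text edits on other content models."""
    if page.content_model != "wikitext":
        raise UnsupportedContentModel(
            f"{page.title}: content model {page.content_model!r} "
            "cannot be edited through the plain text interface"
        )
    # Wikitext pages keep working exactly as they do for current API clients.
    return Page(page.title, page.content_model, new_text)
```

A client that knows the format could still be given an explicit opt-in instead of an outright error; the sketch only shows the conservative default.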
> Just like database schemas change, I expect extensions to also want to alter the format of data as they add new features.
Indeed. This is why in addition to a data model identifier, the serialization format is explicitly tracked in the database and will be present in dumps and via the web API.
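As a rough illustration of why tracking both identifiers helps, the following Python sketch picks a deserializer by the pair (content model, serialization format), so that an extension can introduce a new format while old revisions stay readable. Everything named here is an assumption for the example: the model `"example-data"`, the format strings, the key=value legacy format and `load_content` are hypothetical and not part of any actual schema.

```python
import json
from typing import Any, Callable, Dict, Tuple


def _parse_keyvalue(blob: str) -> Dict[str, str]:
    """Parse a hypothetical legacy 'key=value' per line format."""
    return dict(line.split("=", 1) for line in blob.splitlines() if line)


# (content model, serialization format) -> deserializer
DESERIALIZERS: Dict[Tuple[str, str], Callable[[str], Any]] = {
    ("example-data", "application/json"): json.loads,
    # Older format, still readable after the extension switched to JSON.
    ("example-data", "text/x-keyvalue"): _parse_keyvalue,
}


def load_content(model: str, fmt: str, blob: str) -> Any:
    """Deserialize a stored blob using the format recorded alongside it."""
    try:
        deserialize = DESERIALIZERS[(model, fmt)]
    except KeyError:
        raise ValueError(f"no deserializer for model={model!r}, format={fmt!r}")
    return deserialize(blob)
```

The same lookup would apply when importing a dump from another wiki: the dump carries the model and format, and the importing side either knows that pair or rejects the revision.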
> Also I've thought about something like this for quite awhile. One of the things I'd really like us to do is start using real metadata even within normal WikiText pages. We should really replace in-page [[Category:]] with a real string of category metadata. Which we can then use to provide good intuitive category interfaces. ([[Category:]] would be left in for templates, compatibility, etc...).
That could be implemented using a "multipart" content type. But I don't want to get into this too deeply - multipart has a lot of cool uses, but it's beyond what we will do for Wikidata.
> This case especially tells me that raw is not something that should be outputting the raw data, but should be something which is implemented by whatever implements the normal handling for that serialized data.
You mean action=raw? Yes, I agree. action=raw should not return the actual serialized format. It should probably return nothing or an error for non-text content. For multipart pages it would just return the "main part", without the "extensions".
But the entire "multipart" stuff needs more thought. It has a lot of great applications, but it's beyond the scope of Wikidata, and it has some additional implications (e.g. can the old editing interface be used to edit "just the text" while keeping the attachments?).
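For what it's worth, the action=raw behaviour described above could be sketched roughly like this in Python. It is purely illustrative: representing a page as a dict of named parts with a `"main"` key, the model names, and the function `raw_output` are assumptions made for the example, and returning `None` stands in for "nothing or an error".

```python
from typing import Dict, Optional


def raw_output(model: str, parts: Dict[str, str]) -> Optional[str]:
    """Return the text a raw view might serve, or None for non-text content."""
    if model == "wikitext":
        return parts["main"]
    if model == "multipart":
        # Only the main part; the "extensions" (attachments) are left out.
        return parts.get("main")
    # Non-text content: nothing is returned (raising an error is the alternative).
    return None
```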
-- daniel