On 04.05.2013 12:05, Jona Christopher Sahnwaldt wrote:
On 26 April 2013 17:15, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
*internal* JSON representation, which is different from what the API returns, and may change at any time without notice.
Somewhat off-topic: I didn't know you have different JSON representations. I'm curious and I'd be happy about a few quick answers...
- How many are there? Just the two, internal and external?
Yes, these two.
- Which JSON representations do the API and the XML dump provide? Will
they do so in the future?
The XML dump provides the internal representation (since it's a dump of the raw page content). The API uses the external representation.
This is pretty much dictated by the nature of the dumps and the API, so it will stay that way. However, we plan to add more types of dumps, including:
* a plain JSON dump (using the external representation)
* an RDF/XML dump
It's not certain yet when, or even if, we'll provide these, but we are considering it.
- Are the API and XML dump representations stable? Or should we expect
some changes?
The internal representation is unstable and subject to change without notice. In fact, it may even change to something other than JSON. I don't think it's even documented anywhere outside the source code.
The external representation is pretty stable, though not final yet. We will definitely make additions to it, and some (hopefully minor) structural changes may be necessary. We'll try to stay largely backwards compatible, but can't promise full stability yet.
Also, the external representation uses the API framework for generating the actual JSON, and may be subject to changes imposed by that framework.
Unfortunately, this means that there are currently no dumps with a reliable representation of our data. Your options are to a) use the API, b) use the unstable internal JSON, or c) wait for "real" data dumps.
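For option a), a minimal sketch in Python of fetching one item through the API and reading it defensively could look like the following. Note the assumptions, none of which are stated above: the endpoint URL, the `wbgetentities` action, the `entities`/`labels`/`value` layout of the response, and the example User-Agent string. Since the external representation is not final, the parsing tolerates missing keys rather than assuming a fixed structure.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint; adjust for the wiki you are querying.
API_URL = "https://www.wikidata.org/w/api.php"


def build_entity_url(entity_id, api_url=API_URL):
    """Build the request URL for one entity (assumes the wbgetentities action)."""
    params = {"action": "wbgetentities", "ids": entity_id, "format": "json"}
    return api_url + "?" + urlencode(params)


def extract_label(response, entity_id, lang="en"):
    """Return the label text, or None if the expected keys are missing.

    The key layout used here is an assumption about the external
    representation, which may still change.
    """
    entity = response.get("entities", {}).get(entity_id, {})
    label = entity.get("labels", {}).get(lang)
    if isinstance(label, dict):
        return label.get("value")
    return None


def fetch_entity(entity_id):
    """Fetch and decode the API response for one entity (needs network access)."""
    request = Request(
        build_entity_url(entity_id),
        headers={"User-Agent": "example-script/0.1"},  # hypothetical client name
    )
    with urlopen(request) as resp:
        return json.load(resp)
```

Calling `extract_label(fetch_entity("Q42"), "Q42")` would then return the English label if the response has the assumed shape, and None otherwise.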
-- daniel