Hello services-list,
Since my very first contact with JSON, I have been asking myself: why can't people simply use XML? In the meantime I have become aware of JSON's advantages, especially for fast prototyping and high-performance applications. However, as applications get larger, more complex and more mature, the absence of schema information becomes problematic.
For example, I found writing the parser for the WikiData dump [1] quite exhausting. Alternatives like Json-lib work well for testing, but I was quite worried about stability after hitting a log tail bug [2]. Moreover, in the PHP Math extension it is often hard to figure out which JSON properties are set under which circumstances [3]. Yesterday, I discovered another problem related to a missing JSON schema [4], which finally motivated me to start this discussion of JSON schema options.
For the communication between services, we use spec files. This is a great thing. But would it not be even better to use a JSON schema within services as well, so that one could throw exceptions right at the place where the problem occurs? I'm aware that there are approaches to JSON schemas like [5], but I'm not sure whether they are convenient to use in practice.
To keep the discussion focused, we could use "how HTTP errors are supposed to look" [6] as a running example to discuss how JSON schema definition and validation could work.
Best, Physikerwelt
PS: This is my first post to the services-list. I hope it fits the purpose of this list.
[1] https://github.com/physikerwelt/WikidataListGenerator/blob/master/src/main/j...
[2] https://twitter.com/physikerwelt/status/683286844721741824
[3] https://phabricator.wikimedia.org/T119300
[4] https://phabricator.wikimedia.org/T126057
[5] http://jsonschema.net
[6] https://github.com/wikimedia/service-template-node/blob/master/doc/coding.md...
On 6 February 2016 at 03:52, Physikerwelt wiki@physikerwelt.de wrote:
> Hello services-list,
> Since my very first contact with JSON, I have been asking myself: why can't people simply use XML? In the meantime I have become aware of JSON's advantages, especially for fast prototyping and high-performance applications. However, as applications get larger, more complex and more mature, the absence of schema information becomes problematic.
That's indeed both the strength and weakness of JSON's free-style format: it allows you to move fast, but also to shoot yourself in the foot on the way to your destination.
> For example, I found writing the parser for the WikiData dump [1] quite exhausting. Alternatives like Json-lib work well for testing, but I was quite worried about stability after hitting a log tail bug [2]. Moreover, in the PHP Math extension it is often hard to figure out which JSON properties are set under which circumstances [3]. Yesterday, I discovered another problem related to a missing JSON schema [4], which finally motivated me to start this discussion of JSON schema options.
> For the communication between services, we use spec files. This is a great thing. But would it not be even better to use a JSON schema within services as well, so that one could throw exceptions right at the place where the problem occurs? I'm aware that there are approaches to JSON schemas like [5], but I'm not sure whether they are convenient to use in practice.
For defining a service's public interface we use the Swagger specification [1], which is itself a close relative of JSON-Schema. It even uses JSON-Schema directly for field declarations and other things, but it is tailored more towards defining API interfaces than JSON fields.
Recently, we have started working on a new sub-system, called EventBus [2], that delivers and propagates MW events reliably. There, each communication channel accepts only a certain type of event message. These message types are defined as JSON-Schema schemas [3], which allows us to cleanly define the contract between the system itself and event producers and consumers.
But I think you are right -- ideally, schemas should be defined even for intra-service communications and protocols, as they can serve not only as a reference point, but for documentation and communication purposes as well. The downside of doing so, though, is that adhering to the schema internally means checking it, which slows down execution and hurts performance. So there should be a balance, and we should choose wisely what to "schematise" and what not to.
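As a rough illustration of that balance, the following Python sketch makes the check optional per function, so it can run during development and be skipped on hot paths. Every name in it (`checked`, `render_title`, the `title` key) is hypothetical and not taken from any of our services, and the check only looks for required keys rather than doing full JSON-Schema validation.

```python
def checked(required_keys, enabled=True):
    """Decorator factory: fail fast when a message lacks required keys."""
    def decorator(func):
        if not enabled:
            # Return the function untouched, so there is no per-call overhead.
            return func

        def wrapper(message):
            missing = [key for key in required_keys if key not in message]
            if missing:
                # Raise right where the bad message enters the function.
                raise ValueError(f"{func.__name__}: missing keys {missing}")
            return func(message)

        return wrapper
    return decorator


def render_title(message):
    # Hypothetical internal function that expects a 'title' property.
    return message["title"].upper()


# Same function, once with the check and once without it.
strict_render = checked(["title"], enabled=True)(render_title)
fast_render = checked(["title"], enabled=False)(render_title)
```

With the check enabled, a malformed message is rejected with a descriptive `ValueError` at the boundary; with it disabled, the original function is used as-is and a bad message only surfaces later as a `KeyError`.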
> To keep the discussion focused, we could use "how HTTP errors are supposed to look" [6] as a running example to discuss how JSON schema definition and validation could work.
This is the perfect example of something that should have a defined schema available, all the more so because the code explicitly validates the HTTPError object's properties [4]. Error log entries and error responses are definitely something that needs to be standardised across our services. I'll write a JSON-Schema document for it.
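To make the idea concrete, here is a minimal Python sketch of what such a schema and a check against it might look like. The property names (`status`, `type`, `title`, `detail`) and which of them are required are assumptions chosen for illustration, not the agreed format, and the `validate` helper is a hand-rolled stand-in that understands only the `type`, `required` and `properties` keywords, not a full JSON-Schema implementation.

```python
# Hypothetical schema for an HTTP error body; property names are illustrative.
HTTP_ERROR_SCHEMA = {
    "type": "object",
    "required": ["status", "type", "title"],
    "properties": {
        "status": {"type": "integer"},
        "type": {"type": "string"},
        "title": {"type": "string"},
        "detail": {"type": "string"},
    },
}

# Maps the JSON-Schema type names used above to Python types.
_TYPES = {"object": dict, "string": str, "integer": int}


def validate(instance, schema):
    """Return a list of problems; supports 'type', 'required', 'properties' only."""
    expected = _TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for integers.
    wrong_type = not isinstance(instance, expected) or (
        expected is int and isinstance(instance, bool)
    )
    if wrong_type:
        return [f"expected {schema['type']}, got {type(instance).__name__}"]

    errors = []
    if expected is dict:
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"missing required property '{name}'")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(
                    f"{name}: {problem}"
                    for problem in validate(instance[name], sub_schema)
                )
    return errors
```

Calling `validate(body, HTTP_ERROR_SCHEMA)` returns an empty list for a well-formed error body and a list of human-readable problems otherwise. In a real service an established JSON-Schema validator would be the better choice.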
Thank you for bringing this up!
Cheers, Marko
> Best, Physikerwelt
> PS: This is my first post to the services-list. I hope it fits the purpose of this list.
> [1] https://github.com/physikerwelt/WikidataListGenerator/blob/master/src/main/j...
> [2] https://twitter.com/physikerwelt/status/683286844721741824
> [3] https://phabricator.wikimedia.org/T119300
> [4] https://phabricator.wikimedia.org/T126057
> [5] http://jsonschema.net
> [6] https://github.com/wikimedia/service-template-node/blob/master/doc/coding.md...
Services mailing list Services@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/services
[1] http://swagger.io/specification/
[2] https://wikitech.wikimedia.org/wiki/EventBus
[3] https://github.com/wikimedia/mediawiki-event-schemas
[4] https://github.com/wikimedia/service-template-node/blob/01eb28f90f3cccdf248d...