[Openstack] XML and JSON for API's

Mark Nottingham mnot at mnot.net
Sun Jun 12 05:08:06 UTC 2011


Hey,

Sorry it took a while to get back; other things intervened.


On 03/06/2011, at 11:56 PM, Jorge Williams wrote:

> 
> On Jun 2, 2011, at 10:41 PM, Mark Nottingham wrote:
> 
>> The problem I mentioned before, though, is that XML Schema brings more issues to the table than it solves.
>> 
>> 1) People inevitably use schema to generate bindings to [insert language], and because of the complexity of the underlying data model of XML (Infoset), the mapping of information items to objects can happen in a variety of different ways. This is an endless source of bugs.
> 
> 
> I understand where you're coming from, Mark.  I'm still suffering PTSD from the SOAP days.  One of the lessons learned there was that auto-generated language bindings are a bad idea.  Unless you strictly control the client and server implementations -- it all falls apart really quickly.   That's not an XML thing; honestly, I think an auto-generated JSON client would suffer from similar interoperability problems -- there really needs to be a human in the loop. 
> 
> Given that, we should be building and distributing language bindings for common languages with all our APIs -- it's well worth the investment in my opinion.

+1, especially for the complex ones. It's not a small engineering effort, though (especially if some users want synchronous implementations while others need async, for example). I suppose that's a real driver for keeping the APIs as simple as possible... 


> Also, I really don't see people generating language bindings for REST services the way they did for SOAP.  Note that XML Schema isn't going to give you a language binding in the first place because it describes data types not operations -- and I don't see people using WADL in that way.  We use this sort of stuff, internally, for machine processable documentation and validation -- and there are many benefits in both of those cases.

Absolutely; I think those are the sweet spots for WADL. It's also helpful in the API design phase, to make sure you've covered all of your bases.


>> 2) It's very, very hard to define an XML Schema that's reasonably extensible; unless you use exactly the right design patterns in your schema (which are absurdly convoluted, btw), you'll end up locking out future backwards-compatible changes. The authority in this space is Dave Orchard; see his conclusions at  <http://www.pacificspirit.com/Authoring/Compatibility/ProvidingCompatibleSchemaEvolution.html>.
> 
> A lot of this has changed with XSD 1.1 -- and we are using it to define our extensible contracts.  In particular, a lot of restrictions based on ordering have gone away, and the unique particle attribution issue is now also gone.  Frankly, I'm running into more issues with extensibility and JSON; I don't know of many truly extensible JSON media types, where different vendors may define different extensions and you need to prevent clashes etc. We can and will make things work in JSON; it's our default format and it should remain so. But this level of extensibility with JSON is a bit uncharted at the moment -- and we still need to figure out the best approach -- in XML this sort of extensibility is a no-brainer. 

That's good to hear; much of my concern was motivated by seeing people discover these problems only after it was too late, but it looks like that's not the case here.


>> 3) An XML Schema can never express all of the constraints on the format. So, you'll still need to document those that aren't captured in the schema.
>> 
> 
> XSD 1.1 goes pretty far in this regard as well, in that it includes the ability to add Schematron-like assertions. Most of what can't be captured in the XSD directly can be included as an assertion.
> 
> 
>> I suppose the central question is what people are using the schema for. If it's just to document the format, that's great; we can have a discussion about how to do that. If they're using it for databinding, I'd suggest that JSON is far superior, as a separate databinding step isn't needed. Finally, if they're using it for runtime validation, I'd agree with Jay below; it's much easier to use json parse + runtime value checks for validation (especially in HTTP, where clients always have to be ready for errors anyway).
> 
> 
> The validation that Jay is proposing works great when there is a single implementation.  This isn't always going to be the case.  If our APIs are going to become the ubiquitous cloud APIs we want them to be, then others are going to want/have to implement them.  This is happening with compute today -- there will literally be two implementations of the compute 1.1 API from day one.   We need assurances that a client that works with one implementation can work with any of them seamlessly.  The validation rules can't simply be defined in the code itself -- they need to be described outside of it -- being able to describe these rules in a formal language and use this for validation and conformance testing is very useful.  This isn't strictly an XML vs JSON thing -- though today there are better tools for doing this sort of thing with XML.

I don't disagree on any specific point, but still have reservations :)
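
For concreteness, the "JSON parse plus runtime value checks" approach discussed above could look roughly like the sketch below. This is only an illustration: IMAGE_MODEL, its field names, and validate() are made up here and are not taken from Glance or any other OpenStack code. Unknown fields are deliberately ignored so that extensions don't break older code.

```python
import json

# Hypothetical model for illustration only: field name -> expected Python type.
IMAGE_MODEL = {"name": str, "size": int, "is_public": bool}


def validate(body, model=IMAGE_MODEL):
    """Return a list of problems; an empty list means the body is acceptable."""
    try:
        data = json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return ["not valid JSON: %s" % exc]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for field, expected in model.items():
        if field not in data:
            problems.append("missing field: %s" % field)
            continue
        value = data[field]
        # bool is a subclass of int in Python, so reject True/False for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or wrong_bool:
            problems.append(
                "wrong type for %s: expected %s" % (field, expected.__name__)
            )
    # Fields not named in the model are ignored on purpose (extensibility).
    return problems
```

Note that the rules here live in the code, which is exactly the limitation Jorge points out when there is more than one implementation.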

Going forward, one thing that might help guide these sorts of discussions would be more data. If it were easy to slice and dice usage data so it would be clear how many clients use XML vs. JSON APIs, it'd be possible to make much more informed decisions here, and not only about formats; it'd also be good for verifying that the granularity of the APIs is right (e.g., if 90% of your API responses are huge, you may have a problem), caching is useful, etc.

That might be as simple as some logfile analysis, but it'd obviously need to be handled delicately to manage privacy, user data, commercial data, and similar concerns, if it were to be shared among the community. 
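
As a rough idea of what such an analysis could involve, here is a small sketch. It assumes a hypothetical, already-anonymised log with one tab-separated line per response (path, response Content-Type, body size in bytes); real access logs will differ, and summarize() and the 1 MB threshold are just placeholders.

```python
from collections import Counter


def summarize(lines, large_bytes=1000000):
    """Count responses per format and the fraction of 'large' responses.

    Each line is assumed to be: path<TAB>content-type<TAB>size-in-bytes.
    Lines that do not match this (hypothetical) layout are skipped.
    """
    formats = Counter()
    total = 0
    large = 0
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        _path, content_type, size = parts
        total += 1
        # Drop parameters such as "; charset=utf-8" before classifying.
        media = content_type.split(";")[0].strip().lower()
        if media.endswith("json"):
            formats["json"] += 1
        elif media.endswith("xml"):
            formats["xml"] += 1
        else:
            formats["other"] += 1
        if int(size) >= large_bytes:
            large += 1
    return {
        "formats": dict(formats),
        "large_fraction": (large / total) if total else 0.0,
    }
```

Only aggregate counts come out of this, which is the kind of result that could be shared without exposing user or commercial data.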

Cheers,


> 
> 
>> 
>> Just my .02.
>> 
>> Cheers,
>> 
>> 
>> On 03/06/2011, at 5:20 AM, Jorge Williams wrote:
>> 
>>> It's not just about the service itself validating it; it's, as Joseph said, about making sure that the data structures themselves are documented in detail for the client.  To my knowledge there is no accepted schema language for JSON, though JSON Schema is starting to catch on.
>>> 
>>> At the end of the day it should be a matter of providing our customers with a representation that they can readily use.  It could be that my perception is wrong, but it seems to me that there's support for both representations.   I'll try to get some data to back this up.
>>> 
>>> -jOrGe W.
>>> 
>>> 
>>> On Jun 2, 2011, at 2:00 PM, Jay Pipes wrote:
>>> 
>>>> On Thu, Jun 2, 2011 at 1:54 PM, Rick Clark <rick at openstack.org> wrote:
>>>>> Hi All,
>>>>> Is it required for new OpenStack APIs to support both JSON and XML, or
>>>>> would it be acceptable to only support JSON?
>>>> 
>>>> Glance currently does not support XML and I have no plans in the
>>>> immediate future to add support for it.
>>>> 
>>>> IMHO, JSON can be validated just as easily as XML. Simply
>>>> json.loads(req.body) and then, if parsing succeeds, compare the
>>>> mapping against a model. No need for XSDs, WADLs, or any other
>>>> acronym.
>>>> 

--
Mark Nottingham   http://www.mnot.net/






