[openstack-dev] [openstack-sdk-php] discussion: json schema to define apis

Jamie Hannaford jamie.hannaford at rackspace.com
Tue Apr 29 10:27:15 UTC 2014


Before I answer these questions, I want to clear up some ambiguity around our use of terminology:

Schemas
JSON schemas are documents written in JSON according to the JSON Schema specification (published as IETF Internet-Drafts). All they do is define how an API resource (such as a server) is structured: what type it is (string/array/object), what properties it has, what types those properties are, how many properties there are, and so on. In other words, a schema acts as a blueprint. Every time we are given arbitrary data that is SUPPOSED to represent one of these resources, we use the schema to validate it. If validation succeeds, we know the data conforms to the structure the resource is supposed to have.
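To make the "blueprint" idea concrete, here is a minimal PHP sketch of schema validation. It is a hypothetical hand-rolled checker (schemaTypeMatches() and validateAgainstSchema() are illustrative names) that supports only the "type", "properties" and "required" keywords and assumes API data is decoded with json_decode($json, true); a real SDK would rely on a complete JSON Schema library.

```php
<?php
// Minimal sketch, assuming JSON decoded into associative arrays and supporting
// only the "type", "properties" and "required" keywords. A real SDK would use a
// complete JSON Schema validator library instead.

function schemaTypeMatches(mixed $value, string $type): bool
{
    return match ($type) {
        'string'  => is_string($value),
        'integer' => is_int($value),
        'number'  => is_int($value) || is_float($value),
        'boolean' => is_bool($value),
        // json_decode(..., true) turns both JSON arrays and objects into PHP
        // arrays, so a list counts as an array and anything else as an object.
        'array'   => is_array($value) && array_is_list($value),
        'object'  => is_array($value) && ($value === [] || !array_is_list($value)),
        'null'    => $value === null,
        default   => false,
    };
}

/** Returns a list of readable errors; an empty list means the data is valid. */
function validateAgainstSchema(mixed $data, array $schema, string $path = '$'): array
{
    if (isset($schema['type']) && !schemaTypeMatches($data, $schema['type'])) {
        return ["$path: expected {$schema['type']}"];
    }
    $errors = [];
    if (is_array($data)) {
        foreach ($schema['required'] ?? [] as $name) {
            if (!array_key_exists($name, $data)) {
                $errors[] = "$path.$name: required property missing";
            }
        }
        foreach ($schema['properties'] ?? [] as $name => $subSchema) {
            if (array_key_exists($name, $data)) {
                $errors = array_merge(
                    $errors,
                    validateAgainstSchema($data[$name], $subSchema, "$path.$name")
                );
            }
        }
    }
    return $errors;
}

$serverSchema = [
    'type'       => 'object',
    'required'   => ['name'],
    'properties' => [
        'name'     => ['type' => 'string'],
        'metadata' => ['type' => 'object'],
    ],
];

$valid   = json_decode('{"name": "web01", "metadata": {"role": "web"}}', true);
$invalid = json_decode('{"name": ["web01"], "metadata": {}}', true);

print_r(validateAgainstSchema($invalid, $serverSchema));
```

Here the invalid document produces a single error for "name", because an array was supplied where the schema expects a string; the valid document produces an empty list.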

Schema-like DSL
I think the confusion comes from how we then define all the other business logic, like API operations, iterators, exceptions, etc. For that we'd use a DSL that works very much like JSON schemas but is not strictly identical. We'd use a similar kind of syntax, but the aims of the document would differ: JSON schemas define resource models, whereas our schema-like DSL would define end-user Commands, each of which maps directly to an API operation. Google's Discovery API, for example, does this: it defines both data schemas AND API operations in one JSON document.
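As a rough illustration of the difference, the sketch below places a JSON schema (describing a Server model) next to a schema-like command definition (describing the operation that fetches one). The command keys (httpMethod, uri, parameters, location, responseModel) are assumptions loosely modelled on Guzzle-style service descriptions, not an agreed format.

```php
<?php
// Hypothetical illustration only: the command keys below are assumptions
// loosely modelled on Guzzle-style service descriptions.

// 1. A JSON schema: describes what a resource looks like (the model).
$serverSchema = [
    'type'       => 'object',
    'properties' => [
        'id'   => ['type' => 'string'],
        'name' => ['type' => 'string'],
    ],
];

// 2. A schema-like command definition: describes an operation the user runs.
$commands = [
    'GetServer' => [
        'httpMethod'    => 'GET',
        'uri'           => 'servers/{serverId}',
        'parameters'    => [
            'serverId' => ['type' => 'string', 'location' => 'uri', 'required' => true],
        ],
        // The response body would be validated against this model's schema.
        'responseModel' => 'Server',
    ],
];

$models = ['Server' => $serverSchema];

// The model schema is plain JSON, so it can be published by an API or shipped as a file.
echo json_encode($serverSchema, JSON_PRETTY_PRINT), PHP_EOL;
```

Keeping the two in separate structures means the command DSL can evolve without touching the model schemas, and vice versa.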


How would methods and commands work?

In php-opencloud we do something very similar to what the Python folks are doing and what you're suggesting: performing commands through a class hierarchy. So in order to retrieve an object, I need to call the grandparent (service) first, then the parent (container), and finally the child (object):

$container = $service->getContainer('foo');
$object    = $container->getObject('bar.txt');

What I'm proposing is slightly different. Instead of having all these commands dependent on a hierarchy, I was thinking about executing everything against the base service class:

$object = $service->getObject([
    'container' => 'foo',
    'name'      => 'bar.txt'
]);

This approach has two advantages. First, with a flat hierarchy you don't need to remember which class a resource belongs to, since everything is called against the service. Second, it keeps HTTP calls to a minimum, which improves performance: with the first approach you execute a GET for the container and then a GET for the object, whereas with the second you execute only one request. Across a full application, that difference could be significant.
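A minimal sketch of why the flat call needs only one request: because the container and object names are both passed in, the object's path can be built directly without fetching the container first. FlatObjectService and expandUri() are hypothetical names, and the service only records the request line rather than sending it.

```php
<?php
// Sketch only: FlatObjectService and expandUri() are hypothetical names, and
// the "transport" just records request lines instead of sending them.

/** Replaces each "{param}" placeholder with the URL-encoded parameter value. */
function expandUri(string $template, array $params): string
{
    return preg_replace_callback(
        '/\{(\w+)\}/',
        static function (array $m) use ($params): string {
            if (!array_key_exists($m[1], $params)) {
                throw new InvalidArgumentException("Missing parameter '{$m[1]}'");
            }
            return rawurlencode((string) $params[$m[1]]);
        },
        $template
    );
}

final class FlatObjectService
{
    /** @var list<string> request lines that would have been sent */
    public array $sent = [];

    public function getObject(array $params): string
    {
        // Both identifiers are known up front, so no container lookup is needed.
        $path = expandUri('{container}/{name}', $params);
        $this->sent[] = "GET $path";
        return $path;
    }
}

$service = new FlatObjectService();
$service->getObject(['container' => 'foo', 'name' => 'bar.txt']);
```

After the call, $service->sent holds exactly one entry, "GET foo/bar.txt".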

Now, this is just an idea - nothing set in stone. JSON schema doesn't enforce any particular solution - we can achieve either of the above with it.

What does everyone think? Is there a real advantage to using a class hierarchy like $service->getContainer('foo')->getObject('bar') instead of a flat hierarchy?


How does a user discover commands?

I think it's incredibly important that end-users can easily understand our public API. If we used a flat hierarchy, each command would be looked up in a registry. Our schema-like DSL would act as this registry: each command is explicitly defined, containing all the details the end-user needs to perform a particular operation. Because each command is defined this way (much like documentation), we could easily add a feature that lets users "query" what a command expects of them:

echo $service->queryCommand('getObject');

The Swift service would know that "getObject" is a registered command, so it would return all the information it has about the parameters the user needs to supply. I explored this in a prototype a while back:

http://php-opencloud.readthedocs.org/en/latest/intro.html#executing-commands
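A minimal sketch of such a registry, assuming commands are stored as PHP arrays in a hypothetical schema-like format. queryCommand() is the name used above; the CommandRegistry class, the "summary" key and the optional "range" parameter are illustrative assumptions.

```php
<?php
// Sketch of a command registry; the definition format and output layout are
// assumptions, not an existing SDK API.

final class CommandRegistry
{
    public function __construct(private array $definitions) {}

    public function has(string $name): bool
    {
        return isset($this->definitions[$name]);
    }

    /** Describes the parameters a command expects, one per line. */
    public function queryCommand(string $name): string
    {
        if (!$this->has($name)) {
            throw new InvalidArgumentException("Unknown command '$name'");
        }
        $lines = [$name . ': ' . ($this->definitions[$name]['summary'] ?? '')];
        foreach ($this->definitions[$name]['parameters'] ?? [] as $param => $spec) {
            $lines[] = sprintf(
                '  %s (%s, %s)',
                $param,
                $spec['type'] ?? 'mixed',
                !empty($spec['required']) ? 'required' : 'optional'
            );
        }
        return implode(PHP_EOL, $lines);
    }
}

$registry = new CommandRegistry([
    'getObject' => [
        'summary'    => 'Retrieve an object from a container',
        'parameters' => [
            'container' => ['type' => 'string', 'required' => true],
            'name'      => ['type' => 'string', 'required' => true],
            'range'     => ['type' => 'string'], // hypothetical optional parameter
        ],
    ],
]);

echo $registry->queryCommand('getObject'), PHP_EOL;
```

Each parameter is listed with its type and whether it is required, so the same definitions that drive execution also serve as documentation.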


How does a user actually execute a command?

The user can EXECUTE commands in two ways:

// Method 1: calling the registry
$command = $service->getCommand('GetObject', [
    'name'      => 'bar.txt',
    'container' => 'foo'
]);
$response = $command->execute();

// Method 2: using magic methods
$response = $service->getObject([
    'name'      => 'bar.txt',
    'container' => 'foo'
]);

This is how Guzzle does it: it encapsulates HTTP operations as "commands". The end-user then calls a Command and receives the HTTP response.
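The two calling styles can share one code path. In this sketch (hypothetical Command and Service classes, with a closure standing in for HTTP), the magic method is routed through __call() to getCommand(); using ucfirst() to map getObject to the registered name GetObject is an assumption made to reconcile the two names used above.

```php
<?php
// Sketch only: Command, Service and the closure-based "send" step are
// hypothetical stand-ins for an SDK's command layer and its HTTP client.

final class Command
{
    public function __construct(
        public readonly string $name,
        public readonly array $params,
        private Closure $send,
    ) {}

    public function execute(): array
    {
        return ($this->send)($this->name, $this->params);
    }
}

final class Service
{
    /** @param list<string> $commandNames the registered command names */
    public function __construct(private array $commandNames, private Closure $send) {}

    // Method 1: explicit lookup in the registry.
    public function getCommand(string $name, array $params = []): Command
    {
        if (!in_array($name, $this->commandNames, true)) {
            throw new BadMethodCallException("Unknown command '$name'");
        }
        return new Command($name, $params, $this->send);
    }

    // Method 2: PHP routes undefined methods such as getObject() here.
    public function __call(string $method, array $args): mixed
    {
        return $this->getCommand(ucfirst($method), $args[0] ?? [])->execute();
    }
}

// Fake transport: echoes back which command ran and the path it would request.
$service = new Service(['GetObject'], static fn (string $name, array $params): array => [
    'command' => $name,
    'path'    => $params['container'] . '/' . $params['name'],
]);

$viaRegistry = $service->getCommand('GetObject', ['name' => 'bar.txt', 'container' => 'foo'])->execute();
$viaMagic    = $service->getObject(['name' => 'bar.txt', 'container' => 'foo']);
```

Both styles end in the same execute() call, and an unregistered name such as deleteEverything() fails fast with a BadMethodCallException.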


Popularity with other SDKs

Other client libraries already use JSON schemas, such as the official Glance and Marconi clients. The link below shows which official OpenStack projects and client libraries currently use schemas:

https://wiki.openstack.org/wiki/OpenStack-SDK-PHP/Schema-Adoption

There is also active support for incorporating schemas into other client libraries, like Keystone.


Debugging commands

The schemas only handle how data is populated into the models. What you're referring to are Commands (i.e. what the end-user interacts with). These commands are defined in a schema-like DSL (in either JSON or PHP). The links I gave you yesterday (to my sample and to the Google Discovery API) put these Command definitions in the same file as the model schemas, but there's no reason the two should be conflated. Guzzle, for example, defines commands using PHP arrays.

A simple thread of execution could work like this:

1. User calls the command (either by calling the registry's "getCommand" method, or using a magic method):

$response = $service->listServers();

2. The service detects that a magic method is being used. It recognizes "listServers" as the Command's name and checks the command registry to see whether that name exists. As I've said before, this registry could be written in JSON (in a syntax similar to Google's Discovery API) or in PHP (like Guzzle's DSL).

3. If the command exists, it checks whether the user passed in all the correct parameters. This is where user input is validated: if a required parameter is missing, an exception is thrown.

4. Next, it serializes everything into a Request according to how the command is defined: the URL path, the HTTP verb, any headers to include, what the request body needs to look like, etc. All of this is defined in the schema/DSL mentioned in step 2.

5. The request is sent and a response is received through cURL. If the response has an unsuccessful status code (4xx or 5xx), it checks whether that status is expected. If, for example, we've defined a custom exception to be thrown for a 404, it will throw it. Otherwise, it parses the response.

6. Here is where JSON schemas enter the picture. Schemas are responsible for validating remote data structures: the response body is checked to see whether its structure validates against the schema blueprint. If it does, a Model is returned to the user. If a particular property fails validation, it's left out.
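Steps 2 to 6 can be sketched as a single runner class. Everything here is hypothetical: the command keys (httpMethod, uri, parameters, errors, responseModel), the exception classes, and the transport closure that stands in for cURL and returns a status code and body. Step 1 (the magic method) is left out.

```php
<?php
// Sketch of steps 2-6, assuming a hypothetical command format and a transport
// closure standing in for cURL that returns ['status' => int, 'body' => string].

class ApiException extends RuntimeException {}
class ObjectNotFoundException extends ApiException {}

final class CommandRunner
{
    public function __construct(
        private array $commands,    // step 2: the command registry (DSL)
        private array $models,      // step 6: JSON schemas for responses
        private Closure $transport, // step 5: stands in for cURL
    ) {}

    public function run(string $name, array $params): array
    {
        // Step 2: look the command up in the registry.
        $def = $this->commands[$name]
            ?? throw new BadMethodCallException("Unknown command '$name'");

        // Step 3: validate user input before anything is serialized.
        foreach ($def['parameters'] as $param => $spec) {
            if (!empty($spec['required']) && !array_key_exists($param, $params)) {
                throw new InvalidArgumentException("Missing required parameter '$param'");
            }
        }

        // Step 4: serialize into a request (only method and path in this sketch).
        $path = preg_replace_callback(
            '/\{(\w+)\}/',
            fn (array $m): string => rawurlencode((string) $params[$m[1]]),
            $def['uri']
        );

        // Step 5: send it; map expected error statuses to custom exceptions.
        $response = ($this->transport)($def['httpMethod'], $path);
        if ($response['status'] >= 400) {
            $class = $def['errors'][$response['status']] ?? ApiException::class;
            throw new $class("{$def['httpMethod']} $path returned HTTP {$response['status']}", $response['status']);
        }

        // Step 6: keep only schema-defined properties whose values validate.
        $data  = json_decode($response['body'], true, 512, JSON_THROW_ON_ERROR);
        $model = [];
        foreach ($this->models[$def['responseModel']]['properties'] as $prop => $schema) {
            if (array_key_exists($prop, $data) && self::matches($data[$prop], $schema['type'])) {
                $model[$prop] = $data[$prop];
            }
        }
        return $model;
    }

    private static function matches(mixed $value, string $type): bool
    {
        return match ($type) {
            'string'  => is_string($value),
            'integer' => is_int($value),
            'boolean' => is_bool($value),
            default   => false, // other types omitted from this sketch
        };
    }
}

$runner = new CommandRunner(
    ['GetServer' => [
        'httpMethod'    => 'GET',
        'uri'           => 'servers/{id}',
        'parameters'    => ['id' => ['type' => 'string', 'required' => true]],
        'errors'        => [404 => ObjectNotFoundException::class],
        'responseModel' => 'Server',
    ]],
    ['Server' => ['properties' => [
        'id'   => ['type' => 'string'],
        'name' => ['type' => 'string'],
    ]]],
    // Fake transport: only "servers/abc" exists, and its "name" is malformed.
    static fn (string $method, string $path): array => $path === 'servers/abc'
        ? ['status' => 200, 'body' => '{"id": "abc", "name": ["oops"], "created_date": "2014-04-29"}']
        : ['status' => 404, 'body' => ''],
);

$server = $runner->run('GetServer', ['id' => 'abc']); // ['id' => 'abc']
```

With this fake transport, fetching "abc" returns only the id: the malformed "name" fails validation and the undefined "created_date" is dropped. Fetching an unknown id raises ObjectNotFoundException with code 404, and omitting "id" fails at step 3 before any request is built.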


Can we convert Tempest files?

We wouldn't need to - schemas are written in a consistent, interoperable way.


Jamie

From: Matthew Farina <matt at mattfarina.com>
Date: Tuesday, April 29, 2014 at 3:28 AM
To: Jamie Hannaford <jamie.hannaford at rackspace.com>
Cc: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>, "sam.choi at hp.com" <sam.choi at hp.com>
Subject: Re: [openstack-sdk-php] discussion: json schema to define apis

While reading this, it struck me that we should prioritize the experience of end users, that is, application developers, over the experience of those working on the SDK. I don't think we'd ever talked about this directly, so I wanted to take a moment to state it.

What I've put below isn't my full set of questions, but I think it's enough for now.

On Mon, Apr 28, 2014 at 11:34 AM, Jamie Hannaford <jamie.hannaford at rackspace.com> wrote:

Thanks Matt for bringing up these questions - I think having this kind of discussion is essential for such a big idea. It also helps me clarify my own thinking towards this issue.

Before I answer, I want to point out that I'm not staunchly for or against any particular idea. I do think that schemas offer a lot of advantages over writing user-land code, but I'm more than willing to abandon the proposal if we all determine there's a stronger and more compelling alternative.

1. Why use schemas instead of userland code?

I've put my answer to this question here: https://wiki.openstack.org/wiki/OpenStack-SDK-PHP/JSON-schema

Can we look at this from the experience an end user would have? In the Python SDK they are working on an ORM-style system, somewhat similar to the system currently in the PHP SDK. For example, you could do something like this in Python:

    o = Container.get_by_id('foo').get_object('bar/baz.awesome')

I would imagine something similar in PHP, like:

    $o = $objectStore->getContainer('foo')->getObject('bar/baz.awesome');

I don't think you can do this using the JSON schema code I've seen so far. Can you touch on the experience for developers who are using the library? For example, the coding style, or their ability to know what they have access to? I was also thinking that magic methods driven by a schema aren't going to work with tools that do autocompletion.

I'm curious about the blueprints for schema support. Discussion on the mailing list is great, but I'd like to know the plans and what's in the blueprints. Do you have any info on that?

If the other SDKs aren't interested in using JSON schema, wouldn't using it here create a lack of consistency?


2. How will debugging work?

I'll highlight two conceivable issues that might need debugging. The first is the API rejecting a request for whatever reason (e.g. a proxy modifying headers); the second is a data structure returned from the API failing to validate against a particular schema file.

Issue 1: Malformed requests
There are two reasons why a request would fail: an end-user supplies it with bad data, or something in between corrupts it. A very easy solution to the first problem is to use schemas to perform basic parameter checking before a request is serialized. If we know, for example, that the API expects a particular value or a particular header, the schema is in charge of making that happen. Basic validation catches most errors, and the exception it throws makes debugging very easy. If you're ever in doubt, you just refer to the schema to see what was serialized into a request, in the same way you would read a concrete class method.

If something in the middle corrupts the request, the API will naturally reject it. To debug this, all you need to do is wrap your original code in a try/catch block and catch Guzzle's BadResponseException, which gives you the API's response. You can easily see the type of failure from the HTTP status code, and the exact reason why the request failed. So it doesn't matter where the failure happens: all that matters is that there's a way to catch and print out the API's response and the originating request.
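A sketch of that pattern in PHP. To stay self-contained it uses a hypothetical BadResponseError class rather than Guzzle itself; Guzzle's BadResponseException provides the same two pieces of information through its getRequest() and getResponse() methods. The request and response text below is made up for illustration.

```php
<?php
// Sketch of the try/catch debugging pattern. BadResponseError is a hypothetical
// stand-in for an HTTP client's "bad response" exception.

final class BadResponseError extends RuntimeException
{
    public function __construct(
        public readonly string $request,
        public readonly string $response,
    ) {
        parent::__construct('The API rejected the request');
    }
}

/** Runs an SDK call and, if the API rejects it, returns both sides of the exchange. */
function debugCall(callable $call): ?string
{
    try {
        $call();
        return null;
    } catch (BadResponseError $e) {
        return "Request:\n{$e->request}\n\nResponse:\n{$e->response}";
    }
}

// Simulates a request whose auth header was stripped by a proxy on the way.
$report = debugCall(static function (): void {
    throw new BadResponseError(
        "GET /v2/servers HTTP/1.1\r\nHost: compute.example.com",
        "HTTP/1.1 401 Unauthorized\r\n\r\n{\"error\": \"missing token\"}"
    );
});

echo $report ?? 'Request succeeded', PHP_EOL;
```

Printing both the outgoing request and the API's reply is what makes a proxy-altered request visible, regardless of how the SDK built it.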


First, this assumes Guzzle. Since we aren't tightly coupled to Guzzle, we can't always assume that. But for practical purposes, we can assume it for now.

I was curious how things like the stack trace would work in PHP. With magic methods, you'll have a call to the magic method and then to __call(), where the logic actually sits. In a debugger you'll be able to step through this just fine.

One thing that may be more difficult is that debugging requires knowing how the JSON schema system works: how the schemas are structured and how a schema entry gets translated into a method. Walking through a few extended methods would involve less logic to understand.

I'm curious what the debugging experience would be like for an end user who is using the library but doesn't know the JSON schema system. Out of curiosity, I might try to find some time to sit down with some PHP developers and see how they handle debugging.

Issue 2: Incorrect API data
Say we've defined that a Server has two properties: a name (which is a string) and metadata (which is an object). If the API returns the name as an array, that obviously fails validation: when the schema code validates the API data, it will raise a validation error for the "name" property. How you then handle this validation error is completely up to you: you could output it to STDOUT, save it to a local log on the filesystem, or buffer it temporarily in memory.

Any API data that does not validate successfully against a schema should not be presented to the end-user. Likewise, if the API returns a "created_date" property that isn't defined in our schema, it should not be populated in the resulting model. The model returned to the end-user would be a simple object that implements \ArrayAccess, meaning that it can be accessed like a simple array.
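A minimal sketch of such a model, assuming schemas expressed as PHP arrays. The Model class and its error callback are hypothetical, and only the "string" and "object" types are handled; the callback is where you would choose between STDOUT, a log file or an in-memory buffer.

```php
<?php
// Sketch of a model that only keeps schema-defined, valid properties. The Model
// class and the $onError callback are hypothetical; only "string" and "object"
// types are handled here.

final class Model implements ArrayAccess
{
    private array $data = [];

    /** @param callable(string): void $onError receives each validation error */
    public function __construct(array $schema, array $apiData, callable $onError)
    {
        foreach ($apiData as $key => $value) {
            $type = $schema['properties'][$key]['type'] ?? null;
            if ($type === null) {
                continue; // not defined in the schema (e.g. "created_date"): dropped
            }
            $valid = match ($type) {
                'string' => is_string($value),
                'object' => is_array($value) && ($value === [] || !array_is_list($value)),
                default  => false,
            };
            if ($valid) {
                $this->data[$key] = $value;
            } else {
                $onError("Property '$key' failed validation: expected $type");
            }
        }
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->data[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->data[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->data[] = $value;
        } else {
            $this->data[$offset] = $value;
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        unset($this->data[$offset]);
    }
}

$errors = [];
$server = new Model(
    ['properties' => ['name' => ['type' => 'string'], 'metadata' => ['type' => 'object']]],
    ['name' => ['web01'], 'metadata' => ['role' => 'web'], 'created_date' => '2014-04-29'],
    // Collect errors in memory; writing to STDOUT or a log file works the same way.
    function (string $message) use (&$errors): void {
        $errors[] = $message;
    }
);

echo $server['metadata']['role'], PHP_EOL; // web
```

The end-user reads $server['metadata'] like a normal array, while the malformed "name" is reported through the callback and the undefined "created_date" never reaches the model.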

3. Where would JSON schemas come from?

It depends on each OpenStack service. Glance offers schemas directly through its API (and Marconi soon will), so those projects are directly responsible for maintaining them; we'd just consume them. We could probably cache a local copy to minimize requests.
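A rough sketch of that local cache, assuming a hypothetical fetch closure that performs the HTTP request for a schema (Glance, for instance, publishes its image schema at /v2/schemas/image) and returns the JSON body. The SchemaCache class name, the file layout and the one-day TTL are illustrative choices.

```php
<?php
// Sketch of caching remotely published schemas on disk. The fetch closure is a
// hypothetical stand-in for the HTTP GET; the TTL and file naming are arbitrary.

final class SchemaCache
{
    public function __construct(
        private Closure $fetch,
        private string $dir,
        private int $ttlSeconds = 86400,
    ) {}

    public function get(string $name): array
    {
        $file = $this->dir . '/' . rawurlencode($name) . '.json';
        if (is_file($file) && time() - filemtime($file) < $this->ttlSeconds) {
            $json = file_get_contents($file);   // fresh local copy: no request
        } else {
            $json = ($this->fetch)($name);      // one request, then store it
            file_put_contents($file, $json);
        }
        return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    }
}

$calls = 0;
$dir = sys_get_temp_dir() . '/schema-cache-' . bin2hex(random_bytes(4));
mkdir($dir);

$cache = new SchemaCache(function (string $name) use (&$calls): string {
    $calls++; // counts simulated HTTP requests
    return '{"name": "' . $name . '", "properties": {"id": {"type": "string"}}}';
}, $dir);

$first  = $cache->get('image');
$second = $cache->get('image');
```

The second lookup is served from disk, so only one simulated request is made.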

For services that do not offer schemas yet, we'd have to use local schema files. There's a project called Tempest, which runs integration tests against OpenStack clusters, and it uses schema files, so we might be able to use their files in the future. If that's not possible, we'd write them ourselves: it took me one to two days to write schemas for the entire Nova API. Once a schema file has been fully tested and signed off as 100% operational, it can be frozen as a set version.

Can we convert the schema files from Tempest into something we can use?


4. What would the workflow look like?

I don't really understand what you mean: can you elaborate?

For example, when would validation happen? Is it for testing, or at runtime within an application?


5. How do schema files handle business logic?

That's a really great question. I've written a brief write-up here: https://wiki.openstack.org/wiki/OpenStack-SDK-PHP/JSON-schema-business-logic


I think what you're proposing is that the methods map to API calls. There isn't any logic in these objects that isn't an API call.


Jamie

From: Matthew Farina <matt at mattfarina.com>
Date: Thursday, April 24, 2014 at 5:42 PM
To: Jamie Hannaford <jamie.hannaford at rackspace.com>, "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Cc: "sam.choi at hp.com" <sam.choi at hp.com>
Subject: [openstack-sdk-php] discussion: json schema to define apis

Jamie (and whoever else wants to jump in),

It's been proposed to use json schema to describe the API calls rather
than code. The operations to perform and what they do would be
described rather than coded and then some code would use the schema to
know how to act.

Others are already doing this. For example, the AWS SDK for PHP. Take
their S3 structure as an example
https://github.com/aws/aws-sdk-php/blob/master/src/Aws/S3/Resources/s3-2006-03-01.php.
The ability to do this goes beyond this one example. It just appears
to be something similar to what we're considering.

Given this, in the scope of PHP I've got a number of questions. Several
of these I've compiled while discussing this with others, so they don't
represent my point of view. Rather, they are just a list of
outstanding questions. Since this handles API calls differently from
the other SDKs being built, the concept should be thoroughly vetted.

Here are the questions:

1. Why use json schema rather than other reuse methods? I've discussed
the use of json schemas with others and those working on the other
languages have not been interested in json schema at the moment. Why
do it differently given the context?

Note, it might be worth looking at the Python SDK, which is doing
things differently. If I understand it right, they are moving away
from using managers and resources altogether.

2. How will debugging work in practice? For example, a call is made
from behind a proxy. The proxy alters the HTTP headers so the request
fails and an exception is thrown. The schema and endpoint are valid.
It's something in the middle that changed things. Walking through the
code means stepping through the magic methods that handle the schema.
How would debugging work in that case, to understand what's happening
compared to what was expected?

3. Where would the json schemas for services come from and who would
manage them?

4. What would the workflow look like for working with the schemas at
both execution time for everyday use and for testing?

5. How would logic happen? Sometimes a request to an API is more than
just a request and response. For example, calling to something in
object storage where the object does not exist. The transport layer
will throw an exception (this goes all the way down to Guzzle throwing
one) that needs to be caught and managed. How should cases with logic
like this be handled so that they're easy to understand?

Thanks for looking into this. The topic has really sparked my
interest. I for one am really curious about the practicalities of
using json schema and the developer experience around it.

- Matt Farina




Jamie Hannaford

Software Developer III - CH

Tel:    +41434303908
Mob:    +41791009767




Rackspace International GmbH a company registered in the Canton of Zurich, Switzerland (company identification number CH-020.4.047.077-1) whose registered office is at Pfingstweidstrasse 60, 8005 Zurich, Switzerland. Rackspace International GmbH privacy policy can be viewed at www.rackspace.co.uk/legal/swiss-privacy-policy
-
Rackspace Hosting Australia PTY LTD a company registered in the state of Victoria, Australia (company registered number ACN 153 275 524) whose registered office is at Suite 3, Level 7, 210 George Street, Sydney, NSW 2000, Australia. Rackspace Hosting Australia PTY LTD privacy policy can be viewed at www.rackspace.com.au/company/legal-privacy-statement.php
-
Rackspace US, Inc, 5000 Walzem Road, San Antonio, Texas 78218, United States of America
Rackspace US, Inc privacy policy can be viewed at www.rackspace.com/information/legal/privacystatement
-
Rackspace Limited is a company registered in England & Wales (company registered number 03897010) whose registered office is at 5 Millington Road, Hyde Park Hayes, Middlesex UB3 4AZ.
Rackspace Limited privacy policy can be viewed at www.rackspace.co.uk/legal/privacy-policy
-
Rackspace Benelux B.V. is a company registered in the Netherlands (company KvK nummer 34276327) whose registered office is at Teleportboulevard 110, 1043 EJ Amsterdam.
Rackspace Benelux B.V privacy policy can be viewed at www.rackspace.nl/juridisch/privacy-policy
-
Rackspace Asia Limited is a company registered in Hong Kong (Company no: 1211294) whose registered office is at 9/F, Cambridge House, Taikoo Place, 979 King's Road, Quarry Bay, Hong Kong.
Rackspace Asia Limited privacy policy can be viewed at www.rackspace.com.hk/company/legal-privacy-statement.php
-
This e-mail message (including any attachments or embedded documents) is intended for the exclusive and confidential use of the individual or entity to which this message is addressed, and unless otherwise expressly indicated, is confidential and privileged information of Rackspace. Any dissemination, distribution or copying of the enclosed material is prohibited. If you receive this transmission in error, please notify us immediately by e-mail at abuse at rackspace.com and delete the original message. Your cooperation is appreciated.



