[openstack-dev] [Product] [all][log] Openstack HTTP error codes

Sean Dague sean at dague.net
Sat Jan 31 16:27:09 UTC 2015


On 01/31/2015 05:24 AM, Duncan Thomas wrote:
> Hi
> 
> This discussion came up at the cinder mid-cycle last week too,
> specifically in the context of 'Can we change the details text in an
> existing error, or is that an unacceptable API change'.
> 
> I have to second security / operational concerns about exposing too much
> granularity of failure in these error codes.
> 
> For cases where there is something wrong with the request (item out of
> range, invalid names, feature not supported, etc) I totally agree that
> we should have a good, clear, parsable response, and standardisation would
> be good. Having some fixed part of the response (whether a numeric code
> or, as I tend to prefer, a CamelCaseDescription so that I don't have to
> go look it up) and a human readable description section that is subject
> to change seems sensible.
> 
> What I would rather not see is leakage of information when something
> internal to the cloud goes wrong that the tenant can do nothing
> about. We certainly shouldn't be leaking internal implementation
> details like vendor details - that is what request IDs and logs are for.
> The whole point of the cloud, to me, is that separation between the
> things a tenant controls (what they want done) and what the cloud
> provider controls (the details of how the work is done).
> 
> For example, if a create volume request fails because cinder-scheduler
> has crashed, all the tenant should get back is 'Things are broken, try
> again later or pass request id 1234-5678-abcd-def0 to the cloud admin'.
> They should not need to, or even be allowed to, care about the details of
> the failure; it is not their domain.

Sure, the value really is in determining things that are under the
client's control to do differently. A concrete one is a multi hypervisor
cloud with 2 hypervisors (say kvm and docker). The volume attach
operation to a docker instance (which presumably is a separate set of
instance types) can't work. The user should be told that that can't work
with this instance_type if they try it.

That's actually user-correctable information, and it doesn't require a
ticket to move forward.

I also think we could have a detail-level knob, because I expect the
acceptable level of information exposure to differ between a public
cloud use case, a private cloud at an org level, and a private cloud at
a dept level.
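
A minimal sketch of what such a knob could look like, assuming a
per-deployment setting and a per-fault threshold. All names here
(DetailLevel, Fault, render_fault, the example codes) are hypothetical
and not taken from any existing OpenStack project:

```python
from dataclasses import dataclass
from enum import IntEnum


class DetailLevel(IntEnum):
    """Hypothetical deployment-wide knob; higher values expose more."""
    PUBLIC = 0  # public cloud
    ORG = 1     # private cloud at an org level
    DEPT = 2    # private cloud at a dept level


@dataclass
class Fault:
    code: str         # stable, machine-readable part
    message: str      # human-readable part, subject to change
    request_id: str
    # Lowest deployment level at which the real code/message may be shown.
    # User-correctable faults would use PUBLIC.
    min_level: DetailLevel = DetailLevel.PUBLIC


def render_fault(fault, level):
    """Build the tenant-facing body for the configured detail level."""
    if level >= fault.min_level:
        return {"code": fault.code, "message": fault.message,
                "request_id": fault.request_id}
    # Below the threshold, hide internals and keep only the request id.
    return {"code": "InternalError",
            "message": "Things are broken, try again later or pass the "
                       "request id to the cloud admin.",
            "request_id": fault.request_id}
```

With this shape, the docker volume attach case would carry
min_level=PUBLIC, while a crashed scheduler would carry a higher
threshold and collapse to the generic body on a public cloud.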

	-Sean

> 
> 
> 
> On 30 January 2015 at 02:34, Rochelle Grober <rochelle.grober at huawei.com
> <mailto:rochelle.grober at huawei.com>> wrote:
> 
>     Hi folks!
> 
>     Changed the tags a bit because this is a discussion for all projects
>     and dovetails with logging rationalization/standards.
> 
>     At the Paris summit, we had a number of sessions on logging that kept
>     circling back to Error Codes.  But, these codes would not be http
>     codes, rather, as others have pointed out, codes related to the
>     calling entities and referring entities and the actions that
>     happened or didn’t.  Format suggestions were gathered from the
>     Operators and from some senior developers.  The Logging Working
>     Group is planning to put forth a spec for discussion on formats and
>     standards before the Ops mid-cycle meetup.
> 
>     Working from a Glance proposal on error codes: 
>     https://review.openstack.org/#/c/127482/ and discussions with
>     operators and devs, we have a strawman to propose.  We also have a
>     number of requirements from Ops and some Devs.
> 
>     Here is the basic idea:
> 
>     Code for logs would have four segments:
>
>           Project           Vendor/Component     Error catalog number  Criticality
>     Def   [A-Z][A-Z][A-Z]-  [{0-9}|{A-Z}][A-Z]-  [0000-9999]-          [0-9]
>     Ex.   CIN-              NA-                  0001-                 2
>           Cinder            NetApp               driver error no       Criticality
>     Ex.   GLA-              0A-                  0051                  3
>           Glance            Api                  error no              Criticality
>
>     Three letters for the project; either a two-letter vendor code or
>     0+letter for an internal component of the project (like API=0A,
>     Controller=0C, etc); a four-digit error number, which could be
>     subsetted for even finer granularity; and a criticality number.
> 
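
To make the strawman format concrete, here is a small parsing sketch.
It assumes the four segments are joined by single dashes (as in the Def
row) and that a leading digit in the second segment marks an internal
component; LogCode and parse_log_code are hypothetical names, not part
of Oslo.log:

```python
import re
from typing import NamedTuple

# project - vendor/component - error catalog number - criticality
LOG_CODE_RE = re.compile(
    r"^(?P<project>[A-Z]{3})-"
    r"(?P<origin>[0-9A-Z][A-Z])-"
    r"(?P<number>[0-9]{4})-"
    r"(?P<criticality>[0-9])$"
)


class LogCode(NamedTuple):
    project: str      # e.g. "CIN" for Cinder
    origin: str       # vendor code ("NA") or internal component ("0A")
    number: int       # error catalog number, 0-9999
    criticality: int  # single digit

    @property
    def is_internal_component(self):
        # Assumption: digit+letter means a component of the project itself.
        return self.origin[0].isdigit()


def parse_log_code(text):
    """Split a strawman log code into its four segments."""
    match = LOG_CODE_RE.match(text)
    if match is None:
        raise ValueError("not a valid log code: %r" % (text,))
    return LogCode(match["project"], match["origin"],
                   int(match["number"]), int(match["criticality"]))
```

Under those assumptions, the two examples in the table would be written
as CIN-NA-0001-2 and GLA-0A-0051-3.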
>     This is for logging purposes and tracking down root cause faster for
>     operators, but if an error is generated, why can't the same codes be
>     used internally for the code as externally for the logs?  This also
>     allows for a unique message to be associated with the error code
>     that is more descriptive and that can be pre-translated.  Again, for
>     logging purposes, the error code would not be part of the message
>     payload, but part of the headers.  Referrer IDs and other info would
>     still be expected in the payload of the message and could include
>     instance ids/names, NICs or VIFs, etc.  The message header code is
>     in Oslo.log and, when using the Oslo.log library, will be easy to use.
> 
>     Since this discussion came up, I thought I needed to get this info
>     out to folks and advertise that anyone will be able to comment on
>     the spec to drive it to agreement.  I will be advertising it here
>     and on Ops and Product-WG mailing lists.  I’d also like to invite
>     anyone who wants to participate in discussions to join them.  We’ll
>     be starting a bi-weekly or weekly IRC meeting (also announced in the
>     stated MLs) in February.
> 
>     And please realize that other than Oslo.log, the changes to make the
>     errors more useable will be almost entirely community created
>     standards with community created tools to help enforce them.  None
>     of which exist yet, FYI.
> 
>     --RockyG
> 
> 
> 
> 
> 
> 
>     From: Eugeniya Kudryashova [mailto:ekudryashova at mirantis.com
>     <mailto:ekudryashova at mirantis.com>]
>     Sent: Thursday, January 29, 2015 8:33 AM
>     To: openstack-dev at lists.openstack.org
>     <mailto:openstack-dev at lists.openstack.org>
>     Subject: [openstack-dev] [api][nova] Openstack HTTP error codes
> 
> 
>     Hi, all
> 
> 
> 
>     Openstack APIs interact with each other and with external systems
>     partly by passing HTTP errors. The only useful difference between
>     types of exceptions is the HTTP code, but the current codes are so
>     general that an external system can’t distinguish what actually
>     happened.
> 
> 
>     As an example, the two different failures below differ only in
>     their error message:
> 
> 
>     request:
> 
>     POST /v2/790f5693e97a40d38c4d5bfdc45acb09/servers HTTP/1.1
> 
>     Host: 192.168.122.195:8774
> 
>     X-Auth-Project-Id: demo
> 
>     Accept-Encoding: gzip, deflate, compress
> 
>     Content-Length: 189
> 
>     Accept: application/json
> 
>     User-Agent: python-novaclient
> 
>     X-Auth-Token: 2cfeb9283d784cfba694f3122ef413bf
> 
>     Content-Type: application/json
> 
> 
>     {"server": {"name": "demo", "imageRef":
>     "171c9d7d-3912-4547-b2a5-ea157eb08622", "key_name": "test",
>     "flavorRef": "42", "max_count": 1, "min_count": 1,
>     "security_groups": [{"name": "bar"}]}}
> 
>     response:
> 
>     HTTP/1.1 400 Bad Request
> 
>     Content-Length: 118
> 
>     Content-Type: application/json; charset=UTF-8
> 
>     X-Compute-Request-Id: req-a995e1fc-7ea4-4305-a7ae-c569169936c0
> 
>     Date: Fri, 23 Jan 2015 10:43:33 GMT
> 
> 
>     {"badRequest": {"message": "Security group bar not found for project
>     790f5693e97a40d38c4d5bfdc45acb09.", "code": 400}}
> 
> 
>     and
> 
> 
>     request:
> 
>     POST /v2/790f5693e97a40d38c4d5bfdc45acb09/servers HTTP/1.1
> 
>     Host: 192.168.122.195:8774
> 
>     X-Auth-Project-Id: demo
> 
>     Accept-Encoding: gzip, deflate, compress
> 
>     Content-Length: 192
> 
>     Accept: application/json
> 
>     User-Agent: python-novaclient
> 
>     X-Auth-Token: 24c0d30ff76c42e0ae160fa93db8cf71
> 
>     Content-Type: application/json
> 
> 
>     {"server": {"name": "demo", "imageRef":
>     "171c9d7d-3912-4547-b2a5-ea157eb08622", "key_name": "foo",
>     "flavorRef": "42", "max_count": 1, "min_count": 1,
>     "security_groups": [{"name": "default"}]}}
> 
>     response:
> 
>     HTTP/1.1 400 Bad Request
> 
>     Content-Length: 70
> 
>     Content-Type: application/json; charset=UTF-8
> 
>     X-Compute-Request-Id: req-87604089-7071-40a7-a34b-7bc56d0551f5
> 
>     Date: Fri, 23 Jan 2015 10:39:43 GMT
> 
> 
>     {"badRequest": {"message": "Invalid key_name provided.", "code": 400}}
> 
> 
>     The former specifies an incorrect security group name, and the
>     latter an incorrect keypair name. The problem is that, looking only
>     at the response body and HTTP response code, an external system
>     can’t tell what exactly went wrong. And parsing error messages is
>     not the way we’d like to solve this problem.
> 
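
To illustrate the kind of client-side handling being asked for, here is
a sketch that branches on a stable code instead of the message text. The
"error_code" field and its values are hypothetical additions to the
response body and are not something nova returns today:

```python
import json


def classify(body_text):
    """Return the stable error code from a fault body, if it has one."""
    body = json.loads(body_text)
    # The fault object sits under a single key such as "badRequest".
    fault = next(iter(body.values()))
    # "error_code" is an assumed field; current responses lack it.
    return fault.get("error_code", "Unknown")


# Hypothetical response carrying a stable code next to the message.
with_code = json.dumps({"badRequest": {
    "message": "Security group bar not found for project "
               "790f5693e97a40d38c4d5bfdc45acb09.",
    "code": 400,
    "error_code": "SecurityGroupNotFound",
}})

# Body in the shape shown above, with no stable code to branch on.
without_code = json.dumps({"badRequest": {
    "message": "Invalid key_name provided.",
    "code": 400,
}})
```

A caller could then map classify(with_code) to a specific recovery
action, while classify(without_code) falls back to "Unknown" and leaves
only the free-form message.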
> 
>     One example of solving this problem is AWS EC2 exception codes [1]
> 
> 
>     So if we have some service based on Openstack projects, it would be
>     useful to have concrete error codes (textual or numeric) that let
>     it determine what actually went wrong and then process the
>     resulting exception correctly. These codes should be predefined for
>     each exception, have a documented structure, and allow the exception
>     to be parsed correctly at each step of exception handling.
> 
> 
>     So I’d like to discuss implementing such codes and their usage in
>     openstack projects.
> 
>     [1] -
>     http://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html
>     _______________________________________________
>     Product-wg mailing list
>     Product-wg at lists.openstack.org <mailto:Product-wg at lists.openstack.org>
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/product-wg
> 
> 
> 
> 
> -- 
> Duncan Thomas
> 
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


-- 
Sean Dague
http://dague.net
