Re: [nova] Thoughts on exposing exception type to non-admins in instance action event

13 Nov 2019

tl;dr: What do people think about storing and showing the *type* of
exception that is recorded with a failed instance action event (like a
fault) to the owner of the server who may not be an admin?

Details:

As noted here [1] and recreated here [2] the instance action event 
details that a non-admin owner of a server sees do not contain any 
useful information about what caused the failure of the action. Here is 
an example of a failed resize from that paste (this is what the 
non-admin owner of the server would see):

$ openstack --os-compute-api-version 2.51 server event show vm2 \
    req-11487504-da59-411b-b3b8-267bebe9b0d2 -f json -c events
{
   "events": [
     {
       "finish_time": "2019-11-13T16:18:27.000000",
       "start_time": "2019-11-13T16:18:26.000000",
       "event": "cold_migrate",
       "result": "Error"
     },
     {
       "finish_time": "2019-11-13T16:18:27.000000",
       "start_time": "2019-11-13T16:18:26.000000",
       "event": "conductor_migrate_server",
       "result": "Error"
     }
   ]
}

Super useful, right?

In this case scheduling failed for the resize so the instance is not in 
ERROR status which means the user cannot see a fault message with the 
NoValidHost error either.

The admin can see the traceback in the failed action event list:

$ openstack --os-compute-api-version 2.51 server event show \
    3ef043ea-e2d7-4565-a401-5c758e149f23 \
    req-11487504-da59-411b-b3b8-267bebe9b0d2 -f json -c events
{
   "events": [
     {
       "finish_time": "2019-11-13T16:18:27.000000",
       "start_time": "2019-11-13T16:18:26.000000",
       "traceback": "  File \"/opt/stack/nova/nova/conductor/manager.py\", line 301, in migrate_server\n    host_list)\n  File \"/opt/stack/nova/nova/conductor/manager.py\", line 367, in _cold_migrate\n    raise exception.NoValidHost(reason=msg)\n",
       "event": "cold_migrate",
       "result": "Error"
     },
     {
       "finish_time": "2019-11-13T16:18:27.000000",
       "start_time": "2019-11-13T16:18:26.000000",
       "traceback": "  File \"/opt/stack/nova/nova/compute/utils.py\", line 1411, in decorated_function\n    return function(self, context, *args, **kwargs)\n  File \"/opt/stack/nova/nova/conductor/manager.py\", line 301, in migrate_server\n    host_list)\n  File \"/opt/stack/nova/nova/conductor/manager.py\", line 367, in _cold_migrate\n    raise exception.NoValidHost(reason=msg)\n",
       "event": "conductor_migrate_server",
       "result": "Error"
     }
   ]
}

So when the admin gets the support ticket they can at least tell that 
scheduling failed and then dig into why.

My idea is to store the exception *type* with the action event, similar 
to the recorded instance fault message for non-NovaExceptions [3] which 
will show to the non-admin owner of the server if the server status is 
ERROR or DELETED [4].

We could record the exc_val to get a prettier message like "No valid
host was found." but that could leak details in the error message that
we don't want non-admins to see [5].
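
To make the type-versus-message distinction concrete, here is a minimal
sketch. It is not nova code: the NoValidHost class below is a local
stand-in for nova.exception.NoValidHost, and the helper name
event_details_from_exception is made up for illustration.

```python
class NoValidHost(Exception):
    """Local stand-in for nova.exception.NoValidHost (illustration only)."""


def event_details_from_exception(exc_val):
    """Return only the exception class name, never the message.

    Hypothetical helper: the message (str(exc_val)) may carry details
    that non-admins should not see, while the class name does not.
    """
    return type(exc_val).__name__


try:
    # The message text here is invented to show what would be dropped.
    raise NoValidHost("No valid host was found. <some internal detail>")
except Exception as exc:
    details = event_details_from_exception(exc)
    message = str(exc)  # what recording exc_val would expose instead
```

Only the value in details would be stored with the event in this
sketch; message is computed just to show what is being left out.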

With what I'm thinking, the non-admin owner of the server could see 
something like this for a failed event:

     {
       "finish_time": "2019-11-13T16:18:27.000000",
       "start_time": "2019-11-13T16:18:26.000000",
       "event": "cold_migrate",
       "result": "Error",
       "details": "NoValidHost"
     }

That's pretty simple, doesn't leak details, and at least indicates to 
the user that maybe they can retry the resize with another flavor or 
something. It's just an example.
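
A rough sketch of how the response could differ by role, assuming the
type is stored under a "details" key as in the example above. The
function format_event and the is_admin flag are hypothetical names, this
is not nova's actual view code, and microversion gating is left out.

```python
def format_event(event, is_admin):
    """Build the API view of one instance action event (illustrative only)."""
    view = {
        "event": event["event"],
        "start_time": event["start_time"],
        "finish_time": event["finish_time"],
        "result": event["result"],
    }
    if is_admin:
        # Admins already get the traceback, as in the admin output earlier.
        view["traceback"] = event.get("traceback")
    if event.get("details") is not None:
        # Proposed addition: the bare exception type, visible to the owner too.
        view["details"] = event["details"]
    return view


# Stored record modeled on the failed cold_migrate event in this mail.
stored = {
    "event": "cold_migrate",
    "start_time": "2019-11-13T16:18:26.000000",
    "finish_time": "2019-11-13T16:18:27.000000",
    "result": "Error",
    "traceback": "  File ... raise exception.NoValidHost(reason=msg)\n",
    "details": "NoValidHost",
}

owner_view = format_event(stored, is_admin=False)
admin_view = format_event(stored, is_admin=True)
```

In this sketch the owner gets "details" but no "traceback", while the
admin gets both.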

This would require a microversion, so before writing a spec I wanted to
get general feelings about this on the mailing list. I accept that it
might not really be worth the effort, so that's good feedback if it's how
you feel (I'll only cry a little).

[1] https://review.opendev.org/#/c/693937/2/nova/objects/instance_action.py
[2] http://paste.openstack.org/show/786054/
[3] https://github.com/openstack/nova/blob/20.0.0/nova/compute/utils.py#L101
[4] https://github.com/openstack/nova/blob/20.0.0/nova/api/openstack/compute/vie...
[5] https://bugs.launchpad.net/nova/+bug/1851587

-- 

Thanks,

Matt

Matt Riedemann