[nova] Thoughts on exposing exception type to non-admins in instance action event
Matt Riedemann
mriedemos at gmail.com
Wed Nov 13 16:51:59 UTC 2019
tl;dr: What do people think about storing and showing the *type* of
exception that is recorded with a failed instance action event (like a
fault) to the owner of the server who may not be an admin?
Details:
As noted here [1] and recreated here [2] the instance action event
details that a non-admin owner of a server sees do not contain any
useful information about what caused the failure of the action. Here is
an example of a failed resize from that paste (this is what the
non-admin owner of the server would see):
$ openstack --os-compute-api-version 2.51 server event show vm2 \
    req-11487504-da59-411b-b3b8-267bebe9b0d2 -f json -c events
{
"events": [
{
"finish_time": "2019-11-13T16:18:27.000000",
"start_time": "2019-11-13T16:18:26.000000",
"event": "cold_migrate",
"result": "Error"
},
{
"finish_time": "2019-11-13T16:18:27.000000",
"start_time": "2019-11-13T16:18:26.000000",
"event": "conductor_migrate_server",
"result": "Error"
}
]
}
Super useful, right?
In this case scheduling failed for the resize so the instance is not in
ERROR status which means the user cannot see a fault message with the
NoValidHost error either.
The admin can see the traceback in the failed action event list:
$ openstack --os-compute-api-version 2.51 server event show \
    3ef043ea-e2d7-4565-a401-5c758e149f23 \
    req-11487504-da59-411b-b3b8-267bebe9b0d2 -f json -c events
{
"events": [
{
"finish_time": "2019-11-13T16:18:27.000000",
"start_time": "2019-11-13T16:18:26.000000",
"traceback": " File
\"/opt/stack/nova/nova/conductor/manager.py\", line 301, in
migrate_server\n host_list)\n File
\"/opt/stack/nova/nova/conductor/manager.py\", line 367, in
_cold_migrate\n raise exception.NoValidHost(reason=msg)\n",
"event": "cold_migrate",
"result": "Error"
},
{
"finish_time": "2019-11-13T16:18:27.000000",
"start_time": "2019-11-13T16:18:26.000000",
"traceback": " File \"/opt/stack/nova/nova/compute/utils.py\",
line 1411, in decorated_function\n return function(self, context,
*args, **kwargs)\n File \"/opt/stack/nova/nova/conductor/manager.py\",
line 301, in migrate_server\n host_list)\n File
\"/opt/stack/nova/nova/conductor/manager.py\", line 367, in
_cold_migrate\n raise exception.NoValidHost(reason=msg)\n",
"event": "conductor_migrate_server",
"result": "Error"
}
]
}
So when the admin gets the support ticket they can at least tell that
scheduling failed and then dig into why.
My idea is to store the exception *type* with the action event, similar
to the recorded instance fault message for non-NovaExceptions [3] which
will show to the non-admin owner of the server if the server status is
ERROR or DELETED [4].
We could record the exc_val to get a prettier message like "No valid
host was found." but that could leak details in the error message that
we don't want non-admins to see [5].
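To make the type-versus-message distinction concrete, here is a minimal
sketch. It assumes the event is recorded from a context manager that
receives exc_type/exc_val/exc_tb on exit. EventRecorder, the "details" key
and the local NoValidHost class are illustrative stand-ins, not nova's
actual code.

```python
import traceback


class EventRecorder:
    """Hypothetical stand-in for an instance action event reporter."""

    def __init__(self, event_name):
        self.event = {"event": event_name, "result": None}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_val is None:
            self.event["result"] = "Success"
            return False
        self.event["result"] = "Error"
        # Admin-only data: the formatted stack.
        self.event["traceback"] = "".join(traceback.format_tb(exc_tb))
        # Owner-visible data: only the class name, never str(exc_val),
        # because the message may contain details we do not want to leak.
        self.event["details"] = exc_type.__name__
        return False  # do not swallow the exception


class NoValidHost(Exception):
    """Local placeholder for the real exception class."""


recorder = EventRecorder("cold_migrate")
try:
    with recorder:
        raise NoValidHost("No valid host was found. secret-host-1 is full")
except NoValidHost:
    pass

print(recorder.event["details"])
```

The point of the sketch is only that the class name is a fixed,
developer-chosen string, whereas the message is built at runtime and can
carry anything.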
With what I'm thinking, the non-admin owner of the server could see
something like this for a failed event:
{
"finish_time": "2019-11-13T16:18:27.000000",
"start_time": "2019-11-13T16:18:26.000000",
"event": "cold_migrate",
"result": "Error",
"details": "NoValidHost"
}
That's pretty simple, doesn't leak details, and at least indicates to
the user that maybe they can retry the resize with another flavor or
something. It's just an example.
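As a rough sketch of the API side, assuming the stored event carries both a
traceback and the proposed "details" value: format_event and its is_admin
flag are hypothetical names for illustration, and the microversion check
that would gate the new field is left out.

```python
def format_event(event, is_admin):
    """Hypothetical view: build the API representation of one event."""
    view = {
        "event": event["event"],
        "start_time": event["start_time"],
        "finish_time": event["finish_time"],
        "result": event["result"],
    }
    # The exception type is shown to everyone who may see the event.
    if event.get("details") is not None:
        view["details"] = event["details"]
    # The traceback stays admin-only, as it is today.
    if is_admin:
        view["traceback"] = event.get("traceback")
    return view


stored = {
    "event": "cold_migrate",
    "start_time": "2019-11-13T16:18:26.000000",
    "finish_time": "2019-11-13T16:18:27.000000",
    "result": "Error",
    "traceback": "...",  # elided; admin-only
    "details": "NoValidHost",
}

owner_view = format_event(stored, is_admin=False)
admin_view = format_event(stored, is_admin=True)
```

Here owner_view has "details" but no "traceback" key, while admin_view
has both.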
This would require a microversion, so before writing a spec I wanted to
get general feelings about this on the mailing list. I accept that it
might not really be worth the effort, so that's good feedback if it's how
you feel (I'll only cry a little).
[1] https://review.opendev.org/#/c/693937/2/nova/objects/instance_action.py
[2] http://paste.openstack.org/show/786054/
[3] https://github.com/openstack/nova/blob/20.0.0/nova/compute/utils.py#L101
[4] https://github.com/openstack/nova/blob/20.0.0/nova/api/openstack/compute/views/servers.py#L564
[5] https://bugs.launchpad.net/nova/+bug/1851587
--
Thanks,
Matt