[openstack-dev] [nova] Silent nova fails
Andrew Laski
andrew at lascii.com
Mon Aug 17 14:03:54 UTC 2015
On 08/17/15 at 02:59pm, Timofei Durakov wrote:
>Hello,
>
>In the current design there are places where nova fails while executing a
>user's CLI command but produces no error message, only some logs in
>nova-compute [1]. The problem is that the compute node sends no response
>to the conductor, because an RPC cast is used.
>
>To fix this, nova should make a synchronous call before the operation
>itself to verify that it is valid. E.g. here is my patch that fixes this
>problem in the resize operation [2].
I think that Nova should avoid synchronous calls whenever possible.
They often lead to timeouts, and they require great care around locking
and idempotence, because the natural reaction to a timeout is to try
again while the original operation is often still in progress.
And when there is a timeout, or disconnect, you've lost the benefit you
were hoping to gain of providing immediate feedback. I think that
rather than trying to treat requests as local operations we should
embrace the asynchronous nature of the distributed system and work on a
robust way to provide feedback that works with, rather than against, how
Nova is architected.
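
To make the retry problem concrete, here is a toy sketch in plain Python.
It is not Nova code: ToyCompute, resize() and call_with_retry() are
hypothetical names invented for the illustration, and the "timeout" is
simulated by discarding the first reply. It only shows that a retried
synchronous call starts the work twice unless the receiver deduplicates
requests.

```python
import uuid


class ToyCompute:
    """Hypothetical stand-in for a compute service; not Nova code."""

    def __init__(self):
        self.resizes_started = 0
        self.seen_request_ids = set()

    def resize(self, request_id, idempotent):
        # With idempotence, a repeated request ID is recognised and ignored.
        if idempotent and request_id in self.seen_request_ids:
            return "already in progress"
        self.seen_request_ids.add(request_id)
        self.resizes_started += 1
        return "started"


def call_with_retry(compute, idempotent, attempts=2):
    """Simulate a synchronous call whose first reply never arrives."""
    request_id = str(uuid.uuid4())
    for attempt in range(attempts):
        result = compute.resize(request_id, idempotent)
        reply_lost = attempt == 0  # the work began, but the caller timed out
        if not reply_lost:
            return result
    return None


naive = ToyCompute()
call_with_retry(naive, idempotent=False)

careful = ToyCompute()
careful_result = call_with_retry(careful, idempotent=True)
```

In the naive case the receiver has started two resizes for one user
request; in the careful case it has started one, at the cost of tracking
request IDs on the receiving side.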
There is already a framework in place for doing this called "instance
actions", which is visible via the Nova API, and a longer-term solution
called tasks is under discussion. With a resize task exposed in the API,
a user could check its status, see whether it succeeded or failed, and
get a relevant error message for a failure.
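
As a rough sketch of how a client might use that kind of feedback, the
Python below summarises one action record. The record shape (an "events"
list whose entries carry a "result" string of "Success" or "Error") and
the sample data are assumptions made for the illustration, and
action_outcome() is a hypothetical helper; check the Nova API reference
for the actual response format.

```python
def action_outcome(action):
    """Summarise one action record as 'failed', 'succeeded' or 'in progress'.

    Assumption for this sketch: the record has an "events" list and each
    event has a "result" that is "Success", "Error", or None while running.
    """
    results = [event.get("result") for event in action.get("events", [])]
    if any(result == "Error" for result in results):
        return "failed"
    if results and all(result == "Success" for result in results):
        return "succeeded"
    return "in progress"


# Made-up sample records; the event names are placeholders, not real ones.
failed_resize = {
    "action": "resize",
    "events": [
        {"event": "example_prepare", "result": "Success"},
        {"event": "example_resize", "result": "Error"},
    ],
}
running_resize = {
    "action": "resize",
    "events": [{"event": "example_prepare", "result": None}],
}

failed_outcome = action_outcome(failed_resize)
running_outcome = action_outcome(running_resize)
```

The point is that the user polls for the outcome after the request has
been accepted, instead of the API blocking until the compute node answers.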
>
>So, I would like to get feedback about such hypervisor checks before
>operations. Nova already makes these checks during the live-migration
>process: the conductor calls the compute manager [3], which in turn
>consults the driver [4]. I think we should use the same logic in the
>resize operation.
>
>Timofey.
>
>[1] https://bugs.launchpad.net/nova/+bug/1455460
>
>[2] https://review.openstack.org/195088
>
>[3]
>https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L144
>
>[4]
>https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L5157