[openstack-dev] [nova] VM diagnostics - V3 proposal

John Garbutt john at johngarbutt.com
Mon Dec 16 15:37:39 UTC 2013


On 16 December 2013 15:25, Daniel P. Berrange <berrange at redhat.com> wrote:
> On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
>> Hi,
>> At the moment the administrator is able to retrieve diagnostics for a running VM. Currently the implementation is very loosely defined, that is, each driver returns whatever it has to return. This is problematic in a number of respects:
>>
>>  1.  The tempest tests were written specifically for one driver and break with all other drivers (the test was removed to prevent this – bug 1240043)
>>  2.  An admin is unable to write tools that may work with a hybrid cloud
>>  3.  Adding support for get_diagnostics for drivers that do not support it is painful
>
> Technically 3 is currently easy, since currently you don't need to care
> about what the other drivers have done - you can return any old info
> for your new driver's get_diagnostics API impl ;-)
>
> Seriously though, I agree the current API is a big trainwreck.

+1

>> I'd like to propose the following for the V3 API (we will not touch V2
>> in case operators have applications that are written against this – this
>> may be the case for libvirt or xen. The VMware API support was added
>> in I1):
>>
>>  1.  We formalize the data that is returned by the API [1]
>
> Before we debate what standard data should be returned we need
> details of exactly what info the current 3 virt drivers return.
> IMHO it would be better if we did this all in the existing wiki
> page associated with the blueprint, rather than etherpad, so it
> serves as a permanent historical record for the blueprint design.

+1
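To make "formalize the data" concrete, here is a minimal sketch of what a common, driver-independent schema could look like. The class and field names (VMDiagnostics, DiskDiagnostics, uptime_seconds, disk_details) are hypothetical illustrations, not the format proposed in the etherpad or blueprint. The point is that every driver fills the same structure, so a tool written against one cloud can read another.

```python
from dataclasses import asdict, dataclass, field
from typing import List


# Hypothetical common schema; field names are illustrative only.
@dataclass
class DiskDiagnostics:
    read_bytes: int = 0
    write_bytes: int = 0


@dataclass
class VMDiagnostics:
    state: str
    driver: str
    uptime_seconds: int
    disk_details: List[DiskDiagnostics] = field(default_factory=list)


def to_api_response(diag: VMDiagnostics) -> dict:
    """Serialise to the same JSON-friendly shape whichever driver produced it."""
    # asdict() recurses into the nested list of DiskDiagnostics objects.
    return asdict(diag)
```

A libvirt-backed and a VMware-backed instance would then expose identical top-level keys, which addresses both the tempest breakage and the hybrid-cloud tooling problem listed above.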

> While we're doing this I think we should also consider whether
> the 'get_diagnostics' API is fit for purpose more generally.
> e.g. currently it is restricted to administrators. Some, if
> not all, of the data libvirt returns is relevant to the owner
> of the VM, but they cannot get at it.

Ceilometer covers that ground; we should ask them about this API.

> For a cloud administrator it might be argued that the current
> API is too inefficient to be useful in many troubleshooting
> scenarios since it requires you to invoke it once per instance
> if you're collecting info on a set of guests, e.g. all VMs on
> one host. It could be that cloud admins would be better
> served by an API which returned info for all VMs on a host
> at once, if they're monitoring, say, I/O stats across VM
> disks to identify one that is causing I/O trouble? IOW, I
> think we could do with better identifying the usage scenarios
> for this API if we're to improve its design / impl.

I like the API that helps you dig into info for a specific host that
other systems highlight as problematic.
You can do things that could be expensive to compute, but useful for
troubleshooting.

But you are right, we should think about it first.
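For the host-wide troubleshooting scenario described above, a minimal sketch follows. It assumes a hypothetical host-level call that returns cumulative disk counters for every instance on the host, keyed by instance UUID; no such call exists in the current API. Two such samples are diffed to find the instance doing the most I/O in the interval, replacing N per-instance requests with two host-wide ones.

```python
from typing import Dict

# Hypothetical host-wide sample: instance UUID -> cumulative disk byte counters.
HostSample = Dict[str, Dict[str, int]]


def io_delta(before: HostSample, after: HostSample) -> Dict[str, int]:
    """Bytes read plus written per instance between two host-wide samples."""
    deltas = {}
    for uuid, stats in after.items():
        # Instances that appeared after the first sample are counted from zero.
        prev = before.get(uuid, {"read_bytes": 0, "write_bytes": 0})
        deltas[uuid] = ((stats["read_bytes"] - prev["read_bytes"])
                        + (stats["write_bytes"] - prev["write_bytes"]))
    return deltas


def noisiest_instance(before: HostSample, after: HostSample) -> str:
    """UUID of the instance with the largest I/O volume in the interval."""
    deltas = io_delta(before, after)
    return max(deltas, key=deltas.get)
```

Diffing counters rather than reading a single sample matters because the counters are cumulative: a long-running but now idle VM would otherwise look like the culprit.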

>
>>  2.  We enable the driver to add extra information that will assist the administrators in troubleshooting problems for VMs
>>

I think we need to version this information, if possible. I don't like
the idea of the driver just changing the public API as it wishes.
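One way to version this, sketched under assumptions: the response carries a schema version for the common fields, and anything driver-specific is confined to a separately versioned "driver_extras" key, so a driver cannot silently change the shared part of the public API. The envelope layout, key names and SUPPORTED_VERSIONS are hypothetical, not an agreed design.

```python
from typing import Any, Dict

# Schema versions of the common fields that this (hypothetical) client understands.
SUPPORTED_VERSIONS = {"1.0"}


def build_response(common: Dict[str, Any], driver: str,
                   extras: Dict[str, Any], extras_version: str) -> Dict[str, Any]:
    """Server side: keep driver-specific data out of the common section."""
    return {
        "version": "1.0",
        "diagnostics": dict(common),
        "driver_extras": {"driver": driver, "version": extras_version,
                          "data": dict(extras)},
    }


def parse_common(response: Dict[str, Any]) -> Dict[str, Any]:
    """Client side: refuse schema versions this tool was not written for."""
    version = response.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported diagnostics version: {version!r}")
    return response["diagnostics"]
```

Tools that only need the common fields can ignore "driver_extras" entirely, while driver-aware tools can check its own version before relying on it.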

>> I have proposed a BP for this - https://blueprints.launchpad.net/nova/+spec/diagnostics-namespace (I'd like to change the name to v3-api-diagnostics – which is more apt)
>
> The bp rename would be a good idea.
+1

>> [1] https://etherpad.openstack.org/p/vm-diagnostics

John


