[openstack-dev] [nova] VM diagnostics - V3 proposal

Daniel P. Berrange berrange at redhat.com
Thu Dec 19 16:07:56 UTC 2013

On Thu, Dec 19, 2013 at 08:02:16AM -0800, Gary Kotton wrote:
> On 12/19/13 5:50 PM, "Daniel P. Berrange" <berrange at redhat.com> wrote:
> >On Tue, Dec 17, 2013 at 04:28:30AM -0800, Gary Kotton wrote:
> >> Hi,
> >> Following the discussion yesterday I have updated the wiki - please see
> >> 
> >> https://wiki.openstack.org/wiki/Nova_VM_Diagnostics. The proposal is
> >> backwards compatible and will hopefully provide us with the tools to be
> >> able to troubleshoot VM issues.
> >
> >Some comments
> >
> > "If the driver is unable to return the value or does not have
> >  access to it at the moment then it should return 'n/a'."
> >
> >I think it is better if the driver just omitted any key that
> >it doesn't support altogether. That avoids clients / users
> >having to do magic string comparisons to identify omitted
> >data.
> I am fine with this. If the data is marked optional then whoever is
> parsing the data should check that the field exists before using it.
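
To illustrate the difference, here is a minimal sketch of a driver that
omits unsupported keys and a client that tests for presence rather than
comparing against a magic string. The key names ("state", "uptime",
"num_cpus") and both helper functions are hypothetical, chosen only for
this example; they are not taken from the wiki page.

```python
# Hypothetical key names; the real set is whatever the wiki proposal defines.
OPTIONAL_KEYS = ("state", "uptime", "num_cpus")


def build_diagnostics(raw):
    """Driver side: copy only the values the hypervisor actually supplied."""
    diags = {}
    for key in OPTIONAL_KEYS:
        if raw.get(key) is not None:
            diags[key] = raw[key]
    return diags


def describe_uptime(diags):
    """Client side: check for the key instead of comparing with 'n/a'."""
    if "uptime" in diags:
        return "up for %d seconds" % diags["uptime"]
    return "uptime not reported by this driver"


# A driver that cannot determine uptime simply leaves the key out.
diags = build_diagnostics({"state": "running", "uptime": None, "num_cpus": 2})
print(describe_uptime(diags))
```

With this shape a client never needs to know which placeholder string a
given driver would have used.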
> >
> > "An ID for the diagnostics version. The structure defined below
> >  is version 1 (Integer)"
> >
> >What are the proposed semantics for version numbers? Are they incremented
> >on any change, or only on backwards-incompatible changes?
> The purpose of this was to be backward compatible. But I guess that if we
> go with the optional approach then this is redundant.
> >
> > "The amount of time in seconds that the VM has been running (Integer)"
> >
> >I'd suggest nano-seconds here. I've been burnt too many times in the
> >past providing APIs where we rounded data to a coarse unit like seconds.
> Sure, sounds reasonable.

Oh hang on, when you say 'amount of time in seconds the VM has been running'
you mean wall-clock time since boot.  Seconds is fine for wall-clock
time actually.

I was getting mixed up with CPU utilization time, since libvirt doesn't
actually provide any way to get "uptime".
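
As a rough sketch of how the two quantities could sit side by side, the
example below keeps wall-clock uptime in seconds and CPU execution time in
nanoseconds, leaving any conversion to the client. The field names
"uptime" and "cpu_time_ns" and the helper function are assumptions made
for illustration only.

```python
NANOS_PER_SECOND = 10 ** 9


def cpu_time_seconds(cpu_time_ns):
    """Client-side conversion; the API itself keeps full precision."""
    return cpu_time_ns / float(NANOS_PER_SECOND)


# Hypothetical diagnostics payload: uptime is wall-clock seconds,
# cpu_time_ns is accumulated CPU execution time in nanoseconds.
diags = {"uptime": 3600, "cpu_time_ns": 1500000000}

print("uptime: %d s" % diags["uptime"])
print("cpu time: %.3f s" % cpu_time_seconds(diags["cpu_time_ns"]))
```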

> >Let client programs convert from nanoseconds to seconds if they wish
> >to display it in that way, but keep the API with the full precision.
> >
> >  "The version of the raw data"
> I guess that this is redundant too.
> >
> >Same question as previously.
> >
> >
> >
> >The allowed keys in network/disk/memory details seem to be
> >unduly limited. Just having a boolean "activity" for disk
> >or NICs seems almost entirely useless. eg the VM might have
> >sent 1 byte when it first booted and nothing more for the
> >next 10 days, and an admin can't see this.
> >
> >I'd suggest we should follow the much expanded set of possible
> >stats shown by the libvirt driver. These are pretty common
> >things to show for disk/nic activity and a driver wouldn't have
> >to support all of them if it doesn't have that info.
> Ok. I was just trying to provide an indicator for the admin to dive into
> the raw data. But I am fine with this.
> >
> >It would be nice to have CPU stats available too.
> At the moment libvirt only returns the cpu0_time. Can you please let me
> know what other stats you would like here?

Since we have numCpus, I'd suggest we allow for a list of cpus in the
same way we do for disk/nics and returning the execution time split
out for each vCPU.  We could still have a merged execution time too
since I can imagine some hypervisors won't be able to provide the
split out per-vcpu time.
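
A minimal sketch of what that could look like to a client, assuming a
hypothetical "cpus" list with an optional "time_ns" per entry and an
optional merged "cpu_time_ns" key (none of these names come from the
proposal):

```python
def total_cpu_time(diags):
    """Return total CPU time in ns, or None if it cannot be determined.

    Prefer the merged value when the driver supplies one; otherwise sum
    the per-vCPU times, but only if every vCPU entry reports a time.
    """
    if "cpu_time_ns" in diags:
        return diags["cpu_time_ns"]
    cpus = diags.get("cpus", [])
    times = [cpu["time_ns"] for cpu in cpus if "time_ns" in cpu]
    if cpus and len(times) == len(cpus):
        return sum(times)
    return None


# Driver that can split execution time out per vCPU.
per_vcpu = {"cpus": [{"id": 0, "time_ns": 100}, {"id": 1, "time_ns": 250}]}

# Driver that only knows the merged figure.
merged_only = {"cpu_time_ns": 400}

print(total_cpu_time(per_vcpu))
print(total_cpu_time(merged_only))
```

Both keys stay optional, so a hypervisor that can provide neither simply
omits them.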

|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
