[openstack-dev] [nova] VM diagnostics - V3 proposal
Daniel P. Berrange
berrange at redhat.com
Thu Dec 19 16:07:56 UTC 2013
On Thu, Dec 19, 2013 at 08:02:16AM -0800, Gary Kotton wrote:
>
>
> On 12/19/13 5:50 PM, "Daniel P. Berrange" <berrange at redhat.com> wrote:
>
> >On Tue, Dec 17, 2013 at 04:28:30AM -0800, Gary Kotton wrote:
> >> Hi,
> >> Following the discussion yesterday I have updated the wiki - please see
> >>
> >> https://wiki.openstack.org/wiki/Nova_VM_Diagnostics. The proposal is
> >> backwards compatible and will hopefully provide us with the tools to be
> >> able to troubleshoot VM issues.
> >
> >Some comments
> >
> > "If the driver is unable to return the value or does not have
> > access to it at the moment then it should return 'n/a'."
> >
> >I think it is better if the driver just omitted any key that
> >it doesn't support altogether. That avoids clients / users
> >having to do magic string comparisons to identify omitted
> >data.
>
> I am fine with this. If the data is marked optional then whoever is
> parsing the data should check to see if the field exists prior.
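As a rough illustration of the client side, here is a minimal sketch assuming a hypothetical diagnostics dict. The key names "state" and "uptime" are made up for the example and are not taken from the wiki page. A driver that cannot report a value simply leaves the key out, and the client tests for membership rather than comparing against 'n/a'.

```python
def describe_uptime(diagnostics):
    """Return a display string for the optional (hypothetical) 'uptime' key."""
    # Membership test instead of a magic string comparison against 'n/a'.
    if "uptime" in diagnostics:
        return "up %d s" % diagnostics["uptime"]
    return "uptime not reported by this driver"


# Hypothetical payloads: the second driver omits the key it cannot supply.
full = {"state": "running", "uptime": 3600}
partial = {"state": "running"}

print(describe_uptime(full))
print(describe_uptime(partial))
```

The same pattern works with dict.get() and a default, as long as the default cannot be mistaken for a real value.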
>
> >
> > "An ID for the diagnostics version. The structure defined below
> > is version 1 (Integer)"
> >
> >What are the proposed semantics for version numbers? Are they incremented
> >on any change, or only on backwards-incompatible changes?
>
> The purpose of this was to be backward compatible. But I guess that if we
> go with the optional approach then this is redundant.
>
> >
> > "The amount of time in seconds that the VM has been running (Integer)"
> >
> >I'd suggest nanoseconds here. I've been burnt too many times in the
> >past providing APIs where we rounded data to a coarse unit like seconds.
>
> Sure, sounds reasonable.
Oh hang on, when you say 'amount of time in seconds the VM has been running'
you mean wall-clock time since boot. Seconds is actually fine for wall-clock
time.
I was getting mixed up with CPU utilization time, since libvirt doesn't
actually provide any way to get "uptime".
> >Let client programs convert from nanoseconds to seconds if they wish
> >to display it in that way, but keep the API with the full precision.
> >
> > "The version of the raw data"
>
> I guess that this is redundant too.
>
> >
> >Same question as previously.
> >
> >
> >
> >The allowed keys in network/disk/memory details seem to be
> >unduly limited. Just having a boolean "activity" for disk
> >or NICs seems almost entirely useless, e.g. the VM might have
> >sent 1 byte when it first booted and nothing more for the
> >next 10 days, and an admin can't see this.
> >
> >I'd suggest we should follow the much expanded set of possible
> >stats shown by the libvirt driver. These are pretty common
> >things to show for disk/nic activity and a driver wouldn't have
> >to support all of them if it doesn't have that info.
>
> Ok. I was just trying to provide an indicator for the admin to dive into
> the raw data. But I am fine with this.
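To make the point about the boolean concrete, here is a small sketch under assumed names. The counter keys "rx_octets" and "tx_octets" are hypothetical and not the agreed schema. With cumulative counters a client can derive an activity indicator for whatever interval it cares about, whereas a single boolean cannot tell "sent 1 byte at boot" apart from "busy right now".

```python
def nic_active_between(previous, current):
    """Derive an activity flag from two samples of cumulative counters.

    Counters a driver does not report are simply skipped.
    """
    for key in ("rx_octets", "tx_octets"):  # hypothetical key names
        if key in previous and key in current and current[key] > previous[key]:
            return True
    return False


# The VM sent one byte when it booted and nothing afterwards.
sample_after_boot = {"rx_octets": 0, "tx_octets": 1}
sample_ten_days_later = {"rx_octets": 0, "tx_octets": 1}

# A lifetime boolean would say "active"; the counters show no traffic
# between the two samples.
print(nic_active_between(sample_after_boot, sample_ten_days_later))
```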
>
> >
> >It would be nice to have CPU stats available too.
>
> At the moment libvirt only returns the cpu0_time. Can you please let me
> know what other stats you would like here?
Since we have numCpus, I'd suggest we allow for a list of cpus in the
same way we do for disk/nics and returning the execution time split
out for each vCPU. We could still have a merged execution time too
since I can imagine some hypervisors won't be able to provide the
split out per-vcpu time.
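A minimal sketch of how a client might cope with both shapes, with all key names ("num_cpus", "cpu_details", "time", "cpu_time") being assumptions for illustration rather than the proposed schema. Times are kept in nanoseconds and only converted to seconds for display.

```python
NANOSECONDS_PER_SECOND = 10 ** 9


def total_cpu_time_ns(diagnostics):
    """Return total vCPU execution time in ns, or None if not reported."""
    cpus = diagnostics.get("cpu_details")
    # Prefer the per-vCPU breakdown when every entry carries a time.
    if cpus and all("time" in cpu for cpu in cpus):
        return sum(cpu["time"] for cpu in cpus)
    # Otherwise fall back to the merged figure, which may also be absent.
    return diagnostics.get("cpu_time")


# Hypothetical payloads: one driver splits time per vCPU, one cannot.
per_vcpu = {
    "num_cpus": 2,
    "cpu_details": [{"time": 1500000000}, {"time": 500000000}],
}
merged_only = {"num_cpus": 2, "cpu_time": 2000000000}

for diag in (per_vcpu, merged_only):
    total = total_cpu_time_ns(diag)
    # Conversion to seconds happens on the client, for display only.
    print(total / NANOSECONDS_PER_SECOND)
```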
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|