[openstack-dev] [nova][ceilometer] proposal to send bulk hypervisor stats data in periodic notifications

Matt Riedemann mriedem at linux.vnet.ibm.com
Fri Jun 26 19:48:10 UTC 2015

On 6/26/2015 2:17 PM, Matt Riedemann wrote:
> On 6/22/2015 4:55 AM, Daniel P. Berrange wrote:
>> On Sun, Jun 21, 2015 at 11:14:00AM -0500, Matt Riedemann wrote:
>>> On 6/20/2015 3:35 PM, Daniel P. Berrange wrote:
>>>> On Sat, Jun 20, 2015 at 01:50:53PM -0500, Matt Riedemann wrote:
>>>>> Waking up from a rare nap opportunity on a Saturday, this is what was
>>>>> bothering me:
>>>>> The proposal in the etherpad assumes that we are just getting bulk
>>>>> host/domain/guest VM stats from the hypervisor and sending those in a
>>>>> notification, but how do we go about filtering those out to only
>>>>> instances
>>>>> that were booted through Nova?
>>>> In general I would say that is an unsupported deployment scenario to
>>>> have other random virt guests running on a nova compute node.
>>>> Having said that, when nova uses libguestfs, it will create some temp
>>>> guests via libvirt, so we do have to consider that possibility.
>>>> Even today with the general list domains virt driver call, we could be
>>>> getting domains that weren't launched by Nova I believe.
>>>>> Jason pointed out the ceilometer code gets all of the non-error state
>>>>> instances from nova first [1] and then for each of those it does
>>>>> the domain
>>>>> lookup from libvirt, filtering out any that are in SHUTOFF state [2].
>>>>> When talking about the new virt driver API for bulk stats, danpb
>>>>> said to use
>>>>> virConnectGetAllDomainStats with libvirt [3] but I'm not aware of
>>>>> that being
>>>>> able to filter out instances that weren't created by nova.  I don't
>>>>> think we
>>>>> want a notification from nova about the hypervisor stats to include
>>>>> things
>>>>> that were created outside nova, like directly through virsh or
>>>>> vCenter.
>>>>> For at least libvirt, if virConnectGetAllDomainStats returns the
>>>>> domain
>>>>> metadata then we can filter those since there is nova-specific
>>>>> metadata in
>>>>> the domains created through nova [4] but I'm not sure that's true
>>>>> about the
>>>>> other virt types in nova (I think the vCenter driver tags VMs
>>>>> somehow as
>>>>> being created by OpenStack/Nova, but not sure about
>>>>> xen/hyper-v/ironic).
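For the libvirt case mentioned above, the nova driver writes instance metadata into the domain XML under its own XML namespace, so one way to filter is to check for that metadata. A minimal sketch (the namespace URI matches what nova's libvirt driver uses; the helper name and sample XML are illustrative, and in a real driver the XML would come from the domain's XMLDesc()):

```python
# Sketch: decide whether a libvirt domain was created by nova by looking
# for nova's metadata namespace in the domain XML. The XML here is a
# stubbed sample string; in a driver it would come from dom.XMLDesc().
import xml.etree.ElementTree as ET

NOVA_NS = "http://openstack.org/xmlns/libvirt/nova/1.0"

def is_nova_domain(domain_xml):
    """Return True if the domain XML carries nova instance metadata."""
    root = ET.fromstring(domain_xml)
    meta = root.find("metadata")
    if meta is None:
        return False
    # nova writes an <instance> element in its own namespace
    return meta.find("{%s}instance" % NOVA_NS) is not None

nova_xml = """<domain type='kvm'>
  <name>instance-00000001</name>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:name>demo</nova:name>
    </nova:instance>
  </metadata>
</domain>"""

other_xml = "<domain type='kvm'><name>virsh-made</name></domain>"

print(is_nova_domain(nova_xml))   # True
print(is_nova_domain(other_xml))  # False
```

As the thread notes, this only helps for libvirt; the other virt drivers would need their own tagging scheme or the UUID comparison below.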
>>>> The nova database has a list of domains that it owns, so if you
>>>> query the
>>>> database for a list of valid UUIDs for the host, you can use that to
>>>> filter
>>>> the domains that libvirt reports by comparing UUIDs.
>>>> Regards,
>>>> Daniel
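Daniel's suggestion (compare the UUIDs libvirt reports against the UUIDs nova knows about for the host) could be sketched like this. In a real driver the (uuid, stats) pairs would come from virConnectGetAllDomainStats via libvirt-python and the known UUIDs from nova's instance list for the host; both are stubbed with plain data here:

```python
# Sketch: keep only the bulk-stats records whose domain UUID nova owns.
# all_stats stands in for what conn.getAllDomainStats() would return
# (each entry exposes the domain's UUID); nova_uuids stands in for the
# UUIDs of instances nova has on this host. Both are stubbed.

def filter_nova_stats(all_stats, nova_uuids):
    """all_stats: iterable of (domain_uuid, stats_dict) pairs."""
    known = set(nova_uuids)
    return [(uuid, stats) for uuid, stats in all_stats if uuid in known]

all_stats = [
    ("aaaa-1111", {"cpu.time": 120}),
    ("bbbb-2222", {"cpu.time": 45}),   # e.g. a temp libguestfs guest
    ("cccc-3333", {"cpu.time": 300}),
]
nova_uuids = ["aaaa-1111", "cccc-3333"]

filtered = filter_nova_stats(all_stats, nova_uuids)
print([u for u, _ in filtered])  # ['aaaa-1111', 'cccc-3333']
```

Unlike the metadata approach, this works the same for every virt driver, since the nova database is the source of truth regardless of hypervisor.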
>>> Dan, is virsh domstats using virConnectGetAllDomainStats?  I have
>>> libvirt
>>> 1.2.8 on RHEL 7.1, created two m1.tiny instances through nova and got
>>> this
>>> from virsh domstats:
>>> http://paste.openstack.org/show/310874/
>>> Is that similar to what we'd see from virConnectGetAllDomainStats?  I
>>> haven't yet written any code in the libvirt driver to use
>>> virConnectGetAllDomainStats to see what that looks like.
>> Yes, that's the kind of data you'd expect.
>> Regards,
>> Daniel
> Here is another issue I just thought of.  There are limits to the size
> of a message you can send through RPC right?  So what if you have a lot
> of instances running and you're pulling bulk stats on them and sending
> over rpc via a notification?  Is there the possibility that we blow that
> up on message size limits?
> For libvirt/xen/hyper-v this is maybe not a big deal since the compute
> node is 1:1 with the hypervisor and I'd think in most cases you don't
> have enough instances running on that compute host to blow the size
> limit on the message payload, unless you have a big ass compute host.
> But what about clustered virt drivers like vcenter and ironic?  That one
> compute node could be getting bulk stats on an entire cloud (vcenter
> cluster at least).
> Maybe we could just chunk the messages/notifications if we know the rpc
> message limit?

With respect to the message size limit, I found a thread in the rabbitmq 
mailing list [1] talking about message size limits, which basically says 
you're only bounded by the resources available, but sending messages 
that are too large is obviously a bad idea since you starve the system 
and can potentially screw up the heartbeat checking.

The actual 64K size limit I was really thinking of originally was a Qpid 
limitation that was fixed in the long long ago by bnemec [2].

So I guess for the purpose of a bulk stats notification, we'd probably 
be safe to keep the messages under 64K and just chunk through the list 
of instances.

[2] https://review.openstack.org/#/c/28711/
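The chunking idea above could be sketched as follows. The 64K figure and the payload shape are assumptions for illustration (it's a conservative target, not something oslo.messaging enforces); a tiny limit is used in the example so the splitting is visible:

```python
# Sketch: split a list of per-instance stats into notification payloads
# whose serialized size stays under a byte limit, so no single
# notification grows unbounded with the number of instances.
import json

def chunk_payloads(instance_stats, max_bytes=64 * 1024):
    """Yield lists of stats dicts, each serializing to <= max_bytes
    (a single oversized record is still emitted on its own)."""
    chunk = []
    for stats in instance_stats:
        candidate = chunk + [stats]
        if chunk and len(json.dumps(candidate).encode("utf-8")) > max_bytes:
            yield chunk
            chunk = [stats]
        else:
            chunk = candidate
    if chunk:
        yield chunk

# Tiny limit so the split is visible; real code would use the default.
stats = [{"uuid": "i-%d" % n, "cpu": n} for n in range(10)]
chunks = list(chunk_payloads(stats, max_bytes=120))
print([len(c) for c in chunks])
```

Each chunk would then go out as its own periodic notification, which also keeps the clustered-driver case (vcenter, ironic) from producing one giant message per poll.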



Matt Riedemann
