[openstack-dev] [nova][ceilometer] proposal to send bulk hypervisor stats data in periodic notifications
Matt Riedemann
mriedem at linux.vnet.ibm.com
Fri Jun 26 19:48:10 UTC 2015
On 6/26/2015 2:17 PM, Matt Riedemann wrote:
>
>
> On 6/22/2015 4:55 AM, Daniel P. Berrange wrote:
>> On Sun, Jun 21, 2015 at 11:14:00AM -0500, Matt Riedemann wrote:
>>>
>>>
>>> On 6/20/2015 3:35 PM, Daniel P. Berrange wrote:
>>>> On Sat, Jun 20, 2015 at 01:50:53PM -0500, Matt Riedemann wrote:
>>>>> Waking up from a rare nap opportunity on a Saturday, this is what was
>>>>> bothering me:
>>>>>
>>>>> The proposal in the etherpad assumes that we are just getting bulk
>>>>> host/domain/guest VM stats from the hypervisor and sending those in a
>>>>> notification, but how do we go about filtering those out to only
>>>>> instances that were booted through Nova?
>>>>
>>>> In general I would say that is an unsupported deployment scenario to
>>>> have other random virt guests running on a nova compute node.
>>>>
>>>> Having said that, when nova uses libguestfs, it will create some temp
>>>> guests via libvirt, so we do have to consider that possibility.
>>>>
>>>> Even today with the general list domains virt driver call, we could be
>>>> getting domains that weren't launched by Nova I believe.
>>>>
>>>>> Jason pointed out the ceilometer code gets all of the non-error state
>>>>> instances from nova first [1] and then for each of those it does the
>>>>> domain lookup from libvirt, filtering out any that are in SHUTOFF
>>>>> state [2].
>>>>>
>>>>> When talking about the new virt driver API for bulk stats, danpb said
>>>>> to use virConnectGetAllDomainStats with libvirt [3] but I'm not aware
>>>>> of that being able to filter out instances that weren't created by
>>>>> nova. I don't think we want a notification from nova about the
>>>>> hypervisor stats to include things that were created outside nova,
>>>>> like directly through virsh or vCenter.
>>>>>
>>>>> For at least libvirt, if virConnectGetAllDomainStats returns the
>>>>> domain metadata then we can filter those since there is nova-specific
>>>>> metadata in the domains created through nova [4] but I'm not sure
>>>>> that's true about the other virt types in nova (I think the vCenter
>>>>> driver tags VMs somehow as being created by OpenStack/Nova, but not
>>>>> sure about xen/hyper-v/ironic).
>>>>
>>>> The nova database has a list of domains that it owns, so if you query
>>>> the database for a list of valid UUIDs for the host, you can use that
>>>> to filter the domains that libvirt reports by comparing UUIDs.
>>>>
>>>> Regards,
>>>> Daniel
>>>>
>>>
>>> Dan, is virsh domstats using virConnectGetAllDomainStats? I have
>>> libvirt 1.2.8 on RHEL 7.1, created two m1.tiny instances through nova
>>> and got this from virsh domstats:
>>>
>>> http://paste.openstack.org/show/310874/
>>>
>>> Is that similar to what we'd see from virConnectGetAllDomainStats? I
>>> haven't yet written any code in the libvirt driver to use
>>> virConnectGetAllDomainStats to see what that looks like.
>>
>> Yes, that's the kind of data you'd expect.
>>
>>
>> Regards,
>> Daniel
>>
>
> Here is another issue I just thought of. There are limits to the size
> of a message you can send through RPC right? So what if you have a lot
> of instances running and you're pulling bulk stats on them and sending
> over rpc via a notification? Is there the possibility that we blow that
> up on message size limits?
>
> For libvirt/xen/hyper-v this is maybe not a big deal since the compute
> node is 1:1 with the hypervisor and I'd think in most cases you don't
> have enough instances running on that compute host to blow the size
> limit on the message payload, unless you have a big ass compute host.
>
> But what about clustered virt drivers like vcenter and ironic? That one
> compute node could be getting bulk stats on an entire cloud (vcenter
> cluster at least).
>
> Maybe we could just chunk the messages/notifications if we know the rpc
> message limit?
>
With respect to message size limits, I found a thread on the rabbitmq
mailing list [1] which basically says you're only bounded by the
resources available, but sending things that are too large is obviously
a bad idea since you starve the system and can potentially screw up the
heartbeat checking.

The actual 64K size limit I was originally thinking of was a Qpid
limitation that was fixed in the long long ago by bnemec [2].

So I guess for the purpose of a bulk stats notification, we'd probably
be safe to keep the messages under 64K and just chunk through the list
of instances.
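
Roughly, chunking through the list could look like the sketch below. This
is only an illustration under assumptions: the per-instance stats are
taken to be JSON-serializable dicts, the payload size is approximated by
the JSON encoding of the list (the notification envelope is not counted),
and the names chunk_instance_stats and MAX_PAYLOAD_BYTES are hypothetical,
not existing nova or oslo.messaging code.

```python
import json

# Assumed budget taken from the 64K figure discussed above; this is not a
# real nova or oslo.messaging configuration option.
MAX_PAYLOAD_BYTES = 64 * 1024


def chunk_instance_stats(stats, max_bytes=MAX_PAYLOAD_BYTES):
    """Yield lists of stats dicts whose JSON encoding fits in max_bytes.

    A single entry that is larger than max_bytes on its own is still
    yielded, alone, as an oversized chunk.
    """
    chunk = []
    size = 2  # the enclosing "[" and "]"
    for entry in stats:
        entry_size = len(json.dumps(entry).encode("utf-8"))
        # json.dumps separates list items with ", " (two bytes).
        extra = entry_size + (2 if chunk else 0)
        if chunk and size + extra > max_bytes:
            yield chunk
            chunk = []
            size = 2
            extra = entry_size
        chunk.append(entry)
        size += extra
    if chunk:
        yield chunk
```

Each yielded chunk would then go out as its own notification, so a
clustered driver reporting on many instances sends several small messages
rather than one large one.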
[1] http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2012-March/018699.html
[2] https://review.openstack.org/#/c/28711/
--
Thanks,
Matt Riedemann