[Openstack-operators] Lets talk capacity monitoring

Mathieu Gagné mgagne at iweb.com
Thu Jan 15 17:25:59 UTC 2015


On 2015-01-15 11:43 AM, Jesse Keating wrote:
> We have a need to better manage the various openstack capacities across
> our numerous clouds. We want to be able to detect when capacity of one
> system or another is approaching the point where it would be a good idea
> to arrange to increase that capacity. Be it volume space, VCPU
> capability, object storage space, etc...
>
> What systems are you folks using to monitor and react to such things?
>

Thanks for bringing up the subject Jesse.

I believe you are not the only one facing this challenge; I am facing it too.

I added the subject to the midcycle ops meetup (Capacity 
planning/monitoring) which I hope to be able to attend:
https://etherpad.openstack.org/p/PHL-ops-meetup


We are using host aggregates and have a complex combination of them 
(imagine a Venn diagram).

What we do is retrieve all:
- hypervisor stats
- host aggregates

From there, we compute resource usage (vcpus, ram, disk) in any given 
host aggregate.

This part is very challenging as we have to partially reimplement 
nova-scheduler logic to determine if a given hypervisor has different 
resource allocation ratios based on host aggregate attributes.

The result is a table with resource usage percentages (and absolute 
numbers) for each host aggregate (and each combination of aggregates).
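
A minimal sketch of the per-aggregate computation described above. It is 
not the internal tool. It assumes the hypervisor stats and host 
aggregates have already been fetched into plain dicts, that 
hypervisor_hostname matches the host names listed in each aggregate, 
that ratios are overridden through the cpu_allocation_ratio, 
ram_allocation_ratio and disk_allocation_ratio aggregate metadata keys, 
and that the smallest value wins when several aggregates set one. The 
default ratios and the function names are illustrative assumptions.

```python
from collections import defaultdict

# Assumed fallback ratios; real deployments configure these in nova.conf.
DEFAULT_RATIOS = {"vcpus": 16.0, "memory_mb": 1.5, "local_gb": 1.0}

# Aggregate metadata key assumed to override the ratio for each resource.
RATIO_KEYS = {
    "vcpus": "cpu_allocation_ratio",
    "memory_mb": "ram_allocation_ratio",
    "local_gb": "disk_allocation_ratio",
}


def effective_ratio(host, resource, aggregates):
    """Smallest ratio set by any aggregate containing host, else the default."""
    key = RATIO_KEYS[resource]
    values = [
        float(agg["metadata"][key])
        for agg in aggregates
        if host in agg["hosts"] and key in agg["metadata"]
    ]
    return min(values) if values else DEFAULT_RATIOS[resource]


def aggregate_usage(hypervisors, aggregates):
    """Return used, capacity and percent per resource for each aggregate."""
    report = {}
    for agg in aggregates:
        used = defaultdict(float)
        capacity = defaultdict(float)
        for hv in hypervisors:
            host = hv["hypervisor_hostname"]
            if host not in agg["hosts"]:
                continue
            for res in RATIO_KEYS:
                used[res] += hv[res + "_used"]
                # Capacity is the physical amount scaled by the overcommit ratio.
                capacity[res] += hv[res] * effective_ratio(host, res, aggregates)
        report[agg["name"]] = {
            res: {
                "used": used[res],
                "capacity": capacity[res],
                "percent": 100.0 * used[res] / capacity[res] if capacity[res] else 0.0,
            }
            for res in RATIO_KEYS
        }
    return report
```

Usage for a combination of aggregates would need an extra step (for 
example, intersecting the host lists before summing), which this sketch 
leaves out.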

Unfortunately, I can't share this first tool yet: my coworker integrated 
it very tightly with our internal monitoring tool and it wouldn't work 
outside of it. No promises, but I'll try to find time to extract it and 
share it with you guys.


We also coded a very primitive tool which takes a flavor name and 
computes the available "slots" on each hypervisor (regardless of host 
aggregate memberships):

https://gist.github.com/mgagne/bc54c3434a119246a88d

This tool is not actively used in our monitoring because it ignores host 
aggregate memberships: we would again have to partially reimplement 
nova-scheduler logic to determine whether a given flavor can be spawned 
on a given hypervisor, and filter that hypervisor out of the output if 
it can't accept the flavor. Furthermore, it does not take into account 
resource allocation ratios based on host aggregates.
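
A minimal sketch of the slot computation described above. It is not the 
code from the gist. It assumes hypervisor stats and the flavor are 
available as plain dicts carrying the usual Nova field names, and, like 
the tool, it ignores host aggregates and allocation ratios. The function 
name free_slots is a hypothetical helper.

```python
def free_slots(hypervisor, flavor):
    """Number of instances of flavor that fit in the hypervisor's free resources."""
    free = {
        "vcpus": hypervisor["vcpus"] - hypervisor["vcpus_used"],
        "ram": hypervisor["memory_mb"] - hypervisor["memory_mb_used"],
        "disk": hypervisor["local_gb"] - hypervisor["local_gb_used"],
    }
    # A resource the flavor does not request (value 0) does not limit the count.
    per_resource = [
        free[res] // flavor[res] for res in free if flavor.get(res, 0) > 0
    ]
    if not per_resource:
        return 0
    # The scarcest resource decides; an overcommitted host yields zero, not negative.
    return max(0, min(per_resource))
```

Calling it for every hypervisor and summing the results gives a rough 
count of how many more instances of that flavor the cloud could hold, 
with the limitations listed above.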

Hopefully, other people will join in and share their tools so we can all 
improve our OpenStack operations experience.

-- 
Mathieu


