[openstack-dev] [Magnum] Next auto-scaling feature design?

hieulq at vn.fujitsu.com
Thu Aug 18 07:56:31 UTC 2016

Hi Magnum folks,

I am interested in our auto-scaling features and am currently testing some container monitoring solutions such as Heapster, Telegraf and Prometheus. I saw the PoC session in cooperation with Senlin in Austin and have some questions regarding its design:
- We have decided to move all container management from Magnum to Zun, so will there be only one level of scaling (node) instead of both node and container?
- The PoC design shows that Magnum (Magnum Scaler) needs to depend on Heat/Ceilometer to gather metrics and perform scaling based on auto-scaling policies, but is Heat/Ceilometer the best choice for Magnum auto-scaling?

Currently, Magnum only sends CPU and memory metrics to Ceilometer, and Heat can use these to decide on the right scaling action. IMO, this approach has some problems; please take a look and give feedback:
- Heat's AutoScaling Policy and AutoScaling Resource cannot handle complex scaling policies. For example:
If CPU > 80% then scale out
If Mem < 40% then scale in
-> What if CPU = 90% and Mem = 30%? Both policies fire at once and conflict.
There are some WIP patch sets for Heat conditional logic in [1]. But IMO, Heat's conditional logic cannot resolve conflicts between scaling policies either. For example:
If CPU > 80% and Mem > 70% then scale out
If CPU < 30% or Mem < 50% then scale in
-> What if CPU = 90% and Mem = 30%? Only the scale-in rule fires, even though CPU is at 90%.
Thus, I think we need to implement a Magnum scaler that validates scaling policies and detects conflicts.
- Ceilometer may run into trouble if we deploy thousands of COEs.
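To make the two policy examples concrete, here is a minimal Python sketch that evaluates each rule independently (as separate alarm-driven policies would be) at CPU = 90% and Mem = 30%. The rule representation and the `fired_actions` helper are hypothetical illustrations, not part of Heat or Magnum.

```python
from typing import Callable, Dict, List, Tuple

# Metrics are utilisation percentages keyed by name (hypothetical format).
Metrics = Dict[str, float]
# A rule is (name, condition, action).
Rule = Tuple[str, Callable[[Metrics], bool], str]

# Simple per-metric policies from the first example.
SIMPLE_RULES: List[Rule] = [
    ("cpu_high", lambda m: m["cpu"] > 80, "scale_out"),
    ("mem_low", lambda m: m["mem"] < 40, "scale_in"),
]

# Compound policies from the second example (conditional-logic style).
COMPOUND_RULES: List[Rule] = [
    ("busy", lambda m: m["cpu"] > 80 and m["mem"] > 70, "scale_out"),
    ("idle", lambda m: m["cpu"] < 30 or m["mem"] < 50, "scale_in"),
]

def fired_actions(rules: List[Rule], metrics: Metrics) -> List[str]:
    """Return the action of every rule whose condition holds for these metrics."""
    return [action for _, cond, action in rules if cond(metrics)]

sample = {"cpu": 90.0, "mem": 30.0}
print(fired_actions(SIMPLE_RULES, sample))    # ['scale_out', 'scale_in']
print(fired_actions(COMPOUND_RULES, sample))  # ['scale_in']
```

With the simple rules, two opposite actions are triggered for the same sample. With the compound rules they no longer fire together, but the CPU signal is silently ignored rather than reconciled.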

I think we need a new design for the auto-scaling feature, not only for Magnum but also for Zun (because container-level scaling may be forked to Zun too). Here are some ideas:
1. Add a new field, enable_monitor, to the cluster template (formerly baymodel) and show the monitoring URL when cluster (bay) creation completes. For example, we can run Prometheus as the monitoring container for each cluster (Heapster is the best choice for k8s, but not good enough for Swarm or Mesos).
2. Create a Magnum scaler manager (possibly a new service) that would:
- Monitor clusters that have monitoring enabled and send metrics to Ceilometer if needed.
- Manage user-defined scaling policies covering not only CPU and memory but also other metrics such as network bandwidth and CCU.
- Validate user-defined scaling policies and trigger Heat to perform scaling actions (it could also trigger nova-scheduler for more scaling options).
- Use a highly scalable architecture: as a first step we can implement a simple validator, but in the future other approaches, such as fuzzy logic or AI, could be used to make appropriate decisions.
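The simple validator mentioned above could start as a brute-force check. A minimal sketch, assuming policies are expressed as Python predicates over utilisation percentages (a hypothetical representation, not an existing Magnum or Heat format): scan a coarse grid of metric values and flag points where scale-out and scale-in fire together, or where scale-in fires while some metric is above a hypothetical 80% "hot" threshold. Triggering Heat for an accepted policy is left out.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]
Rule = Tuple[str, Callable[[Metrics], bool], str]

def find_conflicts(rules: List[Rule], metric_names: List[str],
                   step: int = 10, hot: float = 80.0) -> List[Tuple[Metrics, str]]:
    """Scan a grid of utilisation values (0..100 in `step` increments) and
    return (metrics, reason) pairs for every suspicious point."""
    problems: List[Tuple[Metrics, str]] = []
    grid = range(0, 101, step)
    for values in itertools.product(grid, repeat=len(metric_names)):
        m = dict(zip(metric_names, map(float, values)))
        actions = {action for _, cond, action in rules if cond(m)}
        if {"scale_out", "scale_in"} <= actions:
            problems.append((m, "scale_out and scale_in both fire"))
        elif "scale_in" in actions and max(m.values()) > hot:
            # Hypothetical heuristic: never shrink while any metric is hot.
            problems.append((m, "scale_in fires while a metric is hot"))
    return problems

# The compound policies from the conditional-logic example.
rules: List[Rule] = [
    ("busy", lambda m: m["cpu"] > 80 and m["mem"] > 70, "scale_out"),
    ("idle", lambda m: m["cpu"] < 30 or m["mem"] < 50, "scale_in"),
]
issues = find_conflicts(rules, ["cpu", "mem"])
print(({"cpu": 90.0, "mem": 30.0}, "scale_in fires while a metric is hot") in issues)
```

The grid grows exponentially with the number of metrics, so this only suits a first step with a few metrics; a smarter decision method would be needed as policies grow.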

Some use cases for operators:
- I want to create a k8s cluster, and if CCU or network bandwidth is high, please scale out X nodes in other regions.
- I want to create a Swarm cluster, and if CPU or memory usage is too high, please scale out X nodes so that total CPU and memory usage is about 50%.
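The second use case implies a sizing calculation for X. As a rough sketch, assuming homogeneous nodes and that existing load spreads evenly over the enlarged cluster (both simplifications), X is the smallest number of extra nodes that brings the average utilisation of every metric down to the target. `nodes_to_add` and `scale_out_for` are hypothetical helpers.

```python
import math

def nodes_to_add(current_nodes: int, avg_util: float, target_util: float = 50.0) -> int:
    """Extra nodes needed so the average utilisation drops to target_util.

    Assumes homogeneous nodes and evenly redistributed load.
    """
    if not 0 < target_util <= 100:
        raise ValueError("target_util must be in (0, 100]")
    total_load = current_nodes * avg_util          # load in "% of one node" units
    needed = math.ceil(total_load / target_util)   # smallest node count at or below target
    return max(0, needed - current_nodes)

def scale_out_for(current_nodes: int, cpu_util: float, mem_util: float,
                  target_util: float = 50.0) -> int:
    """Scale out enough to satisfy both CPU and memory targets."""
    return max(nodes_to_add(current_nodes, cpu_util, target_util),
               nodes_to_add(current_nodes, mem_util, target_util))

# 4 nodes at 90% CPU: 360 / 50 = 7.2 -> 8 nodes -> add 4.
# Memory at 60%: 240 / 50 = 4.8 -> 5 nodes -> add 1. CPU dominates.
print(scale_out_for(4, 90.0, 60.0))  # 4
```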

What do you think about the ideas and problems above?

[1]. https://blueprints.launchpad.net/heat/+spec/support-conditions-function

Hieu LE.
