[openstack-dev] [magnum] Discuss the idea of manually managing the bay nodes

Hongbin Lu hongbin.lu at huawei.com
Mon Jun 20 22:54:08 UTC 2016


Hi all,

During the discussions in this ML and in team meetings, it seems most of us accepted the idea of supporting heterogeneous clusters. What we haven't agreed on is how to implement it. To move this forward, I am going to summarize the various implementation options so that we can debate each option thoughtfully.

* Goal:
Add support for provisioning and managing a COE cluster with nodes of various types. For example, a k8s cluster with N groups of nodes: the first group of nodes has flavor A, the second group has flavor B, and so on.

* Option 1:
Implement it declaratively in Heat templates. If users want to create a cluster with 5 nodes, Magnum will generate a set of parameter mappings, one entry per node. For example:

  $ heat stack-create -f cluster.yaml \
      -P count=5 \
      -P az_map='{"0":"az1",...,"4":"az4"}' \
      -P flavor_map='{"0":"m1.foo",...,"4":"m1.bar"}'

The top-level template contains a single ResourceGroup. The trick is to pass %index% to the nested template.

  $ cat cluster.yaml
  heat_template_version: 2015-04-30
  parameters:
    count:
      type: integer
    az_map:
      type: json
    flavor_map:
      type: json
  resources:
    AGroup:
      type: OS::Heat::ResourceGroup
      properties:
        count: {get_param: count}
        resource_def:
          type: server.yaml
          properties:
            availability_zone_map: {get_param: az_map}
            flavor_map: {get_param: flavor_map}
            index: '%index%'

In the nested template, use 'index' to retrieve the parameters.

  $ cat server.yaml
  heat_template_version: 2015-04-30
  parameters:
    availability_zone_map:
      type: json
    flavor_map:
      type: json
    index:
      type: string
  resources:
    server:
      type: OS::Nova::Server
      properties:
        image: the_image
        flavor: {get_param: [flavor_map, {get_param: index}]}
        availability_zone: {get_param: [availability_zone_map, {get_param: index}]}

This approach has a critical drawback. As pointed out by Zane [1], we cannot remove a member from the middle of the list. Therefore, the usage of ResourceGroup is not recommended.

* Option 2:
Generate the Heat template by using the generator [2]. The code to generate the Heat template would look something like this:

  $ cat generator.py
  from os_hotgen import composer
  from os_hotgen import heat

  tmpl_a = heat.Template(description="...")
  ...

  for group in rsr_groups:
      # parameters
      param_name = group.name + '_flavor'
      param_type = 'string'
      param_flavor = heat.Parameter(name=param_name, type=param_type)
      tmpl_a.add_parameter(param_flavor)
      param_name = group.name + '_az'
      param_type = 'string'
      param_az = heat.Parameter(name=param_name, type=param_type)
      tmpl_a.add_parameter(param_az)
      ...

      # resources
      rsc = heat.Resource(group.name, 'OS::Heat::ResourceGroup')
      resource_def = {
          'type': 'server.yaml',
          'properties': {
              'availability_zone': heat.FnGetParam(param_az.name),
              'flavor': heat.FnGetParam(param_flavor.name),
              ...
          }
      }
      resource_def_prp = heat.ResourceProperty('resource_def', resource_def)
      rsc.add_property(resource_def_prp)
      count_prp = heat.ResourceProperty('count', group.count)
      rsc.add_property(count_prp)
      tmpl_a.add_resource(rsc)
      ...

  # emit the composed template once all groups have been added
  print composer.compose_template(tmpl_a)
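
The composed template string could then be fed to Heat in the usual way. A rough sketch follows; the 'osc' wrapper, the 'cluster_name' variable, the 'read_template' helper, and the per-group parameter values are assumptions for illustration:

  # Hypothetical sketch: hand the composed template over to Heat.
  fields = {
      'stack_name': cluster_name,
      'template': composer.compose_template(tmpl_a),
      'files': {'server.yaml': read_template('server.yaml')},
      'parameters': {
          # one <group>_flavor / <group>_az value per resource group
          'master_flavor': 'm1.foo',
          'master_az': 'az1',
      },
  }
  osc.heat().stacks.create(**fields)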

* Option 3:
Remove the usage of ResourceGroup and manually manage Heat stacks for each bay node. For example, for a cluster with 5 nodes, Magnum is going to create 5 Heat stacks:

  for node in nodes:
      fields = {
          'stack_name': node.name,
          'parameters': {
              'flavor': node.flavor,
              'availability_zone': node.availability_zone,
              ...
          },
          'template': 'server.yaml',  # in practice, the template body rather than the file name
          ...
      }
      osc.heat().stacks.create(**fields)

The major change is to have Magnum manage multiple Heat stacks instead of one big stack. The main advantage is that Magnum can update each stack freely, and the codebase stays relatively simple. I guess the main disadvantage is performance, as Magnum needs to iterate over all the Heat stacks to compute the state of the cluster. An optimization is to combine this approach with ResourceGroup. For example, for a cluster with 2 nodes of flavor A and 3 nodes of flavor B, Magnum will create 2 Heat stacks: the first contains a ResourceGroup with flavor A, and the second contains a ResourceGroup with flavor B.
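
To illustrate the performance concern, here is a rough sketch of the state aggregation across the per-node stacks. It assumes python-heatclient and the 'osc' wrapper; the status precedence is illustrative only, not how Magnum would necessarily do it.

  # Hypothetical sketch: derive an overall cluster status from the
  # individual node stacks. The precedence order is illustrative only.
  def aggregate_cluster_status(osc, stack_ids):
      statuses = set()
      for stack_id in stack_ids:  # one Heat API call per node stack
          stack = osc.heat().stacks.get(stack_id)
          statuses.add(stack.stack_status)
      for failed in ('CREATE_FAILED', 'UPDATE_FAILED', 'DELETE_FAILED'):
          if failed in statuses:
              return failed
      for in_progress in ('CREATE_IN_PROGRESS', 'UPDATE_IN_PROGRESS',
                          'DELETE_IN_PROGRESS'):
          if in_progress in statuses:
              return in_progress
      if statuses == {'CREATE_COMPLETE'}:
          return 'CREATE_COMPLETE'
      return 'UPDATE_COMPLETE'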

Thoughts?

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-June/097522.html
[2] https://review.openstack.org/#/c/328822/

Best regards,
Hongbin

> -----Original Message-----
> From: Ricardo Rocha [mailto:rocha.porto at gmail.com]
> Sent: June-07-16 3:02 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [magnum] Discuss the idea of manually
> managing the bay nodes
> 
> +1 on this. Another use case would be 'fast storage' for dbs, 'any
> storage' for memcache and web servers. Relying on labels for this makes
> it really simple.
> 
> The alternative of doing it with multiple clusters adds complexity to
> the cluster descriptions users have to maintain.
> 
> On Fri, Jun 3, 2016 at 1:54 AM, Fox, Kevin M <Kevin.Fox at pnnl.gov> wrote:
> > As an operator that has clouds that are partitioned into different
> > host aggregates with different flavors targeting them, I totally
> > believe we will have users that want to have a single k8s cluster span
> > multiple different flavor types. I'm sure once I deploy Magnum, I will
> > want it too. You could have some special hardware on some nodes, not on
> > others, but you can still have cattle if you have enough of them and
> > the labels are set appropriately. Labels allow you to continue to
> > partition things when you need to, and ignore it when you don't, making
> > administration significantly easier.
> >
> > Say I have a tenant with 5 GPU nodes and 10 regular nodes allocated
> > into a k8s cluster. I may want 30 instances of container x that don't
> > care where they land, and prefer 5 instances that need CUDA. The former
> > can be deployed with a k8s deployment. The latter can be deployed with
> > a daemonset. All should work well and be very non-pet'ish. The whole
> > tenant could be viewed with a single pane of glass, making it easy to
> > manage.
> >
> > Thanks,
> > Kevin
> > ________________________________________
> > From: Adrian Otto [adrian.otto at rackspace.com]
> > Sent: Thursday, June 02, 2016 4:24 PM
> > To: OpenStack Development Mailing List (not for usage questions)
> > Subject: Re: [openstack-dev] [magnum] Discuss the idea of manually
> > managing the bay nodes
> >
> > I am really struggling to accept the idea of heterogeneous clusters.
> > My experience causes me to question whether a heterogeneous cluster
> > makes sense for Magnum. I will try to explain why I have this
> > hesitation:
> >
> > 1) If you have a heterogeneous cluster, it suggests that you are
> > using external intelligence to manage the cluster, rather than relying
> > on it to be self-managing. This is an anti-pattern that I refer to as
> > “pets” rather than “cattle”. The anti-pattern results in brittle
> > deployments that rely on external intelligence to manage (upgrade,
> > diagnose, and repair) the cluster. The automation of the management is
> > much harder when a cluster is heterogeneous.
> >
> > 2) If you have a heterogeneous cluster, it can fall out of balance.
> > This means that if one of your “important” or “large” members fails,
> > there may not be adequate remaining members in the cluster to continue
> > operating properly in the degraded state. The logic of how to track and
> > deal with this needs to be handled. It’s much simpler in the
> > homogeneous case.
> >
> > 3) Heterogeneous clusters are complex compared to homogeneous
> > clusters. They are harder to work with, and that usually means that
> > unplanned outages are more frequent and last longer than they would
> > with a homogeneous cluster.
> >
> > Summary:
> >
> > Heterogeneous:
> >   - Complex
> >   - Prone to imbalance upon node failure
> >   - Less reliable
> >
> > Homogeneous:
> >   - Simple
> >   - Don’t get imbalanced when a min_members concept is supported by
> >     the cluster controller
> >   - More reliable
> >
> > My bias is to assert that applications that want a heterogeneous mix
> > of system capacities at a node level should be deployed on multiple
> > homogeneous bays, not a single heterogeneous one. That way you end up
> > with a composition of simple systems rather than a larger complex one.
> >
> > Adrian
> >
> >
> >> On Jun 1, 2016, at 3:02 PM, Hongbin Lu <hongbin.lu at huawei.com> wrote:
> >>
> >> Personally, I think this is a good idea, since it can address a set
> >> of similar use cases like the ones below:
> >> * I want to deploy a k8s cluster to 2 availability zones (in the
> >>   future, 2 regions/clouds).
> >> * I want to spin up N nodes in AZ1, M nodes in AZ2.
> >> * I want to scale the number of nodes in a specific AZ/region/cloud.
> >>   For example, add/remove K nodes from AZ1 (with AZ2 untouched).
> >>
> >> The use cases above should be very common and universal. To address
> >> them, Magnum needs to support provisioning a heterogeneous set of
> >> nodes at deploy time and managing them at runtime. It looks like the
> >> proposed idea (manually managing individual nodes or individual
> >> groups of nodes) can address this requirement very well. Besides the
> >> proposed idea, I cannot think of an alternative solution.
> >>
> >> Therefore, I vote to support the proposed idea.
> >>
> >> Best regards,
> >> Hongbin
> >>
> >>> -----Original Message-----
> >>> From: Hongbin Lu
> >>> Sent: June-01-16 11:44 AM
> >>> To: OpenStack Development Mailing List (not for usage questions)
> >>> Subject: RE: [openstack-dev] [magnum] Discuss the idea of manually
> >>> managing the bay nodes
> >>>
> >>> Hi team,
> >>>
> >>> A blueprint was created for tracking this idea:
> >>> https://blueprints.launchpad.net/magnum/+spec/manually-manage-bay-nodes
> >>> I won't approve the BP until there is a team decision on
> >>> accepting/rejecting the idea.
> >>>
> >>> From the discussion at the design summit, it looks like everyone is
> >>> OK with the idea in general (with some disagreement on the API
> >>> style). However, from the last team meeting, it looks like some
> >>> people disagree with the idea fundamentally, so I re-raised this ML
> >>> thread to re-discuss.
> >>>
> >>> If you agree or disagree with the idea of manually managing the Heat
> >>> stacks (that contain individual bay nodes), please write down your
> >>> arguments here. Then, we can start debating.
> >>>
> >>> Best regards,
> >>> Hongbin
> >>>
> >>>> -----Original Message-----
> >>>> From: Cammann, Tom [mailto:tom.cammann at hpe.com]
> >>>> Sent: May-16-16 5:28 AM
> >>>> To: OpenStack Development Mailing List (not for usage questions)
> >>>> Subject: Re: [openstack-dev] [magnum] Discuss the idea of manually
> >>>> managing the bay nodes
> >>>>
> >>>> The discussion at the summit was very positive around this
> >>>> requirement, but as this change will make a large impact on Magnum,
> >>>> it will need a spec.
> >>>>
> >>>> On the API side of things, I was thinking of a slightly more
> >>>> generic approach that incorporates other lifecycle operations into
> >>>> the same API. E.g.:
> >>>> magnum bay-manage <bay> <life-cycle-op>
> >>>>
> >>>> magnum bay-manage <bay> reset --hard
> >>>> magnum bay-manage <bay> rebuild
> >>>> magnum bay-manage <bay> node-delete <name/uuid>
> >>>> magnum bay-manage <bay> node-add --flavor <flavor>
> >>>> magnum bay-manage <bay> node-reset <name>
> >>>> magnum bay-manage <bay> node-list
> >>>>
> >>>> Tom
> >>>>
> >>>> From: Yuanying OTSUKA <yuanying at oeilvert.org>
> >>>> Reply-To: "OpenStack Development Mailing List (not for usage
> >>>> questions)" <openstack-dev at lists.openstack.org>
> >>>> Date: Monday, 16 May 2016 at 01:07
> >>>> To: "OpenStack Development Mailing List (not for usage questions)"
> >>>> <openstack-dev at lists.openstack.org>
> >>>> Subject: Re: [openstack-dev] [magnum] Discuss the idea of manually
> >>>> managing the bay nodes
> >>>>
> >>>> Hi,
> >>>>
> >>>> I think users will also want to specify which node to delete,
> >>>> so we should manage each “node” individually.
> >>>>
> >>>> For example:
> >>>> $ magnum node-create --bay …
> >>>> $ magnum node-list --bay
> >>>> $ magnum node-delete $NODE_UUID
> >>>>
> >>>> Anyway, if Magnum wants to manage the lifecycle of the container
> >>>> infrastructure, this feature is necessary.
> >>>>
> >>>> Thanks
> >>>> -yuanying
> >>>>
> >>>>
> >>>> On Mon, 16 May 2016 at 07:50, Hongbin Lu
> >>>> <hongbin.lu at huawei.com> wrote:
> >>>> Hi all,
> >>>>
> >>>> This is a continued discussion from the design summit. For recap,
> >>>> Magnum manages bay nodes by using ResourceGroup from Heat. This
> >>>> approach works, but it is infeasible to manage the heterogeneity
> >>>> across bay nodes, which is a frequently demanded feature. As an
> >>>> example, there is a request to provision bay nodes across
> >>>> availability zones [1]. There is another request to provision bay
> >>>> nodes with a different set of flavors [2]. For the requested
> >>>> features above, ResourceGroup won’t work very well.
> >>>>
> >>>> The proposal is to remove the usage of ResourceGroup and manually
> >>>> create a Heat stack for each bay node. For example, for creating a
> >>>> cluster with 2 masters and 3 minions, Magnum is going to manage 6
> >>>> Heat stacks (instead of 1 big Heat stack, as it is right now):
> >>>> * A kube cluster stack that manages the global resources
> >>>> * Two kube master stacks that manage the two master nodes
> >>>> * Three kube minion stacks that manage the three minion nodes
> >>>>
> >>>> The proposal might require an additional API endpoint to manage
> >>>> nodes or a group of nodes. For example:
> >>>> $ magnum nodegroup-create --bay XXX --flavor m1.small --count 2 \
> >>>>     --availability-zone us-east-1 ...
> >>>> $ magnum nodegroup-create --bay XXX --flavor m1.medium --count 3 \
> >>>>     --availability-zone us-east-2 ...
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> [1] https://blueprints.launchpad.net/magnum/+spec/magnum-availability-zones
> >>>> [2] https://blueprints.launchpad.net/magnum/+spec/support-multiple-flavor
> >>>>
> >>>> Best regards,
> >>>> Hongbin
> >>>>