[openstack-dev] [TripleO] RFC: profile matching
dtantsur at redhat.com
Wed Dec 2 09:59:23 UTC 2015
On 12/01/2015 06:55 PM, Ben Nemec wrote:
> Sorry for not getting to this earlier. Some thoughts inline.
> On 11/09/2015 08:51 AM, Dmitry Tantsur wrote:
>> Hi folks!
>> I spent some time thinking about bringing profile matching back in, so
>> I'd like to get your comments on the following near-future plan.
>> First, the scope of the problem. What we do is essentially a kind of
>> capability discovery. We'll help the nova scheduler do the right
>> thing by assigning a capability like "suitable for compute", "suitable for
>> controller", etc. The most obvious path is to use inspector to assign
>> capabilities like "profile=1" and then filter nodes by it.
>> Special care, however, is needed when some of the nodes match 2 or
>> more profiles. E.g. if all 4 of our nodes match "compute" and
>> only 1 of them matches "controller", nova can select this one node for
>> the "compute" flavor, and then complain that it does not have enough
>> hosts for "controller".
>> We also want to run some sanity checks before even calling
>> heat/nova to avoid cryptic "no valid host found" errors.
>> (1) Inspector part
>> During the Liberty cycle we landed a whole bunch of APIs in
>> inspector that allow us to define rules on introspection data. The plan
>> is to have rules saying, for example:
>> rule 1: if memory_mb >= 8192, add capability "compute_profile=1"
>> rule 2: if local_gb >= 100, add capability "controller_profile=1"
>> Note that these rules are defined via the inspector API using a JSON-based
>> DSL [1].
>> As you can see, one node can receive 0, 1 or many such capabilities. So we
>> need the next step to make a final decision, based on how many nodes we
>> need of every profile.
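
For illustration, the two example rules above might look roughly like this
as payloads for the inspector rules API. This is only a sketch, not taken
from any patch: the "ge" operator, the "set-capability" action and the
field names follow the introspection rules documentation as I read it, and
the capability_rule helper is hypothetical. Check the exact field syntax
against the inspector version in use.

```python
import json


def capability_rule(description, field, minimum, capability):
    """Build one rule: if <field> >= <minimum>, set capability <capability>=1."""
    return {
        "description": description,
        "conditions": [{"field": field, "op": "ge", "value": minimum}],
        "actions": [
            {"action": "set-capability", "name": capability, "value": "1"}
        ],
    }


rules = [
    capability_rule("suitable for compute", "memory_mb", 8192,
                    "compute_profile"),
    capability_rule("suitable for controller", "local_gb", 100,
                    "controller_profile"),
]

# Each dict would be the JSON body of one rule-creation request.
print(json.dumps(rules, indent=2))
```

A node with 16 GiB of RAM and a 200 GiB disk would satisfy both rules and so
receive both capabilities, which is exactly the overlap the next step has to
resolve.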
> Is the intent that this will replace the standalone ahc-match call that
> currently assigns profiles to nodes? In general I'm +1 on simplifying
> the process (which is why I'm finally revisiting this) so I think I'm
> on board with that idea.
>> (2) Modifications of `overcloud deploy` command: assigning profiles
>> New argument --assign-profiles will be added. If it's provided,
>> tripleoclient will fetch all ironic nodes, and try to ensure that we
>> have enough nodes with all profiles.
>> Nodes with an existing "profile:xxx" capability are left as they are. For
>> nodes without a profile it will look at the "xxx_profile" capabilities
>> discovered in the previous step. One of the possible profiles will be
>> chosen and assigned to the "profile" capability. The assignment stops as
>> soon as we have as many nodes of a flavor as the user requested.
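
To make the two kinds of capabilities concrete, here is a minimal sketch of
how the already assigned profile and the candidate profiles could be read
from a node. It assumes the capabilities are stored the way ironic keeps
them in node properties, as a "key1:value1,key2:value2" string; both
function names are hypothetical and not part of tripleoclient.

```python
def parse_capabilities(caps):
    """Parse an ironic-style "key1:value1,key2:value2" capabilities string."""
    result = {}
    for item in caps.split(","):
        key, sep, value = item.strip().partition(":")
        if sep:  # skip empty or malformed items
            result[key] = value
    return result


def profile_info(caps):
    """Return (assigned profile or None, sorted list of candidate profiles)."""
    parsed = parse_capabilities(caps)
    suffix = "_profile"
    candidates = sorted(
        key[:-len(suffix)]
        for key, value in parsed.items()
        if key.endswith(suffix) and value == "1"
    )
    return parsed.get("profile"), candidates


unassigned = profile_info(
    "boot_option:local,compute_profile:1,controller_profile:1")
already_set = profile_info("profile:ceph,compute_profile:1")
```

Here `unassigned` is `(None, ['compute', 'controller'])`, so the node is
still free to become either, while `already_set` is `('ceph', ['compute'])`
and the node would be left alone.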
> And this assignment would follow the same rules as the existing AHC
> version does? So if I had a rules file that specified 3 controllers, 3
> cephs, and an unlimited number of computes, it would first find and
> assign 3 controllers, then 3 cephs, and finally assign all the other
> matching nodes to compute.
There's no longer a spec file, though we could create something like
that. The spec file had 2 problems:
1. it was used to maintain state in the local file system
2. it was completely out of sync with what was later passed to the
deploy command. So you could, for example, request 1 controller and the
remaining nodes to be computes in a spec file, and then request a deploy
with 2 controllers, which was doomed to fail.
> I guess there's still a danger if ceph nodes also match the controller
> profile definition but not the other way around, because a ceph node
> might get chosen as a controller and then there won't be enough matching
> ceph nodes when we get to that. IIRC (it's been a while since I've done
> automatic profile matching) that's how it would work today so it's an
> existing problem, but it would be nice if we could fix that as part of
> this work. I'm not sure how complex the resolution code for such
> conflicts would need to be.
My current patch does not deal with it. The spec file only had ordering, so
you could process 'ceph' before 'controller'. We can do the same by
accepting something like --profile-ordering=ceph,controller,compute. WDYT?
I can't think of anything smarter for now; any ideas are welcome.
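
To show what ordering would and would not buy us, here is a rough sketch of
a greedy assignment driven by such an ordering. It is not the code from my
patch: the function name and the data layout are made up for the example,
and it deliberately does nothing smarter than first-come, first-served.

```python
from collections import Counter


def assign_profiles(nodes, wanted, ordering):
    """Greedily assign profiles to nodes that do not have one yet.

    nodes: {name: {"profile": str or None, "candidates": set of str}}
    wanted: {profile: number of nodes requested}
    ordering: profiles in the order in which they should be filled
    Returns {name: newly assigned profile}.
    """
    # Nodes that already carry a profile count towards the request.
    counts = Counter(n["profile"] for n in nodes.values() if n["profile"])
    assigned = {}
    for profile in ordering:
        for name in sorted(nodes):
            if counts[profile] >= wanted.get(profile, 0):
                break  # enough nodes of this profile, stop assigning
            node = nodes[name]
            free = node["profile"] is None and name not in assigned
            if free and profile in node["candidates"]:
                assigned[name] = profile
                counts[profile] += 1
    return assigned


# The ceph nodes also match "controller", but not the other way around.
nodes = {
    "a": {"profile": None, "candidates": {"ceph", "controller"}},
    "b": {"profile": None, "candidates": {"ceph", "controller"}},
    "c": {"profile": None, "candidates": {"controller"}},
    "d": {"profile": None, "candidates": {"compute"}},
}
wanted = {"controller": 1, "ceph": 2, "compute": 1}

bad = assign_profiles(nodes, wanted, ["controller", "ceph", "compute"])
good = assign_profiles(nodes, wanted, ["ceph", "controller", "compute"])
```

With controller first, node "a" is taken as the controller and only "b" is
left for ceph, so `bad` has one ceph node too few. With ceph first, `good`
gives "a" and "b" to ceph, "c" to controller and "d" to compute. The
ordering still has to be chosen by a human who knows which profile is the
most restrictive.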
>> (3) Modifications of `overcloud deploy` command: validation
>> To avoid 'no valid host found' errors from nova, the deploy command will
>> fetch all flavors involved and look at the "profile" capabilities. If
>> they are set for any flavors, it will check if we have enough ironic
>> nodes with a given "profile:xxx" capability. This check will happen
>> after profile assignment, if --assign-profiles is used.
By the way, this is already implemented. I was not aware of it while
writing my first email.
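
For readers who have not seen that check, the idea is roughly the
following. This is a simplified sketch and not the actual tripleoclient
code: the function name, the argument layout and the error message are all
made up for the example.

```python
from collections import Counter


def check_profiles(flavor_profiles, requested, node_profiles):
    """Return a list of error strings; an empty list means the check passed.

    flavor_profiles: {flavor name: required profile, or None if unset}
    requested: {flavor name: number of nodes requested for the deploy}
    node_profiles: "profile" capability of every ironic node (None if unset)
    """
    available = Counter(p for p in node_profiles if p is not None)
    needed = Counter()
    for flavor, count in requested.items():
        profile = flavor_profiles.get(flavor)
        if profile is not None:  # flavors without a profile are not checked
            needed[profile] += count
    return [
        "profile %s: %d node(s) needed, %d available" % (p, n, available[p])
        for p, n in sorted(needed.items())
        if available[p] < n
    ]


errors = check_profiles(
    {"control": "controller", "compute": "compute", "baremetal": None},
    {"control": 3, "compute": 1, "baremetal": 1},
    ["controller", "compute", "compute", None],
)
```

In this example `errors` contains a single entry for the controller profile
(3 needed, 1 available), which is a lot easier to act on than "no valid
host found" halfway through the deploy.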
>> Please let me know what you think.
>> [1] https://github.com/openstack/ironic-inspector#introspection-rules
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe