Re: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning
On Wed, 12 Jun 2019 at 14:45, Mark Goddard <mark@stackhpc.com> wrote:
On Wed, 12 Jun 2019, 06:23 Alex Xu, <soulxu@gmail.com> wrote:
On Wed, 12 Jun 2019 at 01:39, Mark Goddard <mark@stackhpc.com> wrote:
On Mon, 10 Jun 2019 at 06:18, Alex Xu <soulxu@gmail.com> wrote:
On Fri, 7 Jun 2019 at 01:59, Eric Fried <openstack@fried.cc> wrote:
Looking at the specs, it seems they mostly talk about changing a VM's resources without rebooting. However, that's not the actual intent of the Ironic use case I explained in the email.
Yes, it requires a reboot for the BIOS changes to take effect. This reboot can be done either by the Nova Ironic driver or by an Ironic deploy step. So I am not sure the spec actually satisfies the use case; I hope to get more responses from the team for clarity.
Waitwait. The VM needs to be rebooted for the BIOS change to take effect? So (non-live) resize would actually satisfy your use case just fine. But the problem is that the ironic driver doesn't support resize at all?
Without digging too hard, that seems like it would be a fairly straightforward thing to add. It would be limited to only "same host" and initially you could only change this one attribute (anything else would have to fail).
Nova people, thoughts?
Let me contribute another idea.
So, just as Jay said in this thread: CUSTOM_HYPERTHREADING_ON and CUSTOM_HYPERTHREADING_OFF are configuration. That configuration isn't used for scheduling, yet traits are designed for scheduling.
So yes, there should be only one trait, CUSTOM_HYPERTHREADING. This trait indicates that the host supports HT; whether it is enabled in the instance is configuration info.
It is also painful to change configuration through the flavor. The flavor is the spec of the instance's virtual resources, not its configuration.
So another option is to store the configuration somewhere else, such as the server's metadata.
So for the HT case: we only put the CUSTOM_HYPERTHREADING trait in the flavor, and set 'hyperthreading_config=on' in the server metadata. Nova will find a BM node that supports HT, and ironic will enable HT based on the 'hyperthreading_config=on' server metadata.
To change the HT configuration to off, the user updates the server's metadata. Currently, nova sends an RPC call to the compute node and invokes a virt driver interface when server metadata is updated. In the ironic virt driver, that hook could trigger a hyper-threading configuration deploy step to turn HT off and reboot the instance. (The reboot is a step inside the deploy step, not part of the ironic virt driver flow.)
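To make the flow concrete, here is a rough sketch. Nova's virt drivers do have a change_instance_metadata() hook today, but everything inside it below (the helper, the ironic call) is made up just to illustrate the idea:

class IronicDriver:
    # change_instance_metadata() is the existing virt driver hook that nova
    # invokes (via RPC to the compute node) when server metadata changes.
    # Everything inside it here is hypothetical.
    def change_instance_metadata(self, context, instance, diff):
        change = diff.get('hyperthreading_config')
        if not change or change[0] != '+':  # only react to set/update
            return
        value = change[1]  # 'on' or 'off'
        node = self._node_for_instance(instance)  # hypothetical helper
        # Hand off to ironic: the deploy step performs the BIOS change *and*
        # the reboot, so the virt driver itself never reboots the node.
        self.ironic.node.execute_deploy_step(  # hypothetical ironic API
            node.uuid, step='hyperthreading',
            args={'enabled': value == 'on'})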
But yes, this changes some of the original deploy-steps and deploy-templates design. And it puts something into the server's metadata, which I'm not sure nova folks will like.
Anyway, just putting my idea out here.
We did consider using metadata. The problem is that it is user-defined, so there is no way for an operator to restrict what can be done by a user. Flavors are operator-defined and so allow for selection from a 'menu' of types and configurations.
The end user can change the BIOS config via IPMI from inside the guest OS and reboot; that is already outside the operator's control. (Correct me if ironic doesn't allow the end user to change the config from inside the guest OS.)
It depends. Normally you can't configure BIOS via IPMI; you need to use a vendor interface such as racadm or, on hardware that supports it, Redfish. Access to the management controller can and should be locked down, though. It's also usually possible to reconfigure via the serial console, if this is exposed to users.
It sounds like that partially breaks the operator's control anyway. (Sorry for dropping the mailing list from the thread again... I will paste a note on the wall: "click Reply All"...)
So the flavor should be the thing that restricts which resources (or resource capabilities) the end user can request. For example, the flavor says "I need a BM node with hyper-threading capability", but enabling or disabling it can be controlled by the end user.
What might be nice is if we could use a flavor extra spec like this:
deploy-config:hyperthreading=enabled
The nova ironic virt driver could pass this to ironic, like it does with traits.
Then in the ironic deploy template, have fields like this:
name: Hyperthreading enabled
config-type: hyperthreading
config-value: enabled
steps: <deploy steps>
Ironic would then match on the config-type and config-value to find a suitable deploy template.
As an extension, the deploy template could define a trait (or list of traits) that must be supported by a node in order for the template to be applied. Perhaps this would even be a standard relationship between config-type and traits?
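To make that concrete, the template matching might look something like this (purely a sketch; the config-type/config-value fields don't exist in ironic today):

def find_deploy_template(templates, config_type, config_value):
    """Return a deploy template matching the requested config, else None."""
    for template in templates:
        if (template.get('config-type') == config_type
                and template.get('config-value') == config_value):
            return template
    return None

# The extra spec "deploy-config:hyperthreading=enabled" would be parsed into
# a (type, value) pair and matched against the operator-defined templates:
templates = [{'name': 'Hyperthreading enabled',
              'config-type': 'hyperthreading',
              'config-value': 'enabled',
              'steps': ['<deploy steps>']}]
assert find_deploy_template(templates, 'hyperthreading', 'enabled')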
Haven't thought this through completely, I'm sure it has holes.
Hi All,

Thank you everyone for your responses. We have created an etherpad[1] with the suggested solution and concerns. I request Nova and Ironic developers to provide their input on the etherpad.

[1] https://etherpad.openstack.org/p/ironic-nova-reset-configuration

Regards,
Madhuri
We discussed this today in the nova meeting [1] with a little bit of followup in the main channel after the meeting closed [2].

There seems to be general support (or at least not objection) for implementing "resize" for ironic, limited to:

- same host [3]
- just this feature (i.e. "hyperthreading") or possibly "anything deploy template"

And the consensus was that it's time to put this into a spec. There was a rocky spec [4] that has some overlap and could be repurposed; or a new one could be introduced.

efried

[1] http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-06-13-14.00.log....
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2... (interleaved)
[3] an acknowledged wrinkle here was that we need to be able to detect at the API level that we're dealing with an Ironic instance, and ignore the allow_resize_to_same_host option (because always forcing same host)
[4] https://review.opendev.org/#/c/449155/
Hi Eric,

Thank you for following up and the notes.

The spec[4] is related, but it's a complex one too, with all the migration implementation. So I will try to put up a new spec with a limited implementation of resize.

Regards,
Madhuri
On Fri, 14 Jun 2019 at 11:18, Kumari, Madhuri <madhuri.kumari@intel.com> wrote:
Hi Eric,
Thank you for following up and the notes.
The spec[4] is related, but it's a complex one too, with all the migration implementation. So I will try to put up a new spec with a limited implementation of resize.
I was talking with Madhuri in #openstack-ironic about this today [1]. While talking it through I raised some concerns about the nova resize-based design, which I'll try to outline here.

When we deploy a node using deploy templates, we have the following sequence:

* user picks a flavor and image, which may specify required traits
* selected traits are pushed to ironic via instance_info.traits
* ironic finds all deploy templates with a name matching one of the selected traits
* deploy steps from the matching templates are used when provisioning the node

The deploy steps could include RAID config, BIOS config, or something else.

If we now resize the instance to a different flavor which has a different set of traits, we would end up with a new set of traits, which maps to a new set of deploy templates, with a new set of steps.

How do we apply this change? Should we execute all matching deploy steps, which could (e.g. RAID) result in losing data? Or should we attempt to execute only those deploy steps that have changed? Would that always work? I don't think we keep a record of the steps used to provision a node, so if templates have changed in the intervening time then we might not get a correct diff.

The original RFE [2] just called for specifying a list of deploy steps via the ironic API, however this doesn't really work for the nova model.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/latest.log.html#t...
[2] https://storyboard.openstack.org/#!/story/2005129
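To illustrate the diff problem with a toy example (the trait and template names below are hypothetical):

def templates_for(traits, templates_by_name):
    """Deploy templates whose name matches one of the selected traits."""
    return {name for name in templates_by_name if name in traits}

templates_by_name = {
    'CUSTOM_HYPERTHREADING_ON': ['bios step: enable HT'],
    'CUSTOM_HYPERTHREADING_OFF': ['bios step: disable HT'],
    'CUSTOM_RAID5': ['raid step: rebuild array'],  # destructive to re-run!
}
old = templates_for({'CUSTOM_HYPERTHREADING_ON', 'CUSTOM_RAID5'},
                    templates_by_name)
new = templates_for({'CUSTOM_HYPERTHREADING_OFF', 'CUSTOM_RAID5'},
                    templates_by_name)

# Running every template in `new` would re-run the RAID step and lose data;
# running only `new - old` assumes templates haven't changed since deploy.
print('added:', new - old, 'removed:', old - new)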
Not being intimately familiar with the workings, the approach I've been advocating is to only support the changes you support, and fail on anything else.

In other words, compare the old flavor to the new flavor. If the diff contains anything other than this "hyperthreading" gizmo, fail.

Ironic resize is a special snowflake, and only supports a very limited set of changes done in a very limited way. At first, it's just one thing. You can add other pieces as demand arises, but by default you're rejecting big complicated things like your RAID example.

efried
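Something like this, perhaps (a sketch only; the allowlist and function are made up, and the real check would live in the virt driver):

ALLOWED_EXTRA_SPEC_CHANGES = {  # hypothetical allowlist
    'trait:CUSTOM_HYPERTHREADING_ON',
    'trait:CUSTOM_HYPERTHREADING_OFF',
}

def validate_ironic_resize(old_flavor, new_flavor):
    """Fail unless the only differences between flavors are supported ones."""
    old, new = old_flavor['extra_specs'], new_flavor['extra_specs']
    changed = {k for k in old.keys() | new.keys()
               if old.get(k) != new.get(k)}
    unsupported = changed - ALLOWED_EXTRA_SPEC_CHANGES
    if unsupported:
        raise ValueError('ironic resize cannot change: %s'
                         % ', '.join(sorted(unsupported)))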
On Mon, 24 Jun 2019 at 17:13, Eric Fried <openstack@fried.cc> wrote:
Not being intimately familiar with the workings, the approach I've been advocating is to only support the changes you support, and fail on anything else.
In other words, compare the old flavor to the new flavor. If the diff contains anything other than this "hyperthreading" gizmo, fail.
Hmm, I hadn't realised it would be quite this restricted. Although this could make it work, it does seem to be baking more ironic specifics into nova. There is an issue of standardisation here. Currently we do not have standard traits to describe these things, instead we use custom traits. The reason for this has been discussed earlier in this thread, essentially that we need to encode configuration key and value into the trait, and use the lack of a trait as 'don't care'. We did briefly discuss an alternative approach, but we're a fair way off having that.
Hmm, I hadn't realised it would be quite this restricted. Although this could make it work, it does seem to be baking more ironic specifics into nova.
Well, that's what virt drivers are for. In the simplest implementation, you have the Ironic virt driver's migrate_disk_and_power_off do the restrictive checking (all the information you need should be available to that method) and fail if necessary. That sucks a little bit because the failure is late (at compute vs. conductor or API). But that seems acceptable for something this limited, and is really no different than e.g. the libvirt driver failing if you try to resize the ephemeral disk down [1].
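i.e. something like the following (again just spitballing; the ironic driver doesn't implement this method today, and the signature is only meant to mirror nova's virt driver interface):

class IronicDriver:
    def migrate_disk_and_power_off(self, context, instance, dest, flavor,
                                   network_info, block_device_info=None,
                                   timeout=0, retry_interval=0):
        # `flavor` is the new flavor; the old one hangs off the instance.
        # validate_ironic_resize() is the hypothetical check sketched
        # earlier; failing here is "late" (at compute, not conductor/API).
        validate_ironic_resize({'extra_specs': instance.flavor.extra_specs},
                               {'extra_specs': flavor.extra_specs})
        # ... then trigger the matching deploy step, which does the reboot ...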
There is an issue of standardisation here. Currently we do not have standard traits to describe these things, instead we use custom traits. The reason for this has been discussed earlier in this thread, essentially that we need to encode configuration key and value into the trait, and use the lack of a trait as 'don't care'. We did briefly discuss an alternative approach, but we're a fair way off having that.
I'm not sure that should really matter. If the logic lives in the virt driver as suggested above, you can do whatever fancy parsing and interpretation you like.

efried

P.S. I'll continue to repeat this disclaimer: I'm just spitballing here, no idea if this approach would have the support of Nova maintainers at large, or if there are major architectural blockers I'm not thinking of.

[1] https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/drive...
Hi Mark,
-----Original Message----- From: Mark Goddard [mailto:mark@stackhpc.com]
Hmm, I hadn't realised it would be quite this restricted. Although this could make it work, it does seem to be baking more ironic specifics into nova.
There is an issue of standardisation here. Currently we do not have standard traits to describe these things, instead we use custom traits. The reason for this has been discussed earlier in this thread, essentially that we need to encode configuration key and value into the trait, and use the lack of a trait as 'don't care'. We did briefly discuss an alternative approach, but we're a fair way off having that.
I think the issue of standardization is not related to the specific use case we are discussing here. It applies to the current state of the ironic virt driver as well, as you said. The idea of using the flavor metadata could fix this, but that's in itself another piece of work.

Regards,
Madhuri
On Wed, 26 Jun 2019 at 12:20, Kumari, Madhuri <madhuri.kumari@intel.com> wrote:
I think the issue of standardization is not related to the specific use case we are discussing here. It applies to the current state of the ironic virt driver as well, as you said. The idea of using the flavor metadata could fix this, but that's in itself another piece of work.
That's not quite true. Currently we don't specify any trait values for deploy templates anywhere in nova or ironic. They're entirely defined by the operator (as are the deploy templates that reference them). This would need to become standard if we're to add it to code.
Hi Mark,
-----Original Message----- From: Mark Goddard [mailto:mark@stackhpc.com]
If we now resize the instance to a different flavor which has a different set of traits, we would end up with a new set of traits, which maps to a new set of deploy templates, with a new set of steps.
How do we apply this change? Should we execute all matching deploy steps, which could (e.g. RAID) result in losing data? Or should we attempt to execute only those deploy steps that have changed? Would that always work? I don't think we keep a record of the steps used to provision a node, so if templates have changed in the intervening time then we might not get a correct diff.
Mark and I had a discussion about this yesterday[1]. A possible way to fix this issue is by restricting deploy_steps from a specific interface, such as allowing bios but restricting raid. However, there could be some deploy_steps on the bios interface which we might not want to allow during a resize; I don't have an example now. So I think it's better to restrict individual deploy_steps rather than a driver interface type.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-iron...

Regards,
Madhuri
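For example, a per-step allowlist might look like this (a sketch; the step naming follows ironic's interface.step convention, but the allowlist itself is hypothetical):

RESIZE_ALLOWED_STEPS = {  # hypothetical allowlist
    ('bios', 'apply_configuration'),
    # note: no ('raid', ...) entries -- re-running RAID steps destroys data
}

def filter_resize_steps(steps):
    """Split requested deploy steps into (allowed, rejected) for a resize."""
    allowed, rejected = [], []
    for step in steps:
        key = (step['interface'], step['step'])
        (allowed if key in RESIZE_ALLOWED_STEPS else rejected).append(step)
    return allowed, rejected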
participants (4)

- Alex Xu
- Eric Fried
- Kumari, Madhuri
- Mark Goddard