[openstack-dev] [nova] key_pair update on rebuild (a whole lot of conversations)

Clint Byrum clint at fewbar.com
Wed Oct 4 04:29:01 UTC 2017

Excerpts from Sean Dague's message of 2017-10-03 16:16:48 -0400:
> There is currently a spec up for being able to specify a new key_pair
> name during the rebuild operation in Nova -
> https://review.openstack.org/#/c/375221/
>
> For those not completely familiar with Nova operations, rebuild triggers
> the "reset this vm to initial state" by throwing out all the disks, and
> rebuilding them from the initial glance images. It does however keep the
> IP address and device models when you do that. So it's useful for
> ephemeral but repeating workloads, where you'd rather not have the
> network information change out from under you.
>
> The spec is a little vague about when this becomes really useful,
> because this will not save you from "I lost my private key, and I have
> important data on that disk". Because the disk is destroyed. That's the
> point of rebuild. We once added this preserve_ephemeral flag to rebuild
> for TripleO on ironic, but it's so nasty we've scoped it to only work
> with ironic backends. Ephemeral should mean ephemeral.

Let me take a moment to apologize for that feature. It was the worst idea
we had in TripleO, even worse than the name. ;)

> Rebuild bypasses the scheduler. A rebuilt server stays on the same host
> as it was before, which means the operation has a good chance of being
> faster than a DELETE + CREATE, as the image cache on that host should
> already have the base image for your instance.

There are some pros, but for the most part I'd rather train my users
to be creating new instances than train them to cling to fixed IPs and
single compute node resources. It's a big feature, and obviously we've
given it to users so they use it. But that doesn't mean it's the best
use of Nova development's time to be supporting it, nor is it the most
scalable way for users to interact with a cloud.

A trade-off, for instance, is that a rebuilding server is unavailable while
rebuilding. The user cannot choose how long that server is unavailable,
or choose to roll back and make it available if something goes wrong. It's
rebuilding until it isn't. A new server, spun up somewhere else, can be
fully prepared before any switch is made. One of the best things about
being a cloud operator is that you put more onus on the users to fix
their own problems, and give them lots of tools to do it. But while a
server is being rebuilt it is entirely _the operator's problem_.

Also, as an operator, while I appreciate that it's quick on that compute
node, I'd rather new servers be scheduled to the places that my scheduler
rules say they should go. I will at times want to drain a compute node,
and the longer the pet servers stick around and are rebuilt, the more
likely I am to have to migrate them forcibly.
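
For anyone who hasn't looked at the API side, the request being debated is
small. A minimal sketch, assuming the spec's proposed key_name field ends up
accepted on the existing rebuild server action; the helper name and the
example IDs are made up for illustration, and nothing here talks to a real
cloud:

```python
import json


def build_rebuild_request(server_id, image_ref, key_name=None):
    """Return (path, body) for a Nova rebuild server action.

    The rebuild action is POSTed to /servers/{server_id}/action.
    key_name is the field proposed in the spec under discussion; it is
    treated here as an assumption, not as settled API.
    """
    rebuild = {"imageRef": image_ref}
    if key_name is not None:
        # Proposed addition: swap the keypair as part of the rebuild.
        rebuild["key_name"] = key_name
    return f"/servers/{server_id}/action", {"rebuild": rebuild}


if __name__ == "__main__":
    # Hypothetical server and image identifiers.
    path, body = build_rebuild_request("my-server-id", "my-image-id",
                                       key_name="new-key")
    print("POST", path)
    print(json.dumps(body, indent=2))
```

Note what the body does not carry: there is no host or scheduling input, so
the server stays where it is, which is exactly the behavior I'm grumbling
about above.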

> = Where I think we are? =
>
> I think with all this data we're at the following:
>
> Q: Should we add this to rebuild?
> A: Yes, probably - after some enhancement to the spec *
>
> * - we really should have much better use cases about the situations it
> is expected to be used in. We spend a lot of time 2 and 3 years out
> trying to figure out how anyone would ever use a feature, and adding
> another one without this doesn't seem good.
>
> Q: Should this also be on reboot?
> A: NO - it would be too fragile
>
> I also think figuring out a way to get Nova out of the key storage
> business (which it really shouldn't be in) would be good. So if anyone
> wants to tackle Nova using Barbican for keys, that would be ++. Rebuild
> doesn't wait on that, but Barbican URLs for keys seems like a much
> better world to be in.

The keys are great. Barbican is a fantastic tool for storing _secret_
keys, but feels like a massive amount of overkill for this tiny blob of
public data.
