[openstack-dev] [nova][cinder] About rebuilding volume-backed instances.
Clint Byrum
clint at fewbar.com
Wed Nov 11 17:44:49 UTC 2015
Excerpts from Murray, Paul (HP Cloud)'s message of 2015-11-11 09:01:16 -0800:
> > Unfortunately, you're trying to work around misuse of the cloud APIs, not
> > missing features from them. Don't use those volume types, and don't build
> > systems that rely on single ports and interfaces. IMO rebuild is a misguided
> > concept (something that took me a long time to realize).
>
> Slightly tangential to this discussion but specifically to your comment about
> rebuild: HP (and I believe RAX) kept asking OpenStack infra to use rebuild in
> node pool instead of constantly deleting and creating instances.
> The reason being it would give them a significant performance advantage
> and dramatically improve the use of resources in the operator's cloud. Node
> pool would have gained a vastly better resource utilization. The
> fact infra did not do that (I believe) is partly because it was difficult to refactor
> node pool for the purpose (after all, it works and there's other things to do) and
> partly because the resources were free. In a pay-per-use scenario the decision
> would have been different.
>
> This is off topic because it's not about cinder volumes. But I guess there are two
> takeaways:
>
> On one hand, in the end they didn't really need rebuild - node pool works without it.
>
> On the other, it would have reduced their costs (if paying) and made better use of
> resources.
>
> I'll leave it to the reader to decide which of these matter to you.
>
Thanks, that is a helpful anecdote, and I think it shows the dangerous
nature of this feature. It _seems_ like a good thing because it lets the
operator expose cost savings to the user.

But IMO, cloud operators were forced into this way of thinking
because deletes and creates have traditionally been expensive: they
go through schedulers, cause relational database churn, leave behind
dangling resources, etc. Rebuild is effectively a bypass around those
things, and for that reason I understand why it exists.

But, from a user perspective, needing rebuild because it doesn't eat up
more space on an already-full cloud, or because it saves you money in
billed hours, isn't a real problem. If you can't build a new one
due to capacity, and you can handle downtime, simply stop the old one
before the build (leaving volumes and/or objects behind for the new one
to build from). No part of rebuild actually helps users other than
preserving IDs and attachments, which is not going to help them at all
if they need to move the workload to a new region/AZ/cloud.

So, if we actually make cloud churn scale well (which should not be
hard, and is important in any case since cloud native apps will want to
work like this), rebuild becomes a quaint server-side way to stop,
create new, start, and delete old.
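
To make that sequence concrete, here is a rough client-side sketch of
"stop, create new, start, delete old". It is only an illustration:
FakeCloud, Server and replace_server are hypothetical names for an
in-memory stand-in, not novaclient or any real OpenStack API, and it
assumes volumes outlive the server they were attached to.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Server:
    id: int
    image: str
    volumes: list
    state: str = "ACTIVE"


class FakeCloud:
    """Hypothetical in-memory stand-in for a compute API (not novaclient)."""

    def __init__(self):
        self._ids = count(1)
        self.servers = {}

    def create(self, image, volumes=(), start=True):
        # Every create hands out a fresh ID, unlike a rebuild.
        state = "ACTIVE" if start else "SHUTOFF"
        server = Server(next(self._ids), image, list(volumes), state)
        self.servers[server.id] = server
        return server

    def stop(self, server_id):
        self.servers[server_id].state = "SHUTOFF"

    def start(self, server_id):
        self.servers[server_id].state = "ACTIVE"

    def delete(self, server_id):
        # Assumption: deleting a server leaves its former volumes alone.
        del self.servers[server_id]


def replace_server(cloud, old_id, new_image):
    """Client-side equivalent of rebuild: stop, create new, start, delete old."""
    old = cloud.servers[old_id]
    cloud.stop(old_id)                      # downtime begins here
    volumes, old.volumes = old.volumes, []  # leave the volumes behind for the new server
    new = cloud.create(new_image, volumes, start=False)
    cloud.start(new.id)
    cloud.delete(old_id)
    return new
```

Note that the replacement comes back with a new ID. That, plus the
attachments, is the only thing a real rebuild would have preserved.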