[Openstack] Rebuild instance from failed host

Shyam Kaushik shyam at zadarastorage.com
Tue Nov 29 06:04:08 UTC 2011


Hi Folks,



Today in OpenStack, “rebuild” instance tears down a running instance and
sets up a fresh instance in its place on the same host. “resize” instance
migrates the underlying instance disk to another physical host and spawns
the instance there. However, both operations require the origin host that
was running the instance to be up. If the host has failed (possibly
irrecoverably, if the root FS is corrupted), we cannot recover the
instance, and all operations on that instance fail.



We want to introduce a new “rebuild instance from failed host” operation
that rebuilds the instance on another host with the same properties
(instance-id, name, network info, metadata, volume attachments) and marks
the old instance on the failed host for cleanup. Whenever the failed host
comes back up, it clears its cache for the old instance. This operation is
essentially a modified form of today’s “rebuild” instance that works even
if the underlying host has failed.



The “rebuild instance from failed host” operation will do the following
steps:
# See if it can terminate the running instance on the existing host. If
not, create a migration record.

# Change “host” for the instance to a new host (picked by the scheduler)
and spawn the instance on that host (with volume attachments and networks
connected as they were for the original instance).

# Optionally, during this procedure, allow the instance flavor to be
changed and a different “image reference” to be given for it to boot from
(this could be used to upgrade the OS image of the instance).

# Whenever the failed host comes back up, it reads through the migration
records (as part of init_host), clears its cache and marks the migration
complete.
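
To make the four steps above concrete, here is a rough, self-contained
sketch in Python. It is only a toy in-memory model, not nova code: the
Instance, MigrationRecord and Cloud classes, the "pending_cleanup" status
string and the method names (other than init_host, which is mentioned
above) are all hypothetical names made up for illustration, and the
"scheduler" simply picks the first healthy host.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Instance:
    # Properties that are meant to survive the rebuild (hypothetical model).
    instance_id: str
    name: str
    host: str
    flavor: str
    image_ref: str
    volumes: List[str] = field(default_factory=list)


@dataclass
class MigrationRecord:
    # Hypothetical record telling the failed host what to clean up later.
    instance_id: str
    source_host: str
    dest_host: str
    status: str = "pending_cleanup"


class Cloud:
    """Toy stand-in for the scheduler, the compute hosts and the database."""

    def __init__(self, hosts: Dict[str, bool]):
        self.host_up = dict(hosts)  # host name -> is the host reachable?
        # Per-host local state ("cache") of the instances it believes it runs.
        self.local_cache: Dict[str, Set[str]] = {h: set() for h in hosts}
        self.migrations: List[MigrationRecord] = []

    def pick_host(self, exclude: str) -> str:
        # Stand-in for the scheduler: first healthy host other than `exclude`.
        for host, up in self.host_up.items():
            if up and host != exclude:
                return host
        raise RuntimeError("no healthy host available")

    def rebuild_from_failed_host(self, instance: Instance,
                                 flavor: Optional[str] = None,
                                 image_ref: Optional[str] = None) -> Instance:
        source = instance.host
        dest = self.pick_host(exclude=source)

        # Step 1: terminate on the origin host if possible, otherwise
        # leave a migration record for it to process later.
        if self.host_up[source]:
            self.local_cache[source].discard(instance.instance_id)
        else:
            self.migrations.append(
                MigrationRecord(instance.instance_id, source, dest))

        # Step 3 (optional): change flavor and/or image reference.
        if flavor is not None:
            instance.flavor = flavor
        if image_ref is not None:
            instance.image_ref = image_ref

        # Step 2: point the instance at the new host and spawn it there.
        # Id, name and volume attachments are left untouched.
        instance.host = dest
        self.local_cache[dest].add(instance.instance_id)
        return instance

    def init_host(self, host: str) -> None:
        # Step 4: when the failed host comes back, clear its stale state
        # and mark the matching migration records complete.
        self.host_up[host] = True
        for rec in self.migrations:
            if rec.source_host == host and rec.status == "pending_cleanup":
                self.local_cache[host].discard(rec.instance_id)
                rec.status = "complete"


if __name__ == "__main__":
    cloud = Cloud({"host-a": False, "host-b": True})
    vm = Instance("i-1", "web", "host-a", "m1.small", "img-1", ["vol-1"])
    cloud.local_cache["host-a"].add(vm.instance_id)

    cloud.rebuild_from_failed_host(vm, image_ref="img-2")
    print(vm.host, cloud.migrations[0].status)

    cloud.init_host("host-a")
    print(cloud.local_cache["host-a"], cloud.migrations[0].status)
```

In this sketch the new host is picked before anything is modified, so if
no healthy host is available the instance and the migration records are
left untouched. A real implementation would presumably also have to deal
with fencing the failed host and reattaching networks and volumes, which
is not modelled here.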



Note that this procedure could also be used for upgrading image versions
and changing instance flavors even when the origin host is alive, but that
is not the primary intended use case.



The question is: is this a reasonable proposal to take forward? If not, are
there any alternative procedures available to meet the requirement?



If this is a reasonable proposal, I will submit a blueprint and follow up
with an implementation.



Thanks.



--Shyam