[openstack-dev] [TripleO] Fixing Swift rings when upscaling/replacing nodes in TripleO deployments

Christian Schwede cschwede at redhat.com
Thu Jan 5 12:13:49 UTC 2017


Hello everyone,

there was an earlier discussion on $subject last year [1] regarding a
bug when upscaling or replacing nodes in TripleO [2].

In short: Swift rings are built separately on each node, so adding or
replacing nodes (or disks) breaks the rings because they are no longer
consistent across the nodes. What every node needs before changing the
rings is the previous set of ring builder files.
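To show why the previous builder files matter, here is a toy model. It is not Swift's ring-builder algorithm, which also handles replicas, weights, regions, zones and min_part_hours. It only mimics the one property that matters here: a rebalance starts from the previous assignment and moves as few partitions as possible. The device names and the round-robin placement are made up for illustration.

```python
from collections import Counter


def fresh_build(devices, partitions):
    """Toy ring built from scratch: assign partitions round-robin over devices."""
    return [devices[p % len(devices)] for p in range(partitions)]


def add_device(ring, new_dev):
    """Toy rebalance starting from the previous ring, moving as few partitions as possible."""
    devices = sorted(set(ring)) + [new_dev]
    share = len(ring) // len(devices)  # fair share per device after adding new_dev
    counts = Counter(ring)
    new_ring = list(ring)
    moved = 0
    for p, dev in enumerate(ring):
        if moved == share:
            break
        # Only take partitions from devices that hold more than their fair share.
        if counts[dev] > share:
            counts[dev] -= 1
            new_ring[p] = new_dev
            moved += 1
    return new_ring


def moved_partitions(old, new):
    """Count partitions whose device assignment changed."""
    return sum(a != b for a, b in zip(old, new))


if __name__ == "__main__":
    old = fresh_build(["d1", "d2", "d3"], 12)              # ring from the initial deployment
    existing_node = add_device(old, "d4")                  # node that still has its builder history
    new_node = fresh_build(["d1", "d2", "d3", "d4"], 12)   # node that builds from scratch
    print(existing_node == new_node)                       # False: the rings diverge
    print(moved_partitions(old, existing_node))            # 3
    print(moved_partitions(old, new_node))                 # 9
```

A node that rebalances from its builder history and a node that builds from scratch end up with different rings. If both start from the same builder files, they compute the same result.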

My earlier idea in [1] was to build the rings in advance on the
undercloud, and to use introspection data to gather the set of disks on
each node for the rings.

However, this would change the current way of deploying significantly,
and would also require more work in TripleO and Mistral (for example to
trigger a ring build on the undercloud after the nodes have been started,
but before the deployment triggers the Puppet run).

I prefer smaller steps to keep everything stable for now, and therefore
I changed my patches quite a bit. This is my updated proposal:

1. Two temporary undercloud Swift URLs (one PUT, one GET) will be
computed before Mistral starts the deployments. A new Mistral action to
create such URLs is required for this [3].
2. Before updating its set of rings locally, each overcloud node will try
to fetch the rings from the undercloud Swift using the temporary GET URL.
This guarantees that each node starts from the same set of builder
files. This happens in step 2 [4].
3. puppet-swift runs like today, updating the rings if required.
4. Finally, at the end of the deployment (in step 5), the nodes will
upload their modified rings to the undercloud using the temporary PUT
URL. swift-recon runs before this to ensure that the rings are
consistent across all nodes.

The two required patches [3][4] are not overly complex IMO, but they
solve the problem of adding or replacing nodes without significantly
changing the current workflow. It should even be easy to backport them
if needed.

I'll continue working on an improved way of deploying Swift rings (using
introspection data), but with this approach it could even be done using
today's workflow, by feeding data into puppet-swift (probably with some
updates to puppet-swift/tripleo-heat-templates to support regions,
zones, different disk layouts and the like). However, all of this could
be built on top of these two patches.

I'm curious about your thoughts and welcome any feedback or reviews!

Thanks,

-- Christian


[1] http://lists.openstack.org/pipermail/openstack-dev/2016-August/100720.html
[2] https://bugs.launchpad.net/tripleo/+bug/1609421
[3] https://review.openstack.org/#/c/413229/
[4] https://review.openstack.org/#/c/414460/