Open Stack

Thu Jan 5 14:56:15 UTC 2017

I am concerned about relying on the undercloud for overcloud Swift.
The undercloud is not HA (yet), so it may not be operational when a disk fails or a Swift overcloud node is added or deleted.

-----Original Message-----
From: Christian Schwede [mailto:cschwede at redhat.com] 
Sent: Thursday, January 05, 2017 6:14 AM
To: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [TripleO] Fixing Swift rings when upscaling/replacing nodes in TripleO deployments

Hello everyone,

there was an earlier discussion on $subject last year [1] regarding a bug when upscaling or replacing nodes in TripleO [2].

In short: Swift rings are built on each node separately, so adding or replacing nodes (or disks) breaks the rings because they are no longer consistent across the nodes. What is needed are the previous ring builder files on each node before the rings are changed.
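
To illustrate the inconsistency described above, here is a minimal sketch that compares ring file digests across nodes. The function names, node names and the in-memory dictionary layout are hypothetical and only for illustration; a real deployment would read the ring files from each node (swift-recon offers a comparable check).

```python
import hashlib
from collections import defaultdict


def ring_digest(ring_bytes):
    """Return a hex digest for the raw contents of one ring file."""
    return hashlib.sha256(ring_bytes).hexdigest()


def find_inconsistent_rings(rings_by_node):
    """Report rings whose contents differ between nodes.

    rings_by_node maps node name -> {ring file name: raw bytes}.
    Returns {ring file name: {digest: [node names]}} for every ring
    that has more than one distinct digest across the nodes.
    """
    digests = defaultdict(lambda: defaultdict(list))
    for node, rings in rings_by_node.items():
        for name, content in rings.items():
            digests[name][ring_digest(content)].append(node)
    return {
        name: dict(by_digest)
        for name, by_digest in digests.items()
        if len(by_digest) > 1
    }


# Hypothetical example: node-2 built its object ring independently.
example = {
    "node-0": {"object.ring.gz": b"ring-v1"},
    "node-1": {"object.ring.gz": b"ring-v1"},
    "node-2": {"object.ring.gz": b"ring-built-separately"},
}
inconsistent = find_inconsistent_rings(example)
```

An empty result means every node holds identical copies of each ring; any entry in the result points at the nodes that diverged.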

My former idea in [1] was to build the rings in advance on the undercloud, and also to use introspection data to gather a set of disks on each node for the rings.

However, this changes the current way of deploying significantly, and also requires more work in TripleO and Mistral (for example to trigger a ring build on the undercloud after the nodes have been started, but before the deployment triggers the Puppet run).

I prefer smaller steps to keep everything stable for now, and therefore I changed my patches quite a bit. This is my updated proposal:

1. Two temporary undercloud Swift URLs (one PUT, one GET) will be computed before Mistral starts the deployments. A new Mistral action to create such URLs is required for this [3].
2. Each overcloud node will try to fetch rings from the undercloud Swift deployment, using the temporary GET URL, before updating its set of rings locally. This guarantees that each node uses the same source set of builder files. This happens in step 2. [4]
3. puppet-swift runs like today, updating the rings if required.
4. Finally, at the end of the deployment (in step 5), the nodes will upload their modified rings to the undercloud using the temporary PUT URL. swift-recon will run before this, ensuring that all rings across all nodes are consistent.
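
As a rough sketch of step 1, this is how a pair of Swift temporary URLs can be signed by hand, following the signing scheme of Swift's tempurl middleware (an HMAC over the method, the expiry timestamp and the object path). It is not the Mistral action from [3]. The host, account, container, object name and key below are made-up placeholders, and SHA1 is assumed as the digest; a given deployment may be configured to require a different one.

```python
import hmac
import time
from hashlib import sha1


def make_temp_url(base_url, path, key, method, ttl_seconds, now=None):
    """Build a Swift temporary URL valid for one HTTP method.

    path is the object path, e.g. /v1/<account>/<container>/<object>.
    key is the temp URL key set on the account or container.
    """
    start = int(time.time()) if now is None else int(now)
    expires = start + ttl_seconds
    # The signed message is "<METHOD>\n<expires>\n<path>".
    message = "{}\n{}\n{}".format(method, expires, path).encode("utf-8")
    signature = hmac.new(key.encode("utf-8"), message, sha1).hexdigest()
    return "{}{}?temp_url_sig={}&temp_url_expires={}".format(
        base_url, path, signature, expires
    )


# Hypothetical values, for illustration only.
BASE = "https://undercloud.example:8080"
OBJECT_PATH = "/v1/AUTH_demo/overcloud-swift-rings/rings.tar.gz"
KEY = "not-a-real-key"

get_url = make_temp_url(BASE, OBJECT_PATH, KEY, "GET", 3600, now=1483628175)
put_url = make_temp_url(BASE, OBJECT_PATH, KEY, "PUT", 3600, now=1483628175)
```

Because the method is part of the signed message, the GET URL cannot be used to upload and the PUT URL cannot be used to download, which is why two separate URLs are needed.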

The two required patches [3][4] are not overly complex IMO, but they solve the problem of adding or replacing nodes without changing the current workflow significantly. It should even be easy to backport them if needed.

I'll continue working on an improved way of deploying Swift rings (using introspection data), but using this approach it could even be done using today's workflow, feeding data into puppet-swift (probably with some updates to puppet-swift/tripleo-heat-templates to allow support for regions, zones, different disk layouts and the like). However, all of this could be built on top of these two patches.

I'm curious about your thoughts and welcome any feedback or reviews!

Thanks,

-- Christian

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-August/100720.html
[2] https://bugs.launchpad.net/tripleo/+bug/1609421
[3] https://review.openstack.org/#/c/413229/
[4] https://review.openstack.org/#/c/414460/

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
