[openstack-dev] [TripleO] Fixing Swift rings when upscaling/replacing nodes in TripleO deployments

Arkady.Kanevsky at dell.com
Mon Jan 9 13:21:08 UTC 2017


Thanks

-----Original Message-----
From: Christian Schwede [mailto:cschwede at redhat.com] 
Sent: Thursday, January 05, 2017 1:29 PM
To: openstack-dev at lists.openstack.org
Subject: Re: [openstack-dev] [TripleO] Fixing Swift rings when upscaling/replacing nodes in TripleO deployments

On 05.01.2017 17:03, Steven Hardy wrote:
> On Thu, Jan 05, 2017 at 02:56:15PM +0000, Arkady.Kanevsky at dell.com wrote:
>> I am concerned about relying on the undercloud for overcloud Swift.
>> The undercloud is not HA (yet), so it may not be operational when a disk fails or an overcloud Swift node is added or deleted.
> 
> I think the proposal is only for a deploy-time dependency, after the 
> overcloud is deployed there should be no dependency on the undercloud 
> swift, because the ring data will have been copied to all the nodes.

Yes, exactly - there is no runtime dependency. The overcloud will continue to work even if the undercloud is gone.

If you "lose" the undercloud (or more precisely, the overcloud rings that are stored in the undercloud Swift), you can copy them from any overcloud node and run an update.

Even if one deletes the rings from the undercloud, the deployment will continue to work after an update - puppet-swift will simply continue to use the already existing .builder files on the nodes.

The deployment will only fail if one deletes the rings on the undercloud and then runs an update with new or replaced nodes: the swift-recon check will raise an error in step 5 because the rings on the new/replaced nodes are inconsistent. That inconsistency already occurs today (the rings are built in the same way), except that today there is no check and no warning to the operator.
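
To illustrate the kind of consistency check meant here, below is a minimal sketch in Python. It is not the swift-recon implementation; it only assumes that one has already collected an MD5 checksum per ring file from every node. The function names and the `checksums_by_node` structure are hypothetical and exist only for this example.

```python
import hashlib
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large ring files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_inconsistent_rings(checksums_by_node):
    """Return {ring_file: {checksum: [nodes]}} for rings that differ.

    `checksums_by_node` is a hypothetical structure of the form
    {node_name: {ring_filename: md5_hex}}. A ring that is missing on
    a node is recorded under the checksum None.
    """
    all_rings = set()
    for rings in checksums_by_node.values():
        all_rings.update(rings)

    inconsistent = {}
    for ring in sorted(all_rings):
        nodes_by_checksum = defaultdict(list)
        for node, rings in sorted(checksums_by_node.items()):
            nodes_by_checksum[rings.get(ring)].append(node)
        # More than one distinct checksum means the nodes disagree.
        if len(nodes_by_checksum) > 1:
            inconsistent[ring] = dict(nodes_by_checksum)
    return inconsistent
```

An empty result means every node reported the same checksum for every ring file. A new or replaced node that built its rings from scratch would show up under a different checksum (or under None if the ring is missing).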

-- Christian

> During create/update operations you need the undercloud operational by 
> definition, so I think this is probably OK?
> 
> Steve
>>
>> -----Original Message-----
>> From: Christian Schwede [mailto:cschwede at redhat.com]
>> Sent: Thursday, January 05, 2017 6:14 AM
>> To: OpenStack Development Mailing List 
>> <openstack-dev at lists.openstack.org>
>> Subject: [openstack-dev] [TripleO] Fixing Swift rings when 
>> upscaling/replacing nodes in TripleO deployments
>>
>> Hello everyone,
>>
>> there was an earlier discussion on $subject last year [1] regarding a bug when upscaling or replacing nodes in TripleO [2].
>>
>> In short: Swift rings are built on each node separately, so adding or replacing nodes (or disks) breaks the rings because they are no longer consistent across the nodes. What's needed are the previous ring builder files on each node before changing the rings.
>>
>> My former idea in [1] was to build the rings in advance on the undercloud, and also to use introspection data to gather a set of disks on each node for the rings.
>>
>> However, this changes the current way of deploying significantly, and also requires more work in TripleO and Mistral (for example to trigger a ring build on the undercloud after the nodes have been started, but before the deployment triggers the Puppet run).
>>
>> I prefer smaller steps to keep everything stable for now, and therefore I changed my patches quite a bit. This is my updated proposal:
>>
>> 1. Two temporary undercloud Swift URLs (one PUT, one GET) will be computed before Mistral starts the deployments. A new Mistral action to create such URLs is required for this [3].
>> 2. Each overcloud node will try to fetch rings from the undercloud Swift deployment, using the temporary GET URL, before updating its set of rings locally. This guarantees that each node uses the same source set of builder files. This happens in step 2. [4]
>> 3. puppet-swift runs like today, updating the rings if required.
>> 4. Finally, at the end of the deployment (in step 5) the nodes will upload their modified rings to the undercloud using the temporary PUT URL. swift-recon will run before this, ensuring that all rings across all nodes are consistent.
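
For readers not familiar with Swift temporary URLs (step 1 above), here is a rough sketch of how such a URL is signed by Swift's TempURL middleware, using the HMAC-SHA1 scheme. This is not the Mistral action from [3]; the function name, the host address, the container and object names, and the key in the usage note are assumptions made up for illustration.

```python
import hmac
import time
from hashlib import sha1


def make_temp_url(base_url, object_path, key, method, ttl_seconds, now=None):
    """Build a Swift TempURL for `method` on `object_path`.

    `object_path` is the full object path, e.g.
    /v1/AUTH_account/container/object (hypothetical names).
    `key` is the account's or container's temp URL key.
    `now` can be passed in to make the result reproducible.
    """
    now = int(time.time()) if now is None else int(now)
    expires = now + ttl_seconds
    # The middleware signs "<METHOD>\n<expires>\n<path>".
    hmac_body = "{}\n{}\n{}".format(method.upper(), expires, object_path)
    signature = hmac.new(
        key.encode("utf-8"), hmac_body.encode("utf-8"), sha1
    ).hexdigest()
    return "{}{}?temp_url_sig={}&temp_url_expires={}".format(
        base_url.rstrip("/"), object_path, signature, expires
    )
```

Because the HTTP method is part of the signed body, a GET URL cannot be used for a PUT, which is why the proposal needs two URLs. With the hypothetical values mentioned above, the pair could be created like this: `make_temp_url("http://192.0.2.1:8080", "/v1/AUTH_test/overcloud-swift-rings/rings.tar.gz", "secret", "GET", 3600)` and the same call with `"PUT"`.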
>>
>> The two required patches [3][4] are not overly complex IMO, but they solve the problem of adding or replacing nodes without changing the current workflow significantly. It should even be easy to backport them if needed.
>>
>> I'll continue working on an improved way of deploying Swift rings (using introspection data), but with this approach that could even be done using today's workflow, feeding data into puppet-swift (probably with some updates to puppet-swift/tripleo-heat-templates to allow support for regions, zones, different disk layouts and the like). In any case, all of this could be built on top of these two patches.
>>
>> I'm curious about your thoughts and welcome any feedback or reviews!
>>
>> Thanks,
>>
>> -- Christian
>>
>>
>> [1] http://lists.openstack.org/pipermail/openstack-dev/2016-August/100720.html
>> [2] https://bugs.launchpad.net/tripleo/+bug/1609421
>> [3] https://review.openstack.org/#/c/413229/
>> [4] https://review.openstack.org/#/c/414460/
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> 

