[openstack-dev] [TripleO] Improving Swift deployments with TripleO

Christian Schwede cschwede at redhat.com
Wed Aug 3 14:18:30 UTC 2016


Thanks Steven for your feedback! Please see my answers inline.

On 02.08.16 23:46, Steven Hardy wrote:
> On Tue, Aug 02, 2016 at 09:36:45PM +0200, Christian Schwede wrote:
>> Hello everyone,
>>
>> I'd like to improve the Swift deployments done by TripleO. There are a
>> few problems today when deployed with the current defaults:
> 
> Thanks for digging into this, I'm aware this has been something of a
> known-issue for some time, so it's great to see it getting addressed :)
> 
> Some comments inline;
> 
>> 1. Adding new nodes (or replacing existing nodes) is not possible,
>> because the rings are built locally on each host and a new node doesn't
>> know about the "history" of the rings. Therefore the rings might diverge
>> between the nodes, which eventually results in an unusable state.
>>
>> 2. The rings only use a single device, and it seems that this is just a
>> directory and not a mountpoint with a real device. Therefore data is
>> stored on the root device - even if you have 100 TB of disk space
>> available. If not fixed manually, your root device will eventually run
>> out of space.
>>
>> 3. Even if a real disk is mounted in /srv/node, replacing a faulty disk
>> is much more troublesome. Normally you would simply unmount a disk and
>> then replace it sometime later. But because mount_check is set to False
>> on the storage servers, data will be written to the root device in the
>> meantime; and when you finally mount the disk again, you can't simply
>> clean up.
>>
>> 4. In general, it's not possible to change the cluster layout (using
>> different zones/regions/partition power/device weights, or slowly adding
>> new devices to avoid having 25% of the data moved immediately when adding
>> new nodes to a small cluster, ...). You could manage your rings manually,
>> but they will eventually be overwritten when updating your overcloud.
>>
>> 5. Missing erasure coding support (or storage policies in general)
>>
>> This sounds bad; however, most of the current issues can be fixed using
>> customized templates and some tooling to create the rings in advance on
>> the undercloud node.
>>
>> The information about all the devices can be collected from the
>> introspection data, and by using node placement the node names in the
>> rings are known in advance, even if the nodes are not yet powered on.
>> This ensures a consistent ring state, and an operator can modify the
>> rings if needed to customize the cluster layout.
>>
>> Using some customized templates we can already do the following:
>> - disable ringbuilding on the nodes
>> - create filesystems on the extra blockdevices
>> - copy ringfiles from the undercloud, using pre-built rings
>> - enable mount_check by default
>> - (define storage policies if needed)
>>
>> I started working on a POC using tripleo-quickstart, some custom
>> templates and a small Python tool to build rings based on the
>> introspection data:
>>
>> https://github.com/cschwede/tripleo-swift-ring-tool
>>
>> I'd like to get some feedback on the tool and templates.
>>
>> - Does this make sense to you?
> 
> Yes, I think the basic workflow described should work, and it's good to see
> that you're passing the ring data via swift as this is consistent with how
> we already pass some data to nodes via our DeployArtifacts interface:
> 
> https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/deploy-artifacts.yaml
> 
> Note however that there are no credentials to access the undercloud swift
> on the nodes, so you'll need to pass a tempurl reference in (which is what
> we do for deploy artifacts, obviously you will have credentials to create
> the container & tempurl on the undercloud).

Ah, that's very useful! I updated my POC; that means one less customized
template and less code to support in the Python tool. Works as expected!
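
For reference, the upload part of the tool now does roughly the following
(just a sketch using python-swiftclient; the container name, temp-url key
and filename are only placeholders from my POC):

from swiftclient.utils import generate_temp_url


def upload_rings(conn, tarball='swift-rings.tar.gz',
                 container='overcloud-artifacts',
                 key='change-me-temp-url-key'):
    """Upload the ring tarball to the undercloud Swift, return a temp URL.

    conn is an authenticated swiftclient.client.Connection to the
    undercloud Swift.
    """
    # Make sure the container exists and the account has a temp-url key
    conn.put_container(container)
    conn.post_account(headers={'x-account-meta-temp-url-key': key})

    # Upload the tarball containing the account/container/object rings
    with open(tarball, 'rb') as f:
        conn.put_object(container, tarball, f)

    # The overcloud nodes have no credentials for the undercloud Swift,
    # so hand them a temp URL instead (valid for 24 hours here)
    storage_url, _ = conn.get_auth()
    host, account = storage_url.split('/v1/', 1)
    path = '/v1/%s/%s/%s' % (account, container, tarball)
    return host + generate_temp_url(path, 24 * 3600, key, 'GET')

The returned URL then goes into DeployArtifactURLs, the same way the
upload-swift-artifacts helper does it.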

> One slight concern I have is mandating the use of predictable placement -
> it'd be nice to think about ways we might avoid that but the undercloud
> centric approach seems OK for a first pass (in either case I think the
> delivery via swift will be the same).

Do you mean the predictable artifact filename? We could just add a
randomized prefix to the filename IMO.
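
Something like this should be enough (not in the POC yet, just an idea):

import uuid

object_name = 'swift-rings-%s.tar.gz' % uuid.uuid4().hex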

>> - How (and where) could we integrate this upstream?
> 
> So I think the DeployArtifacts interface may work for this, and we have a
> helper script that can upload data to swift:
> 
> https://github.com/openstack/tripleo-common/blob/master/scripts/upload-swift-artifacts
> 
> This basically pushes a tarball to swift, creates a tempurl, then creates a
> file ($HOME/.tripleo/environments/deployment-artifacts.yaml) which is
> automatically read by tripleoclient on deployment.
> 
> DeployArtifactURLs is already a list, but we'll need to test and confirm we
> can pass both e.g. swift ring data and updated puppet modules at the same
> time.

If I see this correctly, the artifacts are deployed just before Puppet
runs, and the Swift rings don't affect the Puppet modules, so that should
be fine? At least it's working in my tests this morning.

> The part that actually builds the rings on the undercloud will probably
> need to be created as a custom mistral action:
> 
> https://github.com/openstack/tripleo-common/tree/master/tripleo_common/actions
> 
> These are then driven as part of the deployment workflow (although the
> final workflow where this will wire in hasn't yet landed):
> 
> https://review.openstack.org/#/c/298732/

Alright, I'll have a look at how to integrate this.
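
To make sure I understand the interface correctly, here is a very rough
sketch of how such an action could look (the class name, parameters and
example values are made up; the real version would read the device list
from the introspection data and handle all rings, not just the object one):

from mistral.actions import base
from swift.common.ring import RingBuilder


class BuildSwiftRingsAction(base.Action):
    """Build the Swift object ring on the undercloud.

    devices is a list of dicts like
    {'ip': '192.168.24.10', 'device': 'sdb', 'weight': 100}.
    """

    def __init__(self, devices, part_power=10, replicas=3,
                 min_part_hours=1):
        self.devices = devices
        self.part_power = part_power
        self.replicas = replicas
        self.min_part_hours = min_part_hours

    def run(self):
        builder = RingBuilder(self.part_power, self.replicas,
                              self.min_part_hours)
        for dev in self.devices:
            builder.add_dev({'region': dev.get('region', 1),
                             'zone': dev.get('zone', 1),
                             'ip': dev['ip'],
                             'port': 6000,  # object server port
                             'device': dev['device'],
                             'weight': dev['weight']})
        builder.rebalance()
        builder.save('object.builder')
        builder.get_ring().save('object.ring.gz')
        # The .ring.gz files would then be packed into the tarball that
        # gets uploaded to the undercloud Swift (see above).

Keeping the .builder files around on the undercloud is what makes later
changes (new nodes, different weights) safe, because the rings are no
longer rebuilt from scratch on every node.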

>> - Templates might be included in tripleo-heat-templates?
> 
> Yes, although by the look of it there may be few template changes required.
> 
> If you want to remove the current ringbuilder puppet step completely, you
> can simply remove OS::TripleO::Services::SwiftRingBuilder from the
> ControllerServices/ObjectStorageServices list:
> 
> https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.yaml#L393
> https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.yaml#L492
> 
> Or, map the current implementation to OS::Heat::None:
> 
> cat no_ringbuild_env.yaml
> 
> resource_registry:
>   OS::TripleO::Services::SwiftRingBuilder: OS::Heat::None

That would only work on current master, right? Setting RingBuild to
False would make it possible to use that on Mitaka too.

> Obviously this same approach could be used to easily map in an alternative
> template (replacing puppet/services/swift-ringbuilder.yaml) but it sounds
> like the primary integration point here will be on the undercloud?

Indeed, the action itself should happen on the undercloud.

>> IMO the most important change would be to avoid overwriting rings on the
>> overcloud. There is a good chance of messing up your cluster if the
>> template to disable ring building isn't used and you already have
>> working rings in place. The same goes for the mount_check option.
>>
>> I'm curious about your thoughts!
> 
> This all sounds pretty good - I'd be pleased if you could raise some bugs
> (either one, or one per logical issue, your choice), and let me know asap
> if this is something you're likely to be trying to land for Newton; clearly
> time is running out and we'll have to prioritize already very overloaded
> reviewer resources, but this is clearly an important thing to fix.

I created only a single bug; I think the topics are closely tied to each
other, so IMO it makes sense to have a single reference for them:

https://bugs.launchpad.net/tripleo/+bug/1609421

I would be happy to see some improvements land in the Newton release,
but I fully understand the tight schedule for this. So my idea would be
to submit a few patches to make this easily consumable:

- add a loopback device to the gate nodes to enable testing
- add a tht template to partition all blockdevices except the root devices
- submit a tripleo-common patch for the ring building script
- submit a tripleo-docs patch describing how to use this with a customized env
- eventually make the switch to the new workflow

This sounds doable to me; what do you think? Please let me know if there
is a better way to handle this, otherwise I'll start hacking on it :)

-- Christian


