[openstack-dev] [TripleO] Improving Swift deployments with TripleO

Steven Hardy shardy at redhat.com
Tue Aug 2 21:46:38 UTC 2016


On Tue, Aug 02, 2016 at 09:36:45PM +0200, Christian Schwede wrote:
> Hello everyone,
> 
> I'd like to improve the Swift deployments done by TripleO. There are a
> few problems today when deployed with the current defaults:

Thanks for digging into this - I'm aware this has been something of a
known issue for some time, so it's great to see it getting addressed :)

Some comments inline:

> 1. Adding new nodes (or replacing existing nodes) is not possible,
> because the rings are built locally on each host and a new node doesn't
> know about the "history" of the rings. Therefore the rings might diverge
> across nodes, which eventually results in an unusable state.
> 
> 2. The rings only use a single device, and it seems that this is just a
> directory and not a mountpoint backed by a real device. Therefore data
> is stored on the root device - even if you have 100TB of disk space
> attached. If not fixed manually, your root device will eventually run
> out of space.
> 
> 3. Even if a real disk is mounted in /srv/node, replacing a faulty disk
> is much more troublesome than it needs to be. Normally you would simply
> unmount a faulty disk and replace it sometime later. But because
> mount_check is set to False on the storage servers, data will be written
> to the root device in the meantime; and when you finally mount the new
> disk, you can't simply clean up (see the config sketch below).
> 
> 4. In general, it's not possible to change the cluster layout (using
> different zones/regions/partition powers/device weights, slowly adding
> new devices to avoid moving 25% of the data immediately when adding a
> node to a small cluster, ...). You could manage your rings manually
> (see the ring-builder sketch below), but they will eventually be
> overwritten when updating your overcloud.
> 
> 5. Missing erasure coding support (or storage policies in general)
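> 
> To make points 3 and 4 concrete: the mount_check option lives in the
> storage server configs (object-server.conf, and likewise for the
> account/container servers). A minimal sketch, using Swift's default
> paths:
> 
>   [DEFAULT]
>   devices = /srv/node
>   # refuse requests for devices that aren't mounted, instead of
>   # silently writing to the root filesystem under the mountpoint
>   mount_check = true
> 
> And slowly adding a new device is exactly the kind of thing an
> operator would do by hand with the ring builder, for example (names,
> addresses and weights are placeholders):
> 
>   swift-ring-builder object.builder add r1z1-192.0.2.10:6000/sdb 10
>   swift-ring-builder object.builder rebalance
>   # ... wait for replication to settle, then raise the weight ...
>   swift-ring-builder object.builder set_weight r1z1-192.0.2.10:6000/sdb 50
>   swift-ring-builder object.builder rebalance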
> 
> This sounds bad; however, most of the current issues can be fixed using
> customized templates and some tooling to create the rings in advance on
> the undercloud node.
> 
> The information about all the devices can be collected from the
> introspection data, and by using node placement the node names in the
> rings are known in advance, even if the nodes are not yet powered on.
> This ensures a consistent ring state, and an operator can modify the
> rings if needed to customize the cluster layout.
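> 
> (Node placement here refers to TripleO's predictable placement via
> scheduler hints; as a sketch, an environment like
> 
>   parameter_defaults:
>     ControllerSchedulerHints:
>       'capabilities:node': 'controller-%index%'
>     ObjectStorageSchedulerHints:
>       'capabilities:node': 'objectstorage-%index%'
> 
> combined with matching "node:" capabilities on the Ironic nodes pins
> each hostname to a known node in advance.)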
> 
> Using some customized templates we can already do the following:
> - disable ringbuilding on the nodes
> - create filesystems on the extra blockdevices
> - copy ringfiles from the undercloud, using pre-built rings
> - enable mount_check by default (see the sketch below)
> - (define storage policies if needed)
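> 
> For example, enabling mount_check could be done with an environment
> along these lines (the hiera key is an assumption based on
> puppet-swift, so treat this as a sketch):
> 
>   parameter_defaults:
>     ExtraConfig:
>       swift::storage::all::mount_check: true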
> 
> I started working on a POC using tripleo-quickstart, some custom
> templates and a small Python tool to build rings based on the
> introspection data:
> 
> https://github.com/cschwede/tripleo-swift-ring-tool
> 
> I'd like to get some feedback on the tool and templates.
> 
> - Does this make sense to you?

Yes, I think the basic workflow described should work, and it's good to see
that you're passing the ring data via swift, as this is consistent with how
we already pass some data to nodes via our DeployArtifacts interface:

https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/deploy-artifacts.yaml

Note, however, that the nodes have no credentials to access the undercloud
swift, so you'll need to pass a tempurl reference in (which is what we do
for deploy artifacts; obviously you will have credentials to create the
container & tempurl on the undercloud).
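
Creating the key and tempurl on the undercloud is straightforward;
roughly (key, container and object names are placeholders):

  swift post -m "Temp-URL-Key:s3cr3t"
  swift upload overcloud-rings rings.tar.gz
  swift tempurl GET 86400 /v1/AUTH_<tenant>/overcloud-rings/rings.tar.gz s3cr3t

The last command prints a signed path; prefixing it with the undercloud
object-store endpoint gives the URL to pass in.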

One slight concern I have is mandating the use of predictable placement -
it'd be nice to think about ways we might avoid that, but the
undercloud-centric approach seems OK for a first pass (in either case I
think the delivery via swift will be the same).

> - How (and where) could we integrate this upstream?

So I think the DeployArtifacts interface may work for this, and we have a
helper script that can upload data to swift:

https://github.com/openstack/tripleo-common/blob/master/scripts/upload-swift-artifacts

This basically pushes a tarball to swift, creates a tempurl, then creates a
file ($HOME/.tripleo/environments/deployment-artifacts.yaml) which is
automatically read by tripleoclient on deployment.

DeployArtifactURLs is already a list, but we'll need to test and confirm
that we can pass both e.g. swift ring data and updated puppet modules at
the same time.
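
The generated environment could then carry both artifacts, something
like this (tempurls abbreviated):

  parameter_defaults:
    DeployArtifactURLs:
      - 'http://192.0.2.1:8080/v1/AUTH_.../swift-rings.tar.gz?temp_url_sig=...&temp_url_expires=...'
      - 'http://192.0.2.1:8080/v1/AUTH_.../puppet-modules.tar.gz?temp_url_sig=...&temp_url_expires=...'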

The part that actually builds the rings on the undercloud will probably
need to be created as a custom mistral action:

https://github.com/openstack/tripleo-common/tree/master/tripleo_common/actions

These are then driven as part of the deployment workflow (although the
final workflow that this will wire into hasn't yet landed):

https://review.openstack.org/#/c/298732/

> - Templates might be included in tripleo-heat-templates?

Yes, although by the look of it only a few template changes should be
required.

If you want to remove the current ringbuilder puppet step completely, you
can simply remove OS::TripleO::Services::SwiftRingBuilder from the
ControllerServices/ObjectStorageServices list:

https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.yaml#L393

https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.yaml#L492

Or, map the current implementation to OS::Heat::None:

  $ cat no_ringbuild_env.yaml
  resource_registry:
    OS::TripleO::Services::SwiftRingBuilder: OS::Heat::None
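
and include it at deploy time:

  $ openstack overcloud deploy --templates -e no_ringbuild_env.yaml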

Obviously this same approach could be used to easily map in an alternative
template (replacing puppet/services/swift-ringbuilder.yaml), but it sounds
like the primary integration point here will be on the undercloud?
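
For completeness, mapping in a replacement would look like this (with
my-swift-ringbuilder.yaml as a hypothetical substitute template):

  resource_registry:
    OS::TripleO::Services::SwiftRingBuilder: /path/to/my-swift-ringbuilder.yaml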

> IMO the most important change would be to avoid overwriting rings on the
> overcloud. There is a good chance of messing up your cluster if the
> template to disable ring building isn't used and you already have
> working rings in place. The same goes for the mount_check option.
> 
> I'm curious about your thoughts!

This all sounds pretty good - I'd be pleased if you could raise some bugs
(either one, or one per logical issue, your choice). Also, please let me
know ASAP if this is something you're likely to try to land for Newton;
time is running out and we'll have to prioritize already very overloaded
reviewer resources, but this is clearly an important thing to fix.

Thanks!

Steve


