[openstack-dev] [TripleO] [Ironic] [Cinder] "Baremetal volumes" -- how to model direct attached storage

Robert Collins robertc at robertcollins.net
Thu Nov 13 09:25:55 UTC 2014


Back in the day, before the ephemeral hack (though that is something
folk have said they would like for libvirt too, so it's not such a hack
per se), this was broadly sketched out. We spoke with the Cinder PTL at
the time, in Portland, from memory.

There was no spec, so here is my brain-dumpy-recollection...

- actual volumes are a poor match because we wouldn't be running
cinder-volume on an ongoing basis, so service records would accumulate,
etc.
- we'd need cross-service scheduler support to make Cinder operations
line up with allocated bare metal nodes (e.g. to make sure both our data
volume and our golden-image volume are scheduled to the same machine).

- folk want to be able to do fairly arbitrary RAID (and JBOD) setups, and
that affects scheduling as well. One way to handle it is to have Ironic
export capabilities and specify the actual RAID setup via matching
flavors - this is the direction the ephemeral work took us, and it
extends conceptually straightforwardly to RAID. We did talk about doing
a little JSON schema to describe RAID / volume layouts, which Cinder
could potentially use for user-defined volume flavors too; a purely
illustrative sketch of that follows below.
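
To make the layout-description idea concrete, something along these lines
is what I have in mind - nothing like this was ever written down, so every
field name below is made up (shown here as a Python literal):

    example_layout = {
        "disks": ["sda", "sdb", "sdc", "sdd"],
        "volumes": [
            # mirrored pair for the image / root filesystem
            {"name": "root", "raid_level": "1",
             "members": ["sda", "sdb"], "size_gb": 40},
            # striped pair for the preserved data
            {"name": "data", "raid_level": "0",
             "members": ["sdc", "sdd"], "size_gb": "rest"},
        ],
    }

Ironic could advertise which layouts a node can satisfy as capabilities,
and a flavor could carry (or reference) a document like this in its extra
specs so the scheduler can match them up.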

One thing I think is missing from your description relates to this:

"To be clear, in TripleO, we need a way to keep the data on a local
direct attached storage device while deploying a new image to the box."

I think we need to be able to do this with a single drive shared between
image and data - dedicating one disk to the image and one to data would
waste a substantial amount of space given the size of disks these days
(and for some form factors, like Moonshot, it would rule out using them
at all).

Of course, being able to do entirely network-stored golden images might
be something some deployments want, but we can't require them all to do
that ;)

-Rob



On 13 November 2014 11:30, Clint Byrum <clint at fewbar.com> wrote:
> At each summit since we created "preserve ephemeral" mode in Nova, I have
> had conversations where at least one person's brain breaks for a second.
> There isn't always alcohol involved beforehand; there is almost certainly
> always a drink needed afterwards. The very term is vexing, and I think we
> have done ourselves a disservice by having it, even if it was the best
> option at the time.
>
> To be clear, in TripleO, we need a way to keep the data on a local
> direct attached storage device while deploying a new image to the box.
> If we were on VMs, we'd attach volumes, and just deploy new VMs and move
> the volume over. If we had a SAN, we'd just move the LUNs. But at some
> point when you deploy a cloud you're holding data that is expensive to
> replicate all at once, and so you'd rather just keep using the same
> server instead of trying to move the data.
>
> Since we don't have baremetal Cinder, we had to come up with a way to
> do this, so we used Nova rebuild and slipped it a special flag that says
> "don't overwrite the partition you'd normally make the 'ephemeral'
> partition". This works fine, but it is confusing and limiting. We'd like
> something better.
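
(As an aside for anyone following along: that "special flag" is the
preserve_ephemeral option on the rebuild API. With python-novaclient it
looks roughly like the sketch below - the server name and image id are
placeholders, and credentials are assumed to sit in the usual OS_*
environment variables:

    import os
    from novaclient import client

    nova = client.Client("2",
                         os.environ["OS_USERNAME"],
                         os.environ["OS_PASSWORD"],
                         os.environ["OS_TENANT_NAME"],
                         os.environ["OS_AUTH_URL"])
    server = nova.servers.find(name="overcloud-controller-0")  # example name
    new_image = "<glance image uuid>"  # placeholder
    # Redeploy the image but leave the 'ephemeral' partition untouched.
    nova.servers.rebuild(server, new_image, preserve_ephemeral=True)

The CLI equivalent is, if memory serves, `nova rebuild --preserve-ephemeral
<server> <image>`.)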
>
> I had an interesting discussion with Devananda in which he suggested an
> alternative approach. If we were to bring up cinder-volume on our deploy
> ramdisks, and configure it in such a way that it claimed ownership of
> the section of disk we'd like to preserve, then we could allocate that
> storage as a volume. From there, we could boot from volume, or "attach"
> the volume to the instance (which would really just tell us how to find
> the volume). When we want to write a new image, we can just delete the old
> instance and create a new one, scheduled to wherever that volume already
> is. This would require the Nova scheduler to have a filter available
> where we could select a host by the volumes it has, so we can make sure to
> send the instance request back to the box that still has all of the data.
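
(A minimal sketch of the kind of filter being described here, assuming -
purely for illustration - that whoever creates the instance passes the
host that currently holds the preserved volume as a scheduler hint; the
hint name, and trusting a client-supplied hint rather than asking Cinder
where the volume lives, are both assumptions:

    from nova.scheduler import filters

    class PreservedVolumeHostFilter(filters.BaseHostFilter):
        """Only pass the host that already holds the preserved data volume."""

        def host_passes(self, host_state, filter_properties):
            hints = filter_properties.get('scheduler_hints') or {}
            wanted_host = hints.get('preserved_volume_host')
            if not wanted_host:
                # No hint supplied: don't restrict scheduling at all.
                return True
            # A real implementation would ask Cinder which host owns the
            # volume instead of trusting a client-supplied hint.
            return host_state.host == wanted_host

Booting with something like `nova boot --hint preserved_volume_host=<host>
...` would then let only that one node pass.)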
>
> Alternatively, we can keep using rebuild, but let the volume model the
> preservation rather than our special case.
>
> Thoughts? Suggestions? I feel like this might take some time, but it is
> necessary to consider it now so we can drive any work we need to get it
> done soon.
>



-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


