[OpenStack-DefCore] Getting DefCore certificate for clouds that instances can only boot from volume

Zhipeng Huang zhipengh512 at gmail.com
Thu Sep 15 03:45:43 UTC 2016


Thanks to Chris and Monty for the great conversation on this issue.

As you both pointed out, one of the problems is that we lack good enough
abstractions for the backends.

What could we do to solve this problem? Should we consider pushing a
DefCore workaround to handle the short-term problem, and discuss a
long-term fix with the Glance team at the Barcelona summit? A rough
sketch of what such a workaround could look like follows.
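
For illustration only (plain unittest; the feature flag is hypothetical,
not an existing Tempest option):

    import unittest

    # Would come from the cloud's test configuration; hypothetical flag.
    BOOT_FROM_IMAGE = False

    class ListServerFiltersTest(unittest.TestCase):

        @unittest.skipUnless(BOOT_FROM_IMAGE,
                             'cloud only supports boot from volume')
        def test_list_servers_filter_by_image(self):
            # The image-based filtering checks would run here only on
            # clouds that can actually boot from an image.
            pass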

On Wed, Sep 14, 2016 at 8:44 PM, Monty Taylor <mordred at inaugust.com> wrote:

> On 09/14/2016 12:30 AM, Chris Hoge wrote:
> > I’ve been thinking quite a bit about your response Monty, and have
> > some observations and suggestions below.
> >
> >> On Sep 2, 2016, at 8:30 AM, Monty Taylor <mordred at inaugust.com> wrote:
> >>
> >> On 09/02/2016 01:47 AM, Zhenyu Zheng wrote:
> >>> Hi All,
> >>>
> >>> We are deploying a public cloud platform based on OpenStack in the EU.
> >>> We are now working on the DefCore certificate for it and have met some
> >>> problems.
> >>>
> >>> OpenStack (Nova) supports both "boot from Image" and "boot from Volume"
> >>> when launching instances. For large-scale commercial deployments such
> >>> as a public cloud, the reliability of the service is considered the
> >>> key factor.
> >>>
> >>> With "boot from Image" we can have two kinds of deployments: 1.
> >>> nova-compute with no shared storage backend; 2. nova-compute with a
> >>> shared storage backend. In case 1, the system disk created from the
> >>> image lands on the local disk of the host that nova-compute runs on;
> >>> the reliability of the user data is considered low, and it is very
> >>> hard to manage this large number of disks spread across hosts all
> >>> over the deployment, so it can be considered not commercially ready
> >>> for large-scale deployments. In case 2, the reliability and management
> >>> problems are solved, but a new problem is introduced: resource usage
> >>> and capacity tracking become incorrect. This has been a known issue
> >>> [1] in Nova for a long time, and the Nova team is trying to solve it
> >>> by introducing a new "resource provider" architecture [2]. That new
> >>> architecture will need a few releases to be fully functional, so case
> >>> 2 is also considered not commercially ready.
> >>>
> >>> For the reasons listed above, we have chosen "boot from Volume" as
> >>> the only way of booting instances in our public cloud. By doing this,
> >>> we overcome the above-mentioned cons and get other benefits such as:
> >>>
> >>> Resiliency - Cloud block storage is a persistent volume; users can
> >>> retain it after the server is deleted and then use it to create a new
> >>> server.
> >>> Flexibility - Users can control the size and type (SSD or SATA) of
> >>> the volume used to boot the server, letting them fine-tune the
> >>> storage to the needs of their operating system or application.
> >>> Improvements in managing and recovering from server outages.
> >>> Unified volume management.
> >>>
> >>> Supporting only "boot from Volume" brings us problems when pursuing
> >>> the DefCore certificate:
> >>>
> >>> We have tests that try to get the instance list filtered by
> >>> "image_id", which is not set for volume-booted instances:
> >>>
> >>> tempest.api.compute.servers.test_create_server.ServersTestJSON.test_verify_server_details
> >>> tempest.api.compute.servers.test_create_server.ServersTestManualDisk.test_verify_server_details
> >>> tempest.api.compute.servers.test_list_server_filters.ListServerFiltersTestJSON.test_list_servers_detailed_filter_by_image
> >>> tempest.api.compute.servers.test_list_server_filters.ListServerFiltersTestJSON.test_list_servers_filter_by_image
> >>>
> >>>    - The detailed information for instances booted from volumes does
> >>> not contain the image_id, so the test cases that filter instances by
> >>> image ID cannot pass; a sketch of the failure follows below.
> >>>
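> >>> For example, a minimal sketch of the failure (openstacksdk-style
> >>> code; cloud name and IDs are illustrative):
> >>>
> >>>     import openstack
> >>>
> >>>     conn = openstack.connect(cloud='mycloud')
> >>>
> >>>     # For a volume-booted server the API returns an empty "image"
> >>>     # field, so there is no image_id to match on.
> >>>     server = conn.compute.get_server('my-server-id')
> >>>     print(server.image)  # empty for volume-booted instances
> >>>
> >>>     # Filtering the server list by image therefore never matches
> >>>     # such servers, which is what breaks the tests above.
> >>>     matches = list(conn.compute.servers(image='my-image-id'))
> >>>     assert server.id not in [s.id for s in matches]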
> >>>
> >>> We also have tests like this:
> >>>
> >>> tempest.api.compute.images.test_images.ImagesTestJSON.test_delete_saving_image
> >>>
> >>>    - This test creates an image (snapshot) of an instance and deletes
> >>> it while the image status is "saving". For instances booted from
> >>> images, the snapshot status flow is queued->saving->active. But for
> >>> instances booted from volumes, the instance snapshot is actually a
> >>> volume snapshot done by Cinder; the image saved in Glance only holds
> >>> a link to the created Cinder volume snapshot, and the image status
> >>> changes directly to "active". Since the logic in this test waits for
> >>> the Glance image status to become "saving", it cannot pass for
> >>> volume-booted instances; see the illustration below.
> >>>
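> >>> A rough illustration of the difference (simple polling sketch, not
> >>> actual Tempest code; names are illustrative):
> >>>
> >>>     import time
> >>>
> >>>     import openstack
> >>>
> >>>     conn = openstack.connect(cloud='mycloud')
> >>>
> >>>     def wait_for_image_status(image_id, status, timeout=60):
> >>>         # Poll Glance until the image reaches the given status.
> >>>         deadline = time.time() + timeout
> >>>         while time.time() < deadline:
> >>>             if conn.image.get_image(image_id).status == status:
> >>>                 return True
> >>>             time.sleep(1)
> >>>         return False
> >>>
> >>>     # Image-booted instance: queued -> saving -> active, so waiting
> >>>     # for "saving" succeeds. Volume-booted instance: queued -> active
> >>>     # (Cinder takes the snapshot), so this wait times out:
> >>>     wait_for_image_status('my-snapshot-image-id', 'saving')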
> >>>
> >>> Also:
> >>>
> >>> test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments
> >>>
> >>>    - This test attaches one volume to an instance and then counts the
> >>> attachments for that instance; the expected count is hardcoded to 1.
> >>> For volume-booted instances the system disk is already an attachment,
> >>> so the actual count is 2 and the test fails; see the sketch below.
> >>>
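> >>> A sketch of why the hardcoded count fails (openstacksdk-style code;
> >>> IDs are illustrative):
> >>>
> >>>     import openstack
> >>>
> >>>     conn = openstack.connect(cloud='mycloud')
> >>>     server = conn.compute.get_server('my-server-id')
> >>>
> >>>     attachments = list(conn.compute.volume_attachments(server))
> >>>     # Image-booted server with one data volume: 1 attachment.
> >>>     # Volume-booted server: the root disk is also an attachment,
> >>>     # so the count is 2 and "assert len(attachments) == 1" fails.
> >>>     # Checking for the specific volume works for both cases:
> >>>     assert 'my-data-volume-id' in [a.volume_id for a in attachments]
> >>>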
> >>> And finally:
> >>>
> >>> tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_rebuild_server
> >>> tempest.api.compute.servers.test_servers_negative.ServersNegativeTestJSON.test_rebuild_deleted_server
> >>> tempest.api.compute.servers.test_servers_negative.ServersNegativeTestJSON.test_rebuild_non_existent_server
> >>>
> >>>    - The rebuild action is not supported when the instance is booted
> >>> from a volume; a sketch of the call follows below.
> >>>
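> >>> A sketch of the failing call (raw REST, since rebuild is a server
> >>> action; endpoint and token are illustrative):
> >>>
> >>>     import requests
> >>>
> >>>     # Rebuild is POST /v2.1/servers/{id}/action with a "rebuild" body.
> >>>     resp = requests.post(
> >>>         'https://compute.example.com/v2.1/servers/my-server-id/action',
> >>>         headers={'X-Auth-Token': 'my-token'},
> >>>         json={'rebuild': {'imageRef': 'my-image-id'}})
> >>>     # Per the above, this fails for volume-booted servers, so the
> >>>     # rebuild tests cannot pass on this cloud.
> >>>     print(resp.status_code)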
> >>>
> >>> None of the tests mentioned above are friendly to "boot from Volume"
> >>> instances. We hope we can have some workarounds for them, as the
> >>> problems we have with "boot from Image" really stop us from using it,
> >>> and it would also be good for DefCore to figure out how to deal with
> >>> these two types of instance creation.
> >>>
> >>>
> >>> References:
> >>> [1] Bugs related to resource usage reporting and calculation:
> >>>
> >>> * Hypervisor summary shows incorrect total storage (Ceph)
> >>>  https://bugs.launchpad.net/nova/+bug/1387812
> >>> * rbd backend reports wrong 'local_gb_used' for compute node
> >>>  https://bugs.launchpad.net/nova/+bug/1493760
> >>> * nova hypervisor-stats shows wrong disk usage with shared storage
> >>>  https://bugs.launchpad.net/nova/+bug/1414432
> >>> * report disk consumption incorrect in nova-compute
> >>>  https://bugs.launchpad.net/nova/+bug/1315988
> >>> * VMWare: available disk spaces(hypervisor-list) only based on a single
> >>>  datastore instead of all available datastores from cluster
> >>>  https://bugs.launchpad.net/nova/+bug/1347039
> >>>
> >>> [2] BP about solving resource usage reporting and calculation with a
> >>> generic resource pool (resource provider):
> >>>
> >>> https://git.openstack.org/cgit/openstack/nova-specs/tree/specs/newton/approved/generic-resource-pools.rst
> >>
> >> I can totally understand why you would value boot from volume. It has
> >> a bunch of great features, as you mention.
> >>
> >> However, running a cloud that disables boot from image is a niche choice
> >> and I do not think that we should allow such a cloud to be considered
> >> "normal". If I were to encounter such a cloud, based on the workloads I
> >> currently run in 10 other public OpenStack clouds, I would consider it
> >> broken - and none of my automation that has been built based on how
> >> OpenStack clouds work consistently would work with that cloud.
> >
> > If I understand their implementation correctly, the boot from volume
> > is an implementation detail of booting from what appears to be a
> > standard image from an API standpoint. We need to differentiate
> > between disallowing a set of APIs and implementing a different
> > backend for a particular API.
>
> Yes, I whole heartedly agree!
>
> >> I do think that we should do whatever we need to to push that boot from
> >> volume is a regular, expected and consistent thing that people who are
> >> using clouds can count on. I do not think that we should accept lack of
> >> boot from image as a valid choice. It does not promote interoperability,
> >> and it removes choice from the end user, which is a Bad Thing.
> >
> > I think a better point of view is if a vendor chooses to use a different
> > backend, we should still expect the user-facing API to behave
> predictably.
> > It seems that we’re facing a leaky abstraction more than we are
> > some decision to not conform to the expected API behavior.
>
> I believe that we are in agreement but that you have stated it better
> than I did.
>
> >> It seems that some analysis has been done to determine that
> >> boot-from-image is somehow not production ready or scalable.
> >>
> >> To counter that, I would like to point out that the OpenStack Infra
> >> team, using resources in Rackspace, OVH, Vexxhost, Internap, BlueBox,
> >> the OpenStack Innovation Center, a private cloud run by the TripleO
> >> team and a private cloud run by the Infra team, boots 20k instances
> >> per day using custom images. We upload those custom-made images using
> >> Glance image upload daily. We have over 10 different custom images -
> >> each about 7.7G in size. While we _DO_ have node-launch errors given
> >> the number we launch each day:
> >>
> >> http://grafana.openstack.org/dashboard/db/nodepool?panelId=16&fullscreen
> >>
> >> it's a small number compared to the successful node launches:
> >>
> >> http://grafana.openstack.org/dashboard/db/nodepool?panelId=15&fullscreen
> >>
> >> And we have tracked ZERO of the problems down to anything related to
> >> images. (it's most frequently networking related)
> >>
> >> We _do_ have issues successfully uploading new images to the cloud - but
> >> we also have rather large images since they contain a bunch of cached
> >> data ... and the glance team is working on making the image upload
> >> process more resilient and scalable.
> >>
> >> In summary:
> >>
> >> * Please re-enable boot from image on your cloud if you care about
> >> interoperability and end users
> >
> > -or- fix the unexpected behavior in the interoperable API to account
> > for the implementation details.
>
> ++
>
> >> * Please do not think that after having disabled one of the most common
> >> and fundamental features of the cloud that the group responsible for
> >> ensuring cloud interoperability should change anything to allow your
> >> divergent cloud to be considered interoperable. It is not. It needs to
> >> be fixed.
> >
> > I don’t disagree, but I also think it’s important to work with the
> issues that
> > vendors who deploy OpenStack are facing and try to understand how
> > they fit into the larger ecosystem. Part of what we’re trying to
> accomplish
> > here is build a virtuous circle between upstream developers, downstream
> > deployers, and users.
>
> I absolutely agree. Again, you said words better than I did. If/when
> there is an issue a deployer has, it's important to fix it.
>
> >> If the tests we have right now are only testing boot-from-image as an
> >> implementation happenstance, we should immediately add tests that
> >> EXPLICITLY test for boot-from-image. If we cannot count on that basic
> >> functionality, then we truly will have given up on the entire idea of
> >> interoperable clouds.
> >
> > We can count on the basic black-box functionality at some level. It’s
> > the interaction of the implementation with the rest of the API that’s
> > causing problems.
> >
> > The ‘create from image’ call itself does part of what it’s advertised
> > to do: it boots an image. And at first pass (the tests for the actual
> > launching of the image) all looks well. The issue comes later, when
> > the user queries the ‘list servers by image’ API and it’s clear the
> > abstraction loop hasn’t been closed. This is exactly what the
> > interoperability tests are meant to catch, and it indicates what needs
> > to be fixed.
>
> Yup
>
> > How it’s fixed is a different story, and I think that we as a community
> > need to be careful about prescribing one solution over another.
>
> Totally.
>
> I should be clear about my POV on this, just for the record.
>
> I by and large speak from the perspective of someone who consumes
> OpenStack APIs across a lot of clouds. One of the things I think is
> great about OpenStack is that in theory I should never need to know the
> implementation details that someone has chosen to make.
>
> A good example of this is cells v1. Rackspace is the only public cloud
> I'm aware of that runs cells v1 ... but I do not know this as a consumer
> of the API. It's completely transparent to me, even though it's an
> implementation choice Rackspace made to deal with scaling issues. It's a
> place where our providers have been able to make the choices that make
> sense and our end-users don't suffer because of it. This is good!
>
> Very related to this thread, one of the places where the abstraction
> breaks down is, in fact, with Images - as I have to, as an end-user,
> know what image format the cloud I'm using has decided to use. All of
> the choices that deployers make around this are valid choices, but we
> haven't done a good enough job in OpenStack to hide this from our users,
> so they suffer.
>
> The case in this thread, where a user issues boot from image and the
> backend does boot from volume, should (and can) I think be more like
> cells and less like image type. I'm confident that it can be done.
>
> The ceph driver may be a place to look for inspiration. Ceph has a
> glance driver, and when you upload an image to glance, glance stores it
> in ceph. BUT - when the nova driver boots from an image that is stored
> in ceph, it bypasses all of the normal image download/caching code and
> essentially does a boot from volume behind the scenes. (It's a zero-cost
> COW operation, so booting VMs when the ceph glance and nova drivers are
> used is very quick.)
>
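> (For the curious, a heavily simplified sketch of that COW clone using
> the python rbd bindings; illustrative only, not nova's actual code.)
>
>     import rados
>     import rbd
>
>     cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>     cluster.connect()
>     images = cluster.open_ioctx('images')  # glance pool
>     vms = cluster.open_ioctx('vms')        # nova pool; names illustrative
>
>     # The glance image has a protected snapshot; "boot from image" is
>     # just a zero-copy COW clone of that snapshot into the nova pool.
>     rbd.RBD().clone(images, 'image-uuid', 'snap', vms, 'server-uuid_disk',
>                     features=rbd.RBD_FEATURE_LAYERING)
>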
> Now, those interactions do not result in a volume object that's visible
> through the Cinder API. They're implementation details, so the user
> doesn't have to know that they are booting from a COW volume in ceph.
>
> I do not know enough details about how this is implemented, but I'd
> imagine that if ceph was able to achieve what sounds like the semantics
> desired here without introducing API issues, it should be totally
> possible to achieve them here too.
>
> Thank you Chris for being much clearer and much more helpful in what you
> said than what I did.
>
> Monty
>
> _______________________________________________
> Defcore-committee mailing list
> Defcore-committee at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/defcore-committee
>



-- 
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co,. Ltd
Email: huangzhipeng at huawei.com
Office: Huawei Industrial Base, Longgang, Shenzhen

(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipengh at uci.edu
Office: Calit2 Building Room 2402

OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado