Open Stack

Fri Sep 2 06:47:15 UTC 2016

Hi All,

We are deploying a public cloud platform based on OpenStack in the EU. We
are now working on DefCore certification for this platform and have run
into some problems.

OpenStack (Nova) supports both "boot from Image" and "boot from Volume" when
launching instances. For large-scale commercial deployments such as a
public cloud, the reliability of the service is considered the key factor.

With "boot from Image" there are two possible deployments: 1. nova-compute
with no shared storage backend; 2. nova-compute with a shared storage
backend. In case 1, the system disk created from the image is placed on the
local disk of the host running nova-compute. The reliability of user data
is low, and managing a large number of disks spread across hosts throughout
the deployment is very hard, so we consider this setup not commercially
ready for large-scale deployments. In case 2, the reliability and
manageability problems are solved, but a new problem is introduced:
resource usage and capacity tracking is incorrect. This has been a known
issue [1] in Nova for a long time, and the Nova team is trying to solve it
by introducing a new "resource provider" architecture [2]. This new
architecture will need a few releases to become fully functional, so case 2
is also considered not commercially ready.
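The capacity problem in case 2 can be illustrated with a simplified model.
It assumes that every compute node sharing one storage pool reports the
whole pool as its own local disk (our simplified reading of the
shared-storage reports in [1]), so a naive aggregate counts the pool once
per host. The pool size, usage, host names and helper are hypothetical;
this is not Nova's resource-tracker code.

```python
# Simplified model of capacity over-reporting with shared storage.
# Assumption: each compute node reports the entire shared pool as its own.
from dataclasses import dataclass


@dataclass
class ComputeNode:
    hostname: str
    local_gb: int       # capacity this node reports
    local_gb_used: int  # usage this node reports


def naive_total(nodes):
    """Sum per-node reports, the way an aggregate such as hypervisor-stats would."""
    return sum(n.local_gb for n in nodes), sum(n.local_gb_used for n in nodes)


POOL_GB = 10_000  # hypothetical size of the shared (e.g. Ceph) pool
USED_GB = 2_500   # hypothetical space actually consumed in that pool

# Four hosts see the same pool, so each one reports the full pool.
nodes = [ComputeNode(f"compute-{i}", POOL_GB, USED_GB) for i in range(4)]
total, used = naive_total(nodes)
print(total, used)  # 40000 10000: four times the real pool
```

Any scheduling or quota decision based on these totals sees four times the
real capacity, which is why we do not treat case 2 as production ready.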

For the reasons listed above, we have chosen "boot from Volume" as the only
way to boot instances in our public cloud. This avoids the drawbacks
described above and brings other benefits:

* Resiliency: Cloud Block Storage volumes are persistent, so users can
  retain a volume after the server is deleted and then use it to create a
  new server.
* Flexibility: users control the size and type (SSD or SATA) of the volume
  used to boot the server, which lets them fine-tune the storage to the
  needs of their operating system or application.
* Improved management of, and recovery from, server outages.
* Unified volume management.
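For reference, a boot-from-volume request to the Compute API can be
sketched as the body below. It uses `block_device_mapping_v2` with an image
as the source and a new volume as the destination, and sets
`delete_on_termination` to false so the volume outlives the server. The
server name, the flavor and image IDs and the helper function are
placeholders; the field names should be checked against the Compute API
reference for the microversion a given cloud exposes.

```python
import json


def boot_from_volume_body(name, flavor_ref, image_id, volume_size_gb,
                          keep_volume=True):
    """Build a hypothetical 'create server' body that boots from a new volume.

    The image is copied into a new Cinder volume (source_type "image",
    destination_type "volume") and that volume becomes the boot disk,
    so no imageRef is needed at the top level.
    """
    return {
        "server": {
            "name": name,
            "flavorRef": flavor_ref,
            "block_device_mapping_v2": [{
                "boot_index": 0,
                "uuid": image_id,
                "source_type": "image",
                "destination_type": "volume",
                "volume_size": volume_size_gb,
                # False keeps the volume after the server is deleted
                # (the "Resiliency" benefit listed above).
                "delete_on_termination": not keep_volume,
            }],
        }
    }


body = boot_from_volume_body("web-1", "FLAVOR_ID", "IMAGE_ID", 20)
print(json.dumps(body, indent=2))
```

Choosing the volume type (SSD or SATA) inside the mapping needs a newer
microversion (a `volume_type` key was added in 2.67). On older releases the
volume can be created in Cinder with the desired type first and booted with
`source_type` and `destination_type` both set to `"volume"`.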

Supporting only "boot from Volume" causes problems when pursuing DefCore
certification:

Some tests list instances filtered by "image_id", which is None for
volume-booted instances:

tempest.api.compute.servers.test_create_server.ServersTestJSON.test_verify_server_details
tempest.api.compute.servers.test_create_server.ServersTestManualDisk.test_verify_server_details
tempest.api.compute.servers.test_list_server_filters.ListServerFiltersTestJSON.test_list_servers_detailed_filter_by_image
tempest.api.compute.servers.test_list_server_filters.ListServerFiltersTestJSON.test_list_servers_filter_by_image

    - The detailed information for instances booted from volume does not
contain an image_id, so the test cases that filter instances by image ID
cannot pass.
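The filter failure can be reproduced with plain data. This sketch assumes
the server detail of a volume-booted instance carries an empty `image`
field (Nova typically returns an empty string there), while an
image-booted instance carries the image ID. The IDs and the helper function
are hypothetical.

```python
IMAGE_ID = "IMAGE_ID"  # placeholder image ID

# Both servers were created from IMAGE_ID, but the volume-booted one only
# used it to populate its boot volume, so its "image" field is empty.
servers = [
    {"id": "srv-image", "image": {"id": IMAGE_ID}},  # booted from image
    {"id": "srv-volume", "image": ""},               # booted from volume
]


def servers_with_image(servers, image_id):
    """Return IDs of servers whose image reference matches image_id."""
    matches = []
    for server in servers:
        image = server.get("image") or {}  # "" becomes {} for volume-booted servers
        if image.get("id") == image_id:
            matches.append(server["id"])
    return matches


print(servers_with_image(servers, IMAGE_ID))  # ['srv-image']
```

A test that creates its server by booting from volume and then expects it
in the image-filtered list will therefore never find it.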

We also have tests like this one:

tempest.api.compute.images.test_images.ImagesTestJSON.test_delete_saving_image

    - This test creates an image (snapshot) of an instance and deletes it
while the image status is "saving". For instances booted from images, the
snapshot status flow is queued -> saving -> active. For instances booted
from volumes, however, snapshotting the instance is actually a volume
snapshot performed by Cinder: the image saved in Glance only holds a link
to the created Cinder volume snapshot, and the image status changes
directly to "active". Because the test waits for the Glance image status
to become "saving", it cannot pass for volume-booted instances.
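The two status sequences and the effect on a "wait for saving" step can be
modelled as follows. This is a simplified illustration, not Tempest's
waiter code: each list stands for the successive statuses a poller would
see until the image settles, and the exception name is hypothetical.

```python
# Glance image statuses observed while a server snapshot is created.
IMAGE_BACKED_FLOW = ["queued", "saving", "active"]
# Volume-backed snapshot: Glance only stores a reference to the Cinder
# volume snapshot, so the image appears directly as "active".
VOLUME_BACKED_FLOW = ["active"]


class WaitTimeout(Exception):
    """Raised when the wanted status is never observed."""


def wait_for_image_status(polled_statuses, wanted):
    """Return `wanted` once a poll reports it; raise if polling ends first."""
    for status in polled_statuses:
        if status == wanted:
            return status
    raise WaitTimeout(f"never saw {wanted!r}; last status was {polled_statuses[-1]!r}")


print(wait_for_image_status(IMAGE_BACKED_FLOW, "saving"))  # saving
try:
    wait_for_image_status(VOLUME_BACKED_FLOW, "saving")
except WaitTimeout as exc:
    print(f"volume-backed snapshot: {exc}")
```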

Also:

test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments

    - This test attaches one volume to an instance and then counts that
instance's attachments; the expected count is hardcoded to 1. For
volume-booted instances, the system disk is already an attachment, so the
actual count is 2 and the test fails.
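The counting problem can be shown with a small hypothetical model of an
instance's attachment list. It compares the hardcoded expectation with one
possible alternative, checking the count relative to the attachments that
existed before the test attached its volume. The helper and the attachment
names are made up for illustration.

```python
def attachment_ids(boot_from_volume, test_attached=True):
    """Hypothetical model of the volume attachments listed for one instance."""
    ids = []
    if boot_from_volume:
        ids.append("root-volume")  # the boot disk is already a volume attachment
    if test_attached:
        ids.append("test-volume")  # the volume attached by the test
    return ids


HARDCODED_EXPECTED = 1  # what the test currently asserts

for bfv in (False, True):
    count = len(attachment_ids(bfv))
    baseline = len(attachment_ids(bfv, test_attached=False))
    print(f"boot_from_volume={bfv}: count={count}, "
          f"hardcoded check passes={count == HARDCODED_EXPECTED}, "
          f"relative check passes={count - baseline == 1}")
```

Running it prints `count=1` with both checks passing for an image-booted
instance, and `count=2` with only the relative check passing for a
volume-booted one.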

And finally:

tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_rebuild_server
tempest.api.compute.servers.test_servers_negative.ServersNegativeTestJSON.test_rebuild_deleted_server
tempest.api.compute.servers.test_servers_negative.ServersNegativeTestJSON.test_rebuild_non_existent_server

    - The rebuild action is not supported for instances booted from
volume.

None of the tests mentioned above are friendly to "boot from Volume"
instances. We hope to find workarounds for these tests, because the
problems with "boot from Image" really prevent us from using it, and it
would also be good for DefCore to work out how to handle both types of
instance creation.

References:
[1] Bugs related to resource usage reporting and calculation:

* Hypervisor summary shows incorrect total storage (Ceph)
  https://bugs.launchpad.net/nova/+bug/1387812
* rbd backend reports wrong 'local_gb_used' for compute node
  https://bugs.launchpad.net/nova/+bug/1493760
* nova hypervisor-stats shows wrong disk usage with shared storage
  https://bugs.launchpad.net/nova/+bug/1414432
* report disk consumption incorrect in nova-compute
  https://bugs.launchpad.net/nova/+bug/1315988
* VMWare: available disk spaces(hypervisor-list) only based on a single
  datastore instead of all available datastores from cluster
  https://bugs.launchpad.net/nova/+bug/1347039

[2] Blueprint (Nova spec) for solving resource usage reporting and
calculation with generic resource pools (resource providers):

https://git.openstack.org/cgit/openstack/nova-specs/tree/specs/newton/approved/generic-resource-pools.rst

Thanks,

Kevin Zheng


[OpenStack-DefCore] Getting DefCore certificate for clouds that instances can only boot from volume
