[openstack-dev] [Glance] Summit Session Summaries
gongysh at unitedstack.com
Sat Nov 16 00:29:19 UTC 2013
On Sat, Nov 16, 2013 at 5:10 AM, Mark Washenberger <mark.washenberger at markwash.net> wrote:
> Hi folks,
> My summary notes from the OpenStack Design Summit Glance sessions follow.
> Enjoy, and please help correct any misunderstandings.
> Image State Consistency:
> In this session, we focused on the problem that snapshots that fail
> after the image is created but before the image data is uploaded
> result in a pending image that will never become active, and the
> only operation nova can do is to delete the image. Thus there is
> not a very good way to communicate the failure to users without
> just leaving a useless image record around.
> A solution was proposed to allow Nova to directly set the status
> of the image, say to "killed" or some other state.
> A problem with the proposed solution is that we generally have
> kept the "status" field internally controlled by glance, which
> means there are some modeling and authorization concerns.
> However, it is actually something Nova could do today through
> the hacky mechanism of initiating a PUT with data, but then
> terminating the connection without sending a complete body. So
> the authorization aspects are not really a fundamental concern.
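To make the modeling concern concrete, here is a minimal sketch of an image status state machine. The status names (queued, saving, active, killed, deleted) follow Glance's image statuses, but the transition table and the rule that another service such as nova may only request the "killed" transition are assumptions for illustration, not Glance's actual policy code.

```python
class InvalidTransition(Exception):
    """Raised when a requested status change is not allowed."""


# Hypothetical transition table: which statuses may follow each status.
ALLOWED_TRANSITIONS = {
    "queued": {"saving", "killed", "deleted"},
    "saving": {"active", "killed", "deleted"},
    "active": {"deleted"},
    "killed": {"deleted"},
    "deleted": set(),
}

# Assumption: other services (e.g. nova) may only mark an image as failed.
EXTERNALLY_SETTABLE = {"killed"}


def set_status(image, new_status, internal=False):
    """Move image["status"] to new_status if the transition is legal.

    internal=True models glance changing its own status; internal=False
    models a request coming from another service such as nova.
    """
    current = image["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {new_status} is not allowed")
    if not internal and new_status not in EXTERNALLY_SETTABLE:
        raise InvalidTransition(f"only glance may set status {new_status!r}")
    image["status"] = new_status
    return image


# A snapshot whose upload never happened can be marked failed by nova,
# so users see a failed image instead of one stuck in "queued".
snapshot = {"id": "snap-1", "status": "queued"}
set_status(snapshot, "killed")
```

Keeping the transition table in one place is what lets glance stay the owner of "status" while still exposing a narrow, explicit door for failure reporting.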
> It was suggested that the solution to this problem is to make Nova
> responsible for reporting these failures rather than Glance. In the
> short term, we could do the following:
> - have nova delete the image when a snapshot fails (already merged)
> - merge the nova patch to report the failure as part of instance
> error reporting
> In the longer term, it was seen as desirable for nova to treat
> snapshots as asynchronous tasks and reflect those tasks in the
> api, including the failure/success of those tasks.
> Another long term option that was viewed mostly favorably was
> to add another asynchronous task to glance for vanilla uploads
> so that nova snapshots can avoid creating the image until it
> is fully active.
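The "don't create the image until it is fully active" idea can be sketched as an upload task that reads all the data first and registers an image record only on success. The `images` and `staging` dicts and the `run_upload_task` name are hypothetical in-memory stand-ins, not Glance's tasks API; the MD5 checksum mirrors how Glance records image checksums.

```python
import hashlib
import uuid

images = {}   # hypothetical image registry: id -> record
staging = {}  # hypothetical store for uploaded bytes


def run_upload_task(chunks):
    """Read all data first; create an image record only if the upload finishes.

    `chunks` is any iterable of bytes; an exception while reading it models
    a snapshot that fails partway through the upload.
    """
    task_id = str(uuid.uuid4())
    digest = hashlib.md5()
    buf = bytearray()
    try:
        for chunk in chunks:
            digest.update(chunk)
            buf.extend(chunk)
    except Exception as exc:
        # No image record was ever created, so nothing is left pending.
        return {"task_id": task_id, "status": "failure", "message": str(exc)}
    image_id = str(uuid.uuid4())
    staging[image_id] = bytes(buf)
    images[image_id] = {"id": image_id, "status": "active",
                        "checksum": digest.hexdigest(), "size": len(buf)}
    return {"task_id": task_id, "status": "success", "image_id": image_id}


def failing_snapshot():
    yield b"partial data"
    raise IOError("hypervisor connection lost")


failed = run_upload_task(failing_snapshot())  # failure, no image record
```

The failure is reported on the task, not on an image, which is exactly what removes the orphaned pending image.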
> Fei Long Wang is going to follow up on what approach makes the
> most sense for Nova and report back for our next steps.
> What to do about v1?
> In this discussion, we hammered out the details for how to drop
> the v1 api and in what timetable.
> Leaning heavily on cinder's experience dropping v1, we came
> up with the following schedule.
> - Announce plan to deprecate the V1 API and registry in J and remove
> it in K
> - Announce feature freeze for v1 API immediately
> - Make sure everything in OpenStack is using v2 (cinder, nova, ?)
> - Ensure v2 is being fully covered in tempest tests
> - Ensure there are no gaps in the migration strategy from v1 to v2
> - after the fact, it seems to me we need to produce a migration
> guide as a way to evaluate the presence of such gaps
> - Make v2 the default in glanceclient
> - Turn v2 on by default in glance API
> - Mark v1 as deprecated
> - Turn v1 off by default in config
> - Delete v1 api and v1 registry
> A few gotchas were identified. In particular, a concern was raised
> about breaking stable branch testing when we switch the default in
> glanceclient to v2, since the latest glanceclient will be used to test
> glance in, say, Folsom or Grizzly, where the v2 api didn't really
> work at all.
> In addition, it was suggested that we should be very aggressive
> in using deprecation warnings for config options to communicate
> this change as loudly as possible.
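A loud deprecation warning for a config option can be as simple as scanning the loaded configuration for deprecated keys at startup. This sketch uses Python's standard `warnings` module over a plain dict; `enable_v1_api` is used as the example option and the hint text is made up, and real glance code would go through its own configuration library rather than this hypothetical helper.

```python
import warnings

# Hypothetical table of deprecated options and the hint shown for each.
DEPRECATED_OPTIONS = {
    "enable_v1_api": "the v1 API is deprecated and will be removed; "
                     "migrate clients to the v2 API",
}


def check_deprecated_options(conf):
    """Warn about every deprecated option present in `conf` (a plain dict)."""
    found = []
    for name, hint in DEPRECATED_OPTIONS.items():
        if name in conf:
            warnings.warn(f"config option {name!r} is deprecated: {hint}",
                          DeprecationWarning, stacklevel=2)
            found.append(name)
    return found


# Python ignores DeprecationWarning by default in most contexts, so turn
# it on explicitly to make the warning as loud as possible.
warnings.simplefilter("always", DeprecationWarning)
check_deprecated_options({"enable_v1_api": True, "bind_port": 9292})
```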
> Image Sharing
> This session focused on the gaps between the current image sharing
> functionality and what is needed to establish an image marketplace.
> One issue was the lack of verification of project ids when sharing an image.
> A few other issues were identified:
> - there is no way to share an image with a large number of projects in a
> single api operation
> - membership lists are not currently paged
> - there is no way to share an image with everyone, you must know each
> other project id
> We identified a potential issue with bulk operations and
> verification--namely there is no way to do bulk verification of project ids
> in keystone that we know of, so probably keystone work would be needed to
> have both of these features in place without implying super slow api calls.
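The paging and bulk-sharing gaps can be illustrated together: a bulk share that verifies all project ids in a single lookup, and a marker/limit paged membership listing. Everything here is hypothetical: `verify_projects` stands in for the batch keystone lookup that, per the discussion, we don't know of yet, and the in-memory `members` dict replaces the real membership table.

```python
# Hypothetical stand-in for a batch keystone query: takes a set of project
# ids and returns the subset that exist.
KNOWN_PROJECTS = {f"proj-{i}" for i in range(10)}


def verify_projects(project_ids):
    return set(project_ids) & KNOWN_PROJECTS


members = {}  # image_id -> sorted list of member project ids


def share_bulk(image_id, project_ids):
    """Share with many projects in one call, rejecting unknown ids up front."""
    requested = set(project_ids)
    unknown = requested - verify_projects(requested)
    if unknown:
        raise ValueError(f"unknown project ids: {sorted(unknown)}")
    current = set(members.get(image_id, []))
    members[image_id] = sorted(current | requested)
    return members[image_id]


def list_members(image_id, marker=None, limit=3):
    """Return one page of members after `marker`, plus the next marker.

    The marker must be a member id returned by a previous page.
    """
    all_members = members.get(image_id, [])
    start = 0 if marker is None else all_members.index(marker) + 1
    page = all_members[start:start + limit]
    next_marker = page[-1] if start + limit < len(all_members) else None
    return page, next_marker


share_bulk("img-1", [f"proj-{i}" for i in range(5)])
first_page, marker = list_members("img-1")
```

Doing the verification as one batch call is what keeps the bulk operation from turning into N slow keystone round trips.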
> In addition, we spent some time toying with the idea of image catalogs. If
> publishers put images in catalogs, rather than having shared images show up
> directly in other users' image lists, things would be a lot safer and we
> could relax some of our restrictions. However, there are some issues with
> this approach as well:
> - How do you find the catalog of a trusted image publisher?
> - Are we just pushing the issue of sensible world-listings to another
> - This would be a big change.
> Enhancing Image Locations:
> This session proposed adding several attributes to image locations:
> 1. Add 'status' to each location.
> I think the consensus was that this approach makes sense moving forward. In
> particular, it would be nice to have a 'pending-delete' status for image
> locations, so that when you delete a single location from an image it can
> be picked up properly by the glance scrubber.
> There was some concern about how we define the overall image status if we
> allow other statuses on locations. Is image status just stored
> independently of image location statuses? Or is it newly defined as a
> function of those image location statuses?
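One possible answer to the second question, image status defined as a function of location statuses, is sketched below together with a scrubber pass that purges `pending-delete` locations. The location status values and the derivation rule (active while any location is active) are assumptions for illustration, not something the session agreed on.

```python
from dataclasses import dataclass


@dataclass
class Location:
    url: str
    status: str  # assumed values: "active", "pending-delete", "deleted"


def derive_image_status(locations):
    """Hypothetical rule: the image is usable while any location is active.

    Assumes the image has at least one location.
    """
    statuses = {loc.status for loc in locations}
    if "active" in statuses:
        return "active"
    if "pending-delete" in statuses:
        return "pending_delete"  # glance's image-level spelling
    return "deleted"


def delete_location(locations, url):
    """Mark one location for deletion instead of removing it immediately."""
    for loc in locations:
        if loc.url == url:
            loc.status = "pending-delete"


def scrub(locations, purge):
    """Scrubber pass: call `purge(url)` for each pending-delete location."""
    for loc in locations:
        if loc.status == "pending-delete":
            purge(loc.url)
            loc.status = "deleted"


locs = [Location("file:///a", "active"), Location("swift://b", "active")]
delete_location(locs, "file:///a")
```

With this rule the image stays active after one location is removed, and the scrubber can still find and purge the removed location later.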
> 2. Allow disk_format, container_format, and checksum to vary per location.
> The use case here is a multi-hypervisor cloud where different formats are
> needed: the client can automatically select the correct format when it
> downloads an image.
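Client-side selection among per-location formats could look like the following sketch, which picks the location whose disk_format ranks highest in the local hypervisor's preference list. The location dicts, URLs, and `select_location` helper are hypothetical; the format names are standard Glance disk_format values.

```python
def select_location(locations, preferred_formats):
    """Return the location whose disk_format ranks best in preferred_formats.

    `locations` is a list of dicts with "url" and "disk_format" keys;
    `preferred_formats` lists formats the hypervisor accepts, best first.
    Returns None when no location has an acceptable format.
    """
    rank = {fmt: i for i, fmt in enumerate(preferred_formats)}
    candidates = [loc for loc in locations if loc["disk_format"] in rank]
    if not candidates:
        return None
    return min(candidates, key=lambda loc: rank[loc["disk_format"]])


locations = [
    {"url": "swift://images/img.vhd", "disk_format": "vhd"},
    {"url": "file:///var/lib/glance/img.qcow2", "disk_format": "qcow2"},
    {"url": "rbd://pool/img", "disk_format": "raw"},
]
# e.g. a KVM host that prefers raw, then qcow2
chosen = select_location(locations, ["raw", "qcow2"])
```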
> This idea was initially met with some skepticism because we have a strong
> view that an image is immutable once it is created, and the checksum is a
> big part of how we enforce that.
> However, it was correctly pointed out that the immutability we care about
> is actually a property of the block device that each image format
> represents. For the moment, though, we were unsure how to enforce that block
> device immutability other than by keeping the checksum and image formats the same.
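Checksum-based enforcement amounts to recomputing a digest over the bytes read from a location and comparing it with the stored value. Glance's `checksum` field has been an MD5 hex digest; the streaming helper below is an illustrative sketch, not Glance's code. It also shows the tension above: the same block device stored as raw and as qcow2 is different bytes, so a single stored checksum cannot cover both.

```python
import hashlib
import io


def verify_checksum(stream, expected_md5, chunk_size=65536):
    """Stream the data and compare its MD5 hex digest to the stored checksum."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest() == expected_md5


data = b"\x00" * 1024  # stand-in for image bytes
stored = hashlib.md5(data).hexdigest()
ok = verify_checksum(io.BytesIO(data), stored)
tampered = verify_checksum(io.BytesIO(data + b"x"), stored)
```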
> 3. Add metrics to each image location.
> The essential idea here is to track the performance metrics of each image
> location to ensure we choose the fastest location. These metrics would not
> be revealed as part of the API.
> I think most of us were initially a bit confused by this suggestion.
> However, after talking with Zhi Yan after the session, I think it makes
> sense to support this in a local sense rather than storing such information
> in the database. Locality is critical because different glance nodes likely
> have different relationships to the underlying locations in terms of
> network distance, so each node should be gearing towards what is best for
> We can also probably reuse a local metrics tracking library to enable
> similar optimizations in a future incarnation of the glance client.
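A node-local tracker along those lines could keep an exponentially weighted moving average of observed download throughput per location, held only in process memory rather than in the database. The class name, the smoothing factor, and the choice of an EWMA are assumptions for this sketch.

```python
class LocationMetrics:
    """Per-node, in-memory throughput tracker (never persisted or exposed)."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight given to the newest observation
        self.throughput = {}    # location url -> smoothed bytes/second

    def record(self, url, nbytes, seconds):
        """Fold one observed transfer into the average; seconds must be > 0."""
        sample = nbytes / seconds
        old = self.throughput.get(url)
        if old is None:
            self.throughput[url] = sample
        else:
            self.throughput[url] = self.alpha * sample + (1 - self.alpha) * old

    def fastest(self, urls):
        """Prefer the fastest measured url; unmeasured urls are tried first."""
        unmeasured = [u for u in urls if u not in self.throughput]
        if unmeasured:
            return unmeasured[0]
        return max(urls, key=lambda u: self.throughput[u])


metrics = LocationMetrics()
metrics.record("swift://a", 100_000_000, 10.0)   # about 10 MB/s
metrics.record("file:///b", 100_000_000, 2.0)    # about 50 MB/s
```

Trying unmeasured locations first is one simple way to make sure every location eventually gets a measurement; because each glance node keeps its own instance, the ranking naturally reflects that node's network position.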
> Images and Taskflow
> In this session we discussed both the general layout of taskflow and the
> strategy for porting the current image tasks under development to use
> taskflow, and came up with the following basic outline.
> Short Term:
> As we add more and more complexity to the import task, we can try to
> compose the work as a flow of tasks. With this set up, our local,
> eventlet-backed executor (glance task execution engine) could be just a
> thin wrapper around a local taskflow engine.
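To show what composing the import as a flow of tasks buys, here is a minimal pure-Python imitation of a linear flow: each task has an `execute` and a `revert`, and when a task fails the already-completed tasks are reverted newest-first. This is not taskflow's API; taskflow provides this, plus persistence and more, through its own task classes, flow patterns and engines. The task classes below are hypothetical import steps.

```python
class Task:
    """One unit of work; revert() should undo what execute() did."""

    def execute(self, ctx):
        raise NotImplementedError

    def revert(self, ctx):
        pass  # default: nothing to undo


class LinearFlow:
    """Run tasks in order; on failure, revert completed tasks newest-first."""

    def __init__(self, *tasks):
        self.tasks = list(tasks)

    def run(self, ctx):
        done = []
        try:
            for t in self.tasks:
                t.execute(ctx)
                done.append(t)
        except Exception:
            # In this sketch the failing task itself is not reverted.
            for t in reversed(done):
                t.revert(ctx)
            raise
        return ctx


class FetchData(Task):
    def execute(self, ctx):
        ctx["log"].append("fetch")
        ctx["data"] = b"image bytes"

    def revert(self, ctx):
        ctx["log"].append("revert fetch")
        ctx.pop("data", None)


class CreateRecord(Task):
    def execute(self, ctx):
        ctx["log"].append("create")
        ctx["image"] = {"status": "saving"}

    def revert(self, ctx):
        ctx["log"].append("revert create")
        ctx.pop("image", None)


class Activate(Task):
    def execute(self, ctx):
        if ctx.get("fail"):
            raise RuntimeError("activation failed")
        ctx["image"]["status"] = "active"


failed_ctx = {"log": [], "fail": True}
try:
    LinearFlow(FetchData(), CreateRecord(), Activate()).run(failed_ctx)
except RuntimeError:
    pass  # the failure is re-raised after rollback
```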
> Medium Term:
> At some point pretty early on we are going to want to have glance tasks
> running on distributed worker processes, most likely having the tasks
> triggered by rpc. At this point, we can copy the existing approach in
> cinder circa Havana.
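The medium-term shape, with the API side publishing task requests and separate workers picking them up, can be simulated in-process with a queue standing in for the rpc bus. The `queue.Queue` and thread setup, the message format, and the `handle` body are assumptions for illustration; a real deployment would use an actual messaging layer, and this is not how cinder implements it.

```python
import queue
import threading

task_queue = queue.Queue()   # stand-in for the rpc bus
results = {}
results_lock = threading.Lock()


def handle(task):
    # Hypothetical task body: pretend to import the image at task["location"].
    return f"imported {task['location']}"


def worker():
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: shut this worker down
            break
        try:
            outcome = ("success", handle(task))
        except Exception as exc:
            outcome = ("failure", str(exc))
        with results_lock:
            results[task["id"]] = outcome


workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

# The API side only enqueues a message and returns immediately.
for i in range(5):
    task_queue.put({"id": i, "location": f"http://example.com/img{i}"})

# One sentinel per worker; FIFO order means every task is taken first.
for _ in workers:
    task_queue.put(None)
for w in workers:
    w.join()
```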
> Longer Term:
> When taskflow engines support distributing tasks across different workers,
> we can fall back to having a local task engine that distributes tasks
> using that taskflow engine.
> During the discussion, a few concerns were raised about working with taskflow:
> - tasks have to be structured in the right way to make restart, recovery,
> and rollback work
> - in other words, if we don't think about this carefully, we'll likely
> screw things up
> - it remains difficult to determine if a task has stalled or failed
> - we are not sure how to restart a failed task at this point
> Some of these concerns may already be being addressed in the library.
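For the stalled-versus-failed question, one common technique is a heartbeat: the worker periodically records a timestamp, and a monitor treats an unfinished task whose last heartbeat is older than a timeout as stalled. This is a sketch of that general technique, not a description of what taskflow does; the field names and the 60-second timeout are hypothetical, and time is passed in explicitly to keep the logic easy to test.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    task_id: str
    status: str = "processing"   # "processing", "success" or "failure"
    last_heartbeat: float = 0.0


def heartbeat(record: TaskRecord, now: float) -> None:
    """Called periodically by the worker while the task makes progress."""
    record.last_heartbeat = now


def classify(record: TaskRecord, now: float, timeout: float = 60.0) -> str:
    """Distinguish finished, failed, healthy and stalled tasks."""
    if record.status in ("success", "failure"):
        return record.status
    if now - record.last_heartbeat > timeout:
        return "stalled"   # candidate for restart or revert
    return "running"


record = TaskRecord("t1")
heartbeat(record, now=100.0)
```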