[openstack-dev] [glance] proposed priorities for Mitaka

Monty Taylor mordred at inaugust.com
Mon Sep 14 17:06:48 UTC 2015


On 09/14/2015 02:41 PM, Flavio Percoco wrote:
> On 14/09/15 08:10 -0400, Doug Hellmann wrote:
>>
>> After having some conversations with folks at the Ops Midcycle a
>> few weeks ago, and observing some of the more recent email threads
>> related to glance, glance-store, the client, and the API, I spent
>> last week contacting a few of you individually to learn more about
>> some of the issues confronting the Glance team. I had some very
>> frank, but I think constructive, conversations with all of you about
>> the issues as you see them. As promised, this is the public email
>> thread to discuss what I found, and to see if we can agree on what
>> the Glance team should be focusing on going into the Mitaka summit
>> and development cycle and how the rest of the community can support
>> you in those efforts.
>>
>> I apologize for the length of this email, but there's a lot to go
>> over. I've identified 2 high priority items that I think are critical
>> for the team to be focusing on starting right away in order to use
>> the upcoming summit time effectively. I will also describe several
>> other issues that need to be addressed but that are less immediately
>> critical. First the high priority items:
>>
>> 1. Resolve the situation preventing the DefCore committee from
>>   including image upload capabilities in the tests used for trademark
>>   and interoperability validation.
>>
>> 2. Follow through on the original commitment of the project to
>>   provide an image API by completing the integration work with
>>   nova and cinder to ensure V2 API adoption.
>
> Hi Doug,
>
> First and foremost, I'd like to thank you for taking the time to dig
> into these issues, and for reaching out to the community seeking
> information and a better understanding of what the real issues are. I
> can imagine how much time you had to dedicate to this and I'm glad you
> did.

Ditto. Thanks so much for the work Doug!

> Now, to your email, I very much agree with the priorities you
> mentioned above and I'd like whoever wins Glance's PTL election to
> bring the focus back to them.
>
> Please, find some comments in-line for each point:
>
>
>>
>> I. DefCore
>>
>> The primary issue that attracted my attention was the fact that
>> DefCore cannot currently include an image upload API in its
>> interoperability test suite, and therefore we do not have a way to
>> ensure interoperability between clouds for users or for trademark
>> use. The DefCore process has been long, and at times confusing,
>> even to those of us following it sort of closely. It's not entirely
>> surprising that some projects haven't been following the whole time,
>> or aren't aware of exactly what the whole thing means. I have
>> proposed a cross-project summit session for the Mitaka summit to
>> address this need for communication more broadly, but I'll try to
>> summarize a bit here.
>
> +1
>
> I think it's quite sad that some projects, especially those considered
> to be part of the `starter-kit:compute`[0], don't follow closely
> what's going on in DefCore. I personally consider this a task PTLs
> should incorporate into their role duties. I'm glad you proposed such a
> session. I hope it'll help raise awareness of this effort and move
> things forward on that front.
>
>
>>
>> DefCore is using automated tests, combined with business policies,
>> to build a set of criteria for allowing trademark use. One of the
>> goals of that process is to ensure that all OpenStack deployments
>> are interoperable, so that users who write programs that talk to
>> one cloud can use the same program with another cloud easily. This
>> is a *REST API* level of compatibility. We cannot insert cloud-specific
>> behavior into our client libraries, because not all cloud consumers
>> will use those libraries to talk to the services. Similarly, we
>> can't put the logic in the test suite, because that defeats the
>> entire purpose of making the APIs interoperable. For this level of
>> compatibility to work, we need well-defined APIs, with a long support
>> period, that work the same no matter how the cloud is deployed. We
>> need the entire community to support this effort. From what I can
>> tell, that is going to require some changes to the current Glance
>> API to meet the requirements. I'll list those requirements, and I
>> hope we can discuss them to a degree that ensures everyone understands
>> them. I don't want this email thread to get bogged down in
>> implementation details or API designs, though, so let's try to keep
>> the discussion at a somewhat high level, and leave the details for
>> specs and summit discussions. I do hope you will correct any
>> misunderstandings or misconceptions, because unwinding this as an
>> outside observer has been quite a challenge and it's likely I have
>> some details wrong.
>>
>> As I understand it, there are basically two ways to upload an image
>> to glance using the V2 API today. The "POST" API pushes the image's
>> bits through the Glance API server, and the "task" API instructs
>> Glance to download the image separately in the background. At one
>> point apparently there was a bug that caused the results of the two
>> different paths to be incompatible, but I believe that is now fixed.
>> However, the two separate APIs each have different issues that make
>> them unsuitable for DefCore.
>>
>> The DefCore process relies on several factors when designating APIs
>> for compliance. One factor is the technical direction, as communicated
>> by the contributor community -- that's where we tell them things
>> like "we plan to deprecate the Glance V1 API". In addition to the
>> technical direction, DefCore looks at the deployment history of an
>> API. They do not want to require deploying an API if it is not seen
>> as widely usable, and they look for some level of existing adoption
>> by cloud providers and distributors as an indication that the
>> API is desired and can be successfully used. Because we have multiple
>> upload APIs, the message we're sending on technical direction is
>> weak right now, and so they have focused on deployment considerations
>> to resolve the question.
>
> The task upload process you're referring to is the one that uses the
> `import` task, which allows you to download an image from an external
> source, asynchronously, and import it in Glance. This is the old
> `copy-from` behavior that was moved into a task.
>
> The "fun" thing about this - and I'm sure other folks in the Glance
> community will disagree - is that I don't consider tasks to be a
> public API. That is to say, I would expect tasks to be an internal API
> used by cloud admins to perform some actions (based on its current
> implementation). Eventually, some of these tasks could be triggered
> from the external API but as background operations that are triggered
> by the well-known public ones and not through the task API.
>
> Ultimately, I believe end-users of the cloud simply shouldn't care
> about what tasks are or aren't and more importantly, as you mentioned
> later in the email, tasks make clouds not interoperable. I'd be pissed
> if my public image service would ask me to learn about tasks to be
> able to use the service.
>
> Long story short, I believe the only upload API that should be
> considered is the one that uses HTTP. Eventually, to restore
> compatibility with v1's copy-from behavior, Glance could bring that
> behavior back on top of the task (just dropping this here for the sake
> of discussion and interoperability).

Yes. 1000x yes.
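
To make the contrast between the two V2 upload paths concrete, here is a
minimal Python sketch that only builds request descriptions and never talks
to a real cloud. The paths (`POST /v2/images`, `PUT /v2/images/{image_id}/file`,
`POST /v2/tasks`) and the `import_from`, `import_from_format` and
`image_properties` input keys follow our reading of the upstream V2 API
reference [1]; the helper names are hypothetical, and the task input may
differ between deployments, which is exactly the interoperability problem
discussed above.

```python
import json

# Hypothetical helpers: they describe the HTTP requests a client would send
# for each upload path; nothing here contacts a Glance endpoint.


def direct_upload_requests(name, disk_format, container_format, data):
    """Two-step V2 upload: create the image record, then PUT the bits."""
    create = {
        "method": "POST",
        "path": "/v2/images",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "name": name,
            "disk_format": disk_format,
            "container_format": container_format,
        }),
    }
    # The real image id comes from the response to the POST above, so it is
    # left as a placeholder in the path.
    upload = {
        "method": "PUT",
        "path": "/v2/images/{image_id}/file",
        "headers": {"Content-Type": "application/octet-stream"},
        "body": data,  # the image bits flow through the API server
    }
    return [create, upload]


def import_task_request(source, source_format, image_properties):
    """Single V2 task request asking Glance to fetch the image itself."""
    body = {
        "type": "import",
        "input": {
            # Upstream docs [1] show a URL here; the Rackspace docs [2]
            # show a Swift location instead.
            "import_from": source,
            "import_from_format": source_format,
            "image_properties": image_properties,
        },
    }
    return {
        "method": "POST",
        "path": "/v2/tasks",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

Either path ends with an image, but in the first the bits pass through the
API server (the denial of service concern raised for POST), while in the
second the caller must know a task input format that is not pinned down.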

>> The POST API is enabled in many public clouds, but not consistently.
>> In some clouds like HP, a tenant requires special permission to use
>> the API. At least one provider, Rackspace, has disabled the API
>> entirely. This is apparently due to what seems like a fair argument
>> that uploading the bits directly to the API service presents a
>> possible denial of service vector. Without arguing the technical
>> merits of that decision, the fact remains that without a strong
>> consensus from deployers that the POST API should be publicly and
>> consistently available, it does not meet the requirements to be
>> used for DefCore testing.
>
> This is definitely unfortunate. I believe a good step forward for this
> discussion would be to create a list of issues related to uploading
> images and see how those issues can be addressed. The result from that
> work might be that it's not recommended to make that endpoint public
> but again, without going through the issues, it'll be hard to
> understand how we can improve this situation. I expect most of these
> issues to have a security impact.
>
>
>> The task API is also not widely deployed, so its adoption for DefCore
>> is problematic. If we provide a clear technical direction that this
>> API is preferred, that may overcome the lack of adoption, but the
>> current task API seems to have technical issues that make it
>> fundamentally unsuitable for DefCore consideration. While the task
>> API addresses the problem of a denial of service, and includes
>> useful features such as processing of the image during import, it
>> is not strongly enough defined in its current form to be interoperable.
>> Because it's a generic API, the caller must know how to fully
>> construct each task, and know what task types are supported in the
>> first place. There is only one "import" task type supported in the
>> Glance code repository right now, but it is not clear that "import"
>> always uses the same arguments, or interprets them in the same way.
>> For example, the upstream documentation [1] describes a task that
>> appears to use a URL as source, while the Rackspace documentation [2]
>> describes a task that appears to take a swift storage location.
>> I wasn't able to find JSONSchema validation for the "input" blob
>> portion of the task in the code [3], though that may happen down
>> inside the task implementation itself somewhere.
>
>
> The above sounds pretty accurate, as there's currently just one flow
> that can be triggered (the import flow), and it accepts an input that
> is a JSON blob. As I mentioned above, I don't believe tasks should be
> part of the public API, and this is yet another reason why. The tasks
> API is not well defined, as there's currently no good way to define
> the expected input in a backwards-compatible way and to provide all
> the required validation.
>
> I like having tasks in Glance, despite my comments above - but I like
> them for cloud usage and not public usage.

I like them much more if they're not public facing. They're not BAD - 
they just don't have an end-user semantic.
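
The missing JSONSchema validation for the task "input" blob, and the lack of
a good way to define the expected input, can be made concrete with a
hand-rolled validator (standard library only, since `jsonschema` is a
third-party package). This is a minimal sketch, not Glance's code:
`validate_import_input`, `REQUIRED_KEYS` and `ALLOWED_SCHEMES` are
hypothetical names, the key names mirror the upstream example [1], and
accepting only http(s) sources is an arbitrary choice for illustration. The
point is that a strongly defined task makes the accepted source format
explicit, so a URL input and a Swift location cannot both be called "import"
with different meanings.

```python
from urllib.parse import urlparse

# Hypothetical schema for the "input" blob of an "import" task. Key names
# mirror the upstream example; the rules below are illustrative assumptions.
REQUIRED_KEYS = {
    "import_from": str,
    "import_from_format": str,
    "image_properties": dict,
}
ALLOWED_SCHEMES = {"http", "https"}


def validate_import_input(task_input):
    """Return a list of problems; an empty list means the input is accepted."""
    if not isinstance(task_input, dict):
        return ["input must be a JSON object"]
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in task_input:
            problems.append(f"missing key: {key}")
        elif not isinstance(task_input[key], expected_type):
            problems.append(f"{key} must be {expected_type.__name__}")
    # Reject unknown keys so callers cannot depend on deployment-specific extras.
    for key in sorted(set(task_input) - set(REQUIRED_KEYS)):
        problems.append(f"unexpected key: {key}")
    # Pin down one source format instead of letting each cloud reinterpret it.
    source = task_input.get("import_from")
    if isinstance(source, str) and urlparse(source).scheme not in ALLOWED_SCHEMES:
        problems.append("import_from must be an http(s) URL")
    return problems
```

With this sketch, `{"import_from": "images/img.qcow2", ...}` is rejected with
a clear message rather than being interpreted differently by each cloud.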

> As far as Rackspace's docs/endpoint goes, I'd assume this is an error
> in their documentation since Glance currently doesn't allow[0] for
> swift URLs to be imported (not even in juno[1]).
>
> [0]
> http://git.openstack.org/cgit/openstack/glance/tree/glance/common/scripts/utils.py#n84
>
> [1]
> http://git.openstack.org/cgit/openstack/glance/tree/glance/common/scripts/utils.py?h=stable/juno#n83

Nope. You MUST upload the image to swift and then provide a swift 
location. (Infra does this in production, I promise it's the only thing 
that works)

>> Tasks also come from plugins, which may be installed differently
>> based on the deployment. This is an interesting approach to creating
>> API extensions, but isn't discoverable enough to write interoperable
>> tools against. Most of the other projects are starting to move away
>> from supporting API extensions at all because of interoperability
>> concerns they introduce. Deployers should be able to configure their
>> clouds to perform well, but not to behave in fundamentally different
>> ways. Extensions are just that, extensions. We can't rely on them
>> for interoperability testing.
>
> This is, indeed, an interesting interpretation of what tasks are for.
> I'd probably just blame us (Glance team) for not communicating
> properly what tasks are meant to be. I don't believe tasks are a way
> to extend the *public* API and I'd be curious to know if others see it
> that way. I fully agree that doing so breaks interoperability and, as I've
> mentioned a couple of times in this reply already, I don't even think
> tasks should be part of the public API.
>
> But again, we did a very poor job of communicating this[0]. Nonetheless,
> for the sake of providing enough information about tasks and sources
> to read from, I'd also like to point out the original blueprint[1],
> some discussions during the Havana summit[2], the wiki page for
> tasks[3] and a patch I just reviewed today (thanks Brian) that
> introduces docs for tasks[4]. These links already show some
> differences in what tasks are.
>
> [0]
> http://git.openstack.org/cgit/openstack/glance/tree/etc/policy.json?h=stable/juno#n28
>
> [1] https://blueprints.launchpad.net/glance/+spec/async-glance-workers
> [2] https://etherpad.openstack.org/p/havana-glance-requirements
> [3] https://wiki.openstack.org/wiki/Glance-tasks-api
> [4] https://review.openstack.org/#/c/220166/
>
>>
>> There is a lot of fuzziness around exactly what is supported for
>> image upload, both in the documentation and in the minds of the
>> developers I've spoken to this week, so I'd like to take a step
>> back and try to work through some clear requirements, and then we
>> can have folks familiar with the code help figure out if we have a
>> real issue, if a minor tweak is needed, or if things are good as
>> they stand today and it's all a misunderstanding.
>>
>> 1. We need a strongly defined and well documented API, with arguments
>>   that do not change based on deployment choices. The behind-the-scenes
>>   behaviors can change, but the arguments provided by the caller
>>   must be the same and the responses must look the same. The
>>   implementation can run as a background task rather than receiving
>>   the full image directly, but the current task API is too vaguely
>>   defined to meet this requirement, and IMO we need an entry point
>>   focused just on uploading or importing an image.
>>
>> 2. Glance cannot require having a Swift deployment. It's not clear
>>   whether this is actually required now, so if it's not then we're
>>   in a good state.
>
> This is definitely not the case. Glance doesn't require any specific
> store to be deployed. It does require at least one store other than
> the http one, because the http store doesn't support write operations.

Awesome.

>> It's fine to provide an optional way to take
>>   advantage of Swift if it is present, but it cannot be a required
>>   component. There are three separate trademark "programs", with
>>   separate policies attached to them. There is an umbrella "Platform"
>>   program that is intended to include all of the TC approved release
>>   projects, such as nova, glance, and swift. However, there is
>>   also a separate "Compute" program that is intended to include
>>   Nova, Glance, and some others but *not* Swift. This is an important
>>   distinction, because there are many use cases both for distributors
>>   and public cloud providers that do not incorporate Swift for a
>>   variety of reasons. So, we can't have Glance's primary configuration
>>   require Swift and we need to provide tests for the DefCore team
>>   that run without Swift. Duplicate tests that do use Swift are
>>   fine, and might be used for "Platform" compliance tests.
>>
>> 3. We need an integration test suite in tempest that fully exercises
>>   the public image API by talking directly to Glance. This applies
>>   to the entire API, not just image uploads. It's fine to have
>>   duplicate tests using the proxy in Nova if the Nova team wants
>>   those, but DefCore should be using tests that talk directly to
>>   the service that owns each feature, without relying on any
>>   proxying. We've already missed the chance to deal with this in
>>   the current DefCore definition, which uses image-related tests
>>   that talk to the Nova proxy [4][5], so we'll have to maintain
>>   the proxy for the required deprecation period. But we won't be
>>   able to consider removing that proxy until we provide alternate
>>   tests for those features that speak directly to Glance. We may
>>   have some coverage already, but I wasn't able to find a task-based
>>   image upload test and there is no "image create" mentioned in
>>   the current draft of capabilities being reviewed [6]. There may
>>   be others missing, so someone more familiar with the feature set
>>   of Glance should do an audit and document what tests are needed
>>   so the work can be split up.
>>
>
> +1 This should become one of the top priorities for Mitaka (as you
> mentioned at the beginning of this email).

++

>> 4. Once identified and incorporated into the DefCore capabilities
>>   set, the selected API needs to remain stable for an extended
>>   period of time and follow the deprecation timelines defined by
>>   DefCore.  That has implications for the V3 API currently in
>>   development to turn Glance into a more generic artifacts service.
>>   There are a lot of ways to handle those implications, and no
>>   choice needs to be made today, so I only mention it to make sure
>>   it's clear that (a) we must get V2 into shape for DefCore and
>>   (b) when that happens, we will need to maintain V2 even if V3
>>   is finished. We won't be able to deprecate V2 quickly.
>>
>> Now, it's entirely possible that we can meet all of those requirements
>> today, and that would be great. If that's the case, then the problem
>> is just one of clear communication and documentation. I think there's
>> probably more work to be done than that, though.
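
Requirement 2, together with the note that Glance needs at least one
writable store other than `http`, suggests a simple check a deployer could
run against a Swift-free configuration. A minimal sketch, assuming a
`[glance_store]` section with `stores` and `default_store` options as we
understand glance_store's configuration in this era (worth verifying against
the glance_store docs); `check_store_config` and `READ_ONLY_STORES` are
hypothetical names.

```python
import configparser

# Assumption: "http" is the only read-only store, as stated in this thread.
READ_ONLY_STORES = {"http"}


def check_store_config(config_text):
    """Return (ok, reason) for a glance-api style [glance_store] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback="" covers both a missing section and a missing option.
    raw = parser.get("glance_store", "stores", fallback="")
    stores = [s.strip() for s in raw.split(",") if s.strip()]
    default = parser.get("glance_store", "default_store", fallback="").strip()
    writable = [s for s in stores if s not in READ_ONLY_STORES]
    if not writable:
        return False, "no writable store enabled"
    if default not in writable:
        return False, "default_store must be one of the writable stores"
    return True, "ok"
```

A configuration such as `stores = file,http` with `default_store = file`
passes this check without Swift appearing anywhere, which is the kind of
setup the "Compute" program tests would need to run against.
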
>
>
> There's clearly a communication problem. The fact that this very email
> has been sent out is a sign of that. However, I'd like to say, in a
> very optimistic way, that Glance is not so far away from the expected
> state. There are things to fix, other things to clarify, tons to
> discuss but, IMHO, besides the tempest tests and DefCore, the most
> critical one is the one you mentioned in the following section.
>
>>
>> [1] http://developer.openstack.org/api-ref-image-v2.html#os-tasks-v2
>> [2]
>> http://docs.rackspace.com/images/api/v2/ci-devguide/content/POST_importImage_tasks_Image_Task_Calls.html#d6e4193
>>
>> [3]
>> http://git.openstack.org/cgit/openstack/glance/tree/glance/api/v2/tasks.py
>>
>> [4] http://git.openstack.org/cgit/openstack/defcore/tree/2015.05.json#n70
>> [5]
>> http://git.openstack.org/cgit/openstack/defcore/tree/doc/source/guidelines/2015.07.rst
>>
>> [6] https://review.openstack.org/#/c/213353/
>>
>> II. Complete Cinder and Nova V2 Adoption
>>
>> The Glance team originally committed to providing an Image Service
>> API. Besides our end users, both Cinder and Nova consume that API.
>> The shift from V1 to V2 has been a long road. We're far enough
>> along, and the V1 API has enough issues preventing us from using
>> it for DefCore, that we should push ahead and complete the V2
>> adoption. That will let us properly deprecate and drop V1 support,
>> and concentrate on maintaining V2 for the necessary amount of time.
>>
>> There are a few specs for the work needed in Nova, but that work
>> didn't land in Liberty for a variety of reasons. We need resources
>> from both the Glance and Nova teams to work together to get this
>> done as early as possible in Mitaka to ensure that it actually lands
>> this time. We should be able to schedule a joint session at the
>> summit to have the conversation, and we need to take advantage of
>> that opportunity to ensure the details are fully resolved so that
>> everyone understands the plan.
>
> Super important point. I'd like people replying to this email to focus
> on what we can do next and not why this hasn't been done. The latter
> will take us down a path that won't be useful at all and will just
> waste everyone's time.

++

> That said, I fully agree with the above. Last time we talked, John
> Garbutt and Jay Pipes, from the nova team, raised their hands to help
> out with this effort. From Glance's side, Fei Long Wang and myself
> were working on the implementation. To help move this forward and to
> follow through on the latest plan, which makes this migration smoother
> than our original plan, we need folks from Glance to raise their hands.
>
> If I'm not elected PTL, I'm more than happy to help out here but we
> need someone that can commit to the above right now and we'll likely
> need a team of at least two people to help move this forward in early
> Mitaka.
>
>
>> The work in Cinder is more complete, but may need to be reviewed
>> to ensure that it is using the API correctly, safely, and efficiently.
>> Again, this is a joint effort between the Glance and Cinder teams
>> to identify any issues and work out a resolution.
>>
>> Part of this work will also be to audit the Glance API documentation,
>> to ensure it accurately reflects what the APIs expect to receive
>> and return. There are reportedly at least a few cases where things
>> are out of sync right now. This will require some coordination with
>> the Documentation team.
>>
>>
>> Those are the two big priorities I see, based on things the rest
>> of the community needs from the team and existing commitments that
>> have been made. There are some other things that should also be
>> addressed.
>>
>>
>> III. Security audits & bug fixes
>>
>> Five of 18 recent security reports were related to Glance [7]. It's
>> not surprising, given recent resource constraints, that addressing
>> these has been a challenge. Still, these should be given high
>> priority.
>>
>> [7]
>> https://security.openstack.org/search.html?q=glance&check_keywords=yes&area=default
>>
>
>
> +1 FWIW, we're in the process of growing Glance's security team. But
> it's clear from the above that we need to reply to security issues
> more quickly.
>
>> IV. Sorting out the glance-store question
>>
>> This was perhaps the most confusing thing I learned about this week.
>> The perception outside of the Glance team is that the library is
>> meant to be used by Nova and Cinder to communicate directly with
>> the image store, bypassing the REST API, to improve performance in
>> several cases. I know the Cinder team is especially interested in
>> some sort of interface for manipulating images inside the storage
>> system without having to download them to make copies (for RBD and
>> other systems that support CoW natively).
>
> Correct, the above was one of the triggers for this effort and I
> like to think it's still one of the main drivers. There are other
> fancier things that could be done in the future, assuming the
> library's API is refactored in a way that allows such features to be
> implemented.[0]
>
> [0] https://review.openstack.org/#/c/188050/
>
>> That doesn't seem to be
>> what the library is actually good for, though, since most of the
>> Glance core folks I talked to thought it was really a caching layer.
>> This discrepancy in what folks wanted vs. what they got may explain
>> some of the heated discussions in other email threads.
>
> It's strange that some folks think of it as a caching layer. I believe
> one of the reasons there's such a discrepancy is that not enough
> effort has been put into the refactor this library requires. The reason
> this library requires such a refactor is that it came out of the old
> `glance/store` code, which was very specific to Glance's internal use.
>
> The mistake here could be that the library should've been refactored
> *before* adopting it in Glance.
>
>>
>> Frankly, given the importance of the other issues, I recommend
>> leaving glance-store standalone this cycle. Unless the work for
>> dealing with priorities I and II is made *significantly* easier by
>> not having a library, the time and energy it will take to re-integrate
>> it with the Glance service seems like a waste of limited resources.
>> The time to even discuss it may be better spent on the planning
>> work needed. That said, if the library doesn't provide the features
>> its users were expecting, it may be better to fold it back in and
>> create a different library with a better understanding of the
>> requirements at some point. The path to take is up to the Glance
>> team, of course, but we're already down far enough on the priority
>> list that I think we'll be lucky to finish the preceding items this
>> cycle.
>
>
> I don't think merging glance-store back into Glance will help with any
> of the priorities mentioned in this thread. If anything, refactoring
> the API might help with future work that could come after the v1 -> v2
> migration is complete.
>
>>
>>
>> Those are the development priorities I was able to identify in my
>> interviews this week, and there is one last thing the team needs
>> to do this cycle: Recruit more contributors.
>>
>> Almost every current core contributor I spoke with this week indicated
>> that their time was split between another project and Glance. Often
>> higher priority had to be given, understandably, to internal product
>> work. That's the reality we work in, and everyone feels the same
>> pressures to some degree. One way to address that pressure is to
>> bring in help. So, we need a recruiting drive to find folks willing
>> to contribute code and reviews to the project to keep the team
>> healthy. I listed this item last because if you've made it this far
>> you should see just how much work the team has ahead. We're a big
>> community, and I'm confident that we'll be able to find help for
>> the Glance team, but it will require mentoring and education to
>> bring people up to speed to make them productive.
>
> Fully agree here as well. However, I also believe that the fact that
> some efforts have gone to the wrong tasks has taken Glance to the
> situation it is today. More help is welcomed and required but a good
> strategy is more important right now.
>
> FWIW, I agree that our focus has gone to different things and this has
> taken us to the status you mentioned above. More importantly, it's
> postponed some important tasks. However, I don't believe Glance is
> completely broken - I know you are not saying this but I'd like to
> mention it - and I certainly believe we can bring it back to a good
> state faster than expected, but I'm known for being a bit optimistic
> sometimes.
>
> In this reply I was hard on us (Glance team), because I tend to be
> hard on myself and to dig deep into the things that are not working
> well. Many times I do this based on the feedback provided by others,
> which I personally value **a lot**. Unfortunately, I have to say that
> there hasn't been enough feedback about these issues until now. There
> was Mike's email[0] where I explicitly asked the community to speak
> up. This is to say that I really appreciate the time you've taken to
> dig into this, and I'd like to encourage folks to *always* speak up and
> reach out through every *public* medium possible.
>
> No one can fix rumors, we can fix issues, though.
>
> Thanks again and let's all work together to improve this situation,
> Flavio
>
> [0]
> http://lists.openstack.org/pipermail/openstack-dev/2015-August/071971.html