[openstack-dev] [nova] ops meetup feedback
Anita Kuno
anteaya at anteaya.info
Tue Sep 20 13:46:18 UTC 2016
On 16-09-20 09:20 AM, Sean Dague wrote:
> This is a bit delayed due to the release rush, finally getting back to
> writing up my experiences at the Ops Meetup.
>
> Nova Feedback Session
> =====================
>
> We had a double session for Feedback for Nova from Operators, raw
> etherpad here - https://etherpad.openstack.org/p/NYC-ops-Nova.
>
> The median release in the room was Kilo. Some were upgrading to
> Liberty, and many had clouds older than Kilo. Remember that these are
> the larger ops environments that are engaged enough with the
> community to send people to the Ops Meetup.
>
>
> Performance Bottlenecks
> -----------------------
>
> * scheduling issues with Ironic (this is a bug we worked through
> during the week after the session)
> * live snapshots actually end up being a performance issue for people
>
> The workarounds config group was not well known, and everyone in the
> room wished we advertised it a bit more. The solution for the snapshot
> performance issue is in there.
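>
> As a rough sketch of the kind of knob living in that group (assuming
> the disable_libvirt_livesnapshot option; double check names against
> your release's config reference):
>
>   [workarounds]
>   # fall back to cold snapshots instead of libvirt live snapshots,
>   # avoiding the performance problem described above
>   disable_libvirt_livesnapshot = True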
>
> There were also general questions about what scale cells should be
> considered at.
>
> ACTION: we should make sure workarounds are advertised better
> ACTION: we should have some document about "when cells"?
>
> Networking
> ----------
>
> A number of folks in the room were still on Nova Net, and were a bit
> nervous about it going away. As they are on Kilo / Liberty it's still
> a few upgrades before they get there, but the nervousness and concern
> were definitely real.
>
> Policy
> ------
>
> How are you customizing policy? People were largely making policy
> changes to protect users that didn't really understand cloud
> semantics, turning off features that they thought would confuse them
> (like pause). The large number of VM states is confusing and not
> clearly useful for end users, and they would like it simplified.
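>
> A typical hack looks something like this in policy.json (rule names
> here are the v2.1 style ones; verify them against your release):
>
>   {
>       "os_compute_api:os-pause-server:pause": "rule:admin_api",
>       "os_compute_api:os-pause-server:unpause": "rule:admin_api"
>   }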
>
> Ideally policy could be set on a project by that project's admin,
> because they would like to delegate that responsibility down.
>
> No one was using the user_id based custom policy (yay!).
>
> There was a desire for flavors to be RBAC locked down, which is
> actually being done via policy hacks right now. Providers want to
> expose some flavors (especially those with aggregate affinity) to
> only some projects.
>
> People were excited about the policy-in-code effort; the only concern
> was that the de facto documentation of what you could change would no
> longer be in the sample config.
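>
> For what it's worth, a sample can still be generated from the in-code
> defaults, roughly like this (assuming the oslopolicy-sample-generator
> tool that ships with oslo.policy):
>
>   oslopolicy-sample-generator --namespace nova \
>       --output-file policy.yaml.sample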
>
> ACTION: ensure there is policy config reference now that the sample
> file is empty
> ACTION: flavor RBAC is a thing most of the room wanted, is there a
> taker on spec / implementation?
>
> Upgrade
> -------
>
> Everyone waits to do any optional thing until they absolutely have
> to.
>
> The Cells API db caught a bunch of people off guard because it was
> optional in Kilo (with a release note), status quo in Liberty with no
> release note that it existed, then forced in Mitaka. When an optional
> component is out there, make sure it continues to be talked about in
> release notes even when its status does not change, or people forget.
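>
> For anyone who hasn't crossed that bridge yet, enabling the API db is
> roughly the following (placeholder connection string; exact steps per
> your release's install guide):
>
>   [api_database]
>   connection = mysql+pymysql://nova:PASSWORD@controller/nova_api
>
> followed by a one-time:
>
>   nova-manage api_db sync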
>
> People were on Kilo, so there wasn't really any data on EC2 out of
> tree. About 25% of folks' users have some existing AWS tooling, and
> it's good to be able to just let them use it to onboard.
>
> The current DB online data upgrade model feels *very opaque* to
> ops. They didn't realize which model Nova was using, and didn't feel
> like it was documented anywhere.
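>
> The rough shape of the model, for reference (command names are from
> newer releases; check the release notes for the one you are on):
>
>   # schema changes are additive and applied up front
>   nova-manage db sync
>   # data is then migrated online, in batches, while services run;
>   # repeat until it reports nothing left to migrate
>   nova-manage db online_data_migrations --max-count 500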
>
> ACTION: document the DB data lifecycle better for operators
> ACTION: make sure we are diligent about re-warning people about
> changes they have to make (like the Cells API db)
>
> API
> ---
>
> The API upgrade seemed fine for folks. The only question was about the
> new policy names, which were taking folks a bit of time to adjust to.
>
> No one in the room was using custom API extensions (or at least
> admitted to it when I asked).
>
> Tracking Feedback
> -----------------
>
> We talked a bit about tracking feedback. The silence on the ops list
> mostly comes from people not using a particular feature, so they don't
> really have an opinion.
>
> Most ops do not have time to look at our specs. That is an unlikely
> place to get feedback.
>
> Additional Questions
> --------------------
>
> There was an ask about VM HA. I stated that it was beyond the scope
> of Nova; besides, Nova's view of the world is non-authoritative enough
> that you wouldn't want it to do that anyway. I told folks that the NFV
> efforts were working on this kind of thing beyond Nova, and people
> should team up there.
>
> There was an ask on status of Cinder Multi Attach. We gave them a bit
> of status on where things were at.
>
> ACTION: Cinder Multi Attach should maybe be a priority effort in the
> next cycle.
>
>
> Upgrade Pain Points
> ===================
>
> Raw etherpad -
> https://etherpad.openstack.org/p/NYC-ops-Upgrades-Pain-points
>
> Most people are a couple of releases back (Kilo / Liberty or even
> older). The only team CDing in the room was RAX; they are now 2 to 3
> months behind master.
>
> Everyone agrees upgrades are getting better with every release.
>
> Most are taking change windows and downtime for upgrades.
>
> Why are upgrades taking so long?
> --------------------------------
>
> About halfway through this session I threw this powder keg into the
> room, and it generated a lot of feedback.
>
> People are holding a lot of out of tree patches, causing
> latency. These are for:
>
> * bug fixes made against old versions of OpenStack, which upstream
> won't take (the chicken / egg problem of being on an old release)
> * custom identity driver for keystone
> * some new feature that customer wants, not taken upstream
> * lack of time to invest in working patches upstream
>
> (Thinking out loud, I do wonder if there is a way we could close this gap)
>
> Defcore is actually forcing upgrades, because people lose their
> OpenStack trademark if they don't stay in the Defcore supported window.
>
> Other Sessions
> ==============
>
> There were a ton of other sessions there. Here are the interesting
> things that I remember from them.
>
> In the session on deploying OpenStack in containers, there was a split
> between idempotent (docker) vs. system (lxc/lxd) containers. Both were
> getting used in different ways, and there was debate between the camps
> as to which was most effective.
>
> Ceilometer deployments are highly coupled to Heat, and seem to be only
> used when users want Heat auto scaling.
>
> There are noticeable failures in our CLI / API paths when using UTF-8
> names for projects / resources, noticed by many .eu folks. It would
> be good to increase testing in areas like this.
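>
> Even something as simple as the following (illustrative names) would
> exercise a lot of those paths:
>
>   openstack project create "café"
>   openstack server create --flavor m1.small --image cirros "überserver"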
>
>
> The full list of all the etherpads is here for anyone else looking to
> dive in and learn more - https://etherpad.openstack.org/p/NYC-ops-meetup
>
> -Sean
>
Thanks Sean, these are great notes and very consumable. I really
appreciate you taking the time to convey this information so well.
I'm sorry I couldn't attend myself, but your summary really helps to
communicate the highlights as you saw them.
Thank you,
Anita.