[openstack-dev] [ironic] Summit recap

Yuiko Takada yuikotakada0313 at gmail.com
Fri May 13 04:42:41 UTC 2016


Hi,

Jim, thank you for recap and blog!

> # Newton priorities
>
> [Etherpad](
https://etherpad.openstack.org/p/ironic-newton-summit-priorities)
>
> We discussed our priorities for the Newton cycle here.
>
> Of note, we decided that we need to get cold upgrade testing (i.e.
> grenade) running ASAP. We have lots of large changes lined up that feel
> like they could easily break upgrades, and want to be able to test them.
> Much of the team is jumping in to help get this going.
>
> The priorities for the cycle have been published
> [here](http://specs.openstack.org/openstack/ironic-specs/priorities/newton-priorities.html).
I'd like to discuss priorities.
I'm not saying that we should never implement items that are not on the
priority list. But we do need to concentrate on the high-priority items
that must be completed in this cycle, especially the core developers,
because they have the Workflow+1 privilege :)

I think there are two concerns.
One is that we need to set the scope for the cycle.
For example, we have been working on the Neutron integration since the L
cycle, and we want to complete it in this cycle.
There are many things we want, and completing everything would take a very
long time. So we also need to set priorities within the Neutron
integration, and accept deferring some items to the next cycle.

The other is that we need to be a little stricter about priorities.
Recently, I watched the presentation
"How Open Source Projects Survive Poisonous People (And You Can Too)":
https://www.youtube.com/watch?v=Q52kFL8zVoM
In this presentation, Subversion developers talk about how to manage an
open source community. They keep a TODO list, and when a new proposal
comes in that is not on the TODO list, it is declined.
Do we need to do the same thing? Of course not.
But now is the time to start thinking about this...

Thank you for reading this long email.
What should we do to manage the project efficiently?
Everyone's input is welcome.


Best Regards,
Yuiko Takada Mori



2016-05-11 23:16 GMT+09:00 Jim Rollenhagen <jim at jimrollenhagen.com>:

> Others made good points for posting this on the ML, so here it is in
> full. Sorry for the markdown formatting, I just copied this from the
> blog post.
>
> // jim
>
> Another cycle, another summit. The ironic project had ten design summit
> sessions to get together and chat about some of our current and future
> work. We also led a cross-project session on bare metal networking, had
> a joint session with nova, and a contributor's meetup for the first half
> of Friday. The following is a summary of those sessions.
>
> # Cross-project: the future of bare-metal networking
>
> [Etherpad](https://etherpad.openstack.org/p/newton-baremetal-networking)
>
> This session was meant to have the Nova, Ironic, and Neutron folks get
> together and figure out some of the details of the [work we're
> doing](https://review.openstack.org/#/c/277853/) to decouple the
> physical network infrastructure from the logical networking that users
> interact with. Unfortunately, we spent most of the time explaining the
> problem and the goals, and not much time actually figuring out how
> things should work. We were able to decide that the trunk port work in
> neutron should mostly work for us.
>
> There were plenty of hallway chats about this throughout the week, and
> from those I think we have a good idea of what needs to be done. The
> spec linked above will be updated soon to clarify where things stand.
>
> # Nova-compatible serial and graphical consoles
>
> [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-console)
>
> This session began with a number of proposals to implement serial and
> graphical consoles that would work with Nova, and a goal to narrow them
> down so folks can move forward with the code.
>
> The first thing we decided is that in the short term, we want to focus
> on the serial console. It's supported by almost all hardware and most
> cases where someone needs a console are covered by a simple serial
> console. We do want to do graphical consoles eventually, but would like
> to take one thing at a time.
>
> We then spent some time dissecting our requirements (and preferences)
> for what we want an implementation to do, which are listed toward the
> bottom of the etherpad.
>
> We narrowed the serial console work down to two implementations:
>
> * [ironic-console-server](https://review.openstack.org/#/c/306755/).
>   The tl;dr here is that the conductor will shell out to a command that
>   creates a listening port, forks a process that connects to the
>   console, and pipes data between the two. This command is called once
>   per console session. The upside with this approach is that operators
>   don't need to do much when the change is deployed.
>
> * [ironic-ipmiproxy](https://review.openstack.org/#/c/296869/).
>   This is similar to ironic-console-server, except that it runs as its
>   own daemon with a small REST API for start/stop/get. It spawns a
>   process for each console, which does not fork itself. The upside here
>   is that it can be scaled independently, and has no implications on
>   conductor failover; however it will need its own HA model as well, and
>   will be more work for deployers.
>
> It seems like most folks agree that the latter is more desirable, in
> terms of scaling model and such, but we didn't quite come to consensus
> during the session. We need to do that ASAP.
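>
> To make the ironic-ipmiproxy idea a bit more concrete, here is a minimal
> sketch of the kind of start/stop/get REST surface it describes, with one
> spawned process per console session. The endpoint layout, the use of
> Flask, and the ipmitool invocation are illustrative assumptions, not
> details from the proposal itself:
>
> ```python
> import subprocess
>
> from flask import Flask, jsonify
>
> app = Flask(__name__)
> sessions = {}  # node_uuid -> subprocess.Popen
>
>
> @app.route('/consoles/<node_uuid>', methods=['POST'])
> def start_console(node_uuid):
>     if node_uuid not in sessions:
>         # Hypothetical: activate the node's serial-over-LAN console. A real
>         # implementation would look up BMC credentials and expose a socket
>         # for the console client to connect to.
>         sessions[node_uuid] = subprocess.Popen(
>             ['ipmitool', '-I', 'lanplus', '-H', 'bmc.example.com',
>              'sol', 'activate'])
>     return jsonify({'node': node_uuid, 'state': 'active'}), 201
>
>
> @app.route('/consoles/<node_uuid>', methods=['GET'])
> def get_console(node_uuid):
>     proc = sessions.get(node_uuid)
>     state = 'active' if proc and proc.poll() is None else 'inactive'
>     return jsonify({'node': node_uuid, 'state': state})
>
>
> @app.route('/consoles/<node_uuid>', methods=['DELETE'])
> def stop_console(node_uuid):
>     proc = sessions.pop(node_uuid, None)
>     if proc is not None:
>         proc.terminate()
>     return '', 204
>
>
> if __name__ == '__main__':
>     app.run(port=8089)
> ```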
>
> We also talked a bit about console logging, and the pitfalls of doing it
> automatically for every instance. For example, some BMCs crash if power
> status is called repeatedly with a serial-over-lan session active (this
> is something to consider for regular console attach as well). We'll need
> to make this operator configurable, possibly per-node, so that we aren't
> automatically crashing bad BMCs for people. The nova team agreed later
> that this is fine, as long as a decent error is returned in this case.
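>
> As a rough sketch of the "off by default, operator opts in" knob, using
> oslo.config (the option name, group, and per-node override via
> driver_info are placeholders, not something we agreed on):
>
> ```python
> from oslo_config import cfg
>
> opts = [
>     cfg.BoolOpt('enable_console_log',
>                 default=False,
>                 help='Automatically capture serial console logs. Leave '
>                      'disabled for BMCs that misbehave when a '
>                      'serial-over-LAN session stays active.'),
> ]
>
> CONF = cfg.CONF
> CONF.register_opts(opts, group='console')
>
>
> def console_log_enabled(node):
>     # Hypothetical per-node override, falling back to the global default.
>     return node.driver_info.get('enable_console_log',
>                                 CONF.console.enable_console_log)
> ```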
>
> # Status and future of our gate
>
> [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-gate)
>
> We discussed the current status of our gate, and the plans for Newton.
>
> We first talked about third-party CI, and where we're at with that. Kurt
> Taylor is doing the main tracking of that, and explained where and how
> we're tracking it. There was a call for help for some of the missing
> data, and getting all the right pages updated (stackalytics,
> openstack.org marketplace, etc).
>
> We also talked about the current changes going into our gate that we
> want to push forward. Moving to tinyipa and virtualbmc (with ipmitool
> drivers) are the main changes right now. We discussed the progress on
> upgrade testing via grenade. There hasn't been a lot of progress made,
> but some of the groundwork to make local testing easy has been done.
> Later in the week, during the priorities session, we agreed that the
> upgrade testing was our top priority right now, and some folks
> volunteered to help move it along.
>
> # Hardware pool management
>
> [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-hardware-pools)
>
> This topic is talked about at nearly every summit, and we said that we
> need to at least solve the internals this round.
>
> We narrowed in on what we think is a good architecture for this. Given
> that names are hard, we decided we would add a "thing" resource (yes,
> this name will change). This is some sort of management interface for a
> group of nodes, and every node will be mapped 1:1 to a "thing" by
> default. Credentials can be optionally placed in the "thing" instead of
> on the node, and there can be a 1:n thing:node mapping. This will allow
> ironic to do group operations for hardware that supports them. Of course,
> there will be "thing" drivers, because every vendor's hardware does this
> differently. :)
>
> We also decided to make this an internal-only feature for now, and not
> expose it to the REST API yet. It can be used as an optimization in
> internal code, or to support hardware that can only be managed as a
> group. We may eventually decide to expose group management features to
> the REST API, but not yet.
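>
> For illustration only, the internal model could look roughly like this
> (all names are placeholders; we explicitly agreed "thing" will be
> renamed):
>
> ```python
> class Thing(object):
>     """A management endpoint for one or more nodes (name will change)."""
>
>     def __init__(self, uuid, driver, credentials=None, node_uuids=None):
>         self.uuid = uuid
>         self.driver = driver            # hypothetical per-vendor "thing" driver
>         self.credentials = credentials  # optional; may live on the node instead
>         self.node_uuids = node_uuids or []
>
>
> def thing_for_node(node_uuid, things):
>     """Resolve the "thing" managing a node (1:1 by default, 1:n allowed)."""
>     for thing in things:
>         if node_uuid in thing.node_uuids:
>             return thing
>     # Default: an implicit per-node "thing" that wraps just this node.
>     return Thing(uuid='implicit-%s' % node_uuid, driver='node-passthrough',
>                  node_uuids=[node_uuid])
> ```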
>
> # Driver composition
>
> [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-driver-composition)
>
> This is another topic we keep discussing without coming to conclusions,
> and I think we made good progress this round. Significant work went into
> the spec before the summit, and folks came prepared with a solid
> proposal.
>
> We agreed on the concept of a "hardware type", which declares
> compatibility between driver interface implementations. These will be
> hard-coded into ironic, matching what the vendor expects and the generic
> interfaces that ironic provides.
>
> We also agreed that out of tree implementations of an interface should
> not be allowed to declare compatibility with in-tree vendor hardware
> types. For example, one could not make a power interface out of tree
> that declares compatibility with the in-tree "pizza box" hardware type.
> Out of tree drivers can, however, declare their own hardware types that
> may be used.
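>
> As a sketch of the idea (class and attribute names are illustrative, not
> what the spec will settle on), a hardware type would simply declare which
> interface implementations it is compatible with:
>
> ```python
> class PizzaBoxHardware(object):
>     """Hypothetical in-tree hardware type for a generic "pizza box" server."""
>
>     # Compatible implementations, most-preferred first.
>     supported_power_interfaces = ['ipmitool']
>     supported_management_interfaces = ['ipmitool']
>     supported_deploy_interfaces = ['direct', 'iscsi']
>     supported_console_interfaces = ['ipmitool-socat', 'no-console']
>
>
> class VendorXHardware(object):
>     """Hypothetical out-of-tree hardware type shipped by a vendor.
>
>     It may declare its own out-of-tree interfaces, but those interfaces
>     cannot claim compatibility with in-tree types like PizzaBoxHardware.
>     """
>
>     supported_power_interfaces = ['vendor-x-power']
>     supported_management_interfaces = ['vendor-x-management']
>     supported_deploy_interfaces = ['direct']
> ```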
>
> We also discussed upgrades, and using a new `hardware_type` field on the
> node object, to be used for the migration path. We didn't fully come to
> consensus on the upgrade path, but we're close, and the details are in
> the etherpad.
>
> # Making ops less worse
>
> [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-ops)
>
> We discussed some common failure cases that operators see, and how we
> can solve them in code.
>
> We discussed flaky BMCs, which end with the node in maintenance mode,
> and whether ironic can get them out of that mode automagically. We identified
> the need to distinguish between maintenance set by ironic and set by
> operators, and do things like attempt to connect to the BMC on a power
> state request, and turn off maintenance mode if successful. JayF is
> going to write a spec for this differentiation.
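>
> A hand-wavy sketch of that behaviour (not JayF's spec; the
> 'maintenance_set_by' flag and where it lives are exactly what the spec
> needs to pin down), assuming ironic's usual task/node objects:
>
> ```python
> def maybe_clear_maintenance(task):
>     node = task.node
>     if not node.maintenance:
>         return
>     if node.driver_internal_info.get('maintenance_set_by') != 'ironic':
>         # Never override maintenance that an operator set explicitly.
>         return
>     try:
>         # If the BMC answers a power state request, the fault has cleared.
>         task.driver.power.get_power_state(task)
>     except Exception:
>         return
>     node.maintenance = False
>     node.maintenance_reason = None
>     node.save()
> ```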
>
> Folks also expressed the desire to be able to reset the BMC via APIs. We
> have a BMC reset function in the vendor interface for the ipmitool
> driver; dtantsur volunteered to write a spec to promote that method to
> an official ManagementInterface method.
>
> We also talked for a while about stuck states. This has been mostly
> solved in code, but is still a problem for some deployers. We decided
> that we should not have a "reset-state" API like nova does, but rather a
> command line tool to handle this. lintan has volunteered to write a
> proposal for this; I have also posted some [straw man
> code](https://review.openstack.org/#/c/311273/) that someone is welcome
> to take over or use.
>
> # Anomaly detection and resolution
>
> [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-anomaly-detection)
>
> This session kind of naturally flowed from the previous one, but was
> more about how we can automatically detect failure cases, and
> potentially automatically fix them. If we can't do these, then what do
> we need to build to allow external tooling to do it?
>
> We concluded with a few things that we can do now to get started:
>
> * Build a node error event DB table, such that errors can be fetched via
>   our API.
>
> * Send notifications on every state change (this is already in
>   progress). Other tools can subscribe to them to watch for anomalies.
>
> * Add a periodic task that polls BMCs for hardware event/error logs. We
>   could store these or emit them as notifications.
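>
> The third item could look something like this as a conductor periodic
> task; the ipmitool 'sel list' call, the iterable of nodes, and the
> notification event type are all illustrative assumptions:
>
> ```python
> import subprocess
>
> from futurist import periodics
>
>
> @periodics.periodic(spacing=600)  # e.g. every ten minutes
> def poll_bmc_event_logs(nodes, notifier, context):
>     for node in nodes:
>         try:
>             sel = subprocess.check_output(
>                 ['ipmitool', '-I', 'lanplus',
>                  '-H', node.driver_info['ipmi_address'],
>                  '-U', node.driver_info['ipmi_username'],
>                  '-P', node.driver_info['ipmi_password'],
>                  'sel', 'list'])
>         except (subprocess.CalledProcessError, KeyError):
>             continue
>         for entry in sel.decode().splitlines():
>             # Either store the entry or emit it as a notification; both
>             # options were floated in the session.
>             notifier.info(context, 'ironic.node.hardware_event',
>                           {'node': node.uuid, 'event': entry})
> ```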
>
> # Ansible deploy driver
>
> [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-ansible-deploy)
>
> Some folks from Mirantis presented their proposal for an ansible-based
> deploy driver, and allowed us to ask questions.
>
> The primary use case for this is to allow operators to easily change the
> deploy process; the ansible playbooks for this are configuration, not
> code. Some people had concerns about this, especially around
> supportability and the fact that we (upstream) effectively have no
> control over how a deployment works. This is analogous to allowing an
> operator to modify spawn() in the nova libvirt driver. However, most
> people present were okay with this.
>
> We discussed how to build and secure ramdisks for this, and tossed some
> ideas around. We didn't come to any clear consensus, though.
>
> Last, we found that this driver currently deploys nodes serially. We noted
> that this driver is a non-starter until it can deploy many (50?)
> machines in parallel from a single conductor host. As such, we've
> decided that until this is possible, the team shouldn't be spending much
> review time on this.
>
> # Live upgrades
>
> [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-live-upgrades)
>
> Here, we discussed what we need to get to rolling upgrades, and what we
> should be testing to confirm these work.
>
> We noted that the requirement for the "supports rolling upgrade" tag in
> the governance repo is only testing last-stable to master upgrade (and
> the equivalent for changes on the stable branch). Given that we do
> intermediate releases, we also want to test upgrades from last numbered
> release (because that may be more recent than last stable) to master.
> Last, we should run a job that upgrades ironic but does not upgrade
> nova, to make sure services can be upgraded independently.
>
> We decided that for ironic upgrades, conductor should go first, followed
> by the API. This is so that the API doesn't expose functionality before
> a conductor supports it.
>
> We decided for full cloud upgrades, ironic should go before nova,
> because older nova should always work with newer ironic. We should also
> upgrade neutron before ironic, because ironic consumes neutron and we
> don't want to depend on functionality that doesn't exist yet. There's an
> action item for me to check with the Neutron folks on this, to make sure
> Neutron before ironic before nova seems kosher to them.
>
> # Inspector HA
>
> [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-inspector)
>
> Milan gave a quick presentation on his proposal for an HA model for
> inspector, and we discussed.
>
> Things we agreed on:
>
> * the general proposal
>
> * use tooz for locking and leader election
>
> * split it into an api and conductor service
>
>   * conductor runs active-active
>
> * don't split firewall and dhcp services to a separate service
>
> Details are in the etherpad. :)
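>
> For the tooz part, a bare-bones sketch of the agreed direction: a
> distributed lock around per-node introspection state, plus a conductor
> group from which tooz elects a leader for the singleton duties. The
> backend URL and the group/lock names are placeholders:
>
> ```python
> import uuid
>
> from tooz import coordination
>
> # A real deployment would point this at its zookeeper/etcd/etc. backend.
> coordinator = coordination.get_coordinator(
>     'zookeeper://127.0.0.1:2181', uuid.uuid4().bytes)
> coordinator.start()
>
> # Locking, e.g. around updates to one node's introspection state.
> with coordinator.get_lock(b'inspector-node-<node-uuid>'):
>     pass  # update the node's record here
>
> # Active-active conductors join a group; one gets elected leader.
> group = b'inspector-conductors'
> try:
>     coordinator.create_group(group).get()
> except coordination.GroupAlreadyExist:
>     pass
> coordinator.join_group(group).get()
>
>
> def on_elected(event):
>     # This member now owns the singleton duties (e.g. firewall/dhcp sync).
>     print('elected leader of %s' % event.group_id)
>
>
> coordinator.watch_elected_as_leader(group, on_elected)
> coordinator.run_watchers()
> ```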
>
> # Newton priorities
>
> [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-priorities)
>
> We discussed our priorities for the Newton cycle here.
>
> Of note, we decided that we need to get cold upgrade testing (i.e.
> grenade) running ASAP. We have lots of large changes lined up that feel
> like they could easily break upgrades, and want to be able to test them.
> Much of the team is jumping in to help get this going.
>
> The priorities for the cycle have been published
> [here](http://specs.openstack.org/openstack/ironic-specs/priorities/newton-priorities.html).
>
> The etherpad also lists some smaller work that we want to prioritize,
> but did not publish as such. My big task for early this cycle is to
> build a quick landing page that has all priorities with relevant links
> to them, and these small things will be included on this page.
>
> # Nova/Ironic cross-project
>
> [Etherpad](https://etherpad.openstack.org/p/newton-nova-ironic)
>
> We started this session by updating the Nova team on the status of a few
> things.
>
> We discussed the multitenant networking work, and what's left to do
> there. We wondered out loud if the "routed networks" feature planned for
> Nova will conflict with this work - johnthetubaguy and I are to
> investigate this further.
>
> We talked about the multiple-compute work, and if the
> generic-resource-pools work is a better route to getting there. This
> discussion has continued beyond the summit and is being investigated
> further.
>
> We then talked about the future console work, and went over what we
> decided in the previous session we had about that.
>
> We discussed what nova needs from the ironic team - full tempest runs
> (minus what ironic doesn't support) and faster CI runs. Surprise! We
> discussed some progress and some options here.
>
> Last, we talked for a few minutes about passing configuration from
> flavors to ironic - think BIOS configuration on the fly, depending on
> the flavor requested. This was obviously too big a topic to solve in a
> few minutes, but we got the wheels spinning.
>
> # Summary
>
> All in all, it was a productive summit for the ironic team, and we have
> a clear vision for the next six months.
>
> On Mon, May 09, 2016 at 06:00:46PM -0400, Jim Rollenhagen wrote:
> > Hey all,
> >
> > I wrote a recap of the summit on my blog:
> > http://jroll.ghost.io/newton-summit-recap/
> >
> > I hope this covers everything that folks missed or couldn't remember. As
> > always, questions/comments/concerns welcome.
> >
> > // jim
> >
> >
>