<div dir="ltr">Hi,<div><br></div><div>Jim, thank you for the recap and the blog post!</div><div><br></div><div><div>> # Newton priorities</div><div>> </div><div>> [Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-priorities">https://etherpad.openstack.org/p/ironic-newton-summit-priorities</a>)</div><div>> </div><div>> We discussed our priorities for the Newton cycle here.</div><div>> </div><div>> Of note, we decided that we need to get cold upgrade testing (i.e.</div><div>> grenade) running ASAP. We have lots of large changes lined up that feel</div><div>> like they could easily break upgrades, and want to be able to test them.</div><div>> Much of the team is jumping in to help get this going.</div><div>> </div><div>> The priorities for the cycle have been published</div><div>> [here](<a href="http://specs.openstack.org/openstack/ironic-specs/priorities/newton-priorities.html">http://specs.openstack.org/openstack/ironic-specs/priorities/newton-priorities.html</a>).</div><div>I'd like to discuss priorities.</div><div>I'm not saying that we shouldn't implement items that are not on the priority list.</div><div>But we need to concentrate on the high-priority items that need to be completed in this cycle.</div><div>This applies especially to core developers, 
because they have the Workflow+1 privilege :)</div><div><br></div><div>I think there are two concerns.</div><div>One is that we need to set the scope for the cycle.</div><div><div>For example, we have been implementing Neutron integration since the L cycle, </div><div>and we want to complete it in this cycle.</div><div>There are many things we want, and completing everything would take a very long time.</div><div>So we need to set priorities within the Neutron integration work as well, and we need to give up on implementing</div><div>some items in this cycle and implement them in the next one.</div><div><br></div><div>The other is that we need to be a little stricter about priorities.</div><div>Recently, I watched the presentation </div><div>"How Open Source Projects Survive Poisonous People (And You Can Too)".</div><div><a href="https://www.youtube.com/watch?v=Q52kFL8zVoM">https://www.youtube.com/watch?v=Q52kFL8zVoM</a><br></div><div>In this presentation, Subversion developers talk about</div><div>how to manage an OSS community.</div><div>They have a TODO list, and when a new proposal comes in that is not on the TODO list,</div><div>it is rejected.</div><div>Do we need to do the same thing? Of course not.</div><div>But now is the time for us to think about it...</div><div><br></div><div>Thank you for reading this long email.<br></div><div>What should we do to manage the project efficiently?</div><div>Everyone's opinion is welcome.</div><div><br></div><div><br></div><div>Best Regards,</div><div>Yuiko Takada Mori</div><div><br></div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">2016-05-11 23:16 GMT+09:00 Jim Rollenhagen <span dir="ltr"><<a href="mailto:jim@jimrollenhagen.com" target="_blank">jim@jimrollenhagen.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Others made good points for posting this on the ML, so here it is in<br>

full. Sorry for the markdown formatting, I just copied this from the<br>

blog post.<br>

<br>

// jim<br>

<br>

Another cycle, another summit. The ironic project had ten design summit<br>

sessions to get together and chat about some of our current and future<br>

work. We also led a cross-project session on bare metal networking, had<br>

a joint session with nova, and a contributor's meetup for the first half<br>

of Friday. The following is a summary of those sessions.<br>

<br>

# Cross-project: the future of bare-metal networking<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/newton-baremetal-networking" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/newton-baremetal-networking</a>)<br>

<br>

This session was meant to have the Nova, Ironic, and Neutron folks get<br>

together and figure out some of the details of the [work we're<br>

doing](<a href="https://review.openstack.org/#/c/277853/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/277853/</a>) to decouple the<br>

physical network infrastructure from the logical networking that users<br>

interact with. Unfortunately, we spent most of the time explaining the<br>

problem and the goals, and not much time actually figuring out how<br>

things should work. We were able to decide that the trunk port work in<br>

neutron should mostly work for us.<br>

<br>

There were plenty of hallway chats about this throughout the week, and<br>

from those I think we have a good idea of what needs to be done. The<br>

spec linked above will be updated soon to clarify where we are at here.<br>

<br>

# Nova-compatible serial and graphical consoles<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-console" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/ironic-newton-summit-console</a>)<br>

<br>

This session began with a number of proposals to implement serial and<br>

graphical consoles that would work with Nova, and a goal to narrow them<br>

down so folks can move forward with the code.<br>

<br>

The first thing we decided is that in the short term, we want to focus<br>

on the serial console. It's supported by almost all hardware and most<br>

cases where someone needs a console are covered by a simple serial<br>

console. We do want to do graphical consoles eventually, but would like<br>

to take one thing at a time.<br>

<br>

We then spent some time dissecting our requirements (and preferences)<br>

for what we want an implementation to do, which are listed toward the<br>

bottom of the etherpad.<br>

<br>

We narrowed the serial console work down to two implementations:<br>

<br>

* [ironic-console-server](<a href="https://review.openstack.org/#/c/306755/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/306755/</a>).<br>

  The tl;dr here is that the conductor will shell out to a command that<br>

  creates a listening port, and forks a process that connects to the<br>

  console, and pipes data between the two. This command is called once<br>

  per console session. The upside with this approach is that operators<br>

  don't need to do much when the change is deployed.<br>

<br>

* [ironic-ipmiproxy](<a href="https://review.openstack.org/#/c/296869/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/296869/</a>).<br>

  This is similar to ironic-console-server, except that it runs as its<br>

  own daemon with a small REST API for start/stop/get. It spawns a<br>

  process for each console, which does not fork itself. The upside here<br>

  is that it can be scaled independently, and has no implications on<br>

  conductor failover; however it will need its own HA model as well, and<br>

  will be more work for deployers.<br>
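<br>
To make the "listening port that pipes data to the console" idea concrete,<br>
here is a minimal Python sketch. It is not taken from either review: the<br>
function name `pipe_between` is made up for illustration, and plain sockets<br>
stand in for the real console connection (e.g. a serial-over-lan session).<br>
<pre>
```python
import selectors
import socket


def pipe_between(client, console, bufsize=4096):
    """Copy bytes in both directions until either side closes.

    `client` is the socket of the user attached to the listening port;
    `console` is a socket standing in for the node's console connection.
    Both names are assumptions made for this sketch.
    """
    sel = selectors.DefaultSelector()
    # Each registration remembers the peer that data should be forwarded to.
    sel.register(client, selectors.EVENT_READ, console)
    sel.register(console, selectors.EVENT_READ, client)
    try:
        while True:
            for key, _events in sel.select():
                data = key.fileobj.recv(bufsize)
                if not data:
                    # An empty read means that side hung up: end the session.
                    return
                key.data.sendall(data)
    finally:
        sel.close()
```
</pre>
In a real service `client` would come from `accept()` on the listening<br>
port; the sketch only covers the piping step, not process management or HA.<br>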

<br>

It seems like most folks agree that the latter is more desirable, in<br>

terms of scaling model and such, but we didn't quite come to consensus<br>

during the session. We need to do that ASAP.<br>

<br>

We also talked a bit about console logging, and the pitfalls of doing it<br>

automatically for every instance. For example, some BMCs crash if power<br>

status is called repeatedly with a serial-over-lan session active (this<br>

is something to consider for regular console attach as well). We'll need<br>

to make this operator configurable, possibly per-node, so that we aren't<br>

automatically crashing bad BMCs for people. The nova team agreed later<br>

that this is fine, as long as a decent error is returned in this case.<br>

<br>

# Status and future of our gate<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-gate" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/ironic-newton-summit-gate</a>)<br>

<br>

We discussed the current status of our gate, and the plans for Newton.<br>

<br>

We first talked about third-party CI, and where we're at with that. Kurt<br>

Taylor is doing the main tracking of that, and explained where and how<br>

we're tracking it. There was a call for help for some of the missing<br>

data, and getting all the right pages updated (stackalytics,<br>

<a href="http://openstack.org" rel="noreferrer" target="_blank">openstack.org</a> marketplace, etc.).<br>

<br>

We also talked about the current changes going into our gate that we<br>

want to push forward. Moving to tinyipa and virtualbmc (with ipmitool<br>

drivers) are the main changes right now. We discussed the progress on<br>

upgrade testing via grenade. There hasn't been a lot of progress made,<br>

but some of the groundwork to make local testing easy has been done.<br>

Later in the week, during the priorities session, we agreed that the<br>

upgrade testing was our top priority right now, and some folks<br>

volunteered to help move it along.<br>

<br>

# Hardware pool management<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-hardware-pools" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/ironic-newton-summit-hardware-pools</a>)<br>

<br>

This topic is talked about at nearly every summit, and we said that we<br>

need to at least solve the internals this round.<br>

<br>

We narrowed in on what we think is a good architecture for this. Given<br>

that names are hard, we decided we would add a "thing" resource (yes,<br>

this name will change). This is some sort of management interface for a<br>

group of nodes, and every node will be mapped 1:1 to a "thing" by<br>

default. Credentials can be optionally placed in the "thing" instead of<br>

on the node, and there can be a 1:n thing:node mapping. This will allow<br>

ironic to do group operations for hardware that supports them. Of course,<br>

there will be "thing" drivers, because every hardware does this<br>

differently. :)<br>

<br>

We also decided to make this an internal-only feature for now, and not<br>

expose it to the REST API yet. It can be used as an optimization in<br>

internal code, or to support hardware that can only be managed as a<br>

group. We may eventually decide to expose group management features to<br>

the REST API, but not yet.<br>

<br>

# Driver composition<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-driver-composition" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/ironic-newton-summit-driver-composition</a>)<br>

<br>

This is another topic we keep discussing without coming to conclusions,<br>

and I think we made good progress this round. Significant work went into<br>

the spec before the summit, and folks came prepared with a solid<br>

proposal.<br>

<br>

We agreed on the concept of a "hardware type", which declares<br>

compatibility between driver interface implementations. These will be<br>

hard-coded into ironic, to what the vendor expects and the generic<br>

interfaces that ironic provides.<br>

<br>

We also agreed that out of tree implementations of an interface should<br>

not be allowed to declare compatibility with in-tree vendor hardware<br>

types. For example, one could not make a power interface out of tree<br>

that declares compatibility with the in-tree "pizza box" hardware type.<br>

Out of tree drivers can, however, declare their own hardware types that<br>

may be used.<br>

<br>

We also discussed upgrades, and using a new `hardware_type` field on the<br>

node object, to be used for the migration path. We didn't fully come to<br>

consensus on the upgrade path, but we're close, and the details are in<br>

the etherpad.<br>

<br>

# Making ops less worse<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-ops" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/ironic-newton-summit-ops</a>)<br>

<br>

We discussed some common failure cases that operators see, and how we<br>

can solve them in code.<br>

<br>

We discussed flaky BMCs, which end with the node in maintenance mode,<br>

and if Ironic can get them out of that mode automagically. We identified<br>

the need to distinguish between maintenance set by ironic and set by<br>

operators, and do things like attempt to connect to the BMC on a power<br>

state request, and turn off maintenance mode if successful. JayF is<br>

going to write a spec for this differentiation.<br>
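<br>
As a rough illustration of why that distinction matters, here is a small<br>
Python sketch. Everything in it is hypothetical: the `maintenance_set_by`<br>
field and the `on_power_state_request` helper are invented for this example,<br>
and the actual design is whatever the spec ends up saying.<br>
<pre>
```python
from dataclasses import dataclass


@dataclass
class Node:
    maintenance: bool = False
    # Hypothetical field: "ironic" when set automatically, "operator" when
    # a human set it, "" when the node is not in maintenance.
    maintenance_set_by: str = ""


def on_power_state_request(node, bmc_reachable):
    """Clear maintenance only if ironic set it and the BMC answers again."""
    if (node.maintenance
            and node.maintenance_set_by == "ironic"
            and bmc_reachable):
        node.maintenance = False
        node.maintenance_set_by = ""
    return node
```
</pre>
With this shape, a node an operator put into maintenance stays there even<br>
when the BMC responds, while a node parked by ironic recovers on its own.<br>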

<br>

Folks also expressed the desire to be able to reset the BMC via APIs. We<br>

have a BMC reset function in the vendor interface for the ipmitool<br>

driver; dtantsur volunteered to write a spec to promote that method to<br>

an official ManagementInterface method.<br>

<br>

We also talked for a while about stuck states. This has been mostly<br>

solved in code, but is still a problem for some deployers. We decided<br>

that we should not have a "reset-state" API like nova does, but rather a<br>

command line tool to handle this. lintan has volunteered to write a<br>

proposal for this; I have also posted some [straw man<br>

code](<a href="https://review.openstack.org/#/c/311273/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/311273/</a>) that someone is welcome<br>

to take over or use.<br>

<br>

# Anomaly detection and resolution<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-anomaly-detection" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/ironic-newton-summit-anomaly-detection</a>)<br>

<br>

This session kind of naturally flowed from the previous one, but was<br>

more about how we can automatically detect failure cases, and<br>

potentially automatically fix them. If we can't do these, then what do<br>

we need to build to allow external tooling to do it?<br>

<br>

We concluded with a few things that we can do now to get started:<br>

<br>

* Build a node error event DB table, such that errors can be fetched via<br>

  our API.<br>

<br>

* Send notifications on every state change (this is already in<br>

  progress). Other tools can subscribe to them to watch for anomalies.<br>

<br>

* Add a periodic task that polls BMCs for hardware event/error logs. We<br>

  could store these or emit them as notifications.<br>

<br>

# Ansible deploy driver<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-ansible-deploy" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/ironic-newton-summit-ansible-deploy</a>)<br>

<br>

Some folks from Mirantis presented their proposal for an ansible-based<br>

deploy driver, and allowed us to ask questions.<br>

<br>

The primary use case for this is to allow operators to easily change the<br>

deploy process; the ansible playbooks for this are configuration, not<br>

code. Some people had concerns about this, especially around<br>

supportability and the fact that we (upstream) effectively have no<br>

control over how a deployment works. This is analogous to allowing an<br>

operator to modify spawn() in the nova libvirt driver. However, most<br>

people present were okay with this.<br>

<br>

We discussed how to build and secure ramdisks for this, and tossed some<br>

ideas around. We didn't come to any clear consensus, though.<br>

<br>

Last, we found that currently each node is deployed in serial. We noted<br>

that this driver is a non-starter until it can deploy many (50?)<br>

machines in parallel from a single conductor host. As such, we've<br>

decided that until this is possible, the team shouldn't be spending much<br>

review time on this.<br>

<br>

# Live upgrades<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-live-upgrades" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/ironic-newton-summit-live-upgrades</a>)<br>

<br>

Here, we discussed what we need to get to rolling upgrades, and what we<br>

should be testing to confirm these work.<br>

<br>

We noted that the requirement for the "supports rolling upgrade" tag in<br>

the governance repo is only testing last-stable to master upgrade (and<br>

the equivalent for changes on the stable branch). Given that we do<br>

intermediate releases, we also want to test upgrades from last numbered<br>

release (because that may be more recent than last stable) to master.<br>

Last, we should run a job that upgrades ironic but does not upgrade<br>

nova, to make sure services can be upgraded independently.<br>

<br>

We decided that for ironic upgrades, conductor should go first, followed<br>

by the API. This is so that the API doesn't expose functionality before<br>

a conductor supports it.<br>

<br>

We decided for full cloud upgrades, ironic should go before nova,<br>

because older nova should always work with newer ironic. We should also<br>

upgrade neutron before ironic, because ironic consumes neutron and we<br>

don't want to depend on functionality that doesn't exist yet. There's an<br>

action item for me to check with the Neutron folks on this, to make sure<br>

Neutron before ironic before nova seems kosher to them.<br>
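<br>
The ordering above can be written down as a small dependency graph. This<br>
is only a sketch under the assumption that the Neutron-first ordering gets<br>
confirmed; the service names and the `upgrade_order` helper are made up for<br>
illustration, and `graphlib` needs Python 3.9 or newer.<br>
<pre>
```python
from graphlib import TopologicalSorter

# Map each service to the services that must be upgraded before it.
# These edges mirror the ordering discussed in the session; the neutron
# edge is still pending confirmation from the Neutron team.
UPGRADE_AFTER = {
    "ironic-conductor": {"neutron"},
    "ironic-api": {"ironic-conductor"},
    "nova": {"ironic-api"},
}


def upgrade_order(deps=UPGRADE_AFTER):
    """Return the services in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" then ".join(upgrade_order()))
```
</pre>
Keeping the rules as data makes it easy to add another edge later and to<br>
get an error (a cycle) if two rules ever contradict each other.<br>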

<br>

# Inspector HA<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-inspector" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/ironic-newton-summit-inspector</a>)<br>

<br>

Milan gave a quick presentation on his proposal for an HA model for<br>

inspector, and we discussed.<br>

<br>

Things we agreed on:<br>

<br>

* the general proposal<br>

<br>

* use tooz for locking and leader election<br>

<br>

* split it into an api and conductor service<br>

<br>

  * conductor runs active-active<br>

<br>

* don't split firewall and dhcp services to a separate service<br>

<br>

Details are in the etherpad. :)<br>

<br>

# Newton priorities<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/ironic-newton-summit-priorities" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/ironic-newton-summit-priorities</a>)<br>

<br>

We discussed our priorities for the Newton cycle here.<br>

<br>

Of note, we decided that we need to get cold upgrade testing (i.e.<br>

grenade) running ASAP. We have lots of large changes lined up that feel<br>

like they could easily break upgrades, and want to be able to test them.<br>

Much of the team is jumping in to help get this going.<br>

<br>

The priorities for the cycle have been published<br>

[here](<a href="http://specs.openstack.org/openstack/ironic-specs/priorities/newton-priorities.html" rel="noreferrer" target="_blank">http://specs.openstack.org/openstack/ironic-specs/priorities/newton-priorities.html</a>).<br>

<br>

The etherpad also lists some smaller work that we want to prioritize,<br>

but did not publish as such. My big task for early this cycle is to<br>

build a quick landing page that has all priorities with relevant links<br>

to them, and these small things will be included on this page.<br>

<br>

# Nova/Ironic cross-project<br>

<br>

[Etherpad](<a href="https://etherpad.openstack.org/p/newton-nova-ironic" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/newton-nova-ironic</a>)<br>

<br>

We started this session by updating the Nova team on the status of a few<br>

things.<br>

<br>

We discussed the multitenant networking work, and what's left to do<br>

there. We wondered out loud if the "routed networks" feature planned for<br>

Nova will conflict with this work - johnthetubaguy and I are to<br>

investigate this further.<br>

<br>

We talked about the multiple-compute work, and if the<br>

generic-resource-pools work is a better route to getting there. This<br>

discussion has continued beyond the summit and is being investigated<br>

further.<br>

<br>

We then talked about the future console work, and went over what we<br>

decided in the previous session we had about that.<br>

<br>

We discussed what nova needs from the ironic team - full tempest runs<br>

(minus what ironic doesn't support) and faster CI runs. Surprise! We<br>

discussed some progress and some options here.<br>

<br>

Last, we talked for a few minutes about passing configuration from<br>

flavors to ironic - think BIOS configuration on the fly, depending on<br>

the flavor requested. This was obviously too big a topic to solve in a<br>

few minutes, but we got the wheels spinning.<br>

<br>

# Summary<br>

<br>

All in all, it was a productive summit for the ironic team, and we have<br>

a clear vision for the next six months.<br>

<div class=""><div class="h5"><br>

On Mon, May 09, 2016 at 06:00:46PM -0400, Jim Rollenhagen wrote:<br>

> Hey all,<br>

><br>

> I wrote a recap of the summit on my blog:<br>

> <a href="http://jroll.ghost.io/newton-summit-recap/" rel="noreferrer" target="_blank">http://jroll.ghost.io/newton-summit-recap/</a><br>

><br>

> I hope this covers everything that folks missed or couldn't remember. As<br>

> always, questions/comments/concerns welcome.<br>

><br>

> // jim<br>

><br>

> __________________________________________________________________________<br>

> OpenStack Development Mailing List (not for usage questions)<br>

> Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>

> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

<br>


</div></div></blockquote></div><br></div></div></div></div>