[openstack-dev] [ironic][edge] Notes from the PTG
jim at jimrollenhagen.com
Wed Sep 19 14:49:55 UTC 2018
I wrote up some notes from my perspective at the PTG for some internal
teams and figured I may as well share them here. They're primarily from the
ironic and edge WG rooms. Fairly raw, very long, but hopefully useful to
Edge WG (IMHO) has historically just talked about use cases, hand-waved a
bit, and jumped to requiring an autonomous control plane per edge site -
thus spending all of their time talking about how they will make glance and
keystone sync data between control planes.
penick described roughly what we do with keystone/athenz and how that can
be used in a federated keystone deployment to provide autonomy for any
control plane, but also a single view via a global keystone.
penick and I both kept pushing for people to define a real architecture,
and we ended up with 10-15 people huddled around an easel for most of the
afternoon. Of note:
- Wind River (and others?) refuse to budge on the many control plane thing
- This means that they will need some orchestration tooling up top in
the main DC / client machines to even come close to reasonably managing all
of these sites
- They will probably need some syncing tooling
- glance->glance isn’t a thing, no matter how many people say it is.
- Glance PTL recommends syncing metadata outside of glance process, and
a global(ly distributed?) glance backend.
- We also defined the single pane of glass architecture that Oath plans to
- Okay with losing connectivity from central control plane to single
- Each edge site is a cell
- Each far edge site is just compute nodes
- Still may want to consider image distribution to edge sites so we
don’t have to go back to main DC?
- Keystone can be distributed the same as first architecture
- Nova folks may start investigating putting API hosts at the cell
level to get the best of both worlds - if there’s a network partition, can
still talk to cell API to manage things
- Need to think about removing the need for rabbitmq between edge and
- Kafka was suggested in the edge room for oslo.messaging in general
- Etcd watchers may be another option for an o.msg driver
- Other options are more invasive into nova - they involve
changing how nova-compute talks to conductor (etcd, etc) or even putting
REST APIs in nova-compute (and nova-conductor?)
- Neutron is going to work on an OVS “superagent” - superagent does
the RPC handling, talks some other way to child agents. Intended to scale
to thousands of children. Primary use case is smart nics but seems like a
win for the edge case as well.
penick took an action item to draw up the architecture diagrams in a
Wednesday: ironic things
Started with a retrospective. See
https://etherpad.openstack.org/p/ironic-stein-ptg-retrospective for the
notes - there weren’t many surprising things here. We did discuss trying to
target some quick wins for the beginning of the cycle, so that we didn’t
have all of our features trying to land at the end. Using wsgi with the
ironic-api was mentioned as a potential regression, but we agreed it’s a
config/documentation issue. I took an action to make a task to document
Next we quickly reviewed our vision doc, and people didn’t have much to say
Metalsmith: it’s a thing, it’s being included into the ironic project.
Dmitry is open to optionally supporting placement. Multiple instances will
be a feature in the future. Otherwise mostly feature complete, goal is to
keep it simple.
Networking-ansible: Red Hat is building tooling that integrates with upstream
ansible modules for networking gear. Kind of an alternative to n-g-s. Not
really much on plans here, RH just wanted to introduce it to the community.
Some discussion about it possibly replacing n-g-s later, but no hard plans.
Deploy steps/templates: we talked about what the next steps are, and what
an MVP looks like. Deploy templates are triggered by the traits that nodes
are scheduled against, and can add steps before or after (or in between?)
the default deploy steps. We agreed that we should add a RAID deploy step,
with standing questions for how arguments are passed to that deploy step,
and what the defaults look like. mgoddard and I took an action item to
open an RFE for this. We also agreed that we should start thinking about
how the current (only) deploy step should be split into multiple steps.
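
The trait-triggered template idea above can be sketched roughly as follows. This is only an illustration, not Ironic's data model or API: `DeployStep`, `TEMPLATES`, `steps_for_traits`, the trait names, the step names, and the priority numbers are all hypothetical, and "higher priority runs earlier" is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DeployStep:
    interface: str
    step: str
    priority: int  # assumption: higher priority runs earlier
    args: dict = field(default_factory=dict)


# Stand-in for the single deploy step that exists today; 100 is made up.
DEFAULT_STEPS = [DeployStep("deploy", "deploy", 100)]

# Hypothetical templates, keyed by the trait that triggers them.
TEMPLATES = {
    "CUSTOM_RAID1": [
        DeployStep("raid", "create_configuration", 150, {"raid_level": "1"}),
    ],
    "CUSTOM_POST_CONFIG": [
        DeployStep("deploy", "post_config", 50),
    ],
}


def steps_for_traits(traits):
    """Return the default steps plus any template steps, ordered by priority."""
    steps = list(DEFAULT_STEPS)
    for trait in sorted(traits):
        steps.extend(TEMPLATES.get(trait, []))
    return sorted(steps, key=lambda s: s.priority, reverse=True)


if __name__ == "__main__":
    for s in steps_for_traits({"CUSTOM_RAID1", "CUSTOM_POST_CONFIG"}):
        print(s.priority, s.interface, s.step, s.args)
```

Ordering by a numeric priority is one way to let a template place a step before or after the default one; how arguments and defaults are really passed is exactly the open question noted above.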
Graphical console: we discussed what the next steps are for this work. We
agreed that we should document the interface and what is returned (a URL),
and also start working on a redfish driver for graphical consoles. We also
noted that we can test in the gate with qemu, but we only need to test that
a correct URL is returned, not that the console actually works (because we
don’t really care that qemu’s console works).
Python 3: we talked about the changes to our jobs that are needed. We
agreed to use the base name of the jobs for Python 3 (as those will be used
for a long time), and add a “python2” prefix for the Python 2 jobs. We also
discussed dropping certain coverage for Python 2, as our CI jobs tend to
mostly test the same codepaths with some config differences. Last, we
talked about mixed-environment Python 2 and 3 testing, as people doing
rolling upgrades of Python versions will hit this. I sent an
email to the ML asking if others had done or thought about this, and it
sounds like we can limit that testing to oslo.messaging, and a task was
Pre-upgrade checks: Not much was discussed here; TheJulia is going to look
into it. One item of note is that there is an oslo project being proposed
that can carry some of the common code for this.
Performance improvements: We first discussed our virt driver’s performance.
It was found that Nova’s power sync loop makes a call to Ironic for each
instance that the compute service is managing. We do some node caching in
our driver that would be useful for this. I took an action item to look
into it, and have a WIP patch: https://review.openstack.org/#/c/602127/ .
That patch just needs a bug filed and unit tests written. On Thursday, we
talked with Nova about other performance things, and agreed we should
implement a hook in Nova that Ironic can call to say “power changed” and
“deploy done” and other things like this. This will help reduce or
eliminate polling from our virt driver to Ironic, and also allow Nova to
notice these changes faster. More on that later?
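
To make the cost of the per-instance call concrete, here is a minimal sketch of a power sync loop with and without a node cache. It is not the WIP patch or the real virt driver: `FakeIronicClient`, `NodeCache`, `sync_power_states`, and the 60-second TTL are hypothetical names and values chosen for illustration.

```python
import time


class FakeIronicClient:
    """Stand-in for a real API client; it only counts calls."""

    def __init__(self, power_states):
        self._power_states = dict(power_states)
        self.calls = 0

    def get_node(self, uuid):
        self.calls += 1
        return {"uuid": uuid, "power_state": self._power_states[uuid]}

    def list_nodes(self):
        self.calls += 1
        return [{"uuid": u, "power_state": p}
                for u, p in self._power_states.items()]


class NodeCache:
    """Serve node lookups from one list call, refreshed after ttl seconds."""

    def __init__(self, client, ttl=60.0, clock=time.monotonic):
        self._client = client
        self._ttl = ttl
        self._clock = clock
        self._nodes = None
        self._fetched_at = 0.0

    def get(self, uuid):
        now = self._clock()
        if self._nodes is None or now - self._fetched_at >= self._ttl:
            self._nodes = {n["uuid"]: n for n in self._client.list_nodes()}
            self._fetched_at = now
        return self._nodes[uuid]


def sync_power_states(instance_uuids, lookup):
    """Collect the power state of every instance using the given lookup."""
    return {uuid: lookup(uuid)["power_state"] for uuid in instance_uuids}


if __name__ == "__main__":
    states = {"node-%d" % i: "power on" for i in range(5)}
    naive = FakeIronicClient(states)
    sync_power_states(states, naive.get_node)
    cached = FakeIronicClient(states)
    sync_power_states(states, NodeCache(cached).get)
    print("API calls without cache:", naive.calls)
    print("API calls with cache:", cached.calls)
```

The trade-off is staleness: a cached power state can be up to one TTL old, which is part of why a callback from Ironic is attractive.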
Splitting the conductor: we discussed the many tasks the conductor is
responsible for, and pondered if we could or should split things up. This
has implications (good and bad) for operability, scalability, and security.
Splitting the conductor to multiple workers would allow operators to use
different security models for different tasks (e.g. only allowing an “OOB
worker” access to the OOB network). It would also allow folks to scale out
workers that do lots of work (like the power status loop) separately from
those that do minimal work (writing PXE configs). I intend to investigate
this more during this cycle and lay out a plan for doing the work. This
also may require better distributed locking, which TheJulia has started
Changing boot mode defaults: Apparently Intel is going to stop shipping
hardware that is capable of legacy BIOS booting in 2020. We agreed that we
should work toward changing the default boot mode to UEFI to better prepare
our users, but we can’t drop legacy BIOS mode until all of the old hardware
in the world is gone. TheJulia is going to dig through the code and make a
UEFI HTTPClient booting: This is a DHCP class that allows the DHCP server
to return a URL instead of a “next-server” (TFTP location) response. This
is a clear value add, and TheJulia is going to work on it as she is already
neck deep in that area of code. We also need to ensure that Neutron
supports this. It should, as it’s just more DHCP options, but we need to
SecureBoot: I presented Oath’s secureboot model, which doesn’t depend on a
centralized attestation server. It made sense to people, and we discussed
putting the driver in tree. The process does rely on some enhancements to
iPXE, so Oath is going to investigate upstreaming those changes and
publishing more documentation, and then an in-tree driver should be no
problem. We also discussed Ironic’s current SecureBoot (TrustedBoot?)
implementations. Currently it only works with PXE, not iPXE or Grub2.
TheJulia is going to look into adding this support. We should be able to do
CI jobs for it, as TPM 1.2 and 2.0 emulation both seem to be supported in
QEMU as of 2.11.
NIC PXE configuration as a clean step: the DRAC driver team has a desire to
configure NICs for PXE or not, and sync with the ironic database’s
pxe_enabled field. This has gone back and forth in IRC. We were able to
resolve some of the issues with it, and rpioso is going to write a small
spec to make sure we get the details right.
Thursday: more ironic things
Neutron cross-project discussion: we discussed SmartNICs, which the Neutron
team had also discussed the previous day. In short, SmartNICs are NICs that
run OVS. The Neutron team discussed the scalability of their OVS agent
running across thousands of machines, and are planning to make some sort of
“superagent”. This superagent essentially owns a group of OVS agents. It
will talk to Neutron over rabbit as usual, but then use some other protocol
to talk to the OVS agents it is managing. This should help with rabbit load
even in “standard” OpenStack environments, and is especially useful (to me)
for minimizing rabbitmq connections from far edge sites. The catch with
SmartNICs and Ironic is that the NICs must have power to be configured (and
thus the machine must be on). This breaks our general model of “only
configure networking with the machine off, to make sure we don’t cross
streams between tenants and control plane”. We came to a decent compromise
(I think), and agreed to continue in the ironic spec, and revisit the topic
Federation: we discussed federation and people seemed interested, however I
don’t believe we made any real progress toward getting it done. There’s
still a debate whether this should be something in Ironic itself, or if
there should just be some sort of proxy layer in front of multiple Ironic
environments. To be continued in the spec.
Agent polling: we discussed the spec to drop communication from IPA to the
conductor. It seems like nobody has major issues with it, and the spec just
needs some polishing before landing.
L3 deployments: We brought this up, and again there seems to be little
contention. I ended up approving the spec shortly after.
Neutron event processing: This work has been hanging for years and not
getting done. Some folks wondered if we should just poll Neutron, if that
gets the work done more quickly. Others wondered if we should even care
about it at all (we should). TheJulia is going to follow up with dtantsur
and vdrok to see if we can get someone to mainline some caffeine and just
get it done.
CMDB: Oath and CERN presented their work toward speccing out a CMDB
application that can integrate with Ironic. We discussed the problems that
they are trying to solve and agreed they need solving. We also agreed that
strict schema is better than blobjects (© jaypipes). We agreed it probably
doesn’t need to be in Ironic governance, but could be one day. The next
steps are to start hacking in a new repo in the OpenStack infrastructure,
and propose specs for any Ironic integration that is needed. Red Hat and
Dell contributors also showed interest in the project and volunteered to
help. Some folks are going to try and talk to the wider OpenStack community
to find out if there’s interest or needs from projects like
Stein goals: We put together a list of goals and voted on them. Julia has
since proposed the patch to document them:
Last thing Thursday: Cross-project discussions with Nova. Summarized here,
but lots of detail in the etherpad under the Ironic section:
Power sync: We discussed some problems CERN has with the instance power
sync (Rackspace also saw these problems). In short, nova asserts power
state if the instance “should” be off but the power is turned on
out-of-band. Operators definitely need to be aware of this when doing
maintenance on active machines, but we also discussed Ironic calling back
to Nova when Ironic knows that the power state has been updated (via Ironic
API, etc). I volunteered to look at this, and dansmith volunteered to help
API heaviness: We discussed how many API calls our virt driver does. As
mentioned earlier, I proposed a patch to make the power sync loop more
lightweight. There’s also lots of polling for tasks like deploy and rescue,
which we can dramatically reduce with a callback from Ironic to Nova. I
also volunteered to investigate this, and dansmith again agreed to help.
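
The difference between polling and a callback can be sketched in a few lines. This is an in-process toy, not the proposed Nova API: `NodeEventHub`, `notify`, `wait_for`, and the event name `deploy.done` are hypothetical, and a real implementation would deliver the event over an API call rather than a shared object.

```python
import threading


class NodeEventHub:
    """Let a waiter block until a named event arrives for a node."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = {}

    def _waiter(self, node_uuid, event_name):
        # Create the Event on first use, whichever side gets here first.
        with self._lock:
            return self._waiters.setdefault(
                (node_uuid, event_name), threading.Event())

    def notify(self, node_uuid, event_name):
        """Called by the side that knows the state changed (Ironic here)."""
        self._waiter(node_uuid, event_name).set()

    def wait_for(self, node_uuid, event_name, timeout):
        """Called by the side that would otherwise poll (the virt driver)."""
        return self._waiter(node_uuid, event_name).wait(timeout)


if __name__ == "__main__":
    hub = NodeEventHub()
    # Simulate the deploy finishing shortly after we start waiting.
    threading.Timer(0.1, hub.notify, args=("node-1", "deploy.done")).start()
    finished = hub.wait_for("node-1", "deploy.done", timeout=5.0)
    print("deploy finished:", finished)
```

The waiter makes no API calls while it waits and wakes as soon as the event arrives; a timeout is still needed as a fallback in case the notification is lost.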
Compute host grouping: Ironic now has a mechanism for grouping conductors
to nodes, and we want to mirror that in Nova. We discussed how to take the
group as a config option and be able to find the other compute services
managing that group, so we can build the hash ring correctly. We concluded
that it’s a really hard problem (TM), and agreed to also add a config
option like “peer_list” that can be used to list other compute services in
the same group. This can be read dynamically each time we build the hash
ring, or can be a mutable config with updates triggered by a SIGHUP. We’ll
hash out the details in a blueprint or spec. Again, I agreed to begin the
work, and dansmith agreed to help.
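
A rough sketch of how a “peer_list” style option could feed the hash ring follows. It is not Nova's or Ironic's actual hash ring code: `HashRing`, `nodes_for_host`, the replica count, and the use of SHA-256 are assumptions for illustration, and the option name is taken from the discussion above rather than from any merged change.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring over a set of compute hosts."""

    def __init__(self, hosts, replicas=32):
        # Several points per host smooth out the distribution.
        self._ring = sorted(
            (_hash("%s-%d" % (host, i)), host)
            for host in set(hosts)
            for i in range(replicas)
        )
        self._keys = [position for position, _ in self._ring]

    def get_host(self, node_uuid):
        if not self._ring:
            raise ValueError("hash ring has no hosts")
        # First ring point clockwise from the node's position, wrapping.
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


def nodes_for_host(my_host, peer_list, node_uuids):
    """Return the nodes this host manages, given the peers in its group."""
    ring = HashRing([my_host, *peer_list])
    return [n for n in node_uuids if ring.get_host(n) == my_host]


if __name__ == "__main__":
    hosts = ["compute-a", "compute-b", "compute-c"]
    nodes = ["node-%d" % i for i in range(12)]
    for host in hosts:
        peers = [h for h in hosts if h != host]
        print(host, nodes_for_host(host, peers, nodes))
```

Each service builds the same ring as long as its own name plus its peer list add up to the same set of hosts, which is why a stale or inconsistent list on one host is the failure mode to watch for.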
Capabilities filter: This was the last topic. It’s been on the chopping
block for ages, but we are just now reaching the point where it can be
properly deprecated. We discussed the plan, and mostly agreed it was good
enough. johnthetubaguy is going to send the plan wider and make sure it
will work for folks. We also discussed modeling countable resources on
Ironic resource providers, which will work as long as there is still some
resource class with an inventory of one, like we have today. Some folks may
investigate doing this, but it’s fuzzy how much people care or if we really
need/want to do it.
Friday: kind of bummed around the Ironic and TC rooms. Lots of interesting
discussions, but nothing I feel like writing about here (as Ironic
conversations were things like code deep-dives not worth communicating
widely, and the TC topics have been written about to death).