[Openstack-operators] [nova] Queens PTG recap - everything else
Matt Riedemann
mriedemos at gmail.com
Mon Sep 18 21:58:32 UTC 2017
There was a whole lot of other stuff discussed at the PTG. The details
are in [1]. I won't go into everything here, so I'm just highlighting
some of the more concrete items that had owners or TODOs.
Ironic
------
The Ironic team came over on Wednesday afternoon. We talked a bit, had
some laughs, it was a good time. Since I don't speak fluent baremetal,
Dmitry Tantsur is going to recap those discussions on the mailing list.
Thanks again, Dmitry.
Privsep
-------
Michael Still has been going hog wild converting the nova libvirt driver
code to use privsep instead of rootwrap. He has a series of changes
tracked under this blueprint [2]. Most of the discussion was a refresh
on privsep and a recap of what's already been merged and some discussion
on outstanding patches. The goal for Queens is to get the entire libvirt
driver converted and also try to get all of nova-compute converted, but
we want to limit that to getting things merged early in the release to
flush out bugs since a lot of these are weird, possibly untested code
paths. There was also discussion of a kind of privsep heartbeat daemon
to tell if it's running (even though it's not a separate service) but
this is complicated and is not something we'll pursue for Queens.
Websockify security proxy framework
-----------------------------------
This is a long-standing security hardening feature [3] which has changed
hands a few times and hasn't gotten much review. Sean Dague and Melanie
Witt agreed to focus on reviewing this for Queens.
Certificate validation
----------------------
This is another item [4] that's been discussed since at least the Ocata
summit but hasn't made much progress. Sean Dague agreed to help review
this, and Eric Fried said he knew someone who could help review the
security aspects of this change. Sean also suggested scheduling a
hangout so the Johns Hopkins University team working on this can give a
primer on the feature and what to look out for during review. We also
suggested getting a scenario test written for this in the barbican
tempest plugin, which runs as an experimental queue job for nova.
Notifications
-------------
Given the state of the Searchlight project and how we don't plan on
using Searchlight as a global proxy for the compute REST API, we are not
going to work on parity with versioned notifications there. There are
some cleanups we still need to do in Nova for versioned notifications
from a performance perspective. We also agreed that we aren't going to
consider deprecating legacy unversioned notifications until we have
parity with the versioned notifications, especially given legacy
unversioned notification consumers have not yet moved to using the
versioned notifications.
vGPU support
------------
This depends on nested resource providers (like lots of other things).
It was not clear from the discussion if this is static or dynamic
support, e.g. can we hot plug vGPUs using Cyborg? I assume we will not
support hot plugging at first. We also need improved functional testing
of this space before we can make big changes.
Preemptible (spot) instances
-----------------------------
This was continuing the discussion from the Boston forum session [5].
The major issue in Nova is that we don't want Nova to be in charge of
orchestrating preempting instances when a request comes in for a "paid"
instance. We agreed to start small where you can't burst over quota.
Blazar also delivered some reservation features in Pike [6] which sound
like expiration policies and which could be built on here. Someone will
have to prototype an external (to nova) "reaper"
which will cull the preemptible instances based on some configurable
policy. Honestly, the notes here are confusing, so we're going to need
someone to drive this forward. That might mean picking up John Garbutt's
draft spec for this (link not available right now).
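
To make the "reaper" idea a bit more concrete, here is a minimal sketch
of how an external process might pick which preemptible instances to
cull. Nothing like this was agreed at the PTG; the function name, the
dict keys (preemptible, vcpus, created_at) and the default "oldest
first" policy are all hypothetical and only stand in for "some
configurable policy".

```python
def select_victims(instances, vcpus_needed,
                   policy=lambda inst: inst["created_at"]):
    """Pick preemptible instances to cull until enough vCPUs are freed.

    Hypothetical sketch: 'instances' is a list of plain dicts and
    'policy' is a sort key that decides the culling order.
    """
    # Only instances flagged as preemptible are ever candidates.
    candidates = sorted(
        (inst for inst in instances if inst["preemptible"]), key=policy)

    victims = []
    freed = 0
    for inst in candidates:
        if freed >= vcpus_needed:
            break
        victims.append(inst)
        freed += inst["vcpus"]

    # If culling everything still would not be enough, cull nothing.
    if freed < vcpus_needed:
        return []
    return victims
```

Passing a different key function as policy (for example, largest
instance first) changes the culling order without touching the loop.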
Driver updates
--------------
Various teams from IBM gave updates on plans for their drivers in Queens.
PowerVM (in tree): the team is proposing a few more capabilities to the
driver in Queens. Details are in the spec [7].
zDPM (out of tree): this out of tree driver has had two releases (ocata
and pike) and is working on 3rd party CI. One issue they have with
Tempest is they can only boot from volume.
zVM (out of tree): the team is working on refactoring some code into a
library, similar to os-xenapi, os-powervm and oslo.vmware. They have CI
running but are not yet reporting against nova changes.
Endpoint discovery
------------------
This is carry-over work from Ocata and Pike to standardize how Nova does
endpoint discovery with other services, like
keystone/placement/cinder/glance/neutron/ironic/barbican. The spec is
here [8]. The dependent keystoneauth1 changes were released in Pike so
we should be able to make quick progress on this early in Queens to
flush out bugs.
Documentation
-------------
We talked about the review process for docs changes and agreed it's not
easy to define when we can have a single +2 to approve a docs change
(typo fixes and such are fine). We also noted that it's OK for people to
ping other cores in IRC when they think a docs patch is ready to go, so
that we can help move docs patches along faster.
We also talked a bit about the proposed docs tree structure laid out
here [9] and everyone seemed OK with that. Note that we never had a
cross-project discussion about how to organize the various project team
main pages. If the broader docs team did, maybe someone can please recap
that for us.
Deprecations
------------
We talked about several things worth deprecating in Queens.
* The os-cells REST API: we aren't going to do a formal microversion
deprecation for this. It's just going to go away when the cells v1 code
goes away, hopefully in Rocky.
* Running nova-api under eventlet: we've been running nova-api under
uwsgi in CI since Pike, so we agreed to deprecate running nova-api under
eventlet in Queens and remove that support in Rocky. Doing this
deprecation needs an owner...
* Deprecating (personality) file injection: we talked about this at the
Pike PTG and agreed we should still do this. People can use userdata.
I'm going to write a spec for what this looks like in the API. We also
need to consider, as part of this, if we should allow the user to
specify new userdata during rebuild. Note that the backend code (with
libguestfs) will continue to work with older microversions using file
injection, but this is how we signal it's going away or you shouldn't
use it.
* Ephemeral and swap disks: these aren't going away, but Matthew Booth
would like to remove image caching for them, since it causes a lot of
problems. This seems OK to do.
Strict isolation of hosts for image and flavor
----------------------------------------------
This was a re-hash of a spec [10] and discussion we had at the Pike PTG.
Tushar Patil's team is going to take over the spec and update it. And
just like we said in Atlanta, stakeholders that are doing some variant
of this scheduler filter already (Intel, WindRiver) should be reviewing
the spec to make sure it will cover their existing use cases and out of
tree scheduler filters.
Updating server instance keypairs
---------------------------------
Several API consumers, including OpenStack Infra, have asked for the
ability to update the keypair associated with an instance. We initially
talked about just being able to specify a new keypair during rebuild,
but then it was pointed out that depending on how cloud-init is
configured, all one might need to do is update the keypair and reboot
the instance. Kevin Zheng is going to take over the spec [11] and update
it for the rebuild/reboot cases which will probably determine how we
want the API to behave.
More instance action (event) records
------------------------------------
We track instance action start/end events for a lot of server action
APIs but not all, like attach/detach interfaces/volumes. Kevin Zheng is
going to work on filling in some of these gaps in Queens.
Performance issue of listing instances and filtering on IP
----------------------------------------------------------
A long-standing known performance issue [12] is that when listing
instances and filtering on IP, we don't do the IP filtering in the DB
SQL query, we do it in code. We talked about a few ways to solve this,
such as adding a new mapping table for the filter/join, but this might
get out of sync with the instance_info_caches table. Another idea was
doing an up-front filter of instances using the Neutron ports API such
that we could figure out which instances (via port.device_id) have
certain fixed IPs. The issue here is that the Neutron ports API does not
perform a regex filter on the IPs, which the compute API does. It also
means more proxying to other services. We kicked around the idea of
implementing this client-side in openstackclient and then having that be
a template for other client/SDK code to model (something I'm not
personally in love with). The TODO on this is to see what it would take
to get the Neutron ports API to support regex matching on IP filters.
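
To illustrate the mismatch, here is a small sketch contrasting a regex
filter applied in code with an exact-match filter. The data layout and
function names are made up for illustration and are not the actual nova
or neutron code; in particular, using re.search here is an assumption.

```python
import re


def filter_by_ip_regex(instances, pattern):
    """Return instances with at least one fixed IP matching the regex."""
    ip_re = re.compile(pattern)
    return [inst for inst in instances
            if any(ip_re.search(ip) for ip in inst["fixed_ips"])]


def filter_by_ip_exact(instances, ip):
    """Return instances that have exactly this fixed IP."""
    return [inst for inst in instances if ip in inst["fixed_ips"]]
```

With instances holding 10.0.0.1 and 10.0.0.10, the pattern 10\.0\.0\.1
matches both under the regex filter, while the exact filter on 10.0.0.1
only returns the first. That difference is why an exact-match filter
cannot simply stand in for the current behavior.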
Add a description field to flavors
----------------------------------
There was general agreement on doing this [13], and on allowing the
description of a flavor to be updated. We said we wouldn't persist the
flavor.description with the instance though, so showing an embedded
flavor with server details wouldn't include the description
(microversion >= 2.47).
Deprecating the keymap option
-----------------------------
The proposal to deprecate the keymap option [14] will wait until there
is an alternative, which will not be available until a new release of
the noVNC package happens (version 1.0.0?). As for specifying the keymap
when creating a server, we said to not do this with flavor extra specs,
but instead pass userdata through to cloud-init.
Availability zones with ':' in the name
---------------------------------------
This was a discussion about a latent bug [15] and the forced hosts
capability for admins to specify AZ:HOST:NODE in the API. If the
availability zone name has a colon in it, it breaks the parsing. We
agreed that we will update the schema for AZs to not allow colons in the
name. We realize this means requests which used to work to create an AZ
with ':' in the name will now fail, and we are not going to do this with
a microversion, because even though you can create AZs with colons in
the name, you can't use them - it will just result in an obscure failure
later. We also agreed to backport the fix for this, and make sure it's
clearly documented in the API reference.
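
As a rough illustration of why a colon in the AZ name is a problem,
here is a sketch of splitting an AZ:HOST:NODE string and of the kind of
up-front check the schema change amounts to. Both functions are
hypothetical and are not the actual nova parsing or schema code.

```python
def parse_forced_host(availability_zone):
    """Split an 'AZ[:HOST[:NODE]]' string into (az, host, node).

    Illustrative only; missing parts are returned as None.
    """
    parts = availability_zone.split(":")
    if len(parts) > 3:
        raise ValueError(
            "too many ':' separators in %r" % availability_zone)
    # Pad with None so callers always get three values back.
    az, host, node = parts + [None] * (3 - len(parts))
    return az, host, node


def validate_az_name(name):
    """Reject AZ names containing ':' at creation time."""
    if ":" in name:
        raise ValueError("availability zone name must not contain ':'")
    return name
```

In this sketch, an AZ named "east:1" combined with host "host1" becomes
"east:1:host1", which is split into AZ "east", host "1" and node
"host1". Rejecting the colon when the AZ is created avoids that kind of
silent misparse later.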
[1] https://etherpad.openstack.org/p/nova-ptg-queens
[2] https://blueprints.launchpad.net/nova/+spec/hurrah-for-privsep
[3] https://review.openstack.org/#/c/496160/
[4] https://review.openstack.org/#/q/topic:bp/nova-validate-certificates+status:open
[5] https://etherpad.openstack.org/p/BOS-forum-advanced-instance-scheduling
[6] http://blazar.readthedocs.io/en/latest/userdoc/using.instance-reservation.html
[7] https://review.openstack.org/#/c/503061/
[8] https://review.openstack.org/500190
[9] https://etherpad.openstack.org/p/ideal-nova-docs-landing-page
[10] https://review.openstack.org/#/c/381912/
[11] https://review.openstack.org/#/c/375221/
[12] https://bugs.launchpad.net/nova/+bug/1711303
[13] https://review.openstack.org/#/c/501017/
[14] https://review.openstack.org/#/c/483994/
[15] https://review.openstack.org/#/c/490722/
--
Thanks,
Matt