[Openstack-operators] [nova] Queens PTG recap - everything else

Matt Riedemann mriedemos at gmail.com
Mon Sep 18 21:58:32 UTC 2017


There was a whole lot of other stuff discussed at the PTG. The details 
are in [1]. I won't go into everything here, so I'm just highlighting 
some of the more concrete items that had owners or TODOs.

Ironic
------

The Ironic team came over on Wednesday afternoon. We talked a bit, had 
some laughs, it was a good time. Since I don't speak fluent baremetal, 
Dmitry Tantsur is going to recap those discussions on the mailing list. 
Thanks again, Dmitry.

Privsep
-------

Michael Still has been going hog wild converting the nova libvirt driver 
code to use privsep instead of rootwrap. He has a series of changes 
tracked under this blueprint [2]. Most of the discussion was a refresh 
on privsep and a recap of what's already been merged and some discussion 
on outstanding patches. The goal for Queens is to convert the entire 
libvirt driver and, if possible, all of nova-compute, but we want to limit 
that to changes merged early in the release so we can flush out bugs, 
since a lot of these are weird, possibly untested code paths. There was 
also discussion of a kind of privsep heartbeat daemon to tell whether 
privsep is running (even though it's not a separate service), but this is 
complicated and is not something we'll pursue for Queens.

Websockify security proxy framework
-----------------------------------

This is a long-standing security hardening feature [3] which has changed 
hands a few times and hasn't gotten much review. Sean Dague and Melanie 
Witt agreed to focus on reviewing this for Queens.

Certificate validation
----------------------

This is another item [4] that's been discussed since at least the Ocata 
summit but hasn't made much progress. Sean Dague agreed to help review 
this, and Eric Fried said he knew someone who could help review the 
security aspects of this change. Sean also suggested scheduling a 
hangout so the Johns Hopkins University team working on this can give a 
primer on the feature and what to look out for during review. We also 
suggested getting a scenario test written for this in the barbican 
tempest plugin, which runs as an experimental queue job for nova.

Notifications
-------------

Given the state of the Searchlight project, and since we don't plan on 
using Searchlight as a global proxy for the compute REST API, we are not 
going to work on versioned notification parity for Searchlight. There 
are some cleanups we still need to do in Nova for versioned 
notifications from a performance perspective. We also agreed that we 
aren't going to consider deprecating the legacy unversioned 
notifications until the versioned notifications reach parity with them, 
especially since consumers of the legacy unversioned notifications have 
not yet moved to using the versioned notifications.

vGPU support
------------

This depends on nested resource providers (like lots of other things). 
It was not clear from the discussion whether this is static or dynamic 
support, e.g. can we hot plug vGPUs using Cyborg? I assume we will not 
support hot plugging at first. We also need improved functional testing 
of this space before we can make big changes.

Preemptible (spot) instances
-----------------------------

This was continuing the discussion from the Boston forum session [5]. 
The major issue in Nova is that we don't want Nova to be in charge of 
orchestrating preempting instances when a request comes in for a "paid" 
instance. We agreed to start small, with a model where you can't burst 
over quota. Blazar also delivered some reservation features in Pike [6] 
that sound like they could be built on here and that also sound like 
expiration policies. Someone will have to prototype an external (to nova) "reaper" 
which will cull the preemptible instances based on some configurable 
policy. Honestly the notes here are confusing so we're going to need 
someone to drive this forward. That might mean picking up John Garbutt's 
draft spec for this (link not available right now).
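To make the "reaper" idea a bit more concrete, here is a minimal Python sketch of just the selection step, under loud assumptions: the `PreemptibleInstance` record, the policy names and `select_victims()` are all hypothetical, nothing here reflects an agreed design or any real Nova or Blazar API, and actually deleting the chosen servers would be a separate call to the compute API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PreemptibleInstance:
    # Hypothetical record; not a Nova object.
    uuid: str
    host: str
    vcpus: int
    created_at: datetime


# Hypothetical configurable policies: (sort key, reverse?).
POLICIES = {
    "newest_first": (lambda inst: inst.created_at, True),
    "oldest_first": (lambda inst: inst.created_at, False),
    "largest_first": (lambda inst: inst.vcpus, True),
}


def select_victims(instances, host, vcpus_needed, policy="newest_first"):
    """Pick preemptible instances on `host` to free `vcpus_needed` VCPUs.

    Returns an empty list if the host cannot free enough capacity, so the
    reaper never kills instances without actually making room.
    """
    key, reverse = POLICIES[policy]
    candidates = sorted((i for i in instances if i.host == host),
                        key=key, reverse=reverse)
    victims, freed = [], 0
    for inst in candidates:
        if freed >= vcpus_needed:
            break
        victims.append(inst)
        freed += inst.vcpus
    return victims if freed >= vcpus_needed else []


spot = [
    PreemptibleInstance("a", "host1", 2, datetime(2017, 9, 1)),
    PreemptibleInstance("b", "host1", 4, datetime(2017, 9, 2)),
    PreemptibleInstance("c", "host1", 2, datetime(2017, 9, 3)),
]
print([i.uuid for i in select_victims(spot, "host1", 3)])  # ['c', 'b']
```

Keeping the policy as a plain lookup table is one way to make it "configurable" from outside Nova; the real policy knobs would be up to whoever drives the spec.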

Driver updates
--------------

Various teams from IBM gave updates on plans for their drivers in Queens.

PowerVM (in tree): the team is proposing a few more capabilities to the 
driver in Queens. Details are in the spec [7].

zDPM (out of tree): this out-of-tree driver has had two releases (Ocata 
and Pike) and the team is working on third-party CI. One issue they have 
with Tempest is that they can only boot from volume.

zVM (out of tree): the team is working on refactoring some code into a 
library, similar to os-xenapi, os-powervm and oslo.vmware. They have CI 
running but are not yet reporting against nova changes.

Endpoint discovery
------------------

This is carry-over work from Ocata and Pike to standardize how Nova does 
endpoint discovery with other services, like 
keystone/placement/cinder/glance/neutron/ironic/barbican. The spec is 
here [8]. The dependent keystoneauth1 changes were released in Pike so 
we should be able to make quick progress on this early in Queens to 
flush out bugs.
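For readers unfamiliar with the lookup involved, this is a rough sketch of the resolution order such a standardized approach implies: an explicit endpoint override wins; otherwise the service catalog is searched by service type, trying the configured interfaces in order, optionally constrained by region. It is a stdlib-only simplification, not keystoneauth1 itself (which also handles version discovery). `find_endpoint()` and its parameter names are illustrative assumptions loosely modeled on keystoneauth1's adapter options, and the catalog entries only mimic the shape of a Keystone v3 token catalog.

```python
def find_endpoint(catalog, service_type, valid_interfaces=("internal", "public"),
                  region_name=None, endpoint_override=None):
    """Resolve a service URL: explicit override first, then the catalog.

    Interfaces are tried in the configured order, so an operator can prefer
    internal endpoints and fall back to public ones.
    """
    if endpoint_override:
        return endpoint_override
    for interface in valid_interfaces:
        for service in catalog:
            if service.get("type") != service_type:
                continue
            for endpoint in service.get("endpoints", []):
                if endpoint.get("interface") != interface:
                    continue
                if region_name and endpoint.get("region_id") != region_name:
                    continue
                return endpoint["url"]
    return None


catalog = [{
    "type": "placement",
    "endpoints": [
        {"interface": "public", "region_id": "RegionOne",
         "url": "https://cloud.example.com/placement"},
        {"interface": "internal", "region_id": "RegionOne",
         "url": "http://10.0.0.10/placement"},
    ],
}]
print(find_endpoint(catalog, "placement"))  # http://10.0.0.10/placement
```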

Documentation
-------------

We talked about the review process for docs changes and agreed it's not 
easy to define when we can have a single +2 to approve a docs change 
(typo fixes and such are fine). We also noted that it's OK for people to 
ping other cores in IRC when they think a docs patch is ready to go, so 
that we can help move docs patches along faster.

We also talked a bit about the proposed docs tree structure laid out 
here [9] and everyone seemed OK with that. Note that we never had a 
cross-project discussion about how to organize the various project team 
main pages. If the broader docs team did, maybe someone can please recap 
that for us.

Deprecations
------------

We talked about several things worth deprecating in Queens.

* The os-cells REST API: we aren't going to do a formal microversion 
deprecation for this. It's just going to go away when the cells v1 code 
goes away, hopefully in Rocky.

* Running nova-api under eventlet: we've been running nova-api under 
uwsgi in CI since Pike, so we agreed to deprecate running nova-api under 
eventlet in Queens and remove that support in Rocky. Doing this 
deprecation needs an owner...

* Deprecating (personality) file injection: we talked about this at the 
Pike PTG and agreed we should still do this. People can use userdata. 
I'm going to write a spec for what this looks like in the API. We also 
need to consider, as part of this, if we should allow the user to 
specify new userdata during rebuild. Note that the backend code (with 
libguestfs) will continue to work with older microversions using file 
injection, but this is how we signal it's going away or you shouldn't 
use it.

* Ephemeral and swap disks: these aren't going away, but Matthew Booth 
would like to remove image caching for them, since it causes a lot of 
problems. This seems OK to do.
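As a rough illustration of the "people can use userdata" migration path for file injection, the sketch below converts a legacy `personality` list (entries with a `path` and base64 `contents`, as the server create API accepts today) into a cloud-config document that uses cloud-init's `write_files` module, then base64-encodes it for the `user_data` field. The helper name is hypothetical, it assumes the guest image runs cloud-init with `write_files` enabled, and it is not the API design; that will come from the spec mentioned above.

```python
import base64
import json


def personality_to_user_data(personality):
    """Convert legacy personality entries into base64-encoded cloud-config."""
    lines = ["#cloud-config", "write_files:"]
    for entry in personality:
        # json.dumps gives a double-quoted scalar, which YAML also accepts.
        lines.append("  - path: %s" % json.dumps(entry["path"]))
        # Personality contents are already base64, so let cloud-init decode them.
        lines.append("    encoding: b64")
        lines.append("    content: %s" % entry["contents"])
    cloud_config = "\n".join(lines) + "\n"
    # The server create API expects user_data itself to be base64-encoded.
    return base64.b64encode(cloud_config.encode("utf-8")).decode("ascii")


personality = [{"path": "/etc/motd",
                "contents": base64.b64encode(b"hello\n").decode("ascii")}]
# Other required server create fields (imageRef, flavorRef, ...) are omitted.
server_body = {"server": {"name": "demo",
                          "user_data": personality_to_user_data(personality)}}
print(base64.b64decode(server_body["server"]["user_data"]).decode("utf-8"))
```

The resulting `user_data` goes in the server create body in place of `personality`.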

Strict isolation of hosts for image and flavor
----------------------------------------------

This was a re-hash of a spec [10] and discussion we had at the Pike PTG. 
Tushar Patil's team is going to take over the spec and update it. And 
just like we said in Atlanta, stakeholders that are doing some variant 
of this scheduler filter already (Intel, WindRiver) should be reviewing 
the spec to make sure it will cover their existing use cases and out of 
tree scheduler filters.

Updating server instance keypairs
---------------------------------

Several API consumers, including OpenStack Infra, have asked for the 
ability to update the keypair associated with an instance. We initially 
talked about just being able to specify a new keypair during rebuild, 
but then it was pointed out that depending on how cloud-init is 
configured, all one might need to do is update the keypair and reboot 
the instance. Kevin Zheng is going to take over the spec [11] and update 
it for the rebuild/reboot cases which will probably determine how we 
want the API to behave.

More instance action (event) records
------------------------------------

We track instance action start/end events for a lot of server action 
APIs, but not all of them, e.g. attaching/detaching interfaces and 
volumes. Kevin Zheng is going to work on filling in some of these gaps in 
Queens.
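For context, an instance action event is essentially a start/finish record with a result. A minimal sketch of wrapping an operation such as a volume attach in such a record might look like this; the `record_event()` helper, the in-memory `EVENTS` list and the event names are hypothetical stand-ins for Nova's persisted action event records and internal helpers, not the real code.

```python
import contextlib
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for persisted instance action events.
EVENTS = []


@contextlib.contextmanager
def record_event(instance_uuid, event_name):
    """Record start/finish times and a Success/Error result around an operation."""
    event = {"instance_uuid": instance_uuid, "event": event_name,
             "start_time": datetime.now(timezone.utc),
             "finish_time": None, "result": None}
    EVENTS.append(event)
    try:
        yield event
    except Exception:
        event["result"] = "Error"
        raise
    else:
        event["result"] = "Success"
    finally:
        event["finish_time"] = datetime.now(timezone.utc)


with record_event("vm-1", "attach_volume"):
    pass  # the actual attach work would happen here

try:
    with record_event("vm-1", "detach_interface"):
        raise RuntimeError("detach failed")
except RuntimeError:
    pass

print([(e["event"], e["result"]) for e in EVENTS])
# [('attach_volume', 'Success'), ('detach_interface', 'Error')]
```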

Performance issue of listing instances and filtering on IP
----------------------------------------------------------

A long-standing known performance issue [12] is that when listing 
instances and filtering on IP, we don't do the IP filtering in the DB 
SQL query, we do it in code. We talked about a few ways to solve this, 
such as adding a new mapping table for the filter/join, but this might 
get out of sync with the instance_info_caches table. Another idea was 
doing an up-front filter of instances using the Neutron ports API such 
that we could figure out which instances (via port.device_id) have 
certain fixed IPs. The issue here is that the Neutron ports API does not 
perform a regex filter on the IPs, which the compute API does. It also 
means more proxying to other services. We kicked around the idea of 
implementing this client-side in openstackclient and then having that be 
a template for other client/SDK code to model (something I'm not 
personally in love with). The TODO on this is to see what it would take 
to get the Neutron ports API to support regex matching on IP filters.
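The two approaches can be contrasted with a small stdlib-only sketch. `Instance` and both functions are hypothetical; the port dicts only mimic the `device_id` and `fixed_ips` fields of Neutron port resources, and the use of `re.match` (anchored at the start of the address) is an assumption about how the compute API applies its regex. Note that the second function still does the regex match client-side, which is exactly the capability the Neutron ports API would need to gain.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Instance:
    uuid: str
    # Stand-in for the addresses Nova caches in instance_info_caches.
    ips: list = field(default_factory=list)


def filter_instances_in_code(instances, ip_regex):
    """Roughly today's approach: load instances, then regex-filter in Python."""
    pattern = re.compile(ip_regex)
    return [inst for inst in instances
            if any(pattern.match(ip) for ip in inst.ips)]


def instance_uuids_from_ports(ports, ip_regex):
    """Up-front filter idea: map matching fixed IPs to port.device_id."""
    pattern = re.compile(ip_regex)
    return {port["device_id"] for port in ports
            if any(pattern.match(fip["ip_address"])
                   for fip in port["fixed_ips"])}


instances = [Instance("vm-1", ["10.0.0.5"]), Instance("vm-2", ["192.168.1.7"])]
ports = [
    {"device_id": "vm-1", "fixed_ips": [{"ip_address": "10.0.0.5"}]},
    {"device_id": "vm-2", "fixed_ips": [{"ip_address": "192.168.1.7"}]},
]
print([i.uuid for i in filter_instances_in_code(instances, r"10\.0\.")])  # ['vm-1']
print(instance_uuids_from_ports(ports, r"192\.168\."))  # {'vm-2'}
```

The first function has to load every candidate row before it can discard any, which is the performance problem; the second would let the DB query be restricted to the returned UUIDs instead.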

Add a description field to flavors
----------------------------------

There was general agreement on doing this [13], and on allowing the 
description of a flavor to be updated. We said we wouldn't persist the 
flavor.description with the instance though, so the flavor embedded in 
server details (microversion >= 2.47) wouldn't include the description.

Deprecating the keymap option
-----------------------------

The proposal to deprecate the keymap option [14] will wait until there 
is an alternative, which will not be available until a new release of 
the noVNC package happens (version 1.0.0?). As for specifying the keymap 
when creating a server, we said to not do this with flavor extra specs, 
but instead pass userdata through to cloud-init.

Availability zones with ':' in the name
---------------------------------------

This was a discussion about a latent bug [15] and the forced hosts 
capability for admins to specify AZ:HOST:NODE in the API. If the 
availability zone name has a colon in it, it breaks the parsing. We 
agreed that we will update the schema for AZs to not allow colons in the 
name. We realize this means requests to create an AZ with ':' in the 
name, which used to work, will now fail. We are not going to do this with 
a microversion, because even though you can create AZs with colons in the 
name, you can't use them; it just results in an obscure failure later. We 
also agreed to backport the fix for this, and to make sure it's clearly 
documented in the API reference.
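A minimal sketch of why the colon is a problem: the forced-host syntax is split on ':', so an AZ name that itself contains a colon shifts every field over. The parser and validator below are simplified, hypothetical illustrations, not Nova's actual parsing code or JSON schema.

```python
def parse_az_host_node(value):
    """Split an 'AZ', 'AZ:HOST', 'AZ::NODE' or 'AZ:HOST:NODE' string.

    Simplified sketch: empty fields become None.
    """
    parts = value.split(":")
    if len(parts) > 3:
        raise ValueError("too many ':' separators in %r" % value)
    # Pad so that 'AZ' and 'AZ:HOST' still unpack into three fields.
    parts += [""] * (3 - len(parts))
    az, host, node = (part or None for part in parts)
    return az, host, node


def validate_az_name(name):
    """Reject AZ names that would be ambiguous in the forced-host syntax."""
    if ":" in name:
        raise ValueError("availability zone name must not contain ':': %r" % name)
    return name


# An admin forcing host "compute1" in AZ "nova" works as intended.
print(parse_az_host_node("nova:compute1"))  # ('nova', 'compute1', None)
# AZ "zone:a" plus host "compute1" is silently misread as AZ "zone",
# host "a", node "compute1".
print(parse_az_host_node("zone:a:compute1"))  # ('zone', 'a', 'compute1')
```

With the schema change, a name like `zone:a` is rejected when the AZ is created instead of failing obscurely when someone later tries to use it.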

[1] https://etherpad.openstack.org/p/nova-ptg-queens
[2] https://blueprints.launchpad.net/nova/+spec/hurrah-for-privsep
[3] https://review.openstack.org/#/c/496160/
[4] https://review.openstack.org/#/q/topic:bp/nova-validate-certificates+status:open
[5] https://etherpad.openstack.org/p/BOS-forum-advanced-instance-scheduling
[6] http://blazar.readthedocs.io/en/latest/userdoc/using.instance-reservation.html
[7] https://review.openstack.org/#/c/503061/
[8] https://review.openstack.org/500190
[9] https://etherpad.openstack.org/p/ideal-nova-docs-landing-page
[10] https://review.openstack.org/#/c/381912/
[11] https://review.openstack.org/#/c/375221/
[12] https://bugs.launchpad.net/nova/+bug/1711303
[13] https://review.openstack.org/#/c/501017/
[14] https://review.openstack.org/#/c/483994/
[15] https://review.openstack.org/#/c/490722/

-- 

Thanks,

Matt


