[openstack-dev] [nova] Rocky PTG summary - miscellaneous topics from Friday
melanie witt
melwittt at gmail.com
Tue Mar 20 22:57:25 UTC 2018
Howdy all,
I've put together an etherpad [0] with summaries of the items from the
Friday miscellaneous session from the PTG at the Croke Park Hotel "game
room" across from the bar area. I didn't summarize all of the items, but
attempted to do so for most of them, namely the ones that involved
discussion or decisions.
Cheers,
-melanie
[0] https://etherpad.openstack.org/p/nova-ptg-rocky-misc-summary
*Friday Miscellaneous: Rocky PTG Summary
https://etherpad.openstack.org/p/nova-ptg-rocky L281
*Key topics
* Team / review policy
* Technical debt and cleanup
* Removing nova-network and legacy cells v1
* Community goal to remove usage of mox3 in unit tests
* Dropping support of running nova-api and the metadata API service
under eventlet
* Cruft surrounding rebuild and evacuate
* Bumping the minimum required version of libvirt
* Nova's 'enabled_perf_events' feature will be broken with Linux
Kernel 4.14+ (the feature has been removed from the kernel)
* Miscellaneous topics from the PTG etherpad
*Agreements and decisions
* On team / review policy, for the Rocky cycle we're going to
experiment with a "runways" process wherein we'll focus review
bandwidth on selected blueprints in two-week time boxes
* Details here: https://etherpad.openstack.org/p/nova-runways-rocky
* On technical debt and cleanup:
* We're going to remove nova-network this cycle and see how it
goes. Then we'll look toward removing legacy cells v1.
* NOTE: If you're planning to work on the community-wide goal of
removing mox3 usage, don't bother refactoring nova-network and legacy
cells v1 unit tests. Those tests will be entirely removed soon-ish.
* We're going to emit a warning on service startup, add a release
note for the deprecation, and plan to remove support for running
nova-api and the metadata API service under eventlet in the S release
* Patch: https://review.openstack.org/#/c/549510/
* For rebuild, we're going to defer the instance.save() until
conductor has passed scheduling and before it casts to compute in order
to address the issue of rolling back instance values if something fails
during rebuild scheduling
* For future work on rebuild tech debt, there was an idea to
deprecate "evacuate" and add an option to rebuild like "--elsewhere" to
collapse the two into using nearly the same code path. Evacuate is a
rebuild and it would be nice to represent it as such. Someone would need
to write up a spec for this.
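The deferred-save flow agreed on above can be sketched roughly as follows. This is a minimal illustration, not Nova's actual code: the function and parameter names (rebuild_instance, schedule, save, cast_to_compute) are hypothetical stand-ins for the conductor-side logic.

```python
def rebuild_instance(instance, new_values, schedule, save, cast_to_compute):
    """Apply rebuild values in memory, but persist them only after
    scheduling succeeds, so a scheduling failure needs no rollback."""
    original = dict(instance)
    instance.update(new_values)      # in-memory only, not yet persisted
    try:
        host = schedule(instance)    # may raise, e.g. NoValidHost
    except Exception:
        instance.clear()
        instance.update(original)    # discard the unsaved changes
        raise
    save(instance)                   # the deferred instance.save()
    cast_to_compute(host, instance)  # cast to compute only after saving
```

Because the new values are only persisted after scheduling passes, there is nothing to roll back in the database if scheduling fails.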
* We're going to bump the minimum required libvirt version:
https://wiki.openstack.org/wiki/LibvirtDistroSupportMatrix
* kashyap is going to do this
* We're going to log a warning if enabled_perf_events is set in
nova.conf and mark it as deprecated for removal
* kashyap is going to do this
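A minimal sketch of the agreed warning, assuming a dict-like config; the function name and logger are illustrative, not what will actually land in Nova.

```python
import logging

LOG = logging.getLogger("nova.virt.libvirt")

def warn_if_perf_events_configured(conf):
    """Return True (and log a deprecation warning) when the deprecated
    enabled_perf_events option is set in the config."""
    events = conf.get("enabled_perf_events") or []
    if events:
        LOG.warning(
            "enabled_perf_events is deprecated for removal: Linux kernel "
            "4.14+ dropped the underlying support. Configured events: %s",
            ", ".join(events))
        return True
    return False
```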
* Abort Cold Migration
* This would add a new API and a significant amount of complexity,
as it is prone to race conditions (for example, an abort request lands
just after the disk migration has finished, requiring the original
instance to be restored), etc.
* We would like to have greater interest from operators for the
feature before going down that path
* takashin will email openstack-operators at lists.openstack.org to
ask if there is broader interest in the feature
* Abort live migrations in queued status
* We agreed this is reasonable functionality to add, just need to
work out the details on the spec
* Kevin_Zheng will update the spec:
https://review.openstack.org/#/c/536722/
* Adding request_id field to migrations object
* The goal here is to be able to look up the instance action for a
failed migration to determine why it failed, and the request_id is
needed to look up the instance action.
* We agreed to add the request_id to instance action notifications
instead, and gibi will do this:
https://blueprints.launchpad.net/nova/+spec/add-request-id-to-instance-action-notifications
* Returning Flavor Extra Specs in GET /flavors/detail and GET
/flavors/{flavor_id}
* https://blueprints.launchpad.net/nova/+spec/add-extra-specs-to-flavor-list
* Doing this would create parity between the servers API (when
showing the instance.flavor) and the flavors API
* We agreed to add a new microversion and implement it the same way
as we have for instance.flavor using policy as the control on whether to
show the extra specs
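The agreed approach (new microversion plus a policy check, mirroring instance.flavor) could look roughly like this sketch. The microversion value and the names flavor_to_view / can_show_extra_specs are placeholders, not the real implementation.

```python
EXTRA_SPECS_MICROVERSION = (2, 61)  # placeholder, not the final value

def flavor_to_view(flavor, request_version, can_show_extra_specs):
    """Serialize a flavor for GET /flavors/detail, adding extra_specs
    only when the microversion is new enough and policy allows it."""
    view = {"id": flavor["id"], "name": flavor["name"]}
    if request_version >= EXTRA_SPECS_MICROVERSION and can_show_extra_specs:
        view["extra_specs"] = dict(flavor.get("extra_specs", {}))
    return view
```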
* Adding host and error code fields to instance action events
* We agreed that it would be reasonable to add a new microversion
to add the host (whether it's shown will be based on a policy check) to
the instance action event, but the error code is a much more complex,
cross-project, community-wide effort, so we're not going to pursue that
for now
* Spec for adding host: https://review.openstack.org/#/c/543277/
* Allow specifying tolerance for (soft)(anti-)affinity groups
* This requirement is about adding an attribute to the group to
limit how strict the (anti-)affinity is in the filter. Today hard
anti-affinity is fixed at a maximum of 1 instance per host
* We agreed this feature is reasonable, just need to sort out the
api model vs data model in the spec:
https://review.openstack.org/#/c/546925/
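The proposed tolerance could work roughly like this in the filter. A minimal sketch: the function name and the max_server_per_host attribute are hypothetical, taken from the idea above rather than the spec's final API.

```python
from collections import Counter

def host_passes_anti_affinity(host, member_hosts, max_server_per_host=1):
    """Return True if landing one more group member on `host` keeps the
    per-host count within the tolerance; the default of 1 reproduces
    today's hard anti-affinity behavior."""
    return Counter(member_hosts)[host] + 1 <= max_server_per_host
```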
* XenAPI: support non file system based SR types, e.g. LVM, iSCSI
* Currently the xenapi driver only supports file system-based SR
types; it cannot yet support LVM or iSCSI, which XenServer supports
* We agreed that a specless blueprint is fine for this:
https://blueprints.launchpad.net/nova/+spec/xenapi-image-handler-option-improvement
* Supporting live-migration for xapi pool based hosts
* This has to be implemented before removing the aggregate upcall;
otherwise the removal would break live migration for shared storage
* We agreed to do a specless blueprint for this:
https://blueprints.launchpad.net/nova/+spec/live-migration-in-xapi-pool
* The removal of the aggregate upcall will be a patch stacked on
top of the live migration implementation ^
* Preemptible instances
* The scientific SIG has been experimenting with doing this
completely outside of Nova and it was a bad user experience, so they're
interested in what a minimal integration in Nova could look like
* There was an idea to put instances that failed to boot into a new
non-ERROR vm_state. An external reaper service could then make some
room and issue a rebuild on that instance. Notifications could be used
to determine the order in which instances landed in the new non-ERROR
state, and the reaper service could use that info to decide what to do.
Reset-state could be used to put an instance into ERROR if the reaper
gives up on it. A new notification is not needed; we already have one.
The configurable bit will be whether to use the new non-ERROR "pending"
state. We'll also need a way to remove the instance from cell0, i.e.
"evacuate from cell0" when doing the rebuild.
* There was agreement that the above idea seemed reasonable (though
need to check if there is any special handling needed for boot-from-volume)
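The reaper idea sketched above could look something like this. Everything here is hypothetical: the "pending" state name comes from the discussion, and free_up_capacity / rebuild / reset_state are placeholder callables for the external service's actions.

```python
PENDING = "pending"  # the proposed non-ERROR vm_state
ERROR = "error"

def reap(pending_instances, free_up_capacity, rebuild, reset_state):
    """For each instance stuck in PENDING (assumed ordered oldest-first,
    e.g. via notifications), try to make room and rebuild it; otherwise
    give up and push it to ERROR via reset-state."""
    for instance in pending_instances:
        if instance["vm_state"] != PENDING:
            continue
        if free_up_capacity(instance):
            rebuild(instance)             # the "evacuate from cell0" step
        else:
            reset_state(instance, ERROR)  # reaper gives up on it
```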
* Exposing virt driver capabilities out of a REST API
* Simple way to tie driver capabilities to scheduling requests
using Placement and provider traits, for things like tagged devices and
volume multiattach
* We agreed this is OK, so do it. Jay has a patch to register the
traits in os-traits; the nova patch will depend on a release of that
* We also agreed to merge traits, somehow. But we'll finish
update_provider_tree as written first
* libvirt: add support for virtio-net rx/tx queue sizes
* Spec: https://review.openstack.org/#/c/539605
* It looks like we thought we had agreement to go with a global
config option for this but it's still being actively discussed on the
spec. Please see the spec discussion for details
* Adding support for rebuild of volume-backed instances
* Review of the spec is underway:
https://review.openstack.org/#/c/532407
* Strict isolation of group of hosts for image and flavor
* Spec: https://review.openstack.org/#/c/381912
* The only remaining problem is that if the image doesn't contain
any properties, it will land on any host aggregate group
* We agreed that this should be done with traits. From the spec
review comments, it sounds like the desired behavior will be possible
once the "placement request filtering" work lands:
https://blueprints.launchpad.net/nova/+spec/placement-req-filter
* The solution will involve applying custom traits to compute
nodes and using the placement request filtering to use data from the
RequestSpec to filter the request only for host aggregates that meet the
requirement. Discussion on the spec is in progress.
* Reliable port ordering in Nova
* Ports are not given a designated order; backing up and restoring
the database can change the port order
* We agreed the existing device tagging feature can be used to get
reliable ordering for devices
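For illustration, a server-create request using device tags on NICs might look like the following, so the guest can identify each port via the metadata API or config drive regardless of enumeration order. The port UUIDs and tag names are placeholders, and tagging NICs at boot requires a sufficiently new microversion.

```python
# Placeholder request body; port UUIDs and tag names are made up.
server_request = {
    "server": {
        "name": "tagged-nics-demo",
        "flavorRef": "1",
        "imageRef": "70a599e0-31e7-49b7-b260-868f441e862b",
        "networks": [
            {"port": "11111111-1111-1111-1111-111111111111", "tag": "mgmt"},
            {"port": "22222222-2222-2222-2222-222222222222", "tag": "data"},
        ],
    }
}
```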
* Granular API policy
* This is about making policy more granular for GET, POST, PUT,
DELETE separately for APIs
* Interested people should review the spec:
https://review.openstack.org/#/c/547850/
* Block device mapping creation races during attach volume
* We agreed to create a nova-manage command to do BDM clean up and
then add a unique constraint in S
* mriedem will restore the device name spec and someone else can
pick it up
* Ironic instance switchover
* With boot from volume, a failed bare metal node can be replaced
by booting an alternative node from the same volume. There are some
options regarding which Compute API could be used to trigger this
switchover
* This is about wanting to be able to migrate baremetal instances
* We agreed this would be adding parity with other virt drivers, so
it's okay in that regard. Details need to be worked out on the spec
review: https://review.openstack.org/#/c/449155
* Add UEFI Secure Boot support for QEMU/KVM guests, using OVMF
* Spec: https://review.openstack.org/#/c/506720/
* We agreed that providing this feature would be reasonable.
Details need to be worked out on the spec review
* Validate policy when creating a server group
* Currently we can create a server group that has no policies (an
empty policy list). We can create a server with it, but all related
scheduler filters return True, so it is useless
* Spec: https://review.openstack.org/#/c/546484
* We agreed this should be a simple thing to do, spec review is
underway. We also said we should consider lumping in some other trivial
API cleanup into the same microversion - we have a lot of TODOs for
similar stuff like this in the API
* Add force flag in cold migration
* We have agreed not to add a way to force bypass of the scheduler
filters
* Skip instance backup image creation when rotation 0
* Spec: https://review.openstack.org/511825
* The spec ^ is old and needs to be re-proposed for Rocky
* We agreed that the approach should be to add a microversion to
disallow 0 and then also add a new API for "purge all backups" to be
used instead of passing 0
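The agreed validation could be sketched as below. The microversion value and function name are placeholders; the real change would live in the createBackup API code.

```python
DISALLOW_ZERO_MICROVERSION = (2, 999)  # placeholder microversion

def validate_backup_rotation(rotation, request_version):
    """Reject rotation=0 at the new microversion; callers should use
    the separate purge-all-backups API instead of passing 0."""
    if rotation < 0:
        raise ValueError("rotation must be non-negative")
    if rotation == 0 and request_version >= DISALLOW_ZERO_MICROVERSION:
        raise ValueError(
            "rotation=0 is no longer allowed; use the purge-backups API")
    return rotation
```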
* Live resize for hyper-v
* Spec: https://review.openstack.org/#/c/141219
* This uses the same workflow as cold migrate: we'll increase
allocations in placement when a host is found, but there's no
confirm/revert step because it's live
* Proposes a new API and resize up only will be allowed. Virt
driver will have a new method "can_live_resize" (or similar) to check if
it supports it before proceeding past the API layer
* Stage 1 only, no automatic live migration fallback
* PowerVM wants to do this too, and it should be possible in the
libvirt driver as well
* We agreed this idea sounds fine, just a matter of getting
interest and review on the spec. Interested parties should review the spec