<div dir="ltr">Thanks for the recap email, Mel. Just a question inline for all the people that were in the room by Wednesday.<br><br><div class="gmail_quote"><div dir="ltr">Le jeu. 27 sept. 2018 à 00:10, melanie witt <<a href="mailto:melwittt@gmail.com">melwittt@gmail.com</a>> a écrit :<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello everybody,<br>
<br>
I've written up a high level summary of the discussions we had at the <br>
PTG -- please feel free to reply to this thread to fill in anything I've <br>
missed.<br>
<br>
We used our PTG etherpad:<br>
<br>
<a href="https://etherpad.openstack.org/p/nova-ptg-stein" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/nova-ptg-stein</a><br>
<br>
as an agenda and each topic we discussed was filled in with agreements, <br>
todos, and action items during the discussion. Please check out the <br>
etherpad to find notes relevant to your topics of interest, and reach <br>
out to us on IRC in #openstack-nova, on this mailing list with the <br>
[nova] tag, or by email to me if you have any questions.<br>
<br>
Now, onto the high level summary:<br>
<br>
Rocky retrospective<br>
===================<br>
We began Wednesday morning with a retro on the Rocky cycle and captured <br>
notes on this etherpad:<br>
<br>
<a href="https://etherpad.openstack.org/p/nova-rocky-retrospective" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/nova-rocky-retrospective</a><br>
<br>
The runways review process was seen as overall positive and helped get <br>
some blueprint implementations merged that had languished in previous <br>
cycles. We agreed to continue with the runways process as-is in Stein <br>
and use it for approved blueprints. We did note that we could do better <br>
at queuing important approved work into runways, such as <br>
placement-related efforts that were not added to runways last cycle.<br>
<br>
We discussed whether or not to move the spec freeze deadline back to <br>
milestone 1 (we used milestone 2 in Rocky). I have an action item to dig <br>
into whether or not the late-breaking regressions we found at RC time:<br>
<br>
<a href="https://etherpad.openstack.org/p/nova-rocky-release-candidate-todo" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/nova-rocky-release-candidate-todo</a><br>
<br>
were related to the later spec freeze at milestone 2. The question we <br>
want to answer is: did a later spec freeze lead to implementations <br>
landing later and resulting in the late detection of regressions at <br>
release candidate time?<br>
<br>
Finally, we discussed a lot of things around project management, <br>
end-to-end themes for a cycle, and people generally not feeling they had <br>
clarity throughout the cycle about which efforts and blueprints were <br>
most important, aside from runways. We got a lot of work done in Rocky, <br>
but not as much of it materialized into user-facing features and <br>
improvements as it did in Queens. Last cycle, we had thought runways <br>
would capture what is a priority at any given time, but looking back, we <br>
determined it would be helpful if we still had over-arching <br>
goals/efforts/features written down for people to refer to throughout <br>
the cycle. We dove deeper into that discussion on Friday during the hour <br>
before lunch, where we came up with user-facing themes we aim to <br>
accomplish in the Stein cycle:<br>
<br>
<a href="https://etherpad.openstack.org/p/nova-ptg-stein-priorities" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/nova-ptg-stein-priorities</a><br>
<br>
Note that these are _not_ meant to preempt anything in runways; they <br>
are just 1) for my use as a project manager and 2) for everyone's use to <br>
keep a bigger picture of our goals for the cycle in mind, to aid <br>
in their work and review outside of runways.<br>
<br>
Themes<br>
======<br>
With that, I'll briefly mention the themes we came up with for the cycle:<br>
<br>
* Compute nodes able to upgrade to and coexist with nested resource <br>
providers for multiple GPU types<br>
<br>
* Multi-cell operational enhancements: resilience to "down" or <br>
poor-performing cells and cross-cell instance migration<br>
<br>
* Volume-backed user experience and API hardening: ability to specify <br>
volume type during boot-from-volume, detach/attach of root volume, and <br>
volume-backed rebuild<br>
<br>
These are the user-visible features and functionality we aim to deliver <br>
and we'll keep tabs on these efforts throughout the cycle to keep them <br>
making progress.<br>
<br>
Placement<br>
=========<br>
As usual, we had a lot of discussions on placement-related topics, so <br>
I'll try to highlight the main things that stand out to me. Please see <br>
the "Placement" section of our PTG etherpad for all the details and <br>
additional topics we discussed.<br>
<br>
We discussed the regression in behavior that happened when we removed <br>
the Aggregate[Core|Ram|Disk]Filters from the scheduler filters -- these <br>
filters allowed operators to set overcommit allocation ratios per <br>
aggregate instead of per host. We agreed on the importance of restoring <br>
this functionality and hashed out a concrete plan, with two specs needed <br>
to move forward:<br>
<br>
<a href="https://review.openstack.org/552105" rel="noreferrer" target="_blank">https://review.openstack.org/552105</a><br>
<a href="https://review.openstack.org/544683" rel="noreferrer" target="_blank">https://review.openstack.org/544683</a><br>
<br>
The other standout discussions were around the placement extraction and <br>
closing the gaps in nested resource providers. For the placement <br>
extraction, we are focusing on full support of an upgrade from <br>
integrated placement => extracted placement, including assisting with <br>
making sure deployment tools like OpenStack-Ansible and TripleO are able <br>
to support the upgrade. For closing the gaps in nested resource <br>
providers, there are many parts to it that are documented on the <br>
aforementioned PTG etherpads. By closing the gaps with nested resource <br>
providers, we'll open the door for being able to support minimum <br>
bandwidth scheduling as well.<br>
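<br>
To make the nested resource provider payoff concrete: granular request groups are the kind of placement query this work unlocks, where the resources in a numbered group must come from a single (possibly child) provider such as a pGPU. A rough sketch against the placement API follows, assuming the microversion 1.25 granular syntax; the endpoint, token, and resource amounts are placeholders:<br>
<pre>
# Illustrative only: request candidates where the VGPU must be satisfied
# by one provider (e.g. a child resource provider of the compute node),
# separately from the VCPU/RAM. Endpoint and token are placeholders.
import requests

PLACEMENT = 'http://placement.example.com/placement'
HEADERS = {
    'X-Auth-Token': 'REPLACE_ME',
    'OpenStack-API-Version': 'placement 1.25',  # granular request groups
}

params = {
    'resources': 'VCPU:2,MEMORY_MB:4096',
    'resources1': 'VGPU:1',      # must come from a single provider
    'group_policy': 'isolate',
}
resp = requests.get(PLACEMENT + '/allocation_candidates',
                    params=params, headers=HEADERS)
resp.raise_for_status()
print(len(resp.json()['allocation_requests']), 'candidates')
</pre>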
<br></blockquote><div><br></div><div>So, during this day, we also discussed about NUMA affinity and we said that we could possibly use nested resource providers for NUMA cells in Stein, but given we don't have yet a specific Placement API query, NUMA affinity should still be using the NUMATopologyFilter.</div><div>That said, when looking about how to use this filter for vGPUs, it looks to me that I'd need to provide a new version for the NUMACell object and modify the virt.hardware module. Are we also accepting this (given it's a temporary question), or should we need to wait for the Placement API support ?</div><div><br></div><div>Folks, what are you thoughts ?</div><div><br></div><div>-Sylvain</div><div><br></div><div><br></div><div> </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Cells<br>
=====<br>
On cells, the main discussions were around resilience to "down" or <br>
poor-performing cells and cross-cell migration. Please see the "Cells" <br>
section of our PTG etherpad for all the details and additional topics we <br>
discussed.<br>
<br>
Some multi-cell resiliency work was completed in Rocky and is continuing <br>
in Stein, so there are no surprises there. Based on <br>
discussion at the PTG, there's enough info to start work on the <br>
cross-cell migration functionality.<br>
<br>
"Cross-project Day"<br>
===================<br>
We had all of our cross-project discussions with the Cinder, Cyborg, <br>
Neutron, and Ironic teams on Thursday. Please see the "Thursday" section <br>
of our etherpad for details of all topics discussed.<br>
<br>
With the Cinder team, we went over plans for volume-backed rebuild, <br>
improving the boot-from-volume experience by accepting volume type, and <br>
detach/attach of root volumes. We agreed to move forward with these <br>
features. This was also the start of a discussion around transfer of <br>
ownership of resources (volume/instance/port/etc) from one project/user <br>
to another. The current idea is to develop a tool that will do the <br>
database surgery correctly, instead of trying to implement ownership <br>
transfer APIs in each service and orchestrating them. More details on <br>
that are to come.<br>
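<br>
For the volume type piece, the rough shape we have in mind is a boot-from-volume request whose block_device_mapping_v2 entry carries a volume type. The sketch below is illustrative only -- the 'volume_type' key does not exist yet, and its final name and microversion are up to the spec and review:<br>
<pre>
# Hypothetical sketch of a boot-from-volume request that picks a volume
# type; the 'volume_type' key and the microversion it lands under are
# not final. UUIDs are placeholders.
image_uuid = 'REPLACE-WITH-IMAGE-UUID'
flavor_uuid = 'REPLACE-WITH-FLAVOR-UUID'

server_create_body = {
    'server': {
        'name': 'bfv-with-volume-type',
        'flavorRef': flavor_uuid,
        'block_device_mapping_v2': [{
            'boot_index': 0,
            'uuid': image_uuid,
            'source_type': 'image',
            'destination_type': 'volume',
            'volume_size': 20,
            'delete_on_termination': True,
            'volume_type': 'fast-ssd',   # proposed new key (hypothetical)
        }],
    }
}
# Once the feature merges, this body would be POSTed to the compute API
# /servers endpoint with the new microversion.
</pre>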
<br>
With the Cyborg team, we focused on solidifying what Nova changes would <br>
be needed to integrate with Cyborg, and the Cyborg team is going to <br>
propose a Nova spec for those changes:<br>
<br>
<a href="https://etherpad.openstack.org/p/stein-ptg.cyborg-nova-new" rel="noreferrer" target="_blank">https://etherpad.openstack.org/p/stein-ptg.cyborg-nova-new</a><br>
<br>
With the Neutron team, we had a demo of minimum bandwidth scheduling to <br>
kick things off. A link to a writeup about the demo is available here if <br>
you missed it:<br>
<br>
<a href="http://lists.openstack.org/pipermail/openstack-dev/2018-September/134957.html" rel="noreferrer" target="_blank">http://lists.openstack.org/pipermail/openstack-dev/2018-September/134957.html</a><br>
<br>
Afterward, we discussed heterogeneous (linuxbridge, ovs, etc) Neutron <br>
ML2 backends and the current inability to migrate an instance between <br>
them -- we thought we had gained the ability by way of leveraging the <br>
newest Neutron port binding API but it turns out there are still some <br>
gaps. We discussed minimum bandwidth scheduling and ownership transfer <br>
of a port. We quickly realized transferring a port from a non-shared <br>
network would be really complicated, so we suspect the more realistic <br>
use case for someone wanting to transfer an instance and its ports to <br>
another project/user would involve an instance on a shared network, in <br>
which case the transfer is just database surgery.<br>
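<br>
For reference on the port binding piece: the API in question is Neutron's binding-extended extension, which lets Nova pre-create an inactive binding on the destination host and activate it at cut-over; the remaining gaps are about doing this across heterogeneous backends. A rough sketch of the flow, with placeholder endpoint, token, port, and host:<br>
<pre>
# Illustrative sketch of the multiple-port-bindings flow. Endpoint,
# token, port UUID, and destination host are placeholders.
import requests

NEUTRON = 'http://neutron.example.com:9696/v2.0'
HEADERS = {'X-Auth-Token': 'REPLACE_ME'}
port_id = 'REPLACE-WITH-PORT-UUID'
dest_host = 'compute-02'

# 1) Pre-create an (inactive) binding for the destination host.
requests.post('%s/ports/%s/bindings' % (NEUTRON, port_id),
              json={'binding': {'host': dest_host}},
              headers=HEADERS).raise_for_status()

# 2) At migration cut-over, activate the destination binding; the
#    source host's binding becomes inactive.
requests.put('%s/ports/%s/bindings/%s/activate' % (NEUTRON, port_id, dest_host),
             headers=HEADERS).raise_for_status()
</pre>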
<br>
With the Ironic team, we discussed the Nova/Ironic power sync problem <br>
wherein an instance that had been powered off via the Nova API is turned <br>
on via IPMI by a maintenance engineer to perform maintenance, only to be <br>
turned back off by Nova, disrupting the maintenance. We agreed that <br>
Ironic will leverage Nova's external events API to notify Nova when a <br>
node has been powered on and should be considered ON, so that Nova will <br>
not try to shut it down. We also discussed the need for failure domains <br>
for nova-computes controlling subsets of Ironic nodes and agreed to <br>
implement it as a config option in the [ironic] section to specify an <br>
Ironic partition key and a list of services with which a node should peer. <br>
We also discussed whether to deprecate the ComputeCapabilities filter <br>
and we agreed to deprecate it. But, judging from the ML thread about it:<br>
<br>
<a href="http://lists.openstack.org/pipermail/openstack-dev/2018-September/135059.html" rel="noreferrer" target="_blank">http://lists.openstack.org/pipermail/openstack-dev/2018-September/135059.html</a><br>
<br>
I'm not sure it's appropriate to deprecate yet.<br>
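<br>
On the power sync agreement above, the mechanism is Nova's os-server-external-events API; a rough sketch of the kind of notification Ironic would send is below. The 'power-update' event name and 'POWER_ON' tag are hypothetical until the spec settles them, and the endpoint, token, and UUID are placeholders:<br>
<pre>
# Illustrative sketch of Ironic telling Nova a node was powered on
# out-of-band. The event name and tag are hypothetical; endpoint, token,
# and instance UUID are placeholders.
import requests

NOVA = 'http://nova.example.com/compute/v2.1'
HEADERS = {'X-Auth-Token': 'REPLACE_ME'}

body = {
    'events': [{
        'name': 'power-update',                    # hypothetical event name
        'server_uuid': 'REPLACE-WITH-INSTANCE-UUID',
        'tag': 'POWER_ON',                         # hypothetical payload
    }]
}
resp = requests.post(NOVA + '/os-server-external-events',
                     json=body, headers=HEADERS)
print(resp.status_code)
</pre>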
<br>
Tech Debt and Miscellaneous Topic Day<br>
=====================================<br>
Friday was our day for discussing topics from the "Tech Debt/Project <br>
Management" and "Miscellaneous" sections of our PTG etherpad. Please see <br>
the etherpad for all the notes taken on those discussions.<br>
<br>
The major topics that stand out to me were the proposal to move to <br>
Keystone unified limits and filling in gaps in openstackclient (OSC) for <br>
support of newer compute API microversions and achieving parity with <br>
novaclient. Example: migrations and boot-from-volume work differently <br>
between openstackclient and novaclient. The support of OSC is coming up <br>
on the ML now as a prospective community-wide goal for the T series:<br>
<br>
<a href="http://lists.openstack.org/pipermail/openstack-dev/2018-September/135107.html" rel="noreferrer" target="_blank">http://lists.openstack.org/pipermail/openstack-dev/2018-September/135107.html</a><br>
<br>
On unified limits, we agreed we should migrate to unified limits, noting <br>
that I think we must wait for a few more oslo.limit changes to land <br>
first. We agreed to drop per-user limits on resources when we move to <br>
unified limits. This means that we will no longer allow setting a limit <br>
on a resource for a particular user -- only for a particular project. <br>
Note that with unified limits, we will gain the ability to have a strict <br>
two-level hierarchy, which should address the reasons admins currently <br>
leverage per-user limits. We will signal the upcoming change <br>
with a 'nova-status upgrade check'. And we're freezing all other <br>
quota-related features until we integrate with unified limits.<br>
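<br>
To give a sense of the enforcement model this implies -- keeping in mind the oslo.limit API is still settling, so the names below are illustrative assumptions, not the final library interface -- Nova would hand oslo.limit a callback that reports current usage and then ask an enforcer to check the requested deltas against the limits registered in Keystone (per project only, no per-user limits):<br>
<pre>
# Very rough sketch of the unified-limits enforcement model; the
# Enforcer/enforce names and callback shape are assumptions since the
# oslo.limit API had not landed yet at the time of writing.
from oslo_limit import limit


def count_usage(project_id, resource_names):
    # In Nova this would query real usage (e.g. instance counts);
    # hard-coded here purely for illustration.
    return {name: 2 for name in resource_names}


enforcer = limit.Enforcer(count_usage)
# Raises if adding one more instance would exceed the project's limit
# registered in Keystone.
enforcer.enforce('my-project-id', {'instances': 1})
</pre>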
<br>
I think that's about it for the "summary" which has gotten pretty long <br>
here. Find us on IRC in #openstack-nova or email us on this mailing list <br>
with the [nova] tag if you have any questions about any discussions from <br>
the PTG.<br>
<br>
Cheers,<br>
-melanie<br>
<br>
<br>
__________________________________________________________________________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
</blockquote></div></div>