*Nova Xena virtual PTG
*DO NOT USE TRANSLATION TOOLS IN THIS ETHERPAD!!!! (ノ ゜Д゜)ノ c┻━┻
To translate this etherpad, please follow these easy instructions:
1. Look in the above toolbar and click the '</>' ("Share this pad") button
2. Click the "Read only" checkbox at the top of the dialog box
3. Copy the URL that appears in the "Link" box
4. Open that URL in a new browser tab or window
5. Use your translation tools of choice in the new window
Thank you!
ERROR(sean-k-mooney): pressing ctrl-shift-c is apparently the shortcut for "clear all colours"... sorry
people/colours:
- sean-k-mooney
- gibi
- gmann
- stephenfin
- lyarwood
- brinzhang
- kashyap
- johnthetubaguy
- bauzas
- belmoreira
- lassimus
- dmitriis
*Schedule
April 19-23, 2021
https://www.openstack.org/ptg/#tab_schedule
ethercalc for the schedule: https://ethercalc.net/oz7q0gds9zfi
PTG Registration: https://www.eventbrite.com/e/project-teams-gathering-april-2021-tickets-143360351671
*Please vote in the doodle when to have our sessions https://doodle.com/poll/ib2eu3c4346iqii3
- Monday - Tuesday: project support team discussions, e.g. SIGs, QA, Infra, Release mgmt, Oslo
- Wednesday 14:00 UTC - 17:00 UTC: Nova (Placement) sessions
- 14:00 - 14:30 UTC Interop discussion with Arkady
- 15:00 - 16:00 UTC Cinder cross project
- Thursday 14:00 UTC - 17:00 UTC - Nova(Placement) sessions
- 15:00 UTC - 16:00 UTC Keystone-Nova cross project session
- Friday 14:00 UTC - 17:00 UTC - Nova(Placement) sessions
- 15:00 UTC - 15:30 UTC Neutron - Nova cross project mini session
Practicalities:
- 14:00 UTC is pretty early on the US west coast, so we will start the day slowly and/or talk about simple topics at the start.
- While officially the EU timeslot ends at 17:00 UTC, I expect that 17:00 - 18:00 is a better time to continue, if needed, than jumping to 21:00 UTC.
Use the zoom links from the schedule by clicking on a timeslot at http://ptg.openstack.org/ptg.html to join
*Join at https://www.openstack.org/ptg/rooms/newton
*PTG Topics
*Interop discussion
Wednesday 14:00 - 14:30 UTC
(gibi) How can we help the Interop SIG WG?
*Nova-Keystone cross project
Thursday 15:00 -16:00 UTC
*System scope usage and workflow (both topics below are related to this)
- (stephenfin) Changes to 'GET /os-hypervisors' policy
- We had planned a policy change as part of the cleanup of the 'os-hypervisors' API, but some questions came up at the last minute and we deferred it
- Project admins can currently create a server on a specific hypervisor (via hostname). This is necessary because we need project information when creating a server and we don't currently have a way to provide this in the body of the 'POST /servers' request. This leaves us caught in the middle, where only project admins can create a server on a specific hypervisor while only system admins or system readers can list hypervisors.
- There are two potential solutions:
- A) Allow system admins to specify a project ID in the 'POST /servers' request body, thus creating a server in a specific project. This requires a microversion
- B) Allow project admins to list hypervisors. This in theory does not require a microversion
- We went with option B) but it's not perfect. To avoid giving too much information about the deployment, we decided to restrict project admins to listing hosts in tenants assigned to them. This requires a tenant-isolated cloud and meant looking up aggregates to find the host assigned to this project. However, if we do that, shouldn't we also prevent project admins from specifying a host that they are not isolated in? If we don't, we have different behavior in two APIs
- What do we want to do?
- What about having the context library populate the project_id as the sentinel UUID value for system-scoped context objects?
- Would there ever be a risk of overwriting a project ID with the sentinel?
- Not if we make sure that system-scoped tokens can't be used to create project-specific resources
- A lot easier to spot and it's consistent
- Configure an 'admin' project ID that gets populated in system-scoped tokens
- Creates overhead for deployment tools to make sure this is consistent across services
- Wrap project-specific operations
- $ openstack role add --user alice --project foo member
- $ openstack server delete bar
- $ openstack role remove --user alice --project foo member
- Update keystone to support trading a system-scoped token for a project-scoped token
- $ openstack token issue --os-cloud system-admin --project foo --role bar
- $ openstack server delete --os-cloud system-admin --project foo:member
- Transactional
- Project admin
- boot instance to a specific host
- open up the hypervisors API so you can see the host an instance is on
- AGREED:
- We need to
- Add a sentinel project ID for system-scoped tokens (work in oslo.context maybe, or just have the services use the sentinel where absolutely necessary)
- Something in keystone or deployment tools to make sure the admin user has the admin role on the default domain (and all other domains)
- Now we can start doing client-side work to glue all of these interactions together that would allow
- $ openstack --os-cloud system-admin server reboot --project foo (this will work because the client requests a token for project foo and the assignment is inherited)
- $ openstack --os-cloud system-admin server list --all-projects (logging will use a sentinel ID)
- Write a spec for the hypervisor list case
- How do we maintain the ability for system users to operate on project-specific resources without refactoring the world?
- Use the client to figure out the owning project of a resource, grant the system user a role on that project, do the operation on the project-owned resource, clean up the resource
- Create an inherited role assignment on all domains for system users, the client fetches a project-scoped token, does the operation on the project-owned resource (leave the domain role assignment)
- Implement project ID header pass-through for system users in keystonemiddleware (spec coming)
- We could allow system users to specify the project ID so that $ openstack --os-cloud system-admin server reboot --project foo bar works (a rough sketch of this flow follows at the end of this section)
- Client makes the request with a system-scoped token using the system-admin cloud profile
- Client finds project foo's ID and sets it to HTTP_X_PROJECT_ID in the request
- Keystonemiddleware validates the token, ensures the user is a system user, and keeps the project ID from the client
- oslo.context creates a context object with context.system_scope = 'all' and context.project_id with the project ID from HTTP_X_PROJECT_ID
- The roles in the context object come from the token, which are the roles the user has on the system (they don't technically exist on the project)
- A system reader would be able to set HTTP_X_PROJECT_ID for any project, but they would maintain reader permissions, they wouldn't be able to make writable changes within the project, even though they are a system user
- A system admin can give themselves authorization on any project using HTTP_X_PROJECT_ID, without an explicit role assignment (chatty) or an inherited role assignment (assumed to exist)
- (sean-k-mooney dansmith lbragstad gmann) Usage of a system token for a project's resource operations, e.g. using a system token to operate on project X's servers (the above topic "(stephenfin) Changes to 'GET /os-hypervisors' policy" is also related to this, as it was solving this issue for "POST /servers on specific host")
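- A rough Python sketch of the header pass-through flow above. This is only an illustration of the idea, not an existing keystonemiddleware hook: the function name, header handling and token_info shape are assumptions, only oslo.context's RequestContext is real.
    from oslo_context import context

    def build_context(headers, token_info):
        # token_info is the validated token body from keystonemiddleware (illustrative)
        is_system = bool(token_info.get('system', {}).get('all'))
        project_id = token_info.get('project', {}).get('id')
        if is_system and 'HTTP_X_PROJECT_ID' in headers:
            # a system user targets a specific project via the new header
            project_id = headers['HTTP_X_PROJECT_ID']
        return context.RequestContext(
            user_id=token_info['user']['id'],
            project_id=project_id,  # from the header, not the token
            system_scope='all' if is_system else None,
            roles=[r['name'] for r in token_info.get('roles', [])])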
*Nova-Cinder cross project
Wednesday 15:00 - 16:00 UTC
https://etherpad.opendev.org/p/nova-cinder-xena-ptg
Let's use the nova zoom room: https://www.openstack.org/ptg/rooms/newton
*Neutron - Nova cross project
Friday 15:00 - 15:30 UTC
(gibi) support guaranteed minimum packet rate QoS policy
- Nova and Neutron specs: https://review.opendev.org/q/topic:bp/qos-minimum-guaranteed-packet-rate
- Similar to the minimum bandwidth support feature I worked on in the last couple of cycles.
- Neutron has the bigger impact:
- New QoS API extension
- New port resource_request structure
- OVS agent changes
- Resource reporting changes
- The Nova side of the solution reuses the existing code infrastructure to support additional resources.
- The assumption that a single port only requests a single placement resource group needs to be relaxed
- Goal of the discussion:
- Agree on the new port resource_request structure, how we handle same_subtree, and how we indicate that the Neutron API has changed (an illustrative sketch of the new structure follows after the AGREED list below)
- Agree that the changes to the allocation key of the binding:profile are OK
- Discuss upgrade: Do we need to support mixed versions of the Neutron and Nova controllers? I.e. does Nova need to support the old resource_request format in Xena along with the new one, or can we assume the new one?
- AGREED:
- Document that guaranteeing min pps on the data plane can be done by creating min_pps == max_pps rules in the same policy.
- The QoS policy rule should be direction aware; the resource is directionless in case of OVS, and direction aware in case of hw-accelerated OVS (or SR-IOV later). If vnic_type = normal then Neutron will request a directionless resource; if vnic_type = direct (or other than normal?!) Neutron will request a direction-aware resource.
- Add the alternative to the spec to have the pps resource on the OVS bridge instead of on the OVS agent RP.
- same_subtree: Neutron should request it, as later the IP resource will be on a shared RP
- value of the allocation key should be a dict
- Upgrade: both version mixes need to be supported (older-nova + new-neutron, new-nova + old-neutron)
- enable the new resource_request format via config OR
- duplicate the resource_request field (have a new resource_request_extended for the new format)
- Nova needs to be able to accept both the old and the new format in Xena
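- A rough sketch of the extended resource_request shape being discussed (field names are recalled from the spec drafts and may differ in the final version; resource classes and amounts are illustrative):
    new_style_resource_request = {
        'request_groups': [
            {'id': '<uuid-of-bandwidth-group>',
             'required': ['CUSTOM_PHYSNET_PHYSNET0', 'CUSTOM_VNIC_TYPE_NORMAL'],
             'resources': {'NET_BW_EGR_KILOBIT_PER_SEC': 1000}},
            {'id': '<uuid-of-packet-rate-group>',
             'required': ['CUSTOM_VNIC_TYPE_NORMAL'],
             'resources': {'NET_PACKET_RATE_KILOPACKET_PER_SEC': 100}},
        ],
        # all groups of one port must be fulfilled from the same RP subtree
        'same_subtree': ['<uuid-of-bandwidth-group>', '<uuid-of-packet-rate-group>'],
    }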
- (jlibosva) live-migration with MTU change
- there is a tool to migrate from an OVS-based Neutron backend to OVN. Because OVN currently doesn't support VXLAN with Neutron (this may change soon), and the Geneve protocol header is bigger than VXLAN's, we need to change the MTU of the network during the migration. This breaks live migration because the "old" MTU of a running instance is e.g. 1450 (VXLAN) while the new one is already 1442 (Geneve)
- (artom) semi-related downstream Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1943613
- (sean-k-mooney): yep, we can't actually change the MTU on a live migration. The root cause is that they changed the MTU on the Neutron network and did not hard reboot the VM before trying to live migrate, which is required.
- workarounds include
- attach/detach the interface
- remove the MTU from the XML again (not sure if that will affect jumbo frames or not)
- detect it as an error if the XML MTU is different from the current port MTU in pre-live-migration and raise an error telling the admin to hard reboot
- (artom - I'm pink now!) This could be part of that mythical "recreate" API - we can also fetch the latest info from the ports
- actually cold migrate just works, so that is also an option; the issue is that it's not valid to change the MTU in the XML as part of a live migration.
- So "AGREED: for the OVS -> OVN migration cold migrate all the things"? :)
- well, we can hot detach and attach the interface automatically, like we do for SR-IOV, if the MTU needs to change.
- AGREED:
- error out the live migration if the MTU is not matching
- ... cold migrate would work
*Nova PTG topics
(lyarwood): Drop lower-constraints.txt on master?
(gibi): My current thinking is that as soon as it becomes unmaintainable we will drop it. We saw that it can be hard to maintain (see the os-brick bump in W).
(gibi): I'm also thinking about spending some time with the pip resolver. Removing the indirect deps from the lower-constraints can make the pip resolve runtime increase to unacceptable levels. If there were a way to see which unconstrained indirect deps contribute the most to this runtime, then we might create a lower-constraints.txt that mostly constrains direct deps plus a couple of constrained indirect deps to manage the pip resolve runtime.
The old behavior of the resolver was that if you provide multiple constraints the first one was meant to be used, so to avoid unconstrained deps we could maybe include LC first then UC after to have constraints. Ultimately the new resolver has fundamentally changed what LC was tracking. It was meant to be the set of minimum deps that fulfilled our API usage of that dep, e.g. if we use a new API or function that was not in the LC version of the lib in nova we were meant to bump it, but we were not meant to have to bump it for our indirect deps. E.g. if nova can support oslo.log 2 but our dep needs oslo.log 3, then it was meant to be OK for us to have 2 listed as that is our actual min version. If the indirect dep was optional and not installed we would not need oslo.log 3 for nova to work, so it should not be considered a min dep of nova. The new resolver forces us to treat all requirements of our deps, including optional ones, as requirements of nova. Where we share dependencies we also cannot break that dependency by removing it. I like having lower-constraints, but if we cannot find a way to list only nova's direct lower bounds I think we are forced to remove it because of the new resolver behavior. (A small example of the direct-dep relationship follows after the AGREED list below.)
(gmann) As discussed in the TC meeting and on the ML, lower bound testing is all optional and up to the project to test it or remove it.
https://review.opendev.org/c/openstack/nova/+/778177 - requirements.txt: Bump os-brick to 4.2.0
https://review.opendev.org/c/openstack/nova/+/783822 - Bumping min os-brick ver to 4.3.1
- AGREED: both on master and on stable:
- Keep it until it works. If it breaks, raise it at the nova weekly meeting. If it is not fixed by the next meeting, delete the job and delete lower-constraints.txt from the tree
- We also should remove lower bounds from the requirements.txt when we drop lower-constraints
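- For reference, the direct-dependency relationship discussed above (illustrative lines only; the real pins live in nova's requirements.txt and lower-constraints.txt):
    # requirements.txt declares the minimum version of a direct dep we claim to support
    os-brick>=4.2.0
    # lower-constraints.txt pins the lower-constraints job to exactly that minimum
    os-brick==4.2.0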
(lyarwood): Block layer(s) refactor/cleanup
(lyarwood) https://docs.openstack.org/nova/latest/reference/block-device-structs.html - sets out the current layout of the data structures used within the various layers of Nova that define the block layout of an instance.
(lyarwood) Short term proposal (during Xena):
- Continue to document the current data structures in the above reference.
- Extend block_device_info to include image based disks.
- Use mypy throughout the device layers to ensure the correct types are used. This isn't currently the case as certain codepaths are called both with BlockDeviceMapping objects and DriverBlockDevice dicts. (A small illustration follows after the AGREED list below.)
(stephenfin) +100 I think this would be a huge benefit. Docstrings too (for non-private functions) would be a lovely addition
(lyarwood) Longer term proposal (post Xena):
- Work towards a future where only BlockDeviceMapping, BlockDeviceMappingList and subclassed driver-specific BlockDeviceMappingList classes are used throughout the codebase.
- Remove the nova.virt.block_device layer entirely and move the generic logic into BlockDeviceMapping where possible.
- Provide subclassed BlockDeviceMappingList classes to calculate mappings in the virt drivers. For example, a BlockDeviceMappingListLibvirt that in turn replaces logic and data structures like nova.virt.libvirt.block_info disk_info etc.
(gibi): +1 for both the short and long term plans. Sign me up to review doc and mypy patches during Xena
- AGREED:
- Agreed on both the short and mid term plan
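- A tiny illustration of the mypy point above (the function is made up for the sketch; the class names are the real ones the discussion refers to):
    from typing import Union

    from nova.objects.block_device import BlockDeviceMapping
    from nova.virt.block_device import DriverVolumeBlockDevice

    # today some codepaths accept either the versioned object or the driver
    # dict-like wrapper; the short term goal is to annotate this honestly and
    # then narrow it over time
    def _get_volume_id(bdm: Union[BlockDeviceMapping,
                                  DriverVolumeBlockDevice]) -> str:
        return bdm.volume_id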
(gibi): Wallaby retrospective
What went well?
(gibi) good amount of blueprints got finished (we closed 14 bps, ~70%) (and more blueprints got approved compared to previous cycle)
Also worth noting that the ones that weren't finished were not finished for solid technical reasons, not simply cos people forgot or whatever
(gibi) both the FF and RC deadlines were pretty calm from a Nova perspective
(gibi) we kept the size of the untriaged bug backlog pretty steady. Thank you!
(gibi) Nova is a cleaner place than before!
RPC 6.0! yay \o/
compaction of the DB migrations! yay \o/
improved the possibility to reproduce bugs in the functional tests
stable is pretty stable +1
What to change?
(gibi) the runway process wasn't really followed. I was lazy to enforce the strict runway process. I looked at the open blueprints weekly (around our meeting time) and tried to mention the progressing bps at the meeting. How do you feel about it? Would you like to see the runway process followed?
(lyarwood) I've not had to push that hard for reviews so it's hard to say either way this cycle. It has helped in the past however.
(gibi) Placement is our responsibility again. I ignored this fact most of the time during the cycle. Do we want to make placement more alive?
Any other feedback?
(gibi) I miss the face to face PTG
+1000 +1 +9001 +42
^ is that 9001 some kind of compliance joke? Well, it's over 9000...
- Action Points
- gibi to ask for e-r +2 rights to get e-r queries merged
- ask also if we can have e-r queries in our own tree as an alternative
- gibi to try out the review priority flag in gerrit instead of runways, for all the repos under nova governance
- add periodic test for placement to catch os-traits / os-resource-classes updates
- (sean-k-mooney) I'll try to do this ^ just a weekly execution of the unit and functional tests.
- gibi to add a topic for the weekly meeting to look at the result of this periodic
(stephenfin) Drop support for legacy CPU architectures
(gibi): ping lassimus and belmoreira
The libvirt driver and tests contain references to a number of legacy architectures, none of which are likely to have been tested in a long time. Can we hard fail on these host architectures and drop the tests for same?
- ARMv7 -> no modern server-class hardware available still using the ISA as AArch64 is the future. Even RPi uses AArch64 now (from RPi 3)
- i686 -> 32 bit x86, Intel's last non-embedded 32bit CPUs were early Atom chips
- MIPS -> not viable for years and now discontinued in favour of RISC-V
- PPC -> POWER8 and POWER9 CPU families appear to be all based on the PPC64LE ISA
- SPARC -> EOL
We should still warn for non-x86-64 hosts since testing is light here
Only conflict is if we ever wanted to support non-host guests using these architectures, but that seems to be a non-goal of nova? (maybe we should rephrase this as dropping support for 32bit hosts) that might be a valid cross project goal.
(lassimus) emulation support is invaluable in use cases surrounding modernization of embedded systems/workflows, e.g.:
- Security assessments: fuzzing/bug hunting on embedded firmware samples.
- CI based functional testing: many embedded products/teams still rely heavily on manual testing. Even poor-performance functional tests are a game changer.
- Modernization/iot-ifying: given a legacy system (often without source code) and a goal of modernization, the best is often virtualizing/emulating the existing system.
- Training/education: hardware can be expensive and limiting. Emulating embedded ...
(gibi): Stephen, are we talking about supporting these architectures on the host or in the guest? I agree to drop support for such hosts, but I think both belmoreira and lassimus expressed a need for the guest support.
(sean-k-mooney) does this cost us much? e.g. is it a significant burden in our current code? I know we do have to do some conditionals in the XML generation to set good defaults.
(kashyap) Even if it doesn't "cost" much, if we see that deploying OpenStack is not realistically viable in the next 5 years on these arches, we should nuke them. It also reduces a lot of clutter in the code
If we could remove some or most of that as a result it might be worth it. Because we don't have many users of the other types, we also don't have many costs. Nova itself, because of one of our dependencies, now needs to run on x86_64; x86 does not work because I think one of the libs we use needs 64bit. I know this came up before because some people had 32bit devstack environments, but that has been broken for a couple of releases now. Nova itself can run on 32bit but some of our deps can't anymore, so dropping i686 or 32bit x86 probably makes sense.
(belmoreira) My use case is to support aarch64 on top of x86_64.
(johnthetubaguy) +1 enabling explicit ability to emulate, as a new separately tested feature... but note that would need some extra testing (no need for hardware emulation, etc) (used to work, note we "broke it" at some point)
(gibi) Also it seems that there is testing, with devstack and tempest, on Arm64 host env: http://lists.openstack.org/pipermail/openstack-discuss/2021-April/021600.html
- AGREED:
- Keep supporting x86_64, aarch64, ppc64 on the host
- Document what we test and what works
- Fix guest architecture checks to actually check guest architecture (instead of host)
- Fix SEV and SECURE_BOOT trait reporting and scheduler support
- with kvm virt_type only report the host arch specific feature traits
- We need developers adding CI for specific emulated guest archs
- This need only take the form of smoke tests to conserve resources
- lassimus needs some guidance setting up the gate
- stephenfin can help here with reviews/pointers
- Reno to document what is already gone
- (kashyap) Just had a chat with upstream QEMU folks, and some important points to bear in mind on supporting "emulated guests":
- QEMU explicitly does *not* consider emulation to be a secure production scenario — "Users with non-virtualization use cases must not rely on QEMU to provide guest isolation or any security guarantees."
- This means, you don't ever want an OpenStack public cloud to enable emulated guests; but it could be valid for private cloud setup, if the admin trusts their tenants
- (belmoreira) Emulated aarch64 is very slow... It would never be used for production workloads. But in a private cloud it's useful for users to have an easy and cheap way to test/validate things. Most of the times this is required before buying real/expensive hardware.
- Okay; good to know that it's for private cloud-only users
(sean-k-mooney): should we unplug vifs in power off
(gibi): ping ralonsoh
- currently in the libvirt driver power_on is implemented by calling hard_reboot; power_off undefines the libvirt domain but does not call unplug_vifs; hard reboot will both destroy the domain and clean up network interfaces.
- the current power_off behavior results in ports being left configured on OVS while the VM is off and then deleting and recreating them on power on.
- nova has done this since before neutron was a project, so it's expected, but should we tear down the backend networking config when we power off?
- also, should we do the same for host mounted cinder volumes? I assume they are unmounted but the attachments are not unbound.
- I don't believe new calls to neutron or cinder to unbind the port binding or volume attachments would be correct, but we might want to remove the configuration from the host.
- context for this is https://bugzilla.redhat.com/show_bug.cgi?id=1932187 tobiko is a tempest alternative that should not be asserting this behavior in a test since it is not part of the API contract and it changes based on both the neutron ml2 driver and nova virt driver, so it's not generic. But it raises the question: should we keep the logic we always had or should we unplug in power off?
- AGREED:
- (sean-k-mooney): Not a bug / won't fix
(sean-k-mooney): pci device tracking in placement?
(gibi): ping ralonsoh
(dmitriis): ping dmitriis
- part of the work was already done 3 years ago but never merged because we did not have nested RPs in placement yet: https://review.opendev.org/q/topic:%2522bp/enable-sriov-nic-features%2522+
- the main delta from the old spec would be that instead of using the port binding profile to carry the trait request, which is admin only, we should have a neutron extension to allow trait requests on the port and then leverage the port resource request to pass the info to nova, if we wanted to support that. https://review.opendev.org/c/openstack/nova-specs/+/504895/7/specs/queens/approved/enable-sriov-nic-features.rst
- from a placement point of view each PF would be a resource provider as a direct child of the compute node. Its name, where it is a NIC, could be the interface netdev name; otherwise I would suggest using the PCI address to allow neutron or other services to identify it if needed.
- each inventory would have all pci passthrough whitelist tags added as custom traits. Standard NIC feature flags would also be added as standard traits where they exist. (A sketch of the whitelist/alias config and the resulting provider tree follows after the AGREED list below.)
- open questions:
- should we extend the pci whitelist to allow a resource class to be specified for the device?
- (gibi): Why would the deployer want to do that? What is the goal here?
- I was thinking that in the flavor based pci passthrough alias case (not neutron sriov ports) it might be nice to have a resource class per pci alias.
- if we do ^ should we allow the alias to just contain the custom resource class instead of / in addition to the vendor id/product id?
- should we use provider.yaml for this instead and deprecate both the alias and the whitelist?
- (gibi): in the whitelist you don't have to manually list every device that matches the filter, you can have a pretty generic filter matching a lot of devices, while in case of provider.yaml you need to list each RP one-by-one. So this change has a usability impact
- (gibi): I vaguely remember that with the whitelist you can define groups of similar devices. Is this grouping a real thing or did I just imagine it?
- (sean) the whitelist can, in several ways. The address field is optional and when specified supports both bash glob syntax (08:*) and python regex format.
- (sean) the other way is to just not use the address and whitelist using only the vendor and product id. That is how I normally recommend listing VFs, for example: assume all of them are for use by nova and are not used on the host.
(gibi): In case of PCI passthrough (not SRIOV) we need a placement model where a PCI device is a resource to be consumed
(gibi): In case of SRIOV we need a model where the whole PF can be consumed (vnic_type=direct-physical) as well as VFs consumed (vnic_type=direct) as resources.
- yep, in this case we would need to set reserved=total for the PF and have one RP per PF with PF and VF resource class inventories.
- (gibi) hm, I think you cannot have an RP(PF=1,VF=4) model, as then a single boot request asking for two ports, one direct_physical and one direct, might find this RP acceptable, but in reality it is not acceptable.
(gibi): Some of the SRIOV PFs are already modelled in placement to provide the bandwidth resource. Today the connection between the bandwidth provider and the PCI device consumed by the PCI tracker in nova is enforced by the nova code. If the PCI tracker's logic is moved to placement we need a model that enforces this connection in placement. E.g. if there will be a bandwidth provider and a PCI provider representing the same PF in placement then they need to be connected with an aggregate or with a parent-child relationship.
- (sean) long term yes, but short term I think this is like NUMA, something we should continue to enforce in nova. So what I was thinking of initially is having the PF modelled twice: once as an RP under the compute node with the name set to the PCI address of the PF, and separately under the SRIOV agent for bandwidth. Nova would then have to assert that the bandwidth and PF allocation come from the same device.
- (stephenfin) Yes, to reinforce this we can't drop (all) of the PCI manager code since we need to maintain NUMA, and NUMA in placement is 😊🔫
- eventually we should figure out how to have a single RP for the device and have both inventories on it. The other thing is I think it only makes sense to have bandwidth inventories if the PF is not used.
(sean) my list of use cases in priority order is:
- model pci devices used for alias based passthrough in placement
- model VFs in placement
- model PFs in placement
- make bandwidth based scheduling and PFs have just one RP, or otherwise have placement enforce the relationship
The reason for the above order is that I would like to have placement deal with the quantitative part first, since that gets rid of the race condition we have today when 2 independent builds race for the same device on the compute node. And as with NUMA we can do the qualitative aspect as we do today initially in nova; eventually we can make it pretty and have only one RP. Note we could start by only reporting devices that don't have physical_network set in the whitelist, i.e. the devices that are not used by neutron and are used with flavor aliases. So we can do this incrementally.
Open questions to be addressed in the spec:
- How we do the translation from alias to resource classes
- What we do to the VFs when we consume the PF
- rolling upgrade
- AGREED:
- Need a detailed spec, we will iron out the hard pieces
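- For illustration, the existing [pci] config the discussion builds on, plus a rough sketch of the provider tree that was floated (the whitelist/alias options are real; the vendor/product IDs, the per-alias resource class idea and the RP naming scheme are assumptions, nothing here is agreed):
    [pci]
    passthrough_whitelist = {"address": "0000:08:00.*", "physical_network": "physnet0"}
    passthrough_whitelist = {"vendor_id": "8086", "product_id": "154c"}
    alias = {"vendor_id": "10de", "product_id": "1eb8", "device_type": "type-PCI", "name": "gpu"}
    # possible tree, one RP per PF/device as a direct child of the compute node:
    #   compute node RP
    #     └─ 0000:08:00.0 (PF) RP
    #          inventory: e.g. a per-alias custom resource class, total=1
    #          traits: CUSTOM_PHYSNET_PHYSNET0, plus standard NIC feature traits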
(sean-k-mooney): pci device attach/detach api?
should we add a new API to allow attaching/detaching PCI devices by alias and/or a cyborg device profile?
- this would need to be a new top-level API like volume attach/detach, and devices would likely need some kind of quota
- maybe if we track the PCI devices in placement we can use unified limits for quota in the future
- for now we could restrict this to admins by policy until unified limits is a thing, although we don't have a limit per individual flavor today so it's not really any different from what we do today
- AGREED:
- We need to know more about the use case
- Do the full detach / attach through cyborg. Transfer the PCI device handling from Nova to a generic Cyborg driver
(sean-k-mooney): should we support generic mdevs in nova?
I think we could support generic mdev passthrough, similar to vGPU, by just allowing a config option to map mdev types to resource classes (a hedged config sketch follows after the AGREED lines below). We would then need to extend the libvirt driver to report the additional mdev types to placement and be able to assign them to the VM. Most of the code for this is already there, and cyborg is going to be reusing some of it for vGPUs too.
- (brinzhang) yep, we can consider using cyborg to manage the generic mdev, and we have a plan to do this; of course, welcome to join us to do this together.
- well, we want to be able to do this without cyborg too
- (gibi): I'm wondering why we want to implement this twice.
- the simple answer is I don't have a path to deliver cyborg to customers at present, and also for stateless devices cyborg does not actually add any value. There is nothing to program or manage, it's just like a VF, so cyborg is a really hard sell.
- AGREED:
- sylvain write a spec with use case
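- A hedged sketch of the config-driven mapping mentioned above, modelled on the existing vGPU options ([devices]/enabled_vgpu_types and the per-type device_addresses are real today; the generic type list and the resource class option are hypothetical and their names are made up):
    [devices]
    enabled_vgpu_types = nvidia-35            # existing vGPU option
    enabled_mdev_types = i915-GVTg_V5_4       # hypothetical generic equivalent

    [mdev_i915-GVTg_V5_4]
    device_addresses = 0000:00:02.0           # mirrors the existing per-type group
    mdev_class = CUSTOM_MDEV_I915             # hypothetical: map the type to a resource class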
(sean-k-mooney): VDPA part 2
(gibi): ping ralonsoh
- if VDPA is not merged in Wallaby then this starts by finishing that work -> merged \o/
- if it is, then this might involve adding support for things like jumbo frames, or move operations like live migration, depending on kernel/qemu support
- this is basically a placeholder, but there may be other uses for vdpa beyond NICs, like vdpa for virtio-blk, virtio-fs ...
- AGREED:
- do the life cycle operations except suspend and live migrate
(sean-k-mooney): fixing all the bugs related to move operations with pci devices, especially neutron vnic_type=direct-physical
- while move operations with passthrough devices appear to work in some cases, there are many broken edge cases. I would like to spend some time paying down the tech debt and finishing half completed features.
- currently all move operations with vnic_type=direct-physical are broken in some way:
- cold migration/resize does not update the neutron port MAC to match the PF MAC on the destination.
- live migration also has the same problem, and until very recently device detach did not work (thanks gibi for fixing :) )
- unshelve will sometimes? always? use the wrong PCI address since the neutron port profile is out of date.
- evacuate probably has the same issue as cold migration
- in all cases the PCI tracker has the correct PCI claim as far as I am aware, but the neutron port is not updated properly.
- I believe that the same unshelve issue can affect all other neutron SRIOV port types too, e.g. standard VFs, hardware offloaded OVS etc.
- one way to fix part of the issue is to break the reliance on the neutron port profile by storing a mapping between the pci request id and the port in the nova VIF object, which is persisted in the network info cache.
- a better way to do this might be to update the extra_info column in the pci tracker with the neutron port uuid, or add an optional "consumer uuid" to the pci_devices table which would be used to store the neutron port uuid. Currently we only have the instance uuid. (A rough sketch of the extra_info idea follows after the AGREED list below.)
- open questions:
- should we block all move operations with vnic_type=direct-physical until they are fixed? The spec that added support for it never covered move operations and they were not explicitly enabled or tested, so this just "worked" because it was never blocked. Given it is broken most of the time, it's possible that blocking it outright, like we did with NUMA live migration, would be the best thing to do, with a workaround config option for those that really need it and are prepared to fix things when it does not work properly.
- (gibi): I did some tests with direct-physical migration when I implemented the qos stuff, and at the level I asserted the behavior the migration worked. So I agree with your statement from the first bullet point that it "appears to work in some cases". If only edge cases are broken then I would not block the operations. If every case is broken, or the currently working cases cause inconsistencies later, then I agree to block the operations.
- well, it always has the incorrect MAC in the neutron port after the migration. That can cause other failures: if a VM tries to boot on the source host using the PF, it will fail when we try to update its neutron port to the same MAC if both VMs were from the same tenant, which is likely since PFs really only work with flat networks.
- (gibi): OK, that might be simply missing asserts in the existing tests. I'm not sure I asserted MAC assignment.
- AGREED:
- Fix is not backportable due to online datamigration.
- Create a patch that block that doesn't work (add a workaround flag that still allows the operation if set)
- Backport the patch ^^
- Implement the fix on master
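- A rough sketch of the extra_info idea above (the 'neutron_port_uuid' key is made up; PciDevice and its extra_info field are the real object/column, but nothing about this shape is agreed):
    from nova import objects

    def remember_port_for_device(ctxt, compute_node_id, dev_addr, port_uuid):
        # stash the owning neutron port on the tracked PCI device so move
        # operations don't have to trust the (possibly stale) port binding profile
        pci_dev = objects.PciDevice.get_by_dev_addr(ctxt, compute_node_id, dev_addr)
        info = dict(pci_dev.extra_info or {})
        info['neutron_port_uuid'] = port_uuid
        pci_dev.extra_info = info
        pci_dev.save()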
(gmann): Consistent return code for "Feature not implemented'
There is inconsistency in what return code we return for "Feature not implemented". Currently we return 400, 409, and 403.
- 400 case: Example: Multiattach Swap Volume Not Supported
- 403 case: Cyborg integration
- 409 case: Example: Operation Not Supported For SEV, Operation Not Supported For VTPM
Discussion started in https://review.opendev.org/c/openstack/nova/+/780333/5//COMMIT_MSG#11
API SIG guidelines suggest using 400: https://specs.openstack.org/openstack/api-wg/guidelines/http/response-codes.html#use-of-501-not-implemented
If we modify the existing return code of "Feature not implemented" to 400 then we do not need a microversion bump, as 400 is an existing return code for any API (a minimal handler-side sketch follows after the AGREED list below).
(sean-k-mooney) I was OK with 403, as a client is not allowed to automatically retry the same request unmodified if they receive a 403. The same would be true of a 400; a 409 allows the request to be retried. For that and other reasons, since the user can't "fix" the request to make it succeed, I don't like using 409 for this, but I would be OK with a 400 in all existing cases.
HTTP 501? https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501
- AGREED:
- use 400; no microversion needed to update the APIs to it
- backport it
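- A minimal sketch of what the agreed change looks like in an API handler (illustrative only: the function shape is made up and the real controllers catch specific driver/compute exceptions; webob and NovaException.format_message() are the real building blocks):
    import webob.exc

    from nova import exception

    def swap_volume(compute_api, context, instance, old_vol, new_vol):
        try:
            compute_api.swap_volume(context, instance, old_vol, new_vol)
        except exception.NovaException as e:
            # previously surfaced as 400/403/409 depending on the API;
            # agreed: "feature not implemented" consistently returns 400
            raise webob.exc.HTTPBadRequest(explanation=e.format_message())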
(stephenfin) Switch to alembic
<---- Continue from here Friday 14:00 UTC ----------->
(stephenfin) Drop eventlet
I have no idea how much work this is, but I think dropping support for the eventlet-deployed API services (vs. WSGI) would be a good start
(dansmith) I don't really have a problem with this, but what pain are we specifically seeing from having it as an option? I'm trying to think of what beyond cmd/api.py really has much eventletness in it.
(melwitt) we have eventlet for scatter_gather_cells in nova-api. I actually proposed a patch [1] in the past that would allow us to swap eventlet out for native threading that I can restore *but* keeping in mind there is no way to "cancel" a native thread that has not finished within a particular timeout. That's the only tangible advantage to using eventlet in nova-api: green threads can be canceled/killed if they don't reply within the time we specify. If we move to native threading, there will be no more "canceling" of a thread that's not returned a result by $timeout and we'd just move on and leave the thread doing whatever. I'm OK with that as I spent a lot of time researching and did not find a way around it. (A rough native-thread sketch follows after the AGREED list below.)
[1] https://review.opendev.org/c/openstack/nova/+/650172
(dansmith) Yeah, canceling real threads is hard. Canceling stuck eventlet threads is actually pretty much the same, AFAIK, in that we just don't collect the results but we're still burning a slot in the thread pool until whatever is blocking it times out. If it's purely a network connection (as would be the case in the scatter/gather) the impact should be pretty low. I'll have to go back and look at the s/g code, but I guess I'm not sure we're *really* using eventlet threads there if we're running in uWSGI, but I'll have to reload my context.
(melwitt) Hm, yeah, I don't know the interactions when we're running in uWSGI ... I was referring to the fact that eventlet threads have a kill() method that we're using alongside our own timeout [2], so AFAIK we are getting client side timeout + cancel functionality with eventlet threads. If that's not the case with uWSGI, I was not aware of that.
[2] https://github.com/openstack/nova/blob/f55f5daed88a9c3b11d29834481b6f7e4fef763c/nova/context.py#L437-L450
Basically we should not try to cancel real threads; if we need cancellation we need to use a different concurrency construct, like a process on the heavy side, or tasks (often implemented as coroutines or green threads but with explicit support for cancellation). CLIs like nova-manage and nova-status however probably don't need eventlet. They get monkeypatched implicitly https://github.com/openstack/nova/blob/master/nova/cmd/__init__.py but I don't think they use it much; they tend to do just one thing at a time.
(sean) dropping the eventlet version of the nova-api would be trivial and was planned at one point; we just never did it. Some deployment tools would have to be altered to take this into account, so this would likely need deprecation for one cycle and removal in Y
"Upon further investigation, we found that the application model of uWSGI and mod_wsgi is intended to pause when idle, which will naturally pause rabbitmq heartbeats as well. Changing over to native threads simply circumvents that application model, so we concluded this patch isn't a good way to approach the situation.The uWSGI and mod_wsgi app will properly resume and rabbitmq reconnects via oslo.messaging and heartbeats resume, without any disruption to the behavior of nova-api, provided that uWSGI or mod_wsgi is configured as the default threads=1 per process."I would be OK with going ahead and restoring this and doing it, just mentioning that there was reasoning for abandoning the effort last time ^.
(sean) dropping eventlet from nova-compute or the conductor, on the other hand, would not be simple.
(dansmith) agree, dropping eventlet from the standalone services would be a big deal, not sure what it would really gain us
(sean) there have been times when I have wanted to be a little more explicit about our concurrency model, but the way we use eventlet and the implicit concurrency it gives us would be hard to unwind. The main thing I think it would give us is less risk of monkeypatching issues when moving to new python versions, and compatibility with some of our deps. That said, it comes at both the cost of doing the work for the conversion and the cost of developing in a new concurrency model.
(melwitt) Related to dropping eventlet:
Recently I've learned of a seemingly unsolvable problem having to do with a failure mode in PyMySQL (and mysqlconnector?) when a process has been eventlet monkey patched. Quoting zzzeek:
"I would theorize that if eventlet monkeypatching has occurred at all within the process, thus replacing Python's normal socket API with that of eventlet, that alone would be what precipitates this being possible.Searching for this issue, here's three issues, two against pymysql and one against sqlalchemy using pymysql, that show this error message. All of them have used gevent monkey patching (which is what eventlet is forked from):https://github.com/PyMySQL/PyMySQL/issues/234https://github.com/sqlalchemy/sqlalchemy/issues/3258https://github.com/PyMySQL/PyMySQL/issues/260 - on this one there are also some "me, too" comments, also using geventI'm not able to find descriptions of this issue that don't have gevent/eventlet involved, basically. I would theorize that gevent/eventlet monkeypatching simply adds this new kind of failure mode to the socket, however we dont know here what the trigger for the event is."
The error that gets raised from PyMySQL is "RuntimeError: reentrant call inside" and it has been raised during retrieval of quota limits from the API database (which does nothing with eventlet spawning); this seems to prove that the monkey patching alone causes problems with PyMySQL. As far as we can tell, the only way to solve this problem is to stop eventlet monkey patching any service that accesses the database.
Places where eventlet spawn is used aside from nova-compute (which doesn't directly access the database, so should be OK IIUC) and would need to be replaced with native thread usage:
- nova-api: multi-cell scatter/gather
- nova-conductor: cache_images for glance
- all: oslo.service timer for service up heartbeats using the DB driver (there is a traceback where "RuntimeError: reentrant call inside" was raised during a service update/heartbeat)
- AGREED:
- Start with the scatter/gather in the nova-api, transform it to async
- Deprecate the eventlet entry point from the nova-api (remove it in Y)
- -> remove the monkey patching from nova-api in Y
- specless bp is OK
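- A rough sketch of what the native-thread scatter/gather could look like (purely illustrative, not the agreed implementation; assumes cell objects with a uuid attribute and ignores per-call error handling, which the real code would need):
    import concurrent.futures as futures

    def scatter_gather(cells, fn, timeout):
        # run fn once per cell on native threads instead of eventlet greenthreads
        pool = futures.ThreadPoolExecutor(max_workers=len(cells))
        jobs = {pool.submit(fn, cell): cell for cell in cells}
        done, not_done = futures.wait(jobs, timeout=timeout)
        results = {jobs[job].uuid: job.result() for job in done}
        for job in not_done:
            # unlike eventlet's kill(), a running native thread cannot be
            # cancelled; we just stop waiting and record a sentinel
            results[jobs[job].uuid] = 'did_not_respond'
        # don't block on the laggards; their threads run to completion in the background
        pool.shutdown(wait=False)
        return results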
(kashyapc) A few random items
Cleanups and bug fixes in the libvirt driver
Remove floppy drive support -- https://blueprints.launchpad.net/nova/+spec/remove-support-for-floppy-disks
- AGREED:
- need a way to update existing instances using a floppy disk (similar to the machine_type nova-manage command)
- needs a spec to talk about
- upgrade check
- blocking the usage of deprecated libvirt / qemu features
Remove support for tunnelled live migration (we deprecated it in Wallaby) in favour of QEMU-native TLS
- AGREED:
- => remove the deprecated config options
"Don't use VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE for virDomainSnapshotCreateXML when filesystems are already frozen by virDomainFSFreeze"
Bring CPU modelling APIs rework in the libvirt driver to completion
UEFI (secure) boot: Replace Nova's custom parsing of "firmware descriptor files" with libvirt's firmware auto-selection feature
(brinzhang) Nova supports password encrypted VNC https://review.opendev.org/c/openstack/nova/+/622336
This aims at providing a safer remote console with password authentication to improve the security of opening a server's console, but there are some issues when we implement it. In summary, if we want to support resetting the password for VNC:
- (a) with the get_vnc_console API, it must be a hard reboot
- (b) with the server create API, it's too intrusive to the VM system
- (c) with config-driven API behavior, it looks better than the previous options, but it also doesn't have a good UX for users, the same as giving the warning with exceptions
(gibi): Do I understand correctly from the discussion in the review that the "vencrypt" auth type we support today is overall better than the proposed password auth? What do we lose if we only support "vencrypt"?
A separate issue is whether we should be using barbican for password storage.
- AGREED:
- might be a solution to move the session key from the query string to the header and then the existing ssl support hides the key
- Query string is already encrypted in SSL (host isn't for SNI, but query string is), no need to change anything.
- If the spec is reproposed then we will request new investigations about alternative solutions. (Stephen has ideas)
(stephenfin) Should we deprecate AggregateImagePropertiesIsolation, AggregateInstanceExtraSpecsFilter
These match on image metadata and flavor extra specs respectively. However, they're not necessary since Rocky, when we added '[scheduler] enable_isolated_aggregate_filtering'
The image property version also has bugs that make it almost useless https://review.opendev.org/c/openstack/nova/+/783396
We probably _shouldn't_ remove for a long time, if ever, since doing so would break move operations? It's a good signal to avoid the filter though
- AGREED:
- Deprecate in X
- Don't remove the filter.
- Remove the config options.
(stephenfin) Should we deprecate ImagePropertiesFilter
This checks three attributes: hw_architecture, img_hv_type, hw_vm_mode (only hw_architecture really makes much sense nowadays, and even then...)
This has effectively been replaced by our reporting of compute capabilities as traits and removals of certain features
- AGREED:
- Deprecate it in X.
- Sean ports this to placement prefilter, as part of a generic effort to move filters to prefilters. There will be a generic spec.
- Upgrading from the embedded image property case to the trait
- generic nova-manage CLI to update the embedded flavor
(brinzhang) Re-propose Cyborg suspend/resume support in Xena release
(gibi) Allow re-parenting and un-parenting resource providers via the Placement API
(lyarwood): [libvirt] Deprecating support for the pc machine type and defaulting to the q35 machine type
- We landed coverage of q35 in W within the nova-next job
- We also landed tooling to allow operators of existing envs to move from pc to q35 in W
- Let's go ahead and change the in-code defaults over to q35 during the Xena cycle (the relevant config/image knobs are sketched after the AGREED list below)
- I would like to do this in Xena but Y might be more realistic; perhaps we should convert most of our CI jobs to be q35 based, and if that looks good then yeah, move it in code too
- (kashyap) I think we can wait a cycle more? I'm not sure we've given enough heads-up to the Operators
- We should also land tooling to help operators move existing instances from pc to q35, this includes:
- Switching any IDE disks over to SATA
- $ nova-manage libvirt update-disk-bus $disk $bus ?
- $ nova-manage image-meta update hw_disk_bus sata
- we probably would want to look at changing other defaults, like the cirros video model, to vga or virtio; it's not required for the q35 change but advised.
- virtio-blk vs virtio-scsi as a default (probably won't change this since each has its advantages, but just listing it for completeness)
- (kashyap) I don't think we should touch this, as both are valid in different scenarios: 'virtio-blk' for performance-sensitive workloads, while 'virtio-scsi' for multiple disks (>20) and other use-cases
???
(belmoreira) Changing the default is fine if the old "pc" continues to be supported for several cycles. Currently we are introducing q35 and UEFI with new OS versions (CentOS 8 Stream), meaning that if the user creates a new instance with a recent OS they will get the new "chipset". However, we don't plan to change it for running instances or change the behaviour of existing images. Users expect predictability.
(lyarwood): Yup, agreed and understood; there is however a growing push by QEMU to eventually drop the pc machine type entirely. While it's available in the MIN_QEMU_VERSION the libvirt driver within Nova will continue to support it.
(kashyap) Note, though: *upstream* QEMU itself will keep the 'pc' machine type, there is no real consensus that it will be removed. There were several discussions over the last three years, but no clear conclusions
- AGREED:
- warn operators using "pc" in Xena (maybe an upgrade check too)
- switch the default machine_type to q35 in Y
- persist all guest details requested implicitly or explicitly in the DB
- this is a requirement if we want to allow users to change existing pet VMs in the future, so yes - need to evaluate what these are
- disk buses
- vNIC types
- graphics device
- (basically any hw_{prop} image metadata property
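- For reference, the existing knobs involved in the default switch (option and property names are real; the values are just examples):
    # nova.conf on the compute node: per-arch default machine type
    [libvirt]
    hw_machine_type = x86_64=q35
    # or per-image override
    $ openstack image set --property hw_machine_type=q35 <image>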
(lyarwood): [libvirt] Attaching Manila shares to instances via VirtIOFS
Discussed below: https://etherpad.opendev.org/p/nova-manila-virtio-fs-support-xptg-april-2021
Current plan is to have a WIP spec and PoC written during Xena using NFS as a backend, for both boot with an attached share and later live attach and detach of a share.
Will include a new attach share Nova API, ShareMapping objects and an os-share lib used to mount the underlying share on the compute (will also replace some basic logic carried in Nova for RemoteFs based volumes).
- AGREED:
- Lee to draft a spec. It is not expected to be fully implemented in X.
(artom) Optimizing how we place guest CPU cores on host CPU cores - spiritual continuation of the `socket` pci_numa_affinity_policy
- AGREED:
- do it
- document the algorithm but do not make it contractual
(suzhengwei): Support vm evacuation while server status is suspended, paused, or shelved.
When nova-compute is down, we only support VM evacuation while the server status is active/stopped/error. Why not support the suspended, paused, or shelved statuses?
- shelved I'm not sure makes sense; perhaps shelve, specifically shelve_offloaded, does not benefit from evacuate.
- nova evacuate --target-state=stopped/shelved-offloaded? requires a new microversion if this is added
- yes, I was thinking it was a set of valid targets, so stopped, active; shelve offloaded is slightly different in that it's not on a host at the end.
Note: Masakari instance HA expects a running instance to be made running again
- AGREED:
- Spec it up.
- Agree what will be the state of the instance on the dest node. E.g. today it would be ACTIVE (due to rebuild on the dest), but for a shelved instance that sounds like a bad status transition (due to billing).
- Shelved -> Shelved or Shelved_offloaded ?
- Paused -> Stopped
- Suspended -> Stopped
- needs a microversion
(dmitriis): Transport Node bp/spec RFC
- https://review.opendev.org/c/openstack/nova-specs/+/787458
- https://blueprints.launchpad.net/nova/+spec/introduce-transport-nodes
- aims to address hostname mismatches introduced by off-path SmartNIC/DPUs while reusing the code-paths used for the hardware offload functionality (https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html);
- need some coordination around the plans above to move pci device tracking into the Placement service.
- (sean-k-mooney): we spoke about this at some length on the nova IRC channel today.
- http://eavesdrop.openstack.org/irclogs/%23openstack-nova/latest.log.html#t2021-04-22T11:40:10
- there is a lot to digest but my summary of the topic is as follows.
- vendors would like to move the control plane for the hardware offloaded OVS backend from the host to the SmartNIC, to save CPU cycles and create a security boundary between the hypervisor and the NIC
- these smart NICs will run Linux, OVS and the control plane software (ovn southd / ovn-controller or neutron agents)
- since the control plane is running on the SmartNIC under Linux, the hostname of the server where the VM is and the hostname seen by the OVS controller processes will be different
- as the SmartNIC will be managed by 2 different operating systems (host and SmartNIC), and since the VFs may or may not exist on both and may have different addresses if they do, we need some way to correlate the VF selected by nova with the identifier used in ovn/neutron to manage it in OVS. (more specifically, there needs to be a way for Neutron & OVN to pick a representor port corresponding to a VF selected by Nova at the hypervisor host)
- the spec makes a number of proposals for how to do this; I have not really read it fully enough to summarise the spec, but it involves adding DB tables to nova to track the device correlation info and modifications to placement to model the device in a way that could be used for both scheduling and storing the info we need.
- I have a counter proposal which is also not fully thought out, but at a high level it is this:
- no DB schema change
- add the extra identifiers required to do the correlation to the extra_info column of the pci_devices table
- logical vf number
- logical switch id (if available, an eswitch is not exposed to hypervisor host PFs)
- pci vpd serial number
- add this additional info to the binding profile so that the neutron backend can consume it
- extend the OVN ML2 driver to look up the chassis that manages the device using the new info
- extend the OVN ML2 driver to pass the TCP connection info for the OVS DB of that chassis in the port binding details.
- extend the os-vif object to store that connection string and extend the OVS plugin to use it instead of the nova config value when present.
- note(dmitriis): Relying on Nova's use of os-vif to also do representor plugging is problematic because it does not have visibility of things seen at the SmartNIC host (see this portion of code that expects representors and VFs to be visible via sysfs on the same host https://opendev.org/openstack/os-vif/src/tag/2.4.0/vif_plug_ovs/linux_net.py#L236-L296). I would instead let Nova do plugging of a VF into an instance and let OVN handle representor plugging (besides with flow programming). WIP work to add that to OVN: https://github.com/fnordahl/ovn/commit/569709dcec423b7685b7fb42b1fcba3ab2402873
- note(dmitriis): Another problem is that by storing a connection string we imply that the hypervisor host needs to be able to have IP connectivity to the SmartNIC host and have credentials to access the remote OVS. This goes against keeping a security boundary between the hypervisor and the SmartNIC.
- extend nova to populate the os-vif object with the info from the port binding details.
- (sean-k-mooney) I think the above would work but there may be gaps
- separately from this, we can also track PCI devices in placement without needing to consider what network backend is in use and whether the network control plane is on the same host as nova-compute or on a SmartNIC.
- AGREED:
- continue the spec review
(sean-k-mooney): performance impact of security groups in the server detail response.
- https://bugs.launchpad.net/nova/+bug/1923560
- today we include the list of security groups in the server detail response.
- this made sense when nova had its own security groups for nova-network.
- I think we should re-evaluate the existence of security groups in all nova APIs and remove them.
- if people want the security groups they can look them up from neutron directly instead of using nova as a proxy API.
- alternatively we could cache them in the network info cache, but that is changing the response of the server detail to be a cached response and not the current state. (An abridged view of the field in question follows after the AGREED list below.)
- proposal:
- deprecate the add and remove security group APIs.
- remove security groups from the server detail response in a new microversion.
- remove the os-security-groups API (it was deprecated in 2.36; now that nova-net is gone I think we should remove it)
- remove the os-security-group-rules API (also deprecated in 2.36)
- alternative approach
- open questions
- should we keep security_groups in server create?
- it is meant to set the security group used for all ports created by nova when we pass a network.
- I'm not sure if it actually does; I remember there was some confusion on that fact and whether it only ever worked that way for nova-net.
- we know that this does not apply to pre-created ports that are passed on the command line, which is also a source of confusion to some.
- (stephenfin) I looked at doing this in the past but amotoki told me it still works, apparently
- AGREED:
- remove the security group from the GET servers/details (PUT, Rebuild API response also) in a new microversion
- use the network info cache to speed up the query on old microversions
- note in the reno that it is now cached data
- double check for any other proxy we do during server details +1
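- For context, the field being discussed in the GET /servers/detail payload (abridged and from memory of the response shape; only security_groups matters here):
    {
        "servers": [{
            "id": "...",
            "status": "ACTIVE",
            "security_groups": [{"name": "default"}],
            ...
        }]
    }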
(sean-k-mooney): deprecate flavor based qos and add qos api?
- Note: we probably won't do this in Xena, but I want to start the conversation around deprecating the network tunings.
- Nova currently supports a number of QoS style flavor extra specs for tuning cpu, disk, ram and network parameters (https://docs.openstack.org/nova/latest/user/flavors.html#extra-specs)
- see the cpu limits, memory limits, disk tuning and bandwidth I/O sections. (A few concrete examples follow after the AGREED list at the end of this topic.)
- some of these have been superseded by QoS in other projects in some or all cases (network qos should be done with neutron, disk qos can be done for cinder volumes in cinder).
- other parameters cannot be specified in any other way than the flavor (disk qos for local block devices, memory and cpu tuning).
- should we deprecate and remove the network qos entirely now that nova net is gone?
- note these were implemented by libvirt directly using TC rules in the kernel
- this obviously only works if the neutron network backend uses standard kernel interfaces (e.g. it does not work for dpdk or sriov VFs)
- it also only works if the tap device is not directly added to ovs. OVS with the iptables firewall works fine, but ovn, odl and ovs with the ovs firewall or noop firewall won't work.
- that means this only works with linux bridge or ovs with the iptables firewall.
- should we deprecate and remove disk tuning in favor of cinder qos? this creates a gap for local disks.
- for the others, should we deprecate and remove the memory and cpu limits since they are not widely supported outside of the vmware driver?
- if we modeled QoS policies as a top level resource in the nova API they could still be defined by admins and selected by tenants.
- we could support setting a default QoS policy in the flavor which could be overridden by the user, e.g. qos:default_policy=<silver>
- we could also support a required QoS policy that cannot be overridden, e.g. qos:required_policy
- open questions:
- can we remove the networking QoS parameters without providing any replacement? Maybe?
- can we do the same for disk? maybe not, since it would be a feature regression.
- do we care to provide QoS APIs if they are rarely used or if the existing flavor version works?
- if we did this, should qos policies be immutable once created, or mutable?
- immutable:
- policies on existing vms would be altered only by selecting another policy
- this could hard reboot or live update the vm to have it apply immediately.
- mutable:
- we could create an embedded copy of the policy, like we do with flavors and images, so that policy updates are only applied on an API action
- we could not create a copy so that policy updates take effect on the next hard reboot.
- (brinzhang) This seems to be similar to our idea of integrating flavor_extra_specs in the Ussuri version, https://review.opendev.org/c/openstack/nova-specs/+/663563/14/specs/ussuri/approved/composable-flavor-properties.rst
- not really; it's not meant to enable composability of arbitrary extra specs. Most other services that have qos treat it as a separate top level API.
- AGREED:
- keep it for vmware driver
- deprecate it in the docs for the libvirt driver
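- For reference, a few of the existing flavor extra specs being discussed (keys are the documented quota:* extra specs; the values are just examples):
    # network tuning applied by libvirt via TC (the part proposed for deprecation)
    $ openstack flavor set $FLAVOR --property quota:vif_outbound_average=10240
    # local disk tuning
    $ openstack flavor set $FLAVOR --property quota:disk_read_bytes_sec=10485760
    # cpu limits
    $ openstack flavor set $FLAVOR --property quota:cpu_quota=10000 --property quota:cpu_period=20000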
<add your topics here>