From kamil.madac at slovenskoit.sk Sun Jan 2 15:51:45 2022 From: kamil.madac at slovenskoit.sk (=?utf-8?B?S2FtaWwgTWFkw6HEjQ==?=) Date: Sun, 2 Jan 2022 15:51:45 +0000 Subject: [neutron] Dadfailed of ipv6 metadata IP in qdhcp namespace and disappearing dhcp namespaces Message-ID: Hello, In our small cloud environment, we started to see weird behavior during last 2 months. Dhcp namespaces started to disappear randomly, which caused that VMs losed connectivity once dhcp lease expired. After the investigation I found out following issue/bug: 1. ipv6 metadata address of tap interface in some qdhcp-xxxx namespaces are stucked in "dadfailed tentative" state (i do not know why yet) 2. 3. root at cloud01:~# ip netns exec qdhcp-3094b264-829b-4381-9ca2-59b3a3fc1ea1 ip a 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2585: tap1797d9b1-e1: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether fa:16:3e:77:64:0d brd ff:ff:ff:ff:ff:ff inet 169.254.169.254/32 brd 169.254.169.254 scope global tap1797d9b1-e1 valid_lft forever preferred_lft forever inet 192.168.0.2/24 brd 192.168.0.255 scope global tap1797d9b1-e1 valid_lft forever preferred_lft forever inet6 fe80::a9fe:a9fe/64 scope link dadfailed tentative valid_lft forever preferred_lft forever inet6 fe80::f816:3eff:fe77:640d/64 scope link valid_lft forever preferred_lft forever 4. 5. This blocked dhcp agent to finish sync_state function, and NetworkCache was not updated with subnets of such neutron network 6. During creation of VM assigned to such network, agent does not detect any subnets (see point 2), so he thinks (reload_allocations()) there is no dhcp needed and deletes qdhcp-xxxx namespace, so no DHCP and no Metadata are working on such network since that moment, and after 24h we see connectivity issues. 7. Restart of DHCP agent recreates missing qdhcp-xxxx namespaces, but NetworkCache in dhcp agent is again empty, so creation of VM deletes the qdhcp-xxxx namespace again ? Workaround is to remove dhcp agent from that network and add it again. Interestingly, sometimes I need to do it multiple times, because in few cases tap interface in new qdhcp finishes again in dadfailed tentative state. After year in production we have 20 networks out of 60 in such state. We are using kolla-ansible deployment on Ubuntu 20.04, kernel 5.4.0-65-generic. Openstack version Victoria and neutron is in version 17.2.2.dev70. Is that bug in neutron, or is it misconfiguration of OS on our side? I'm locally testing patch which disables ipv6 dad in qdhcp-xxxx namespace (net.ipv6.conf.default.accept_dad = 1), but I'm not sure it is good solution when it comes to other neutron features? Kamil Mad?? Slovensko IT a.s. -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Sun Jan 2 23:23:30 2022 From: amy at demarco.com (Amy Marrich) Date: Sun, 2 Jan 2022 17:23:30 -0600 Subject: [Diversity] D&I WG Meeting Cancellation and Poll Reminder Message-ID: Just a reminder that tomorrow's meeting is canceled and to participate in the Meeting Time and IRC vs Video Poll! https://doodle.com/poll/mnzxkauzwf2isssm?utm_source=poll&utm_medium=link Thanks, Amy (spotz) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From haleyb.dev at gmail.com Mon Jan 3 01:35:26 2022 From: haleyb.dev at gmail.com (Brian Haley) Date: Sun, 2 Jan 2022 20:35:26 -0500 Subject: [neutron] Dadfailed of ipv6 metadata IP in qdhcp namespace and disappearing dhcp namespaces In-Reply-To: References: Message-ID: Hi, On 1/2/22 10:51 AM, Kamil Mad?? wrote: > Hello, > > In our small cloud environment, we started to see weird behavior during > last 2 months. Dhcp namespaces started to disappear randomly, which > caused that VMs losed connectivity once dhcp lease expired. > After the investigation I found out following issue/bug: > > 1. ipv6 metadata address of tap interface in some qdhcp-xxxx namespaces > are stucked in "dadfailed tentative" state (i do not know why yet) This issue was reported about a month ago: https://bugs.launchpad.net/neutron/+bug/1953165 And Bence marked it a duplicate of: https://bugs.launchpad.net/neutron/+bug/1930414 Seems to be a bug in a flow based on the title - "Traffic leaked from dhcp port before vlan tag is applied". I would follow-up in that second bug. Thanks, -Brian > 3. root at cloud01:~# ip netns exec > qdhcp-3094b264-829b-4381-9ca2-59b3a3fc1ea1 ip a > 1: lo: mtu 65536 qdisc noqueue state UNKNOWN > group default qlen 1000 > ? ? link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > ? ? inet 127.0.0.1/8 scope host lo > ? ? ? ?valid_lft forever preferred_lft forever > ? ? inet6 ::1/128 scope host > ? ? ? ?valid_lft forever preferred_lft forever > 2585: tap1797d9b1-e1: mtu 1500 > qdisc noqueue state UNKNOWN group default qlen 1000 > ? ? link/ether fa:16:3e:77:64:0d brd ff:ff:ff:ff:ff:ff > ? ? inet 169.254.169.254/32 brd 169.254.169.254 scope global > tap1797d9b1-e1 > ? ? ? ?valid_lft forever preferred_lft forever > ? ? inet 192.168.0.2/24 brd 192.168.0.255 scope global tap1797d9b1-e1 > ? ? ? ?valid_lft forever preferred_lft forever > ? ? inet6 fe80::a9fe:a9fe/64 scope link dadfailed tentative > ? ? ? ?valid_lft forever preferred_lft forever > ? ? inet6 fe80::f816:3eff:fe77:640d/64 scope link > ? ? ? ?valid_lft forever preferred_lft forever > 4. > > 5. This blocked dhcp agent to finish sync_state function, and > NetworkCache was not updated with subnets of such neutron network > 6. During creation of VM assigned to such network, agent does not > detect any subnets (see point 2), so he thinks > (reload_allocations()) there is no dhcp needed and deletes > qdhcp-xxxx namespace, so no DHCP and no Metadata are working on such > network since that moment, and after 24h we see connectivity issues. > 7. Restart of DHCP agent recreates missing qdhcp-xxxx namespaces, but > NetworkCache? in dhcp agent is again empty, so creation of VM > deletes the qdhcp-xxxx namespace again ? > > Workaround is to remove dhcp agent from that network and add it again. > Interestingly, sometimes I need to do it multiple times, because in few > cases tap interface in new qdhcp finishes again in dadfailed tentative > state. After year in production we have 20 networks out of 60 in such state. > > We are using kolla-ansible deployment on Ubuntu 20.04, kernel > 5.4.0-65-generic. Openstack version Victoria and neutron is in version > 17.2.2.dev70. > > > Is that bug in neutron, or is it misconfiguration of OS on our side? > > I'm locally testing patch which disables ipv6 dad in qdhcp-xxxx > namespace (net.ipv6.conf.default.accept_dad = 1), but I'm not sure it is > good solution when it comes to other neutron features? > > > Kamil Mad?? 
> /Slovensko IT a.s./ > From gthiemonge at redhat.com Mon Jan 3 07:01:40 2022 From: gthiemonge at redhat.com (Gregory Thiemonge) Date: Mon, 3 Jan 2022 08:01:40 +0100 Subject: [octavia] Management Network Issue In-Reply-To: References: Message-ID: Hi Ammad, Do you have a reproducer for this issue (CLI or API calls)? We recently fixed a similar bug when using Octavia availability zones: https://review.opendev.org/c/openstack/octavia/+/812798 The fix is included in 9.0.1 Greg On Tue, Dec 28, 2021 at 1:58 PM Ammad Syed wrote: > Hi, > > I am using octavia 9.0. I have created a neutron vlan based management > network of octavia. I am creating a loadbalancer with a subnet in the > tenant network, its getting created successfully. Then added a listener and > created a pool, both created successfully. > > Now a weird situation is happening. Now when I add a member in the pool, > the management network interface get detached automatically from the > AMPHORA instance and amphora keeps in PENDING_UPDATE state. > > I have created a dedicated service-project for octavia and management > network and subnet resides there. > > Any advise how to fix it ? > > Ammad > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kamil.madac at slovenskoit.sk Mon Jan 3 08:58:42 2022 From: kamil.madac at slovenskoit.sk (=?utf-8?B?S2FtaWwgTWFkw6HEjQ==?=) Date: Mon, 3 Jan 2022 08:58:42 +0000 Subject: [neutron] Dadfailed of ipv6 metadata IP in qdhcp namespace and disappearing dhcp namespaces In-Reply-To: References: Message-ID: Hi Brian, thank you very much for pointing to those bugs. It is exactly what we are experiencing in our deployment. I will follow-up in those bugs then. Kamil ________________________________ From: Brian Haley Sent: Monday, January 3, 2022 2:35 AM To: Kamil Mad?? ; openstack-discuss Subject: Re: [neutron] Dadfailed of ipv6 metadata IP in qdhcp namespace and disappearing dhcp namespaces Hi, On 1/2/22 10:51 AM, Kamil Mad?? wrote: > Hello, > > In our small cloud environment, we started to see weird behavior during > last 2 months. Dhcp namespaces started to disappear randomly, which > caused that VMs losed connectivity once dhcp lease expired. > After the investigation I found out following issue/bug: > > 1. ipv6 metadata address of tap interface in some qdhcp-xxxx namespaces > are stucked in "dadfailed tentative" state (i do not know why yet) This issue was reported about a month ago: https://bugs.launchpad.net/neutron/+bug/1953165 And Bence marked it a duplicate of: https://bugs.launchpad.net/neutron/+bug/1930414 Seems to be a bug in a flow based on the title - "Traffic leaked from dhcp port before vlan tag is applied". I would follow-up in that second bug. Thanks, -Brian > 3. 
root at cloud01:~# ip netns exec > qdhcp-3094b264-829b-4381-9ca2-59b3a3fc1ea1 ip a > 1: lo: mtu 65536 qdisc noqueue state UNKNOWN > group default qlen 1000 > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > inet 127.0.0.1/8 scope host lo > valid_lft forever preferred_lft forever > inet6 ::1/128 scope host > valid_lft forever preferred_lft forever > 2585: tap1797d9b1-e1: mtu 1500 > qdisc noqueue state UNKNOWN group default qlen 1000 > link/ether fa:16:3e:77:64:0d brd ff:ff:ff:ff:ff:ff > inet 169.254.169.254/32 brd 169.254.169.254 scope global > tap1797d9b1-e1 > valid_lft forever preferred_lft forever > inet 192.168.0.2/24 brd 192.168.0.255 scope global tap1797d9b1-e1 > valid_lft forever preferred_lft forever > inet6 fe80::a9fe:a9fe/64 scope link dadfailed tentative > valid_lft forever preferred_lft forever > inet6 fe80::f816:3eff:fe77:640d/64 scope link > valid_lft forever preferred_lft forever > 4. > > 5. This blocked dhcp agent to finish sync_state function, and > NetworkCache was not updated with subnets of such neutron network > 6. During creation of VM assigned to such network, agent does not > detect any subnets (see point 2), so he thinks > (reload_allocations()) there is no dhcp needed and deletes > qdhcp-xxxx namespace, so no DHCP and no Metadata are working on such > network since that moment, and after 24h we see connectivity issues. > 7. Restart of DHCP agent recreates missing qdhcp-xxxx namespaces, but > NetworkCache in dhcp agent is again empty, so creation of VM > deletes the qdhcp-xxxx namespace again ? > > Workaround is to remove dhcp agent from that network and add it again. > Interestingly, sometimes I need to do it multiple times, because in few > cases tap interface in new qdhcp finishes again in dadfailed tentative > state. After year in production we have 20 networks out of 60 in such state. > > We are using kolla-ansible deployment on Ubuntu 20.04, kernel > 5.4.0-65-generic. Openstack version Victoria and neutron is in version > 17.2.2.dev70. > > > Is that bug in neutron, or is it misconfiguration of OS on our side? > > I'm locally testing patch which disables ipv6 dad in qdhcp-xxxx > namespace (net.ipv6.conf.default.accept_dad = 1), but I'm not sure it is > good solution when it comes to other neutron features? > > > Kamil Mad?? > /Slovensko IT a.s./ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From syedammad83 at gmail.com Mon Jan 3 09:34:55 2022 From: syedammad83 at gmail.com (Ammad Syed) Date: Mon, 3 Jan 2022 14:34:55 +0500 Subject: [octavia] Management Network Issue In-Reply-To: References: Message-ID: This was the issue that I ran into. I have defined management_network in az. Now it's working fine. The other issue I am having is I am unable to delete az and its profile while using percona xtradb cluster with pxc_strict_mode enforcing. openstack loadbalancer availabilityzone delete abc24 (1105, "Percona-XtraDB-Cluster doesn't recommend using SERIALIZABLE isolation with pxc_strict_mode = ENFORCING") (HTTP 500) (Request-ID: req-c40c4902-110d-4b11-8cf8-6542d17768ba) Ammad On Mon, Jan 3, 2022 at 12:02 PM Gregory Thiemonge wrote: > Hi Ammad, > > Do you have a reproducer for this issue (CLI or API calls)? > > We recently fixed a similar bug when using Octavia availability zones: > https://review.opendev.org/c/openstack/octavia/+/812798 > The fix is included in 9.0.1 > > Greg > > > On Tue, Dec 28, 2021 at 1:58 PM Ammad Syed wrote: > >> Hi, >> >> I am using octavia 9.0. 
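For reference, a minimal sketch of the availability-zone setup that appears to resolve this kind of problem. The profile name, zone name and network UUID below are placeholders, not values from this deployment:

  openstack loadbalancer availabilityzoneprofile create \
    --name az1-profile --provider amphora \
    --availability-zone-data '{"compute_zone": "az1", "management_network": "<octavia-mgmt-net-uuid>"}'

  openstack loadbalancer availabilityzone create \
    --name az1 --availabilityzoneprofile az1-profile

With management_network set in the zone metadata, the amphora driver should plug the lb-mgmt port for that zone from this network instead of falling back to the global setting in octavia.conf.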
I have created a neutron vlan based management >> network of octavia. I am creating a loadbalancer with a subnet in the >> tenant network, its getting created successfully. Then added a listener and >> created a pool, both created successfully. >> >> Now a weird situation is happening. Now when I add a member in the pool, >> the management network interface get detached automatically from the >> AMPHORA instance and amphora keeps in PENDING_UPDATE state. >> >> I have created a dedicated service-project for octavia and management >> network and subnet resides there. >> >> Any advise how to fix it ? >> >> Ammad >> > -- Regards, Syed Ammad Ali -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Mon Jan 3 09:58:22 2022 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 3 Jan 2022 10:58:22 +0100 Subject: [largescale-sig] Next meeting: Jan 5th, 15utc Message-ID: <15b297bf-465b-dea2-51e1-ba570aea293c@openstack.org> Hi everyone, and happy new year! To kick off 2022, the Large Scale SIG will be meeting this Wednesday in #openstack-operators on OFTC IRC, at 15UTC. You can doublecheck how that time translates locally at: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20220105T15 We will be discussing our upcoming activities and our next OpenInfra Live episode, as well as any topic you may add to our agenda at: https://etherpad.openstack.org/p/large-scale-sig-meeting Regards, -- Thierry Carrez From mkopec at redhat.com Mon Jan 3 10:22:04 2022 From: mkopec at redhat.com (Martin Kopec) Date: Mon, 3 Jan 2022 11:22:04 +0100 Subject: [all][qa][triplo] Help need to maintain OpenStack health dashboard In-Reply-To: References: <17d76bee71c.cfa2c4c1178950.6715520613968754830@ghanshyammann.com> Message-ID: A little late but here is what we have concluded in the call: * TripleO's dashboard is not as similar to the openstack-health's one as we initially thought, it lacks the most important functionality for us which shows a pass/fail rate of a particular test * TripleO team would be interested in having test pass/fail rate functionality * In order to make openstack-health working again: ** openstack/openstack-health's codebase needs to be updated ** operational assistance to deploy the service as well as the subunit2sql processing pipeline and the subunit2sql database backend is required (see Clark's email within this thread) ** we need a point of contact for regular maintenance of the service, infra issues and etc * If a volunteer who would work on the mentioned action items is found, infra team can provide needed resources and guidance - remember that the main reason the tool is being shutdown is only the lack of a maintainer * In the meantime QA team will start retirement process of the openstack-health repository If you would like to get included in this and provide any help, feel free to reach out to us. Thanks, On Thu, 9 Dec 2021 at 17:23, Martin Kopec wrote: > We're gonna have a call to discuss collaboration between qa and tripleo > and the next steps regarding openstack-health. 
> > The call details: > next Thursday December 16th 1630 UTC > Video call link: https://meet.google.com/cmr-yzaq-twp > > Feel free to join, > > On Wed, 1 Dec 2021 at 17:52, Ronelle Landy wrote: > >> >> >> On Wed, Dec 1, 2021 at 11:11 AM Ghanshyam Mann >> wrote: >> >>> Hello Everyone, >>> >>> In the QA meeting[1], we discussed the help needed to maintain the >>> OpenStack health dashboard which is >>> broken currently and the QA team does not have the JS developer and >>> bandwidth to fix it. While discussing >>> it meeting, we found that the TripleO team might be working on a similar >>> dashboard for their CI results. If that >>> is the case, can we collaborate on maintaining the existing dashboard? >>> >> >> We had a discussion with Martin Kopec about collaboration with QA in >> maintaining a TripleO focused dashboard. >> We have two dashboards in progress at the moment - one focused on jobs >> running in https://review.rdoproject.org/zuul/status - >> http://ci-health-rdo.tripleo.org/ and one where we are starting to >> track failures in select jobs running on https://zuul.openstack.org/ - >> http://ci-health.tripleo.org/ >> >> If you would like to collaborate on this work, please ping us on #oooq on >> (OFTC) join our community call on Tuesdays at 1:30pm UTC and we can discuss >> further. >> >> Thanks! >> >>> >>> OpenStack health dashboard: >>> https://opendev.org/openstack/openstack-health >>> Repo: http://status.openstack.org/openstack-health/#/ >>> >>> [1] >>> https://meetings.opendev.org/meetings/qa/2021/qa.2021-11-16-15.00.log.html#l-71 >>> >>> -gmann >>> >>> > > -- > Martin Kopec > Senior Software Quality Engineer > Red Hat EMEA > > > > -- Martin Kopec Senior Software Quality Engineer Red Hat EMEA -------------- next part -------------- An HTML attachment was scrubbed... URL: From arxcruz at redhat.com Mon Jan 3 10:32:53 2022 From: arxcruz at redhat.com (Arx Cruz) Date: Mon, 3 Jan 2022 11:32:53 +0100 Subject: [all][qa][triplo] Help need to maintain OpenStack health dashboard In-Reply-To: References: <17d76bee71c.cfa2c4c1178950.6715520613968754830@ghanshyammann.com> Message-ID: Sorry for the late reply, I was on vacation and wasn't able to participate in the meeting. I'm interested to maintain the subunit2sql and help with maintenance of openstack-health. Kind regards, Arx Cruz On Mon, Jan 3, 2022 at 11:25 AM Martin Kopec wrote: > A little late but here is what we have concluded in the call: > * TripleO's dashboard is not as similar to the openstack-health's one as > we initially thought, it lacks the most important functionality for us > which shows a pass/fail rate of a particular test > * TripleO team would be interested in having test pass/fail rate > functionality > * In order to make openstack-health working again: > ** openstack/openstack-health's codebase needs to be updated > ** operational assistance to deploy the service as well as the subunit2sql > processing pipeline and the subunit2sql database backend is required (see > Clark's email within this thread) > ** we need a point of contact for regular maintenance of the service, > infra issues and etc > * If a volunteer who would work on the mentioned action items is found, > infra team can provide needed resources and guidance - remember that the > main reason the tool is being shutdown is only the lack of a maintainer > * In the meantime QA team will start retirement process of the > openstack-health repository > > If you would like to get included in this and provide any help, feel free > to reach out to us. 
> > Thanks, > > On Thu, 9 Dec 2021 at 17:23, Martin Kopec wrote: > >> We're gonna have a call to discuss collaboration between qa and tripleo >> and the next steps regarding openstack-health. >> >> The call details: >> next Thursday December 16th 1630 UTC >> Video call link: https://meet.google.com/cmr-yzaq-twp >> >> Feel free to join, >> >> On Wed, 1 Dec 2021 at 17:52, Ronelle Landy wrote: >> >>> >>> >>> On Wed, Dec 1, 2021 at 11:11 AM Ghanshyam Mann >>> wrote: >>> >>>> Hello Everyone, >>>> >>>> In the QA meeting[1], we discussed the help needed to maintain the >>>> OpenStack health dashboard which is >>>> broken currently and the QA team does not have the JS developer and >>>> bandwidth to fix it. While discussing >>>> it meeting, we found that the TripleO team might be working on a >>>> similar dashboard for their CI results. If that >>>> is the case, can we collaborate on maintaining the existing dashboard? >>>> >>> >>> We had a discussion with Martin Kopec about collaboration with QA in >>> maintaining a TripleO focused dashboard. >>> We have two dashboards in progress at the moment - one focused on jobs >>> running in https://review.rdoproject.org/zuul/status - >>> http://ci-health-rdo.tripleo.org/ and one where we are starting to >>> track failures in select jobs running on https://zuul.openstack.org/ - >>> http://ci-health.tripleo.org/ >>> >>> If you would like to collaborate on this work, please ping us on #oooq >>> on (OFTC) join our community call on Tuesdays at 1:30pm UTC and we can >>> discuss further. >>> >>> Thanks! >>> >>>> >>>> OpenStack health dashboard: >>>> https://opendev.org/openstack/openstack-health >>>> Repo: http://status.openstack.org/openstack-health/#/ >>>> >>>> [1] >>>> https://meetings.opendev.org/meetings/qa/2021/qa.2021-11-16-15.00.log.html#l-71 >>>> >>>> -gmann >>>> >>>> >> >> -- >> Martin Kopec >> Senior Software Quality Engineer >> Red Hat EMEA >> >> >> >> > > -- > Martin Kopec > Senior Software Quality Engineer > Red Hat EMEA > > > > -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Mon Jan 3 10:35:08 2022 From: mkopec at redhat.com (Martin Kopec) Date: Mon, 3 Jan 2022 11:35:08 +0100 Subject: [tempest] Extending tempest for mixed-architecture stacks In-Reply-To: <8f8768fa-3097-6bc2-39ff-f87eb6f91ba1@redhat.com> References: <8f8768fa-3097-6bc2-39ff-f87eb6f91ba1@redhat.com> Message-ID: Hi James, thank you for the suggestion. During Wallaby PTG we have considered having different images (than just cirros ones), see "Use different guest image for gate jobs to run tempest tests" topic in [1], although it wasn't pursued at the end (we've had more pressing topics to deal with) and the action item got closed in Xena cycle [2]. I think we could start by creating a new option which would allow us to skip the failing tests on a different architecture. If we had at least an experimental job in the gates, which would run a different architecture, we could add a new test exercising that as you suggested. Then let's see where that gets us. [1] https://etherpad.opendev.org/p/qa-wallaby-ptg [2] https://etherpad.opendev.org/p/qa-xena-priority Regards, On Mon, 13 Dec 2021 at 22:49, James LaBarre wrote: > Recently I had been running Tempest on my setup testing a > mixed-architecture deployment (x86_64 ans ppc64le compute nodes at the same > time). 
It seems that some of the migration and affinity tests will check > if there's more than one Compute node before they run. However, it would > seem that's as far as they check, without checking if they are in fact > compatible or even of the same architecture. (my test cluster is very > small, and normally includes two ppc64le Compute nodes, and sometimes one > x86_64 Compute node. Currently one ppc64le machine is down for repair). > > Because the two compute nodes are different architectures, I am getting > failures in various migration and affinity tests, maybe more if I tested a > larger subset. Now granted my particular setup is a special case, but it > does bring to mind some extensions that may be needed for Tempest in the > future. I could see it being possible to have x86_64 and ARM mixed > together in one stack, maybe even tossing in RISC-v someday. > > I'm thinking we need to start adding in extra test images, flavors, etc > into the Tempest configurations (as in defining multiple options so that > each architecture can have test images, etc assigned to it, rather than the > current primary and alt image for just one architecture) Additionally, > there should be testcases taking into account the architectures involved > (such as seeing that an instance on one arch cannot be migrated to the > other, as an example). I know this involves a bit of refactoring, I didn't > know if it had even been considered yet. > > > -- > > James LaBarre > > Software Engineer, OpenStack MultiArch > > Red Hat > > jlabarre at redhat.com > > -- Martin Kopec Senior Software Quality Engineer Red Hat EMEA -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Mon Jan 3 16:02:14 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 3 Jan 2022 16:02:14 +0000 Subject: [security-sig] Log4j vulnerabilities and OpenStack Message-ID: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> Unless you were living under a rock during most of December, you've almost certainly seen the press surrounding the various security vulnerabilities discovered in the Apache Log4j Java library. As everyone reading this list hopefully knows, OpenStack is primarily written in Python, so has little use for Java libraries in the first place, but that hasn't stopped users from asking our VMT members if OpenStack is affected. While OpenStack doesn't require any Java components, I'm aware of one Neutron driver (networking-odl) which relies on an affected third-party service: https://access.redhat.com/solutions/6586821 Additionally, "storm" component of SUSE OpenStack seems to be impacted: https://www.suse.com/c/suse-statement-on-log4j-log4shell-cve-2021-44228-vulnerability/ As does an Elasticsearch component in Sovereign Cloud Stack: https://scs.community/security/2021/12/13/advisory-log4j/ Users should, obviously, rely on their distribution vendors/suppliers to notify them of available updates for these. Is anyone aware of other, similar situations where OpenStack is commonly installed alongside Java software using Log4j in vulnerable ways? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From dms at danplanet.com Mon Jan 3 17:12:28 2022 From: dms at danplanet.com (Dan Smith) Date: Mon, 03 Jan 2022 09:12:28 -0800 Subject: [tc][all] Follow up pain point discussion In-Reply-To: <17dc3bdef3c.c0f9d0e042778.4126473039134888292@ghanshyammann.com> (Ghanshyam Mann's message of "Thu, 16 Dec 2021 08:55:44 -0600") References: <17da028f2c6.f53fb4e822720.4072888890953590866@ghanshyammann.com> <7285a21e-2d2a-1c4f-d3ba-d74f65880ea6@debian.org> <17dc3bdef3c.c0f9d0e042778.4126473039134888292@ghanshyammann.com> Message-ID: > > Last time I checked in Victoria, Glance was broken when we had: > > > > Transfer-Encoding: chucked > > > > Has this been fixed? If yes, can someone point at the patches? > > Import workflow fixes I mentioned above were in wallaby, I am not sure about victoria. > Maybe it is better to check on or after wallaby and see if there is still issues or glance team > can give more insights (Dan is on PTO until Jan 2nd, he is the one who made it work on uwsgi). Yeah, as gmann noted, we're gating on uwsgi and actively using it during the setup and test phase. I'm not aware of any bugs, so if you have one, point me to it (and/or file it) and I'll take a look. --Dan From gmann at ghanshyammann.com Mon Jan 3 17:16:08 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 03 Jan 2022 11:16:08 -0600 Subject: [tempest] Extending tempest for mixed-architecture stacks In-Reply-To: References: <8f8768fa-3097-6bc2-39ff-f87eb6f91ba1@redhat.com> Message-ID: <17e20f0f241.bb1803f121114.47658658821095375@ghanshyammann.com> ---- On Mon, 03 Jan 2022 04:35:08 -0600 Martin Kopec wrote ---- > Hi James, > thank you for the suggestion. During Wallaby PTG we have considered having different images (than just cirros ones), see "Use different guest image for gate jobs to run tempest tests" topic in [1], although it wasn't pursued at the end (we've had more pressing topics to deal with) and the action item got closed in Xena cycle [2]. > I think we could start by creating a new option which would allow us to skip the failing tests on a different architecture. If we had at least an experimental job in the gates, which would run a different architecture, we could add a new test exercising that as you suggested. Then let's see where that gets us. We can add more image tests in CI with separate jobs which will be straight forward to configure in zuul job (how many tests fails is another things to see). But skipping tests, I am not sure. What is the actual operation result for such arch? does real operation fails? or there is a different way to perform those operation in those arch than what test is doing? If they are failing in real world then test failing is valid things and exclude such tests while running will be right way instead of skipping the test. -gmann > > [1] https://etherpad.opendev.org/p/qa-wallaby-ptg[2] https://etherpad.opendev.org/p/qa-xena-priority > > Regards, > > On Mon, 13 Dec 2021 at 22:49, James LaBarre wrote: > Recently I had been running Tempest on my setup testing a mixed-architecture deployment (x86_64 ans ppc64le compute nodes at the same time). It seems that some of the migration and affinity tests will check if there's more than one Compute node before they run. However, it would seem that's as far as they check, without checking if they are in fact compatible or even of the same architecture. 
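For the "exclude rather than skip" approach suggested above, tempest's run command already supports an exclusion file, so a mixed-architecture job could carry its own list without changing the tests themselves. A minimal sketch, where the file name and test patterns are only illustrative and would need to match whatever actually fails on the second architecture:

  # mixed-arch-exclude.txt
  tempest.api.compute.admin.test_live_migration
  tempest.scenario.test_server_multinode

  tempest run --regex '^tempest\.' --exclude-list mixed-arch-exclude.txt

tempest run also accepts --exclude-regex if maintaining a separate file is not convenient.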
(my test cluster is very small, and normally includes two ppc64le Compute nodes, and sometimes one x86_64 Compute node. Currently one ppc64le machine is down for repair). > Because the two compute nodes are different architectures, I am getting failures in various migration and affinity tests, maybe more if I tested a larger subset. Now granted my particular setup is a special case, but it does bring to mind some extensions that may be needed for Tempest in the future. I could see it being possible to have x86_64 and ARM mixed together in one stack, maybe even tossing in RISC-v someday. > > I'm thinking we need to start adding in extra test images, flavors, etc into the Tempest configurations (as in defining multiple options so that each architecture can have test images, etc assigned to it, rather than the current primary and alt image for just one architecture) Additionally, there should be testcases taking into account the architectures involved (such as seeing that an instance on one arch cannot be migrated to the other, as an example). I know this involves a bit of refactoring, I didn't know if it had even been considered yet. > > > -- > James LaBarre > Software Engineer, OpenStack MultiArch > Red Hat > jlabarre at redhat.com > > > > -- > Martin Kopec > Senior Software Quality Engineer > Red Hat EMEA > > > From samueldmq at gmail.com Mon Jan 3 17:51:24 2022 From: samueldmq at gmail.com (Samuel de Medeiros Queiroz) Date: Mon, 3 Jan 2022 14:51:24 -0300 Subject: [outreachy] Stepping down as Outreachy co-organizer Message-ID: Hi all, Outreachy is a wonderful program that promotes diversity in open source communities by giving opportunities to people in underrepresented groups. This was a hard decision to make, but I have not been committing the time this project deserves. For that reason, I would like to give visibility that I am stepping down as an Outreachy organizer. It was a great honor to serve as co-organizer since late 2018, and we had 19 internships since then. I also had the pleasure to serve twice (2016 and 2017) as a mentor. Mahati, it was a great pleasure co-organizing Outreachy in this community with you. Thanks! Samuel Queiroz -------------- next part -------------- An HTML attachment was scrubbed... URL: From allison at openinfra.dev Mon Jan 3 19:39:34 2022 From: allison at openinfra.dev (Allison Price) Date: Mon, 3 Jan 2022 13:39:34 -0600 Subject: [outreachy] Stepping down as Outreachy co-organizer In-Reply-To: References: Message-ID: <43C58C86-B5BD-4945-B6B9-03DA71E08F6E@openinfra.dev> Samuel - Thank you so much for all of your work with the Outreachy program and bringing in new OpenStack contributors! I wish you luck in your upcoming endeavors and hope we continue to find ways to collaborate. Cheers, Allison > On Jan 3, 2022, at 11:51 AM, Samuel de Medeiros Queiroz wrote: > > Hi all, > > Outreachy is a wonderful program that promotes diversity in open source communities by giving opportunities to people in underrepresented groups. > > This was a hard decision to make, but I have not been committing the time this project deserves. > For that reason, I would like to give visibility that I am stepping down as an Outreachy organizer. > > It was a great honor to serve as co-organizer since late 2018, and we had 19 internships since then. > I also had the pleasure to serve twice (2016 and 2017) as a mentor. > > Mahati, it was a great pleasure co-organizing Outreachy in this community with you. > > Thanks! 
> Samuel Queiroz From gmann at ghanshyammann.com Mon Jan 3 22:35:09 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 03 Jan 2022 16:35:09 -0600 Subject: [all][tc] Technical Committee next weekly meeting on Jan 6th at 1500 UTC Message-ID: <17e2215043e.10f43b8a730636.5648333992896462439@ghanshyammann.com> Hello Everyone, Technical Committee's next weekly meeting is scheduled for Jan 6th at 1500 UTC. If you would like to add topics for discussion, please add them to the below wiki page by Wednesday, Jan 5th, at 2100 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting -gmann From mkopec at redhat.com Tue Jan 4 08:45:18 2022 From: mkopec at redhat.com (Martin Kopec) Date: Tue, 4 Jan 2022 09:45:18 +0100 Subject: [all][qa][triplo] Help need to maintain OpenStack health dashboard In-Reply-To: References: <17d76bee71c.cfa2c4c1178950.6715520613968754830@ghanshyammann.com> Message-ID: Thank you Arx, let's talk about it today during QA's office hour and find out what we should start with. On Mon, 3 Jan 2022 at 11:33, Arx Cruz wrote: > Sorry for the late reply, I was on vacation and wasn't able to participate > in the meeting. > > I'm interested to maintain the subunit2sql and help with maintenance of > openstack-health. > > Kind regards, > Arx Cruz > > On Mon, Jan 3, 2022 at 11:25 AM Martin Kopec wrote: > >> A little late but here is what we have concluded in the call: >> * TripleO's dashboard is not as similar to the openstack-health's one as >> we initially thought, it lacks the most important functionality for us >> which shows a pass/fail rate of a particular test >> * TripleO team would be interested in having test pass/fail rate >> functionality >> * In order to make openstack-health working again: >> ** openstack/openstack-health's codebase needs to be updated >> ** operational assistance to deploy the service as well as the >> subunit2sql processing pipeline and the subunit2sql database backend is >> required (see Clark's email within this thread) >> ** we need a point of contact for regular maintenance of the service, >> infra issues and etc >> * If a volunteer who would work on the mentioned action items is found, >> infra team can provide needed resources and guidance - remember that the >> main reason the tool is being shutdown is only the lack of a maintainer >> * In the meantime QA team will start retirement process of the >> openstack-health repository >> >> If you would like to get included in this and provide any help, feel free >> to reach out to us. >> >> Thanks, >> >> On Thu, 9 Dec 2021 at 17:23, Martin Kopec wrote: >> >>> We're gonna have a call to discuss collaboration between qa and tripleo >>> and the next steps regarding openstack-health. >>> >>> The call details: >>> next Thursday December 16th 1630 UTC >>> Video call link: https://meet.google.com/cmr-yzaq-twp >>> >>> Feel free to join, >>> >>> On Wed, 1 Dec 2021 at 17:52, Ronelle Landy wrote: >>> >>>> >>>> >>>> On Wed, Dec 1, 2021 at 11:11 AM Ghanshyam Mann >>>> wrote: >>>> >>>>> Hello Everyone, >>>>> >>>>> In the QA meeting[1], we discussed the help needed to maintain the >>>>> OpenStack health dashboard which is >>>>> broken currently and the QA team does not have the JS developer and >>>>> bandwidth to fix it. While discussing >>>>> it meeting, we found that the TripleO team might be working on a >>>>> similar dashboard for their CI results. If that >>>>> is the case, can we collaborate on maintaining the existing dashboard? 
>>>>> >>>> >>>> We had a discussion with Martin Kopec about collaboration with QA in >>>> maintaining a TripleO focused dashboard. >>>> We have two dashboards in progress at the moment - one focused on jobs >>>> running in https://review.rdoproject.org/zuul/status - >>>> http://ci-health-rdo.tripleo.org/ and one where we are starting to >>>> track failures in select jobs running on https://zuul.openstack.org/ - >>>> http://ci-health.tripleo.org/ >>>> >>>> If you would like to collaborate on this work, please ping us on #oooq >>>> on (OFTC) join our community call on Tuesdays at 1:30pm UTC and we can >>>> discuss further. >>>> >>>> Thanks! >>>> >>>>> >>>>> OpenStack health dashboard: >>>>> https://opendev.org/openstack/openstack-health >>>>> Repo: http://status.openstack.org/openstack-health/#/ >>>>> >>>>> [1] >>>>> https://meetings.opendev.org/meetings/qa/2021/qa.2021-11-16-15.00.log.html#l-71 >>>>> >>>>> -gmann >>>>> >>>>> >>> >>> -- >>> Martin Kopec >>> Senior Software Quality Engineer >>> Red Hat EMEA >>> >>> >>> >>> >> >> -- >> Martin Kopec >> Senior Software Quality Engineer >> Red Hat EMEA >> >> >> >> > > -- > > Arx Cruz > > Software Engineer > > Red Hat EMEA > > arxcruz at redhat.com > @RedHat Red Hat > Red Hat > > > -- Martin Kopec Senior Software Quality Engineer Red Hat EMEA -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Jan 4 11:26:08 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 04 Jan 2022 12:26:08 +0100 Subject: [neutron] CI meeting agenda - 4.01.2022 Message-ID: <13042001.uLZWGnKmhe@p1> Hi, It's just reminder that we will today have Neutron CI meeting. It will be at 3pm UTC time. Meeting will be on the #openstack-neutron channel and on jiitsi: https://meetpad.opendev.org/neutron-ci-meetings Agenda is in the etherpad https://etherpad.opendev.org/p/neutron-ci-meetings See You there :) -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From openinfradn at gmail.com Tue Jan 4 12:38:22 2022 From: openinfradn at gmail.com (open infra) Date: Tue, 4 Jan 2022 18:08:22 +0530 Subject: cpu pinning In-Reply-To: References: Message-ID: I recently noticed that available vpcu usage is "used 49 of 30" [1] but I have a total of 128 vcpus [2] and allocated only 126 for applications. Is this due to misconfiguration in my environment? [1] https://pasteboard.co/TI0WbbyZiXsn.png [2] https://paste.opendev.org/show/811910/ Regards, Danishka On Wed, Dec 8, 2021 at 10:04 PM open infra wrote: > Managed to set cpu_dedicated_setin nova. > Thanks, Sean! > > On Thu, Dec 2, 2021 at 9:16 PM Sean Mooney wrote: > >> On Thu, 2021-12-02 at 08:58 +0530, open infra wrote: >> > Hi, >> > >> > I have created a flavor with following properties and created an >> instance. >> > Instance failed with the error "No valid host was found. There are not >> > enough hosts available." >> > When I set the cpu policy as 'shared' I can create the instance. The >> host >> > machine has two numa nodes and a total of 128 vcpu. >> > I can not figure out what's missing here. >> i suspect the issue is not with the flavor but with yoru host >> configurtion. 
>> >> you likely need to defience cpu_dedicated_set and cpu_shared_set in the >> nova.conf >> >> we do not support mixing pinned and floating cpus on the same host unless >> you partion the cpu pool >> using cpu_dedicated_set and cpu_shared_set. >> >> as of train cpu_dedicated_set replaced vcpu_pin_set as the supported way >> to report the pool of cpus to be >> used for pinned vms to placment. >> >> if you do "openstack resource provider inventory show > uuid>" it should detail the avaiabel pcpu and vcpu inventories. >> when you use hw:cpu_policy='dedicated' it will claim PCPUs not VCPUs in >> placment. >> That is likely the issue you are encountering. >> >> by default we have a fallback query to make this work while you are >> upgrading >> >> >> https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.disable_fallback_pcpu_query >> >> which we should be disabling by default soon. >> >> but i suspect that is likely why you are getting the no valid host. >> >> to debug this properly you should enable debug logging on the nova >> schduler and then confirm if you got >> host back form placment and then if the numa toplogy filter is rejectign >> the host or not. >> >> without the schduler debug logs for the instance creation we cannt really >> help more then this since we do not have the info required. >> > >> > controller-1:~$ openstack flavor show dn.large -c properties >> > >> > >> +------------+--------------------------------------------------------------------------------------------------------+ >> > >> > > Field | Value >> > | >> > >> > >> +------------+--------------------------------------------------------------------------------------------------------+ >> > >> > > properties | hw:cpu_cores='2', hw:cpu_policy='dedicated', >> > hw:cpu_sockets='1', hw:cpu_threads='2', hw:numa_nodes='1' | >> > >> > >> +------------+--------------------------------------------------------------------------------------------------------+ >> > >> > controller-1:~$ openstack hypervisor stats show >> > >> > +----------------------+--------+ >> > >> > > Field | Value | >> > >> > +----------------------+--------+ >> > >> > > count | 1 | >> > >> > > current_workload | 0 | >> > >> > > disk_available_least | 187 | >> > >> > > free_disk_gb | 199 | >> > >> > > free_ram_mb | 308787 | >> > >> > > local_gb | 219 | >> > >> > > local_gb_used | 20 | >> > >> > > memory_mb | 515443 | >> > >> > > memory_mb_used | 206656 | >> > >> > > running_vms | 7 | >> > >> > > vcpus | 126 | >> > >> > > vcpus_used | 49 | >> > >> > +----------------------+--------+ >> > >> > >> > >> > Regards, >> > >> > Danishka >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Jan 4 12:54:06 2022 From: smooney at redhat.com (Sean Mooney) Date: Tue, 04 Jan 2022 12:54:06 +0000 Subject: cpu pinning In-Reply-To: References: Message-ID: On Tue, 2022-01-04 at 18:08 +0530, open infra wrote: > I recently noticed that available vpcu usage is "used 49 of 30" [1] but I > have a total of 128 vcpus [2] and allocated only 126 for applications. > Is this due to misconfiguration in my environment? 
the hyperviosr api only reprots the vCPU not the pCPUs which are used for pining so if you have cpu_dedicated_set and cpu_shared_set defiend then the vcpu reported in the horizon ui will only contain the cpus form the cpu_shared_set if you are using the older vcpu_pin_set config value instead of cpu_dedicated_set then the host can only be used for either pinned or unpinned vms and the value in the hypervior api for vcpus will be the total number of cores in vcpu_pin_set. looking at the output of openstack hypervisor stats show below we see 126 vcpus are reported so this looks like a horizon bug of some kind. the used cpus is correct. did you perhaps change form using vcpu_pin_set to cpu_dedicated_set while vms were on the host? that is not supported. if you did an you allocated 30 cpus to the cpu_shared set then the horizon output would make sense but based on the "openstack hypervisor stats show" below this shoudl be 49/126 i should also point out that starting in wallaby this infomations nolonger reproted the stats endpoint was removed entirely form the hyperviors api and the cpu_info, free_disk_gb, local_gb, local_gb_used, disk_available_least, free_ram_mb, memory_mb, memory_mb_used, vcpus, vcpus_used, and running_vms fields were removed form teh hypervior detail show endpoint. https://specs.openstack.org/openstack/nova-specs/specs/wallaby/implemented/modernize-os-hypervisors-api.html has the details and what you should used instead. > > [1] https://pasteboard.co/TI0WbbyZiXsn.png > [2] https://paste.opendev.org/show/811910/ > > Regards, > Danishka > > On Wed, Dec 8, 2021 at 10:04 PM open infra wrote: > > > Managed to set cpu_dedicated_setin nova. > > Thanks, Sean! > > > > On Thu, Dec 2, 2021 at 9:16 PM Sean Mooney wrote: > > > > > On Thu, 2021-12-02 at 08:58 +0530, open infra wrote: > > > > Hi, > > > > > > > > I have created a flavor with following properties and created an > > > instance. > > > > Instance failed with the error "No valid host was found. There are not > > > > enough hosts available." > > > > When I set the cpu policy as 'shared' I can create the instance. The > > > host > > > > machine has two numa nodes and a total of 128 vcpu. > > > > I can not figure out what's missing here. > > > i suspect the issue is not with the flavor but with yoru host > > > configurtion. > > > > > > you likely need to defience cpu_dedicated_set and cpu_shared_set in the > > > nova.conf > > > > > > we do not support mixing pinned and floating cpus on the same host unless > > > you partion the cpu pool > > > using cpu_dedicated_set and cpu_shared_set. > > > > > > as of train cpu_dedicated_set replaced vcpu_pin_set as the supported way > > > to report the pool of cpus to be > > > used for pinned vms to placment. > > > > > > if you do "openstack resource provider inventory show > > uuid>" it should detail the avaiabel pcpu and vcpu inventories. > > > when you use hw:cpu_policy='dedicated' it will claim PCPUs not VCPUs in > > > placment. > > > That is likely the issue you are encountering. > > > > > > by default we have a fallback query to make this work while you are > > > upgrading > > > > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.disable_fallback_pcpu_query > > > > > > which we should be disabling by default soon. > > > > > > but i suspect that is likely why you are getting the no valid host. 
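To make the cpu partitioning described above concrete, a minimal nova.conf sketch for a compute node; the CPU ranges are placeholders and have to match the host topology (dedicated cores for pinned guests, a small shared pool for floating guests and emulator threads):

  [compute]
  cpu_dedicated_set = 2-63,66-127
  cpu_shared_set = 0-1,64-65

With that in place, the inventories and usage that the hypervisor stats view no longer reliably reflects can be read from placement instead, e.g.:

  openstack resource provider list --name <compute-hostname>
  openstack resource provider inventory list <provider-uuid>
  openstack resource provider usage show <provider-uuid>

PCPU and VCPU then show up as separate resource classes, which avoids the confusing "used 49 of 30" style output from the hypervisor view.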
> > > > > > to debug this properly you should enable debug logging on the nova > > > schduler and then confirm if you got > > > host back form placment and then if the numa toplogy filter is rejectign > > > the host or not. > > > > > > without the schduler debug logs for the instance creation we cannt really > > > help more then this since we do not have the info required. > > > > > > > > controller-1:~$ openstack flavor show dn.large -c properties > > > > > > > > > > > +------------+--------------------------------------------------------------------------------------------------------+ > > > > > > > > > Field | Value > > > > | > > > > > > > > > > > +------------+--------------------------------------------------------------------------------------------------------+ > > > > > > > > > properties | hw:cpu_cores='2', hw:cpu_policy='dedicated', > > > > hw:cpu_sockets='1', hw:cpu_threads='2', hw:numa_nodes='1' | > > > > > > > > > > > +------------+--------------------------------------------------------------------------------------------------------+ > > > > > > > > controller-1:~$ openstack hypervisor stats show > > > > > > > > +----------------------+--------+ > > > > > > > > > Field | Value | > > > > > > > > +----------------------+--------+ > > > > > > > > > count | 1 | > > > > > > > > > current_workload | 0 | > > > > > > > > > disk_available_least | 187 | > > > > > > > > > free_disk_gb | 199 | > > > > > > > > > free_ram_mb | 308787 | > > > > > > > > > local_gb | 219 | > > > > > > > > > local_gb_used | 20 | > > > > > > > > > memory_mb | 515443 | > > > > > > > > > memory_mb_used | 206656 | > > > > > > > > > running_vms | 7 | > > > > > > > > > vcpus | 126 | > > > > > > > > > vcpus_used | 49 | > > > > > > > > +----------------------+--------+ > > > > > > > > > > > > > > > > Regards, > > > > > > > > Danishka > > > > > > From marios at redhat.com Tue Jan 4 13:08:38 2022 From: marios at redhat.com (Marios Andreou) Date: Tue, 4 Jan 2022 15:08:38 +0200 Subject: [TripleO] Douglas Viroel for tripleo-ci core Message-ID: Hello TripleO ( & happy new year :) \o/ ) I'd like to propose Douglas Viroel [1] for core on the tripleo-ci repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, openstack/tripleo-quickstart, openstack/tripleo-repos). Doug joined the team last year and besides his code contributions he has also been consistently providing many very useful and thoughtful code reviews. I think he will be an excellent addition to the ci core team. As is customary, let's leave this thread open for a week and if there are no objections or other concerns then we add Doug to the core group next week. thanks, marios [1] https://review.opendev.org/q/owner:viroel%2540gmail.com From chkumar at redhat.com Tue Jan 4 13:17:48 2022 From: chkumar at redhat.com (Chandan Kumar) Date: Tue, 4 Jan 2022 18:47:48 +0530 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: On Tue, Jan 4, 2022 at 6:46 PM Marios Andreou wrote: > > Hello TripleO ( & happy new year :) \o/ ) > > I'd like to propose Douglas Viroel [1] for core on the tripleo-ci > repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, > openstack/tripleo-quickstart, openstack/tripleo-repos). > > Doug joined the team last year and besides his code contributions he > has also been consistently providing many very useful and thoughtful > code reviews. I think he will be an excellent addition to the ci core > team. 
> > As is customary, let's leave this thread open for a week and if there > are no objections or other concerns then we add Doug to the core group > next week. > + 2 for Doug :-) Thanks, Chandan Kumar From sandeepggn93 at gmail.com Tue Jan 4 13:19:25 2022 From: sandeepggn93 at gmail.com (Sandeep Yadav) Date: Tue, 4 Jan 2022 18:49:25 +0530 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: +1 On Tue, Jan 4, 2022 at 6:46 PM Marios Andreou wrote: > Hello TripleO ( & happy new year :) \o/ ) > > I'd like to propose Douglas Viroel [1] for core on the tripleo-ci > repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, > openstack/tripleo-quickstart, openstack/tripleo-repos). > > Doug joined the team last year and besides his code contributions he > has also been consistently providing many very useful and thoughtful > code reviews. I think he will be an excellent addition to the ci core > team. > > As is customary, let's leave this thread open for a week and if there > are no objections or other concerns then we add Doug to the core group > next week. > > thanks, marios > > [1] https://review.opendev.org/q/owner:viroel%2540gmail.com > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sshnaidm at redhat.com Tue Jan 4 13:24:53 2022 From: sshnaidm at redhat.com (Sagi Shnaidman) Date: Tue, 4 Jan 2022 15:24:53 +0200 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: +1 On Tue, Jan 4, 2022 at 3:17 PM Marios Andreou wrote: > Hello TripleO ( & happy new year :) \o/ ) > > I'd like to propose Douglas Viroel [1] for core on the tripleo-ci > repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, > openstack/tripleo-quickstart, openstack/tripleo-repos). > > Doug joined the team last year and besides his code contributions he > has also been consistently providing many very useful and thoughtful > code reviews. I think he will be an excellent addition to the ci core > team. > > As is customary, let's leave this thread open for a week and if there > are no objections or other concerns then we add Doug to the core group > next week. > > thanks, marios > > [1] https://review.opendev.org/q/owner:viroel%2540gmail.com > > > -- Best regards Sagi Shnaidman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ozzzo at yahoo.com Tue Jan 4 14:01:21 2022 From: ozzzo at yahoo.com (Albert Braden) Date: Tue, 4 Jan 2022 14:01:21 +0000 (UTC) Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: <1335760337.3548170.1639680236968@mail.yahoo.com> References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> Message-ID: <33441648.1434581.1641304881681@mail.yahoo.com> Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl" ? 
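Coming back to the RabbitMQ policy question earlier in this digest: in RabbitMQ policy definitions the queue-expiry key is "expires" (not "expire"), and both "expires" and "message-ttl" take milliseconds, so 600 and 3600 would mean 0.6 and 3.6 seconds rather than 10 minutes and 1 hour. RabbitMQ also applies at most one policy per queue, so message-ttl and expires need to live in the same policy definition (and, because the ha-all policy matches the notification queues too, the combined policy likely needs a higher priority and its own ha-mode if HA is still wanted there). A possible corrected sketch, keeping the original vhost and pattern and assuming a 10-minute TTL and 1-hour expiry are the intent:

  {"vhost": "/", "name": "notifications-cleanup",
   "pattern": "^(notifications|versioned_notifications)\\.",
   "apply-to": "queues",
   "definition": {"message-ttl": 600000, "expires": 3600000, "ha-mode": "all"},
   "priority": 1}

Note also that the policy objects in the definitions.json list have to be comma-separated, and that a queue literally named notifications_extractor.info would not match this pattern, since the regex expects a dot immediately after "notifications".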
On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden wrote: I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2: "policies":[ {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0} {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0} {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} {% endif %} But I still see unconsumed messages lingering in notifications_extractor.info. From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing? On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden wrote: Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes? [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud wrote: So, your config snippet LGTM. Le?ven. 10 d?c. 2021 ??17:50, Albert Braden a ?crit?: Sorry, that was a transcription error. I thought "True" and my fingers typed "False." The correct lines are: [oslo_messaging_rabbit] amqp_durable_queues = True On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud wrote: If you plan to let `amqp_durable_queues = False` (i.e if you plan to keep this config equal to false), then you don't need to add these config lines as this is already the default value [1]. [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 Le?jeu. 9 d?c. 2021 ??22:40, Albert Braden a ?crit?: Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management. I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf: [oslo_messaging_rabbit] amqp_durable_queues = False Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? What would that look like in the config? From: Herve Beraud Sent: Thursday, December 9, 2021 2:45 AM To: Bogdan Dobrelya Cc: openstack-discuss at lists.openstack.org Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability ? Caution:?This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.? ? ? Le?mer. 8 d?c. 2021 ??11:48, Bogdan Dobrelya a ?crit?: Please see inline >> I read this with great interest because we are seeing this issue. Questions: >> >> 1. We are running kola-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? >> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. 
What would this look like in Ansible, and what should the resulting container config look like? >> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into? > > Note that even having rabbit HA policies adjusted like that and its HA > replication factor [0] decreased (e.g. to a 2), there still might be > high churn caused by a large enough number of replicated durable RPC > topic queues. And that might cripple the cloud down with the incurred > I/O overhead because a durable queue requires all messages in it to be > persisted to a disk (for all the messaging cluster replicas) before they > are ack'ed by the broker. > > Given that said, Oslo messaging would likely require a more granular > control for topic exchanges and the durable queues flag - to tell it to > declare as durable only the most critical paths of a service. A single > config setting and a single control exchange per a service might be not > enough. Also note that therefore, amqp_durable_queue=True requires dedicated control exchanges configured for each service. Those that use 'openstack' as a default cannot turn the feature ON. Changing it to a service specific might also cause upgrade impact, as described in the topic [3]. ? The same is true for `amqp_auto_delete=True`. That requires dedicated control exchanges else it won't work if each service defines its own policy on a shared control exchange (e.g `openstack`) and if policies differ from each other. ? [3] https://review.opendev.org/q/topic:scope-config-opts > > There are also race conditions with durable queues enabled, like [1]. A > solution could be where each service declare its own dedicated control > exchange with its own configuration. > > Finally, openstack components should add perhaps a *.next CI job to test > it with durable queues, like [2] > > [0] https://www.rabbitmq.com/ha.html#replication-factor > > [1] > https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt > > [2] https://review.opendev.org/c/openstack/nova/+/820523 > >> >> Does anyone have a sample set of RMQ config files that they can share? >> >> It looks like my Outlook has ruined the link; reposting: >> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando -- Best regards, Bogdan Dobrelya, Irc #bogdando -- Herv? Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/ https://twitter.com/4383hberaud -- Herv? BeraudSenior Software Engineer at Red Hatirc: hberaudhttps://github.com/4383/https://twitter.com/4383hberaud -- Herv? BeraudSenior Software Engineer at Red Hatirc: hberaudhttps://github.com/4383/https://twitter.com/4383hberaud -------------- next part -------------- An HTML attachment was scrubbed... URL: From lyarwood at redhat.com Tue Jan 4 14:23:59 2022 From: lyarwood at redhat.com (Lee Yarwood) Date: Tue, 4 Jan 2022 14:23:59 +0000 Subject: [nova] Getting away from cdrkit / genisoimage In-Reply-To: <6494f3ac-cfe3-d028-33d6-05ab9b625d67@debian.org> References: <6494f3ac-cfe3-d028-33d6-05ab9b625d67@debian.org> Message-ID: <20220104142359.klvp5zptiab2x5qd@lyarwood-laptop.usersys.redhat.com> On 27-12-21 11:44:56, Thomas Goirand wrote: > Hi, > > Please see: > https://bugs.debian.org/982241 > > It looks like cdrkit / genisoimage wont be available in the next Debian > 12 (and therefore probably it will also be dropped in Ubuntu). 
As per > the bug report: > > "I'm told xorriso and libburnia are alternatives and are alive and doing > well." > > Would it be possible to switch to that? Yeah, xorriso provides a drop in replacement for genisoimage in the form of mkisofs so you can either default to that in your nova-dist.conf or deployment tooling as TripleO is doing for el9 based hosts. I also have a WIP change up to move the default of [configdrive]mkisofs_cmd in Nova to mkisofs during Yoga. WIP configdrive: Move mkisofs_cmd default to mkisofs https://review.opendev.org/q/Ibd8356665f47326b05b56878a36513fda183fe6c Cheers, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From fungi at yuggoth.org Tue Jan 4 14:42:36 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jan 2022 14:42:36 +0000 Subject: [neutron] CI meeting agenda - 4.01.2022 In-Reply-To: <13042001.uLZWGnKmhe@p1> References: <13042001.uLZWGnKmhe@p1> Message-ID: <20220104144235.ayowj2kfmz7klp6e@yuggoth.org> On 2022-01-04 12:26:08 +0100 (+0100), Slawek Kaplonski wrote: > It's just reminder that we will today have Neutron CI meeting. It > will be at 3pm UTC time. Meeting will be on the #openstack-neutron > channel and on jiitsi: > https://meetpad.opendev.org/neutron-ci-meetings [...] Sorry for spotting this so late, but OpenDev's Meetpad service is still offline as we're awaiting a new Jitsi-Meet stable release without the Log4j library[*]. In the meantime, you could use the free https://meet.jit.si/ service instead. [*] https://community.jitsi.org/t/108844 -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ralonsoh at redhat.com Tue Jan 4 14:46:49 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Tue, 4 Jan 2022 15:46:49 +0100 Subject: [neutron] Bug deputy December 27 Message-ID: Hello: These are the bugs opened during the last week: https://bugs.launchpad.net/neutron/+bug/1956034: "ovn load balancer health monitor cause mac address conflict". Unassigned. https://bugs.launchpad.net/neutron/+bug/1956035: "ovn load balancer member failover not working when accessed from floating ip". Unassigned. https://bugs.launchpad.net/neutron/+bug/1955799: "Not changing floatingip status while deleting instance that had floatingip assocciate (Showing ACTIVE)". Patch: https://review.opendev.org/c/openstack/neutron/+/822091 Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From akahat at redhat.com Tue Jan 4 16:17:39 2022 From: akahat at redhat.com (Amol Kahat) Date: Tue, 4 Jan 2022 21:47:39 +0530 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: +1 On Tue, Jan 4, 2022 at 6:40 PM Marios Andreou wrote: > Hello TripleO ( & happy new year :) \o/ ) > > I'd like to propose Douglas Viroel [1] for core on the tripleo-ci > repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, > openstack/tripleo-quickstart, openstack/tripleo-repos). > > Doug joined the team last year and besides his code contributions he > has also been consistently providing many very useful and thoughtful > code reviews. I think he will be an excellent addition to the ci core > team. 
> > As is customary, let's leave this thread open for a week and if there > are no objections or other concerns then we add Doug to the core group > next week. > > thanks, marios > > [1] https://review.opendev.org/q/owner:viroel%2540gmail.com > > > -- *Amol Kahat* Software Engineer *Red Hat India Pvt. Ltd. Pune, India.* akahat at redhat.com B764 E6F8 F4C1 A1AF 816C 6840 FDD3 BA6C 832D 7715 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlandy at redhat.com Tue Jan 4 16:26:03 2022 From: rlandy at redhat.com (Ronelle Landy) Date: Tue, 4 Jan 2022 11:26:03 -0500 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: +1 absolutely On Tue, Jan 4, 2022 at 11:25 AM Amol Kahat wrote: > +1 > > On Tue, Jan 4, 2022 at 6:40 PM Marios Andreou wrote: > >> Hello TripleO ( & happy new year :) \o/ ) >> >> I'd like to propose Douglas Viroel [1] for core on the tripleo-ci >> repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, >> openstack/tripleo-quickstart, openstack/tripleo-repos). >> >> Doug joined the team last year and besides his code contributions he >> has also been consistently providing many very useful and thoughtful >> code reviews. I think he will be an excellent addition to the ci core >> team. >> >> As is customary, let's leave this thread open for a week and if there >> are no objections or other concerns then we add Doug to the core group >> next week. >> >> thanks, marios >> >> [1] https://review.opendev.org/q/owner:viroel%2540gmail.com >> >> >> > > -- > *Amol Kahat* > Software Engineer > *Red Hat India Pvt. Ltd. Pune, India.* > akahat at redhat.com > B764 E6F8 F4C1 A1AF 816C 6840 FDD3 BA6C 832D 7715 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.macnaughton at canonical.com Tue Jan 4 16:32:57 2022 From: chris.macnaughton at canonical.com (Chris MacNaughton) Date: Tue, 4 Jan 2022 10:32:57 -0600 Subject: [charms] nominate Hemanth Nakkina for charms-core In-Reply-To: <3c167485-421c-4861-bbad-6f588b923ee9@gmail.com> References: <3c167485-421c-4861-bbad-6f588b923ee9@gmail.com> Message-ID: Trying this again with the ML added: On Fri, Dec 10, 2021 at 7:28 AM Edward Hope-Morley wrote: > I would like propose Hemanth Nakkina for charms-core. Hemanth has really > stepped up his efforts to provide invaluable contributions both in code > and reviews and I beleive he is in a good position to be considered for > core where he will be able to provide additional value to the project. > Hemanth has demonstrated rigour for quality through his contributions > and is always keen to help others with his deep knowledge of the charms > and Openstack. Here is a list of all his contributions: > > patches: > https://review.opendev.org/q/owner:hemanth.nakkina%2540canonical.com > reviews: > https://review.opendev.org/q/reviewedby:hemanth.nakkina%2540canonical.com > > I hope you will join me in voting for Hemanth. > > - Ed > > +1 from me. Hemanth has consistently provided thoughtful reviews on charms, in addition to proposing quality changes, as well as responding to reviews. Chris -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From frode.nordahl at canonical.com Tue Jan 4 16:33:34 2022 From: frode.nordahl at canonical.com (Frode Nordahl) Date: Tue, 4 Jan 2022 17:33:34 +0100 Subject: [charms] nominate Hemanth Nakkina for charms-core In-Reply-To: References: <3c167485-421c-4861-bbad-6f588b923ee9@gmail.com> Message-ID: On Fri, Dec 10, 2021 at 6:12 PM Billy Olsen wrote: > > > > On Fri, Dec 10, 2021 at 7:08 AM Alex Kavanagh wrote: >> >> >> >> On Fri, Dec 10, 2021 at 1:24 PM Edward Hope-Morley wrote: >>> >>> I would like propose Hemanth Nakkina for charms-core. Hemanth has really >>> stepped up his efforts to provide invaluable contributions both in code >>> and reviews and I beleive he is in a good position to be considered for >>> core where he will be able to provide additional value to the project. >>> Hemanth has demonstrated rigour for quality through his contributions >>> and is always keen to help others with his deep knowledge of the charms >>> and Openstack. Here is a list of all his contributions: >>> >>> patches: >>> https://review.opendev.org/q/owner:hemanth.nakkina%2540canonical.com >>> reviews: >>> https://review.opendev.org/q/reviewedby:hemanth.nakkina%2540canonical.com >>> >>> I hope you will join me in voting for Hemanth. >> >> >> +1 from me; Hemanth has submitted good patches and thoughtful reviews in my experience. He would be a good member of charms-core. >> >> Cheers >> Alex. >> > > I concur with Alex here. +1 from me as well. I concur with Alex and Billy here. I would like to add that it has been a pleasure to work with Hemanth to resolve some really hard problems and I look forward to what he can bring to the team and project in the capacity of being a charms core. -- Frode Nordahl > - Billy > >>> >>> >>> - Ed >>> >>> >> >> >> -- >> Alex Kavanagh - Software Engineer >> OpenStack Engineering - Data Centre Development - Canonical Ltd From corellianimports at aol.com Tue Jan 4 17:00:53 2022 From: corellianimports at aol.com (Jason Poulin) Date: Tue, 4 Jan 2022 12:00:53 -0500 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: Can someone take me off this list. I don?t know why I?m on it. Please. > On Jan 4, 2022, at 11:32 AM, Ronelle Landy wrote: > > ? > +1 absolutely > >> On Tue, Jan 4, 2022 at 11:25 AM Amol Kahat wrote: >> +1 >> >>> On Tue, Jan 4, 2022 at 6:40 PM Marios Andreou wrote: >>> Hello TripleO ( & happy new year :) \o/ ) >>> >>> I'd like to propose Douglas Viroel [1] for core on the tripleo-ci >>> repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, >>> openstack/tripleo-quickstart, openstack/tripleo-repos). >>> >>> Doug joined the team last year and besides his code contributions he >>> has also been consistently providing many very useful and thoughtful >>> code reviews. I think he will be an excellent addition to the ci core >>> team. >>> >>> As is customary, let's leave this thread open for a week and if there >>> are no objections or other concerns then we add Doug to the core group >>> next week. >>> >>> thanks, marios >>> >>> [1] https://review.opendev.org/q/owner:viroel%2540gmail.com >>> >>> >> >> >> -- >> Amol Kahat >> Software Engineer >> Red Hat India Pvt. Ltd. Pune, India. >> akahat at redhat.com >> B764 E6F8 F4C1 A1AF 816C 6840 FDD3 BA6C 832D 7715 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alee at redhat.com Tue Jan 4 17:06:05 2022 From: alee at redhat.com (Ade Lee) Date: Tue, 4 Jan 2022 12:06:05 -0500 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: +1 On Tue, Jan 4, 2022 at 11:37 AM Ronelle Landy wrote: > +1 absolutely > > On Tue, Jan 4, 2022 at 11:25 AM Amol Kahat wrote: > >> +1 >> >> On Tue, Jan 4, 2022 at 6:40 PM Marios Andreou wrote: >> >>> Hello TripleO ( & happy new year :) \o/ ) >>> >>> I'd like to propose Douglas Viroel [1] for core on the tripleo-ci >>> repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, >>> openstack/tripleo-quickstart, openstack/tripleo-repos). >>> >>> Doug joined the team last year and besides his code contributions he >>> has also been consistently providing many very useful and thoughtful >>> code reviews. I think he will be an excellent addition to the ci core >>> team. >>> >>> As is customary, let's leave this thread open for a week and if there >>> are no objections or other concerns then we add Doug to the core group >>> next week. >>> >>> thanks, marios >>> >>> [1] https://review.opendev.org/q/owner:viroel%2540gmail.com >>> >>> >>> >> >> -- >> *Amol Kahat* >> Software Engineer >> *Red Hat India Pvt. Ltd. Pune, India.* >> akahat at redhat.com >> B764 E6F8 F4C1 A1AF 816C 6840 FDD3 BA6C 832D 7715 >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Jan 4 17:22:39 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jan 2022 17:22:39 +0000 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: <20220104172238.745wrbo2rpggfcet@yuggoth.org> On 2022-01-04 12:00:53 -0500 (-0500), Jason Poulin wrote: > Can someone take me off this list. I don?t know why I?m on it. Please. [...] I've unsubscribed this user; it appears an attacker managed to brute-force a mailman confirmation key for a subscription request. This hole should hopefully be plugged once we migrate to Mailman v3, which employs stronger hashes for subscription confirmations. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jp.methot at planethoster.info Tue Jan 4 19:17:37 2022 From: jp.methot at planethoster.info (J-P Methot) Date: Tue, 4 Jan 2022 14:17:37 -0500 Subject: [kolla] Updating libvirt container images without VM downtime Message-ID: <36d9fbdb-a582-a766-2263-823ed5ab4959@planethoster.info> Hi, I'm looking for validation regarding the way Kolla and containers work in regard to upgrading the libvirt containers. Essentially, when you upgrade the libvirt container to a new container image, the container needs to be restarted, thus creating downtime for the VMs. There is no way to avoid this downtime, unless you migrate the VMs to another node and then move them back once the container has restarted, right? -- Jean-Philippe M?thot Senior Openstack system administrator Administrateur syst?me Openstack s?nior PlanetHoster inc. From stephane.chalansonnet at acoss.fr Tue Jan 4 19:48:25 2022 From: stephane.chalansonnet at acoss.fr (=?iso-8859-1?Q?CHALANSONNET_St=E9phane_=28Acoss=29?=) Date: Tue, 4 Jan 2022 19:48:25 +0000 Subject: Subject: [kolla] Updating libvirt container images without VM downtime Message-ID: Hello, When you update or restart libvirt container , the instances aren't restart . The qemu process /usr/libexec/qemu-kvm is executed outside the container. 
However for the lastest functionality you need to restart your instances later

Regards,
stephane.chalansonnet at acoss.fr

From felipe.reyes at canonical.com Tue Jan 4 20:17:48 2022 From: felipe.reyes at canonical.com (Felipe Reyes) Date: Tue, 04 Jan 2022 17:17:48 -0300 Subject: [charms] nominate Hemanth Nakkina for charms-core In-Reply-To: <3c167485-421c-4861-bbad-6f588b923ee9@gmail.com> References: <3c167485-421c-4861-bbad-6f588b923ee9@gmail.com> Message-ID: <10114a748f9bc4c55595e8e6491714b7fcc4020f.camel@canonical.com>

On Fri, 2021-12-10 at 13:20 +0000, Edward Hope-Morley wrote:
> I would like propose Hemanth Nakkina for charms-core. Hemanth has
> really
> stepped up his efforts to provide invaluable contributions both in
> code
> and reviews and I beleive he is in a good position to be considered
> for
> core where he will be able to provide additional value to the
> project.
> and is always keen to help others with his deep knowledge of the
> charms
> and Openstack. Here is a list of all his contributions:
>
> patches:
> https://review.opendev.org/q/owner:hemanth.nakkina%2540canonical.com
> reviews:
> https://review.opendev.org/q/reviewedby:hemanth.nakkina%2540canonical.com
>
> I hope you will join me in voting for Hemanth.
>
+1 , Hemanth has submitted good patches and it's been a pleasure to
discuss them with him, he has shown good understanding of the problems
and open to different solutions :-) He would do great in the charms-
core group.

Best,
--
Felipe Reyes
Software Engineer @ Canonical
# Email: felipe.reyes at canonical.com (GPG:0x9B1FFF39)
# Launchpad: ~freyes | IRC: freyes

From satish.txt at gmail.com Tue Jan 4 20:30:14 2022 From: satish.txt at gmail.com (Satish Patel) Date: Tue, 4 Jan 2022 15:30:14 -0500 Subject: openstack-kolla with GlusterFS integration Message-ID:

Folks,

Recently I have started playing with the Openstack Kolla Project
because of some requirements. I want to integrate kolla with GlusterFS
for nova vm backend storage.
I have functional Glsuter storage and want to mount on compute node to use for vm backend but not sure how do i do that because i am new to containers I can see kolla use following to bind volume with nova_compute container { "Type": "volume", "Name": "nova_compute", "Source": "/var/lib/docker/volumes/nova_compute/_data", "Destination": "/var/lib/nova", "Driver": "local", "Mode": "rw", "RW": true, "Propagation": "" } If I can mount GlusterFS in /mnt directory then how do I tell docker to use /mnt to create volume for nova_compute? From zaitcev at redhat.com Tue Jan 4 20:37:55 2022 From: zaitcev at redhat.com (Pete Zaitcev) Date: Tue, 4 Jan 2022 14:37:55 -0600 Subject: [infra] Missing releases from opendev.org/opendev/git-review/tags Message-ID: <20220104143755.439cce67@niphredil.zaitcev.lan> Greetings: I noticed that the list of tags for git-review in the website does not include the 2.2.x tags, which exist in git: [zaitcev at niphredil git-review-0]$ grep url .git/config url = https://opendev.org/opendev/git-review.git [zaitcev at niphredil git-review-0]$ git tag -l --format='%(objectname) %(objecttype) %(refname)'| grep 2\\.2 bb81051c2481d4ac983ec0b28bd62b590d361141 tag refs/tags/2.2.0 [zaitcev at niphredil git-review-0]$ git show bb81051c2481d4ac983ec0b28bd62b590d361141 | head -5 tag 2.2.0 Tagger: Jeremy Stanley Date: Wed Nov 24 02:20:05 2021 +0000 Release 2.2.0 [zaitcev at niphredil git-review-0]$ The 2.2.0 is missing from https://opendev.org/opendev/git-review/tags Coincidentally, it is also missing from https://tarballs.opendev.org/openstack/git-review/ Is this something that the infrastructure team did on purpose? Or is this just a bug to be reported? -- Pete From radoslaw.piliszek at gmail.com Tue Jan 4 20:38:52 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 4 Jan 2022 21:38:52 +0100 Subject: openstack-kolla with GlusterFS integration In-Reply-To: References: Message-ID: You want to override "nova_instance_datadir_volume" to point to your mount path, in this case /mnt -yoctozepto On Tue, 4 Jan 2022 at 21:31, Satish Patel wrote: > > Folks, > > Recently I have started playing with the Openstack Kolla Project > because of some requirements. I want to integrate kolla with GlusterFS > for nova vm backend storage. > > I have functional Glsuter storage and want to mount on compute node to > use for vm backend but not sure how do i do that because i am new to > containers > > I can see kolla use following to bind volume with nova_compute container > > { > "Type": "volume", > "Name": "nova_compute", > "Source": "/var/lib/docker/volumes/nova_compute/_data", > "Destination": "/var/lib/nova", > "Driver": "local", > "Mode": "rw", > "RW": true, > "Propagation": "" > } > > If I can mount GlusterFS in /mnt directory then how do I tell docker > to use /mnt to create volume for nova_compute? > From satish.txt at gmail.com Tue Jan 4 20:55:12 2022 From: satish.txt at gmail.com (Satish Patel) Date: Tue, 4 Jan 2022 15:55:12 -0500 Subject: openstack-kolla with GlusterFS integration In-Reply-To: References: Message-ID: Thank you, so just add this override in globals.yml correct? Do I need to destroy and recreate the container? 
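For reference, a minimal sketch of that override, assuming the GlusterFS volume
is already mounted at /mnt on every compute host, is a single line in
/etc/kolla/globals.yml (YAML syntax, i.e. a colon):

nova_instance_datadir_volume: "/mnt"

followed by a re-run of kolla-ansible deploy (or reconfigure) so the
nova_compute container is recreated with /mnt bind-mounted at /var/lib/nova in
place of the named Docker volume.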
On Tue, Jan 4, 2022 at 3:39 PM Rados?aw Piliszek wrote: > > You want to override "nova_instance_datadir_volume" to point to your > mount path, in this case /mnt > > -yoctozepto > > On Tue, 4 Jan 2022 at 21:31, Satish Patel wrote: > > > > Folks, > > > > Recently I have started playing with the Openstack Kolla Project > > because of some requirements. I want to integrate kolla with GlusterFS > > for nova vm backend storage. > > > > I have functional Glsuter storage and want to mount on compute node to > > use for vm backend but not sure how do i do that because i am new to > > containers > > > > I can see kolla use following to bind volume with nova_compute container > > > > { > > "Type": "volume", > > "Name": "nova_compute", > > "Source": "/var/lib/docker/volumes/nova_compute/_data", > > "Destination": "/var/lib/nova", > > "Driver": "local", > > "Mode": "rw", > > "RW": true, > > "Propagation": "" > > } > > > > If I can mount GlusterFS in /mnt directory then how do I tell docker > > to use /mnt to create volume for nova_compute? > > From fungi at yuggoth.org Tue Jan 4 21:20:24 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jan 2022 21:20:24 +0000 Subject: [infra] Missing releases from opendev.org/opendev/git-review/tags In-Reply-To: <20220104143755.439cce67@niphredil.zaitcev.lan> References: <20220104143755.439cce67@niphredil.zaitcev.lan> Message-ID: <20220104212023.cyxeh3djvoozkkeo@yuggoth.org> On 2022-01-04 14:37:55 -0600 (-0600), Pete Zaitcev wrote: > I noticed that the list of tags for git-review in the website > does not include the 2.2.x tags, which exist in git: [...] > The 2.2.0 is missing from > https://opendev.org/opendev/git-review/tags > > Coincidentally, it is also missing from > https://tarballs.opendev.org/openstack/git-review/ > > Is this something that the infrastructure team did on purpose? > Or is this just a bug to be reported? A little of both. For the first URL I suspect that's a bug in Gitea not correctly reflecting mirrored tags. I don't recall noticing this before, but it definitely seems like Gitea is serving that view from its database and not updating whenever a new tag is pushed in. It doesn't seem to only affect the git-review repo. For example, the most recent nova tag displayed is from 2019-06-18. Skimming the OpenDev meeting agendas from around then, it looks like roughly a month later we were rebuilding the Gitea server farm (all the machines currently seem to have creation dates between 2019-07-23 and 2019-07-29), so it's quite probable that recreating the database is the only thing which has been populating the tags table in their DB. We'll have to take a look at it and probably file a bug with the Gitea maintainers. As for the tarballs URL, we moved git-review from the openstack Git namespace to the opendev Git namespace, but don't appear to have set up a redirect for those files on the tarballs site nor moved them to the opendev tarballs directory in AFS like when we moved bindep. I'll work on cleaning that up, thanks for bringing it to my attention! We also moved gating of git-review from the openstack Zuul tenant to the opendev Zuul tenant, where tarballs site uploads are not included in the Python release job (though I'm not opposed to adding it, probably in a child job). 
It's expected that the place to obtain new git-review tarballs is PyPI, as mentioned in our release announcements on the service-announce mailing list: http://lists.opendev.org/pipermail/service-announce/2021-November/000028.html I agree, all of this is a bit confusing, and could stand to get fixed up. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From allison at openinfra.dev Tue Jan 4 21:16:14 2022 From: allison at openinfra.dev (Allison Price) Date: Tue, 4 Jan 2022 15:16:14 -0600 Subject: OpenInfra Live - January 6 at 9am CT Message-ID: <576473DD-1B56-4FD5-89B5-8E2132880C41@openinfra.dev> Hi everyone, This week, I will be hosting the first OpenInfra Live episode of 2022, brought to you by the OpenStack, Kata Containers, Zuul, and StarlingX communities! It's been one year since the formation of the OpenInfra Foundation and the global community has produced more software, more production use cases, more collaboration than ever before. Join us as we look back at 2021, the software milestones, production growth, and momentum across the OpenInfra landscape. Episode: A Year in Review: OpenInfra Highlights from 2021 Date and time: Thursday, January 6 at 9am CT (1500 UTC) You can watch us live on: YouTube LinkedIn Facebook WeChat: recording will be posted on OpenStack WeChat after the live stream Speakers: Ghanshyam Mann, OpenStack TC Chair Tao Peng, Kata Containers Architecture Committee Member James Blair, Zuul Maintainer Greg Waines, StarlingX Technical Steering Committee Member Have an idea for a future episode? Share it now at ideas.openinfra.live . Have you heard that the OpenInfra Summit is going back to Berlin, June 7-9? Registration and sponsorship opportunities are now available here: https://openinfra.dev/summit/ . Thanks, Allison -------------- next part -------------- An HTML attachment was scrubbed... URL: From zaitcev at redhat.com Tue Jan 4 22:13:17 2022 From: zaitcev at redhat.com (Pete Zaitcev) Date: Tue, 4 Jan 2022 16:13:17 -0600 Subject: [infra] Missing releases from opendev.org/opendev/git-review/tags In-Reply-To: <20220104212023.cyxeh3djvoozkkeo@yuggoth.org> References: <20220104143755.439cce67@niphredil.zaitcev.lan> <20220104212023.cyxeh3djvoozkkeo@yuggoth.org> Message-ID: <20220104161317.2403d25d@niphredil.zaitcev.lan> On Tue, 4 Jan 2022 21:20:24 +0000 Jeremy Stanley wrote: Thanks for the detailed explanations, I think I understood everything. One thing caught my attention: > It's expected that the place to obtain > new git-review tarballs is PyPI, as mentioned in our release > announcements on the service-announce mailing list: > > http://lists.opendev.org/pipermail/service-announce/2021-November/000028.html As it happens, just a short time back, I ran into an issue with PyPI.[1] Basically, it's possible to upload something there and nobody knows anything about it. Is that loss of audit trail a concern for our releases? 
-- Pete [1] https://zaitcev.livejournal.com/263602.html From fungi at yuggoth.org Tue Jan 4 23:11:50 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jan 2022 23:11:50 +0000 Subject: [infra] Missing releases from opendev.org/opendev/git-review/tags In-Reply-To: <20220104161317.2403d25d@niphredil.zaitcev.lan> References: <20220104143755.439cce67@niphredil.zaitcev.lan> <20220104212023.cyxeh3djvoozkkeo@yuggoth.org> <20220104161317.2403d25d@niphredil.zaitcev.lan> Message-ID: <20220104231150.msmad6jwikrfpy6j@yuggoth.org> On 2022-01-04 16:13:17 -0600 (-0600), Pete Zaitcev wrote: [...] > As it happens, just a short time back, I ran into an issue with > PyPI.[1] Basically, it's possible to upload something there and > nobody knows anything about it. Is that loss of audit trail a > concern for our releases? > > -- Pete > > [1] https://zaitcev.livejournal.com/263602.html That sounds like one of the nose maintainers uploaded a broken file to PyPI, or someone compromised one of their accounts, or hijacked the upload mechanism they were relying on. I'm not sure it's evidence that PyPI itself is untrustworthy, the same can happen (and has) in other places like NPM... really any artifact registry is susceptible if there are no cryptographic signatures or external checksums to validate the files, or if the compromise happens earlier in automation than where checksums or signatures are generated for that matter. Was the altered code malicious? Did the maintainers publish a security advisory somewhere with details? The PyPI maintainers are generally willing to help investigate such incidents, and are in the process of pushing stronger authentication mechanisms (2FA for logins, separate upload tokens, TUF for artifact attestation). Anyway, back to the original topic, I don't think any of us were strongly against hosting copies of the release tarballs/wheels for OpenDev's Python-based utilities, we just hadn't taken the time to set up jobs to upload them anywhere besides PyPI nor decided on any sort of signing/attestation solution (reuse what we're doing for OpenStack with the OpenStack release signing key? Create a separate OpenDev release signing key and use that? Switch OpenStack's releases to an OpenDev signing key too? Do something other than OpenPGP signatures in the wake of the SKS WoT collapse?). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From zaitcev at redhat.com Tue Jan 4 23:56:46 2022 From: zaitcev at redhat.com (Pete Zaitcev) Date: Tue, 4 Jan 2022 17:56:46 -0600 Subject: [infra] Missing releases from opendev.org/opendev/git-review/tags In-Reply-To: <20220104231150.msmad6jwikrfpy6j@yuggoth.org> References: <20220104143755.439cce67@niphredil.zaitcev.lan> <20220104212023.cyxeh3djvoozkkeo@yuggoth.org> <20220104161317.2403d25d@niphredil.zaitcev.lan> <20220104231150.msmad6jwikrfpy6j@yuggoth.org> Message-ID: <20220104175646.2ced0e7b@niphredil.zaitcev.lan> On Tue, 4 Jan 2022 23:11:50 +0000 Jeremy Stanley wrote: > On 2022-01-04 16:13:17 -0600 (-0600), Pete Zaitcev wrote: > > [1] https://zaitcev.livejournal.com/263602.html > Was the altered code malicious? Not at all. It looked like an adaptation to py3 that went wrong. 
-- Pete From satish.txt at gmail.com Wed Jan 5 04:47:19 2022 From: satish.txt at gmail.com (Satish Patel) Date: Tue, 4 Jan 2022 23:47:19 -0500 Subject: openstack-kolla with GlusterFS integration In-Reply-To: References: Message-ID: This is what I did but it looks like there is a better way to run kolla-ansible on a specific role/service. I have added nova_instance_datadir_volume; /mnt in global.yml and run following command $ kolla-ansible -i all-in-one deploy Above command took 20 minute to finish because it went through each role. but i think i am missing something in documentation to run the above task in a limited way to just push out to a specific container/role or service. In my case it's nova_compute. kolla looks good but the documentation isn't great, especially related maintenance and daily operations. On Tue, Jan 4, 2022 at 3:55 PM Satish Patel wrote: > > Thank you, > > so just add this override in globals.yml correct? Do I need to destroy > and recreate the container? > > On Tue, Jan 4, 2022 at 3:39 PM Rados?aw Piliszek > wrote: > > > > You want to override "nova_instance_datadir_volume" to point to your > > mount path, in this case /mnt > > > > -yoctozepto > > > > On Tue, 4 Jan 2022 at 21:31, Satish Patel wrote: > > > > > > Folks, > > > > > > Recently I have started playing with the Openstack Kolla Project > > > because of some requirements. I want to integrate kolla with GlusterFS > > > for nova vm backend storage. > > > > > > I have functional Glsuter storage and want to mount on compute node to > > > use for vm backend but not sure how do i do that because i am new to > > > containers > > > > > > I can see kolla use following to bind volume with nova_compute container > > > > > > { > > > "Type": "volume", > > > "Name": "nova_compute", > > > "Source": "/var/lib/docker/volumes/nova_compute/_data", > > > "Destination": "/var/lib/nova", > > > "Driver": "local", > > > "Mode": "rw", > > > "RW": true, > > > "Propagation": "" > > > } > > > > > > If I can mount GlusterFS in /mnt directory then how do I tell docker > > > to use /mnt to create volume for nova_compute? > > > From mrunge at matthias-runge.de Wed Jan 5 07:31:51 2022 From: mrunge at matthias-runge.de (Matthias Runge) Date: Wed, 5 Jan 2022 08:31:51 +0100 Subject: [infra] Missing releases from opendev.org/opendev/git-review/tags In-Reply-To: <20220104175646.2ced0e7b@niphredil.zaitcev.lan> References: <20220104143755.439cce67@niphredil.zaitcev.lan> <20220104212023.cyxeh3djvoozkkeo@yuggoth.org> <20220104161317.2403d25d@niphredil.zaitcev.lan> <20220104231150.msmad6jwikrfpy6j@yuggoth.org> <20220104175646.2ced0e7b@niphredil.zaitcev.lan> Message-ID: On Tue, Jan 04, 2022 at 05:56:46PM -0600, Pete Zaitcev wrote: > On Tue, 4 Jan 2022 23:11:50 +0000 > Jeremy Stanley wrote: > > > On 2022-01-04 16:13:17 -0600 (-0600), Pete Zaitcev wrote: > > > > [1] https://zaitcev.livejournal.com/263602.html > > > Was the altered code malicious? > > Not at all. It looked like an adaptation to py3 that went wrong. > So, previously, releases were uploaded to tarballs.opendev.org[3], which is not the case anymore. To me it looks like publishing releases on tarballs.... would solve the immediate issue Pete pointed out. Also it would provide an alternate download option. Is that doable? 
Matthias [3] https://tarballs.opendev.org/openstack/git-review/ -- Matthias Runge From radoslaw.piliszek at gmail.com Wed Jan 5 07:43:32 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Wed, 5 Jan 2022 08:43:32 +0100 Subject: openstack-kolla with GlusterFS integration In-Reply-To: References: Message-ID: On Wed, 5 Jan 2022 at 05:47, Satish Patel wrote: > > This is what I did but it looks like there is a better way to run > kolla-ansible on a specific role/service. > > I have added nova_instance_datadir_volume; /mnt in global.yml and > run following command > > $ kolla-ansible -i all-in-one deploy > > Above command took 20 minute to finish because it went through each > role. but i think i am missing something in documentation to run the > above task in a limited way to just push out to a specific > container/role or service. In my case it's nova_compute. kolla looks > good but the documentation isn't great, especially related maintenance > and daily operations. Yes, all comments are correct. This is how it is supposed to work (i.e., change globals.yml and re-run deploy/reconfigure). As for a more optimal approach, you can use --tags to limit the number of plays, e.g., in your case: kolla-ansible -i all-in-one deploy --tags nova-cell I agree this could be better documented. -yoctozepto > On Tue, Jan 4, 2022 at 3:55 PM Satish Patel wrote: > > > > Thank you, > > > > so just add this override in globals.yml correct? Do I need to destroy > > and recreate the container? > > > > On Tue, Jan 4, 2022 at 3:39 PM Rados?aw Piliszek > > wrote: > > > > > > You want to override "nova_instance_datadir_volume" to point to your > > > mount path, in this case /mnt > > > > > > -yoctozepto > > > > > > On Tue, 4 Jan 2022 at 21:31, Satish Patel wrote: > > > > > > > > Folks, > > > > > > > > Recently I have started playing with the Openstack Kolla Project > > > > because of some requirements. I want to integrate kolla with GlusterFS > > > > for nova vm backend storage. > > > > > > > > I have functional Glsuter storage and want to mount on compute node to > > > > use for vm backend but not sure how do i do that because i am new to > > > > containers > > > > > > > > I can see kolla use following to bind volume with nova_compute container > > > > > > > > { > > > > "Type": "volume", > > > > "Name": "nova_compute", > > > > "Source": "/var/lib/docker/volumes/nova_compute/_data", > > > > "Destination": "/var/lib/nova", > > > > "Driver": "local", > > > > "Mode": "rw", > > > > "RW": true, > > > > "Propagation": "" > > > > } > > > > > > > > If I can mount GlusterFS in /mnt directory then how do I tell docker > > > > to use /mnt to create volume for nova_compute? > > > > From sbauza at redhat.com Wed Jan 5 08:11:50 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 5 Jan 2022 09:11:50 +0100 Subject: [nova][placement] Spec approval freeze punted to Jan13st Message-ID: Hi folks, As agreed on yesterday's nova meeting [1], we accepted to punt the spec approval freeze for one week, ie. at January 13st. That said, we won't accept any exception for a spec approval after this day as we will only have 6 weeks after Jan 13st until FeatureFreeze. Thanks (and happy new 2022 by the way), -Sylvain [1] https://meetings.opendev.org/meetings/nova/2022/nova.2022-01-04-16.00.log.html#l-73 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at stackhpc.com Wed Jan 5 09:23:17 2022 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jan 2022 09:23:17 +0000 Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: <33441648.1434581.1641304881681@mail.yahoo.com> References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> <33441648.1434581.1641304881681@mail.yahoo.com> Message-ID: On Tue, 4 Jan 2022 at 14:08, Albert Braden wrote: > > Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl" ? John Garbutt proposed a few patches for RabbitMQ in kolla, including this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191 https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+(status:open+OR+status:merged)+project:openstack/kolla-ansible Note that they are currently untested. Mark > On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden wrote: > > > I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2: > > "policies":[ > {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, > {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0} > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0} > {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} > {% endif %} > > But I still see unconsumed messages lingering in notifications_extractor.info. From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing? > On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden wrote: > > > Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes? > > [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud wrote: > > > So, your config snippet LGTM. > > Le ven. 10 d?c. 2021 ? 17:50, Albert Braden a ?crit : > > Sorry, that was a transcription error. I thought "True" and my fingers typed "False." The correct lines are: > > [oslo_messaging_rabbit] > amqp_durable_queues = True > > On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud wrote: > > > If you plan to let `amqp_durable_queues = False` (i.e if you plan to keep this config equal to false), then you don't need to add these config lines as this is already the default value [1]. > > [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 > > Le jeu. 9 d?c. 2021 ? 
22:40, Albert Braden a ?crit : > > Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management. > > I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf: > > [oslo_messaging_rabbit] > amqp_durable_queues = False > > Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? What would that look like in the config? > > > From: Herve Beraud > Sent: Thursday, December 9, 2021 2:45 AM > To: Bogdan Dobrelya > Cc: openstack-discuss at lists.openstack.org > Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability > > > > Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. > > > > > > Le mer. 8 d?c. 2021 ? 11:48, Bogdan Dobrelya a ?crit : > > Please see inline > > >> I read this with great interest because we are seeing this issue. Questions: > >> > >> 1. We are running kola-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? > >> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like? > >> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into? > > > > Note that even having rabbit HA policies adjusted like that and its HA > > replication factor [0] decreased (e.g. to a 2), there still might be > > high churn caused by a large enough number of replicated durable RPC > > topic queues. And that might cripple the cloud down with the incurred > > I/O overhead because a durable queue requires all messages in it to be > > persisted to a disk (for all the messaging cluster replicas) before they > > are ack'ed by the broker. > > > > Given that said, Oslo messaging would likely require a more granular > > control for topic exchanges and the durable queues flag - to tell it to > > declare as durable only the most critical paths of a service. A single > > config setting and a single control exchange per a service might be not > > enough. > > Also note that therefore, amqp_durable_queue=True requires dedicated > control exchanges configured for each service. Those that use > 'openstack' as a default cannot turn the feature ON. Changing it to a > service specific might also cause upgrade impact, as described in the > topic [3]. > > > > The same is true for `amqp_auto_delete=True`. That requires dedicated control exchanges else it won't work if each service defines its own policy on a shared control exchange (e.g `openstack`) and if policies differ from each other. > > > > [3] https://review.opendev.org/q/topic:scope-config-opts > > > > > There are also race conditions with durable queues enabled, like [1]. A > > solution could be where each service declare its own dedicated control > > exchange with its own configuration. 
> > > > Finally, openstack components should add perhaps a *.next CI job to test > > it with durable queues, like [2] > > > > [0] https://www.rabbitmq.com/ha.html#replication-factor > > > > [1] > > https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt > > > > [2] https://review.opendev.org/c/openstack/nova/+/820523 > > > >> > >> Does anyone have a sample set of RMQ config files that they can share? > >> > >> It looks like my Outlook has ruined the link; reposting: > >> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > > > > > -- > > Best regards, > > Bogdan Dobrelya, > > Irc #bogdando > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > https://twitter.com/4383hberaud > > > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > https://twitter.com/4383hberaud > From Danny.Webb at thehutgroup.com Wed Jan 5 09:30:06 2022 From: Danny.Webb at thehutgroup.com (Danny Webb) Date: Wed, 5 Jan 2022 09:30:06 +0000 Subject: [kolla] Updating libvirt container images without VM downtime In-Reply-To: <36d9fbdb-a582-a766-2263-823ed5ab4959@planethoster.info> References: <36d9fbdb-a582-a766-2263-823ed5ab4959@planethoster.info> Message-ID: If working properly the restart of the libvirtd container is a non-impacting action for running VMs. The only containers on the hypervisors that have an actual impact on the VMs in the standard setups are the restart of the ovs-vswitchd / ovn-controller / ovsdb containers which result in a small blip of the VM neworks that we've noticed. ________________________________ From: J-P Methot Sent: 04 January 2022 19:17 To: openstack-discuss Subject: [kolla] Updating libvirt container images without VM downtime CAUTION: This email originates from outside THG Hi, I'm looking for validation regarding the way Kolla and containers work in regard to upgrading the libvirt containers. Essentially, when you upgrade the libvirt container to a new container image, the container needs to be restarted, thus creating downtime for the VMs. There is no way to avoid this downtime, unless you migrate the VMs to another node and then move them back once the container has restarted, right? -- Jean-Philippe M?thot Senior Openstack system administrator Administrateur syst?me Openstack s?nior PlanetHoster inc. Danny Webb Senior Linux Systems Administrator The Hut Group Tel: Email: Danny.Webb at thehutgroup.com For the purposes of this email, the "company" means The Hut Group Limited, a company registered in England and Wales (company number 6539496) whose registered office is at Fifth Floor, Voyager House, Chicago Avenue, Manchester Airport, M90 3DQ and/or any of its respective subsidiaries. Confidentiality Notice This e-mail is confidential and intended for the use of the named recipient only. If you are not the intended recipient please notify us by telephone immediately on +44(0)1606 811888 or return it to us by e-mail. Please then delete it from your system and note that any use, dissemination, forwarding, printing or copying is strictly prohibited. Any views or opinions are solely those of the author and do not necessarily represent those of the company. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kkchn.in at gmail.com Wed Jan 5 10:54:45 2022 From: kkchn.in at gmail.com (KK CHN) Date: Wed, 5 Jan 2022 16:24:45 +0530 Subject: Logical Volume migration to a New Openstack environment Message-ID:

List,

I am in need to migrate a Virtual machine ( running in an old OpenStack
environment with only one controller). This VM has its own disk file + a
logical volume attached to it with application data on it. Only Cinder
backend( No rbd/ceph backend configured ) for this environment.

Now I am migrating this Virtual machine to another OpenStack(ussuri with
three controllers, three compute nodes and three storage nodes) environment
with ceph/rbd backend.

I am able to export the VM with its disk file and import it to the other
openstack environment and able to boot the machine.

But How to export the attached logical volume from the old VM and attach
to the VM in the other environment ?

what is the methodology to follow to export this logical volume in the
old openstack VM and import to the new VM running in the other openstack
environment ?

kindly share your suggestions/expertise to perform this

Thanks in advance for your valuable guidance.
Krish
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From smooney at redhat.com Wed Jan 5 11:14:32 2022 From: smooney at redhat.com (Sean Mooney) Date: Wed, 05 Jan 2022 11:14:32 +0000 Subject: Logical Volume migration to a New Openstack environment In-Reply-To: References: Message-ID: <46597556863537f29bf6a432395ecc3a1f1d81eb.camel@redhat.com>

On Wed, 2022-01-05 at 16:24 +0530, KK CHN wrote:
> List,
>
> I am in need to migrate a Virtual machine ( running in an old OpenStack
> environment with only one controller). This VM has its own disk file + a
> logical volume attached to it with application data on it. Only Cinder
> backend( No rbd/ceph backend configured ) for this environment.
>
> Now I am migrating this Virtual machine to another OpenStack(ussuri with
> three controllers, three compute nodes and three storage nodes) environment
> with ceph/rbd backend.
>
> I am able to export the VM with its disk file and import it to the other
> openstack environment and able to boot the machine.
>
> But How to export the attached logical volume from the old VM and attach
> to the VM in the other environment ?
if you have to do this with more than one VM you might want to look at
https://github.com/os-migrate/os-migrate
which is being created to automate migrating workloads between clouds. it will copy the volume in a more efficient way by directly copying the data from the source cloud to the destination without creating images.

> what is the methodology to follow to export this logical volume in the
> old openstack VM and import to the new VM running in the other openstack
> environment ?
the manual procedure is to convert it to a glance image and download and upload it. that only works if glance is not using volume backed images.

os-migrate will create a migration vm in each cloud, detach the volume from its current vm on the source cloud then attach it to the migration vm. on the dest cloud it will create a new empty volume and attach it to the migration vm. then it basically rsyncs the data from one migration vm to the other.

>
> kindly share your suggestions/expertise to perform this
>
> Thanks in advance for your valuable guidance.
> Krish

From marios at redhat.com  Wed Jan  5 12:48:35 2022
From: marios at redhat.com (Marios Andreou)
Date: Wed, 5 Jan 2022 14:48:35 +0200
Subject: [TripleO] Douglas Viroel for tripleo-ci core
In-Reply-To: <20220104172238.745wrbo2rpggfcet@yuggoth.org>
References: <20220104172238.745wrbo2rpggfcet@yuggoth.org>
Message-ID: 

On Tue, Jan 4, 2022 at 7:27 PM Jeremy Stanley wrote:
>
> On 2022-01-04 12:00:53 -0500 (-0500), Jason Poulin wrote:
> > Can someone take me off this list. I don't know why I'm on it. Please.
> [...]
>
> I've unsubscribed this user; it appears an attacker managed to
> brute-force a mailman confirmation key for a subscription request.
> This hole should hopefully be plugged once we migrate to Mailman v3,
> which employs stronger hashes for subscription confirmations.

thanks fungi for looking into that and removing that person
but does it mean we potentially have more folks being spammed by us on
a regular basis :/
is there a way to know all the addresses that were subscribed in this
way and remove them all?

regards, marios

> --
> Jeremy Stanley

From marios at redhat.com  Wed Jan  5 12:50:24 2022
From: marios at redhat.com (Marios Andreou)
Date: Wed, 5 Jan 2022 14:50:24 +0200
Subject: [TripleO] Douglas Viroel for tripleo-ci core
In-Reply-To: 
References: <20220104172238.745wrbo2rpggfcet@yuggoth.org>
Message-ID: 

On Wed, Jan 5, 2022 at 2:48 PM Marios Andreou wrote:
>
> On Tue, Jan 4, 2022 at 7:27 PM Jeremy Stanley wrote:
> >
> > On 2022-01-04 12:00:53 -0500 (-0500), Jason Poulin wrote:
> > > Can someone take me off this list. I don't know why I'm on it. Please.
> > [...]
> >
> > I've unsubscribed this user; it appears an attacker managed to
> > brute-force a mailman confirmation key for a subscription request.
> > This hole should hopefully be plugged once we migrate to Mailman v3,
> > which employs stronger hashes for subscription confirmations.
>
> thanks fungi for looking into that and removing that person
> but does it mean we potentially have more folks being spammed by us on
> a regular basis :/
> is there a way to know all the addresses that were subscribed in this
> way and remove them all?
>

sorry... am guessing you would have done it already if there were a way...
Asking all subscribers to validate their address/subscription would be a big pain... but how else can we address it?
> regards, marios > > > -- > > Jeremy Stanley From fungi at yuggoth.org Wed Jan 5 13:47:38 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jan 2022 13:47:38 +0000 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: <20220104172238.745wrbo2rpggfcet@yuggoth.org> Message-ID: <20220105134738.minljqhsngvoht5h@yuggoth.org> On 2022-01-05 14:48:35 +0200 (+0200), Marios Andreou wrote: > thanks fungi for looking into that and removing that person but > does it mean we potentially have more folks being spammed by us on > a regular basis :/ Yes, I clean them up when they come to my attention. > is there a way to know all the addresses that were subscribed in > this way and remove them all? Not easily, because it's exploiting the subscription confirmation mechanism in Mailman, so it's indistinguishable from someone who received the confirmation message and followed the URL or replied. Usually the only way I can tell is that an address appears to have attempted to subscribe to a very large number of mailing lists (most/all published lists we host) but only one or two actually get confirmed. I'm trying to put together a heuristic to identify people who seem to have been subscribed under those circumstances via log analysis. The routine used to generate the cryptographic hash which serves as a confirmation token is too weak/short, and a (small) percentage of them are brute-forcible in a matter of hours by a determined attacker. We're working on an upgrade to Mailman 3, which uses much stronger authentication and confirmation tokens. I'm hoping we'll have it ready within a few months, but the migration will be somewhat disruptive as well since it's a rewrite of much of the underlying platform. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From senrique at redhat.com Wed Jan 5 14:00:58 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 5 Jan 2022 11:00:58 -0300 Subject: [cinder] Bug deputy report for week of 01-05-2022 Message-ID: No meeting today. This is a bug report from 12-29-2021 to 01-05-2022. Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting ----------------------------------------------------------------------------------------- No bugs in this period :P Cheers, -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From marios at redhat.com Wed Jan 5 14:06:32 2022 From: marios at redhat.com (Marios Andreou) Date: Wed, 5 Jan 2022 16:06:32 +0200 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: <20220105134738.minljqhsngvoht5h@yuggoth.org> References: <20220104172238.745wrbo2rpggfcet@yuggoth.org> <20220105134738.minljqhsngvoht5h@yuggoth.org> Message-ID: On Wed, Jan 5, 2022 at 3:55 PM Jeremy Stanley wrote: > > On 2022-01-05 14:48:35 +0200 (+0200), Marios Andreou wrote: > > thanks fungi for looking into that and removing that person but > > does it mean we potentially have more folks being spammed by us on > > a regular basis :/ > > Yes, I clean them up when they come to my attention. > > > is there a way to know all the addresses that were subscribed in > > this way and remove them all? 
> > Not easily, because it's exploiting the subscription confirmation > mechanism in Mailman, so it's indistinguishable from someone who > received the confirmation message and followed the URL or replied. > Usually the only way I can tell is that an address appears to have > attempted to subscribe to a very large number of mailing lists > (most/all published lists we host) but only one or two actually get > confirmed. I'm trying to put together a heuristic to identify people > who seem to have been subscribed under those circumstances via log > analysis. sounds neat (identifying those subscriptions in this way) ;) > > The routine used to generate the cryptographic hash which serves as > a confirmation token is too weak/short, and a (small) percentage of > them are brute-forcible in a matter of hours by a determined > attacker. We're working on an upgrade to Mailman 3, which uses much > stronger authentication and confirmation tokens. I'm hoping we'll > have it ready within a few months, but the migration will be > somewhat disruptive as well since it's a rewrite of much of the > underlying platform. thanks for taking the time to explain regards > -- > Jeremy Stanley From eblock at nde.ag Wed Jan 5 14:18:05 2022 From: eblock at nde.ag (Eugen Block) Date: Wed, 05 Jan 2022 14:18:05 +0000 Subject: Logical Volume migration to a New Openstack environment In-Reply-To: <46597556863537f29bf6a432395ecc3a1f1d81eb.camel@redhat.com> References: <46597556863537f29bf6a432395ecc3a1f1d81eb.camel@redhat.com> Message-ID: <20220105141805.Horde.jh4RYmPK2zvoQPPdT_xtlwc@webmail.nde.ag> You can import both image files (OS and LVM) into ceph with rbd import, then use 'cinder manage' to make them available as volumes. Then create a new instance from volume and attach the second volume to it. That should work without having to create glance images first. Zitat von Sean Mooney : > On Wed, 2022-01-05 at 16:24 +0530, KK CHN wrote: >> List, >> >> I am in need to migrate a Virtual machine ( running in an old OpenStack >> environment with only one controller). This VM has its own disk file + a >> logical volume attached to it with application data on it. Only Cinder >> backend( No rbd/ceph backend configured ) for this environment. >> >> Now I am migrating this Virtual machine to another OpenStack(ussuri with >> three controllers, three compute nodes and three storage nodes) environment >> with ceph/rbd backend. >> >> I am able to export the VM with its disk file and import it to the other >> openstack environment and able to boot the machine. >> >> But How to export the attached logical volume from the old VM and attach >> to the VM in the other environment ? > if its just a singel vm you can convert teh volume to an image then > download and upload it to the > the new cloud and finally create a volume form that image. > > if you have to do this wiht more then one vm you might want to look at > https://github.com/os-migrate/os-migrate > > which is being created to automate migrating workloads betwen clouds > it will copy the volume in a more efficent way by directly copying > the data form the source could to the destination witout creating > images. > >> what is the methodology to follow to export this logical volume in the >> old openstack VM and import to the new VM running in the other openstack >> environment ? > the manual procedure is to convert it to a glance image and download > and upload it. that only works if glance is not useing volume backed > images. 
> os-migrate will create a migration vm in each cloud, detach the > volume from its current vm on the source cloud then attach it to the > migration vm > on the dest cloud it will create a new empty volume and attch it to > the might vm. > then it basicly rsyncs the datat from one migration vm to the other. > >> >> kindly share your suggestions/expertise to perform this >> >> Thanks in advance for your valuable guidance. >> Krish From fungi at yuggoth.org Wed Jan 5 14:24:49 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jan 2022 14:24:49 +0000 Subject: [infra] Missing releases from opendev.org/opendev/git-review/tags In-Reply-To: References: <20220104143755.439cce67@niphredil.zaitcev.lan> <20220104212023.cyxeh3djvoozkkeo@yuggoth.org> <20220104161317.2403d25d@niphredil.zaitcev.lan> <20220104231150.msmad6jwikrfpy6j@yuggoth.org> <20220104175646.2ced0e7b@niphredil.zaitcev.lan> Message-ID: <20220105142449.brosxdj554mefze2@yuggoth.org> On 2022-01-05 08:31:51 +0100 (+0100), Matthias Runge wrote: [...] > So, previously, releases were uploaded to tarballs.opendev.org[3], > which is not the case anymore. > > To me it looks like publishing releases on tarballs.... would solve > the immediate issue Pete pointed out. Also it would provide an alternate > download option. > > Is that doable? > > [3] https://tarballs.opendev.org/openstack/git-review/ It's definitely doable, as we do it for other projects outside the opendev Zuul tenant, e.g. those in the openstack Zuul tenant. And technically we do upload tarballs for git-review to the tarballs site, but currently only branch tip tarballs not release tarballs. Later this week or early next I hope to move the prior artifacts at that URL and set up a redirect to do open{stack->dev}/git-review there as a first step, since that's where any future tarballs would appear anyway. Once more of the OpenDev sysadmins are back from winter holidays, I'll make sure there aren't any objections to adding release tarball uploads for our Python-based tools (git-review, bindep, et cetera), and work out the most elegant solution from a job perspective. We can punt on the signing part initially, I expect, and separately decide what sor of solution and signing key we might want for that. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From satish.txt at gmail.com Wed Jan 5 14:24:55 2022 From: satish.txt at gmail.com (Satish Patel) Date: Wed, 5 Jan 2022 09:24:55 -0500 Subject: openstack-kolla with GlusterFS integration In-Reply-To: References: Message-ID: Thank you! How do I make sure this change is only going to apply to compute nodes and not api nodes? i believe global is for everyone so curious how to put this change only for compute nodes? On Wed, Jan 5, 2022 at 2:43 AM Rados?aw Piliszek wrote: > > On Wed, 5 Jan 2022 at 05:47, Satish Patel wrote: > > > > This is what I did but it looks like there is a better way to run > > kolla-ansible on a specific role/service. > > > > I have added nova_instance_datadir_volume; /mnt in global.yml and > > run following command > > > > $ kolla-ansible -i all-in-one deploy > > > > Above command took 20 minute to finish because it went through each > > role. but i think i am missing something in documentation to run the > > above task in a limited way to just push out to a specific > > container/role or service. In my case it's nova_compute. 
kolla looks > > good but the documentation isn't great, especially related maintenance > > and daily operations. > > Yes, all comments are correct. This is how it is supposed to work > (i.e., change globals.yml and re-run deploy/reconfigure). > As for a more optimal approach, you can use --tags to limit the number > of plays, e.g., in your case: > > kolla-ansible -i all-in-one deploy --tags nova-cell > > I agree this could be better documented. > > -yoctozepto > > > On Tue, Jan 4, 2022 at 3:55 PM Satish Patel wrote: > > > > > > Thank you, > > > > > > so just add this override in globals.yml correct? Do I need to destroy > > > and recreate the container? > > > > > > On Tue, Jan 4, 2022 at 3:39 PM Rados?aw Piliszek > > > wrote: > > > > > > > > You want to override "nova_instance_datadir_volume" to point to your > > > > mount path, in this case /mnt > > > > > > > > -yoctozepto > > > > > > > > On Tue, 4 Jan 2022 at 21:31, Satish Patel wrote: > > > > > > > > > > Folks, > > > > > > > > > > Recently I have started playing with the Openstack Kolla Project > > > > > because of some requirements. I want to integrate kolla with GlusterFS > > > > > for nova vm backend storage. > > > > > > > > > > I have functional Glsuter storage and want to mount on compute node to > > > > > use for vm backend but not sure how do i do that because i am new to > > > > > containers > > > > > > > > > > I can see kolla use following to bind volume with nova_compute container > > > > > > > > > > { > > > > > "Type": "volume", > > > > > "Name": "nova_compute", > > > > > "Source": "/var/lib/docker/volumes/nova_compute/_data", > > > > > "Destination": "/var/lib/nova", > > > > > "Driver": "local", > > > > > "Mode": "rw", > > > > > "RW": true, > > > > > "Propagation": "" > > > > > } > > > > > > > > > > If I can mount GlusterFS in /mnt directory then how do I tell docker > > > > > to use /mnt to create volume for nova_compute? > > > > > From radoslaw.piliszek at gmail.com Wed Jan 5 15:37:17 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Wed, 5 Jan 2022 16:37:17 +0100 Subject: openstack-kolla with GlusterFS integration In-Reply-To: References: Message-ID: On Wed, 5 Jan 2022 at 15:25, Satish Patel wrote: > > Thank you! > > How do I make sure this change is only going to apply to compute nodes > and not api nodes? i believe global is for everyone so curious how to > put this change only for compute nodes? Not sure what you are asking about. nova-cell applies only to computes. nova_instance_datadir_volume is also only used on computes. So either way no effect on API services. If you are asking in general (about some other possible variable), you can use the ansible inventory group vars and host vars instead of globals to configure certain aspects depending on the target. -yoctozepto > On Wed, Jan 5, 2022 at 2:43 AM Rados?aw Piliszek > wrote: > > > > On Wed, 5 Jan 2022 at 05:47, Satish Patel wrote: > > > > > > This is what I did but it looks like there is a better way to run > > > kolla-ansible on a specific role/service. > > > > > > I have added nova_instance_datadir_volume; /mnt in global.yml and > > > run following command > > > > > > $ kolla-ansible -i all-in-one deploy > > > > > > Above command took 20 minute to finish because it went through each > > > role. but i think i am missing something in documentation to run the > > > above task in a limited way to just push out to a specific > > > container/role or service. In my case it's nova_compute. 
kolla looks > > > good but the documentation isn't great, especially related maintenance > > > and daily operations. > > > > Yes, all comments are correct. This is how it is supposed to work > > (i.e., change globals.yml and re-run deploy/reconfigure). > > As for a more optimal approach, you can use --tags to limit the number > > of plays, e.g., in your case: > > > > kolla-ansible -i all-in-one deploy --tags nova-cell > > > > I agree this could be better documented. > > > > -yoctozepto > > > > > On Tue, Jan 4, 2022 at 3:55 PM Satish Patel wrote: > > > > > > > > Thank you, > > > > > > > > so just add this override in globals.yml correct? Do I need to destroy > > > > and recreate the container? > > > > > > > > On Tue, Jan 4, 2022 at 3:39 PM Rados?aw Piliszek > > > > wrote: > > > > > > > > > > You want to override "nova_instance_datadir_volume" to point to your > > > > > mount path, in this case /mnt > > > > > > > > > > -yoctozepto > > > > > > > > > > On Tue, 4 Jan 2022 at 21:31, Satish Patel wrote: > > > > > > > > > > > > Folks, > > > > > > > > > > > > Recently I have started playing with the Openstack Kolla Project > > > > > > because of some requirements. I want to integrate kolla with GlusterFS > > > > > > for nova vm backend storage. > > > > > > > > > > > > I have functional Glsuter storage and want to mount on compute node to > > > > > > use for vm backend but not sure how do i do that because i am new to > > > > > > containers > > > > > > > > > > > > I can see kolla use following to bind volume with nova_compute container > > > > > > > > > > > > { > > > > > > "Type": "volume", > > > > > > "Name": "nova_compute", > > > > > > "Source": "/var/lib/docker/volumes/nova_compute/_data", > > > > > > "Destination": "/var/lib/nova", > > > > > > "Driver": "local", > > > > > > "Mode": "rw", > > > > > > "RW": true, > > > > > > "Propagation": "" > > > > > > } > > > > > > > > > > > > If I can mount GlusterFS in /mnt directory then how do I tell docker > > > > > > to use /mnt to create volume for nova_compute? > > > > > > From smooney at redhat.com Wed Jan 5 16:02:25 2022 From: smooney at redhat.com (Sean Mooney) Date: Wed, 05 Jan 2022 16:02:25 +0000 Subject: [nova][dev] Revisiting qemu emulation where guest arch != host arch In-Reply-To: References: Message-ID: <70d98af3002a53f4b6ed74f62dd7b7ad06b23f52.camel@redhat.com> On Wed, 2020-07-15 at 14:17 +0000, Apsey, Christopher wrote: > All, > > A few years ago I asked a question[1] about why nova, when given a hw_architecture property from glance for an image, would not end up using the correct qemu-system-xx binary when starting the guest process on a compute node if that compute nodes architecture did not match the proposed guest architecture. As an example, if we had all x86 hosts, but wanted to run an emulated ppc guest, we should be able to do that given that at least one compute node had qemu-system-ppc already installed and libvirt was successfully reporting that as a supported architecture to nova. It seemed like a heavy lift at the time, so it was put on the back burner. > > I am now in a position to fund a contract developer to make this happen, so the question is: would this be a useful blueprint that would potentially be accepted? > yes, i cant really speak to how much use it would get or how useful it woudl be to the majoriy fo user but i would be supportive of adding this capablity. We have to be a little carful to get the design wright for example we might want to differnciate between the native architecture and emulated architectures e.g. 
use HW_ARCH_X86 to idenfiy the host as being x86 and COMPUTE_ARCH_X86 for emulated with the new "in" suppot being added to placment we can use required=in:HW_ARCH_X86,COMPUTE_ARCH_X86 in cases where you dont care and by default if you wanted native only you could use the HW_ARCH_* traits in the image and we can have a prefileter add the both triats by default if the architrue is set in the iamge and there is not arch trait in the flavor or image. i will certenly review a spec if you proporse one but you might now have time to get it approved this cycle the spec deadlien woudl have been thursday btu it has been moved to next week. > Most of the time when people want to run an emulated guest they would just nest it inside of an already running guest of the native architecture, but that severely limits observability and the task of managing any more than a handful of instances in this manner quickly becomes a tangled nightmare of networking, etc. I see real benefit in allowing this scenario to run natively so all of the tooling that exists for fleet management 'just works'. This would also be a significant differentiator for OpenStack as a whole. > > Thoughts? > > [1] > http://lists.openstack.org/pipermail/openstack-operators/2018-August/015653.html > > Chris Apsey > Director | Georgia Cyber Range > GEORGIA CYBER CENTER > > 100 Grace Hopper Lane | Augusta, Georgia | 30901 > https://www.gacybercenter.org > From thierry at openstack.org Wed Jan 5 17:40:15 2022 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 5 Jan 2022 18:40:15 +0100 Subject: [largescale-sig] Next meeting: Jan 5th, 15utc In-Reply-To: <15b297bf-465b-dea2-51e1-ba570aea293c@openstack.org> References: <15b297bf-465b-dea2-51e1-ba570aea293c@openstack.org> Message-ID: <8175e101-f044-d87d-4e86-77274c89c474@openstack.org> We held our meeting today! We discussed 2021 actions and our plans for 2022, up to the Berlin summit. In particular, we discussed introducing a recurring format for our "Large Scale OpenStack" episodes on OpenInfra.Live: have more of a podcast format, where regular hosts invite a special guest each episode, and discuss their large scale deployment and operational concerns. We plan to prototype this new format on OpenInfra Live on Feb 3rd. You can read the meeting logs at: https://meetings.opendev.org/meetings/large_scale_sig/2022/large_scale_sig.2022-01-05-15.00.html Our next IRC meeting will be January 19, at 1500utc on #openstack-operators on OFTC. Regards, -- Thierry Carrez (ttx) From fungi at yuggoth.org Wed Jan 5 22:08:58 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jan 2022 22:08:58 +0000 Subject: [infra] Missing releases from opendev.org/opendev/git-review/tags In-Reply-To: <20220104212023.cyxeh3djvoozkkeo@yuggoth.org> References: <20220104143755.439cce67@niphredil.zaitcev.lan> <20220104212023.cyxeh3djvoozkkeo@yuggoth.org> Message-ID: <20220105220858.qdme6kta4pwboywd@yuggoth.org> On 2022-01-04 21:20:24 +0000 (+0000), Jeremy Stanley wrote: > On 2022-01-04 14:37:55 -0600 (-0600), Pete Zaitcev wrote: > > I noticed that the list of tags for git-review in the website > > does not include the 2.2.x tags, which exist in git: > [...] > > The 2.2.0 is missing from > > https://opendev.org/opendev/git-review/tags [...] > For the first URL I suspect that's a bug in Gitea not correctly > reflecting mirrored tags. I don't recall noticing this before, but > it definitely seems like Gitea is serving that view from its > database and not updating whenever a new tag is pushed in. 
It > doesn't seem to only affect the git-review repo. For example, the > most recent nova tag displayed is from 2019-06-18. Skimming the > OpenDev meeting agendas from around then, it looks like roughly a > month later we were rebuilding the Gitea server farm (all the > machines currently seem to have creation dates between 2019-07-23 > and 2019-07-29), so it's quite probable that recreating the database > is the only thing which has been populating the tags table in their > DB. We'll have to take a look at it and probably file a bug with the > Gitea maintainers. [...] Update on this part: After discussing briefly with clarkb in #opendev he reminded me that this is the "bad" tarballs tab we want to hide in Gitea (along with its releases tab). The branch drop-down has its own tags tab which should contain a correct and complete list of them for selecting. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gmann at ghanshyammann.com Thu Jan 6 00:04:25 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 05 Jan 2022 18:04:25 -0600 Subject: [all][tc] Technical Committee next weekly meeting on Jan 6th at 1500 UTC In-Reply-To: <17e2215043e.10f43b8a730636.5648333992896462439@ghanshyammann.com> References: <17e2215043e.10f43b8a730636.5648333992896462439@ghanshyammann.com> Message-ID: <17e2cb3761f.122717ae267159.3814806298985459248@ghanshyammann.com> Hello Everyone, Below is the agenda for Tomorrow's TC IRC meeting schedule at 1500 UTC. Amy will chair tomorrow's meeting. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting == Agenda for tomorrow's TC meeting == * Roll call * Follow up on past action items * Gate health check ** Fixing Zuul config error in OpenStack *** https://etherpad.opendev.org/p/zuul-config-error-openstack * SIG i18n status check ** Xena translation missing *** http://lists.openstack.org/pipermail/openstack-discuss/2021-December/026244.html ** Translation bug *** https://review.opendev.org/c/openstack/contributor-guide/+/821371 * Adjutant need PTLs and maintainers ** http://lists.openstack.org/pipermail/openstack-discuss/2021-October/025555.html * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open -gmann ---- On Mon, 03 Jan 2022 16:35:09 -0600 Ghanshyam Mann wrote ---- > Hello Everyone, > > Technical Committee's next weekly meeting is scheduled for Jan 6th at 1500 UTC. > > If you would like to add topics for discussion, please add them to the below wiki page by > Wednesday, Jan 5th, at 2100 UTC. > > https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting > > -gmann > > From JRACE at augusta.edu Wed Jan 5 21:55:31 2022 From: JRACE at augusta.edu (Race, Jonathan) Date: Wed, 5 Jan 2022 21:55:31 +0000 Subject: [nova][dev] Revisiting qemu emulation where guest arch != host arch Message-ID: That makes sense which is the best method I saw as well. For your example "use HW_ARCH_X86 to idenfiy the host as being x86 and COMPUTE_ARCH_X86 for emulated", I actually implemented a new field called hw_emulation_architecture as to not conflict with any existing nova operations and to eliminate confusion. This allows for setting this specific value as an image meta property in glance, and if it is set then it will execute emulation via qemu for the designated architecture. If it is not set then operations will occur as normal. 
https://review.opendev.org/c/openstack/nova/+/822053 is really just an initial response for the bug, that was submitted. And our long-term goal is to collab on this existing blueprint https://blueprints.launchpad.net/nova/+spec/pick-guest-arch-based-on-host-arch-in-libvirt-driver for a more full set of capabilities. Jonathan Race Cyber Range Engineer | Georgia Cyber Range GEORGIA CYBER CENTER On Wed, 2020-07-15 at 14:17 +0000, Apsey, Christopher wrote: > All, > > A few years ago I asked a question[1] about why nova, when given a hw_architecture property from glance for an image, would not end up using the correct qemu-system-xx binary when starting the guest process on a compute node if that compute nodes architecture did not match the proposed guest architecture. As an example, if we had all x86 hosts, but wanted to run an emulated ppc guest, we should be able to do that given that at least one compute node had qemu-system-ppc already installed and libvirt was successfully reporting that as a supported architecture to nova. It seemed like a heavy lift at the time, so it was put on the back burner. > > I am now in a position to fund a contract developer to make this happen, so the question is: would this be a useful blueprint that would potentially be accepted? > yes, i cant really speak to how much use it would get or how useful it woudl be to the majoriy fo user but i would be supportive of adding this capablity. We have to be a little carful to get the design wright for example we might want to differnciate between the native architecture and emulated architectures e.g. use HW_ARCH_X86 to idenfiy the host as being x86 and COMPUTE_ARCH_X86 for emulated with the new "in" suppot being added to placment we can use required=in:HW_ARCH_X86,COMPUTE_ARCH_X86 in cases where you dont care and by default if you wanted native only you could use the HW_ARCH_* traits in the image and we can have a prefileter add the both triats by default if the architrue is set in the iamge and there is not arch trait in the flavor or image. i will certenly review a spec if you proporse one but you might now have time to get it approved this cycle the spec deadlien woudl have been thursday btu it has been moved to next week. > Most of the time when people want to run an emulated guest they would just nest it inside of an already running guest of the native architecture, but that severely limits observability and the task of managing any more than a handful of instances in this manner quickly becomes a tangled nightmare of networking, etc. I see real benefit in allowing this scenario to run natively so all of the tooling that exists for fleet management 'just works'. This would also be a significant differentiator for OpenStack as a whole. > > Thoughts? > > [1] > http://lists.openstack.org/pipermail/openstack-operators/2018-August/015653.html > > Chris Apsey > Director | Georgia Cyber Range > GEORGIA CYBER CENTER > > 100 Grace Hopper Lane | Augusta, Georgia | 30901 > https://www.gacybercenter.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel at mlavalle.com Thu Jan 6 01:12:10 2022 From: miguel at mlavalle.com (Miguel Lavalle) Date: Wed, 5 Jan 2022 19:12:10 -0600 Subject: Can neutron-fwaas project be revived? In-Reply-To: References: <87f786a0f0714c0a89fb7097e965b1d6@inspur.com> <035f05bc97204972bb975c962d47507c@inspur.com> Message-ID: Hi Qin, Unfortunately, this coming January 7th several members of the drivers team will be off on holiday. 
We won't have a quorum to discuss your proposal. I hope that January 14th works for you and your team. Best regards Miguel On Fri, Dec 24, 2021 at 10:18 AM Miguel Lavalle wrote: > Hi Qin, > > I have added this topic to the drivers meeting agenda (see on demand > agenda close to the bottom): > https://wiki.openstack.org/wiki/Meetings/NeutronDrivers > > Cheers > > On Thu, Dec 23, 2021 at 7:42 PM Dazhong Qin (???)-??????? < > qinhaizhong01 at inspur.com> wrote: > >> Hi Miguel, >> >> Thank you for your suggestion. My colleague HengZhou will submit relevant >> documents as soon as possible in accordance with the official neutron rules. >> >> Yes?we will attend the neutron drivers meeting on January 7th. >> >> Merry Christmas! >> >> Best wish for you! >> >> >> >> *???:* Miguel Lavalle [mailto:miguel at mlavalle.com] >> *????:* 2021?12?24? 0:43 >> *???:* Dazhong Qin (???)-??????? >> *??:* openstack-discuss at lists.openstack.org >> *??:* Re: Can neutron-fwaas project be revived? >> >> >> >> Hi Qin, >> >> >> >> In preparation for your meeting with the drivers team, I suggest we >> follow as a starting point the Neutron Stadium Governance rules and >> processes as outlined in the official documentation: >> https://docs.openstack.org/neutron/latest/contributor/stadium/governance.html. >> In the past, we have re-incorporated projects to the Stadium, like for >> example in the case of neutron-vpnaas. This document in the Neutron specs >> repo summarizes how we assessed the readiness of vpnaas for the stadium: >> https://specs.openstack.org/openstack/neutron-specs/specs/stadium/queens/neutron-vpnaas.html >> (https://review.opendev.org/c/openstack/neutron-specs/+/506012). I >> suggest you start a similar document for fwaas in the folder for the >> current cycle: >> https://specs.openstack.org/openstack/neutron-specs/specs/yoga/index.html. >> As soon as you can, please push it to gerrit, so we can start reviewing it. >> >> >> >> Did I understand correctly that you will attend the drivers meeting on >> January 7th? >> >> >> >> Best regards >> >> >> >> Miguel >> >> >> >> >> >> On Wed, Dec 22, 2021 at 8:09 PM Dazhong Qin (???)-??????? < >> qinhaizhong01 at inspur.com> wrote: >> >> Hi Miguel, >> >> I am glad to hear this news. How about our discussion on January 7th, >> this Friday is not convenient, what do I need to prepare before the >> discussion, do I need to submit rfe or other descriptions? >> >> >> >> *???:* Miguel Lavalle [mailto:miguel at mlavalle.com] >> *????:* 2021?12?23? 0:20 >> *???:* Dazhong Qin (???)-??????? >> *??:* openstack-discuss at lists.openstack.org >> *??:* Re: Can neutron-fwaas project be revived? >> >> >> >> Hi Qin, >> >> >> >> I think that in principle the community will be delighted if you and your >> team can reactivate the project and maintain it. Probably the best next >> step is for you to attend the next Neutron drivers meeting ( >> https://wiki.openstack.org/wiki/Meetings/NeutronDrivers) so we >> can discuss the specifics of your proposal. This meeting takes place on >> Fridays at 1400 UTC over IRC in oftc.net, channel #openstack-neutron. >> Due to the end of year festivities in much of Europe and America, the next >> meeting will take place until January 7th. Is that a good next step for >> you? If yes, I'll add this topic to the meeting's agenda. >> >> >> >> Best regards >> >> >> >> On Tue, Dec 21, 2021 at 10:29 AM Dazhong Qin (???)-??????? < >> qinhaizhong01 at inspur.com> wrote: >> >> Hi? 
>> >> The firewall project is a necessary function when the project is >> delivered. The lack of firewall function after switching OVN is not >> acceptable to customers. We intend to maintain this project and develop the >> fwaas driver based on ovn. Whether the neutron-fwaas project can be >> reactivate? What should I do ? >> >> >> >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From qinhaizhong01 at inspur.com Thu Jan 6 01:16:06 2022 From: qinhaizhong01 at inspur.com (=?utf-8?B?RGF6aG9uZyBRaW4gKOenpua1t+S4rSkt5LqR5pWw5o2u5Lit5b+D6ZuG5Zui?=) Date: Thu, 6 Jan 2022 01:16:06 +0000 Subject: =?utf-8?B?562U5aSNOiBDYW4gbmV1dHJvbi1md2FhcyBwcm9qZWN0IGJlIHJldml2ZWQ/?= In-Reply-To: References: <87f786a0f0714c0a89fb7097e965b1d6@inspur.com> <035f05bc97204972bb975c962d47507c@inspur.com> Message-ID: Hi?Miguel? Ok?let?s meet at January 14th. Best regards ???: Miguel Lavalle [mailto:miguel at mlavalle.com] ????: 2022?1?6? 9:12 ???: Dazhong Qin (???)-??????? ??: openstack-discuss at lists.openstack.org ??: Re: Can neutron-fwaas project be revived? Hi Qin, Unfortunately, this coming January 7th several members of the drivers team will be off on holiday. We won't have a quorum to discuss your proposal. I hope that January 14th works for you and your team. Best regards Miguel On Fri, Dec 24, 2021 at 10:18 AM Miguel Lavalle > wrote: Hi Qin, I have added this topic to the drivers meeting agenda (see on demand agenda close to the bottom): https://wiki.openstack.org/wiki/Meetings/NeutronDrivers Cheers On Thu, Dec 23, 2021 at 7:42 PM Dazhong Qin (???)-??????? > wrote: Hi Miguel, Thank you for your suggestion. My colleague HengZhou will submit relevant documents as soon as possible in accordance with the official neutron rules. Yes?we will attend the neutron drivers meeting on January 7th. Merry Christmas! Best wish for you! ???: Miguel Lavalle [mailto:miguel at mlavalle.com ] ????: 2021?12?24? 0:43 ???: Dazhong Qin (???)-??????? > ??: openstack-discuss at lists.openstack.org ??: Re: Can neutron-fwaas project be revived? Hi Qin, In preparation for your meeting with the drivers team, I suggest we follow as a starting point the Neutron Stadium Governance rules and processes as outlined in the official documentation: https://docs.openstack.org/neutron/latest/contributor/stadium/governance.html. In the past, we have re-incorporated projects to the Stadium, like for example in the case of neutron-vpnaas. This document in the Neutron specs repo summarizes how we assessed the readiness of vpnaas for the stadium: https://specs.openstack.org/openstack/neutron-specs/specs/stadium/queens/neutron-vpnaas.html (https://review.opendev.org/c/openstack/neutron-specs/+/506012). I suggest you start a similar document for fwaas in the folder for the current cycle: https://specs.openstack.org/openstack/neutron-specs/specs/yoga/index.html. As soon as you can, please push it to gerrit, so we can start reviewing it. Did I understand correctly that you will attend the drivers meeting on January 7th? Best regards Miguel On Wed, Dec 22, 2021 at 8:09 PM Dazhong Qin (???)-??????? > wrote: Hi Miguel, I am glad to hear this news. How about our discussion on January 7th, this Friday is not convenient, what do I need to prepare before the discussion, do I need to submit rfe or other descriptions? ???: Miguel Lavalle [mailto: miguel at mlavalle.com] ????: 2021?12?23? 0:20 ???: Dazhong Qin (???)-??????? 
< qinhaizhong01 at inspur.com> ??: openstack-discuss at lists.openstack.org ??: Re: Can neutron-fwaas project be revived? Hi Qin, I think that in principle the community will be delighted if you and your team can reactivate the project and maintain it. Probably the best next step is for you to attend the next Neutron drivers meeting (https://wiki.openstack.org/wiki/Meetings/NeutronDrivers) so we can discuss the specifics of your proposal. This meeting takes place on Fridays at 1400 UTC over IRC in oftc.net , channel #openstack-neutron. Due to the end of year festivities in much of Europe and America, the next meeting will take place until January 7th. Is that a good next step for you? If yes, I'll add this topic to the meeting's agenda. Best regards On Tue, Dec 21, 2021 at 10:29 AM Dazhong Qin (???)-??????? > wrote: Hi? The firewall project is a necessary function when the project is delivered. The lack of firewall function after switching OVN is not acceptable to customers. We intend to maintain this project and develop the fwaas driver based on ovn. Whether the neutron-fwaas project can be reactivate? What should I do ? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3615 bytes Desc: not available URL: From mthode at mthode.org Thu Jan 6 02:29:15 2022 From: mthode at mthode.org (Matthew Thode) Date: Wed, 5 Jan 2022 20:29:15 -0600 Subject: [horizon][requirements] falling behind on requirements updates Message-ID: <20220106022915.mwybmhrg3fjhuh64@mthode.org> Hi, The following requirements updates have been stalled. XStatic-Angular===1.8.2.1 Has been held back for at least 7 months, but I think closer to a year now. Some review history is here: https://review.opendev.org/794258 django-compressor===2.4.1 rcssmin===1.1.0 rjsmin===1.1.0 These come as a group and have been holding things back for about a month. I'm not sure what's holding things back here on the horizon side. Review bundled with XStatic-Angular here: https://review.opendev.org/823449 -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From mthode at mthode.org Thu Jan 6 02:31:04 2022 From: mthode at mthode.org (Matthew Thode) Date: Wed, 5 Jan 2022 20:31:04 -0600 Subject: [keystone][requirements] keystone holding back pysaml2-7.1.x update Message-ID: <20220106023104.gsgkfxz6i2xu2fxx@mthode.org> Hi again, pysaml2===7.1.0 has been breaking updates for a while now. https://review.opendev.org/818612 -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From hanguangyu2 at gmail.com Thu Jan 6 02:33:02 2022 From: hanguangyu2 at gmail.com (=?UTF-8?B?6Z+p5YWJ5a6H?=) Date: Thu, 6 Jan 2022 10:33:02 +0800 Subject: [glance] Does glance not support using local filesystem storage in a cluster Message-ID: Deal all, Sorry that maybe I ask a stupid question. But I'm really confused with it and didn't find discuss in glance document(https://docs.openstack.org/glance/latest/). I have a OpenStack Victoria cluster with three all-in-one node in centos8. I implemented it with reference to https://docs.openstack.org/ha-guide/. So this cluster use Pacemaker, HAproxy and Galera. 
"To implement high availability, run an instance of the database on each controller node and use Galera Cluster to provide replication between them." I found that I will encounter an error If I configure Glance backend to use local storage driver to store image files on the local disk. If I upload a image, this image only will be storaged in one node. But the database only storage the file path of image such as "/v2/images/aa3cbee0-717f-4699-8cca-61243302d693/file", don't have the host information. The database data is same in three node. If I upload a image in node1, image only is storaged in node1. The database of three node stores the local filesystem path of image. And If The create Instance task is assigned to node2, It will find image in node2, but image can't be found in node2. So we get the "Image has no associated data" error. So I want to ask: 1. Wheter glance does not support using local filesystem storage in a cluster? 2. If 1 was right, why do we do this design instead of storing information about the host on which images is located, as nova does with instance. I would appreciate any kind of guidance or help. Thank you, Han Guangyu From mthode at mthode.org Thu Jan 6 02:33:38 2022 From: mthode at mthode.org (Matthew Thode) Date: Wed, 5 Jan 2022 20:33:38 -0600 Subject: [nova][requirements] fasteners===0.16.3 held back by nova Message-ID: <20220106023338.4yaudjnvmmysvp5d@mthode.org> This one is simple, and iirc is blocked on upstream fixing something (but cannot find the reference). fasteners===0.16.3 https://review.opendev.org/823470 and https://review.opendev.org/804246 both test this change. -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From emiller at genesishosting.com Thu Jan 6 06:12:33 2022 From: emiller at genesishosting.com (Eric K. Miller) Date: Thu, 6 Jan 2022 00:12:33 -0600 Subject: [nova] iothread support with Libvirt Message-ID: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> Hi, I haven't found anything that indicates Nova supports adding iothreads parameters to the Libvirt XML file. I had asked various performance related questions a couple years back, including asking if iothreads were available, but I didn't get any response (so assumed the answer was no). So I'm just checking again to see if this has been a consideration to help improve a VM's storage performance - specifically with extremely high-speed storage in the host. Or is there a way to add iothread-related parameters without Nova being involved (such as modifying a template)? Thanks! Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Thu Jan 6 08:56:12 2022 From: eblock at nde.ag (Eugen Block) Date: Thu, 06 Jan 2022 08:56:12 +0000 Subject: [glance] Does glance not support using local filesystem storage in a cluster In-Reply-To: Message-ID: <20220106085612.Horde.FTZSS84MjBp1HTP1Fx3tbzE@webmail.nde.ag> Hi, if you really aim towards a highly available cluster you'll also need a ha storage solution like ceph. Having glance images or VMs on local storage can make it easier to deploy, maybe for testing and getting involved with openstack, but it's not really recommended for production use. You'll probably have the same issue with cinder volumes, I believe. Or do you have a different backend for cinder? Regards, Eugen Zitat von ??? : > Deal all, > > Sorry that maybe I ask a stupid question. 
But I'm really confused with > it and didn't find discuss in glance > document(https://docs.openstack.org/glance/latest/). > > I have a OpenStack Victoria cluster with three all-in-one node in > centos8. I implemented it with reference to > https://docs.openstack.org/ha-guide/. So this cluster use Pacemaker, > HAproxy and Galera. "To implement high availability, run an instance > of the database on each controller node and use Galera Cluster to > provide replication between them." > > I found that I will encounter an error If I configure Glance backend > to use local storage driver to store image files on the local disk. If > I upload a image, this image only will be storaged in one node. But > the database only storage the file path of image such as > "/v2/images/aa3cbee0-717f-4699-8cca-61243302d693/file", don't have the > host information. The database data is same in three node. > > If I upload a image in node1, image only is storaged in node1. The > database of three node stores the local filesystem path of image. And > If The create Instance task is assigned to node2, It will find image > in node2, but image can't be found in node2. So we get the "Image has > no associated data" error. > > So I want to ask: > 1. Wheter glance does not support using local filesystem storage in > a cluster? > 2. If 1 was right, why do we do this design instead of storing > information about the host on which images is located, as nova does > with instance. > > I would appreciate any kind of guidance or help. > > Thank you, > Han Guangyu From zigo at debian.org Thu Jan 6 09:34:20 2022 From: zigo at debian.org (Thomas Goirand) Date: Thu, 6 Jan 2022 10:34:20 +0100 Subject: [horizon] Bump XStatic-Angular-FileUpload to 12.2.13 Message-ID: <4600efa1-0b94-b9bb-ada5-b82d5770d90b@debian.org> Hi, After someone unrelated to OpenStack maintenance uploaded libjs-angular-file-upload 12.2.13, I tested it, and it didn't break Horizon (at least, when I manually uploaded a file, it worked...). So IMO, it would be fine to just bump to that version. Cheers, Thomas Goirand (zigo) From hanguangyu2 at gmail.com Thu Jan 6 09:41:37 2022 From: hanguangyu2 at gmail.com (=?UTF-8?B?6Z+p5YWJ5a6H?=) Date: Thu, 6 Jan 2022 17:41:37 +0800 Subject: [glance] Does glance not support using local filesystem storage in a cluster In-Reply-To: <20220106085612.Horde.FTZSS84MjBp1HTP1Fx3tbzE@webmail.nde.ag> References: <20220106085612.Horde.FTZSS84MjBp1HTP1Fx3tbzE@webmail.nde.ag> Message-ID: Hi, Yes, you are right. I have deploied a ceph as the backend storage of glance, cinder and nova. And it can resolve this question. But I wonder why it's designed this way. It doesn't fit my perception of OpenStack. As currently designed, local storage of glance must not be used in the cluster. Why not record the host where the image resides? Just like the local storage of the nova-compute node, if a Glance node breaks down, the image on the host cannot be accessed. Sorry that maybe this idea is unreasonable and stupid. Could anyone tell me the reason or what's the problem with that best wishes to you, love you. Thank you, Han Guangyu Eugen Block ?2022?1?6??? 17:03??? > > Hi, > > if you really aim towards a highly available cluster you'll also need > a ha storage solution like ceph. Having glance images or VMs on local > storage can make it easier to deploy, maybe for testing and getting > involved with openstack, but it's not really recommended for > production use. You'll probably have the same issue with cinder > volumes, I believe. 
Or do you have a different backend for cinder? > > Regards, > Eugen > > > Zitat von ??? : > > > Deal all, > > > > Sorry that maybe I ask a stupid question. But I'm really confused with > > it and didn't find discuss in glance > > document(https://docs.openstack.org/glance/latest/). > > > > I have a OpenStack Victoria cluster with three all-in-one node in > > centos8. I implemented it with reference to > > https://docs.openstack.org/ha-guide/. So this cluster use Pacemaker, > > HAproxy and Galera. "To implement high availability, run an instance > > of the database on each controller node and use Galera Cluster to > > provide replication between them." > > > > I found that I will encounter an error If I configure Glance backend > > to use local storage driver to store image files on the local disk. If > > I upload a image, this image only will be storaged in one node. But > > the database only storage the file path of image such as > > "/v2/images/aa3cbee0-717f-4699-8cca-61243302d693/file", don't have the > > host information. The database data is same in three node. > > > > If I upload a image in node1, image only is storaged in node1. The > > database of three node stores the local filesystem path of image. And > > If The create Instance task is assigned to node2, It will find image > > in node2, but image can't be found in node2. So we get the "Image has > > no associated data" error. > > > > So I want to ask: > > 1. Wheter glance does not support using local filesystem storage in > > a cluster? > > 2. If 1 was right, why do we do this design instead of storing > > information about the host on which images is located, as nova does > > with instance. > > > > I would appreciate any kind of guidance or help. > > > > Thank you, > > Han Guangyu > > > > From hanguangyu2 at gmail.com Thu Jan 6 09:51:10 2022 From: hanguangyu2 at gmail.com (=?UTF-8?B?6Z+p5YWJ5a6H?=) Date: Thu, 6 Jan 2022 17:51:10 +0800 Subject: [glance] Does glance not support using local filesystem storage in a cluster In-Reply-To: References: Message-ID: Hi, Sorry to bother. I continued to think about it. If the the original purpose of "local storage of glance" is just to make openstack > easier to deploy, maybe for testing and getting involved with openstack, but it's not really recommended for production use. I can accept this setting. I used to think of it as an equal way to use shared storage. Thank you. Han ??? ?2022?1?6??? 10:33??? > > Deal all, > > Sorry that maybe I ask a stupid question. But I'm really confused with > it and didn't find discuss in glance > document(https://docs.openstack.org/glance/latest/). > > I have a OpenStack Victoria cluster with three all-in-one node in > centos8. I implemented it with reference to > https://docs.openstack.org/ha-guide/. So this cluster use Pacemaker, > HAproxy and Galera. "To implement high availability, run an instance > of the database on each controller node and use Galera Cluster to > provide replication between them." > > I found that I will encounter an error If I configure Glance backend > to use local storage driver to store image files on the local disk. If > I upload a image, this image only will be storaged in one node. But > the database only storage the file path of image such as > "/v2/images/aa3cbee0-717f-4699-8cca-61243302d693/file", don't have the > host information. The database data is same in three node. > > If I upload a image in node1, image only is storaged in node1. The > database of three node stores the local filesystem path of image. 
And > If The create Instance task is assigned to node2, It will find image > in node2, but image can't be found in node2. So we get the "Image has > no associated data" error. > > So I want to ask: > 1. Wheter glance does not support using local filesystem storage in a cluster? > 2. If 1 was right, why do we do this design instead of storing > information about the host on which images is located, as nova does > with instance. > > I would appreciate any kind of guidance or help. > > Thank you, > Han Guangyu From pierre at stackhpc.com Thu Jan 6 09:55:54 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 6 Jan 2022 10:55:54 +0100 Subject: [neutron] Dadfailed of ipv6 metadata IP in qdhcp namespace and disappearing dhcp namespaces In-Reply-To: References: Message-ID: Hello Kamil, We also experienced this issue after upgrading to Victoria, which introduced availability of the metadata service over the IPv6 link-local address fe80::a9fe:a9fe. In bug 1953165 I mentioned a workaround. Below is the script I used, which should be followed by a restart of neutron-dhcp-agent. #!/bin/bash for ns in $(ip netns | grep -o 'qdhcp[^ ]*'); do if sudo ip netns exec $ns ip a | grep dadfailed > /dev/null; then tap=$(sudo ip netns exec $ns ip link | grep -o 'tap[^:]*') echo "Cleaning up IPv6 from $tap on $ns" sudo ip netns exec $ns ip addr del fe80::a9fe:a9fe/64 dev $tap fi done On Mon, 3 Jan 2022 at 10:02, Kamil Mad?? wrote: > > Hi Brian, > > thank you very much for pointing to those bugs. It is exactly what we are experiencing in our deployment. I will follow-up in those bugs then. > > Kamil > ________________________________ > From: Brian Haley > Sent: Monday, January 3, 2022 2:35 AM > To: Kamil Mad?? ; openstack-discuss > Subject: Re: [neutron] Dadfailed of ipv6 metadata IP in qdhcp namespace and disappearing dhcp namespaces > > Hi, > > On 1/2/22 10:51 AM, Kamil Mad?? wrote: > > Hello, > > > > In our small cloud environment, we started to see weird behavior during > > last 2 months. Dhcp namespaces started to disappear randomly, which > > caused that VMs losed connectivity once dhcp lease expired. > > After the investigation I found out following issue/bug: > > > > 1. ipv6 metadata address of tap interface in some qdhcp-xxxx namespaces > > are stucked in "dadfailed tentative" state (i do not know why yet) > > This issue was reported about a month ago: > > https://bugs.launchpad.net/neutron/+bug/1953165 > > And Bence marked it a duplicate of: > > https://bugs.launchpad.net/neutron/+bug/1930414 > > Seems to be a bug in a flow based on the title - "Traffic leaked from > dhcp port before vlan tag is applied". > > I would follow-up in that second bug. > > Thanks, > > -Brian > > > 3. 
root at cloud01:~# ip netns exec > > qdhcp-3094b264-829b-4381-9ca2-59b3a3fc1ea1 ip a > > 1: lo: mtu 65536 qdisc noqueue state UNKNOWN > > group default qlen 1000 > > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > > inet 127.0.0.1/8 scope host lo > > valid_lft forever preferred_lft forever > > inet6 ::1/128 scope host > > valid_lft forever preferred_lft forever > > 2585: tap1797d9b1-e1: mtu 1500 > > qdisc noqueue state UNKNOWN group default qlen 1000 > > link/ether fa:16:3e:77:64:0d brd ff:ff:ff:ff:ff:ff > > inet 169.254.169.254/32 brd 169.254.169.254 scope global > > tap1797d9b1-e1 > > valid_lft forever preferred_lft forever > > inet 192.168.0.2/24 brd 192.168.0.255 scope global tap1797d9b1-e1 > > valid_lft forever preferred_lft forever > > inet6 fe80::a9fe:a9fe/64 scope link dadfailed tentative > > valid_lft forever preferred_lft forever > > inet6 fe80::f816:3eff:fe77:640d/64 scope link > > valid_lft forever preferred_lft forever > > 4. > > > > 5. This blocked dhcp agent to finish sync_state function, and > > NetworkCache was not updated with subnets of such neutron network > > 6. During creation of VM assigned to such network, agent does not > > detect any subnets (see point 2), so he thinks > > (reload_allocations()) there is no dhcp needed and deletes > > qdhcp-xxxx namespace, so no DHCP and no Metadata are working on such > > network since that moment, and after 24h we see connectivity issues. > > 7. Restart of DHCP agent recreates missing qdhcp-xxxx namespaces, but > > NetworkCache in dhcp agent is again empty, so creation of VM > > deletes the qdhcp-xxxx namespace again ? > > > > Workaround is to remove dhcp agent from that network and add it again. > > Interestingly, sometimes I need to do it multiple times, because in few > > cases tap interface in new qdhcp finishes again in dadfailed tentative > > state. After year in production we have 20 networks out of 60 in such state. > > > > We are using kolla-ansible deployment on Ubuntu 20.04, kernel > > 5.4.0-65-generic. Openstack version Victoria and neutron is in version > > 17.2.2.dev70. > > > > > > Is that bug in neutron, or is it misconfiguration of OS on our side? > > > > I'm locally testing patch which disables ipv6 dad in qdhcp-xxxx > > namespace (net.ipv6.conf.default.accept_dad = 1), but I'm not sure it is > > good solution when it comes to other neutron features? > > > > > > Kamil Mad?? > > /Slovensko IT a.s./ > > From katonalala at gmail.com Thu Jan 6 10:45:55 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 6 Jan 2022 11:45:55 +0100 Subject: [all][Neutron][devstack] Deprecate lib/neutron Message-ID: Hi, In devstack we have two more or less overlapping lib for deploying Neutron: lib/neutron and lib/neutron-legacy. lib/neutron-legacy was recently "undeprecated" (see [0]), and Openstack CI mostly (fully?) uses it. To avoid double maintenance, and confusion in settings we would like to deprecate lib/neutron, for recent discussions see [1] and [2]. The plan is to keep lib/neutron for 2 cycles (that will be AA or the release after Z) and delete lib/neutron then, and rename lib/neutron-legacy to lib/neutron. As You can see from the discussions below there are questions like how to name the processes, log-files etc..., as lib/neutron-legacy uses q-* and lib/neutron uses neutron-* names. 
The best would be to keep both in the long term (we love q-* as historical tradition, but neutron-* looks like so fresh and modern;)) If you have any thoughts suggestions or questions please answer to this mail or comment on the deprecation patch: [3] [0]: https://review.opendev.org/c/openstack/devstack/+/704829 [1]: https://meetings.opendev.org/meetings/networking/2022/networking.2022-01-04-14.04.log.html#l-52 [2]: https://meetings.opendev.org/irclogs/%23openstack-qa/%23openstack-qa.2022-01-05.log.html#t2022-01-05T15:57:37 [3]: https://review.opendev.org/c/openstack/devstack/+/823653 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Thu Jan 6 12:44:13 2022 From: zigo at debian.org (Thomas Goirand) Date: Thu, 6 Jan 2022 13:44:13 +0100 Subject: [glance] Does glance not support using local filesystem storage in a cluster In-Reply-To: References: Message-ID: On 1/6/22 03:33, ??? wrote: > Deal all, > > Sorry that maybe I ask a stupid question. But I'm really confused with > it and didn't find discuss in glance > document(https://docs.openstack.org/glance/latest/). > > I have a OpenStack Victoria cluster with three all-in-one node in > centos8. I implemented it with reference to > https://docs.openstack.org/ha-guide/. So this cluster use Pacemaker, > HAproxy and Galera. "To implement high availability, run an instance > of the database on each controller node and use Galera Cluster to > provide replication between them." > > I found that I will encounter an error If I configure Glance backend > to use local storage driver to store image files on the local disk. If > I upload a image, this image only will be storaged in one node. But > the database only storage the file path of image such as > "/v2/images/aa3cbee0-717f-4699-8cca-61243302d693/file", don't have the > host information. The database data is same in three node. > > If I upload a image in node1, image only is storaged in node1. The > database of three node stores the local filesystem path of image. And > If The create Instance task is assigned to node2, It will find image > in node2, but image can't be found in node2. So we get the "Image has > no associated data" error. > > So I want to ask: > 1. Wheter glance does not support using local filesystem storage in a cluster? > 2. If 1 was right, why do we do this design instead of storing > information about the host on which images is located, as nova does > with instance. > > I would appreciate any kind of guidance or help. > > Thank you, > Han Guangyu > Hi ??, It is possible to setup Glance with a local storage in a HA way. The way to do this, is simply to get your HAproxy to use one node, always, and the other as backups. Then have a cron job that does the rsync from the first node to the other 2. A simple command as Glance user like this is enough (to be run as Glance user, and having the ssh host keys thingy fixed (we sign host keys, so we don't have this problem)): rsync -e ssh -avz --delete /var/lib/glance/images/ \ :/var/lib/glance/images/ >/dev/null 2>&1 We have some internal logic to iterate through all the backup nodes and replace dest-host accordingly... This way, if the first node fails, yes, you do have a problem because there wont be the primary node that is up, so saving new Glance image will be a problem as it wont be replicated to other nodes. But existing image will be there already, so it ok until you repair the first node. 
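The iteration over backup nodes can stay very small. A minimal sketch of such a cron job, run as the glance user on the primary node (the host names and the BACKUP_HOSTS variable are illustrative, not the actual internal logic mentioned above):

#!/bin/bash
# Push the local image store to every standby controller.
# BACKUP_HOSTS is an assumption; list your real standby nodes here.
BACKUP_HOSTS="controller2 controller3"

for host in $BACKUP_HOSTS; do
    # --delete keeps the copies identical to the primary store.
    rsync -e ssh -avz --delete /var/lib/glance/images/ \
        "${host}:/var/lib/glance/images/" >/dev/null 2>&1
done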
I hope this helps, Cheers, Thomas Goirand (zigo) From Danny.Webb at thehutgroup.com Thu Jan 6 13:05:37 2022 From: Danny.Webb at thehutgroup.com (Danny Webb) Date: Thu, 6 Jan 2022 13:05:37 +0000 Subject: [glance] Does glance not support using local filesystem storage in a cluster In-Reply-To: References: Message-ID: We've done something similar insofar as we've installed gluster with replica X on our controllers to share out the glace images volume between nodes. Saves you having to rsync between nodes and supports proper HA. ________________________________ From: Thomas Goirand Sent: 06 January 2022 12:44 To: ??? ; openstack-discuss Cc: hanguangyu at uniontech.com ; wangleic at uniontech.com Subject: Re: [glance] Does glance not support using local filesystem storage in a cluster CAUTION: This email originates from outside THG On 1/6/22 03:33, ??? wrote: > Deal all, > > Sorry that maybe I ask a stupid question. But I'm really confused with > it and didn't find discuss in glance > document(https://docs.openstack.org/glance/latest/). > > I have a OpenStack Victoria cluster with three all-in-one node in > centos8. I implemented it with reference to > https://docs.openstack.org/ha-guide/. So this cluster use Pacemaker, > HAproxy and Galera. "To implement high availability, run an instance > of the database on each controller node and use Galera Cluster to > provide replication between them." > > I found that I will encounter an error If I configure Glance backend > to use local storage driver to store image files on the local disk. If > I upload a image, this image only will be storaged in one node. But > the database only storage the file path of image such as > "/v2/images/aa3cbee0-717f-4699-8cca-61243302d693/file", don't have the > host information. The database data is same in three node. > > If I upload a image in node1, image only is storaged in node1. The > database of three node stores the local filesystem path of image. And > If The create Instance task is assigned to node2, It will find image > in node2, but image can't be found in node2. So we get the "Image has > no associated data" error. > > So I want to ask: > 1. Wheter glance does not support using local filesystem storage in a cluster? > 2. If 1 was right, why do we do this design instead of storing > information about the host on which images is located, as nova does > with instance. > > I would appreciate any kind of guidance or help. > > Thank you, > Han Guangyu > Hi ??, It is possible to setup Glance with a local storage in a HA way. The way to do this, is simply to get your HAproxy to use one node, always, and the other as backups. Then have a cron job that does the rsync from the first node to the other 2. A simple command as Glance user like this is enough (to be run as Glance user, and having the ssh host keys thingy fixed (we sign host keys, so we don't have this problem)): rsync -e ssh -avz --delete /var/lib/glance/images/ \ :/var/lib/glance/images/ >/dev/null 2>&1 We have some internal logic to iterate through all the backup nodes and replace dest-host accordingly... This way, if the first node fails, yes, you do have a problem because there wont be the primary node that is up, so saving new Glance image will be a problem as it wont be replicated to other nodes. But existing image will be there already, so it ok until you repair the first node. 
I hope this helps, Cheers, Thomas Goirand (zigo) Danny Webb Senior Linux Systems Administrator The Hut Group Tel: Email: Danny.Webb at thehutgroup.com For the purposes of this email, the "company" means The Hut Group Limited, a company registered in England and Wales (company number 6539496) whose registered office is at Fifth Floor, Voyager House, Chicago Avenue, Manchester Airport, M90 3DQ and/or any of its respective subsidiaries. Confidentiality Notice This e-mail is confidential and intended for the use of the named recipient only. If you are not the intended recipient please notify us by telephone immediately on +44(0)1606 811888 or return it to us by e-mail. Please then delete it from your system and note that any use, dissemination, forwarding, printing or copying is strictly prohibited. Any views or opinions are solely those of the author and do not necessarily represent those of the company. Encryptions and Viruses Please note that this e-mail and any attachments have not been encrypted. They may therefore be liable to be compromised. Please also note that it is your responsibility to scan this e-mail and any attachments for viruses. We do not, to the extent permitted by law, accept any liability (whether in contract, negligence or otherwise) for any virus infection and/or external compromise of security and/or confidentiality in relation to transmissions sent by e-mail. Monitoring Activity and use of the company's systems is monitored to secure its effective use and operation and for other lawful business purposes. Communications using these systems will also be monitored and may be recorded to secure effective use and operation and for other lawful business purposes. hgvyjuv -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Jan 6 14:10:33 2022 From: smooney at redhat.com (Sean Mooney) Date: Thu, 06 Jan 2022 14:10:33 +0000 Subject: [nova] iothread support with Libvirt In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> Message-ID: <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> On Thu, 2022-01-06 at 00:12 -0600, Eric K. Miller wrote: > Hi, > > > > I haven't found anything that indicates Nova supports adding iothreads > parameters to the Libvirt XML file. I had asked various performance > related questions a couple years back, including asking if iothreads > were available, but I didn't get any response (so assumed the answer was > no). So I'm just checking again to see if this has been a consideration > to help improve a VM's storage performance - specifically with extremely > high-speed storage in the host. hi up until recently the advice from our virt team was that iotread where not really needed for openstack howver in the last 6 weeks they have actully asked us to consider enabling them so work will be happening in qemu/libvirt to always create at least one iothread going forward and affinies it to the same set of cores as the emulator threads by default. we dont have a downstream rfe currently filed for ioithread specifically but we do virtio scsi multi queue support https://bugzilla.redhat.com/show_bug.cgi?id=1880273 i was proposing that we also enable iotread support as part of that work but we have not currently internaly piroited it for any upstream release. enable support for iotrhead and virtio multiqueue i think makes a lot of sense to do together. 
my understanding is that without iothreads, multi-queue virtio-scsi does not provide as much of a performance boost as it does with them.

if you or others have capacity to work on this I would be happy to work on a spec with you to enable it.

effectively, what I was planning to propose if we got around to it is adding a new config option, cpu_iothread_set, which would default to the same value as cpu_shared_set. that would ensure that, without any config updates, all existing deployments start benefiting from iothreads, while still allowing you to dedicate a set of cores to running the iothreads separately from the cpu_shared_set if you want this to also benefit floating VMs, not just pinned VMs.

in addition to that, a new flavor extra spec/image property would be added, similar to cpu_emulator_threads.

I'm not quite sure how that extra spec should work, but hw:cpu_iothread_policy would support the same values as hw:cpu_emulator_threads: hw:cpu_iothread_policy=shared would allocate an iothread that floats over the cpu_iothread_set (which is the same as cpu_shared_set by default), and hw:cpu_iothread_policy=isolate would allocate an additional iothread from the cpu_dedicated_set. hw:cpu_iothread_policy=shared would be the default behaviour if cpu_shared_set or cpu_iothread_set was defined in the config and no flavor extra spec or image property was defined. basically, all VMs would get at least one iothread floating over the shared pool if a shared pool was configured on the host.

that is option a. option b would be to also support hw:cpu_iothread_count, so you could ask for n iothreads, either from the shared/iothread set or from the dedicated set depending on the value of hw:cpu_iothread_policy.

I'm not really sure there is a need for more than one iothread. my understanding is that once you have at least one there are diminishing returns: more iothreads will improve performance further provided you have multiple disks/volumes attached, but not as much as having the initial one.

is this something you would be willing to work on and implement? I would be happy to review any spec in this area and I can bring it up downstream again, but I can't commit to working on this in the Z release. it would require some minor RPC changes to ensure live migration works properly, since the iothread set or cpu shared set could differ between hosts, but beyond that the feature is actually pretty simple to enable.

> > > > Or is there a way to add iothread-related parameters without Nova being > involved (such as modifying a template)?

no, there is no way to enable them out of band of nova today. you technically could wrap the qemu binary with a script that injects parameters, but that obviously would not be supported upstream. it would be a workaround if you really needed it, though.

https://review.opendev.org/c/openstack/devstack/+/817075 is an example of such a script. it breaks apparmor and selinux, but you could probably make it work with enough effort, although I would suggest just implementing the feature upstream and doing a downstream backport instead.

> > > > Thanks! 
> > > > Eric > From zigo at debian.org Thu Jan 6 15:25:15 2022 From: zigo at debian.org (Thomas Goirand) Date: Thu, 6 Jan 2022 16:25:15 +0100 Subject: [glance] Does glance not support using local filesystem storage in a cluster In-Reply-To: References: Message-ID: <9d9bf810-057b-daf9-f6ec-5f231b058fb4@debian.org> On 1/6/22 14:05, Danny Webb wrote: > We've done something similar insofar as we've installed gluster with > replica X on our controllers to share out the glace images volume > between nodes.? Saves you having to rsync between nodes and supports > proper HA. One needs to like Gluster though ... :) Thomas From openstack at nemebean.com Thu Jan 6 16:17:56 2022 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 6 Jan 2022 10:17:56 -0600 Subject: [nova][requirements] fasteners===0.16.3 held back by nova In-Reply-To: <20220106023338.4yaudjnvmmysvp5d@mthode.org> References: <20220106023338.4yaudjnvmmysvp5d@mthode.org> Message-ID: <3b644658-f7ac-9188-8d2c-4d109dcaffd7@nemebean.com> Is it still this: https://github.com/harlowja/fasteners/issues/36 ? I did some investigation about a year ago with one of the fasteners maintainers, but we never really came up with a definite answer as to what is going on. :-/ On 1/5/22 20:33, Matthew Thode wrote: > This one is simple, and iirc is blocked on upstream fixing something > (but cannot find the reference). > > fasteners===0.16.3 > > https://review.opendev.org/823470 > and > https://review.opendev.org/804246 > both test this change. > From openstack at nemebean.com Thu Jan 6 16:31:34 2022 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 6 Jan 2022 10:31:34 -0600 Subject: [security-sig] Log4j vulnerabilities and OpenStack In-Reply-To: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> Message-ID: <2a3bb648-f762-5061-0b40-b5cc4a4301cc@nemebean.com> On 1/3/22 10:02, Jeremy Stanley wrote: > Unless you were living under a rock during most of December, you've > almost certainly seen the press surrounding the various security > vulnerabilities discovered in the Apache Log4j Java library. As > everyone reading this list hopefully knows, OpenStack is primarily > written in Python, so has little use for Java libraries in the first > place, but that hasn't stopped users from asking our VMT members if > OpenStack is affected. > > While OpenStack doesn't require any Java components, I'm aware of > one Neutron driver (networking-odl) which relies on an affected > third-party service: > https://access.redhat.com/solutions/6586821 > > Additionally, "storm" component of SUSE OpenStack seems to be > impacted: > https://www.suse.com/c/suse-statement-on-log4j-log4shell-cve-2021-44228-vulnerability/ > > As does an Elasticsearch component in Sovereign Cloud Stack: > https://scs.community/security/2021/12/13/advisory-log4j/ > > Users should, obviously, rely on their distribution > vendors/suppliers to notify them of available updates for these. Is > anyone aware of other, similar situations where OpenStack is > commonly installed alongside Java software using Log4j in vulnerable > ways? > I don't know if this is common, but if you use Zookeeper for DLM I assume you'd be affected. It's a supported driver in Tooz so it's possible someone would be using it. 
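For deployments that do run ZooKeeper for Tooz-based locking, one quick (and hedged) way to see which Log4j build it ships is to look for the bundled jars; the install prefixes below are only examples and depend on how ZooKeeper was packaged:

# List any Log4j jars bundled with a ZooKeeper installation; the version is
# usually visible in the file name (log4j 1.x is not affected by
# CVE-2021-44228, while log4j-core 2.x below 2.17.1 may need updating).
find /opt/zookeeper /usr/share/zookeeper /usr/lib/zookeeper \
    -name 'log4j*.jar' 2>/dev/null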
From fungi at yuggoth.org Thu Jan 6 16:40:20 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 6 Jan 2022 16:40:20 +0000 Subject: [security-sig] Log4j vulnerabilities and OpenStack In-Reply-To: <2a3bb648-f762-5061-0b40-b5cc4a4301cc@nemebean.com> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <2a3bb648-f762-5061-0b40-b5cc4a4301cc@nemebean.com> Message-ID: <20220106164020.hru3qnkdj522otzj@yuggoth.org> On 2022-01-06 10:31:34 -0600 (-0600), Ben Nemec wrote: [...] > I don't know if this is common, but if you use Zookeeper for DLM I > assume you'd be affected. It's a supported driver in Tooz so it's > possible someone would be using it. Thanks, that's a good point! I recall when we were investigating it with regard to Zuul (which relies on ZK for state coordination and persistence), the conclusion was that it isn't impacted by the recent vulnerabilities. I found this brief explanation, but maybe that's outdated information? https://issues.apache.org/jira/browse/ZOOKEEPER-4423 -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From mthode at mthode.org Thu Jan 6 16:41:13 2022 From: mthode at mthode.org (Matthew Thode) Date: Thu, 6 Jan 2022 10:41:13 -0600 Subject: [nova][requirements] fasteners===0.16.3 held back by nova In-Reply-To: <3b644658-f7ac-9188-8d2c-4d109dcaffd7@nemebean.com> References: <20220106023338.4yaudjnvmmysvp5d@mthode.org> <3b644658-f7ac-9188-8d2c-4d109dcaffd7@nemebean.com> Message-ID: <20220106164113.iwf7l6v7cyj6tmr2@mthode.org> That seems right, I seem to remember eventlet in there. -- Matthew Thode On 22-01-06 10:17:56, Ben Nemec wrote: > Is it still this: https://github.com/harlowja/fasteners/issues/36 ? > > I did some investigation about a year ago with one of the fasteners > maintainers, but we never really came up with a definite answer as to what > is going on. :-/ > > On 1/5/22 20:33, Matthew Thode wrote: > > This one is simple, and iirc is blocked on upstream fixing something > > (but cannot find the reference). > > > > fasteners===0.16.3 > > > > https://review.opendev.org/823470 > > and > > https://review.opendev.org/804246 > > both test this change. > > From openstack at nemebean.com Thu Jan 6 17:16:08 2022 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 6 Jan 2022 11:16:08 -0600 Subject: [security-sig] Log4j vulnerabilities and OpenStack In-Reply-To: <20220106164020.hru3qnkdj522otzj@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <2a3bb648-f762-5061-0b40-b5cc4a4301cc@nemebean.com> <20220106164020.hru3qnkdj522otzj@yuggoth.org> Message-ID: On 1/6/22 10:40, Jeremy Stanley wrote: > On 2022-01-06 10:31:34 -0600 (-0600), Ben Nemec wrote: > [...] >> I don't know if this is common, but if you use Zookeeper for DLM I >> assume you'd be affected. It's a supported driver in Tooz so it's >> possible someone would be using it. > > Thanks, that's a good point! I recall when we were investigating it > with regard to Zuul (which relies on ZK for state coordination and > persistence), the conclusion was that it isn't impacted by the > recent vulnerabilities. I found this brief explanation, but maybe > that's outdated information? > https://issues.apache.org/jira/browse/ZOOKEEPER-4423 > Ah, so zookeeper was one of the projects using a version of log4j so ancient it wasn't affected. 
:-) I was just thinking of Java stuff that might be running alongside OpenStack, I don't know anything that contradicts the issue you linked. From melwittt at gmail.com Thu Jan 6 17:53:06 2022 From: melwittt at gmail.com (melanie witt) Date: Thu, 6 Jan 2022 09:53:06 -0800 Subject: [nova][requirements] fasteners===0.16.3 held back by nova In-Reply-To: <20220106164113.iwf7l6v7cyj6tmr2@mthode.org> References: <20220106023338.4yaudjnvmmysvp5d@mthode.org> <3b644658-f7ac-9188-8d2c-4d109dcaffd7@nemebean.com> <20220106164113.iwf7l6v7cyj6tmr2@mthode.org> Message-ID: On Thu Jan 06 2022 08:41:13 GMT-0800 (Pacific Standard Time), Matthew Thode wrote: > Subject: > Re: [nova][requirements] fasteners===0.16.3 held back by nova > From: > Matthew Thode > Date: > 1/6/22, 08:41 > > To: > Ben Nemec > CC: > openstack-discuss at lists.openstack.org > > > That seems right, I seem to remember eventlet in there. > > -- Matthew Thode On 22-01-06 10:17:56, Ben Nemec wrote: >> Is it still this:https://github.com/harlowja/fasteners/issues/36 ? >> >> I did some investigation about a year ago with one of the fasteners >> maintainers, but we never really came up with a definite answer as to what >> is going on. :-/ >> >> On 1/5/22 20:33, Matthew Thode wrote: >>> This one is simple, and iirc is blocked on upstream fixing something >>> (but cannot find the reference). >>> >>> fasteners===0.16.3 >>> >>> https://review.opendev.org/823470 >>> and >>> https://review.opendev.org/804246 >>> both test this change. I worked on the fasteners thing for Too Long of a Time last October and found what is happening. It is indeed the same issue from 2019 [1] and I explain the problem (it's long) in a new eventlet issue I opened [2]. I proposed a patch to "fix" the problem in nova [3], it was initially nacked because it has to do with eventlet, but it's the simplest, smallest change IMHO that will address the issue. I also went on a wild goose chase trying to change all our spawn_n() calls with spawn() in PS2 and PS3 but it ended in a dead end. There are comments detailing that attempt in the review if anyone is curious. So, based on that dead end and seeing this come up on the ML, I have reverted [3] to PS1 if anyone can review and give feedback on what approach they would prefer if they think the current approach is not suitable. Note: the reason we pull in fasteners is through oslo.concurrency, the lockutils use it. Cheers, -melanie [1] https://github.com/harlowja/fasteners/issues/36 [2] https://github.com/eventlet/eventlet/issues/731 [3] https://review.opendev.org/c/openstack/nova/+/813114 From katonalala at gmail.com Thu Jan 6 18:19:33 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 6 Jan 2022 19:19:33 +0100 Subject: [neutron] Drivers meeting - Friday 7.1.2022 - cancelled Message-ID: Hi Neutron Drivers! Tomorrow we have not enough drivers to vote (see previous mail from mlavalle: [0]), so let's cancel tomorrow's meeting, and meet a week later: January 14th. [0]: http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026549.html See You on the meeting next week. Lajos Katona (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From emiller at genesishosting.com Thu Jan 6 20:20:39 2022 From: emiller at genesishosting.com (Eric K. 
Miller) Date: Thu, 6 Jan 2022 14:20:39 -0600 Subject: [nova] iothread support with Libvirt In-Reply-To: <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> Hi Sean, Thanks, as always, for your reply! > hi up until recently the advice from our virt team was that iotread where not > really needed > for openstack howver in the last 6 weeks they have actully asked us to > consider enabling them I don't have the data to know whether iothread improves performance or not. Rather, I made the assumption that a dedicated core for I/O would likely perform much better than without. If someone has any data on this, it would be extremely useful. The issue we are trying to resolve is related to high-speed local storage performance that is literally 10x, and sometimes 15x, slower in a VM than the host. The local storage can reach upwards of 8GiB/sec and 1 million IOPS. It's not necessarily throughput we're after, though - it is latency, and the high latency in QEMU/KVM is simply too high to get adequate storage performance inside a VM. If iothread(s) do not help, then the point of implementing the parameter in Nova is probably moot. > so work will be happening in qemu/libvirt to always create at least one > iothread going forward and > affinies it to the same set of cores as the emulator threads by default. That sounds like a good idea, although I did read somewhere in the QEMU docs that not all drivers support iothreads, and trying to use them with unsupported drivers will likely crash QEMU - but I don't know how old those docs were. It seems reasonable since the "big QEMU lock" is not being used for the io thread(s). > we dont have a downstream rfe currently filed for ioithread specifically but > we do virtio scsi multi queue support > https://bugzilla.redhat.com/show_bug.cgi?id=1880273 I found this old blueprint and implementation (that apparently was never accepted due to tests failing in various environments): https://blueprints.launchpad.net/nova/+spec/libvirt-iothreads https://review.opendev.org/c/openstack/nova/+/384871/ > to do together. my understanding is that without iothread multi queue virtio > scsi does not provide as much of > a perfromace boost as with io threads. I can imagine that being the case - since a spinning loop has super-low latency compared to an interrupt. > if you our other have capasity to work on this i would be happy to work on a > spec with ye to enable it. I wish I had the bandwidth to learn how, but since I'm not a Python developer, nor have a development environment ready to go (I'm mostly performing cloud operator and business support functions), I probably couldn't help much other than provide feedback. > effectivly what i was plannign to propose if we got around to it is adding a > new config option > cpu_iothread_set which would default to the same value as cpu_share_set. > this effectivly will ensure that witout any config updates all existing > deployment will start benifiting > form iothreads and allow you to still dedicate a set of cores to running the > iothread seperate form the cpu_share_set > if you wasnt this to also benifit floating vms not just pinned vms. 
I would first suggest asking the QEMU folks whether there are incompatibilities with iothreads with storage drivers that could cause issues by enabling iothreads by default. I suggest a more cautionary approach and leave the default as-is and allow a user to enable iothreads themselves. The default could always be changed later if there isn't any negative feedback from those who tried using iothreads. > in addtion to that a new flavor extra spec/image property woudl be added > similar to cpu_emultor_threads. > > im not quite sure how that extra spec should work but either > hw:cpu_iotread_policy woudl either support the same vales as > hw:cpu_emulator_threads where > hw:cpu_iotread_policy=shared woudl allocate an iotread that floats over the > cpu_iothread_set (which is the same as cpu_shared_set by default) > and hw:cpu_iotread_policy=isolate would allocate an addtional iothread > form the cpu_dedicated_set. > hw:cpu_iotread_policy=share woudl be the default behavior if > cpu_shared_set or cpu_iothread_set was defined in the config and not > flavor extra > spec or image property was defiend. basically all vms woudl have at least 1 > iothread that floated over teh shared pool if a share pool was configured > on the host. I will have to review this more carefully, when I have a bit more time. > that is option a > option b woudl be to allso support > > hw:cpu_iotread_count so you could ask for n iothread eitehr form the > shared/iothread set or dedicated set depending on the value of > hw:cpu_iotread_policy > > im not really sure if there is a need for more the 1 io thread. my > understanding is that once you have at least 1 there is demising retruns. > it will improve your perfoamce if you have more propvided you have multiple > disks/volumes attached but not as much as having the initall iotread. I would guess that multiple io threads would benefit multiple VMs, where each VM would use its own I/O thread/dedicated core. So, I think providing the possibility for multiple iothreads should be considered, with assignment of these threads to individual VMs. However, this brings up a significantly more complex resource allocation requirement, much less resource allocation during live migration. > is this something you wold be willing to work on and implement? > i woudl be happy to review any spec in this areay and i can bring it up > downstream again but i cant commit to working on this in the z release. > this would require some minor rpc chagnes to ensure live migrate work > properly as the iothread set or cpu share set could be different on different > hosts. but beyond that the feature is actully pretty simple to enable. I think we need to do some testing to prove the performance benefits first - before spending the time to implement. > no there is no way to enable them out of band of nova today. > you technially could wrap the qemu binary wiht a script that inject parmaters > but that obviously would not be supported upstream. > but that would be a workaround if you really needed it > > https://review.opendev.org/c/openstack/devstack/+/817075 is an exmaple > of such a script > that break apparmor and selinx but you could proably make it work with > enough effort. > although i woudl sugess just implemeting the feature upstream and downing > a downstream backport instead. Interesting - maybe I can hack this for testing and proof-of-concept purposes. Thanks for the suggestion! I'll see if we can figure out how to test iothreads in our environment where the high-speed local storage exists. 
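For a quick proof of concept along those lines, the wrapper can be as small as moving the real emulator aside and dropping a shell script in its place that appends the extra arguments. This is a sketch only, loosely based on the devstack example linked above: the path and iothread id are illustrative, disks still need to be pointed at the iothread in the domain XML to benefit, and the whole approach is unsupported and may trip over AppArmor/SELinux.

#!/bin/bash
# Proof-of-concept wrapper installed in place of the real QEMU binary.
# Assumes the original was first renamed, e.g. to /usr/libexec/qemu-kvm.orig
# (the path differs per distribution).
exec /usr/libexec/qemu-kvm.orig "$@" -object iothread,id=iothread0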
Eric From emiller at genesishosting.com Thu Jan 6 22:41:37 2022 From: emiller at genesishosting.com (Eric K. Miller) Date: Thu, 6 Jan 2022 16:41:37 -0600 Subject: [nova] iothread support with Libvirt In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> > > no there is no way to enable them out of band of nova today. > > you technially could wrap the qemu binary wiht a script that inject > parmaters > > but that obviously would not be supported upstream. > > but that would be a workaround if you really needed it > > > > https://review.opendev.org/c/openstack/devstack/+/817075 is an > exmaple > > of such a script I created a modified version of your script to wrap the qemu-kvm executable, but when OpenStack starts the VM, Nova returns: 2022-01-06 16:15:24.758 6 ERROR nova.compute.manager libvirtError: internal error: Failed to probe QEMU binary with QMP: qemu-kvm.orig: -object iothread,id=iothread0: invalid option "-object iothread,id=iothread0" is the first argument. Our Libvirt/QEMU versions are: Compiled against library: libvirt 4.5.0 Using library: libvirt 4.5.0 Using API: QEMU 4.5.0 Running hypervisor: QEMU 2.12.0 I'm pretty sure these versions includes support for iothreads (for both QEMU as well as Libvirt). Is Libvirt doing some form of cross-check on the XML parameters with the running QEMU parameters that is incompatible with the wrapper perhaps? Eric From laurentfdumont at gmail.com Thu Jan 6 23:44:23 2022 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Thu, 6 Jan 2022 18:44:23 -0500 Subject: [nova] iothread support with Libvirt In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> Message-ID: Out of curiosity, how are you passing the local storage to the VM? I would also assume a performance hit when using a VM but local storage (instead of Ceph, iscsi, nfs) should still perform well? On Thu, Jan 6, 2022 at 5:44 PM Eric K. Miller wrote: > > > no there is no way to enable them out of band of nova today. > > > you technially could wrap the qemu binary wiht a script that inject > > parmaters > > > but that obviously would not be supported upstream. > > > but that would be a workaround if you really needed it > > > > > > https://review.opendev.org/c/openstack/devstack/+/817075 is an > > exmaple > > > of such a script > > I created a modified version of your script to wrap the qemu-kvm > executable, but when OpenStack starts the VM, Nova returns: > > 2022-01-06 16:15:24.758 6 ERROR nova.compute.manager libvirtError: > internal error: Failed to probe QEMU binary with QMP: qemu-kvm.orig: > -object iothread,id=iothread0: invalid option > > "-object iothread,id=iothread0" is the first argument. 
> > Our Libvirt/QEMU versions are: > Compiled against library: libvirt 4.5.0 > Using library: libvirt 4.5.0 > Using API: QEMU 4.5.0 > Running hypervisor: QEMU 2.12.0 > > I'm pretty sure these versions includes support for iothreads (for both > QEMU as well as Libvirt). > > Is Libvirt doing some form of cross-check on the XML parameters with the > running QEMU parameters that is incompatible with the wrapper perhaps? > > Eric > -------------- next part -------------- An HTML attachment was scrubbed... URL: From emiller at genesishosting.com Thu Jan 6 23:50:59 2022 From: emiller at genesishosting.com (Eric K. Miller) Date: Thu, 6 Jan 2022 17:50:59 -0600 Subject: [nova] iothread support with Libvirt In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04AFB65B@gmsxchsvr01.thecreation.com> > Out of curiosity, how are you passing the local storage to the VM? I would also assume a performance hit when using a VM but local storage (instead of Ceph, iscsi, nfs) should still perform well? We're using and LVM logical volume on 4 x Micron 9300's in a RAID 0 configuration using md. We're using the standard Nova "image_type=lvm" option. The performance is good, relative to something as slow as Ceph, but the performance hit is pretty significant compared to the host's performance. It is essentially IOPS limited. I can run some tests and provide results if you were interested. Eric From laurentfdumont at gmail.com Fri Jan 7 00:09:14 2022 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Thu, 6 Jan 2022 19:09:14 -0500 Subject: [nova] iothread support with Libvirt In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04AFB65B@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65B@gmsxchsvr01.thecreation.com> Message-ID: For sure! I would be curious to see the benchmarks. I haven't deployed anything with LVM but I'm surprised that the cost is so high. There is a pretty thin line between the VM --> Libvirt --> Qemu --> LVM. I would expect the performance to be close to baremetal, but lower of course. On Thu, Jan 6, 2022 at 6:51 PM Eric K. Miller wrote: > > Out of curiosity, how are you passing the local storage to the VM? I > would also assume a performance hit when using a VM but local storage > (instead of Ceph, iscsi, nfs) should still perform well? > > We're using and LVM logical volume on 4 x Micron 9300's in a RAID 0 > configuration using md. We're using the standard Nova "image_type=lvm" > option. > > The performance is good, relative to something as slow as Ceph, but the > performance hit is pretty significant compared to the host's performance. > It is essentially IOPS limited. I can run some tests and provide results > if you were interested. > > Eric > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From emiller at genesishosting.com Fri Jan 7 00:15:32 2022 From: emiller at genesishosting.com (Eric K. 
Miller) Date: Thu, 6 Jan 2022 18:15:32 -0600 Subject: [nova] iothread support with Libvirt In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65B@gmsxchsvr01.thecreation.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04AFB65C@gmsxchsvr01.thecreation.com> > For sure!? I would be curious to see the benchmarks. I haven't deployed anything with LVM but I'm surprised that the cost is so high. No problem. I'll work on this tonight. > There is a pretty thin line between the VM --> Libvirt --> Qemu --> LVM. I would expect the performance to be close to baremetal, but lower of course. The bare metal test is using a logical volume on the same LVM volume group as OpenStack, so it is an easy comparison of VM versus bare metal. The fact that the bare metal test is ridiculously fast shows that there is significant latency in QEMU/KVM somewhere. I have seen others that mention they get a maximum of about 190k IOPS, which is still quite a bit, but that is slowly becoming easy to achieve with the latest SSDs, even when writing small random blocks. Eric From emiller at genesishosting.com Fri Jan 7 03:54:38 2022 From: emiller at genesishosting.com (Eric K. Miller) Date: Thu, 6 Jan 2022 21:54:38 -0600 Subject: [nova] iothread support with Libvirt References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65B@gmsxchsvr01.thecreation.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04AFB65E@gmsxchsvr01.thecreation.com> Hi Laurent, I thought I may have already done some benchmarks, and it looks like I did, long ago, for the discussion that I created a couple years ago (on August 6, 2020 to be exact). I copied the results from that email below. You can see that the latency difference is pretty significant (13.75x with random 4KiB reads) between bare metal and a VM, which is about the same as the difference in IOPS. Writes are not quite as bad of difference at 8.4x. Eric Some numbers from fio, just to get an idea for how good/bad the IOPS will be: Configuration: 32 core EPYC 7502P with 512GiB of RAM - CentOS 7 latest updates - Kolla Ansible (Stein) deployment 32 vCPU VM with 64GiB of RAM 32 x 10GiB test files (I'm using file tests, not raw device tests, so not optimal, but easiest when the VM root disk is the test disk) iodepth=10 numofjobs=32 time=30 (seconds) The VM was deployed using a qcow2 image, then deployed as a raw image, to see the difference in performance. There was none, which makes sense, since I'm pretty sure the qcow2 image was decompressed and stored in the LVM logical volume - so both tests were measuring the same thing. 
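For reference, that configuration corresponds roughly to an fio invocation like the one below (a sketch only; the exact job definition behind the numbers that follow isn't shown, and the target directory is illustrative):

# Random 4KiB read job: 32 jobs, one 10GiB file each, queue depth 10, 30s run.
# Swap --rw=randwrite for the write test, or --rw=read / --rw=write with
# --bs=1m for the sequential 1MiB runs further below.
fio --name=randread --directory=/mnt/test --size=10g \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=10 --numjobs=32 --runtime=30 --time_based \
    --group_reporting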
Bare metal (random 4KiB reads): 8066MiB/sec 154.34 microsecond avg latency 2.065 million IOPS VM qcow2 (random 4KiB reads): 589MiB/sec 2122.10 microsecond avg latency 151k IOPS Bare metal (random 4KiB writes): 4940MiB/sec 252.44 microsecond avg latency 1.265 million IOPS VM qcow2 (random 4KiB writes): 589MiB/sec 2119.16 microsecond avg latency 151k IOPS Since the read and write VM results are nearly identical, my assumption is that the emulation layer is the bottleneck. CPUs in the VM were all at 55% utilization (all kernel usage). The qemu process on the bare metal machine indicated 1600% (or so) CPU utilization. Below are runs with sequential 1MiB block tests Bare metal (sequential 1MiB reads): 13.3GiB/sec 23446.43 microsecond avg latency 13.7k IOPS VM qcow2 (sequential 1MiB reads): 8378MiB/sec 38164.52 microsecond avg latency 8377 IOPS Bare metal (sequential 1MiB writes): 8098MiB/sec 39488.00 microsecond avg latency 8097 million IOPS VM qcow2 (sequential 1MiB writes): 8087MiB/sec 39534.96 microsecond avg latency 8087 IOPS From elod.illes at est.tech Fri Jan 7 14:48:58 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Fri, 7 Jan 2022 15:48:58 +0100 Subject: [release] Release countdown for week R-11, Jan 10-14 Message-ID: <9529181f-969a-788d-6a36-df47e0b7f1be@est.tech> Development Focus ----------------- We are now past the Yoga-2 milestone, and entering the last development phase of the cycle. Teams should be focused on implementing planned work for the cycle. Now is a good time to review those plans and reprioritize anything if needed based on the what progress has been made and what looks realistic to complete in the next few weeks. General Information ------------------- Looking ahead to the end of the release cycle, please be aware of the feature freeze dates. Those vary depending on deliverable type: * General libraries (except client libraries) need to have their last ? feature release before Non-client library freeze (February 17th, 2022). ? Their stable branches are cut early. * Client libraries (think python-*client libraries) need to have their ? last feature release before Client library freeze (February 24th, 2022) * Deliverables following a cycle-with-rc model (that would be most ? services) observe a Feature freeze on that same date, February 24th, 2022. ? Any feature addition beyond that date should be discussed on the ? mailing-list and get PTL approval. After feature freeze, cycle-with-rc ? deliverables need to produce a first release candidate (and a stable ? branch) before RC1 deadline (March 10th, 2022) * Deliverables following cycle-with-intermediary model can release as ? necessary, but in all cases beforeMarch 25th, 2022 (Final RC deadline) Upcoming Deadlines & Dates -------------------------- Non-client library freeze: February 17th, 2022(R-6 week) Client library freeze: February 24th, 2022(R-5 week) Yoga-3 milestone: February 24th, 2022(R-5 week) Yogafinalrelease: March 30th, 2022 Next PTG: April 4 - 8, 2022 (virtual) El?d Ill?s irc: elodilles -------------- next part -------------- An HTML attachment was scrubbed... URL: From anyrude10 at gmail.com Fri Jan 7 13:58:49 2022 From: anyrude10 at gmail.com (Anirudh Gupta) Date: Fri, 7 Jan 2022 19:28:49 +0530 Subject: [TripleO] Overcloud deployment failing with Network Seggregation Message-ID: Hi Team, I am trying to deploy an Openstack Train using TripleO having 3 Controllers and 1 Compute. Without Network Isolation, I have deployed it successfully. 
Now, I am trying to segregate my network traffic using multiple-nics-vlan configuration generated via the below command /tools/process-templates.py -o ~/openstack-tripleo-heat-templates-rendered -n /home/stack/network_data.yaml -r /home/stack/templates/roles_data.yaml The command used for deploying the overcloud is openstack overcloud deploy --templates \ *-n /home/stack/templates/network_data.yaml \* -r /home/stack/templates/roles_data.yaml \ -e /home/stack/templates/node-info.yaml \ *...* *-e /home/stack/templates/environments/network-isolation.yaml \-e /home/stack/templates/environments/network-environment.yaml \* *...* It is failing just after the ovs bridge gets created and external nic vlan is attached to the bridge. [image: MicrosoftTeams-image (1).png] MAC Address of br-ex is same as the MAC of the interface specified in file network/config/multiple-nics-vlan/controller.yaml For testing, If I assign an IP address to that interface, it is reachable to GW. But, in case IP is assigned to VLAN 418 and it is added in the br-ex, Gateway remains unreachable. Please suggest -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: MicrosoftTeams-image (1).png Type: image/png Size: 65085 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ansible-errors.json Type: application/json Size: 4011 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ansible.log Type: application/octet-stream Size: 217956 bytes Desc: not available URL: From jp.methot at planethoster.info Fri Jan 7 21:22:30 2022 From: jp.methot at planethoster.info (J-P Methot) Date: Fri, 7 Jan 2022 16:22:30 -0500 Subject: [kolla] Updating libvirt container images without VM downtime In-Reply-To: References: <36d9fbdb-a582-a766-2263-823ed5ab4959@planethoster.info> Message-ID: <927f86fb-d65e-1145-cbca-f8bb7deef877@planethoster.info> This is interesting, because it's the exact opposite of what I'm seeing in my test infrastructure. If I do sudo docker restart 5b737dc80fc5 (the libvirt container), I lose ssh access, web console access and the dashboard's log window becomes empty. When the libvirt container comes back up, if I go inside the container and do virsh list, the return list is empty. As far as I can tell, the VM is effectively shutdown. The openstack dashboard still reports it as up, but any attempt at operations on the VM will force the dashboard to update and show the VM as shut off. As far as I can tell, restarting the docker container for libvirt did kill off my VM here. From what you tell me, this is not the expected behaviour. So, I must ask, why is it acting differently in my environment? Additionally, someone else said that the VMs were running on the host with the process /usr/libexec/qemu-kvm running the VM. On my compute host, qemu-kvm is not present in /usr/libexec. I understand that this could be due to an OS difference, but I thought it would be an important information to add. This is kolla 12.0 on ubuntu 20.04.3 with the openstack wallaby container installed. On 1/5/22 4:30 AM, Danny Webb wrote: > If working properly the restart of the libvirtd container is a > non-impacting action for running VMs.? 
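One thing worth checking at that point, before relying only on virsh inside the container, is whether the guest's qemu process is still alive on the host itself (a quick check; the binary name varies by distribution, e.g. qemu-system-x86_64 on Ubuntu and /usr/libexec/qemu-kvm on RHEL/CentOS):

# Look for running guest processes on the hypervisor host, outside the container.
pgrep -af qemu-system || pgrep -af qemu-kvm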
The only containers on the > hypervisors that have an actual impact on the VMs in the standard > setups are the restart of the ovs-vswitchd / ovn-controller / ovsdb > containers which result in a small blip of the VM neworks that we've > noticed. > ------------------------------------------------------------------------ > *From:* J-P Methot > *Sent:* 04 January 2022 19:17 > *To:* openstack-discuss > *Subject:* [kolla] Updating libvirt container images without VM downtime > CAUTION: This email originates from outside THG > > Hi, > > I'm looking for validation regarding the way Kolla and containers work > in regard to upgrading the libvirt containers. Essentially, when you > upgrade the libvirt container to a new container image, the container > needs to be restarted, thus creating downtime for the VMs. There is no > way to avoid this downtime, unless you migrate the VMs to another node > and then move them back once the container has restarted, right? > > -- > Jean-Philippe M?thot > Senior Openstack system administrator > Administrateur syst?me Openstack s?nior > PlanetHoster inc. > > > Danny Webb > Senior Linux Systems Administrator > The Hut Group > > Tel: > Email: Danny.Webb at thehutgroup.com > > > For the purposes of this email, the "company" means The Hut Group > Limited, a company registered in England and Wales (company number > 6539496) whose registered office is at Fifth Floor, Voyager House, > Chicago Avenue, Manchester Airport, M90 3DQ and/or any of its > respective subsidiaries. > > *Confidentiality Notice* > This e-mail is confidential and intended for the use of the named > recipient only. If you are not the intended recipient please notify us > by telephone immediately on +44(0)1606 811888 or return it to us by > e-mail. Please then delete it from your system and note that any use, > dissemination, forwarding, printing or copying is strictly prohibited. > Any views or opinions are solely those of the author and do not > necessarily represent those of the company. > > *Encryptions and Viruses* > Please note that this e-mail and any attachments have not been > encrypted. They may therefore be liable to be compromised. Please also > note that it is your responsibility to scan this e-mail and any > attachments for viruses. We do not, to the extent permitted by law, > accept any liability (whether in contract, negligence or otherwise) > for any virus infection and/or external compromise of security and/or > confidentiality in relation to transmissions sent by e-mail. > > *Monitoring* > Activity and use of the company's systems is monitored to secure its > effective use and operation and for other lawful business purposes. > Communications using these systems will also be monitored and may be > recorded to secure effective use and operation and for other lawful > business purposes. > > hgvyjuv -- Jean-Philippe M?thot Senior Openstack system administrator Administrateur syst?me Openstack s?nior PlanetHoster inc. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre at stackhpc.com Fri Jan 7 22:28:01 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 7 Jan 2022 23:28:01 +0100 Subject: [kolla] Updating libvirt container images without VM downtime In-Reply-To: <927f86fb-d65e-1145-cbca-f8bb7deef877@planethoster.info> References: <36d9fbdb-a582-a766-2263-823ed5ab4959@planethoster.info> <927f86fb-d65e-1145-cbca-f8bb7deef877@planethoster.info> Message-ID: Hello J-P, I believe you must be hitting this critical bug, which was fixed in kolla-ansible 12.2.0: https://bugs.launchpad.net/kolla-ansible/+bug/1941706 I would recommend keeping on top of kolla-ansible updates, at least using tagged releases, which are also published to PyPI. By staying on the initial Wallaby release, you are missing six months of bug fixes. Best wishes, Pierre Riteau (priteau) On Fri, 7 Jan 2022 at 22:27, J-P Methot wrote: > > This is interesting, because it's the exact opposite of what I'm seeing in my test infrastructure. > > > If I do sudo docker restart 5b737dc80fc5 (the libvirt container), I lose ssh access, web console access and the dashboard's log window becomes empty. When the libvirt container comes back up, if I go inside the container and do virsh list, the return list is empty. As far as I can tell, the VM is effectively shutdown. The openstack dashboard still reports it as up, but any attempt at operations on the VM will force the dashboard to update and show the VM as shut off. > > > As far as I can tell, restarting the docker container for libvirt did kill off my VM here. From what you tell me, this is not the expected behaviour. So, I must ask, why is it acting differently in my environment? > > > Additionally, someone else said that the VMs were running on the host with the process /usr/libexec/qemu-kvm running the VM. On my compute host, qemu-kvm is not present in /usr/libexec. I understand that this could be due to an OS difference, but I thought it would be an important information to add. > > > This is kolla 12.0 on ubuntu 20.04.3 with the openstack wallaby container installed. > > > On 1/5/22 4:30 AM, Danny Webb wrote: > > If working properly the restart of the libvirtd container is a non-impacting action for running VMs. The only containers on the hypervisors that have an actual impact on the VMs in the standard setups are the restart of the ovs-vswitchd / ovn-controller / ovsdb containers which result in a small blip of the VM neworks that we've noticed. > ________________________________ > From: J-P Methot > Sent: 04 January 2022 19:17 > To: openstack-discuss > Subject: [kolla] Updating libvirt container images without VM downtime > > CAUTION: This email originates from outside THG > > Hi, > > I'm looking for validation regarding the way Kolla and containers work > in regard to upgrading the libvirt containers. Essentially, when you > upgrade the libvirt container to a new container image, the container > needs to be restarted, thus creating downtime for the VMs. There is no > way to avoid this downtime, unless you migrate the VMs to another node > and then move them back once the container has restarted, right? > > -- > Jean-Philippe M?thot > Senior Openstack system administrator > Administrateur syst?me Openstack s?nior > PlanetHoster inc. 
> > > Danny Webb > Senior Linux Systems Administrator > The Hut Group > > Tel: > Email: Danny.Webb at thehutgroup.com > > > For the purposes of this email, the "company" means The Hut Group Limited, a company registered in England and Wales (company number 6539496) whose registered office is at Fifth Floor, Voyager House, Chicago Avenue, Manchester Airport, M90 3DQ and/or any of its respective subsidiaries. > > Confidentiality Notice > This e-mail is confidential and intended for the use of the named recipient only. If you are not the intended recipient please notify us by telephone immediately on +44(0)1606 811888 or return it to us by e-mail. Please then delete it from your system and note that any use, dissemination, forwarding, printing or copying is strictly prohibited. Any views or opinions are solely those of the author and do not necessarily represent those of the company. > > Encryptions and Viruses > Please note that this e-mail and any attachments have not been encrypted. They may therefore be liable to be compromised. Please also note that it is your responsibility to scan this e-mail and any attachments for viruses. We do not, to the extent permitted by law, accept any liability (whether in contract, negligence or otherwise) for any virus infection and/or external compromise of security and/or confidentiality in relation to transmissions sent by e-mail. > > Monitoring > Activity and use of the company's systems is monitored to secure its effective use and operation and for other lawful business purposes. Communications using these systems will also be monitored and may be recorded to secure effective use and operation and for other lawful business purposes. > > hgvyjuv > > -- > Jean-Philippe M?thot > Senior Openstack system administrator > Administrateur syst?me Openstack s?nior > PlanetHoster inc. From tonykarera at gmail.com Sat Jan 8 08:44:57 2022 From: tonykarera at gmail.com (Karera Tony) Date: Sat, 8 Jan 2022 10:44:57 +0200 Subject: Logging (Monasca) not in Dashboard Message-ID: Dear Team, I hope all is well. I have deployed Openstack with kolla-ansible and enabled Monasca among the projects. The deployment was successful but unfortunately I can't see Logging in the Dashn=board. Kindly advise If I could be missing something. Regards Tony Karera -------------- next part -------------- An HTML attachment was scrubbed... URL: From laurentfdumont at gmail.com Sat Jan 8 17:38:12 2022 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Sat, 8 Jan 2022 12:38:12 -0500 Subject: [nova] iothread support with Libvirt In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04AFB65E@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65B@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65E@gmsxchsvr01.thecreation.com> Message-ID: Super interesting. Thank you. Pretty obvious with the random IO/throughput performance degradation :( Are these NVME/SSD in hardware raid? On Thu, Jan 6, 2022 at 10:54 PM Eric K. Miller wrote: > Hi Laurent, > > I thought I may have already done some benchmarks, and it looks like I > did, long ago, for the discussion that I created a couple years ago (on > August 6, 2020 to be exact). > > I copied the results from that email below. 
You can see that the latency > difference is pretty significant (13.75x with random 4KiB reads) between > bare metal and a VM, which is about the same as the difference in IOPS. > Writes are not quite as bad of difference at 8.4x. > > Eric > > > Some numbers from fio, just to get an idea for how good/bad the IOPS will > be: > > Configuration: > 32 core EPYC 7502P with 512GiB of RAM - CentOS 7 latest updates - Kolla > Ansible (Stein) deployment > 32 vCPU VM with 64GiB of RAM > 32 x 10GiB test files (I'm using file tests, not raw device tests, so not > optimal, but easiest when the VM root disk is the test disk) > iodepth=10 > numofjobs=32 > time=30 (seconds) > > The VM was deployed using a qcow2 image, then deployed as a raw image, to > see the difference in performance. There was none, which makes sense, > since I'm pretty sure the qcow2 image was decompressed and stored in the > LVM logical volume - so both tests were measuring the same thing. > > Bare metal (random 4KiB reads): > 8066MiB/sec > 154.34 microsecond avg latency > 2.065 million IOPS > > VM qcow2 (random 4KiB reads): > 589MiB/sec > 2122.10 microsecond avg latency > 151k IOPS > > Bare metal (random 4KiB writes): > 4940MiB/sec > 252.44 microsecond avg latency > 1.265 million IOPS > > VM qcow2 (random 4KiB writes): > 589MiB/sec > 2119.16 microsecond avg latency > 151k IOPS > > Since the read and write VM results are nearly identical, my assumption is > that the emulation layer is the bottleneck. CPUs in the VM were all at 55% > utilization (all kernel usage). The qemu process on the bare metal machine > indicated 1600% (or so) CPU utilization. > > Below are runs with sequential 1MiB block tests > > Bare metal (sequential 1MiB reads): > 13.3GiB/sec > 23446.43 microsecond avg latency > 13.7k IOPS > > VM qcow2 (sequential 1MiB reads): > 8378MiB/sec > 38164.52 microsecond avg latency > 8377 IOPS > > Bare metal (sequential 1MiB writes): > 8098MiB/sec > 39488.00 microsecond avg latency > 8097 million IOPS > > VM qcow2 (sequential 1MiB writes): > 8087MiB/sec > 39534.96 microsecond avg latency > 8087 IOPS > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cel975 at yahoo.com Sun Jan 9 08:17:18 2022 From: cel975 at yahoo.com (Celinio Fernandes) Date: Sun, 9 Jan 2022 08:17:18 +0000 (UTC) Subject: Cannot ssh/ping instance References: <869855278.629940.1641716238605.ref@mail.yahoo.com> Message-ID: <869855278.629940.1641716238605@mail.yahoo.com> Hi, I am running Ubuntu Server 20.04 LTS on Virtualbox. I installed OpenStack (Xena release) through Devstack. Here is the content of my /opt/stack/devstack/local.conf file : [[local|localrc]] ADMIN_PASSWORD=secret DATABASE_PASSWORD=$ADMIN_PASSWORD RABBIT_PASSWORD=$ADMIN_PASSWORD SERVICE_PASSWORD=$ADMIN_PASSWORD HOST_IP=10.0.2.15 I created an instance through Horizon. The security group contains the 2 rules needed (one to be able to ping and one to be able to ssh the instance). I also allocated and associated a floating IP address. And a ssh key pair. Here is the configuration : openstack server list ---------------------------------+--------------------------+---------+ | ID?? | Name | Status | Networks | Image? | Flavor? 
| ---------------------------------+--------------------------+---------+ | f5f0fdd5-298b-4fa3-9ee9-e6e4288f4327 | InstanceJanvier | ACTIVE | shared=172.24.4.133, 192.168.233.165 | cirros-0.5.2-x86_64-disk | m1.nano | ------------------------------------------------------+ openstack network list : ------------------------------------------------------+ | ID???? | Name??? | Subnets???????????? | ------------------------------------------------------+ | 96a04799-7fc7-4525-b05c-ad57261aed38 | public? | 07ce42db-6f3f-4135-ace7-2fc104ea62a0, 6dba13fc-b10c-48b1-b1b4-e1f1afe25b53 | | c42638dc-fa56-4644-ad34-295fce4811d2 | shared? | a4e2d8cc-02b2-42e2-a525-e0eebbb08980?????????????????????????????????????? | | ffb8a527-266e-4e96-ad60-f7e9aba8f0c1 | private | 42e36677-cf3c-4df4-88a1-8cf79b9d6060, e507e6dd-132a-4249-96b1-83761562dd73 | ------------------------------------------------------+ openstack router list : +--------------------------------------+----------------+--------+------ | ID??? | Name? | Status | State | Project????????????????????????? | +--------------------------------------+----------------+--------+------ | b9a15051-a532-4c93-95ad-53c057720c62 | Virtual_router | ACTIVE | UP??? | 6556c02dd88f4c45b535c2dbb8ba1a04 | +--------------------------------------+----------------+--------+------ I cannot ping/ssh neither the fixed IP address or the floating IP address : ping -c 3 172.24.4.133 PING 172.24.4.133 (172.24.4.133) 56(84) bytes of data. --- 172.24.4.133 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2035ms ping -c 3 192.168.233.165 PING 192.168.233.165 (192.168.233.165) 56(84) bytes of data. --- 192.168.233.165 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2035ms Maybe that has something to do with the network namespaces configuration on Ubuntu. Does anyone know what could go wrong or what is missing ? Thanks for helping. -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Sun Jan 9 19:21:07 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sun, 09 Jan 2022 20:21:07 +0100 Subject: Cannot ssh/ping instance In-Reply-To: <869855278.629940.1641716238605@mail.yahoo.com> References: <869855278.629940.1641716238605.ref@mail.yahoo.com> <869855278.629940.1641716238605@mail.yahoo.com> Message-ID: <5690630.DvuYhMxLoT@p1> Hi, On niedziela, 9 stycznia 2022 09:17:18 CET Celinio Fernandes wrote: > Hi, > I am running Ubuntu Server 20.04 LTS on Virtualbox. I installed OpenStack > (Xena release) through Devstack. Here is the content of my > /opt/stack/devstack/local.conf file : > [[local|localrc]] > ADMIN_PASSWORD=secret > DATABASE_PASSWORD=$ADMIN_PASSWORD > RABBIT_PASSWORD=$ADMIN_PASSWORD > SERVICE_PASSWORD=$ADMIN_PASSWORD > HOST_IP=10.0.2.15 > > > I created an instance through Horizon. The security group contains the > 2 rules needed (one to be able to ping and one to be able to ssh the > instance). I also allocated and associated a floating IP address. And a ssh > key pair. 
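(For reference, those two rules usually look like this on the CLI - a sketch assuming they were added to the instance's "default" security group:

  openstack security group rule create --proto icmp default
  openstack security group rule create --proto tcp --dst-port 22 default
)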
> > Here is the configuration : > openstack server list > ---------------------------------+--------------------------+---------+ > > | ID | Name | Status | Networks | Image | Flavor | > > ---------------------------------+--------------------------+---------+ > > | f5f0fdd5-298b-4fa3-9ee9-e6e4288f4327 | InstanceJanvier | ACTIVE | > | shared=172.24.4.133, 192.168.233.165 | cirros-0.5.2-x86_64-disk | m1.nano > | | > ------------------------------------------------------+ > > > openstack network list : > ------------------------------------------------------+ > > | ID | Name | Subnets | > > ------------------------------------------------------+ > > | 96a04799-7fc7-4525-b05c-ad57261aed38 | public | > | 07ce42db-6f3f-4135-ace7-2fc104ea62a0, 6dba13fc-b10c-48b1-b1b4-e1f1afe25b53 > | | c42638dc-fa56-4644-ad34-295fce4811d2 | shared | > | a4e2d8cc-02b2-42e2-a525-e0eebbb08980 > | | ffb8a527-266e-4e96-ad60-f7e9aba8f0c1 | private | > | 42e36677-cf3c-4df4-88a1-8cf79b9d6060, e507e6dd-132a-4249-96b1-83761562dd73 > | | > ------------------------------------------------------+ > > openstack router list : > +--------------------------------------+----------------+--------+------ > > | ID | Name | Status | State | Project | > > +--------------------------------------+----------------+--------+------ > > | b9a15051-a532-4c93-95ad-53c057720c62 | Virtual_router | ACTIVE | UP | > | 6556c02dd88f4c45b535c2dbb8ba1a04 | > +--------------------------------------+----------------+--------+------ > > > I cannot ping/ssh neither the fixed IP address or the floating IP address : > ping -c 3 172.24.4.133 > PING 172.24.4.133 (172.24.4.133) 56(84) bytes of data. > --- 172.24.4.133 ping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > ping -c 3 192.168.233.165 > PING 192.168.233.165 (192.168.233.165) 56(84) bytes of data. > --- 192.168.233.165 ping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > Maybe that has something to do with the network namespaces configuration on > Ubuntu. Does anyone know what could go wrong or what is missing ? > Thanks for helping. If You are trying to ping Floating IP directly from the host where devstack is installed (Virtualbox VM in Your case IIUC) then You should first have those floating IP addresses somehow reachable on the host, otherwise traffic is probably going through default gateway so is going outside the VM. If You are using ML2/OVN (default in Devstack) or ML2/OVS You probably have in the openvswitch bridge called br-ex which is used to send external network traffic from the OpenStack networks in Devstack. In such case You can e.g. add some IP address from the public network's subnet on the br-ex interface, like 192.168.233.254/24 - that will tell Your OS to reach that subnet through br- ex, so traffic will be able to go "into" the OVS managed by Neutron. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. 
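Concretely, that workaround is only a couple of commands on the DevStack host (a sketch - the address is just the example from this thread, pick a free one from the subnet of your public network):

  sudo ip addr add 192.168.233.254/24 dev br-ex
  sudo ip link set br-ex up
  ping -c 3 192.168.233.165   # then retry the instance address on that subnet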
URL: From smooney at redhat.com Sun Jan 9 19:38:34 2022 From: smooney at redhat.com (Sean Mooney) Date: Sun, 09 Jan 2022 19:38:34 +0000 Subject: [kolla] Updating libvirt container images without VM downtime In-Reply-To: <927f86fb-d65e-1145-cbca-f8bb7deef877@planethoster.info> References: <36d9fbdb-a582-a766-2263-823ed5ab4959@planethoster.info> <927f86fb-d65e-1145-cbca-f8bb7deef877@planethoster.info> Message-ID: <80e15d82f0f486a2f5bbcbb5ad230d769ecb0544.camel@redhat.com> On Fri, 2022-01-07 at 16:22 -0500, J-P Methot wrote:
> This is interesting, because it's the exact opposite of what I'm seeing in my test infrastructure.
> If I do sudo docker restart 5b737dc80fc5 (the libvirt container), I lose ssh access, web console access and the dashboard's log window becomes empty. When the libvirt container comes back up, if I go inside the container and do virsh list, the return list is empty. As far as I can tell, the VM is effectively shutdown. The openstack dashboard still reports it as up, but any attempt at operations on the VM will force the dashboard to update and show the VM as shut off.
> As far as I can tell, restarting the docker container for libvirt did kill off my VM here. From what you tell me, this is not the expected behaviour. So, I must ask, why is it acting differently in my environment?
It really should not. The container should be running with pid=host so that the VMs are parented to the host's PID 1, not the container, and they should outlive the container. The other thing to be aware of is the cgroup behaviour: stopping the container, especially if you are managing it with systemd/podman and it is configured incorrectly, can kill all processes in the same cgroup, which can result in the VMs being killed. The qemu processes should not be in the Docker-created cgroups.
libvirt's behaviour also changes depending on whether you have systemd-container and systemd-machined configured and enabled on the host, so you can see the VM-shutdown behaviour if you install systemd-container on the host and restart the VMs. This is because libvirt switches from its legacy direct cgroup interface to using systemd to interact with cgroups. libvirt does not have an upgrade mechanism to go from one to the other, so since the existing VMs are not registered with systemd it shuts them down, thinking they should not be running.
> Additionally, someone else said that the VMs were running on the host with the process /usr/libexec/qemu-kvm running the VM. On my compute host, qemu-kvm is not present in /usr/libexec. I understand that this could be due to an OS difference, but I thought it would be important information to add.
> This is kolla 12.0 on ubuntu 20.04.3 with the openstack wallaby container installed.
> On 1/5/22 4:30 AM, Danny Webb wrote:
> > If working properly the restart of the libvirtd container is a non-impacting action for running VMs. The only containers on the hypervisors that have an actual impact on the VMs in the standard setups are the restart of the ovs-vswitchd / ovn-controller / ovsdb containers, which result in a small blip of the VM networks that we've noticed.
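A quick way to sanity-check the pid=host point on an affected compute node (a sketch - nova_libvirt is the usual kolla container name, adjust if yours differs):

  docker inspect -f '{{.HostConfig.PidMode}}' nova_libvirt   # expected output: host
  pgrep -fa qemu                                             # qemu processes should be visible from the host
  cat /proc/$(pgrep -f qemu | head -1)/cgroup                # and should not sit inside the container's cgroups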
> > ------------------------------------------------------------------------ > > *From:* J-P Methot > > *Sent:* 04 January 2022 19:17 > > *To:* openstack-discuss > > *Subject:* [kolla] Updating libvirt container images without VM downtime > > CAUTION: This email originates from outside THG > > > > Hi, > > > > I'm looking for validation regarding the way Kolla and containers work > > in regard to upgrading the libvirt containers. Essentially, when you > > upgrade the libvirt container to a new container image, the container > > needs to be restarted, thus creating downtime for the VMs. There is no > > way to avoid this downtime, unless you migrate the VMs to another node > > and then move them back once the container has restarted, right? > > > > -- > > Jean-Philippe M?thot > > Senior Openstack system administrator > > Administrateur syst?me Openstack s?nior > > PlanetHoster inc. > > > > > > Danny Webb > > Senior Linux Systems Administrator > > The Hut Group > > > > Tel: > > Email: Danny.Webb at thehutgroup.com > > > > > > For the purposes of this email, the "company" means The Hut Group > > Limited, a company registered in England and Wales (company number > > 6539496) whose registered office is at Fifth Floor, Voyager House, > > Chicago Avenue, Manchester Airport, M90 3DQ and/or any of its > > respective subsidiaries. > > > > *Confidentiality Notice* > > This e-mail is confidential and intended for the use of the named > > recipient only. If you are not the intended recipient please notify us > > by telephone immediately on +44(0)1606 811888 or return it to us by > > e-mail. Please then delete it from your system and note that any use, > > dissemination, forwarding, printing or copying is strictly prohibited. > > Any views or opinions are solely those of the author and do not > > necessarily represent those of the company. > > > > *Encryptions and Viruses* > > Please note that this e-mail and any attachments have not been > > encrypted. They may therefore be liable to be compromised. Please also > > note that it is your responsibility to scan this e-mail and any > > attachments for viruses. We do not, to the extent permitted by law, > > accept any liability (whether in contract, negligence or otherwise) > > for any virus infection and/or external compromise of security and/or > > confidentiality in relation to transmissions sent by e-mail. > > > > *Monitoring* > > Activity and use of the company's systems is monitored to secure its > > effective use and operation and for other lawful business purposes. > > Communications using these systems will also be monitored and may be > > recorded to secure effective use and operation and for other lawful > > business purposes. > > > > hgvyjuv > From tonykarera at gmail.com Mon Jan 10 00:33:25 2022 From: tonykarera at gmail.com (Karera Tony) Date: Mon, 10 Jan 2022 02:33:25 +0200 Subject: Monasca in kolla-ansible Message-ID: Dear Team, I hope all is well. I have deployed Openstack with kolla-ansible and enabled Monasca among the projects. The deployment was successful but unfortunately I can't see Logging in the Dashn=board. Kindly advise If I could be missing something. Regards Tony Karera -------------- next part -------------- An HTML attachment was scrubbed... URL: From munnaeebd at gmail.com Mon Jan 10 05:07:46 2022 From: munnaeebd at gmail.com (Md. Hejbul Tawhid MUNNA) Date: Mon, 10 Jan 2022 11:07:46 +0600 Subject: Huge Fake tap device creating on some Compute node || Open vSwitch Message-ID: Hi, We are facing a critical issue. 
Something happened in our infrastructure and now huge fake tap devices are creating and deleting continuously for some of our compute node. following logs are generating continuously. all the services/VMs are up and running. Due to this, snmp service is not working properly. after rebooting one compute node the issue resolved but all the compute node we are not unable to restart now. Any idea to resolve the issue without restart the compute node. # tail -f /var/log/openvswitch/ovs-vswitchd.log 2022-01-10T04:59:10.689Z|679747376|bridge|INFO|bridge br-int: added interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:10.721Z|679747377|bridge|INFO|bridge br-int: added interface tap4129f7f6-2e on port 736 2022-01-10T04:59:10.852Z|679747378|bridge|INFO|bridge br-int: deleted interface tap5219a94c-2c on port 736 2022-01-10T04:59:10.854Z|679747379|bridge|INFO|bridge br-int: deleted interface tap4aac6d99-25 on port 678 2022-01-10T04:59:10.882Z|679747380|bridge|INFO|bridge br-int: added interface tap5219a94c-2c on port 736 2022-01-10T04:59:10.909Z|679747381|bridge|INFO|bridge br-int: added interface tap4aac6d99-25 on port 678 2022-01-10T04:59:11.073Z|679747382|bridge|INFO|bridge br-int: deleted interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:11.074Z|679747383|bridge|INFO|bridge br-int: deleted interface tap4129f7f6-2e on port 736 2022-01-10T04:59:11.100Z|679747384|bridge|INFO|bridge br-int: added interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:11.130Z|679747385|bridge|INFO|bridge br-int: added interface tap4129f7f6-2e on port 736 2022-01-10T04:59:11.287Z|679747386|bridge|INFO|bridge br-int: deleted interface tap5219a94c-2c on port 736 2022-01-10T04:59:11.289Z|679747387|bridge|INFO|bridge br-int: deleted interface tap4aac6d99-25 on port 678 2022-01-10T04:59:11.315Z|679747388|bridge|INFO|bridge br-int: added interface tap5219a94c-2c on port 736 2022-01-10T04:59:11.340Z|679747389|bridge|INFO|bridge br-int: added interface tap4aac6d99-25 on port 678 2022-01-10T04:59:11.459Z|679747390|bridge|INFO|bridge br-int: deleted interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:11.461Z|679747391|bridge|INFO|bridge br-int: deleted interface tap4129f7f6-2e on port 736 2022-01-10T04:59:11.494Z|679747392|bridge|INFO|bridge br-int: added interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:11.518Z|679747393|bridge|INFO|bridge br-int: added interface tap4129f7f6-2e on port 736 2022-01-10T04:59:11.674Z|679747394|bridge|INFO|bridge br-int: deleted interface tap5219a94c-2c on port 736 2022-01-10T04:59:11.676Z|679747395|bridge|INFO|bridge br-int: deleted interface tap4aac6d99-25 on port 678 2022-01-10T04:59:11.704Z|679747396|bridge|INFO|bridge br-int: added interface tap5219a94c-2c on port 736 2022-01-10T04:59:11.733Z|679747397|bridge|INFO|bridge br-int: added interface tap4aac6d99-25 on port 678 2022-01-10T04:59:11.861Z|679747398|bridge|INFO|bridge br-int: deleted interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:11.864Z|679747399|bridge|INFO|bridge br-int: deleted interface tap4129f7f6-2e on port 736 2022-01-10T04:59:11.890Z|679747400|bridge|INFO|bridge br-int: added interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:11.916Z|679747401|bridge|INFO|bridge br-int: added interface tap4129f7f6-2e on port 736 2022-01-10T04:59:12.073Z|679747402|bridge|INFO|bridge br-int: deleted interface tap5219a94c-2c on port 736 2022-01-10T04:59:12.074Z|679747403|bridge|INFO|bridge br-int: deleted interface tap4aac6d99-25 on port 678 2022-01-10T04:59:12.100Z|679747404|bridge|INFO|bridge br-int: added interface 
tap5219a94c-2c on port 736 2022-01-10T04:59:12.125Z|679747405|bridge|INFO|bridge br-int: added interface tap4aac6d99-25 on port 678 2022-01-10T04:59:12.256Z|679747406|bridge|INFO|bridge br-int: deleted interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:12.257Z|679747407|bridge|INFO|bridge br-int: deleted interface tap4129f7f6-2e on port 736 2022-01-10T04:59:12.290Z|679747408|bridge|INFO|bridge br-int: added interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:12.313Z|679747409|bridge|INFO|bridge br-int: added interface tap4129f7f6-2e on port 736 2022-01-10T04:59:12.450Z|679747410|bridge|INFO|bridge br-int: deleted interface tap5219a94c-2c on port 736 2022-01-10T04:59:12.452Z|679747411|bridge|INFO|bridge br-int: deleted interface tap4aac6d99-25 on port 678 2022-01-10T04:59:12.478Z|679747412|bridge|INFO|bridge br-int: added interface tap5219a94c-2c on port 736 2022-01-10T04:59:12.506Z|679747413|bridge|INFO|bridge br-int: added interface tap4aac6d99-25 on port 678 2022-01-10T04:59:12.626Z|679747414|bridge|INFO|bridge br-int: deleted interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:12.628Z|679747415|bridge|INFO|bridge br-int: deleted interface tap4129f7f6-2e on port 736 2022-01-10T04:59:12.660Z|679747416|bridge|INFO|bridge br-int: added interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:12.688Z|679747417|bridge|INFO|bridge br-int: added interface tap4129f7f6-2e on port 736 2022-01-10T04:59:12.827Z|679747418|bridge|INFO|bridge br-int: deleted interface tap5219a94c-2c on port 736 2022-01-10T04:59:12.828Z|679747419|bridge|INFO|bridge br-int: deleted interface tap4aac6d99-25 on port 678 2022-01-10T04:59:12.855Z|679747420|bridge|INFO|bridge br-int: added interface tap5219a94c-2c on port 736 2022-01-10T04:59:12.883Z|679747421|bridge|INFO|bridge br-int: added interface tap4aac6d99-25 on port 678 2022-01-10T04:59:13.035Z|679747422|bridge|INFO|bridge br-int: deleted interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:13.036Z|679747423|bridge|INFO|bridge br-int: deleted interface tap4129f7f6-2e on port 736 2022-01-10T04:59:13.080Z|679747424|bridge|INFO|bridge br-int: added interface tapa1ce84ba-64 on port 678 2022-01-10T04:59:13.109Z|679747425|bridge|INFO|bridge br-int: added interface tap4129f7f6-2e on port 736 Regards, Munna -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix.huettner at mail.schwarz Mon Jan 10 08:11:54 2022 From: felix.huettner at mail.schwarz (=?utf-8?B?RmVsaXggSMO8dHRuZXI=?=) Date: Mon, 10 Jan 2022 08:11:54 +0000 Subject: [glance] Does glance not support using local filesystem storage in a cluster In-Reply-To: References: <20220106085612.Horde.FTZSS84MjBp1HTP1Fx3tbzE@webmail.nde.ag> Message-ID: Hi everyone, we are actually using glance with filesystem storage in our clusters. To do this we use an NFS share / k8s PVC as the backing storage for the images on all nodes. However we only go this route since we are using NFS for our cinder storage as well. So it probably only makes sense if you have a shared Filesystem available to your setup anyway. Best Regards Felix -----Original Message----- From: ??? Sent: Thursday, January 6, 2022 10:42 AM To: openstack-discuss Cc: wangleic at uniontech.com; hanguangyu at uniontech.com Subject: Re: [glance] Does glance not support using local filesystem storage in a cluster Hi, Yes, you are right. I have deploied a ceph as the backend storage of glance, cinder and nova. And it can resolve this question. But I wonder why it's designed this way. It doesn't fit my perception of OpenStack. 
As currently designed, local storage of glance must not be used in the cluster. Why not record the host where the image resides? Just like the local storage of the nova-compute node, if a Glance node breaks down, the image on the host cannot be accessed. Sorry that maybe this idea is unreasonable and stupid. Could anyone tell me the reason or what's the problem with that best wishes to you, love you. Thank you, Han Guangyu Eugen Block ?2022?1?6??? 17:03??? > > Hi, > > if you really aim towards a highly available cluster you'll also need > a ha storage solution like ceph. Having glance images or VMs on local > storage can make it easier to deploy, maybe for testing and getting > involved with openstack, but it's not really recommended for > production use. You'll probably have the same issue with cinder > volumes, I believe. Or do you have a different backend for cinder? > > Regards, > Eugen > > > Zitat von ??? : > > > Deal all, > > > > Sorry that maybe I ask a stupid question. But I'm really confused > > with it and didn't find discuss in glance > > document(https://docs.openstack.org/glance/latest/). > > > > I have a OpenStack Victoria cluster with three all-in-one node in > > centos8. I implemented it with reference to > > https://docs.openstack.org/ha-guide/. So this cluster use Pacemaker, > > HAproxy and Galera. "To implement high availability, run an instance > > of the database on each controller node and use Galera Cluster to > > provide replication between them." > > > > I found that I will encounter an error If I configure Glance backend > > to use local storage driver to store image files on the local disk. > > If I upload a image, this image only will be storaged in one node. > > But the database only storage the file path of image such as > > "/v2/images/aa3cbee0-717f-4699-8cca-61243302d693/file", don't have > > the host information. The database data is same in three node. > > > > If I upload a image in node1, image only is storaged in node1. The > > database of three node stores the local filesystem path of image. > > And If The create Instance task is assigned to node2, It will find > > image in node2, but image can't be found in node2. So we get the > > "Image has no associated data" error. > > > > So I want to ask: > > 1. Wheter glance does not support using local filesystem storage in > > a cluster? > > 2. If 1 was right, why do we do this design instead of storing > > information about the host on which images is located, as nova does > > with instance. > > > > I would appreciate any kind of guidance or help. > > > > Thank you, > > Han Guangyu > > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. 
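For anyone wanting to follow the same route, the rough shape of an NFS-backed filesystem store is to mount one share on every glance-api host and point the store at it (a sketch - the export, mount point and paths below are made up, and the option names follow the classic glance_store filesystem backend):

  # on every controller running glance-api
  mount -t nfs nfs.example.com:/export/glance /var/lib/glance/images
  # then in glance-api.conf:
  #   [glance_store]
  #   stores = file,http
  #   default_store = file
  #   filesystem_store_datadir = /var/lib/glance/images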
From mkopec at redhat.com Mon Jan 10 08:19:56 2022 From: mkopec at redhat.com (Martin Kopec) Date: Mon, 10 Jan 2022 09:19:56 +0100 Subject: [heat][interop] StackBuildErrorException on a master job Message-ID: Hi there, our interop master job (jobs for older OpenStack releases are fine) is constantly failing on 2 heat tests: heat_tempest_plugin.tests.functional.test_software_config.ParallelDeploymentsTest.test_deployments_metadata heat_tempest_plugin.tests.functional.test_software_config.ParallelDeploymentsTest.test_deployments_timeout_failed Full log at: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c51/819918/1/check/refstack-client-devstack-master/c51816e/job-output.txt Patch where you can see all the jobs - master one failing and the rest passing: https://review.opendev.org/c/openinfra/refstack-client/+/819918 We can't figure out what's going on there. Maybe additional configuration of the deployment is required? New requirements? Any ideas what the job is missing? Thank you, -- Martin Kopec Senior Software Quality Engineer Red Hat EMEA -------------- next part -------------- An HTML attachment was scrubbed... URL: From lokendrarathour at gmail.com Mon Jan 10 09:22:25 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Mon, 10 Jan 2022 14:52:25 +0530 Subject: [Triple0] Message-ID: Hi Team, I was trying to deploy Triple0 Train using containers. After around 30+ min of overcloud running it gett below error: The error was: "keystoneauth1.exceptions.http.Unauthorized: The password is expired and needs to be changed for user: 765d7c43b3d54b289c9fa9e1dae15112. (HTTP 401) (Request-ID: req-f1344091-4f81-44c2-8ec6-e2daffbf998c) 2022-01-09 23:10:39.354147 | 525400bb-6dc5-c157-232d-000000003de1 | FATAL | As" I tried checking the Horizon, where is see that it is asking me to change the password immediately with the same message and a similar observation is seen with overcloud file. Please advice why this deployment is getting failed -- ~ Lokendra -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at stackhpc.com Mon Jan 10 10:02:11 2022 From: doug at stackhpc.com (Doug Szumski) Date: Mon, 10 Jan 2022 10:02:11 +0000 Subject: Monasca in kolla-ansible In-Reply-To: References: Message-ID: <79bee35d-a4d3-8f49-d683-75edf838e64a@stackhpc.com> On 10/01/2022 00:33, Karera Tony wrote: > Dear Team, Hi Tony > > I hope all?is well. > > I have deployed Openstack with kolla-ansible and enabled Monasca among > the projects. > > The deployment was successful but unfortunately I can't see Logging in > the Dashn=board. Logs are searchable via Kibana, please see: https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/central-logging-guide.html#kibana > > Kindly advise If I could be missing something. > > Regards > > > Tony Karera > > -------------- next part -------------- An HTML attachment was scrubbed... 
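As a quick check on the kolla-ansible side, the Kibana/Elasticsearch log pipeline is only deployed when the relevant flag is enabled, so it may be worth confirming globals.yml before digging further (a sketch; whether this is needed alongside Monasca depends on your release, and the inventory path is an example):

  # /etc/kolla/globals.yml
  #   enable_central_logging: "yes"
  kolla-ansible -i ./multinode reconfigure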
URL: From balazs.gibizer at est.tech Mon Jan 10 12:27:36 2022 From: balazs.gibizer at est.tech (Balazs Gibizer) Date: Mon, 10 Jan 2022 13:27:36 +0100 Subject: [nova][requirements] fasteners===0.16.3 held back by nova In-Reply-To: References: <20220106023338.4yaudjnvmmysvp5d@mthode.org> <3b644658-f7ac-9188-8d2c-4d109dcaffd7@nemebean.com> <20220106164113.iwf7l6v7cyj6tmr2@mthode.org> Message-ID: <0MUH5R.M9W3OWKRRFL81@est.tech> On Thu, Jan 6 2022 at 09:53:06 AM -0800, melanie witt wrote: > On Thu Jan 06 2022 08:41:13 GMT-0800 (Pacific Standard Time), Matthew > Thode wrote: >> Subject: >> Re: [nova][requirements] fasteners===0.16.3 held back by nova >> From: >> Matthew Thode >> Date: >> 1/6/22, 08:41 >> >> To: >> Ben Nemec >> CC: >> openstack-discuss at lists.openstack.org >> >> >> That seems right, I seem to remember eventlet in there. >> >> -- Matthew Thode On 22-01-06 10:17:56, Ben Nemec wrote: >>> Is it still this:https://github.com/harlowja/fasteners/issues/36 ? >>> >>> I did some investigation about a year ago with one of the fasteners >>> maintainers, but we never really came up with a definite answer as >>> to what >>> is going on. :-/ >>> >>> On 1/5/22 20:33, Matthew Thode wrote: >>>> This one is simple, and iirc is blocked on upstream fixing >>>> something >>>> (but cannot find the reference). >>>> >>>> fasteners===0.16.3 >>>> >>>> https://review.opendev.org/823470 >>>> and >>>> https://review.opendev.org/804246 >>>> both test this change. > > I worked on the fasteners thing for Too Long of a Time last October > and found what is happening. It is indeed the same issue from 2019 > [1] and I explain the problem (it's long) in a new eventlet issue I > opened [2]. > > I proposed a patch to "fix" the problem in nova [3], it was initially > nacked because it has to do with eventlet, but it's the simplest, > smallest change IMHO that will address the issue. It was me who originally nacked PS1. I still think that PS1 is a hack, but after the many weeks of investigation Melanie did and the discussions in the github issues I have to accept that we have no better option in our hands at the moment. So I'm +2 on [3]. Cheers, gibi > > I also went on a wild goose chase trying to change all our spawn_n() > calls with spawn() in PS2 and PS3 but it ended in a dead end. There > are comments detailing that attempt in the review if anyone is > curious. > > So, based on that dead end and seeing this come up on the ML, I have > reverted [3] to PS1 if anyone can review and give feedback on what > approach they would prefer if they think the current approach is not > suitable. > > Note: the reason we pull in fasteners is through oslo.concurrency, > the lockutils use it. > > Cheers, > -melanie > > [1] https://github.com/harlowja/fasteners/issues/36 > [2] https://github.com/eventlet/eventlet/issues/731 > [3] https://review.opendev.org/c/openstack/nova/+/813114 > > From tonykarera at gmail.com Mon Jan 10 12:53:34 2022 From: tonykarera at gmail.com (Karera Tony) Date: Mon, 10 Jan 2022 14:53:34 +0200 Subject: Monasca in kolla-ansible In-Reply-To: <79bee35d-a4d3-8f49-d683-75edf838e64a@stackhpc.com> References: <79bee35d-a4d3-8f49-d683-75edf838e64a@stackhpc.com> Message-ID: Hello Doug, I understand but I am interested in Monitoring the Services from the Dashboard Regards Tony Karera On Mon, Jan 10, 2022 at 12:02 PM Doug Szumski wrote: > > On 10/01/2022 00:33, Karera Tony wrote: > > Dear Team, > > Hi Tony > > > I hope all is well. 
> > I have deployed Openstack with kolla-ansible and enabled Monasca among the > projects. > > The deployment was successful but unfortunately I can't see Logging in the > Dashn=board. > > Logs are searchable via Kibana, please see: > > https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/central-logging-guide.html#kibana > > > Kindly advise If I could be missing something. > > Regards > > > Tony Karera > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Mon Jan 10 13:41:29 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 10 Jan 2022 13:41:29 +0000 Subject: [security-sig][kolla] Log4j vulnerabilities and OpenStack In-Reply-To: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> Message-ID: <20220110134129.2arffdy63nkacxgy@yuggoth.org> On 2022-01-03 16:02:14 +0000 (+0000), Jeremy Stanley wrote: [...] > Is anyone aware of other, similar situations where OpenStack is > commonly installed alongside Java software using Log4j in > vulnerable ways? It came to my attention a few moments ago that Kolla installs Elasticsearch[*]. Is there any particular guidance we should be giving Kolla users about mitigating the recent Log4j vulnerabilities in light of this? [*] https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/central-logging-guide.html -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From radoslaw.piliszek at gmail.com Mon Jan 10 13:47:53 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 10 Jan 2022 14:47:53 +0100 Subject: [security-sig][kolla] Log4j vulnerabilities and OpenStack In-Reply-To: <20220110134129.2arffdy63nkacxgy@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220110134129.2arffdy63nkacxgy@yuggoth.org> Message-ID: On Mon, 10 Jan 2022 at 14:42, Jeremy Stanley wrote: > > On 2022-01-03 16:02:14 +0000 (+0000), Jeremy Stanley wrote: > [...] > > Is anyone aware of other, similar situations where OpenStack is > > commonly installed alongside Java software using Log4j in > > vulnerable ways? > > It came to my attention a few moments ago that Kolla installs > Elasticsearch[*]. Is there any particular guidance we should be > giving Kolla users about mitigating the recent Log4j vulnerabilities > in light of this? Yes, we have already patched the command line [1] so the guidance is to make sure to run the latest and greatest. It would make sense to broadcast this so that users know that log4j is in Elasticsearch. In Kolla, ES is used either standalone or with Monasca (and soon Venus). [1] https://review.opendev.org/c/openstack/kolla-ansible/+/821860 -yoctozepto > [*] https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/central-logging-guide.html > > -- > Jeremy Stanley From fungi at yuggoth.org Mon Jan 10 13:57:26 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 10 Jan 2022 13:57:26 +0000 Subject: [security-sig][kolla] Log4j vulnerabilities and OpenStack In-Reply-To: References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220110134129.2arffdy63nkacxgy@yuggoth.org> Message-ID: <20220110135726.kwkwr3wyheg2mzxo@yuggoth.org> On 2022-01-10 14:47:53 +0100 (+0100), Rados?aw Piliszek wrote: [...] 
> Yes, we have already patched the command line [1] so the guidance > is to make sure to run the latest and greatest. It would make > sense to broadcast this so that users know that log4j is in > Elasticsearch. In Kolla, ES is used either standalone or with > Monasca (and soon Venus). > > [1] https://review.opendev.org/c/openstack/kolla-ansible/+/821860 [...] Is the presence/absence of Elasticsearch determined by configuration options, or is it always installed and run when Kolla is used? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From radoslaw.piliszek at gmail.com Mon Jan 10 14:44:41 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 10 Jan 2022 15:44:41 +0100 Subject: [security-sig][kolla] Log4j vulnerabilities and OpenStack In-Reply-To: <20220110135726.kwkwr3wyheg2mzxo@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220110134129.2arffdy63nkacxgy@yuggoth.org> <20220110135726.kwkwr3wyheg2mzxo@yuggoth.org> Message-ID: On Mon, 10 Jan 2022 at 14:58, Jeremy Stanley wrote: > > On 2022-01-10 14:47:53 +0100 (+0100), Rados?aw Piliszek wrote: > [...] > > Yes, we have already patched the command line [1] so the guidance > > is to make sure to run the latest and greatest. It would make > > sense to broadcast this so that users know that log4j is in > > Elasticsearch. In Kolla, ES is used either standalone or with > > Monasca (and soon Venus). > > > > [1] https://review.opendev.org/c/openstack/kolla-ansible/+/821860 > [...] > > Is the presence/absence of Elasticsearch determined by configuration > options, or is it always installed and run when Kolla is used? Determined by configuration. It is not present by default - only if installed on demand, by enabling central logging, Monasca or some other dependent component. -yoctozepto > -- > Jeremy Stanley From fungi at yuggoth.org Mon Jan 10 16:27:09 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 10 Jan 2022 16:27:09 +0000 Subject: [security-sig][kolla] Log4j vulnerabilities and OpenStack In-Reply-To: References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220110134129.2arffdy63nkacxgy@yuggoth.org> <20220110135726.kwkwr3wyheg2mzxo@yuggoth.org> Message-ID: <20220110162709.aovhjf7pjhkkmugv@yuggoth.org> On 2022-01-10 15:44:41 +0100 (+0100), Rados?aw Piliszek wrote: > On Mon, 10 Jan 2022 at 14:58, Jeremy Stanley wrote: > > > > On 2022-01-10 14:47:53 +0100 (+0100), Rados?aw Piliszek wrote: > > [...] > > > Yes, we have already patched the command line [1] so the guidance > > > is to make sure to run the latest and greatest. It would make > > > sense to broadcast this so that users know that log4j is in > > > Elasticsearch. In Kolla, ES is used either standalone or with > > > Monasca (and soon Venus). > > > > > > [1] https://review.opendev.org/c/openstack/kolla-ansible/+/821860 > > [...] > > > > Is the presence/absence of Elasticsearch determined by configuration > > options, or is it always installed and run when Kolla is used? > > Determined by configuration. It is not present by default - only if > installed on demand, by enabling central logging, Monasca or some > other dependent component. Thanks for the details, and apologies for all the sudden questions. Is there a list of which components (aside from the aforementioned central logging and Monasca) which pull Elasticsearch into the deployment? 
Also, does Kolla build/distribute its own Elasticsearch images or reuse some maintained by an outside party? And what version(s) of Elasticsearch and Log4j end up installed? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From geguileo at redhat.com Mon Jan 10 16:47:06 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 10 Jan 2022 17:47:06 +0100 Subject: [Deploy] Can three all-in-one nodes form a OpenStack cluster? In-Reply-To: References: Message-ID: <20220110164706.bnlwteyxjfbkvhi2@localhost> On 29/12, ??? wrote: > Dear Radoslaw, > > Thank you for your reply. > > I'm manually building OpenStack Victoria in centos8 for learning > cluster and high-availability. I refer to > https://docs.openstack.org/ha-guide/ and some blogs. > > If you have some advices or you know some good document, you can tell > me. I will very glad. > > Sincerely, > Han Guangyu > Hi, If you are running on centos you may be interested in RDO's packstack: https://www.rdoproject.org/install/packstack/ Cheers, Gorka. > Rados?aw Piliszek ?2021?12?29??? 21:34??? > > > > Dear Han, > > > > This is possible and might be sensible depending on your use case. > > > > What deployment method are you using? > > > > Kind regards, > > -yoctozepto > > > > On Wed, 29 Dec 2021 at 10:32, ??? wrote: > > > > > > Dear all, > > > > > > I am deploing OpenStack Victoria. Can I make a cluster with three > > > all-in-one nodes? > > > > > > Will this have any problems or points to be aware of? > > > > > > Sorry to bother, Thank you very much. > > > > > > Han Guangyu. > > > > From massimo.sgaravatto at gmail.com Mon Jan 10 17:00:29 2022 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Mon, 10 Jan 2022 18:00:29 +0100 Subject: [ops][nova] How to update from N to N+3 (and manage nova services that don't start because they find too old compute nodes...) Message-ID: Dear all When we upgraded our Cloud from Rocky to Train we followed the following procedure: 1) Shutdown of all services on the controller and compute nodes 2) Update from Rocky to Stein of controller (just to do the dbsyncs) 3) Update from Stein to Train of controller 4) Update from Rocky to Train of compute nodes We are trying to do the same to update from Train to Xena, but now there is a problem because nova services on the controller node refuse to start since they find too old compute nodes (this is indeed a new feature, properly documented in the release notes). As a workaround we had to manually modify the "version" field of the compute nodes in the nova.services table. Is it ok, or is there a cleaner way to manage the issue ? Thanks, Massimo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From radoslaw.piliszek at gmail.com Mon Jan 10 17:02:38 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 10 Jan 2022 18:02:38 +0100 Subject: [security-sig][kolla] Log4j vulnerabilities and OpenStack In-Reply-To: <20220110162709.aovhjf7pjhkkmugv@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220110134129.2arffdy63nkacxgy@yuggoth.org> <20220110135726.kwkwr3wyheg2mzxo@yuggoth.org> <20220110162709.aovhjf7pjhkkmugv@yuggoth.org> Message-ID: On Mon, 10 Jan 2022 at 17:28, Jeremy Stanley wrote: > > On 2022-01-10 15:44:41 +0100 (+0100), Rados?aw Piliszek wrote: > > On Mon, 10 Jan 2022 at 14:58, Jeremy Stanley wrote: > > > > > > On 2022-01-10 14:47:53 +0100 (+0100), Rados?aw Piliszek wrote: > > > [...] > > > > Yes, we have already patched the command line [1] so the guidance > > > > is to make sure to run the latest and greatest. It would make > > > > sense to broadcast this so that users know that log4j is in > > > > Elasticsearch. In Kolla, ES is used either standalone or with > > > > Monasca (and soon Venus). > > > > > > > > [1] https://review.opendev.org/c/openstack/kolla-ansible/+/821860 > > > [...] > > > > > > Is the presence/absence of Elasticsearch determined by configuration > > > options, or is it always installed and run when Kolla is used? > > > > Determined by configuration. It is not present by default - only if > > installed on demand, by enabling central logging, Monasca or some > > other dependent component. > > Thanks for the details, and apologies for all the sudden questions. > Is there a list of which components (aside from the aforementioned > central logging and Monasca) which pull Elasticsearch into the > deployment? Yeah, I just omitted them as they are less frequently used: - osprofiler - skydive - cloudkitty configured to use elasticsearch backend > Also, does Kolla build/distribute its own Elasticsearch images or > reuse some maintained by an outside party? And what version(s) of > Elasticsearch and Log4j end up installed? Kolla builds its own images using upstream (ES) rpm/deb packages. We end up with the latest 7.13.x as that's the last truly OSS release of ES. I know it has vulnerable log4j but can't tell the version atm. Let me know if it's crucial. -yoctozepto > -- > Jeremy Stanley From pierre at stackhpc.com Mon Jan 10 17:10:19 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 10 Jan 2022 18:10:19 +0100 Subject: [security-sig][kolla] Log4j vulnerabilities and OpenStack In-Reply-To: References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220110134129.2arffdy63nkacxgy@yuggoth.org> <20220110135726.kwkwr3wyheg2mzxo@yuggoth.org> <20220110162709.aovhjf7pjhkkmugv@yuggoth.org> Message-ID: On Mon, 10 Jan 2022 at 18:05, Rados?aw Piliszek wrote: > > On Mon, 10 Jan 2022 at 17:28, Jeremy Stanley wrote: > > > > On 2022-01-10 15:44:41 +0100 (+0100), Rados?aw Piliszek wrote: > > > On Mon, 10 Jan 2022 at 14:58, Jeremy Stanley wrote: > > > > > > > > On 2022-01-10 14:47:53 +0100 (+0100), Rados?aw Piliszek wrote: > > > > [...] > > > > > Yes, we have already patched the command line [1] so the guidance > > > > > is to make sure to run the latest and greatest. It would make > > > > > sense to broadcast this so that users know that log4j is in > > > > > Elasticsearch. In Kolla, ES is used either standalone or with > > > > > Monasca (and soon Venus). > > > > > > > > > > [1] https://review.opendev.org/c/openstack/kolla-ansible/+/821860 > > > > [...] 
> > > > > > > > Is the presence/absence of Elasticsearch determined by configuration > > > > options, or is it always installed and run when Kolla is used? > > > > > > Determined by configuration. It is not present by default - only if > > > installed on demand, by enabling central logging, Monasca or some > > > other dependent component. > > > > Thanks for the details, and apologies for all the sudden questions. > > Is there a list of which components (aside from the aforementioned > > central logging and Monasca) which pull Elasticsearch into the > > deployment? > > Yeah, I just omitted them as they are less frequently used: > - osprofiler > - skydive > - cloudkitty configured to use elasticsearch backend > > > Also, does Kolla build/distribute its own Elasticsearch images or > > reuse some maintained by an outside party? And what version(s) of > > Elasticsearch and Log4j end up installed? > > Kolla builds its own images using upstream (ES) rpm/deb packages. We > end up with the latest 7.13.x as that's the last truly OSS release of > ES. > I know it has vulnerable log4j but can't tell the version atm. Let me > know if it's crucial. For CentOS images, this is bundled into elasticsearch-oss-7.10.2-1.x86_64: /usr/share/elasticsearch/lib/log4j-api-2.11.1.jar /usr/share/elasticsearch/lib/log4j-core-2.11.1.jar Note that according to Elastic, this version is not vulnerable thanks to the use of the Java Security Manager. From fungi at yuggoth.org Mon Jan 10 17:15:23 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 10 Jan 2022 17:15:23 +0000 Subject: [security-sig][kolla] Log4j vulnerabilities and OpenStack In-Reply-To: References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220110134129.2arffdy63nkacxgy@yuggoth.org> <20220110135726.kwkwr3wyheg2mzxo@yuggoth.org> <20220110162709.aovhjf7pjhkkmugv@yuggoth.org> Message-ID: <20220110171523.tg6nh35b2imhie45@yuggoth.org> On 2022-01-10 18:10:19 +0100 (+0100), Pierre Riteau wrote: [...] > For CentOS images, this is bundled into elasticsearch-oss-7.10.2-1.x86_64: > > /usr/share/elasticsearch/lib/log4j-api-2.11.1.jar > /usr/share/elasticsearch/lib/log4j-core-2.11.1.jar > > Note that according to Elastic, this version is not vulnerable thanks > to the use of the Java Security Manager. Thanks! Was there a public statement from Elastic to that effect, so that we can point users at it if they have questions? At this point a lot of enterprises are ripping out or shutting down anything which can't be upgraded to Log4j 2.17.1, due in part to the mixed messages about which older versions are actually impacted and which workarounds can mitigate it. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From radoslaw.piliszek at gmail.com Mon Jan 10 17:18:00 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 10 Jan 2022 18:18:00 +0100 Subject: [security-sig][kolla] Log4j vulnerabilities and OpenStack In-Reply-To: <20220110171523.tg6nh35b2imhie45@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220110134129.2arffdy63nkacxgy@yuggoth.org> <20220110135726.kwkwr3wyheg2mzxo@yuggoth.org> <20220110162709.aovhjf7pjhkkmugv@yuggoth.org> <20220110171523.tg6nh35b2imhie45@yuggoth.org> Message-ID: On Mon, 10 Jan 2022 at 18:15, Jeremy Stanley wrote: > > On 2022-01-10 18:10:19 +0100 (+0100), Pierre Riteau wrote: > [...] 
> > For CentOS images, this is bundled into elasticsearch-oss-7.10.2-1.x86_64: > > > > /usr/share/elasticsearch/lib/log4j-api-2.11.1.jar > > /usr/share/elasticsearch/lib/log4j-core-2.11.1.jar > > > > Note that according to Elastic, this version is not vulnerable thanks > > to the use of the Java Security Manager. > > Thanks! Was there a public statement from Elastic to that effect, so > that we can point users at it if they have questions? https://discuss.elastic.co/t/apache-log4j2-remote-code-execution-rce-vulnerability-cve-2021-44228-esa-2021-31/291476 -yoctozepto > At this point a lot of enterprises are ripping out or shutting down > anything which can't be upgraded to Log4j 2.17.1, due in part to the > mixed messages about which older versions are actually impacted and > which workarounds can mitigate it. > -- > Jeremy Stanley From pierre at stackhpc.com Mon Jan 10 17:19:46 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 10 Jan 2022 18:19:46 +0100 Subject: [security-sig][kolla] Log4j vulnerabilities and OpenStack In-Reply-To: <20220110171523.tg6nh35b2imhie45@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220110134129.2arffdy63nkacxgy@yuggoth.org> <20220110135726.kwkwr3wyheg2mzxo@yuggoth.org> <20220110162709.aovhjf7pjhkkmugv@yuggoth.org> <20220110171523.tg6nh35b2imhie45@yuggoth.org> Message-ID: On Mon, 10 Jan 2022 at 18:18, Jeremy Stanley wrote: > > On 2022-01-10 18:10:19 +0100 (+0100), Pierre Riteau wrote: > [...] > > For CentOS images, this is bundled into elasticsearch-oss-7.10.2-1.x86_64: > > > > /usr/share/elasticsearch/lib/log4j-api-2.11.1.jar > > /usr/share/elasticsearch/lib/log4j-core-2.11.1.jar > > > > Note that according to Elastic, this version is not vulnerable thanks > > to the use of the Java Security Manager. > > Thanks! Was there a public statement from Elastic to that effect, so > that we can point users at it if they have questions? > > At this point a lot of enterprises are ripping out or shutting down > anything which can't be upgraded to Log4j 2.17.1, due in part to the > mixed messages about which older versions are actually impacted and > which workarounds can mitigate it. > -- > Jeremy Stanley https://discuss.elastic.co/t/apache-log4j2-remote-code-execution-rce-vulnerability-cve-2021-44228-esa-2021-31/291476#elasticsearch-9 From dms at danplanet.com Mon Jan 10 17:38:38 2022 From: dms at danplanet.com (Dan Smith) Date: Mon, 10 Jan 2022 09:38:38 -0800 Subject: [ops][nova] How to update from N to N+3 (and manage nova services that don't start because they find too old compute nodes...) In-Reply-To: (Massimo Sgaravatto's message of "Mon, 10 Jan 2022 18:00:29 +0100") References: Message-ID: > We are trying to do the same to update from Train to Xena, but now > there is a problem because nova services on the controller node refuse > to start since they find too old compute nodes (this is indeed a new > feature, properly documented in the release notes). As a workaround > we had to manually modify the "version" field of the compute nodes in > the nova.services table. > > Is it ok, or is there a cleaner way to manage the issue ? I think this is an unintended consequence of the new check. Can you file a bug against nova and report the number here? We probably need to do something here... Thanks! 
--Dan From smooney at redhat.com Mon Jan 10 17:58:22 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 10 Jan 2022 17:58:22 +0000 Subject: [ops][nova] How to update from N to N+3 (and manage nova services that don't start because they find too old compute nodes...) In-Reply-To: References: Message-ID: <9345ea727eaa9c3e4f17d2d106ca7cca799dacf2.camel@redhat.com> On Mon, 2022-01-10 at 18:00 +0100, Massimo Sgaravatto wrote:
> Dear all
> When we upgraded our Cloud from Rocky to Train we followed the following procedure:
> 1) Shutdown of all services on the controller and compute nodes
> 2) Update from Rocky to Stein of controller (just to do the dbsyncs)
> 3) Update from Stein to Train of controller
> 4) Update from Rocky to Train of compute nodes
> We are trying to do the same to update from Train to Xena, but now there is a problem because nova services on the controller node refuse to start since they find too old compute nodes (this is indeed a new feature, properly documented in the release notes). As a workaround we had to manually modify the "version" field of the compute nodes in the nova.services table.
> Is it ok, or is there a cleaner way to manage the issue ?
The check is mainly implemented by https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/utils.py#L1052-L1099
I believe the intent was that this should only be an issue if the service reports as up, so you should be able to do the following:
1) stop nova-compute on all nodes
2) wait for the compute services to be reported as down, then stop the controllers
3) upgrade the controller directly to Xena, skipping all intermediary releases (the db syncs have never needed to be done every release, we keep the migrations for many releases; there are also no db changes between Train and Wallaby, and I don't think there are any in Xena either)
4) upgrade nova-compute on all compute nodes
Looking at the code, however, I don't think we are checking the status of the services at all, so it is an absolute check. As a result you can no longer do FFU, which I'm surprised no one has complained about before. This was implemented by https://github.com/openstack/nova/commit/aa7c6f87699ec1340bd446a7d47e1453847a637f in Wallaby.
Just to be clear, we have never actually supported having active nova services where the version mix is greater than n+1; we just started enforcing that in Wallaby.
> Thanks, Massimo
From massimo.sgaravatto at gmail.com Mon Jan 10 18:19:42 2022 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Mon, 10 Jan 2022 19:19:42 +0100 Subject: [ops][nova] How to update from N to N+3 (and manage nova services that don't start because they find too old compute nodes...)
In-Reply-To: <9345ea727eaa9c3e4f17d2d106ca7cca799dacf2.camel@redhat.com> References: <9345ea727eaa9c3e4f17d2d106ca7cca799dacf2.camel@redhat.com> Message-ID: Good to know that it is not necessary for nova to go through ALL intermediate releases and perform db-sync The question is if this is true for ALL openstack services (in our deployment the controller node is used for all services and not only for nova) Thanks, Massimo On Mon, Jan 10, 2022 at 7:03 PM Sean Mooney wrote: > On Mon, 2022-01-10 at 18:00 +0100, Massimo Sgaravatto wrote: > > Dear all > > > > When we upgraded our Cloud from Rocky to Train we followed the following > > procedure: > > > > 1) Shutdown of all services on the controller and compute nodes > > 2) Update from Rocky to Stein of controller (just to do the dbsyncs) > > 3) Update from Stein to Train of controller > > 4) Update from Rocky to Train of compute nodes > > > > We are trying to do the same to update from Train to Xena, but now there > is > > a problem because > > nova services on the controller node refuse to start since they find too > > old compute nodes (this is indeed a new feature, properly documented in > the > > release notes). > > As a workaround we had to manually modify the "version" field of the > > compute nodes in the nova.services table. > > > > Is it ok, or is there a cleaner way to manage the issue ? > the check is mainly implemeented by > > https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/utils.py#L1052-L1099 > > i belive the intent was this shoudl only be an issue if the service report > as up > > so you should be able to do the following. > 1 stop nova-compute on all nodes > 2 wait for compute service to be down then stop contolers. > 3 upgrade contoler directly to xena skiping all intermediary releases. > (the db sysncs have never needed to be done every release we keep the > migration for many releases. > there also are no db change between train and wallaby and i dont think > there are any in xena either) > 4 upgrade the nova-compute on all compute nodes. > > looking at the code however i dont think we ar checking the status of the > services at all so it is an absolute check. > > as a result you can nolonger do FFU which im surpised no on has complained > about before. > > this was implemented by > https://github.com/openstack/nova/commit/aa7c6f87699ec1340bd446a7d47e1453847a637f > in wallaby > > just to be clear we have never actully support having active nova service > wherne the version mix is greate then n+1 > we just started enforceing that in wallaby > > > > > > Thanks, Massimo > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dms at danplanet.com Mon Jan 10 18:50:49 2022 From: dms at danplanet.com (Dan Smith) Date: Mon, 10 Jan 2022 10:50:49 -0800 Subject: [ops][nova] How to update from N to N+3 (and manage nova services that don't start because they find too old compute nodes...) In-Reply-To: (Massimo Sgaravatto's message of "Mon, 10 Jan 2022 19:19:42 +0100") References: <9345ea727eaa9c3e4f17d2d106ca7cca799dacf2.camel@redhat.com> Message-ID: > Good to know that it is not necessary for nova to go through ALL > intermediate releases and perform db-sync The question is if this is > true for ALL openstack services (in our deployment the controller node > is used for all services and not only for nova) Actually, Sean is wrong here - we do expect you to go through each release on the controller, it's just that it's rare that it's actually a problem. 
We have had blocker migrations at times in the past where we have had to ensure that data is migrated before changing or dropping items of schema. We also recently did a schema compaction, which wouldn't tolerate moving across the releases without the (correct) intermediate step. We definitely should fix the problem related to compute records being old and causing the controllers to start. However, at the moment, you should still assume that each intermediate release needs to be db-sync'd unless you've tested that a particular source and target release works. I expect the same requirement for most other projects. --Dan From smooney at redhat.com Mon Jan 10 19:45:46 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 10 Jan 2022 19:45:46 +0000 Subject: [ops][nova] How to update from N to N+3 (and manage nova services that don't start because they find too old compute nodes...) In-Reply-To: References: <9345ea727eaa9c3e4f17d2d106ca7cca799dacf2.camel@redhat.com> Message-ID: <8b4d735370f19b542f5d3cfab3faf21162fd77a4.camel@redhat.com> On Mon, 2022-01-10 at 10:50 -0800, Dan Smith wrote: > > Good to know that it is not necessary for nova to go through ALL > > intermediate releases and perform db-sync The question is if this is > > true for ALL openstack services (in our deployment the controller node > > is used for all services and not only for nova) > > Actually, Sean is wrong here - we do expect you to go through each > release on the controller, it's just that it's rare that it's actually a > problem. We have had blocker migrations at times in the past where we > have had to ensure that data is migrated before changing or dropping > items of schema. We also recently did a schema compaction, which > wouldn't tolerate moving across the releases without the (correct) > intermediate step. dan is correct. you should run each on the contoler back to back. between train and wallaby specifical and we are in a special case where we just happen to not change the db in those releases. xena we started doing db compation yes and moving to alembic instead of sqlachmy-migrate. from a cli point of view that is transparent at the nova manage level but it is still best to do it each release on the contoler to ensure that tanstion happens corretly. > > We definitely should fix the problem related to compute records being > old and causing the controllers to start. However, at the moment, you > should still assume that each intermediate release needs to be db-sync'd > unless you've tested that a particular source and target release > works. I expect the same requirement for most other projects. we have not tested skiping them on the contolers but i bevile in this case it woudl work ok to go directly from train to the wallaby code base and do the db sync. train to xena may not work. if the start and end version were different there is not guarentted that it woudl work due to the? blocker migration, online migrations and eventual drop of migration code that dan mentioned. but ya unless you have tested it better to assume you cant skip. > > --Dan > From geguileo at redhat.com Mon Jan 10 20:05:57 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 10 Jan 2022 21:05:57 +0100 Subject: [ops][nova] How to update from N to N+3 (and manage nova services that don't start because they find too old compute nodes...) 
In-Reply-To: References: <9345ea727eaa9c3e4f17d2d106ca7cca799dacf2.camel@redhat.com> Message-ID: <20220110200557.huixbepgapioqjo6@localhost> On 10/01, Dan Smith wrote: > > Good to know that it is not necessary for nova to go through ALL > > intermediate releases and perform db-sync The question is if this is > > true for ALL openstack services (in our deployment the controller node > > is used for all services and not only for nova) > > Actually, Sean is wrong here - we do expect you to go through each > release on the controller, it's just that it's rare that it's actually a > problem. We have had blocker migrations at times in the past where we > have had to ensure that data is migrated before changing or dropping > items of schema. We also recently did a schema compaction, which > wouldn't tolerate moving across the releases without the (correct) > intermediate step. > > We definitely should fix the problem related to compute records being > old and causing the controllers to start. However, at the moment, you > should still assume that each intermediate release needs to be db-sync'd > unless you've tested that a particular source and target release > works. I expect the same requirement for most other projects. > > --Dan > Hi, Unrelated to this Nova issue, but related to why intermediate releases cannot be skipped in OpenStack, the Cinder project requires that the db sync and the online data migrations are run on each intermediate release. You may be lucky and everything may run fine, but it might as well blow up in your face and lose database data. Cheers, Gorka. From gouthampravi at gmail.com Mon Jan 10 20:37:56 2022 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 10 Jan 2022 12:37:56 -0800 Subject: [outreachy] Stepping down as Outreachy co-organizer In-Reply-To: References: Message-ID: On Mon, Jan 3, 2022 at 9:58 AM Samuel de Medeiros Queiroz wrote: > > Hi all, > > Outreachy is a wonderful program that promotes diversity in open source communities by giving opportunities to people in underrepresented groups. > > This was a hard decision to make, but I have not been committing the time this project deserves. > For that reason, I would like to give visibility that I am stepping down as an Outreachy organizer. > > It was a great honor to serve as co-organizer since late 2018, and we had 19 internships since then. > I also had the pleasure to serve twice (2016 and 2017) as a mentor. Wow - and these internships have had a tremendous impact over these years! Thank you so much for your service, Samuel! > > Mahati, it was a great pleasure co-organizing Outreachy in this community with you. > > Thanks! > Samuel Queiroz From lucasagomes at gmail.com Mon Jan 10 21:17:55 2022 From: lucasagomes at gmail.com (Lucas Alvares Gomes) Date: Mon, 10 Jan 2022 18:17:55 -0300 Subject: [neutron] Bug Deputy Report January 03 - 10 Message-ID: Hi, This is the Neutron bug report from January 3rd to 10th. 
Critical: * https://bugs.launchpad.net/neutron/+bug/1956344 - "Functional test test_gateway_chassis_rebalance is failing intermittently" - Assigned to: Rodolfo Alonso High: * https://bugs.launchpad.net/neutron/+bug/1956034 - "ovn load balancer health monitor cause mac address conflict" - Unassigned * https://bugs.launchpad.net/neutron/+bug/1956745 - " [ovn-octavia-provider] Load Balancer remained with ACTIVE state even with PENDING_UPDATE listener" - Assigned to: Luis Tomas Bolivar * https://bugs.launchpad.net/neutron/+bug/1956763 - "[OVN] Overlapping configuration for security group logging is not applied correctly" - Assigned to: Elvira Garcia Ruiz Medium: * https://bugs.launchpad.net/neutron/+bug/1956035 - "ovn load balancer member failover not working when accessed from floating ip" - Unassigned * https://bugs.launchpad.net/neutron/+bug/1956632 - "DNS integration documentation and scenario tests need update after dns_assignment calculation was changed" - Assigned to: Miguel Lavalle * https://bugs.launchpad.net/neutron/+bug/1956785 - "Duplicate validator TypeError for dict values Edit" - Assigned to: Harald Jens?s Low: * https://bugs.launchpad.net/neutron/+bug/1956476 - "[OVN] Disallow multiple physnets per bridge" - Assigned to: Rodolfo Alonso * https://bugs.launchpad.net/neutron/+bug/1956770 - " keepalived no_track implemented in >= 2.0.3" - Assigned to: Rodolfo Alonso Needs further triage: * https://bugs.launchpad.net/neutron/+bug/1956435 - "OVS: support multiple segments per host" - Unassigned * https://bugs.launchpad.net/neutron/+bug/1956460 - "The VM can't get metadata from dhcp-agent" - Unassigned * https://bugs.launchpad.net/neutron/+bug/1956846 - "ha router duplicated routes" - Unassigned Cheers, Lucas From gagehugo at gmail.com Tue Jan 11 00:52:59 2022 From: gagehugo at gmail.com (Gage Hugo) Date: Mon, 10 Jan 2022 18:52:59 -0600 Subject: [openstack-helm] No meeting tomorrow Message-ID: Hey team, Since there are no agenda items [0] for the IRC meeting tomorrow, the meeting is canceled. Our next meeting will be Jan 18th. Thanks [0] https://etherpad.opendev.org/p/openstack-helm-weekly-meeting -------------- next part -------------- An HTML attachment was scrubbed... URL: From marios at redhat.com Tue Jan 11 07:12:48 2022 From: marios at redhat.com (Marios Andreou) Date: Tue, 11 Jan 2022 09:12:48 +0200 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: On Tue, Jan 4, 2022 at 3:08 PM Marios Andreou wrote: > > Hello TripleO ( & happy new year :) \o/ ) > > I'd like to propose Douglas Viroel [1] for core on the tripleo-ci > repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, > openstack/tripleo-quickstart, openstack/tripleo-repos). > > Doug joined the team last year and besides his code contributions he > has also been consistently providing many very useful and thoughtful > code reviews. I think he will be an excellent addition to the ci core > team. > > As is customary, let's leave this thread open for a week and if there > are no objections or other concerns then we add Doug to the core group > next week. 
> having seen no objections ;) Doug is now in the tripleo-core gerrit group [1] @Doug thank you for your contributions o/ keep 'em coming ;) regards, marios [1] https://review.opendev.org/admin/groups/0319cee8020840a3016f46359b076fa6b6ea831a,members > thanks, marios > > [1] https://review.opendev.org/q/owner:viroel%2540gmail.com From viroel at gmail.com Tue Jan 11 14:15:36 2022 From: viroel at gmail.com (Douglas) Date: Tue, 11 Jan 2022 11:15:36 -0300 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: Thank you all Looking forward to continue contributing to these projects \o/ On Tue, Jan 11, 2022 at 4:14 AM Marios Andreou wrote: > On Tue, Jan 4, 2022 at 3:08 PM Marios Andreou wrote: > > > > Hello TripleO ( & happy new year :) \o/ ) > > > > I'd like to propose Douglas Viroel [1] for core on the tripleo-ci > > repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, > > openstack/tripleo-quickstart, openstack/tripleo-repos). > > > > Doug joined the team last year and besides his code contributions he > > has also been consistently providing many very useful and thoughtful > > code reviews. I think he will be an excellent addition to the ci core > > team. > > > > As is customary, let's leave this thread open for a week and if there > > are no objections or other concerns then we add Doug to the core group > > next week. > > > > having seen no objections ;) Doug is now in the tripleo-core gerrit group > [1] > > @Doug thank you for your contributions o/ keep 'em coming ;) > > regards, marios > > [1] > https://review.opendev.org/admin/groups/0319cee8020840a3016f46359b076fa6b6ea831a,members > > > > > thanks, marios > > > > [1] https://review.opendev.org/q/owner:viroel%2540gmail.com > > > -- Douglas Viroel - dviroel -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Tue Jan 11 15:45:57 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 11 Jan 2022 09:45:57 -0600 Subject: [all] Nomination open for OpenStack "Z" Release Naming Message-ID: <17e49d142da.1232180f6369644.468276914290122896@ghanshyammann.com> Hello Everyone, We are now starting the process for the OpenStack 'Z' release name. We are a little late to start it, sorry for that. I have proposed to close the nomination on 24th Jan[1]. I am hoping that is enough time to collect the names, If not please reply to this thread or in gerrit review[1]. Once the governance patch is merged I will update the final dates of nomination close and polls here, meanwhile in parallel please start proposing the name on the wiki page. Criteria: ====== - Refer to the below governance page for the naming criteria: https://governance.openstack.org/tc/reference/release-naming.html#release-name-criteria - Any community members can propose the name to the below wiki page: https://wiki.openstack.org/wiki/Release_Naming/Z_Proposals We encourage all community members to participate in this process. 
[1] https://review.opendev.org/c/openstack/governance/+/824201 -gmann From cboylan at sapwetik.org Tue Jan 11 21:50:27 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 11 Jan 2022 13:50:27 -0800 Subject: =?UTF-8?Q?[all][infra][qa][tripleo][openstack-ansible][rally][aodh][cind?= =?UTF-8?Q?er][kolla][chef][sahara]_CentOS_8_EOL_and_removal_from_CI_lab?= =?UTF-8?Q?el/image_lists?= Message-ID: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> As noted last month the OpenDev [0] team intends on removing CentOS 8 images from our CI system now that the release has gone EOL. A number of you have already shifted over to CentOS 8 Stream in CI (thank you!), but there is still quite a bit remaining based on codesearch and some manual digging. The OpenDev team has begun the process of removing some of the supporting infrastructure and testing as well [1]. This list is probably not comprehensive but is a start. These projects will need to look at removing their CentOS 8 CI jobs (optionally replacing them with CentOS 8 Stream jobs): * devstack (victoria and older branches) * tripleo-validations * tripleo-upgrade * openstack-ansible * rally * aodh * cinderlib * kolla-ansible * openstack-chef * sahara * tenks * validations-common * validations-lib One thing to keep in mind is that stable branches may also be affected. We'd like to do this cleanup as gracefully as possible, but reality is that some projects are unlikely to completely remove their use of CentOS 8. I think that we can give it until the end of January before we force merge updates to remove the nodeset and label from our configs. At that point any Zuul configuration still using CentOS 8 will enter an error state and correcting it will be necessary to make updates to those Zuul configs. [0] https://lists.opendev.org/pipermail/service-announce/2021-December/000029.html [1] https://review.opendev.org/q/topic:%22remove-centos-8%22+status:open From tkajinam at redhat.com Wed Jan 12 01:13:04 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Wed, 12 Jan 2022 10:13:04 +0900 Subject: [all][infra][qa][tripleo][openstack-ansible][rally][aodh][cinder][kolla][chef][sahara] CentOS 8 EOL and removal from CI label/image lists In-Reply-To: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> References: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> Message-ID: Can we know which job from the aodh repo is running on CentOS8 ? I checked the job definitions in master and stable branches but I could not find any. I've submitted patches to migrate jobs from CentOS 8 to CentOS 8 Stream for some of these repos (we also need to decide how to deal with the stable branches, though). I used the same topic (remove-centos-8), so these patches can be found by the same query[1]. On Wed, Jan 12, 2022 at 6:58 AM Clark Boylan wrote: > As noted last month the OpenDev [0] team intends on removing CentOS 8 > images from our CI system now that the release has gone EOL. A number of > you have already shifted over to CentOS 8 Stream in CI (thank you!), but > there is still quite a bit remaining based on codesearch and some manual > digging. The OpenDev team has begun the process of removing some of the > supporting infrastructure and testing as well [1]. > > This list is probably not comprehensive but is a start. 
These projects > will need to look at removing their CentOS 8 CI jobs (optionally replacing > them with CentOS 8 Stream jobs): > > * devstack (victoria and older branches) > * tripleo-validations > * tripleo-upgrade > * openstack-ansible > * rally > * aodh > * cinderlib > * kolla-ansible > * openstack-chef > * sahara > * tenks > * validations-common > * validations-lib > > One thing to keep in mind is that stable branches may also be affected. > > We'd like to do this cleanup as gracefully as possible, but reality is > that some projects are unlikely to completely remove their use of CentOS 8. > I think that we can give it until the end of January before we force merge > updates to remove the nodeset and label from our configs. At that point any > Zuul configuration still using CentOS 8 will enter an error state and > correcting it will be necessary to make updates to those Zuul configs. > > [0] > https://lists.opendev.org/pipermail/service-announce/2021-December/000029.html > [1] https://review.opendev.org/q/topic:%22remove-centos-8%22+status:open > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Jan 12 02:06:54 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 11 Jan 2022 20:06:54 -0600 Subject: [all][infra][qa][tripleo][openstack-ansible][rally][aodh][cinder][kolla][chef][sahara] CentOS 8 EOL and removal from CI label/image lists In-Reply-To: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> References: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> Message-ID: <17e4c09c11c.d843a893391236.7268403741732901596@ghanshyammann.com> ---- On Tue, 11 Jan 2022 15:50:27 -0600 Clark Boylan wrote ---- > As noted last month the OpenDev [0] team intends on removing CentOS 8 images from our CI system now that the release has gone EOL. A number of you have already shifted over to CentOS 8 Stream in CI (thank you!), but there is still quite a bit remaining based on codesearch and some manual digging. The OpenDev team has begun the process of removing some of the supporting infrastructure and testing as well [1]. > > This list is probably not comprehensive but is a start. These projects will need to look at removing their CentOS 8 CI jobs (optionally replacing them with CentOS 8 Stream jobs): > > * devstack (victoria and older branches) > * tripleo-validations > * tripleo-upgrade > * openstack-ansible > * rally > * aodh > * cinderlib > * kolla-ansible > * openstack-chef > * sahara > * tenks > * validations-common > * validations-lib > > One thing to keep in mind is that stable branches may also be affected. > > We'd like to do this cleanup as gracefully as possible, but reality is that some projects are unlikely to completely remove their use of CentOS 8. I think that we can give it until the end of January before we force merge updates to remove the nodeset and label from our configs. At that point any Zuul configuration still using CentOS 8 will enter an error state and correcting it will be necessary to make updates to those Zuul configs. Devstack has replaced the CentOS 8 to CentOS 8 Stream until stable/wallaby[1]. We can remove the CentOS 8 or replace it with CentOS 8 Stream for other older stable too. 
[1] https://review.opendev.org/q/I508eceb00d7501ffcfac73d7bc2272badb241494 -gmann > > [0] https://lists.opendev.org/pipermail/service-announce/2021-December/000029.html > [1] https://review.opendev.org/q/topic:%22remove-centos-8%22+status:open > > From fungi at yuggoth.org Wed Jan 12 02:50:01 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 12 Jan 2022 02:50:01 +0000 Subject: [all][infra][qa][tripleo][openstack-ansible][rally][aodh][cinder][kolla][chef][sahara] CentOS 8 EOL and removal from CI label/image lists In-Reply-To: <17e4c09c11c.d843a893391236.7268403741732901596@ghanshyammann.com> References: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> <17e4c09c11c.d843a893391236.7268403741732901596@ghanshyammann.com> Message-ID: <20220112025001.ftrrq6pr4iroi3wb@yuggoth.org> On 2022-01-11 20:06:54 -0600 (-0600), Ghanshyam Mann wrote: [...] > Devstack has replaced the CentOS 8 to CentOS 8 Stream until > stable/wallaby[1]. We can remove the CentOS 8 or replace it with > CentOS 8 Stream for other older stable too. [...] Yeah, the removal will need to be backported to any open branches which still have job configuration using it. Probably stable/victoria if what I saw in the requirements repo is any indication. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gmann at ghanshyammann.com Wed Jan 12 03:03:35 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 11 Jan 2022 21:03:35 -0600 Subject: [all][infra][qa][tripleo][openstack-ansible][rally][aodh][cinder][kolla][chef][sahara] CentOS 8 EOL and removal from CI label/image lists In-Reply-To: <20220112025001.ftrrq6pr4iroi3wb@yuggoth.org> References: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> <17e4c09c11c.d843a893391236.7268403741732901596@ghanshyammann.com> <20220112025001.ftrrq6pr4iroi3wb@yuggoth.org> Message-ID: <17e4c3da3d0.115a39c1c391791.2557921501717553745@ghanshyammann.com> ---- On Tue, 11 Jan 2022 20:50:01 -0600 Jeremy Stanley wrote ---- > On 2022-01-11 20:06:54 -0600 (-0600), Ghanshyam Mann wrote: > [...] > > Devstack has replaced the CentOS 8 to CentOS 8 Stream until > > stable/wallaby[1]. We can remove the CentOS 8 or replace it with > > CentOS 8 Stream for other older stable too. > [...] > > Yeah, the removal will need to be backported to any open branches > which still have job configuration using it. Probably > stable/victoria if what I saw in the requirements repo is any > indication. I have pushed the patch for removal as 1st step and we can accept the adding centOS Steam job/support in those stable branches of devstack if someone can propose. -https://review.opendev.org/q/I36751569d92fbc5084b8308d423a75318ae7d406 -gmann > -- > Jeremy Stanley > From radoslaw.piliszek at gmail.com Wed Jan 12 09:39:33 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Wed, 12 Jan 2022 10:39:33 +0100 Subject: [all][infra][qa][tripleo][openstack-ansible][rally][aodh][cinder][kolla][chef][sahara] CentOS 8 EOL and removal from CI label/image lists In-Reply-To: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> References: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> Message-ID: On Tue, 11 Jan 2022 at 22:52, Clark Boylan wrote: > > As noted last month the OpenDev [0] team intends on removing CentOS 8 images from our CI system now that the release has gone EOL. 
A number of you have already shifted over to CentOS 8 Stream in CI (thank you!), but there is still quite a bit remaining based on codesearch and some manual digging. The OpenDev team has begun the process of removing some of the supporting infrastructure and testing as well [1]. > > This list is probably not comprehensive but is a start. These projects will need to look at removing their CentOS 8 CI jobs (optionally replacing them with CentOS 8 Stream jobs): > > > * kolla-ansible As far as Kolla is concerned, I have proposed a series of patches with the topic "ci-stop-testing-non-stream-centos" [1] which drop the reliance of Kolla projects on non-stream CentOS 8. [1] https://review.opendev.org/q/topic:%2522ci-stop-testing-non-stream-centos%2522 -yoctozepto From ccamacho at redhat.com Wed Jan 12 13:13:41 2022 From: ccamacho at redhat.com (Carlos Camacho Gonzalez) Date: Wed, 12 Jan 2022 14:13:41 +0100 Subject: [TripleO] Proposing Juan Badia Payno for TripleO core reviewer. Message-ID: Hi everyone! I'd like to propose Juan Badia Paino [1][2][3] as a core reviewer on the TripleO repositories that are or might be related to the backup and restore efforts (openstack/tripleo-ci, openstack/tripleo-ansible, openstack/python-tripleoclient, openstack/tripleo-quickstart-extras, openstack/tripleo-quickstart). Juan has been around since 2016 making useful contributions and code reviews to the community and I believe adding him to our core reviewer group will help us improve the review and coding speed for the backup and restore codebase. As usual, consider this email as an initial +1 from my side, I will keep an eye on this thread for a week, and based on your feedback and if there are no objections I will add him as a core reviewer in two weeks. [1]: https://review.opendev.org/q/owner:jbadiapa%2540redhat.com [2]: https://www.stackalytics.io/?project_type=all&metric=commits&user_id=jbadiapa&release=all [3]: https://www.stackalytics.io/report/contribution?module=tripleo-group&project_type=openstack&days=180 Cheers, Carlos. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfrancoa at redhat.com Wed Jan 12 13:45:44 2022 From: jfrancoa at redhat.com (Jose Luis Franco Arza) Date: Wed, 12 Jan 2022 14:45:44 +0100 Subject: [TripleO] Proposing Juan Badia Payno for TripleO core reviewer. In-Reply-To: References: Message-ID: +1 Very deserved! On Wed, Jan 12, 2022 at 2:16 PM Carlos Camacho Gonzalez wrote: > Hi everyone! > > I'd like to propose Juan Badia Paino [1][2][3] as a core reviewer on the > TripleO repositories that are or might be related to the backup and restore > efforts (openstack/tripleo-ci, openstack/tripleo-ansible, > openstack/python-tripleoclient, openstack/tripleo-quickstart-extras, > openstack/tripleo-quickstart). > > Juan has been around since 2016 making useful contributions and code > reviews to the community and I believe adding him to our core reviewer > group will help us improve the review and coding speed for the backup and > restore codebase. > > As usual, consider this email as an initial +1 from my side, I will keep > an eye on this thread for a week, and based on your feedback and if there > are no objections I will add him as a core reviewer in two weeks. 
> > [1]: https://review.opendev.org/q/owner:jbadiapa%2540redhat.com > [2]: > https://www.stackalytics.io/?project_type=all&metric=commits&user_id=jbadiapa&release=all > [3]: > https://www.stackalytics.io/report/contribution?module=tripleo-group&project_type=openstack&days=180 > > Cheers, > Carlos. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfrancoa at redhat.com Wed Jan 12 13:47:57 2022 From: jfrancoa at redhat.com (Jose Luis Franco Arza) Date: Wed, 12 Jan 2022 14:47:57 +0100 Subject: [TripleO] Douglas Viroel for tripleo-ci core In-Reply-To: References: Message-ID: +1 On Tue, Jan 4, 2022 at 2:11 PM Marios Andreou wrote: > Hello TripleO ( & happy new year :) \o/ ) > > I'd like to propose Douglas Viroel [1] for core on the tripleo-ci > repos (openstack/tripleo-ci, openstack/tripleo-quickstart-extras, > openstack/tripleo-quickstart, openstack/tripleo-repos). > > Doug joined the team last year and besides his code contributions he > has also been consistently providing many very useful and thoughtful > code reviews. I think he will be an excellent addition to the ci core > team. > > As is customary, let's leave this thread open for a week and if there > are no objections or other concerns then we add Doug to the core group > next week. > > thanks, marios > > [1] https://review.opendev.org/q/owner:viroel%2540gmail.com > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Wed Jan 12 13:48:35 2022 From: johfulto at redhat.com (John Fulton) Date: Wed, 12 Jan 2022 08:48:35 -0500 Subject: [TripleO] Proposing Juan Badia Payno for TripleO core reviewer. In-Reply-To: References: Message-ID: +1 On Wed, Jan 12, 2022 at 8:47 AM Jose Luis Franco Arza wrote: > > +1 > > Very deserved! > > On Wed, Jan 12, 2022 at 2:16 PM Carlos Camacho Gonzalez wrote: >> >> Hi everyone! >> >> I'd like to propose Juan Badia Paino [1][2][3] as a core reviewer on the TripleO repositories that are or might be related to the backup and restore efforts (openstack/tripleo-ci, openstack/tripleo-ansible, openstack/python-tripleoclient, openstack/tripleo-quickstart-extras, openstack/tripleo-quickstart). >> >> Juan has been around since 2016 making useful contributions and code reviews to the community and I believe adding him to our core reviewer group will help us improve the review and coding speed for the backup and restore codebase. >> >> As usual, consider this email as an initial +1 from my side, I will keep an eye on this thread for a week, and based on your feedback and if there are no objections I will add him as a core reviewer in two weeks. >> >> [1]: https://review.opendev.org/q/owner:jbadiapa%2540redhat.com >> [2]: https://www.stackalytics.io/?project_type=all&metric=commits&user_id=jbadiapa&release=all >> [3]: https://www.stackalytics.io/report/contribution?module=tripleo-group&project_type=openstack&days=180 >> >> Cheers, >> Carlos. From senrique at redhat.com Wed Jan 12 13:53:37 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 12 Jan 2022 10:53:37 -0300 Subject: [cinder] Bug deputy report for week of 01-12-2022 Message-ID: This is a bug report from 01-05-2021 to 01-12-2022. Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting ----------------------------------------------------------------------------------------- Medium - https://bugs.launchpad.net/cinder/+bug/1956887 "LVM backed ISCSI device not reporting the same size to nova." Unassigned. 
- https://bugs.launchpad.net/cinder/+bug/1957073 "rbd: snapshot can't be deleted/unmanaged if its source volume is deleted from backend." Assigned to Takashi Kajinami. Incomplete - https://bugs.launchpad.net/cinder/+bug/1956601"Cinder doc is not reproducible." Unassigned. Cheers, -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From fpantano at redhat.com Wed Jan 12 13:59:05 2022 From: fpantano at redhat.com (Francesco Pantano) Date: Wed, 12 Jan 2022 14:59:05 +0100 Subject: [TripleO] Proposing Juan Badia Payno for TripleO core reviewer. In-Reply-To: References: Message-ID: +1 On Wed, Jan 12, 2022 at 2:56 PM John Fulton wrote: > +1 > > On Wed, Jan 12, 2022 at 8:47 AM Jose Luis Franco Arza > wrote: > > > > +1 > > > > Very deserved! > > > > On Wed, Jan 12, 2022 at 2:16 PM Carlos Camacho Gonzalez < > ccamacho at redhat.com> wrote: > >> > >> Hi everyone! > >> > >> I'd like to propose Juan Badia Paino [1][2][3] as a core reviewer on > the TripleO repositories that are or might be related to the backup and > restore efforts (openstack/tripleo-ci, openstack/tripleo-ansible, > openstack/python-tripleoclient, openstack/tripleo-quickstart-extras, > openstack/tripleo-quickstart). > >> > >> Juan has been around since 2016 making useful contributions and code > reviews to the community and I believe adding him to our core reviewer > group will help us improve the review and coding speed for the backup and > restore codebase. > >> > >> As usual, consider this email as an initial +1 from my side, I will > keep an eye on this thread for a week, and based on your feedback and if > there are no objections I will add him as a core reviewer in two weeks. > >> > >> [1]: https://review.opendev.org/q/owner:jbadiapa%2540redhat.com > >> [2]: > https://www.stackalytics.io/?project_type=all&metric=commits&user_id=jbadiapa&release=all > >> [3]: > https://www.stackalytics.io/report/contribution?module=tripleo-group&project_type=openstack&days=180 > >> > >> Cheers, > >> Carlos. > > > -- Francesco Pantano GPG KEY: F41BD75C -------------- next part -------------- An HTML attachment was scrubbed... URL: From marios at redhat.com Wed Jan 12 14:25:51 2022 From: marios at redhat.com (Marios Andreou) Date: Wed, 12 Jan 2022 16:25:51 +0200 Subject: [TripleO] Proposing Juan Badia Payno for TripleO core reviewer. In-Reply-To: References: Message-ID: +1 On Wed, Jan 12, 2022 at 3:19 PM Carlos Camacho Gonzalez wrote: > > Hi everyone! > > I'd like to propose Juan Badia Paino [1][2][3] as a core reviewer on the TripleO repositories that are or might be related to the backup and restore efforts (openstack/tripleo-ci, openstack/tripleo-ansible, openstack/python-tripleoclient, openstack/tripleo-quickstart-extras, openstack/tripleo-quickstart). > > Juan has been around since 2016 making useful contributions and code reviews to the community and I believe adding him to our core reviewer group will help us improve the review and coding speed for the backup and restore codebase. > > As usual, consider this email as an initial +1 from my side, I will keep an eye on this thread for a week, and based on your feedback and if there are no objections I will add him as a core reviewer in two weeks. 
> > [1]: https://review.opendev.org/q/owner:jbadiapa%2540redhat.com > [2]: https://www.stackalytics.io/?project_type=all&metric=commits&user_id=jbadiapa&release=all > [3]: https://www.stackalytics.io/report/contribution?module=tripleo-group&project_type=openstack&days=180 > > Cheers, > Carlos. From chkumar at redhat.com Wed Jan 12 14:53:46 2022 From: chkumar at redhat.com (Chandan Kumar) Date: Wed, 12 Jan 2022 20:23:46 +0530 Subject: [tripleo]Making CentOS Stream 9 jobs voting and remove cs8 jobs Message-ID: Hello, By the start of the year 2022, we have added CentOS Stream 9 tripleo non-voting jobs in the check pipeline. Now these jobs seem to be healthy. The TripleO CI team has put a lot of effort into adding these jobs. Now it's time to make them voting. Below is the status of each job: * content provider and Standalone Jobs https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-9-content-provider&job_name=tripleo-ci-centos-9-standalone * Standalone Sc1 to 4 https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-9-scenario001-standalone&job_name=tripleo-ci-centos-9-scenario002-standalone&job_name=tripleo-ci-centos-9-scenario003-standalone&job_name=tripleo-ci-centos-9-scenario004-standalone * Standalone sc7, Sc10 and Sc12 https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-9-scenario007-standalone&job_name=tripleo-ci-centos-9-scenario010-standalone&job_name=tripleo-ci-centos-9-scenario010-ovn-provider-standalone&job_name=tripleo-ci-centos-9-scenario012-standalone * Multinode and undercloud jobs https://zuul.openstack.org/builds?job_name=ripleo-ci-centos-9-containers-multinode&job_name=tripleo-ci-centos-9-scenario007-multinode-oooq-container&job_name=tripleo-ci-centos-9-scenario000-multinode-oooq-container-updates&job_name=tripleo-ci-centos-9-undercloud-containers Here are the plans for moving forward with Master release: * Merge https://review.opendev.org/q/hashtag:%22c9voting%22+(status:open%20OR%20status:merged) patches to make the above jobs voting and gating. - It means tripleo patches might be seeing a lot of cs8 and cs9 jobs running on the patch and blocking it. * We will keep both cs8 and cs9 jobs running for 1 or 2 weeks. - Note: It might increase the load on infra. * After 2 weeks, we will start removing cs8 jobs from the check and gate jobs. * RDO third party c8 jobs will be still running against tripleo patches until CS9 OVB jobs are proven stable in check. Thank you and please reach out to us for any concerns or queries. With Regards, Chandan Kumar From christian.rohmann at inovex.de Wed Jan 12 15:03:47 2022 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Wed, 12 Jan 2022 16:03:47 +0100 Subject: [neutron] Bug Deputy Report January 03 - 10 In-Reply-To: References: Message-ID: <02b79e74-7d84-a6ba-537f-7db1a7ee6532@inovex.de> Hello Lucas, I hope you don't mind my rather blunt question, but how do tickets end up on this list? I read https://docs.openstack.org/neutron/latest/contributor/policies/bugs.html#neutron-bug-deputy but am just wondering if there is anything else I should have done when reporting the issue? I don't just want to shout louder and certainly everybody wants their issue to be looked at and fixed first. But I just noticed that my bug report received no reply or confirmation in the last three months (apart from the tag "vpnaas" being added), while other issues were triaged quickly and were consequently added to this deputy report ... On 10/01/2022 22:17, Lucas Alvares Gomes wrote: > [...] 
> > * https://bugs.launchpad.net/neutron/+bug/1956846 - "ha router > duplicated routes" > - Unassigned > > [...] Similar to those duplicate routes, I reported an issue about duplicated IPtables causing VPNaaS not not work in https://bugs.launchpad.net/neutron/+bug/1943449. With kind regards, Christian From sbauza at redhat.com Wed Jan 12 16:19:04 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 12 Jan 2022 17:19:04 +0100 Subject: [nova] [placement] Proposing Sean Mooney as nova-core Message-ID: Hi all, I would like to propose Sean as an addition to the nova-core team (which includes placement merge rights as nova-core is implicitly a subgroup). As we know, he's around for a long time, is already a nova-specs-core and has proven solid experience in reviews. Cores, please vote (-1, 0, +1) before next Wednesday Jan 19th 1600UTC. Cheers, -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Wed Jan 12 16:19:21 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Wed, 12 Jan 2022 08:19:21 -0800 Subject: =?UTF-8?Q?Re:_[all][infra][qa][tripleo][openstack-ansible][rally][aodh][?= =?UTF-8?Q?cinder][kolla][chef][sahara]_CentOS_8_EOL_and_removal_from_CI?= =?UTF-8?Q?_label/image_lists?= In-Reply-To: References: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> Message-ID: <1ecc11e6-0020-43c4-9b7a-3580df6606f2@www.fastmail.com> On Tue, Jan 11, 2022, at 5:13 PM, Takashi Kajinami wrote: > Can we know which job from the aodh repo is running on CentOS8 ? > I checked the job definitions in master and stable branches but I could > not find any. > > I've submitted patches to migrate jobs from CentOS 8 to CentOS 8 Stream > for some of > these repos (we also need to decide how to deal with the stable > branches, though). > I used the same topic (remove-centos-8), so these patches can be found > by the same query[1]. I think I flagged it due to the telemetry jobs which I had misidentified as running CentOS 8 instead of CentOS 8 stream. Double checking it the job seems to use stream so this was likely a false positive. Sorry for the noise and thank you for double checking. > > > On Wed, Jan 12, 2022 at 6:58 AM Clark Boylan wrote: >> As noted last month the OpenDev [0] team intends on removing CentOS 8 images from our CI system now that the release has gone EOL. A number of you have already shifted over to CentOS 8 Stream in CI (thank you!), but there is still quite a bit remaining based on codesearch and some manual digging. The OpenDev team has begun the process of removing some of the supporting infrastructure and testing as well [1]. >> >> This list is probably not comprehensive but is a start. These projects will need to look at removing their CentOS 8 CI jobs (optionally replacing them with CentOS 8 Stream jobs): >> >> * devstack (victoria and older branches) >> * tripleo-validations >> * tripleo-upgrade >> * openstack-ansible >> * rally >> * aodh >> * cinderlib >> * kolla-ansible >> * openstack-chef >> * sahara >> * tenks >> * validations-common >> * validations-lib >> >> One thing to keep in mind is that stable branches may also be affected. >> >> We'd like to do this cleanup as gracefully as possible, but reality is that some projects are unlikely to completely remove their use of CentOS 8. I think that we can give it until the end of January before we force merge updates to remove the nodeset and label from our configs. 
At that point any Zuul configuration still using CentOS 8 will enter an error state and correcting it will be necessary to make updates to those Zuul configs. >> >> [0] https://lists.opendev.org/pipermail/service-announce/2021-December/000029.html >> [1] https://review.opendev.org/q/topic:%22remove-centos-8%22+status:open >> From damian.pietras at hardit.pl Wed Jan 12 17:14:31 2022 From: damian.pietras at hardit.pl (Damian Pietras) Date: Wed, 12 Jan 2022 18:14:31 +0100 Subject: [nova] iothread support with Libvirt In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04AFB65E@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65B@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65E@gmsxchsvr01.thecreation.com> Message-ID: Hi, We've actually hit latency with local disks (image based storage) this week and I've performed multiple benchmarks with various options. Our goal is to have best latency / IOPS with random synchronous 8K writes on local NVME with queue depth of 1 (this is what our DB is doing). Our writes are synchronous so my numbers will be much lower then your 4K random writes. Our hardware: 2x INTEL Xeon Silver 4214R 16x 16GB DDR4 2x NVME - WD SN630 3.2TB in RAID0 (using LVM) VM is Debian 9 image with hw_disk_bus=scsi set in metadata With our setup we started with 4800 IOPS and ~0.3ms latency with standard settings and went to 17,8K IOPS with ~0.054ms latency after some optimizations. Here is what settings resulted in different performance data: - change I/O scheduler to noop in VM (echo 'noop' > /sys/block/sda/queue/scheduler) - set scaling_governor=performance on the compute host from default "schedutil". I've noticed this is most significant change with queue depth of 1 when there is no other load on the host. Alternatively putting artificial CPU load on the VM while running the benchmark also improves I/O latency. I guess keeping CPU clocks higher wither with CPU scheduler setting or artificial CPU usage have significant impact. This may also prevent CPU from going into deeper C-states but I did not investigate that further. - Setting io='native' in libvirt configuration. This is set automatically in OpenStack when you use preallocated images (https://docs.openstack.org/nova/xena/configuration/config.html#DEFAULT.preallocate_images) - Use LVM-backed images instead of thin provisioning qcow2 as you've already tried - Change the "bus" parameter to "virtio" instead of scsi. I did not performed benchmark with all those changes combined because we achieved required performance. After that we only set I/O scheduler to noop, and will probably relay on CPU load in production performance to keep the CPU busy and prevent going to deeper C-states and lower the CPU clock. On 07.01.2022 04:54, Eric K. Miller wrote: > Hi Laurent, > > I thought I may have already done some benchmarks, and it looks like I did, long ago, for the discussion that I created a couple years ago (on August 6, 2020 to be exact). > > I copied the results from that email below. You can see that the latency difference is pretty significant (13.75x with random 4KiB reads) between bare metal and a VM, which is about the same as the difference in IOPS. Writes are not quite as bad of difference at 8.4x. 
> > Eric > > > Some numbers from fio, just to get an idea for how good/bad the IOPS will be: > > Configuration: > 32 core EPYC 7502P with 512GiB of RAM - CentOS 7 latest updates - Kolla Ansible (Stein) deployment > 32 vCPU VM with 64GiB of RAM > 32 x 10GiB test files (I'm using file tests, not raw device tests, so not optimal, but easiest when the VM root disk is the test disk) > iodepth=10 > numofjobs=32 > time=30 (seconds) > > The VM was deployed using a qcow2 image, then deployed as a raw image, to see the difference in performance. There was none, which makes sense, since I'm pretty sure the qcow2 image was decompressed and stored in the LVM logical volume - so both tests were measuring the same thing. > > Bare metal (random 4KiB reads): > 8066MiB/sec > 154.34 microsecond avg latency > 2.065 million IOPS > > VM qcow2 (random 4KiB reads): > 589MiB/sec > 2122.10 microsecond avg latency > 151k IOPS > > Bare metal (random 4KiB writes): > 4940MiB/sec > 252.44 microsecond avg latency > 1.265 million IOPS > > VM qcow2 (random 4KiB writes): > 589MiB/sec > 2119.16 microsecond avg latency > 151k IOPS > > Since the read and write VM results are nearly identical, my assumption is that the emulation layer is the bottleneck. CPUs in the VM were all at 55% utilization (all kernel usage). The qemu process on the bare metal machine indicated 1600% (or so) CPU utilization. > > Below are runs with sequential 1MiB block tests > > Bare metal (sequential 1MiB reads): > 13.3GiB/sec > 23446.43 microsecond avg latency > 13.7k IOPS > > VM qcow2 (sequential 1MiB reads): > 8378MiB/sec > 38164.52 microsecond avg latency > 8377 IOPS > > Bare metal (sequential 1MiB writes): > 8098MiB/sec > 39488.00 microsecond avg latency > 8097 million IOPS > > VM qcow2 (sequential 1MiB writes): > 8087MiB/sec > 39534.96 microsecond avg latency > 8087 IOPS -- Damian Pietras HardIT From rosmaita.fossdev at gmail.com Wed Jan 12 17:47:03 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 12 Jan 2022 12:47:03 -0500 Subject: [cinder] removing block-box code Message-ID: <5c46aa62-3b1f-b400-ac15-3960e5768ebd@gmail.com> Hello Argonauts and other interested parties, As you may be aware, the cinder code repository contains files under contrib/block-box whose README file has included a note [0] since 20 January 2020 indicating that block-box is not supported. This note serves notice that we will be removing the code during the Yoga development cycle. I trust that this will be uncontroversial. If you have any objections, please raise them now, either here on the ML or on the patch removing the code: https://review.opendev.org/c/openstack/cinder/+/824472 cheers, brian [0] https://opendev.org/openstack/cinder/commit/a4d8f761639d63172814a593a32da4167150c600 From eogus1217 at gmail.com Tue Jan 11 01:43:39 2022 From: eogus1217 at gmail.com (Daehyun Lim) Date: Tue, 11 Jan 2022 10:43:39 +0900 Subject: Question about smartphone-based HPC system. Message-ID: Dear, My name is Ray from South Korea. My company and I are very interested in Smartphone-based High Performance Computing system. Currently, my company is looking for research about how to build operating system (we called it 'Middleware system) that connects between each smartphone and the central system. We will running AP farm (easy to understand, imagine factory gathering huge amounts of smartphones - attached picture) and a network mining system that allows connecting only smartphones. 
Our current one of the issues is that how to build middleware system - allocate jobs to each smartphones, validate etc. And I'm curious how much realistic Flops from smartphones. While I was researching related information in Google, I saw Opnestack website. I'm curious that is possible Openstack can be implemented in the smartphone based HPC system. As you know, more than 100 million of phones are abandoned and unused every year. Therefore, I think if we can use these smartphones in new computing power, it will be great business in terms of save the resouces, low electricity and recycle. If you provide some advice about smartphone-based HPC system, I will grateful to you. Thank you. Best regards, Ray. [image: Mining.jpg] [image: photo_2020-01-29_14-38-17.jpg] [image: photo_2019-10-08_14-30-11.jpg] -- Telegram account : Raylim1217 Skype account : live:a43fdbe9bdd9a9c8 Zoom : 263-976-5455 Phone : +821092013947 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Mining.jpg Type: image/jpeg Size: 311370 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: photo_2020-01-29_14-38-17.jpg Type: image/jpeg Size: 197313 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: photo_2019-10-08_14-30-11.jpg Type: image/jpeg Size: 132485 bytes Desc: not available URL: From emiller at genesishosting.com Wed Jan 12 18:13:58 2022 From: emiller at genesishosting.com (Eric K. Miller) Date: Wed, 12 Jan 2022 12:13:58 -0600 Subject: [nova] iothread support with Libvirt In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65B@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65E@gmsxchsvr01.thecreation.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04AFB685@gmsxchsvr01.thecreation.com> Hi Damian, > With our setup we started with 4800 IOPS and ~0.3ms latency with > standard settings and went to 17,8K IOPS with ~0.054ms latency after > some optimizations. Here is what settings resulted in different > performance data: > > - change I/O scheduler to noop in VM (echo 'noop' > > /sys/block/sda/queue/scheduler) Thank you for the info! It appears that the "noop" scheduler merges requests, so you are likely getting between 3 and 4 I/O command merges per command to go from 4800 to 17800 IOPS. I'll have to check on this to see if that changes anything on this end, since I thought that the default scheduler also performed command merging. Regarding sleep states, you may want to look at the power management functions in the BIOS. If you have "energy efficient" settings, this will definitely have an impact on latency, but as you noticed, the governor can also override some of these sleep states if you set it to performance. We did a little more testing with iothreads on our Proxmox systems, since it is easy to enable/disable this on a virtual disk. The performance difference on both a relatively idle compute node and VM is extremely small (barely noticeable). With a busy VM, it may make a difference, but we haven't had time to test. 
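For reference, checking whether a guest actually has iothreads wired up is straightforward on a libvirt-based compute host: an iothread shows up in the domain XML as an <iothreads> element plus an iothread= attribute on the virtio disk's <driver> line, and virsh can report whether any are active. A rough sketch, with a placeholder domain name:

  # Placeholder domain name; run on the compute host where libvirt/virsh lives.
  virsh iothreadinfo instance-00000001                     # lists IOThread IDs and CPU affinity, if any
  virsh dumpxml instance-00000001 | grep -E '<iothreads>|iothread='

Nothing here is OpenStack-specific; it is just a quick way to confirm whether the guest ended up with iothreads at all before spending time benchmarking them.
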
So, all the work involved in enabling iothreads in OpenStack may not be worth it. One of our storage vendors had done some testing as well long ago, and they indicated that, to benefit from iothreads, dedicated cores should be used for the iothreads, which creates a bit more resource allocation complexity in OpenStack, especially if live migration is required. Eric From ozzzo at yahoo.com Wed Jan 12 18:26:17 2022 From: ozzzo at yahoo.com (Albert Braden) Date: Wed, 12 Jan 2022 18:26:17 +0000 (UTC) Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> <33441648.1434581.1641304881681@mail.yahoo.com> Message-ID: <385929635.1929303.1642011977053@mail.yahoo.com> This is very helpful. Thank you! It appears that I have successfully set the expire time to 1200, because I no longer see unconsumed messages lingering in my queues, but it's not obvious how to verify. In the web interface, when I look at the queues, I see things like policy, state, features and consumers, but I don't see a timeout or expire value, nor do I find the number 1200 anywhere. Where should I be looking in the web interface to verify that I set the expire time correctly? Or do I need to use the CLI? On Wednesday, January 5, 2022, 04:23:29 AM EST, Mark Goddard wrote: On Tue, 4 Jan 2022 at 14:08, Albert Braden wrote: > > Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl" ? John Garbutt proposed a few patches for RabbitMQ in kolla, including this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191 https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+(status:open+OR+status:merged)+project:openstack/kolla-ansible Note that they are currently untested. Mark > On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden wrote: > > > I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2: > > "policies":[ > {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, > {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0} > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0} > {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} > {% endif %} > > But I still see unconsumed messages lingering in notifications_extractor.info. From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing? > On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden wrote: > > > Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes? 
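Picking up the definitions.json question quoted just above: two things stand out in that snippet. The policy objects are not comma-separated, and RabbitMQ's policy keys are "message-ttl" and "expires" (not "expire"), both interpreted as milliseconds, so 600 and 3600 would mean 0.6 and 3.6 seconds. A minimal corrected sketch, stripped of the Jinja conditional and with placeholder values (printed via a heredoc just for illustration, not a drop-in kolla template):

  cat <<'EOF'
  "policies": [
    {"vhost": "/", "name": "ha-all",
     "pattern": "^(?!(amq\\.)|(.*_fanout_)|(reply_)).*",
     "apply-to": "all", "definition": {"ha-mode": "all"}, "priority": 0},
    {"vhost": "/", "name": "notifications-ttl",
     "pattern": "^(notifications|versioned_notifications)\\.",
     "apply-to": "queues",
     "definition": {"message-ttl": 600000, "expires": 3600000},
     "priority": 1}
  ]
  EOF

Because at most one policy applies to a given queue, settings that should coexist on the notification queues (TTL and expiry here) have to live in the same definition rather than in two competing policies; whichever policy wins by priority is the only one that takes effect.
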
> > [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud wrote: > > > So, your config snippet LGTM. > > Le ven. 10 d?c. 2021 ? 17:50, Albert Braden a ?crit : > > Sorry, that was a transcription error. I thought "True" and my fingers typed "False." The correct lines are: > > [oslo_messaging_rabbit] > amqp_durable_queues = True > > On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud wrote: > > > If you plan to let `amqp_durable_queues = False` (i.e if you plan to keep this config equal to false), then you don't need to add these config lines as this is already the default value [1]. > > [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 > > Le jeu. 9 d?c. 2021 ? 22:40, Albert Braden a ?crit : > > Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management. > > I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf: > > [oslo_messaging_rabbit] > amqp_durable_queues = False > > Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? What would that look like in the config? > > > From: Herve Beraud > Sent: Thursday, December 9, 2021 2:45 AM > To: Bogdan Dobrelya > Cc: openstack-discuss at lists.openstack.org > Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability > > > > Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. > > > > > > Le mer. 8 d?c. 2021 ? 11:48, Bogdan Dobrelya a ?crit : > > Please see inline > > >> I read this with great interest because we are seeing this issue. Questions: > >> > >> 1. We are running kola-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? > >> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like? > >> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into? > > > > Note that even having rabbit HA policies adjusted like that and its HA > > replication factor [0] decreased (e.g. to a 2), there still might be > > high churn caused by a large enough number of replicated durable RPC > > topic queues. And that might cripple the cloud down with the incurred > > I/O overhead because a durable queue requires all messages in it to be > > persisted to a disk (for all the messaging cluster replicas) before they > > are ack'ed by the broker. > > > > Given that said, Oslo messaging would likely require a more granular > > control for topic exchanges and the durable queues flag - to tell it to > > declare as durable only the most critical paths of a service. A single > > config setting and a single control exchange per a service might be not > > enough. > > Also note that therefore, amqp_durable_queue=True requires dedicated > control exchanges configured for each service. Those that use > 'openstack' as a default cannot turn the feature ON. Changing it to a > service specific might also cause upgrade impact, as described in the > topic [3]. > > > > The same is true for `amqp_auto_delete=True`. 
That requires dedicated control exchanges else it won't work if each service defines its own policy on a shared control exchange (e.g `openstack`) and if policies differ from each other. > > > > [3] https://review.opendev.org/q/topic:scope-config-opts > > > > > There are also race conditions with durable queues enabled, like [1]. A > > solution could be where each service declare its own dedicated control > > exchange with its own configuration. > > > > Finally, openstack components should add perhaps a *.next CI job to test > > it with durable queues, like [2] > > > > [0] https://www.rabbitmq.com/ha.html#replication-factor > > > > [1] > > https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt > > > > [2] https://review.opendev.org/c/openstack/nova/+/820523 > > > >> > >> Does anyone have a sample set of RMQ config files that they can share? > >> > >> It looks like my Outlook has ruined the link; reposting: > >> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > > > > > -- > > Best regards, > > Bogdan Dobrelya, > > Irc #bogdando > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > https://twitter.com/4383hberaud > > > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > https://twitter.com/4383hberaud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Jan 12 18:37:27 2022 From: smooney at redhat.com (Sean Mooney) Date: Wed, 12 Jan 2022 18:37:27 +0000 Subject: Question about smartphone-based HPC system. In-Reply-To: References: Message-ID: <0c3951d50172f6101e8a9d1880d892966a4b03f1.camel@redhat.com> On Tue, 2022-01-11 at 10:43 +0900, Daehyun Lim wrote: > Dear, > > My name is Ray from South Korea. > My company and I are very interested in Smartphone-based High Performance > Computing system. Currently, my company is looking for research about how > to build operating system (we called it 'Middleware system) that connects > between each smartphone and the central system. > > We will running AP farm (easy to understand, imagine factory gathering huge > amounts of smartphones - attached picture) and a network mining system that > allows connecting only smartphones. > > Our current one of the issues is that how to build middleware system - > allocate jobs to each smartphones, validate etc. And I'm curious how much > realistic Flops from smartphones. > > While I was researching related information in Google, I saw Opnestack > website. > I'm curious that is possible Openstack can be implemented in the smartphone > based HPC system. its rather unlikely. while you amy be able to run some of the services in general a smarthfone will not have the processing power, ram or disk spaces to make they usable for openstack or workload runing on openstack. you woudl likely be better off looking at https://slurm.schedmd.com/overview.html hpc is not a primary usecase of openstack. managing infra that is used to build a hpc system is but openstack iteslef doe not do hpc style job scheduling. kubernetees might also be a better fit. 
tehre are many oepsnsouce hpc cluster solution and google will help you find them but with the limited processing power of even the most modern smart phone you woudl really want a distibuted compute plathform like folding at home or boinc https://boinc.berkeley.edu/ that can take peicemeal jobs and distirbute them to the cluster for execution. openstack is not such a plathform. > > As you know, more than 100 million of phones are abandoned and unused every > year. Therefore, I think if we can use these smartphones in new computing > power, it will be great business in terms of save the resouces, low > electricity and recycle. > > If you provide some advice about smartphone-based HPC system, I will > grateful to you. > Thank you. > > Best regards, > Ray. > > [image: Mining.jpg] > [image: photo_2020-01-29_14-38-17.jpg] > [image: photo_2019-10-08_14-30-11.jpg] From emiller at genesishosting.com Wed Jan 12 21:10:11 2022 From: emiller at genesishosting.com (Eric K. Miller) Date: Wed, 12 Jan 2022 15:10:11 -0600 Subject: [nova] iothread support with Libvirt In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04AFB685@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65B@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65E@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB685@gmsxchsvr01.thecreation.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04AFB687@gmsxchsvr01.thecreation.com> > It appears that the "noop" scheduler merges requests, so you are likely > getting between 3 and 4 I/O command merges per command to go from 4800 > to 17800 IOPS. I'll have to check on this to see if that changes anything on this > end, since I thought that the default scheduler also performed command > merging. It appears that "deadline" is the only scheduler available in later kernel versions (at least in Ubuntu). Damian - do you recall what the scheduler was set to prior to changing it to the noop scheduler? Eric From dale at catalystcloud.nz Wed Jan 12 22:04:42 2022 From: dale at catalystcloud.nz (Dale Smith) Date: Thu, 13 Jan 2022 11:04:42 +1300 Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: <385929635.1929303.1642011977053@mail.yahoo.com> References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> <33441648.1434581.1641304881681@mail.yahoo.com> <385929635.1929303.1642011977053@mail.yahoo.com> Message-ID: In the web interface(RabbitMQ 3.8.23, not using Kolla), when looking at the queue you will see the "Policy" listed by name, and "Effective policy definition". You can either view the policy definition, and the arguments for the definitions applied, or "effective policy definition" should show you the list. It may be relevant to note: "Each exchange or queue will have at most one policy matching" - https://www.rabbitmq.com/parameters.html#how-policies-work I've added a similar comment to the linked patchset. On 13/01/22 7:26 am, Albert Braden wrote: > This is very helpful. Thank you! 
It appears that I have successfully > set the expire time to 1200, because I no longer see unconsumed > messages lingering in my queues, but it's not obvious how to verify. > In the web interface, when I look at the queues, I see things like > policy, state, features and consumers, but I don't see a timeout or > expire value, nor do I find the number 1200 anywhere. Where should I > be looking in the web interface to verify that I set the expire time > correctly? Or do I need to use the CLI? > On Wednesday, January 5, 2022, 04:23:29 AM EST, Mark Goddard > wrote: > > > On Tue, 4 Jan 2022 at 14:08, Albert Braden > wrote: > > > > Now that the holidays are over I'm trying this one again. Can anyone > help me figure out how to set "expires" and "message-ttl" ? > > John Garbutt proposed a few patches for RabbitMQ in kolla, including > this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191 > > > https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+ > (status:open+OR+status:merged)+project:openstack/kolla-ansible > > Note that they are currently untested. > > Mark > > > > On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden > > wrote: > > > > > > I tried these policies in > ansible/roles/rabbitmq/templates/definitions.json.j2: > > > > "policies":[ > > {"vhost": "/", "name": "ha-all", "pattern": > '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", > "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == > 'outward_rabbitmq' %}, > > {"vhost": "/", "name": "notifications-ttl", "pattern": > "^(notifications|versioned_notifications)\\.", "apply-to": "queues", > "definition": {"message-ttl":600}, "priority":0} > > {"vhost": "/", "name": "notifications-expire", "pattern": > "^(notifications|versioned_notifications)\\.", "apply-to": "queues", > "definition": {"expire":3600}, "priority":0} > > {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", > "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, > "priority":0} > > {% endif %} > > > > But I still see unconsumed messages lingering in > notifications_extractor.info. From reading the docs I think this > setting should cause messages to expire after 600 seconds, and unused > queues to be deleted after 3600 seconds. What am I missing? > > On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden > > wrote: > > > > > > Following [1] I successfully set "amqp_durable_queues = True" and > restricted HA to the appropriate queues, but I'm having trouble with > some of the other settings such as "expires" and "message-ttl". Does > anyone have an example of a working kolla config that includes these > changes? > > > > [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > > On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud > > wrote: > > > > > > So, your config snippet LGTM. > > > > Le ven. 10 d?c. 2021 ? 17:50, Albert Braden > a ?crit : > > > > Sorry, that was a transcription error. I thought "True" and my > fingers typed "False." The correct lines are: > > > > [oslo_messaging_rabbit] > > amqp_durable_queues = True > > > > On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud > > wrote: > > > > > > If you plan to let `amqp_durable_queues = False` (i.e if you plan to > keep this config equal to false), then you don't need to add these > config lines as this is already the default value [1]. > > > > [1] > https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 > > > > > Le jeu. 9 d?c. 2021 ? 
22:40, Albert Braden > a ?crit : > > > > Replying from my home email because I've been asked to not email the > list from my work email anymore, until I get permission from upper > management. > > > > I'm not sure I follow. I was planning to add 2 lines to > etc/kolla/config/global.conf: > > > > [oslo_messaging_rabbit] > > amqp_durable_queues = False > > > > Is that not sufficient? What is involved in configuring dedicated > control exchanges for each service? What would that look like in the > config? > > > > > > From: Herve Beraud > > > Sent: Thursday, December 9, 2021 2:45 AM > > To: Bogdan Dobrelya > > > Cc: openstack-discuss at lists.openstack.org > > > Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability > > > > > > > > Caution: This email originated from outside the organization. Do not > click links or open attachments unless you recognize the sender and > know the content is safe. > > > > > > > > > > > > Le mer. 8 d?c. 2021 ? 11:48, Bogdan Dobrelya > a ?crit : > > > > Please see inline > > > > >> I read this with great interest because we are seeing this issue. > Questions: > > >> > > >> 1. We are running kola-ansible Train, and our RMQ version is > 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? > > >> 2. Document [2] recommends policy > '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our > ansible playbooks, nor in any of the config files in the RMQ > container. What would this look like in Ansible, and what should the > resulting container config look like? > > >> 3. It appears that we are not setting "amqp_durable_queues = > True". What does this setting look like in Ansible, and what file does > it go into? > > > > > > Note that even having rabbit HA policies adjusted like that and its HA > > > replication factor [0] decreased (e.g. to a 2), there still might be > > > high churn caused by a large enough number of replicated durable RPC > > > topic queues. And that might cripple the cloud down with the incurred > > > I/O overhead because a durable queue requires all messages in it to be > > > persisted to a disk (for all the messaging cluster replicas) > before they > > > are ack'ed by the broker. > > > > > > Given that said, Oslo messaging would likely require a more granular > > > control for topic exchanges and the durable queues flag - to tell > it to > > > declare as durable only the most critical paths of a service. A single > > > config setting and a single control exchange per a service might > be not > > > enough. > > > > Also note that therefore, amqp_durable_queue=True requires dedicated > > control exchanges configured for each service. Those that use > > 'openstack' as a default cannot turn the feature ON. Changing it to a > > service specific might also cause upgrade impact, as described in the > > topic [3]. > > > > > > > > The same is true for `amqp_auto_delete=True`. That requires > dedicated control exchanges else it won't work if each service defines > its own policy on a shared control exchange (e.g `openstack`) and if > policies differ from each other. > > > > > > > > [3] https://review.opendev.org/q/topic:scope-config-opts > > > > > > > > > There are also race conditions with durable queues enabled, like > [1]. A > > > solution could be where each service declare its own dedicated control > > > exchange with its own configuration. 
> > > > > > Finally, openstack components should add perhaps a *.next CI job > to test > > > it with durable queues, like [2] > > > > > > [0] https://www.rabbitmq.com/ha.html#replication-factor > > > > > > > [1] > > > > https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt > > > > > > > [2] https://review.opendev.org/c/openstack/nova/+/820523 > > > > > > >> > > >> Does anyone have a sample set of RMQ config files that they can > share? > > >> > > >> It looks like my Outlook has ruined the link; reposting: > > >> [2] > https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > > > > > > > > > -- > > > Best regards, > > > Bogdan Dobrelya, > > > Irc #bogdando > > > > > > -- > > Best regards, > > Bogdan Dobrelya, > > Irc #bogdando > > > > > > > > > > -- > > > > Herv? Beraud > > > > Senior Software Engineer at Red Hat > > > > irc: hberaud > > > > https://github.com/4383/ > > > > https://twitter.com/4383hberaud > > > > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > > > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jan 13 02:43:50 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 12 Jan 2022 20:43:50 -0600 Subject: [nova] [placement] Proposing Sean Mooney as nova-core In-Reply-To: References: Message-ID: <17e5151eb65.127837d63471274.4365199732662762747@ghanshyammann.com> ---- On Wed, 12 Jan 2022 10:19:04 -0600 Sylvain Bauza wrote ---- > Hi all,I would like to propose Sean as an addition to the nova-core team (which includes placement merge rights as nova-core is implicitly a subgroup). > As we know, he's around for a long time, is already a nova-specs-core and has proven solid experience in reviews. > > Cores, please vote (-1, 0, +1) before next Wednesday Jan 19th 1600UTC. +1, Sean is doing a great contribution to nova for a long time. -gmann > Cheers,-Sylvain > > From laurentfdumont at gmail.com Thu Jan 13 02:49:48 2022 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Wed, 12 Jan 2022 21:49:48 -0500 Subject: [Triple0] In-Reply-To: References: Message-ID: Which user is the UUID 765d7c43b3d54b289c9fa9e1dae15112 referring to? Can you reset it's password? I do believe that Keystone will enforce a password reset until you change it. On Mon, Jan 10, 2022 at 4:25 AM Lokendra Rathour wrote: > Hi Team, > I was trying to deploy Triple0 Train using containers. > After around 30+ min of overcloud running it gett below error: > > The error was: "keystoneauth1.exceptions.http.Unauthorized: The password > is expired and needs to be changed for user: > 765d7c43b3d54b289c9fa9e1dae15112. (HTTP 401) (Request-ID: > req-f1344091-4f81-44c2-8ec6-e2daffbf998c) > 2022-01-09 23:10:39.354147 | 525400bb-6dc5-c157-232d-000000003de1 | > FATAL | As" > > I tried checking the Horizon, where is see that it is asking me to change > the password immediately with the same message and a similar observation is > seen with overcloud file. > > Please advice why this deployment is getting failed > > -- > ~ Lokendra > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Thu Jan 13 02:50:06 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 12 Jan 2022 20:50:06 -0600 Subject: [all][tc] Technical Committee next weekly meeting on Jan 13th at 1500 UTC Message-ID: <17e5157a872.b4cd9615471337.6489000773368660806@ghanshyammann.com> Hello Everyone, Below is the agenda for Tomorrow's TC IRC meeting schedule at 1500 UTC. Amy will chair tomorrow's meeting. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting == Agenda for tomorrow's TC meeting == * Roll call * Follow up on past action items * TC vote on gender-neutral language bylaws change. ** https://lists.openinfra.dev/pipermail/foundation/2021-October/003016.html ** Need TC official approval for section Appendix 4, "OpenStack Technical Committee Member Policy". In the first paragraph it changes "his or her" to "their". * Gate health check ** Fixing Zuul config error in OpenStack *** https://etherpad.opendev.org/p/zuul-config-error-openstack * Progress checks on Yoga Tracker ** https://etherpad.opendev.org/p/tc-yoga-tracker ** Z Release Cycle Name ** It is needed for Release Management team to this week's task "Plan the next release cycle schedule" (elodilles) * SIG i18n status check ** Xena translation missing *** http://lists.openstack.org/pipermail/openstack-discuss/2021-December/026244.html ** Translation bug *** https://review.opendev.org/c/openstack/contributor-guide/+/821371 * Adjutant need PTLs and maintainers ** http://lists.openstack.org/pipermail/openstack-discuss/2021-October/025555.html * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open -gmann From lokendrarathour at gmail.com Thu Jan 13 03:40:36 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Thu, 13 Jan 2022 09:10:36 +0530 Subject: [Triple0] In-Reply-To: References: Message-ID: Hey Laurent, Thanks for the update, But I think the issue was because of NTP Not in sync. Somehow my Chrony sync stopped and it cause the Setup completion failure. Solution: Chrony needs to be in sync all time. -Lokendra On Thu, Jan 13, 2022 at 8:20 AM Laurent Dumont wrote: > Which user is the UUID 765d7c43b3d54b289c9fa9e1dae15112 referring to? Can > you reset it's password? I do believe that Keystone will enforce a password > reset until you change it. > > On Mon, Jan 10, 2022 at 4:25 AM Lokendra Rathour < > lokendrarathour at gmail.com> wrote: > >> Hi Team, >> I was trying to deploy Triple0 Train using containers. >> After around 30+ min of overcloud running it gett below error: >> >> The error was: "keystoneauth1.exceptions.http.Unauthorized: The password >> is expired and needs to be changed for user: >> 765d7c43b3d54b289c9fa9e1dae15112. (HTTP 401) (Request-ID: >> req-f1344091-4f81-44c2-8ec6-e2daffbf998c) >> 2022-01-09 23:10:39.354147 | 525400bb-6dc5-c157-232d-000000003de1 | >> FATAL | As" >> >> I tried checking the Horizon, where is see that it is asking me to change >> the password immediately with the same message and a similar observation is >> seen with overcloud file. >> >> Please advice why this deployment is getting failed >> >> -- >> ~ Lokendra >> >> >> -- ~ Lokendra -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From melwittt at gmail.com Thu Jan 13 04:46:39 2022 From: melwittt at gmail.com (melanie witt) Date: Wed, 12 Jan 2022 20:46:39 -0800 Subject: [nova] [placement] Proposing Sean Mooney as nova-core In-Reply-To: References: Message-ID: <690eb6c8-725e-d55c-706b-6ea566143cab@gmail.com> On Wed Jan 12 2022 08:19:04 GMT-0800 (Pacific Standard Time), Sylvain Bauza wrote: > Hi all, > I would like to propose Sean as an addition to the nova-core team (which > includes placement merge rights as nova-core is implicitly a subgroup). > > As we know, he's around for a long time, is already a nova-specs-core > and has proven solid experience in reviews. > > Cores, please vote (-1, 0, +1) before next Wednesday Jan 19th 1600UTC. +1 From gregory.orange at pawsey.org.au Thu Jan 13 05:15:22 2022 From: gregory.orange at pawsey.org.au (Gregory Orange) Date: Thu, 13 Jan 2022 13:15:22 +0800 Subject: [kolla] [horizon] Custom logos (WAS: [Openstack-operators] Horizon Custom Logos (Queens, 13.0.1)) In-Reply-To: References: <5B7AEF19.5010502@soe.ucsc.edu> Message-ID: <4a74f5c6-5da3-d756-1bd2-5b7fa67ce11a@pawsey.org.au> We have been using Ubuntu VMs for the control plane until now, so it was a simple matter of inserting our logo-splash.svg and logo.svg into /var/lib/openstack-dashboard/static/dashboard/img/ and then restarting services. Now we're switching to Kolla, and the relevant path isn't mounted as is the case with the likes of /etc/kolla/horizon and /var/log/kolla. We don't (yet?) build our own container images, so I'm wondering what next. Did anyone get any further with this? On 21/8/18 9:28 pm, Nick Jones wrote: > Hi Erich. > > Yeah, I battled against this myself quite recently.? Here's what I did > to add a logo to the Horizon splash page and to the header of each page > itself. > > Create a file called _splash.html, containing: > >
> > </div>
> > And a file called _brand.html, containing: > > {% load branding %} > {% load themes %} > > > ? alt="{% site_branding %}"> > > > I then created a folder > called?/usr/share/openstack-dashboard/openstack_dashboard/themes/default/templates/auth/ > and copied _splash.html into there, copied _brand.html > into?/usr/share/openstack-dashboard/openstack_dashboard/templates/header/, > and finally my 'logo.png' was copied > into?/usr/lib/python2.7/site-packages/openstack_dashboard/static/dashboard/img/ > > Note that this approach might differ slightly from your setup, as in my > case it's a Kolla-based deployment so these changes are applied to the > image I'm using to deploy a Horizon container.? But it's the same > release (Queens) and a CentOS base image, so in principle the steps > should work for you. > > Hope that helps. > > -- > > -Nick > > On 20 August 2018 at 17:40, Erich Weiler > wrote: > > Hi Y'all, > > I've been banging my head against a wall for days on this item and > can't find anything via google on how to get around it - I am trying > to install a custom logo onto my Horizon Dashboard front page (the > splash page).? I have my logo ready to go, logo-splash.png.? I have > tried following the instructions here on how to install a custom logo: > > https://docs.openstack.org/horizon/queens/admin/customize-configure.html > > > But it simply doesn't work.? It seems this stanza... > > #splash .login { > background: #355796 url(../img/my_cloud_logo_medium.png) no-repeat > center 35px; > } > > ...doesn't actually replace the logo (which is logo-splash.svg), it > only seems to put my file, logo-splash.png as the *background* to > the .svg logo.? And since the option there is "no-repeat center", it > appears *behind* the svg logo and I can't see it.? I played around > with those options, removing "no-repeat" for example, and it > dutifully shows my logo repeating in the background.? But I need the > default logo-splash.svg file to actually be gone and my logo to > exist in it's place.? Maybe I'm missing something simple? > > I'm restarting apache and memchached after every change I make when > I was testing. > > And because the images directory is rebuilt every time I restart > apache, I can't even copy in a custom logo-splash.svg file.? ?Which > wouldn't help anyway, as I want my .png file in there instead.? I > don't have the means to create a .svg file at this time.? ;) > > Help! > > As a side note, I'm using the Queens distribution via RedHat. > > Many thanks in advance, > erich > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -- Gregory Orange Cloud System Administrator Scientific Platforms Team building representative Pawsey Supercomputing Centre, CSIRO From ykarel at redhat.com Thu Jan 13 06:35:19 2022 From: ykarel at redhat.com (Yatin Karel) Date: Thu, 13 Jan 2022 12:05:19 +0530 Subject: [Triple0] In-Reply-To: References: Message-ID: Hi Lokendra, Just to update this is a known issue https://bugs.launchpad.net/tripleo/+bug/1955414. And is being fixed with https://review.opendev.org/q/I4ac9210f6533e6826b2814f72b7271b43fdca267. 
This will ensure deployment to fail early if there is an issue with NTP rather than failing randomly later. Thanks and Regards Yatin Karel On Thu, Jan 13, 2022 at 9:17 AM Lokendra Rathour wrote: > Hey Laurent, > Thanks for the update, But I think the issue was because of NTP Not in > sync. > Somehow my Chrony sync stopped and it cause the Setup completion failure. > > Solution: Chrony needs to be in sync all time. > > > -Lokendra > > > On Thu, Jan 13, 2022 at 8:20 AM Laurent Dumont > wrote: > >> Which user is the UUID 765d7c43b3d54b289c9fa9e1dae15112 referring to? >> Can you reset it's password? I do believe that Keystone will enforce a >> password reset until you change it. >> >> On Mon, Jan 10, 2022 at 4:25 AM Lokendra Rathour < >> lokendrarathour at gmail.com> wrote: >> >>> Hi Team, >>> I was trying to deploy Triple0 Train using containers. >>> After around 30+ min of overcloud running it gett below error: >>> >>> The error was: "keystoneauth1.exceptions.http.Unauthorized: The password >>> is expired and needs to be changed for user: >>> 765d7c43b3d54b289c9fa9e1dae15112. (HTTP 401) (Request-ID: >>> req-f1344091-4f81-44c2-8ec6-e2daffbf998c) >>> 2022-01-09 23:10:39.354147 | 525400bb-6dc5-c157-232d-000000003de1 | >>> FATAL | As" >>> >>> I tried checking the Horizon, where is see that it is asking me to >>> change the password immediately with the same message and a similar >>> observation is seen with overcloud file. >>> >>> Please advice why this deployment is getting failed >>> >>> -- >>> ~ Lokendra >>> >>> >>> > > -- > ~ Lokendra > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Thu Jan 13 08:13:12 2022 From: balazs.gibizer at est.tech (Balazs Gibizer) Date: Thu, 13 Jan 2022 09:13:12 +0100 Subject: [nova] [placement] Proposing Sean Mooney as nova-core In-Reply-To: References: Message-ID: <0U2N5R.OJTYZNEZTVI13@est.tech> On Wed, Jan 12 2022 at 05:19:04 PM +0100, Sylvain Bauza wrote: > Hi all, > I would like to propose Sean as an addition to the nova-core team > (which includes placement merge rights as nova-core is implicitly a > subgroup). > > As we know, he's around for a long time, is already a nova-specs-core > and has proven solid experience in reviews. > > Cores, please vote (-1, 0, +1) before next Wednesday Jan 19th 1600UTC. +1, Sean is one of the key contributor of the nova team. Cheers, gibi > > Cheers, > -Sylvain > > From skaplons at redhat.com Thu Jan 13 08:18:53 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 13 Jan 2022 09:18:53 +0100 Subject: [neutron] Bug Deputy Report January 03 - 10 In-Reply-To: <02b79e74-7d84-a6ba-537f-7db1a7ee6532@inovex.de> References: <02b79e74-7d84-a6ba-537f-7db1a7ee6532@inovex.de> Message-ID: <10569539.nUPlyArG6x@p1> Hi, On ?roda, 12 stycznia 2022 16:03:47 CET Christian Rohmann wrote: > Hello Lucas, > > I hope you don't mind my rather blunt question, but how do tickets end > up on this list? > I read > https://docs.openstack.org/neutron/latest/contributor/policies/ bugs.html#neut > ron-bug-deputy but am just wondering if there is anything else I should have > done when reporting the issue? Bug deputy role in Neutron is rotated weekly. Usually person who is doing bug deputy is trying to do initial triage of the bug and then sends summary of such bugs to the ML. We also discuss some of the bugs during our weekly meeting [1]. There is nothing more on Your side which has to be done here. 
> > I don't just want to shout louder and certainly everybody wants their > issue to be looked at and fixed first. > But I just noticed that my bug report received no reply or confirmation > in the last three months (apart from the tag "vpnaas" being added), > while other issues were triaged quickly and were consequently added to > this deputy report ... The problem with neutron-vpnaas is that there is almost no one who is maintining that project currently and probably because of that nobody works on the bug which You reported. If You are using that project and are interested in maintaining it, patches are always welcome :) > > On 10/01/2022 22:17, Lucas Alvares Gomes wrote: > > [...] > > > > * https://bugs.launchpad.net/neutron/+bug/1956846 - "ha router > > duplicated routes" > > > > - Unassigned > > > > [...] > > Similar to those duplicate routes, I reported an issue about duplicated > IPtables causing VPNaaS not not work in > https://bugs.launchpad.net/neutron/+bug/1943449. > > > > With kind regards, > > Christian [1] https://meetings.opendev.org/#Neutron_Team_Meeting -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From katonalala at gmail.com Thu Jan 13 08:34:46 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 13 Jan 2022 09:34:46 +0100 Subject: [neutron] Bug Deputy Report January 03 - 10 In-Reply-To: <02b79e74-7d84-a6ba-537f-7db1a7ee6532@inovex.de> References: <02b79e74-7d84-a6ba-537f-7db1a7ee6532@inovex.de> Message-ID: Hi Christian, Thanks for highlighting your issue with Linuxbridge driver and VPNaaS. I would like to ask you to consider the following, which will not give you an answer or solution to your problem but perhaps helps to understand and accept if any specific issue has not received the needed attention from the community. Basically it is the core team (and some enthusiasts around it) who do bug triaging in Neutron, they are experienced and willing to provide answers, solutions. The possible combinations of drivers, extensions, backends, deployment tools is huge, and there are combinations which are not well covered in the current community, like linuxbridge is not that used (OVS and perhaps OVN are the top tested and deployed, at least from latest user surveys). So it can happen that in the current core team nobody has the special knowledge and tools (test environment, hardware...) to debug a specific issue. Neutron maintains a list of "lieutenants", to make easier to contact the right person: https://docs.openstack.org/neutron/latest/contributor/policies/neutron-teams.html#neutron-lieutenants For VPNaaS the people are not active anymore in the community. In such difficult to debug situation it is really helpful for the community if you can test the issue on current master code (it is possible that the issue happens only on older branches) and with tools that are available for most of us, like simple devstack. Regards Lajos Katona (lajoskatona) Christian Rohmann ezt ?rta (id?pont: 2022. jan. 12., Sze, 16:10): > Hello Lucas, > > I hope you don't mind my rather blunt question, but how do tickets end > up on this list? > I read > > https://docs.openstack.org/neutron/latest/contributor/policies/bugs.html#neutron-bug-deputy > but am just wondering if there is anything else I should have done when > reporting the issue? 
> > I don't just want to shout louder and certainly everybody wants their > issue to be looked at and fixed first. > But I just noticed that my bug report received no reply or confirmation > in the last three months (apart from the tag "vpnaas" being added), > while other issues were triaged quickly and were consequently added to > this deputy report ... > > > On 10/01/2022 22:17, Lucas Alvares Gomes wrote: > > [...] > > > > * https://bugs.launchpad.net/neutron/+bug/1956846 - "ha router > > duplicated routes" > > - Unassigned > > > > [...] > > Similar to those duplicate routes, I reported an issue about duplicated > IPtables causing VPNaaS not not work in > https://bugs.launchpad.net/neutron/+bug/1943449. > > > > With kind regards, > > Christian > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tonykarera at gmail.com Thu Jan 13 08:55:45 2022 From: tonykarera at gmail.com (Karera Tony) Date: Thu, 13 Jan 2022 10:55:45 +0200 Subject: Monitoring in Openstack Dashboard Message-ID: Dear Team, I hope all is well. I have deployed Openstack with kolla-ansible and enabled Monasca among the projects. The deployment was successful but unfortunately I can't see Monitoring in the Dashboard. Kindly advise If I could be missing something. Regards Regards Tony Karera -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Thu Jan 13 09:06:51 2022 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 13 Jan 2022 09:06:51 +0000 Subject: [kolla] [horizon] Custom logos (WAS: [Openstack-operators] Horizon Custom Logos (Queens, 13.0.1)) In-Reply-To: <4a74f5c6-5da3-d756-1bd2-5b7fa67ce11a@pawsey.org.au> References: <5B7AEF19.5010502@soe.ucsc.edu> <4a74f5c6-5da3-d756-1bd2-5b7fa67ce11a@pawsey.org.au> Message-ID: On Thu, 13 Jan 2022 at 05:19, Gregory Orange wrote: > > We have been using Ubuntu VMs for the control plane until now, so it was > a simple matter of inserting our logo-splash.svg and logo.svg into > /var/lib/openstack-dashboard/static/dashboard/img/ and then restarting > services. > > Now we're switching to Kolla, and the relevant path isn't mounted as is > the case with the likes of /etc/kolla/horizon and /var/log/kolla. We > don't (yet?) build our own container images, so I'm wondering what next. > > Did anyone get any further with this? Hi Greg, Typically what we do is create a theme repository, e.g. https://github.com/stackhpc/horizon-theme. This is then built into the image in /etc/openstack-dashboard/themes/. There is another approach proposed which does not involve rebuilding the image, but it is still WIP: https://review.opendev.org/c/openstack/kolla-ansible/+/761364 Mark > > > On 21/8/18 9:28 pm, Nick Jones wrote: > > Hi Erich. > > > > Yeah, I battled against this myself quite recently. Here's what I did > > to add a logo to the Horizon splash page and to the header of each page > > itself. > > > > Create a file called _splash.html, containing: > > > >
> > > >
> > > > And a file called _brand.html, containing: > > > > {% load branding %} > > {% load themes %} > > > > > > > alt="{% site_branding %}"> > > > > > > I then created a folder > > called /usr/share/openstack-dashboard/openstack_dashboard/themes/default/templates/auth/ > > and copied _splash.html into there, copied _brand.html > > into /usr/share/openstack-dashboard/openstack_dashboard/templates/header/, > > and finally my 'logo.png' was copied > > into /usr/lib/python2.7/site-packages/openstack_dashboard/static/dashboard/img/ > > > > Note that this approach might differ slightly from your setup, as in my > > case it's a Kolla-based deployment so these changes are applied to the > > image I'm using to deploy a Horizon container. But it's the same > > release (Queens) and a CentOS base image, so in principle the steps > > should work for you. > > > > Hope that helps. > > > > -- > > > > -Nick > > > > On 20 August 2018 at 17:40, Erich Weiler > > wrote: > > > > Hi Y'all, > > > > I've been banging my head against a wall for days on this item and > > can't find anything via google on how to get around it - I am trying > > to install a custom logo onto my Horizon Dashboard front page (the > > splash page). I have my logo ready to go, logo-splash.png. I have > > tried following the instructions here on how to install a custom logo: > > > > https://docs.openstack.org/horizon/queens/admin/customize-configure.html > > > > > > But it simply doesn't work. It seems this stanza... > > > > #splash .login { > > background: #355796 url(../img/my_cloud_logo_medium.png) no-repeat > > center 35px; > > } > > > > ...doesn't actually replace the logo (which is logo-splash.svg), it > > only seems to put my file, logo-splash.png as the *background* to > > the .svg logo. And since the option there is "no-repeat center", it > > appears *behind* the svg logo and I can't see it. I played around > > with those options, removing "no-repeat" for example, and it > > dutifully shows my logo repeating in the background. But I need the > > default logo-splash.svg file to actually be gone and my logo to > > exist in it's place. Maybe I'm missing something simple? > > > > I'm restarting apache and memchached after every change I make when > > I was testing. > > > > And because the images directory is rebuilt every time I restart > > apache, I can't even copy in a custom logo-splash.svg file. Which > > wouldn't help anyway, as I want my .png file in there instead. I > > don't have the means to create a .svg file at this time. ;) > > > > Help! > > > > As a side note, I'm using the Queens distribution via RedHat. 
> > > > Many thanks in advance, > > erich > > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > > > > > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > -- > Gregory Orange > > Cloud System Administrator > Scientific Platforms Team building representative > Pawsey Supercomputing Centre, CSIRO > From damian.pietras at hardit.pl Thu Jan 13 09:19:39 2022 From: damian.pietras at hardit.pl (Damian Pietras) Date: Thu, 13 Jan 2022 10:19:39 +0100 Subject: [nova] iothread support with Libvirt In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04AFB687@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04AFB64E@gmsxchsvr01.thecreation.com> <18443a2481900756f1a4e76446cf41ef19601212.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA04AFB657@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65A@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65B@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB65E@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB685@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04AFB687@gmsxchsvr01.thecreation.com> Message-ID: <0d266251-d7a7-7954-7899-0db98d800590@hardit.pl> On 12.01.2022 22:10, Eric K. Miller wrote: > It appears that "deadline" is the only scheduler available in later kernel versions (at least in Ubuntu). > > Damian - do you recall what the scheduler was set to prior to changing it to the noop scheduler? The default in my case (Debian 9) is: root at diskbench:~# cat /sys/block/sda/queue/scheduler noop deadline [cfq] And the disk is detected by the kernel as "rotational": root at diskbench:~# cat /sys/block/sda/queue/rotational 1 -- Damian Pietras HardIT From ignaziocassano at gmail.com Thu Jan 13 09:21:12 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 13 Jan 2022 10:21:12 +0100 Subject: [openstack][stein][cinder] capacity filter is not working Message-ID: Hello, I am using openstack stein on centos 7 with netapp ontap driver. Seems capacity filter is not working and volumes are always creed on the first share where less space is available. 
My configuration is posted here: enabled_backends = nfsgold1, nfsgold2 [nfsgold1] nas_secure_file_operations = false nas_secure_file_permissions = false volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver netapp_storage_family = ontap_cluster netapp_storage_protocol = nfs netapp_vserver = svm-tstcinder2-cl1 netapp_server_hostname = faspod2.csi.it netapp_server_port = 80 netapp_login = apimanager netapp_password = password nfs_shares_config = /etc/cinder/nfsgold1_shares volume_backend_name = nfsgold #nfs_mount_options = lookupcache=pos nfs_mount_options = lookupcache=pos [nfsgold2] nas_secure_file_operations = false nas_secure_file_permissions = false volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver netapp_storage_family = ontap_cluster netapp_storage_protocol = nfs netapp_vserver = svm-tstcinder2-cl2 netapp_server_hostname = faspod2.csi.it netapp_server_port = 80 netapp_login = apimanager netapp_password = password nfs_shares_config = /etc/cinder/nfsgold2_shares volume_backend_name = nfsgold #nfs_mount_options = lookupcache=pos nfs_mount_options = lookupcache=pos Volumes are created always on nfsgold1 also if has less space available of nfsgold2 share Thanks Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Thu Jan 13 11:03:37 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 13 Jan 2022 12:03:37 +0100 Subject: [openstack][stein][cinder] capacity filter is not working In-Reply-To: References: Message-ID: <20220113110337.f3rvbgezbtvpolzw@localhost> On 13/01, Ignazio Cassano wrote: > Hello, > I am using openstack stein on centos 7 with netapp ontap driver. > Seems capacity filter is not working and volumes are always creed on the > first share where less space is available. > My configuration is posted here: > enabled_backends = nfsgold1, nfsgold2 > > [nfsgold1] > nas_secure_file_operations = false > nas_secure_file_permissions = false > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > netapp_storage_family = ontap_cluster > netapp_storage_protocol = nfs > netapp_vserver = svm-tstcinder2-cl1 > netapp_server_hostname = faspod2.csi.it > netapp_server_port = 80 > netapp_login = apimanager > netapp_password = password > nfs_shares_config = /etc/cinder/nfsgold1_shares > volume_backend_name = nfsgold > #nfs_mount_options = lookupcache=pos > nfs_mount_options = lookupcache=pos > > > [nfsgold2] > nas_secure_file_operations = false > nas_secure_file_permissions = false > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > netapp_storage_family = ontap_cluster > netapp_storage_protocol = nfs > netapp_vserver = svm-tstcinder2-cl2 > netapp_server_hostname = faspod2.csi.it > netapp_server_port = 80 > netapp_login = apimanager > netapp_password = password > nfs_shares_config = /etc/cinder/nfsgold2_shares > volume_backend_name = nfsgold > #nfs_mount_options = lookupcache=pos > nfs_mount_options = lookupcache=pos > > > > Volumes are created always on nfsgold1 also if has less space available of > nfsgold2 share > Thanks > Ignazio Hi, What volume type are you using to create the volumes? If you don't define it it would use the default from the cinder.conf file. What are the extra specs of the volume type? What pool info are the NetApp backends reporting? It's usually a good idea to enabled debugging on the schedulers and look at the details of how they are making the filtering and weighting decisions. Cheers, Gorka. 
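For anyone following this thread, a minimal sketch of the checks being suggested here. The first command is plain cinderclient; the log-level commands assume API microversion 3.32 (available on Stein) and a recent enough python-cinderclient -- check `cinder help service-set-log` for the exact syntax on your client version:

  # Show the capacity details the scheduler sees for each pool
  cinder get-pools --detail

  # Temporarily raise the scheduler log level to DEBUG without a restart,
  # reproduce a volume create, then set it back to INFO
  cinder --os-volume-api-version 3.32 service-set-log DEBUG --binary cinder-scheduler
  cinder --os-volume-api-version 3.32 service-set-log INFO --binary cinder-scheduler

With debug enabled, the scheduler log should show whether both nfsgold1 and nfsgold2 pass the CapacityFilter and what weight each pool receives from the CapacityWeigher, which is where the choice between the two pools is actually made.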
From radoslaw.piliszek at gmail.com Thu Jan 13 11:25:03 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 13 Jan 2022 12:25:03 +0100 Subject: [masakari] Transferring PTL role to suzhengwei Message-ID: Dears, Due to a shift in my priorities and being unable to dedicate enough time, I am transferring the Masakari PTL role to suzhengwei. The governance patch is already proposed. [1] suzhengwei has been lately the main contributor to the Masakari project, working on the major features that were introduced in the last few cycles. I have already obtained his approval by mail. Do note his contributions so far were using a different mail address (sugar-2008 at 163.com) but I have been asked to use the Inspur one (CC'ed) as it is currently preferred. [1] https://review.opendev.org/c/openstack/governance/+/824509 -yoctozepto From ignaziocassano at gmail.com Thu Jan 13 11:32:02 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 13 Jan 2022 12:32:02 +0100 Subject: [openstack][stein][cinder] capacity filter is not working In-Reply-To: <20220113110337.f3rvbgezbtvpolzw@localhost> References: <20220113110337.f3rvbgezbtvpolzw@localhost> Message-ID: Hello, I am using nfsgold volume type. [root at tst-controller-01 ansible]# cinder type-show nfsgold +---------------------------------+--------------------------------------+ | Property | Value | +---------------------------------+--------------------------------------+ | description | None | | extra_specs | volume_backend_name : nfsgold | | id | fd8b1cc8-4c3a-490d-bc95-29e491f850cc | | is_public | True | | name | nfsgold | | os-volume-type-access:is_public | True | | qos_specs_id | None | +---------------------------------+--------------------------------------+ cinder get-pools +----------+--------------------------------------------------------------------+ | Property | Value | +----------+--------------------------------------------------------------------+ | name | cinder-cluster-1 at nfsgold2#10.102.189.156:/svm_tstcinder_cl2_volssd | +----------+--------------------------------------------------------------------+ +----------+--------------------------------------------------------------------+ | Property | Value | +----------+--------------------------------------------------------------------+ | name | cinder-cluster-1 at nfsgold1#10.102.189.155:/svm_tstcinder_cl1_volssd | +----------+--------------------------------------------------------------------+ I noted that nfsgold2 is used also when nfsgold1 is almost full. I expected the volume was created on share with more space availability. Ignazio Il giorno gio 13 gen 2022 alle ore 12:03 Gorka Eguileor ha scritto: > On 13/01, Ignazio Cassano wrote: > > Hello, > > I am using openstack stein on centos 7 with netapp ontap driver. > > Seems capacity filter is not working and volumes are always creed on the > > first share where less space is available. 
> > My configuration is posted here: > > enabled_backends = nfsgold1, nfsgold2 > > > > [nfsgold1] > > nas_secure_file_operations = false > > nas_secure_file_permissions = false > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > netapp_storage_family = ontap_cluster > > netapp_storage_protocol = nfs > > netapp_vserver = svm-tstcinder2-cl1 > > netapp_server_hostname = faspod2.csi.it > > netapp_server_port = 80 > > netapp_login = apimanager > > netapp_password = password > > nfs_shares_config = /etc/cinder/nfsgold1_shares > > volume_backend_name = nfsgold > > #nfs_mount_options = lookupcache=pos > > nfs_mount_options = lookupcache=pos > > > > > > [nfsgold2] > > nas_secure_file_operations = false > > nas_secure_file_permissions = false > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > netapp_storage_family = ontap_cluster > > netapp_storage_protocol = nfs > > netapp_vserver = svm-tstcinder2-cl2 > > netapp_server_hostname = faspod2.csi.it > > netapp_server_port = 80 > > netapp_login = apimanager > > netapp_password = password > > nfs_shares_config = /etc/cinder/nfsgold2_shares > > volume_backend_name = nfsgold > > #nfs_mount_options = lookupcache=pos > > nfs_mount_options = lookupcache=pos > > > > > > > > Volumes are created always on nfsgold1 also if has less space available > of > > nfsgold2 share > > Thanks > > Ignazio > > Hi, > > What volume type are you using to create the volumes? If you don't > define it it would use the default from the cinder.conf file. > > What are the extra specs of the volume type? > > What pool info are the NetApp backends reporting? > > It's usually a good idea to enabled debugging on the schedulers and look > at the details of how they are making the filtering and weighting > decisions. > > Cheers, > Gorka. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Thu Jan 13 12:08:46 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 13 Jan 2022 13:08:46 +0100 Subject: [openstack][stein][cinder] capacity filter is not working In-Reply-To: References: <20220113110337.f3rvbgezbtvpolzw@localhost> Message-ID: <20220113120846.jntkdbl3ex34q43t@localhost> On 13/01, Ignazio Cassano wrote: > Hello, I am using nfsgold volume type. 
> [root at tst-controller-01 ansible]# cinder type-show nfsgold > +---------------------------------+--------------------------------------+ > | Property | Value | > +---------------------------------+--------------------------------------+ > | description | None | > | extra_specs | volume_backend_name : nfsgold | > | id | fd8b1cc8-4c3a-490d-bc95-29e491f850cc | > | is_public | True | > | name | nfsgold | > | os-volume-type-access:is_public | True | > | qos_specs_id | None | > +---------------------------------+--------------------------------------+ > > cinder get-pools > +----------+--------------------------------------------------------------------+ > | Property | Value > | > +----------+--------------------------------------------------------------------+ > | name | cinder-cluster-1 at nfsgold2#10.102.189.156:/svm_tstcinder_cl2_volssd > | > +----------+--------------------------------------------------------------------+ > +----------+--------------------------------------------------------------------+ > | Property | Value > | > +----------+--------------------------------------------------------------------+ > | name | cinder-cluster-1 at nfsgold1#10.102.189.155:/svm_tstcinder_cl1_volssd > | > +----------+--------------------------------------------------------------------+ > Hi, We would need to see the details of the pools to see additional information: $ cinder get-pools --detail > I noted that nfsgold2 is used also when nfsgold1 is almost full. > I expected the volume was created on share with more space availability. > Ignazio > Then the capacity filtering seems to be working as expected (we can confirm looking at the debug logs and seeing if both backends pass the filtering). You could see in the logs that both of them are passing the filtering and are valid to create volumes. The thing we'd have to look into is the weighing phase, where the scheduler is selecting nfsgold1 as the best option. I assume you haven't changed the defaults in the configuration options "scheduler_default_weighers" or in "scheduler_weight_handler". So it must be using the "CapacityWeigher". Are you using default values for "capacity_weigher_multiplier" and "allocated_capacity_weight_multiplier" config options? When using defaults the capacity weigher should be spread volumes instead of stacking them. I still think that the best way to debug this is to view the debug logs. In Stein you should be able to dynamically change the logging level of the scheduler services to debug without restarting the services, and then changing it back to info. Cheers, Gorka. > > Il giorno gio 13 gen 2022 alle ore 12:03 Gorka Eguileor > ha scritto: > > > On 13/01, Ignazio Cassano wrote: > > > Hello, > > > I am using openstack stein on centos 7 with netapp ontap driver. > > > Seems capacity filter is not working and volumes are always creed on the > > > first share where less space is available. 
> > > My configuration is posted here: > > > enabled_backends = nfsgold1, nfsgold2 > > > > > > [nfsgold1] > > > nas_secure_file_operations = false > > > nas_secure_file_permissions = false > > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > > netapp_storage_family = ontap_cluster > > > netapp_storage_protocol = nfs > > > netapp_vserver = svm-tstcinder2-cl1 > > > netapp_server_hostname = faspod2.csi.it > > > netapp_server_port = 80 > > > netapp_login = apimanager > > > netapp_password = password > > > nfs_shares_config = /etc/cinder/nfsgold1_shares > > > volume_backend_name = nfsgold > > > #nfs_mount_options = lookupcache=pos > > > nfs_mount_options = lookupcache=pos > > > > > > > > > [nfsgold2] > > > nas_secure_file_operations = false > > > nas_secure_file_permissions = false > > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > > netapp_storage_family = ontap_cluster > > > netapp_storage_protocol = nfs > > > netapp_vserver = svm-tstcinder2-cl2 > > > netapp_server_hostname = faspod2.csi.it > > > netapp_server_port = 80 > > > netapp_login = apimanager > > > netapp_password = password > > > nfs_shares_config = /etc/cinder/nfsgold2_shares > > > volume_backend_name = nfsgold > > > #nfs_mount_options = lookupcache=pos > > > nfs_mount_options = lookupcache=pos > > > > > > > > > > > > Volumes are created always on nfsgold1 also if has less space available > > of > > > nfsgold2 share > > > Thanks > > > Ignazio > > > > Hi, > > > > What volume type are you using to create the volumes? If you don't > > define it it would use the default from the cinder.conf file. > > > > What are the extra specs of the volume type? > > > > What pool info are the NetApp backends reporting? > > > > It's usually a good idea to enabled debugging on the schedulers and look > > at the details of how they are making the filtering and weighting > > decisions. > > > > Cheers, > > Gorka. > > > > From ignaziocassano at gmail.com Thu Jan 13 12:39:13 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 13 Jan 2022 13:39:13 +0100 Subject: [openstack][stein][cinder] capacity filter is not working In-Reply-To: <20220113120846.jntkdbl3ex34q43t@localhost> References: <20220113110337.f3rvbgezbtvpolzw@localhost> <20220113120846.jntkdbl3ex34q43t@localhost> Message-ID: Hello Gorka, I have not changed default values for filters. When I come backup to office I will senza you details you requested. Ignazio Il Gio 13 Gen 2022, 13:08 Gorka Eguileor ha scritto: > On 13/01, Ignazio Cassano wrote: > > Hello, I am using nfsgold volume type. 
> > [root at tst-controller-01 ansible]# cinder type-show nfsgold > > > +---------------------------------+--------------------------------------+ > > | Property | Value > | > > > +---------------------------------+--------------------------------------+ > > | description | None > | > > | extra_specs | volume_backend_name : nfsgold > | > > | id | fd8b1cc8-4c3a-490d-bc95-29e491f850cc > | > > | is_public | True > | > > | name | nfsgold > | > > | os-volume-type-access:is_public | True > | > > | qos_specs_id | None > | > > > +---------------------------------+--------------------------------------+ > > > > cinder get-pools > > > +----------+--------------------------------------------------------------------+ > > | Property | Value > > | > > > +----------+--------------------------------------------------------------------+ > > | name | cinder-cluster-1 at nfsgold2#10.102.189.156: > /svm_tstcinder_cl2_volssd > > | > > > +----------+--------------------------------------------------------------------+ > > > +----------+--------------------------------------------------------------------+ > > | Property | Value > > | > > > +----------+--------------------------------------------------------------------+ > > | name | cinder-cluster-1 at nfsgold1#10.102.189.155: > /svm_tstcinder_cl1_volssd > > | > > > +----------+--------------------------------------------------------------------+ > > > > Hi, > > We would need to see the details of the pools to see additional > information: > > $ cinder get-pools --detail > > > I noted that nfsgold2 is used also when nfsgold1 is almost full. > > I expected the volume was created on share with more space availability. > > Ignazio > > > > Then the capacity filtering seems to be working as expected (we can > confirm looking at the debug logs and seeing if both backends pass the > filtering). You could see in the logs that both of them are passing the > filtering and are valid to create volumes. > > The thing we'd have to look into is the weighing phase, where the > scheduler is selecting nfsgold1 as the best option. > > I assume you haven't changed the defaults in the configuration options > "scheduler_default_weighers" or in "scheduler_weight_handler". > > So it must be using the "CapacityWeigher". Are you using default values > for "capacity_weigher_multiplier" and > "allocated_capacity_weight_multiplier" config options? > > When using defaults the capacity weigher should be spread volumes > instead of stacking them. > > I still think that the best way to debug this is to view the debug logs. > In Stein you should be able to dynamically change the logging level of > the scheduler services to debug without restarting the services, and > then changing it back to info. > > Cheers, > Gorka. > > > > > > Il giorno gio 13 gen 2022 alle ore 12:03 Gorka Eguileor < > geguileo at redhat.com> > > ha scritto: > > > > > On 13/01, Ignazio Cassano wrote: > > > > Hello, > > > > I am using openstack stein on centos 7 with netapp ontap driver. > > > > Seems capacity filter is not working and volumes are always creed on > the > > > > first share where less space is available. 
> > > > My configuration is posted here: > > > > enabled_backends = nfsgold1, nfsgold2 > > > > > > > > [nfsgold1] > > > > nas_secure_file_operations = false > > > > nas_secure_file_permissions = false > > > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > > > netapp_storage_family = ontap_cluster > > > > netapp_storage_protocol = nfs > > > > netapp_vserver = svm-tstcinder2-cl1 > > > > netapp_server_hostname = faspod2.csi.it > > > > netapp_server_port = 80 > > > > netapp_login = apimanager > > > > netapp_password = password > > > > nfs_shares_config = /etc/cinder/nfsgold1_shares > > > > volume_backend_name = nfsgold > > > > #nfs_mount_options = lookupcache=pos > > > > nfs_mount_options = lookupcache=pos > > > > > > > > > > > > [nfsgold2] > > > > nas_secure_file_operations = false > > > > nas_secure_file_permissions = false > > > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > > > netapp_storage_family = ontap_cluster > > > > netapp_storage_protocol = nfs > > > > netapp_vserver = svm-tstcinder2-cl2 > > > > netapp_server_hostname = faspod2.csi.it > > > > netapp_server_port = 80 > > > > netapp_login = apimanager > > > > netapp_password = password > > > > nfs_shares_config = /etc/cinder/nfsgold2_shares > > > > volume_backend_name = nfsgold > > > > #nfs_mount_options = lookupcache=pos > > > > nfs_mount_options = lookupcache=pos > > > > > > > > > > > > > > > > Volumes are created always on nfsgold1 also if has less space > available > > > of > > > > nfsgold2 share > > > > Thanks > > > > Ignazio > > > > > > Hi, > > > > > > What volume type are you using to create the volumes? If you don't > > > define it it would use the default from the cinder.conf file. > > > > > > What are the extra specs of the volume type? > > > > > > What pool info are the NetApp backends reporting? > > > > > > It's usually a good idea to enabled debugging on the schedulers and > look > > > at the details of how they are making the filtering and weighting > > > decisions. > > > > > > Cheers, > > > Gorka. > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Thu Jan 13 13:47:54 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 13 Jan 2022 14:47:54 +0100 Subject: [openstack][stein][cinder] capacity filter is not working In-Reply-To: <20220113120846.jntkdbl3ex34q43t@localhost> References: <20220113110337.f3rvbgezbtvpolzw@localhost> <20220113120846.jntkdbl3ex34q43t@localhost> Message-ID: Hellp Gorka, here you can find more details: https://paste.openstack.org/show/812091/ Many thanks Ignazio Il giorno gio 13 gen 2022 alle ore 13:08 Gorka Eguileor ha scritto: > On 13/01, Ignazio Cassano wrote: > > Hello, I am using nfsgold volume type. 
> > [root at tst-controller-01 ansible]# cinder type-show nfsgold > > > +---------------------------------+--------------------------------------+ > > | Property | Value > | > > > +---------------------------------+--------------------------------------+ > > | description | None > | > > | extra_specs | volume_backend_name : nfsgold > | > > | id | fd8b1cc8-4c3a-490d-bc95-29e491f850cc > | > > | is_public | True > | > > | name | nfsgold > | > > | os-volume-type-access:is_public | True > | > > | qos_specs_id | None > | > > > +---------------------------------+--------------------------------------+ > > > > cinder get-pools > > > +----------+--------------------------------------------------------------------+ > > | Property | Value > > | > > > +----------+--------------------------------------------------------------------+ > > | name | cinder-cluster-1 at nfsgold2#10.102.189.156: > /svm_tstcinder_cl2_volssd > > | > > > +----------+--------------------------------------------------------------------+ > > > +----------+--------------------------------------------------------------------+ > > | Property | Value > > | > > > +----------+--------------------------------------------------------------------+ > > | name | cinder-cluster-1 at nfsgold1#10.102.189.155: > /svm_tstcinder_cl1_volssd > > | > > > +----------+--------------------------------------------------------------------+ > > > > Hi, > > We would need to see the details of the pools to see additional > information: > > $ cinder get-pools --detail > > > I noted that nfsgold2 is used also when nfsgold1 is almost full. > > I expected the volume was created on share with more space availability. > > Ignazio > > > > Then the capacity filtering seems to be working as expected (we can > confirm looking at the debug logs and seeing if both backends pass the > filtering). You could see in the logs that both of them are passing the > filtering and are valid to create volumes. > > The thing we'd have to look into is the weighing phase, where the > scheduler is selecting nfsgold1 as the best option. > > I assume you haven't changed the defaults in the configuration options > "scheduler_default_weighers" or in "scheduler_weight_handler". > > So it must be using the "CapacityWeigher". Are you using default values > for "capacity_weigher_multiplier" and > "allocated_capacity_weight_multiplier" config options? > > When using defaults the capacity weigher should be spread volumes > instead of stacking them. > > I still think that the best way to debug this is to view the debug logs. > In Stein you should be able to dynamically change the logging level of > the scheduler services to debug without restarting the services, and > then changing it back to info. > > Cheers, > Gorka. > > > > > > Il giorno gio 13 gen 2022 alle ore 12:03 Gorka Eguileor < > geguileo at redhat.com> > > ha scritto: > > > > > On 13/01, Ignazio Cassano wrote: > > > > Hello, > > > > I am using openstack stein on centos 7 with netapp ontap driver. > > > > Seems capacity filter is not working and volumes are always creed on > the > > > > first share where less space is available. 
> > > > My configuration is posted here: > > > > enabled_backends = nfsgold1, nfsgold2 > > > > > > > > [nfsgold1] > > > > nas_secure_file_operations = false > > > > nas_secure_file_permissions = false > > > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > > > netapp_storage_family = ontap_cluster > > > > netapp_storage_protocol = nfs > > > > netapp_vserver = svm-tstcinder2-cl1 > > > > netapp_server_hostname = faspod2.csi.it > > > > netapp_server_port = 80 > > > > netapp_login = apimanager > > > > netapp_password = password > > > > nfs_shares_config = /etc/cinder/nfsgold1_shares > > > > volume_backend_name = nfsgold > > > > #nfs_mount_options = lookupcache=pos > > > > nfs_mount_options = lookupcache=pos > > > > > > > > > > > > [nfsgold2] > > > > nas_secure_file_operations = false > > > > nas_secure_file_permissions = false > > > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > > > netapp_storage_family = ontap_cluster > > > > netapp_storage_protocol = nfs > > > > netapp_vserver = svm-tstcinder2-cl2 > > > > netapp_server_hostname = faspod2.csi.it > > > > netapp_server_port = 80 > > > > netapp_login = apimanager > > > > netapp_password = password > > > > nfs_shares_config = /etc/cinder/nfsgold2_shares > > > > volume_backend_name = nfsgold > > > > #nfs_mount_options = lookupcache=pos > > > > nfs_mount_options = lookupcache=pos > > > > > > > > > > > > > > > > Volumes are created always on nfsgold1 also if has less space > available > > > of > > > > nfsgold2 share > > > > Thanks > > > > Ignazio > > > > > > Hi, > > > > > > What volume type are you using to create the volumes? If you don't > > > define it it would use the default from the cinder.conf file. > > > > > > What are the extra specs of the volume type? > > > > > > What pool info are the NetApp backends reporting? > > > > > > It's usually a good idea to enabled debugging on the schedulers and > look > > > at the details of how they are making the filtering and weighting > > > decisions. > > > > > > Cheers, > > > Gorka. > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lokendrarathour at gmail.com Thu Jan 13 14:17:05 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Thu, 13 Jan 2022 19:47:05 +0530 Subject: [Triple0] In-Reply-To: References: Message-ID: Ok, yes. We did fix this by ensuring the NTP Sync beforehand. Thanks Once again. On Thu, Jan 13, 2022 at 12:06 PM Yatin Karel wrote: > Hi Lokendra, > > Just to update this is a known issue > https://bugs.launchpad.net/tripleo/+bug/1955414. And is being fixed with > https://review.opendev.org/q/I4ac9210f6533e6826b2814f72b7271b43fdca267. > This will ensure deployment to fail early if there is an issue with NTP > rather than failing randomly later. > > > Thanks and Regards > Yatin Karel > > On Thu, Jan 13, 2022 at 9:17 AM Lokendra Rathour < > lokendrarathour at gmail.com> wrote: > >> Hey Laurent, >> Thanks for the update, But I think the issue was because of NTP Not in >> sync. >> Somehow my Chrony sync stopped and it cause the Setup completion failure. >> >> Solution: Chrony needs to be in sync all time. >> >> >> -Lokendra >> >> >> On Thu, Jan 13, 2022 at 8:20 AM Laurent Dumont >> wrote: >> >>> Which user is the UUID 765d7c43b3d54b289c9fa9e1dae15112 referring to? >>> Can you reset it's password? I do believe that Keystone will enforce a >>> password reset until you change it. 
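For anyone who runs into the same expired-password symptom, a quick time-sync
sanity check on the undercloud before (re)deploying might look like the
following; this is only a sketch, and the exact chrony sources will differ per
site:

    $ timedatectl | grep -i synchronized
    $ chronyc tracking
    $ chronyc sources -v

If the system clock is skewed, Keystone can treat credentials or tokens as
expired (or not yet valid), which would be consistent with the HTTP 401 and
"password is expired" error seen in this deployment.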
>>> >>> On Mon, Jan 10, 2022 at 4:25 AM Lokendra Rathour < >>> lokendrarathour at gmail.com> wrote: >>> >>>> Hi Team, >>>> I was trying to deploy Triple0 Train using containers. >>>> After around 30+ min of overcloud running it gett below error: >>>> >>>> The error was: "keystoneauth1.exceptions.http.Unauthorized: The >>>> password is expired and needs to be changed for user: >>>> 765d7c43b3d54b289c9fa9e1dae15112. (HTTP 401) (Request-ID: >>>> req-f1344091-4f81-44c2-8ec6-e2daffbf998c) >>>> 2022-01-09 23:10:39.354147 | 525400bb-6dc5-c157-232d-000000003de1 | >>>> FATAL | As" >>>> >>>> I tried checking the Horizon, where is see that it is asking me to >>>> change the password immediately with the same message and a similar >>>> observation is seen with overcloud file. >>>> >>>> Please advice why this deployment is getting failed >>>> >>>> -- >>>> ~ Lokendra >>>> >>>> >>>> >> >> -- >> ~ Lokendra >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ozzzo at yahoo.com Thu Jan 13 15:54:26 2022 From: ozzzo at yahoo.com (Albert Braden) Date: Thu, 13 Jan 2022 15:54:26 +0000 (UTC) Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> <33441648.1434581.1641304881681@mail.yahoo.com> <385929635.1929303.1642011977053@mail.yahoo.com> Message-ID: <326590098.315301.1642089266574@mail.yahoo.com> After digging further I realized that I'm not setting TTL; only queue expiration. Here's what I see in the GUI when I look at affected queues: Policy notifications-expire Effective policy definition expires: 1200 This is what I have in definitions.json.j2: {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"expires":1200}, "priority":0}, I tried this to set both: {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":"{{ rabbitmq_message_ttl | int }}","expires":1200}, "priority":0}, But the RMQ containers restart every 60 seconds and puke this into the log: [error] <0.322.0> CRASH REPORT Process <0.322.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"600\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 After reading the doc on TTL: https://www.rabbitmq.com/ttl.html I realized that the TTL is set in ms, so I tried "rabbitmq_message_ttl: 60000" but that only changes the number in the error: [error] <0.318.0> CRASH REPORT Process <0.318.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"60000\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 What am I missing? On Wednesday, January 12, 2022, 05:11:41 PM EST, Dale Smith wrote: In the web interface(RabbitMQ 3.8.23, not using Kolla), when looking at the queue you will see the "Policy" listed by name, and "Effective policy definition". You can either view the policy definition, and the arguments for the definitions applied, or "effective policy definition" should show you the list. 
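For completeness, the same policy details can also be checked from the CLI
inside the rabbitmq container; a minimal sketch, assuming the default "/"
vhost:

    rabbitmqctl list_policies -p /
    rabbitmqctl list_queues -p / name policy messages

The first command lists the policies defined on the vhost (name, pattern,
definition), and the second shows which policy each queue has actually picked
up. Recent rabbitmqctl versions should also accept effective_policy_definition
as a column for list_queues, but treat that as something to verify with
"rabbitmqctl help list_queues".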
It may be relevant to note: "Each exchange or queue will have at most one policy matching" - https://www.rabbitmq.com/parameters.html#how-policies-work I've added a similar comment to the linked patchset. On 13/01/22 7:26 am, Albert Braden wrote: This is very helpful. Thank you! It appears that I have successfully set the expire time to 1200, because I no longer see unconsumed messages lingering in my queues, but it's not obvious how to verify. In the web interface, when I look at the queues, I see things like policy, state, features and consumers, but I don't see a timeout or expire value, nor do I find the number 1200 anywhere. Where should I be looking in the web interface to verify that I set the expire time correctly? Or do I need to use the CLI? On Wednesday, January 5, 2022, 04:23:29 AM EST, Mark Goddard wrote: On Tue, 4 Jan 2022 at 14:08, Albert Braden wrote: > > Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl" ? John Garbutt proposed a few patches for RabbitMQ in kolla, including this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191 https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+(status:open+OR+status:merged)+project:openstack/kolla-ansible Note that they are currently untested. Mark > On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden wrote: > > > I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2: > > "policies":[ > {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, > {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0} > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0} > {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} > {% endif %} > > But I still see unconsumed messages lingering in notifications_extractor.info. From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing? > On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden wrote: > > > Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes? > > [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud wrote: > > > So, your config snippet LGTM. > > Le ven. 10 d?c. 2021 ? 17:50, Albert Braden a ?crit : > > Sorry, that was a transcription error. I thought "True" and my fingers typed "False." The correct lines are: > > [oslo_messaging_rabbit] > amqp_durable_queues = True > > On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud wrote: > > > If you plan to let `amqp_durable_queues = False` (i.e if you plan to keep this config equal to false), then you don't need to add these config lines as this is already the default value [1]. 
> > [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 > > Le jeu. 9 d?c. 2021 ? 22:40, Albert Braden a ?crit : > > Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management. > > I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf: > > [oslo_messaging_rabbit] > amqp_durable_queues = False > > Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? What would that look like in the config? > > > From: Herve Beraud > Sent: Thursday, December 9, 2021 2:45 AM > To: Bogdan Dobrelya > Cc: openstack-discuss at lists.openstack.org > Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability > > > > Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. > > > > > > Le mer. 8 d?c. 2021 ? 11:48, Bogdan Dobrelya a ?crit : > > Please see inline > > >> I read this with great interest because we are seeing this issue. Questions: > >> > >> 1. We are running kola-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? > >> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like? > >> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into? > > > > Note that even having rabbit HA policies adjusted like that and its HA > > replication factor [0] decreased (e.g. to a 2), there still might be > > high churn caused by a large enough number of replicated durable RPC > > topic queues. And that might cripple the cloud down with the incurred > > I/O overhead because a durable queue requires all messages in it to be > > persisted to a disk (for all the messaging cluster replicas) before they > > are ack'ed by the broker. > > > > Given that said, Oslo messaging would likely require a more granular > > control for topic exchanges and the durable queues flag - to tell it to > > declare as durable only the most critical paths of a service. A single > > config setting and a single control exchange per a service might be not > > enough. > > Also note that therefore, amqp_durable_queue=True requires dedicated > control exchanges configured for each service. Those that use > 'openstack' as a default cannot turn the feature ON. Changing it to a > service specific might also cause upgrade impact, as described in the > topic [3]. > > > > The same is true for `amqp_auto_delete=True`. That requires dedicated control exchanges else it won't work if each service defines its own policy on a shared control exchange (e.g `openstack`) and if policies differ from each other. > > > > [3] https://review.opendev.org/q/topic:scope-config-opts > > > > > There are also race conditions with durable queues enabled, like [1]. A > > solution could be where each service declare its own dedicated control > > exchange with its own configuration. 
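To make the "dedicated control exchange" idea a bit more concrete (this is the
"what would that look like in the config" question above), a per-service
oslo.messaging snippet might look roughly like this sketch; the exchange name
is arbitrary, and as noted in the thread changing it on a running cloud has
upgrade implications:

    # e.g. in nova.conf; repeat per service with its own exchange name
    [DEFAULT]
    control_exchange = nova

    [oslo_messaging_rabbit]
    amqp_durable_queues = true

The point being made here is that services sharing the default "openstack"
exchange cannot safely flip durability or auto-delete independently, so each
service needs its own exchange before turning these flags on.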
> > > > Finally, openstack components should add perhaps a *.next CI job to test > > it with durable queues, like [2] > > > > [0] https://www.rabbitmq.com/ha.html#replication-factor > > > > [1] > > https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt > > > > [2] https://review.opendev.org/c/openstack/nova/+/820523 > > > >> > >> Does anyone have a sample set of RMQ config files that they can share? > >> > >> It looks like my Outlook has ruined the link; reposting: > >> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > > > > > -- > > Best regards, > > Bogdan Dobrelya, > > Irc #bogdando > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > https://twitter.com/4383hberaud > > > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > https://twitter.com/4383hberaud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tolga at etom.com.tr Thu Jan 13 10:37:17 2022 From: tolga at etom.com.tr (tolga at etom.com.tr) Date: Thu, 13 Jan 2022 13:37:17 +0300 Subject: [swift] EC2 Credentials Returns 403 when Signature v4 Used Message-ID: <553741642070169@mail.yandex.com.tr> An HTML attachment was scrubbed... URL: From stephenfin at redhat.com Thu Jan 13 16:19:20 2022 From: stephenfin at redhat.com (Stephen Finucane) Date: Thu, 13 Jan 2022 16:19:20 +0000 Subject: [nova] [placement] Proposing Sean Mooney as nova-core In-Reply-To: References: Message-ID: On Wed, 2022-01-12 at 17:19 +0100, Sylvain Bauza wrote: > Hi all, > I would like to propose Sean as an addition to the nova-core team (which > includes placement merge rights as nova-core is implicitly a subgroup). > > As we know, he's around for a long time, is already a nova-specs-core and has > proven solid experience in reviews. > > Cores, please vote (-1, 0, +1) before next Wednesday Jan 19th 1600UTC. +1. Sean would make a great addition. Stephen > Cheers, > -Sylvain > > From fungi at yuggoth.org Thu Jan 13 17:23:16 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 13 Jan 2022 17:23:16 +0000 Subject: [openstacksdk] EC2 Credentials Returns 403 when Signature v4 Used In-Reply-To: <553741642070169@mail.yandex.com.tr> Message-ID: <20220113172315.wdl2crscsu7otu7m@yuggoth.org> On 2022-01-10 10:37:17 UTC, Tolga wrote: > We are using ussuri on Ubuntu Focal. We experienced an issue with > Ceph Rados Gateway. [...] Moderator's Note: I've approved this post out of the moderation queue because someone on this mailing list might know the answers to your questions, but please be aware that Ceph and RadosGW are not part of OpenStack and are not the same thing as OpenStack's Swift service. I've removed the [swift] tag from the subject line on my reply accordingly. You may find better answers by asking the Ceph community instead. I see they have mailing list for users, which is probably an appropriate place to start: https://lists.ceph.io/postorius/lists/ceph-users.ceph.io/ > We are generating EC2 credentials from CLI using "openstack > ec2 credentials ..." so is there any option for Sigv4? [...] 
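As a point of reference, the credential created by that command is just a
Keystone-backed access/secret key pair; as far as I know "openstack ec2
credentials create" has no signature-version option, and the signature version
is chosen by the S3 client and validated by the S3 endpoint (RadosGW in this
case). A minimal example, with the project and user taken from the current
session:

    $ openstack ec2 credentials create
    $ openstack ec2 credentials list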
This particular question may be on topic here, as it pertains to the OpenStack Client (maintained by the OpenStackSDK team), so I've added a subject tag of [openstacksdk] in order to better bring it to their attention. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From geguileo at redhat.com Thu Jan 13 18:13:03 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 13 Jan 2022 19:13:03 +0100 Subject: [openstack][stein][cinder] capacity filter is not working In-Reply-To: References: <20220113110337.f3rvbgezbtvpolzw@localhost> <20220113120846.jntkdbl3ex34q43t@localhost> Message-ID: <20220113181303.xbeudmaqfda745oh@localhost> On 13/01, Ignazio Cassano wrote: > Hellp Gorka, > here you can find more details: > https://paste.openstack.org/show/812091/ > Many thanks > Ignazio > Hi, Given the reported data from the backends, which is: nfsgold1: max_over_subscription_ratio = 20.0 total_capacity_gb = 1945.6 free_capacity_gb = 609.68 reserved_percentage = 0 allocated_capacity_gb = 0 nfsgold2: max_over_subscription_ratio = 20.0 total_capacity_gb = 972.8 free_capacity_gb = 970.36 reserved_percentage = 0 allocated_capacity_gb = 0 Since those backends are not reporting the provisioned_capacity_gb, then it is assigned the same value as the allocated_capacity_gb, which is the sum of existing volumes in Cinder for that backend. I see it is reporting 0 here, so I assume you are using the same storage pool for multiple things and you still don't have volumes in Cinder. The calculation the scheduler does for the weighting is as follows: virtual-free-capacity = total_capacity_gb * max_over_subscription_ratio - provisioned_capacity - math.floor(total_capacity_gb * reserved_percentage) Which results in: nfsgold1 = 1945.6 * 20.0 - 0 - math.floor(1945.6 * 0) = 38,913.8 nfsgold2 = 972.8 * 20.0 - 0 - math.floor(1945.6 * 0) = 19,456 So nfsgold1 is returning a greater value, and therefore is winning the weighing, so only when there is no longer space in nfsgold1 and the filtering fails will nfsgold2 be used. If you look at the debug logs you should see that it describes which backends start the filtering, which ones pass each filter, and then which ones are weighed and which one wins. I see that the NetApp driver has a way to report the provisioned capacity (netapp_driver_reports_provisiones_capacity) that may be able to help you. Another way to resolve the issue may be to use an exclusive pool in the backend. Cheers, Gorka. > Il giorno gio 13 gen 2022 alle ore 13:08 Gorka Eguileor > ha scritto: > > > On 13/01, Ignazio Cassano wrote: > > > Hello, I am using nfsgold volume type. 
> > > [root at tst-controller-01 ansible]# cinder type-show nfsgold > > > > > +---------------------------------+--------------------------------------+ > > > | Property | Value > > | > > > > > +---------------------------------+--------------------------------------+ > > > | description | None > > | > > > | extra_specs | volume_backend_name : nfsgold > > | > > > | id | fd8b1cc8-4c3a-490d-bc95-29e491f850cc > > | > > > | is_public | True > > | > > > | name | nfsgold > > | > > > | os-volume-type-access:is_public | True > > | > > > | qos_specs_id | None > > | > > > > > +---------------------------------+--------------------------------------+ > > > > > > cinder get-pools > > > > > +----------+--------------------------------------------------------------------+ > > > | Property | Value > > > | > > > > > +----------+--------------------------------------------------------------------+ > > > | name | cinder-cluster-1 at nfsgold2#10.102.189.156: > > /svm_tstcinder_cl2_volssd > > > | > > > > > +----------+--------------------------------------------------------------------+ > > > > > +----------+--------------------------------------------------------------------+ > > > | Property | Value > > > | > > > > > +----------+--------------------------------------------------------------------+ > > > | name | cinder-cluster-1 at nfsgold1#10.102.189.155: > > /svm_tstcinder_cl1_volssd > > > | > > > > > +----------+--------------------------------------------------------------------+ > > > > > > > Hi, > > > > We would need to see the details of the pools to see additional > > information: > > > > $ cinder get-pools --detail > > > > > I noted that nfsgold2 is used also when nfsgold1 is almost full. > > > I expected the volume was created on share with more space availability. > > > Ignazio > > > > > > > Then the capacity filtering seems to be working as expected (we can > > confirm looking at the debug logs and seeing if both backends pass the > > filtering). You could see in the logs that both of them are passing the > > filtering and are valid to create volumes. > > > > The thing we'd have to look into is the weighing phase, where the > > scheduler is selecting nfsgold1 as the best option. > > > > I assume you haven't changed the defaults in the configuration options > > "scheduler_default_weighers" or in "scheduler_weight_handler". > > > > So it must be using the "CapacityWeigher". Are you using default values > > for "capacity_weigher_multiplier" and > > "allocated_capacity_weight_multiplier" config options? > > > > When using defaults the capacity weigher should be spread volumes > > instead of stacking them. > > > > I still think that the best way to debug this is to view the debug logs. > > In Stein you should be able to dynamically change the logging level of > > the scheduler services to debug without restarting the services, and > > then changing it back to info. > > > > Cheers, > > Gorka. > > > > > > > > > > Il giorno gio 13 gen 2022 alle ore 12:03 Gorka Eguileor < > > geguileo at redhat.com> > > > ha scritto: > > > > > > > On 13/01, Ignazio Cassano wrote: > > > > > Hello, > > > > > I am using openstack stein on centos 7 with netapp ontap driver. > > > > > Seems capacity filter is not working and volumes are always creed on > > the > > > > > first share where less space is available. 
> > > > > My configuration is posted here: > > > > > enabled_backends = nfsgold1, nfsgold2 > > > > > > > > > > [nfsgold1] > > > > > nas_secure_file_operations = false > > > > > nas_secure_file_permissions = false > > > > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > > > > netapp_storage_family = ontap_cluster > > > > > netapp_storage_protocol = nfs > > > > > netapp_vserver = svm-tstcinder2-cl1 > > > > > netapp_server_hostname = faspod2.csi.it > > > > > netapp_server_port = 80 > > > > > netapp_login = apimanager > > > > > netapp_password = password > > > > > nfs_shares_config = /etc/cinder/nfsgold1_shares > > > > > volume_backend_name = nfsgold > > > > > #nfs_mount_options = lookupcache=pos > > > > > nfs_mount_options = lookupcache=pos > > > > > > > > > > > > > > > [nfsgold2] > > > > > nas_secure_file_operations = false > > > > > nas_secure_file_permissions = false > > > > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > > > > netapp_storage_family = ontap_cluster > > > > > netapp_storage_protocol = nfs > > > > > netapp_vserver = svm-tstcinder2-cl2 > > > > > netapp_server_hostname = faspod2.csi.it > > > > > netapp_server_port = 80 > > > > > netapp_login = apimanager > > > > > netapp_password = password > > > > > nfs_shares_config = /etc/cinder/nfsgold2_shares > > > > > volume_backend_name = nfsgold > > > > > #nfs_mount_options = lookupcache=pos > > > > > nfs_mount_options = lookupcache=pos > > > > > > > > > > > > > > > > > > > > Volumes are created always on nfsgold1 also if has less space > > available > > > > of > > > > > nfsgold2 share > > > > > Thanks > > > > > Ignazio > > > > > > > > Hi, > > > > > > > > What volume type are you using to create the volumes? If you don't > > > > define it it would use the default from the cinder.conf file. > > > > > > > > What are the extra specs of the volume type? > > > > > > > > What pool info are the NetApp backends reporting? > > > > > > > > It's usually a good idea to enabled debugging on the schedulers and > > look > > > > at the details of how they are making the filtering and weighting > > > > decisions. > > > > > > > > Cheers, > > > > Gorka. > > > > > > > > > > > > From ozzzo at yahoo.com Thu Jan 13 18:09:49 2022 From: ozzzo at yahoo.com (Albert Braden) Date: Thu, 13 Jan 2022 18:09:49 +0000 (UTC) Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: <326590098.315301.1642089266574@mail.yahoo.com> References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> <33441648.1434581.1641304881681@mail.yahoo.com> <385929635.1929303.1642011977053@mail.yahoo.com> <326590098.315301.1642089266574@mail.yahoo.com> Message-ID: <2058295726.372026.1642097389964@mail.yahoo.com> Update: I googled around and found this: https://tickets.puppetlabs.com/browse/MODULES-2986 Apparently the " | int " isn't working. I tried '60000' and "60000" but that didn't make a difference. In desperation I tried this: {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":60000,"expires":1200}, "priority":0}, That works, but I'd prefer to use a variable. Has anyone done this successfully? 
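For what it's worth, the likely culprit is the quoting: wrapping the Jinja2
expression in double quotes makes the rendered JSON value a string ("60000"),
and RabbitMQ only accepts a plain integer for message-ttl. Dropping the quotes
around the expression should render a bare number; a sketch, assuming
rabbitmq_message_ttl is defined as an integer somewhere such as group_vars:

    {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":{{ rabbitmq_message_ttl | int }},"expires":1200}, "priority":0},

This renders to exactly the hard-coded line that worked, but keeps the value
configurable.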
Also, am I understanding correctly that "message-ttl" is set in milliseconds and "expires" is set in seconds? Or do I need to use ms for "expires" too? On Thursday, January 13, 2022, 11:03:11 AM EST, Albert Braden wrote: After digging further I realized that I'm not setting TTL; only queue expiration. Here's what I see in the GUI when I look at affected queues: Policy notifications-expire Effective policy definition expires: 1200 This is what I have in definitions.json.j2: {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"expires":1200}, "priority":0}, I tried this to set both: {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":"{{ rabbitmq_message_ttl | int }}","expires":1200}, "priority":0}, But the RMQ containers restart every 60 seconds and puke this into the log: [error] <0.322.0> CRASH REPORT Process <0.322.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"600\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 After reading the doc on TTL: https://www.rabbitmq.com/ttl.html I realized that the TTL is set in ms, so I tried "rabbitmq_message_ttl: 60000" but that only changes the number in the error: [error] <0.318.0> CRASH REPORT Process <0.318.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"60000\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 What am I missing? On Wednesday, January 12, 2022, 05:11:41 PM EST, Dale Smith wrote: In the web interface(RabbitMQ 3.8.23, not using Kolla), when looking at the queue you will see the "Policy" listed by name, and "Effective policy definition". You can either view the policy definition, and the arguments for the definitions applied, or "effective policy definition" should show you the list. It may be relevant to note: "Each exchange or queue will have at most one policy matching" - https://www.rabbitmq.com/parameters.html#how-policies-work I've added a similar comment to the linked patchset. On 13/01/22 7:26 am, Albert Braden wrote: This is very helpful. Thank you! It appears that I have successfully set the expire time to 1200, because I no longer see unconsumed messages lingering in my queues, but it's not obvious how to verify. In the web interface, when I look at the queues, I see things like policy, state, features and consumers, but I don't see a timeout or expire value, nor do I find the number 1200 anywhere. Where should I be looking in the web interface to verify that I set the expire time correctly? Or do I need to use the CLI? On Wednesday, January 5, 2022, 04:23:29 AM EST, Mark Goddard wrote: On Tue, 4 Jan 2022 at 14:08, Albert Braden wrote: > > Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl" ? John Garbutt proposed a few patches for RabbitMQ in kolla, including this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191 https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+(status:open+OR+status:merged)+project:openstack/kolla-ansible Note that they are currently untested. 
Mark > On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden wrote: > > > I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2: > > "policies":[ > {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, > {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0} > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0} > {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} > {% endif %} > > But I still see unconsumed messages lingering in notifications_extractor.info. From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing? > On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden wrote: > > > Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes? > > [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud wrote: > > > So, your config snippet LGTM. > > Le ven. 10 d?c. 2021 ? 17:50, Albert Braden a ?crit : > > Sorry, that was a transcription error. I thought "True" and my fingers typed "False." The correct lines are: > > [oslo_messaging_rabbit] > amqp_durable_queues = True > > On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud wrote: > > > If you plan to let `amqp_durable_queues = False` (i.e if you plan to keep this config equal to false), then you don't need to add these config lines as this is already the default value [1]. > > [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 > > Le jeu. 9 d?c. 2021 ? 22:40, Albert Braden a ?crit : > > Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management. > > I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf: > > [oslo_messaging_rabbit] > amqp_durable_queues = False > > Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? What would that look like in the config? > > > From: Herve Beraud > Sent: Thursday, December 9, 2021 2:45 AM > To: Bogdan Dobrelya > Cc: openstack-discuss at lists.openstack.org > Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability > > > > Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. > > > > > > Le mer. 8 d?c. 2021 ? 11:48, Bogdan Dobrelya a ?crit : > > Please see inline > > >> I read this with great interest because we are seeing this issue. Questions: > >> > >> 1. We are running kola-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? > >> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. 
I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like? > >> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into? > > > > Note that even having rabbit HA policies adjusted like that and its HA > > replication factor [0] decreased (e.g. to a 2), there still might be > > high churn caused by a large enough number of replicated durable RPC > > topic queues. And that might cripple the cloud down with the incurred > > I/O overhead because a durable queue requires all messages in it to be > > persisted to a disk (for all the messaging cluster replicas) before they > > are ack'ed by the broker. > > > > Given that said, Oslo messaging would likely require a more granular > > control for topic exchanges and the durable queues flag - to tell it to > > declare as durable only the most critical paths of a service. A single > > config setting and a single control exchange per a service might be not > > enough. > > Also note that therefore, amqp_durable_queue=True requires dedicated > control exchanges configured for each service. Those that use > 'openstack' as a default cannot turn the feature ON. Changing it to a > service specific might also cause upgrade impact, as described in the > topic [3]. > > > > The same is true for `amqp_auto_delete=True`. That requires dedicated control exchanges else it won't work if each service defines its own policy on a shared control exchange (e.g `openstack`) and if policies differ from each other. > > > > [3] https://review.opendev.org/q/topic:scope-config-opts > > > > > There are also race conditions with durable queues enabled, like [1]. A > > solution could be where each service declare its own dedicated control > > exchange with its own configuration. > > > > Finally, openstack components should add perhaps a *.next CI job to test > > it with durable queues, like [2] > > > > [0] https://www.rabbitmq.com/ha.html#replication-factor > > > > [1] > > https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt > > > > [2] https://review.opendev.org/c/openstack/nova/+/820523 > > > >> > >> Does anyone have a sample set of RMQ config files that they can share? > >> > >> It looks like my Outlook has ruined the link; reposting: > >> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > > > > > -- > > Best regards, > > Bogdan Dobrelya, > > Irc #bogdando > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > https://twitter.com/4383hberaud > > > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > https://twitter.com/4383hberaud > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fungi at yuggoth.org Thu Jan 13 18:44:15 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 13 Jan 2022 18:44:15 +0000 Subject: [cloudkitty][kolla][monasca][neutron][oslo][security-sig] Log4j vulnerabilities and OpenStack In-Reply-To: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> Message-ID: <20220113184414.qaj63mayovrzmriz@yuggoth.org> Thanks to the excellent feedback from Rados?aw Piliszek and Pierre Riteau, the list of things for operators to look out for has grown a bit. Is anyone else aware of other, similar situations where OpenStack is commonly installed alongside Java software using Log4j in vulnerable ways? I think this list is becoming extensive enough we could consider publishing it in a security note (OSSN)... Kolla-Ansible Central Logging ----------------------------- If you're deploying with Kolla-Ansible and have enabled central logging, then it's installing a copy of Elasticsearch (7.13.4 currently, which includes Log4j 2.11.1). According to a statement from Elastic's developers, the relevant risks can be mitigated by passing "-Dlog4j2.formatMsgNoLookups=true" on the JVM's command line. All images built after December 21, 2021 have this workaround applied, with the exception of images for Train which did not get that patch merged until January 7, 2021. The statement from Elastic about the workaround can be found here: https://xeraa.net/blog/2021_mitigate-log4j2-log4shell-elasticsearch/ CloudKitty, Monasca, and OSProfiler ----------------------------------- If you're deploying CloudKitty, Monasca, or OSProfiler, you may be using Elasticsearch as a storage back-end for these services. Make sure you update it or put a suitable mitigation in place. Anyone deploying one or more of these services with Kolla-Ansible is running Elasticsearch, but should be covered so long as they update to the latest available images for their release series, as noted above. Networking-ODL -------------- Neutron's Networking-ODL driver relies on the Java-based OpenDaylight service, which should be updated if used: https://access.redhat.com/solutions/6586821 SUSE OpenStack -------------- The "storm" component of SUSE OpenStack seems to be impacted: https://www.suse.com/c/suse-statement-on-log4j-log4shell-cve-2021-44228-vulnerability/ Sovereign Cloud Stack --------------------- An Elasticsearch component in Sovereign Cloud Stack is affected: https://scs.community/security/2021/12/13/advisory-log4j/ -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ozzzo at yahoo.com Thu Jan 13 18:50:01 2022 From: ozzzo at yahoo.com (Albert Braden) Date: Thu, 13 Jan 2022 18:50:01 +0000 (UTC) Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: <2058295726.372026.1642097389964@mail.yahoo.com> References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> <33441648.1434581.1641304881681@mail.yahoo.com> <385929635.1929303.1642011977053@mail.yahoo.com> <326590098.315301.1642089266574@mail.yahoo.com> <2058295726.372026.1642097389964@mail.yahoo.com> Message-ID: <2077696744.396726.1642099801140@mail.yahoo.com> After reading more I realize that "expires" is also set in ms. So it looks like the correct settings are: message-ttl: 60000 expires: 120000 This would expire messages in 10 minutes and queues in 20 minutes. The only remaining question is, how can I specify these in a variable without generating the "not a valid message TTL" error? On Thursday, January 13, 2022, 01:22:33 PM EST, Albert Braden wrote: Update: I googled around and found this: https://tickets.puppetlabs.com/browse/MODULES-2986 Apparently the " | int " isn't working. I tried '60000' and "60000" but that didn't make a difference. In desperation I tried this: {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":60000,"expires":1200}, "priority":0}, That works, but I'd prefer to use a variable. Has anyone done this successfully? Also, am I understanding correctly that "message-ttl" is set in milliseconds and "expires" is set in seconds? Or do I need to use ms for "expires" too? On Thursday, January 13, 2022, 11:03:11 AM EST, Albert Braden wrote: After digging further I realized that I'm not setting TTL; only queue expiration. Here's what I see in the GUI when I look at affected queues: Policy notifications-expire Effective policy definition expires: 1200 This is what I have in definitions.json.j2: {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"expires":1200}, "priority":0}, I tried this to set both: {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":"{{ rabbitmq_message_ttl | int }}","expires":1200}, "priority":0}, But the RMQ containers restart every 60 seconds and puke this into the log: [error] <0.322.0> CRASH REPORT Process <0.322.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"600\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 After reading the doc on TTL: https://www.rabbitmq.com/ttl.html I realized that the TTL is set in ms, so I tried "rabbitmq_message_ttl: 60000" but that only changes the number in the error: [error] <0.318.0> CRASH REPORT Process <0.318.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"60000\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 What am I missing? 
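A small note on the arithmetic a few paragraphs up: both message-ttl and
expires are interpreted as milliseconds, so 60000/120000 work out to one and
two minutes rather than ten and twenty. If ten-minute messages and
twenty-minute queue expiry are really the goal, the policy would need
something like the following (values illustrative):

    {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":600000,"expires":1200000}, "priority":0},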
On Wednesday, January 12, 2022, 05:11:41 PM EST, Dale Smith wrote: In the web interface(RabbitMQ 3.8.23, not using Kolla), when looking at the queue you will see the "Policy" listed by name, and "Effective policy definition". You can either view the policy definition, and the arguments for the definitions applied, or "effective policy definition" should show you the list. It may be relevant to note: "Each exchange or queue will have at most one policy matching" - https://www.rabbitmq.com/parameters.html#how-policies-work I've added a similar comment to the linked patchset. On 13/01/22 7:26 am, Albert Braden wrote: This is very helpful. Thank you! It appears that I have successfully set the expire time to 1200, because I no longer see unconsumed messages lingering in my queues, but it's not obvious how to verify. In the web interface, when I look at the queues, I see things like policy, state, features and consumers, but I don't see a timeout or expire value, nor do I find the number 1200 anywhere. Where should I be looking in the web interface to verify that I set the expire time correctly? Or do I need to use the CLI? On Wednesday, January 5, 2022, 04:23:29 AM EST, Mark Goddard wrote: On Tue, 4 Jan 2022 at 14:08, Albert Braden wrote: > > Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl" ? John Garbutt proposed a few patches for RabbitMQ in kolla, including this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191 https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+(status:open+OR+status:merged)+project:openstack/kolla-ansible Note that they are currently untested. Mark > On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden wrote: > > > I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2: > > "policies":[ > {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, > {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0} > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0} > {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} > {% endif %} > > But I still see unconsumed messages lingering in notifications_extractor.info. From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing? > On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden wrote: > > > Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes? > > [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud wrote: > > > So, your config snippet LGTM. > > Le ven. 10 d?c. 2021 ? 17:50, Albert Braden a ?crit : > > Sorry, that was a transcription error. I thought "True" and my fingers typed "False." 
The correct lines are: > > [oslo_messaging_rabbit] > amqp_durable_queues = True > > On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud wrote: > > > If you plan to let `amqp_durable_queues = False` (i.e if you plan to keep this config equal to false), then you don't need to add these config lines as this is already the default value [1]. > > [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 > > Le jeu. 9 d?c. 2021 ? 22:40, Albert Braden a ?crit : > > Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management. > > I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf: > > [oslo_messaging_rabbit] > amqp_durable_queues = False > > Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? What would that look like in the config? > > > From: Herve Beraud > Sent: Thursday, December 9, 2021 2:45 AM > To: Bogdan Dobrelya > Cc: openstack-discuss at lists.openstack.org > Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability > > > > Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. > > > > > > Le mer. 8 d?c. 2021 ? 11:48, Bogdan Dobrelya a ?crit : > > Please see inline > > >> I read this with great interest because we are seeing this issue. Questions: > >> > >> 1. We are running kola-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? > >> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like? > >> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into? > > > > Note that even having rabbit HA policies adjusted like that and its HA > > replication factor [0] decreased (e.g. to a 2), there still might be > > high churn caused by a large enough number of replicated durable RPC > > topic queues. And that might cripple the cloud down with the incurred > > I/O overhead because a durable queue requires all messages in it to be > > persisted to a disk (for all the messaging cluster replicas) before they > > are ack'ed by the broker. > > > > Given that said, Oslo messaging would likely require a more granular > > control for topic exchanges and the durable queues flag - to tell it to > > declare as durable only the most critical paths of a service. A single > > config setting and a single control exchange per a service might be not > > enough. > > Also note that therefore, amqp_durable_queue=True requires dedicated > control exchanges configured for each service. Those that use > 'openstack' as a default cannot turn the feature ON. Changing it to a > service specific might also cause upgrade impact, as described in the > topic [3]. > > > > The same is true for `amqp_auto_delete=True`. That requires dedicated control exchanges else it won't work if each service defines its own policy on a shared control exchange (e.g `openstack`) and if policies differ from each other. > > > > [3] https://review.opendev.org/q/topic:scope-config-opts > > > > > There are also race conditions with durable queues enabled, like [1]. 
A > > solution could be where each service declare its own dedicated control > > exchange with its own configuration. > > > > Finally, openstack components should add perhaps a *.next CI job to test > > it with durable queues, like [2] > > > > [0] https://www.rabbitmq.com/ha.html#replication-factor > > > > [1] > > https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt > > > > [2] https://review.opendev.org/c/openstack/nova/+/820523 > > > >> > >> Does anyone have a sample set of RMQ config files that they can share? > >> > >> It looks like my Outlook has ruined the link; reposting: > >> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > > > > > -- > > Best regards, > > Bogdan Dobrelya, > > Irc #bogdando > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > https://twitter.com/4383hberaud > > > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > https://twitter.com/4383hberaud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Thu Jan 13 19:26:46 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 13 Jan 2022 20:26:46 +0100 Subject: [openstack][stein][cinder] capacity filter is not working In-Reply-To: <20220113181303.xbeudmaqfda745oh@localhost> References: <20220113110337.f3rvbgezbtvpolzw@localhost> <20220113120846.jntkdbl3ex34q43t@localhost> <20220113181303.xbeudmaqfda745oh@localhost> Message-ID: Many thanks. I will check it Ignazio Il Gio 13 Gen 2022, 19:13 Gorka Eguileor ha scritto: > On 13/01, Ignazio Cassano wrote: > > Hellp Gorka, > > here you can find more details: > > https://paste.openstack.org/show/812091/ > > Many thanks > > Ignazio > > > > Hi, > > Given the reported data from the backends, which is: > > nfsgold1: > max_over_subscription_ratio = 20.0 > total_capacity_gb = 1945.6 > free_capacity_gb = 609.68 > reserved_percentage = 0 > allocated_capacity_gb = 0 > > nfsgold2: > max_over_subscription_ratio = 20.0 > total_capacity_gb = 972.8 > free_capacity_gb = 970.36 > reserved_percentage = 0 > allocated_capacity_gb = 0 > > Since those backends are not reporting the provisioned_capacity_gb, then > it is assigned the same value as the allocated_capacity_gb, which is the > sum of existing volumes in Cinder for that backend. I see it is > reporting 0 here, so I assume you are using the same storage pool for > multiple things and you still don't have volumes in Cinder. > > The calculation the scheduler does for the weighting is as follows: > > virtual-free-capacity = total_capacity_gb * max_over_subscription_ratio > - provisioned_capacity > - math.floor(total_capacity_gb * > reserved_percentage) > > Which results in: > > nfsgold1 = 1945.6 * 20.0 - 0 - math.floor(1945.6 * 0) = 38,913.8 > > nfsgold2 = 972.8 * 20.0 - 0 - math.floor(1945.6 * 0) = 19,456 > > So nfsgold1 is returning a greater value, and therefore is winning the > weighing, so only when there is no longer space in nfsgold1 and the > filtering fails will nfsgold2 be used. > > If you look at the debug logs you should see that it describes which > backends start the filtering, which ones pass each filter, and then > which ones are weighed and which one wins. 
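To see why nfsgold1 keeps winning, the quoted formula can be reproduced in a
few lines of Python; this is only a back-of-the-envelope sketch using the pool
numbers reported above, not the real CapacityWeigher (which also applies the
capacity_weigher_multiplier and normalization, neither of which changes the
ordering here):

    import math

    # Reported pool data: total capacity (GB), over-subscription ratio,
    # provisioned capacity (GB) and reserved percentage.
    pools = {
        "nfsgold1": {"total": 1945.6, "oversub": 20.0, "provisioned": 0.0, "reserved_pct": 0.0},
        "nfsgold2": {"total": 972.8, "oversub": 20.0, "provisioned": 0.0, "reserved_pct": 0.0},
    }

    for name, p in pools.items():
        virtual_free = (p["total"] * p["oversub"]
                        - p["provisioned"]
                        - math.floor(p["total"] * p["reserved_pct"]))
        print(name, virtual_free)

    # Prints roughly 38912.0 for nfsgold1 and 19456.0 for nfsgold2.

With thin provisioning and no provisioned capacity accounted for, the larger
pool always reports the larger virtual free capacity, so the default weigher
keeps picking it until the capacity filter finally rejects it.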
> > I see that the NetApp driver has a way to report the provisioned > capacity (netapp_driver_reports_provisiones_capacity) that may be able > to help you. > > Another way to resolve the issue may be to use an exclusive pool in the > backend. > > Cheers, > Gorka. > > > Il giorno gio 13 gen 2022 alle ore 13:08 Gorka Eguileor < > geguileo at redhat.com> > > ha scritto: > > > > > On 13/01, Ignazio Cassano wrote: > > > > Hello, I am using nfsgold volume type. > > > > [root at tst-controller-01 ansible]# cinder type-show nfsgold > > > > > > > > +---------------------------------+--------------------------------------+ > > > > | Property | Value > > > | > > > > > > > > +---------------------------------+--------------------------------------+ > > > > | description | None > > > | > > > > | extra_specs | volume_backend_name : nfsgold > > > | > > > > | id | > fd8b1cc8-4c3a-490d-bc95-29e491f850cc > > > | > > > > | is_public | True > > > | > > > > | name | nfsgold > > > | > > > > | os-volume-type-access:is_public | True > > > | > > > > | qos_specs_id | None > > > | > > > > > > > > +---------------------------------+--------------------------------------+ > > > > > > > > cinder get-pools > > > > > > > > +----------+--------------------------------------------------------------------+ > > > > | Property | Value > > > > | > > > > > > > > +----------+--------------------------------------------------------------------+ > > > > | name | cinder-cluster-1 at nfsgold2#10.102.189.156: > > > /svm_tstcinder_cl2_volssd > > > > | > > > > > > > > +----------+--------------------------------------------------------------------+ > > > > > > > > +----------+--------------------------------------------------------------------+ > > > > | Property | Value > > > > | > > > > > > > > +----------+--------------------------------------------------------------------+ > > > > | name | cinder-cluster-1 at nfsgold1#10.102.189.155: > > > /svm_tstcinder_cl1_volssd > > > > | > > > > > > > > +----------+--------------------------------------------------------------------+ > > > > > > > > > > Hi, > > > > > > We would need to see the details of the pools to see additional > > > information: > > > > > > $ cinder get-pools --detail > > > > > > > I noted that nfsgold2 is used also when nfsgold1 is almost full. > > > > I expected the volume was created on share with more space > availability. > > > > Ignazio > > > > > > > > > > Then the capacity filtering seems to be working as expected (we can > > > confirm looking at the debug logs and seeing if both backends pass the > > > filtering). You could see in the logs that both of them are passing > the > > > filtering and are valid to create volumes. > > > > > > The thing we'd have to look into is the weighing phase, where the > > > scheduler is selecting nfsgold1 as the best option. > > > > > > I assume you haven't changed the defaults in the configuration options > > > "scheduler_default_weighers" or in "scheduler_weight_handler". > > > > > > So it must be using the "CapacityWeigher". Are you using default > values > > > for "capacity_weigher_multiplier" and > > > "allocated_capacity_weight_multiplier" config options? > > > > > > When using defaults the capacity weigher should be spread volumes > > > instead of stacking them. > > > > > > I still think that the best way to debug this is to view the debug > logs. 
> > > In Stein you should be able to dynamically change the logging level of > > > the scheduler services to debug without restarting the services, and > > > then changing it back to info. > > > > > > Cheers, > > > Gorka. > > > > > > > > > > > > > > Il giorno gio 13 gen 2022 alle ore 12:03 Gorka Eguileor < > > > geguileo at redhat.com> > > > > ha scritto: > > > > > > > > > On 13/01, Ignazio Cassano wrote: > > > > > > Hello, > > > > > > I am using openstack stein on centos 7 with netapp ontap driver. > > > > > > Seems capacity filter is not working and volumes are always > creed on > > > the > > > > > > first share where less space is available. > > > > > > My configuration is posted here: > > > > > > enabled_backends = nfsgold1, nfsgold2 > > > > > > > > > > > > [nfsgold1] > > > > > > nas_secure_file_operations = false > > > > > > nas_secure_file_permissions = false > > > > > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > > > > > netapp_storage_family = ontap_cluster > > > > > > netapp_storage_protocol = nfs > > > > > > netapp_vserver = svm-tstcinder2-cl1 > > > > > > netapp_server_hostname = faspod2.csi.it > > > > > > netapp_server_port = 80 > > > > > > netapp_login = apimanager > > > > > > netapp_password = password > > > > > > nfs_shares_config = /etc/cinder/nfsgold1_shares > > > > > > volume_backend_name = nfsgold > > > > > > #nfs_mount_options = lookupcache=pos > > > > > > nfs_mount_options = lookupcache=pos > > > > > > > > > > > > > > > > > > [nfsgold2] > > > > > > nas_secure_file_operations = false > > > > > > nas_secure_file_permissions = false > > > > > > volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver > > > > > > netapp_storage_family = ontap_cluster > > > > > > netapp_storage_protocol = nfs > > > > > > netapp_vserver = svm-tstcinder2-cl2 > > > > > > netapp_server_hostname = faspod2.csi.it > > > > > > netapp_server_port = 80 > > > > > > netapp_login = apimanager > > > > > > netapp_password = password > > > > > > nfs_shares_config = /etc/cinder/nfsgold2_shares > > > > > > volume_backend_name = nfsgold > > > > > > #nfs_mount_options = lookupcache=pos > > > > > > nfs_mount_options = lookupcache=pos > > > > > > > > > > > > > > > > > > > > > > > > Volumes are created always on nfsgold1 also if has less space > > > available > > > > > of > > > > > > nfsgold2 share > > > > > > Thanks > > > > > > Ignazio > > > > > > > > > > Hi, > > > > > > > > > > What volume type are you using to create the volumes? If you don't > > > > > define it it would use the default from the cinder.conf file. > > > > > > > > > > What are the extra specs of the volume type? > > > > > > > > > > What pool info are the NetApp backends reporting? > > > > > > > > > > It's usually a good idea to enabled debugging on the schedulers and > > > look > > > > > at the details of how they are making the filtering and weighting > > > > > decisions. > > > > > > > > > > Cheers, > > > > > Gorka. > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Thu Jan 13 21:03:54 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 13 Jan 2022 22:03:54 +0100 Subject: [neutron] Drivers meeting agenda - 14.01.2022. Message-ID: Hi Neutron Drivers, The agenda for tomorrow's drivers meeting is at [1]. The only topic for tomorrow: * Can neutron-fwaas project be revived? 
(mail thread: http://lists.openstack.org/pipermail/openstack-discuss/2021-December/026413.html ) [1] https://wiki.openstack.org/wiki/Meetings/NeutronDrivers#Agenda See you at the meeting tomorrow. Lajos Katona (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Thu Jan 13 21:17:43 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 13 Jan 2022 22:17:43 +0100 Subject: [cloudkitty][kolla][monasca][neutron][oslo][security-sig] Log4j vulnerabilities and OpenStack In-Reply-To: <20220113184414.qaj63mayovrzmriz@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220113184414.qaj63mayovrzmriz@yuggoth.org> Message-ID: Thank you Jeremy for putting all this information together. Some important comments inline. On Thu, 13 Jan 2022 at 19:49, Jeremy Stanley wrote: > > Thanks to the excellent feedback from Rados?aw Piliszek and Pierre > Riteau, the list of things for operators to look out for has grown a > bit. Is anyone else aware of other, similar situations where > OpenStack is commonly installed alongside Java software using Log4j > in vulnerable ways? I think this list is becoming extensive enough > we could consider publishing it in a security note (OSSN)... > > Kolla-Ansible Central Logging > ----------------------------- > > If you're deploying with Kolla-Ansible and have enabled central > logging, then it's installing a copy of Elasticsearch (7.13.4 > currently, which includes Log4j 2.11.1). According to a statement > from Elastic's developers, the relevant risks can be mitigated by > passing "-Dlog4j2.formatMsgNoLookups=true" on the JVM's command > line. All images built after December 21, 2021 have this workaround > applied, with the exception of images for Train which did not get > that patch merged until January 7, 2021. The statement from Elastic > about the workaround can be found here: > https://xeraa.net/blog/2021_mitigate-log4j2-log4shell-elasticsearch/ This part has several issues: - Xena and Wallaby use Elasticsearch OSS 7.x, the latest available package being version 7.10.2 from January 2021, which includes Log4j 2.11.1. Unfortunately it doesn't look like it will ever be updated, so we only have the formatMsgNoLookups to help. The Kolla community is planning to move to OpenSearch as a replacement for Elasticsearch. - Victoria and earlier releases use Elasticsearch OSS 6.x which still gets updated packages. Latest Kolla images for these releases use version 6.8.22, which includes Log4j 2.17.0. I see that Elasticsearch 6.8.23 was released today, which includes Log4j 2.17.1. This should get picked up by Kolla image builds in the coming days. - The formatMsgNoLookups mitigation is applied through Kolla Ansible and doesn't depend on when Kolla images were built. The dates you mentioned above are when the commits merged in Kolla Ansible. > CloudKitty, Monasca, and OSProfiler > ----------------------------------- > > If you're deploying CloudKitty, Monasca, or OSProfiler, you may be > using Elasticsearch as a storage back-end for these services. Make > sure you update it or put a suitable mitigation in place. Anyone > deploying one or more of these services with Kolla-Ansible is > running Elasticsearch, but should be covered so long as they update > to the latest available images for their release series, as noted > above. As noted in the previous comment, for Xena and Wallaby we don't have a fix using new images, only the mitigation applied through Kolla Ansible is available. 
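For operators applying the formatMsgNoLookups mitigation by hand rather than through Kolla Ansible, a minimal sketch is to pass the flag as an extra JVM option and restart Elasticsearch (the file path and service name below are assumptions for a package-based install; adjust for containerised deployments):

    echo '-Dlog4j2.formatMsgNoLookups=true' | \
        sudo tee /etc/elasticsearch/jvm.options.d/log4shell.options
    sudo systemctl restart elasticsearch

    # or, for containers, via the JVM options environment variable:
    ES_JAVA_OPTS="-Dlog4j2.formatMsgNoLookups=true"

This only disables the lookup behaviour; moving to a fixed Log4j (or a fixed Elasticsearch/OpenSearch release) remains the proper long-term fix.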
Additional Monasca components that include Log4j: - Logstash: similar vulnerability pattern to Elasticsearch, i.e. no RCE out of the box - Kafka: uses an older Log4j 1.x which is only vulnerable to CVE-2021-4104 if configured to use JMSAppender (not the default), according to https://kafka.apache.org/cve-list - Zookeeper: uses Log4j 1.x, not affected by CVE-2021-44228 according to https://blogs.apache.org/security/entry/cve-2021-44228, not sure about the others - Storm: possibly vulnerable? Pull requests in github.com/apache/storm have bumped Log4j versions, but no new release has been issued yet. Kolla uses version 1.2.2. I am looking at adding a mitigation for CVE-2021-45046 based on removing the JndiLookup class from the classpath. > Networking-ODL > -------------- > > Neutron's Networking-ODL driver relies on the Java-based > OpenDaylight service, which should be updated if used: > https://access.redhat.com/solutions/6586821 > > SUSE OpenStack > -------------- > > The "storm" component of SUSE OpenStack seems to be impacted: > https://www.suse.com/c/suse-statement-on-log4j-log4shell-cve-2021-44228-vulnerability/ > > Sovereign Cloud Stack > --------------------- > > An Elasticsearch component in Sovereign Cloud Stack is affected: > https://scs.community/security/2021/12/13/advisory-log4j/ > > -- > Jeremy Stanley From fungi at yuggoth.org Thu Jan 13 21:26:24 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 13 Jan 2022 21:26:24 +0000 Subject: [cloudkitty][kolla][monasca][neutron][oslo][security-sig] Log4j vulnerabilities and OpenStack In-Reply-To: References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220113184414.qaj63mayovrzmriz@yuggoth.org> Message-ID: <20220113212624.vepvvknzwprkwsyy@yuggoth.org> On 2022-01-13 22:17:43 +0100 (+0100), Pierre Riteau wrote: [...] > This part has several issues: [...] Thanks for the detailed breakdown! I'll try to come up with a summary which retains accuracy while focusing on actionable recommendations, though I'll need to go over it a few more times and think on it for a bit before I can put together a new draft. > - Storm: possibly vulnerable? Pull requests in github.com/apache/storm > have bumped Log4j versions, but no new release has been issued yet. > Kolla uses version 1.2.2. I am looking at adding a mitigation for > CVE-2021-45046 based on removing the JndiLookup class from the > classpath. [...] Could that be the same as this? > > SUSE OpenStack > > -------------- > > > > The "storm" component of SUSE OpenStack seems to be impacted: > > https://www.suse.com/c/suse-statement-on-log4j-log4shell-cve-2021-44228-vulnerability/ [...] -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From pierre at stackhpc.com Thu Jan 13 21:58:17 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 13 Jan 2022 22:58:17 +0100 Subject: [cloudkitty][kolla][monasca][neutron][oslo][security-sig] Log4j vulnerabilities and OpenStack In-Reply-To: <20220113212624.vepvvknzwprkwsyy@yuggoth.org> References: <20220103160213.olwgdubxm56jvlmg@yuggoth.org> <20220113184414.qaj63mayovrzmriz@yuggoth.org> <20220113212624.vepvvknzwprkwsyy@yuggoth.org> Message-ID: On Thu, 13 Jan 2022 at 22:30, Jeremy Stanley wrote: > > On 2022-01-13 22:17:43 +0100 (+0100), Pierre Riteau wrote: > [...] > > This part has several issues: > [...] > > Thanks for the detailed breakdown! 
I'll try to come up with a > summary which retains accuracy while focusing on actionable > recommendations, though I'll need to go over it a few more times and > think on it for a bit before I can put together a new draft. > > > - Storm: possibly vulnerable? Pull requests in github.com/apache/storm > > have bumped Log4j versions, but no new release has been issued yet. > > Kolla uses version 1.2.2. I am looking at adding a mitigation for > > CVE-2021-45046 based on removing the JndiLookup class from the > > classpath. > [...] > > Could that be the same as this? I believe so. This lead me to [1] and [2] which have more details. SUSE opted to remove the JndiLookup class from log4j 2.x jars during build. I've actually already submitted a Kolla patch to apply the same mitigation: https://review.opendev.org/c/openstack/kolla/+/824651 [1] https://lists.suse.com/pipermail/sle-security-updates/2021-December/009911.html [2] https://bugzilla.suse.com/show_bug.cgi?id=1193641 > > > SUSE OpenStack > > > -------------- > > > > > > The "storm" component of SUSE OpenStack seems to be impacted: > > > https://www.suse.com/c/suse-statement-on-log4j-log4shell-cve-2021-44228-vulnerability/ > [...] > > -- > Jeremy Stanley From sandeepggn93 at gmail.com Fri Jan 14 00:49:56 2022 From: sandeepggn93 at gmail.com (Sandeep Yadav) Date: Fri, 14 Jan 2022 06:19:56 +0530 Subject: [TripleO] Proposing Juan Badia Payno for TripleO core reviewer. In-Reply-To: References: Message-ID: +1 On Wed, 12 Jan, 2022, 8:04 PM Marios Andreou, wrote: > +1 > > On Wed, Jan 12, 2022 at 3:19 PM Carlos Camacho Gonzalez > wrote: > > > > Hi everyone! > > > > I'd like to propose Juan Badia Paino [1][2][3] as a core reviewer on the > TripleO repositories that are or might be related to the backup and restore > efforts (openstack/tripleo-ci, openstack/tripleo-ansible, > openstack/python-tripleoclient, openstack/tripleo-quickstart-extras, > openstack/tripleo-quickstart). > > > > Juan has been around since 2016 making useful contributions and code > reviews to the community and I believe adding him to our core reviewer > group will help us improve the review and coding speed for the backup and > restore codebase. > > > > As usual, consider this email as an initial +1 from my side, I will keep > an eye on this thread for a week, and based on your feedback and if there > are no objections I will add him as a core reviewer in two weeks. > > > > [1]: https://review.opendev.org/q/owner:jbadiapa%2540redhat.com > > [2]: > https://www.stackalytics.io/?project_type=all&metric=commits&user_id=jbadiapa&release=all > > [3]: > https://www.stackalytics.io/report/contribution?module=tripleo-group&project_type=openstack&days=180 > > > > Cheers, > > Carlos. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Jan 14 05:31:24 2022 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 14 Jan 2022 00:31:24 -0500 Subject: [kolla-ansible] Gather facts strange error Message-ID: Folks, I have a working cluster but today when I ran the following command I got very nasty error like the following. Even 192.168.75.144 is function node but giving super log error output that i can't copy paste here. $ sudo kolla-ansible -i /etc/kolla/multinode deploy -t nova-cell --limit 192.168.75.144 ... .... ....... 
PLAY [Gather facts for all hosts] ******************************************************************************************************************************************************************** TASK [Gather facts] ********************************************************************************************************************************************************************************** ok: [192.168.75.144] TASK [Group hosts to determine when using --limit] *************************************************************************************************************************************************** ok: [192.168.75.144] PLAY [Gather facts for all hosts (if using --limit)] ************************************************************************************************************************************************* TASK [Gather facts] ********************************************************************************************************************************************************************************** skipping: [192.168.75.144] => (item=192.168.75.144) ok: [192.168.75.144 -> 192.168.75.145] => (item=192.168.75.145) ok: [192.168.75.144 -> 192.168.75.146] => (item=192.168.75.146) failed: [192.168.75.144 -> 192.168.75.178] (item=192.168.75.178) => {"ansible_loop_var": "item", "item": "192.168.75.178", "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.75.178 port 22: No route to host", "unreachable": true} ok: [192.168.75.144 -> 192.168.75.179] => (item=192.168.75.179) failed: [192.168.75.144 -> 192.168.75.180] (item=192.168.75.180) => {"ansible_loop_var": "item", "item": "192.168.75.180", "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.75.180 port 22: No route to host", "unreachable": true} ok: [192.168.75.144 -> 192.168.75.181] => (item=192.168.75.181) ok: [192.168.75.144 -> 192.168.75.147] => (item=192.168.75.147) ok: [192.168.75.144 -> localhost] => (item=localhost) fatal: [192.168.75.144 -> {{ item }}]: UNREACHABLE! => {"changed": false, "msg": "All items completed", "results": [{"ansible_loop_var": "item", "changed": false, "item": "192.168.75.144", "skip_reason": "Conditional result was False", "skipped": true}, {"ansible_facts": {"ansible_all_ipv4_addresses": ["192.168.75.145"], "ansible_all_ipv6_addresses": ["fe80::3eec:efff:fe1f:1776", "fe80::f802:e3ff:fe71:e58c"], "ansible_apparmor": {"status": "disabled"}, "ansible_architecture": "x86_64", "ansible_bios_date": "02/27/2020", "ansible_bios_version": "3.3", "ansible_br_ex": {"active": false, "device": "br-ex", "features": {"esp_hw_offload": "off [fixed]", "esp_tx_csum_hw_offload": "off [fixed]", "fcoe_mtu": "off [fixed]", "generic_receive_offload": "on", "generic_segmentation_offload": "on", "highdma": "on", "hw_tc_offload": "off [fixed]", "l2_fwd_offload": "off [fixed]", "large_receive_offload": "off [fixed]", "loopback": "off [fixed]", "netns_local": "off [fixed]", "ntuple_filters": "off [fixed]", "receive_hashing": "off [fixed]", "rx_all": "off [fixed]", "rx_checksumming": "off [fixed]", "rx_fcs": "off [fixed]", "rx_gro_hw": "off [fixed]", "rx_gro_list": "off", "rx_udp_tunnel_port_offload": "off [fixed]", "rx_vlan_filter": "off [fixed]", "rx_vlan_offload": "off [fixed]", "rx_vlan_stag_filter": "off [fixed]", "rx_vlan_stag_hw_parse": "off [fixed]", "scatter_gather": "on", "tcp_segmentation_offload": "on", "tls_hw_record": "off [fixed]", "tls_hw_rx_offload": "off [fixed]", .... .... 
END PLAY RECAP ******************************************************************************************************************************************************************************************* 192.168.75.144 : ok=2 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0 Command failed ansible-playbook -i /etc/kolla/multinode -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla --tags nova-cell --limit 192.168.75.144 -e kolla_action=deploy /usr/local/share/kolla-ansible/ansible/site.yml From pierre at stackhpc.com Fri Jan 14 08:50:33 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 14 Jan 2022 09:50:33 +0100 Subject: [all][infra][kayobe][kolla] ping not permitted on latest centos-8-stream images Message-ID: Hello, Late yesterday, I noticed many Kayobe CI jobs started failing with "ping: socket: Operation not permitted". I investigated the issue with clarkb on #openstack-infra, with help from #centos-devel as well (on Libera). This happens on the latest CentOS Stream 8 images and is caused by iputils 20180629-8.el8 removing capabilities on the ping binary [1]. This should have been shipped with a sysctl configuration allowing any group to access unprivileged ICMP echo sockets [2], but this is not in the systemd package yet. As a result, using ping without root privileges fails. TripleO is also impacted. They have fixed it in their CI jobs [3]. It is possible other projects are affected. There are multiple places within Kayobe and Kolla where we would need to set this sysctl to fix our CI, including backports to all supported branches. I was wondering if infra could instead customise their stream image or apply the sysctl in one of the common roles from zuul/zuul-jobs that are run at the beginning of each job? Many thanks. Best wishes, Pierre Riteau (priteau) [1] https://git.centos.org/rpms/iputils/c/efa64b5e05ccb2c1332304ad493acc874b61e13a?branch=c8s [2] https://github.com/redhat-plumbers/systemd-rhel8/pull/246 [3] https://review.opendev.org/c/openstack/tripleo-ci/+/824635 From lyarwood at redhat.com Fri Jan 14 09:26:12 2022 From: lyarwood at redhat.com (Lee Yarwood) Date: Fri, 14 Jan 2022 09:26:12 +0000 Subject: [nova] [placement] Proposing Sean Mooney as nova-core In-Reply-To: References: Message-ID: <20220114092612.g4oz5orppjtlgedl@lyarwood-laptop.usersys.redhat.com> On 12-01-22 17:19:04, Sylvain Bauza wrote: > Hi all, > I would like to propose Sean as an addition to the nova-core team (which > includes placement merge rights as nova-core is implicitly a subgroup). > > As we know, he's around for a long time, is already a nova-specs-core and > has proven solid experience in reviews. > > Cores, please vote (-1, 0, +1) before next Wednesday Jan 19th 1600UTC. +1 -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From fungi at yuggoth.org Fri Jan 14 14:50:54 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 14 Jan 2022 14:50:54 +0000 Subject: [all][infra][kayobe][kolla] ping not permitted on latest centos-8-stream images In-Reply-To: References: Message-ID: <20220114145054.6bcnmu2335jepbvq@yuggoth.org> On 2022-01-14 09:50:33 +0100 (+0100), Pierre Riteau wrote: [...] > There are multiple places within Kayobe and Kolla where we would need > to set this sysctl to fix our CI, including backports to all supported > branches. 
I was wondering if infra could instead customise their > stream image or apply the sysctl in one of the common roles from > zuul/zuul-jobs that are run at the beginning of each job? Many thanks. [...] How close are the CentOS Stream maintainers from uploading a regression fix for the package? If it's going to be a while, then the safest solution is probably to add a platform-specific DIB element and rebuild our centos-8-stream images with that. Making modifications to our "base" job (or any of the roles it uses) is far more time consuming and likely to take a lot longer, because of the precautions we take in order to avoid accidentally breaking every job in the system (changes to the "base" job are not directly testable). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From aschultz at redhat.com Fri Jan 14 15:07:26 2022 From: aschultz at redhat.com (Alex Schultz) Date: Fri, 14 Jan 2022 08:07:26 -0700 Subject: [TripleO] Proposing Juan Badia Payno for TripleO core reviewer. In-Reply-To: References: Message-ID: +1 On Wed, Jan 12, 2022 at 6:16 AM Carlos Camacho Gonzalez wrote: > > Hi everyone! > > I'd like to propose Juan Badia Paino [1][2][3] as a core reviewer on the TripleO repositories that are or might be related to the backup and restore efforts (openstack/tripleo-ci, openstack/tripleo-ansible, openstack/python-tripleoclient, openstack/tripleo-quickstart-extras, openstack/tripleo-quickstart). > > Juan has been around since 2016 making useful contributions and code reviews to the community and I believe adding him to our core reviewer group will help us improve the review and coding speed for the backup and restore codebase. > > As usual, consider this email as an initial +1 from my side, I will keep an eye on this thread for a week, and based on your feedback and if there are no objections I will add him as a core reviewer in two weeks. > > [1]: https://review.opendev.org/q/owner:jbadiapa%2540redhat.com > [2]: https://www.stackalytics.io/?project_type=all&metric=commits&user_id=jbadiapa&release=all > [3]: https://www.stackalytics.io/report/contribution?module=tripleo-group&project_type=openstack&days=180 > > Cheers, > Carlos. From aschultz at redhat.com Fri Jan 14 15:13:25 2022 From: aschultz at redhat.com (Alex Schultz) Date: Fri, 14 Jan 2022 08:13:25 -0700 Subject: [all][infra][kayobe][kolla] ping not permitted on latest centos-8-stream images In-Reply-To: <20220114145054.6bcnmu2335jepbvq@yuggoth.org> References: <20220114145054.6bcnmu2335jepbvq@yuggoth.org> Message-ID: On Fri, Jan 14, 2022 at 7:53 AM Jeremy Stanley wrote: > > On 2022-01-14 09:50:33 +0100 (+0100), Pierre Riteau wrote: > [...] > > There are multiple places within Kayobe and Kolla where we would need > > to set this sysctl to fix our CI, including backports to all supported > > branches. I was wondering if infra could instead customise their > > stream image or apply the sysctl in one of the common roles from > > zuul/zuul-jobs that are run at the beginning of each job? Many thanks. > [...] > > How close are the CentOS Stream maintainers from uploading a > regression fix for the package? If it's going to be a while, then > the safest solution is probably to add a platform-specific DIB > element and rebuild our centos-8-stream images with that. 
Making > modifications to our "base" job (or any of the roles it uses) is far > more time consuming and likely to take a lot longer, because of the > precautions we take in order to avoid accidentally breaking every > job in the system (changes to the "base" job are not directly > testable). The fix is going into systemd[0]. I'm uncertain the time to hit the mirrors but there is a package already. In the meantime a workaround would be to apply the sysctl values like the tripleo-ci is doing. [0] https://bugzilla.redhat.com/show_bug.cgi?id=2037807 > -- > Jeremy Stanley From fungi at yuggoth.org Fri Jan 14 15:25:26 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 14 Jan 2022 15:25:26 +0000 Subject: [all][infra][kayobe][kolla] ping not permitted on latest centos-8-stream images In-Reply-To: References: <20220114145054.6bcnmu2335jepbvq@yuggoth.org> Message-ID: <20220114152526.6jwft7dc7crspmgi@yuggoth.org> On 2022-01-14 08:13:25 -0700 (-0700), Alex Schultz wrote: [...] > The fix is going into systemd[0]. I'm uncertain the time to hit the > mirrors but there is a package already. In the meantime a workaround > would be to apply the sysctl values like the tripleo-ci is doing. > > [0] https://bugzilla.redhat.com/show_bug.cgi?id=2037807 How do I determine from that what systemd package version number we're looking for? I can force mirror updates and image rebuilds far more quickly than any workarounds which require changing our image building recipes or central job configs, and with no need to spend time cleaning up the workarounds afterwards. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From cboylan at sapwetik.org Fri Jan 14 15:33:20 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 14 Jan 2022 07:33:20 -0800 Subject: [all][infra][kayobe][kolla] ping not permitted on latest centos-8-stream images In-Reply-To: <20220114152526.6jwft7dc7crspmgi@yuggoth.org> References: <20220114145054.6bcnmu2335jepbvq@yuggoth.org> <20220114152526.6jwft7dc7crspmgi@yuggoth.org> Message-ID: On Fri, Jan 14, 2022, at 7:25 AM, Jeremy Stanley wrote: > On 2022-01-14 08:13:25 -0700 (-0700), Alex Schultz wrote: > [...] >> The fix is going into systemd[0]. I'm uncertain the time to hit the >> mirrors but there is a package already. In the meantime a workaround >> would be to apply the sysctl values like the tripleo-ci is doing. >> >> [0] https://bugzilla.redhat.com/show_bug.cgi?id=2037807 > > How do I determine from that what systemd package version number > we're looking for? I can force mirror updates and image rebuilds far > more quickly than any workarounds which require changing our image > building recipes or central job configs, and with no need to spend > time cleaning up the workarounds afterwards. I don't think any package update has been proposed to CentOS 8 Stream yet: https://git.centos.org/rpms/systemd We want systemd-239-55.el8 and no PR exists for that. 
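For reference, the sysctl workaround referred to earlier in the thread can be applied by hand while waiting for a fixed package; a minimal sketch (the range shown simply allows every group to open unprivileged ICMP echo sockets, which is what the missing systemd drop-in would provide):

    echo 'net.ipv4.ping_group_range = 0 2147483647' | \
        sudo tee /etc/sysctl.d/50-ping-group-range.conf
    sudo sysctl --system

    # or apply it immediately without persisting:
    sudo sysctl -w net.ipv4.ping_group_range='0 2147483647'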
> -- > Jeremy Stanley From cboylan at sapwetik.org Fri Jan 14 15:35:05 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 14 Jan 2022 07:35:05 -0800 Subject: [all][infra][kayobe][kolla] ping not permitted on latest centos-8-stream images In-Reply-To: <20220114145054.6bcnmu2335jepbvq@yuggoth.org> References: <20220114145054.6bcnmu2335jepbvq@yuggoth.org> Message-ID: <192ecffc-4919-4576-805a-927f9fbd60f5@www.fastmail.com> On Fri, Jan 14, 2022, at 6:50 AM, Jeremy Stanley wrote: > On 2022-01-14 09:50:33 +0100 (+0100), Pierre Riteau wrote: > [...] >> There are multiple places within Kayobe and Kolla where we would need >> to set this sysctl to fix our CI, including backports to all supported >> branches. I was wondering if infra could instead customise their >> stream image or apply the sysctl in one of the common roles from >> zuul/zuul-jobs that are run at the beginning of each job? Many thanks. > [...] > > How close are the CentOS Stream maintainers from uploading a > regression fix for the package? If it's going to be a while, then > the safest solution is probably to add a platform-specific DIB > element and rebuild our centos-8-stream images with that. Making > modifications to our "base" job (or any of the roles it uses) is far > more time consuming and likely to take a lot longer, because of the > precautions we take in order to avoid accidentally breaking every > job in the system (changes to the "base" job are not directly > testable). > -- > Jeremy Stanley I don't think we should update DIB or our images to fix this. The distro is broken and our images accurately represent that state. If the software in CI fails as a result that is because our CI system is properly catching this problem. The software needs to work around this to ensure that it is deployable in the real world and not just on our systems. This approach of fixing it in the software itself appears to be the one TripleO took and is the correct approach. From katonalala at gmail.com Fri Jan 14 15:43:55 2022 From: katonalala at gmail.com (Lajos Katona) Date: Fri, 14 Jan 2022 16:43:55 +0100 Subject: [Neutron][drivers] Proposing Oleg Bondarev (obondarev) to the Drivers team Message-ID: Hi Neutron Drivers, I would like to propose Oleg Bondarev to be a member of the Neutron Drivers team. He has long experience with Neutron, he has been always around to help with advice and reviews, and enthusiastically participated in the Drivers meeting (big +1 as it is on Friday 1400UTC, quite late afternoon in his timezone :-)). Neutron drivers, please vote before the next Drivers meeting (next Friday, 21. January). Best Regards Lajos Katona (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Jan 14 15:52:31 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 14 Jan 2022 15:52:31 +0000 Subject: [all][infra][kayobe][kolla] ping not permitted on latest centos-8-stream images In-Reply-To: <192ecffc-4919-4576-805a-927f9fbd60f5@www.fastmail.com> References: <20220114145054.6bcnmu2335jepbvq@yuggoth.org> <192ecffc-4919-4576-805a-927f9fbd60f5@www.fastmail.com> Message-ID: <20220114155231.zkqlje3bsj7st6dv@yuggoth.org> On 2022-01-14 07:35:05 -0800 (-0800), Clark Boylan wrote: [...] > I don't think we should update DIB or our images to fix this. The > distro is broken and our images accurately represent that state. > If the software in CI fails as a result that is because our CI > system is properly catching this problem. 
The software needs to > work around this to ensure that it is deployable in the real world > and not just on our systems. > > This approach of fixing it in the software itself appears to be > the one TripleO took and is the correct approach. Thanks, in reflection I agree. It's good to keep reminding ourselves that what we're testing is that the software works on the target platform. Unfortunate and temporary as it may be, the current state of CentOS Stream 8 is that you need root privileges in order to use the ping utility. If we work around this in our testing, then users who are trying to deploy that software onto the current state of CentOS Stream 8 will not get the benefit of the workaround. It's good to be reminded that the goal is not to make tests pass no matter the cost, it's to make sure the software will work for its users. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ralonsoh at redhat.com Fri Jan 14 17:16:13 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 14 Jan 2022 18:16:13 +0100 Subject: [Neutron][drivers] Proposing Oleg Bondarev (obondarev) to the Drivers team In-Reply-To: References: Message-ID: Oleg will be a great addition to the drivers team. +1 from me. On Fri, Jan 14, 2022 at 4:51 PM Lajos Katona wrote: > Hi Neutron Drivers, > > I would like to propose Oleg Bondarev to be a member of the Neutron > Drivers team. > He has long experience with Neutron, he has been always around to help > with advice and reviews, and enthusiastically participated in the Drivers > meeting (big +1 as it is on Friday 1400UTC, quite late afternoon in his > timezone :-)). > > Neutron drivers, please vote before the next Drivers meeting (next Friday, > 21. January). > > Best Regards > Lajos Katona (lajoskatona) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Jan 14 17:18:28 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 14 Jan 2022 11:18:28 -0600 Subject: [gate][nova][qa] tempest-*-centos-8-stream job failing consistently since yesterday Message-ID: <17e59990993.b47647d1605836.2392080917091176150@ghanshyammann.com> Hello Everyone, You might have noticed that 'tempest-integrated-compute-centos-8-stream' and 'tempest tempest-full-py3-centos-8-stream' job started failing the below two tests consistently since yesterday (~7 PM CST) - https://169dddc67bd535a0361f-0632fd6194b48b475d9eb0d8f7720c6c.ssl.cf2.rackcdn.com/824478/5/check/tempest-integrated-compute-centos-8-stream/e0db6a9/testr_results.html I have filed the bug and to unblock the gate (nova & tempest), I have pushed patch to make these job non voting until bug is fixed. - https://review.opendev.org/c/openstack/tempest/+/824740 Please hold the recheck on nova or tempest (or any other effected project). ralonsoh mentioned that this is not the same issue which is raised in - http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026682.html or may be triggered due to same root cause? 
-gmann From lance at osuosl.org Fri Jan 14 17:24:28 2022 From: lance at osuosl.org (Lance Albertson) Date: Fri, 14 Jan 2022 09:24:28 -0800 Subject: [all][infra][qa][tripleo][openstack-ansible][rally][aodh][cinder][kolla][chef][sahara] CentOS 8 EOL and removal from CI label/image lists In-Reply-To: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> References: <5ae024ea-9cdf-4fe0-9794-27fcda507e4b@www.fastmail.com> Message-ID: On Tue, Jan 11, 2022 at 1:52 PM Clark Boylan wrote: > As noted last month the OpenDev [0] team intends on removing CentOS 8 > images from our CI system now that the release has gone EOL. A number of > you have already shifted over to CentOS 8 Stream in CI (thank you!), but > there is still quite a bit remaining based on codesearch and some manual > digging. The OpenDev team has begun the process of removing some of the > supporting infrastructure and testing as well [1]. > > This list is probably not comprehensive but is a start. These projects > will need to look at removing their CentOS 8 CI jobs (optionally replacing > them with CentOS 8 Stream jobs): > > * openstack-chef > I was planning on making this change in the next week or so to Stream. -- Lance Albertson Director Oregon State University | Open Source Lab -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Fri Jan 14 17:29:38 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 14 Jan 2022 09:29:38 -0800 Subject: [gate][nova][qa] tempest-*-centos-8-stream job failing consistently since yesterday In-Reply-To: <17e59990993.b47647d1605836.2392080917091176150@ghanshyammann.com> References: <17e59990993.b47647d1605836.2392080917091176150@ghanshyammann.com> Message-ID: <9698de60-e9c9-4e4f-bf69-ad90689ff287@www.fastmail.com> On Fri, Jan 14, 2022, at 9:18 AM, Ghanshyam Mann wrote: > Hello Everyone, > > You might have noticed that 'tempest-integrated-compute-centos-8-stream' > and 'tempest tempest-full-py3-centos-8-stream' job started failing the > below two > tests consistently since yesterday (~7 PM CST) > > - > https://169dddc67bd535a0361f-0632fd6194b48b475d9eb0d8f7720c6c.ssl.cf2.rackcdn.com/824478/5/check/tempest-integrated-compute-centos-8-stream/e0db6a9/testr_results.html > > I have filed the bug and to unblock the gate (nova & tempest), I have > pushed patch > to make these job non voting until bug is fixed. > > - https://review.opendev.org/c/openstack/tempest/+/824740 > > Please hold the recheck on nova or tempest (or any other effected project). > > ralonsoh mentioned that this is not the same issue which is raised in > - > http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026682.html > > or may be triggered due to same root cause? I think this is the very same issue. These tests are failing in tempest when attempting to verify VM connectivity. The vm connectivity checking routines appear to fork and exec ping in tempest and tempest does not run as root. > > > -gmann From elod.illes at est.tech Fri Jan 14 20:27:45 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Fri, 14 Jan 2022 21:27:45 +0100 Subject: [all] Proposed Z cycle schedule Message-ID: Hi, As we are beyond Yoga milestone 2 it's time to start planning the next, 'Z' cycle and its release schedule, so I've prepared a draft. 
(Since we don't have the name yet [1] I used the *codename* Zen in the patch, that will be updated whenever the official release name is announced) The patch: https://review.opendev.org/c/openstack/releases/+/824489 (Or see the current patch set's generated schedule page [2] for easier overview.) Feel free to review it and comment on the patch if there is something that should be considered for the schedule. Thanks in advance! [1] https://wiki.openstack.org/wiki/Release_Naming/Z_Proposals [2] https://1b3db30de08d0b6cba67-15d40643a81384357fc5a6da097a7ab7.ssl.cf5.rackcdn.com/824489/2/check/openstack-tox-docs/d92363b/docs/zen/schedule.html El?d Ill?s irc: elodilles -------------- next part -------------- An HTML attachment was scrubbed... URL: From cel975 at yahoo.com Fri Jan 14 23:23:26 2022 From: cel975 at yahoo.com (Celinio Fernandes) Date: Fri, 14 Jan 2022 23:23:26 +0000 (UTC) Subject: Cannot ssh/ping instance In-Reply-To: <5690630.DvuYhMxLoT@p1> References: <869855278.629940.1641716238605.ref@mail.yahoo.com> <869855278.629940.1641716238605@mail.yahoo.com> <5690630.DvuYhMxLoT@p1> Message-ID: <2012398446.869565.1642202606158@mail.yahoo.com> Thanks very much for your help. Before you replied, I tried what you wrote but on the wrong interfaces : enp0s3 and virbr0. I had no idea I needed to add the IP address from the public network's subnet on the br-ex interface. So to ping/ssh the floating IP this is what I did : ip link set dev br-ex up ip link set dev br-ex state up sudo ip addr add 172.24.4.254/24 dev br-ex And then I can finally ping the floating IP : ping 172.24.4.133 And I can also ssh into the VM : ssh cirros at 172.24.4.133 Thanks again :) On Sunday, January 9, 2022, 08:21:18 PM GMT+1, Slawek Kaplonski wrote: Hi, On niedziela, 9 stycznia 2022 09:17:18 CET Celinio Fernandes wrote: > Hi, > I am running Ubuntu Server 20.04 LTS on Virtualbox. I installed OpenStack > (Xena release) through Devstack. Here is the content of my > /opt/stack/devstack/local.conf file : > [[local|localrc]] > ADMIN_PASSWORD=secret > DATABASE_PASSWORD=$ADMIN_PASSWORD > RABBIT_PASSWORD=$ADMIN_PASSWORD > SERVICE_PASSWORD=$ADMIN_PASSWORD > HOST_IP=10.0.2.15 > > > I created an instance through Horizon. The security group contains the > 2 rules needed (one to be able to ping and one to be able to ssh the > instance). I also allocated and associated a floating IP address. And a ssh > key pair. > > Here is the configuration : > openstack server list > ---------------------------------+--------------------------+---------+ > > | ID? | Name | Status | Networks | Image? | Flavor? | > > ---------------------------------+--------------------------+---------+ > > | f5f0fdd5-298b-4fa3-9ee9-e6e4288f4327 | InstanceJanvier | ACTIVE | > | shared=172.24.4.133, 192.168.233.165 | cirros-0.5.2-x86_64-disk | m1.nano > | | > ------------------------------------------------------+ > > > openstack network list : > ------------------------------------------------------+ > > | ID? ? | Name? ? | Subnets? ? ? ? ? ? | > > ------------------------------------------------------+ > > | 96a04799-7fc7-4525-b05c-ad57261aed38 | public? | > | 07ce42db-6f3f-4135-ace7-2fc104ea62a0, 6dba13fc-b10c-48b1-b1b4-e1f1afe25b53 > | | c42638dc-fa56-4644-ad34-295fce4811d2 | shared? | > | a4e2d8cc-02b2-42e2-a525-e0eebbb08980? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
> | | ffb8a527-266e-4e96-ad60-f7e9aba8f0c1 | private | > | 42e36677-cf3c-4df4-88a1-8cf79b9d6060, e507e6dd-132a-4249-96b1-83761562dd73 > | | > ------------------------------------------------------+ > > openstack router list : > +--------------------------------------+----------------+--------+------ > > | ID? ? | Name? | Status | State | Project? ? ? ? ? ? ? ? ? ? ? ? ? | > > +--------------------------------------+----------------+--------+------ > > | b9a15051-a532-4c93-95ad-53c057720c62 | Virtual_router | ACTIVE | UP? ? | > | 6556c02dd88f4c45b535c2dbb8ba1a04 | > +--------------------------------------+----------------+--------+------ > > > I cannot ping/ssh neither the fixed IP address or the floating IP address : > ping -c 3 172.24.4.133 > PING 172.24.4.133 (172.24.4.133) 56(84) bytes of data. > --- 172.24.4.133 ping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > ping -c 3 192.168.233.165 > PING 192.168.233.165 (192.168.233.165) 56(84) bytes of data. > --- 192.168.233.165 ping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > Maybe that has something to do with the network namespaces configuration on > Ubuntu. Does anyone know what could go wrong or what is missing ? > Thanks for helping. If You are trying to ping Floating IP directly from the host where devstack is installed (Virtualbox VM in Your case IIUC) then You should first have those floating IP addresses somehow reachable on the host, otherwise traffic is probably going through default gateway so is going outside the VM. If You are using ML2/OVN (default in Devstack) or ML2/OVS You probably have in the openvswitch bridge called br-ex which is used to send external network traffic from the OpenStack networks in Devstack. In such case You can e.g. add some IP address from the public network's subnet on the br-ex interface, like 192.168.233.254/24 - that will tell Your OS to reach that subnet through br- ex, so traffic will be able to go "into" the OVS managed by Neutron. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From laurentfdumont at gmail.com Sat Jan 15 01:46:39 2022 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Fri, 14 Jan 2022 20:46:39 -0500 Subject: [kolla-ansible] Gather facts strange error In-Reply-To: References: Message-ID: Is 192.168.75.144 a compute/controller/ansible node? Are you able to ssh into it? The ssh error are a bit weird, assuming you are on a flat lan. What are you attempting to do with the kolla command? On Fri, Jan 14, 2022 at 12:34 AM Satish Patel wrote: > Folks, > > I have a working cluster but today when I ran the following command I > got very nasty error like the following. Even 192.168.75.144 is > function node but giving super log error output that i can't copy > paste here. > > $ sudo kolla-ansible -i /etc/kolla/multinode deploy -t nova-cell > --limit 192.168.75.144 > ... > .... > ....... 
> PLAY [Gather facts for all hosts] > > ******************************************************************************************************************************************************************** > > TASK [Gather facts] > > ********************************************************************************************************************************************************************************** > ok: [192.168.75.144] > > TASK [Group hosts to determine when using --limit] > > *************************************************************************************************************************************************** > ok: [192.168.75.144] > > PLAY [Gather facts for all hosts (if using --limit)] > > ************************************************************************************************************************************************* > > TASK [Gather facts] > > ********************************************************************************************************************************************************************************** > skipping: [192.168.75.144] => (item=192.168.75.144) > ok: [192.168.75.144 -> 192.168.75.145] => (item=192.168.75.145) > ok: [192.168.75.144 -> 192.168.75.146] => (item=192.168.75.146) > failed: [192.168.75.144 -> 192.168.75.178] (item=192.168.75.178) => > {"ansible_loop_var": "item", "item": "192.168.75.178", "msg": "Failed > to connect to the host via ssh: ssh: connect to host 192.168.75.178 > port 22: No route to host", "unreachable": true} > ok: [192.168.75.144 -> 192.168.75.179] => (item=192.168.75.179) > failed: [192.168.75.144 -> 192.168.75.180] (item=192.168.75.180) => > {"ansible_loop_var": "item", "item": "192.168.75.180", "msg": "Failed > to connect to the host via ssh: ssh: connect to host 192.168.75.180 > port 22: No route to host", "unreachable": true} > ok: [192.168.75.144 -> 192.168.75.181] => (item=192.168.75.181) > ok: [192.168.75.144 -> 192.168.75.147] => (item=192.168.75.147) > ok: [192.168.75.144 -> localhost] => (item=localhost) > fatal: [192.168.75.144 -> {{ item }}]: UNREACHABLE! 
=> {"changed": > false, "msg": "All items completed", "results": [{"ansible_loop_var": > "item", "changed": false, "item": "192.168.75.144", "skip_reason": > "Conditional result was False", "skipped": true}, {"ansible_facts": > {"ansible_all_ipv4_addresses": ["192.168.75.145"], > "ansible_all_ipv6_addresses": ["fe80::3eec:efff:fe1f:1776", > "fe80::f802:e3ff:fe71:e58c"], "ansible_apparmor": {"status": > "disabled"}, "ansible_architecture": "x86_64", "ansible_bios_date": > "02/27/2020", "ansible_bios_version": "3.3", "ansible_br_ex": > {"active": false, "device": "br-ex", "features": {"esp_hw_offload": > "off [fixed]", "esp_tx_csum_hw_offload": "off [fixed]", "fcoe_mtu": > "off [fixed]", "generic_receive_offload": "on", > "generic_segmentation_offload": "on", "highdma": "on", > "hw_tc_offload": "off [fixed]", "l2_fwd_offload": "off [fixed]", > "large_receive_offload": "off [fixed]", "loopback": "off [fixed]", > "netns_local": "off [fixed]", "ntuple_filters": "off [fixed]", > "receive_hashing": "off [fixed]", "rx_all": "off [fixed]", > "rx_checksumming": "off [fixed]", "rx_fcs": "off [fixed]", > "rx_gro_hw": "off [fixed]", "rx_gro_list": "off", > "rx_udp_tunnel_port_offload": "off [fixed]", "rx_vlan_filter": "off > [fixed]", "rx_vlan_offload": "off [fixed]", "rx_vlan_stag_filter": > "off [fixed]", "rx_vlan_stag_hw_parse": "off [fixed]", > "scatter_gather": "on", "tcp_segmentation_offload": "on", > "tls_hw_record": "off [fixed]", "tls_hw_rx_offload": "off [fixed]", > > .... > .... > END > > PLAY RECAP > ******************************************************************************************************************************************************************************************* > 192.168.75.144 : ok=2 changed=0 unreachable=1 > failed=0 skipped=0 rescued=0 ignored=0 > > Command failed ansible-playbook -i /etc/kolla/multinode -e > @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e > CONFIG_DIR=/etc/kolla --tags nova-cell --limit 192.168.75.144 -e > kolla_action=deploy /usr/local/share/kolla-ansible/ansible/site.yml > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Sat Jan 15 03:22:36 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 14 Jan 2022 21:22:36 -0600 Subject: [all][tc] What's happening in Technical Committee: summary 14th Jan, 21: Reading: 5 min Message-ID: <17e5bc2217b.e94248fe616050.369174012774815179@ghanshyammann.com> Hello Everyone, Here is this week's summary of the Technical Committee activities. 1. TC Meetings: ============ * We had this week meeting yesterday. Most of the meeting discussions are summarized below (Completed or in-progress activities section). Meeting full logs are available @https://meetings.opendev.org/meetings/tc/2022/tc.2022-01-13-15.00.log.html * Next week's meeting is on 20th Jan Thursday 15:00 UTC, feel free add the topic on the agenda[1] by 19th Jan. 2. What we completed this week: ========================= * Added the cinder-solidfire charm to Openstack charms[2] * Added the cinder-nimblestorage charm to Openstack charms[3] 3. Activities In progress: ================== TC Tracker for Yoga cycle ------------------------------ * This etherpad includes the Yoga cycle targets/working items for TC[4]. Open Reviews ----------------- * 6 open reviews for ongoing activities[5]. Z release cycle name ------------------------- Z release cycle naming process is started[6]. Nomination to collect the name are now open, feel free to propose it on wiki page[7]. 
TC vote on gender-neutral language change in bylaws 's TC section ------------------------------------------------------------------------------- In this week meeting, TC voted in favor of gender-neutral language change in bylaws 's TC section. They are officially approved and votes are recorded in meeting logs[8]. Remove the tags framework --------------------------------- yoctozepto has proposed the WIP patch to remove the tag framework, feel free to review and provide early feedback[9]. SIG i18n status check -------------------------- Ian (SIG chair) is now back and fixed the IN translation issue also. So we are good here, but if you would like to help in i18n SIG, feel free to reachout to Ian. OpenStack Pain points discussion ---------------------------------------- No updates in this week. Stay tuned on the ML thread[10] and Rico will inform you about the next meeting details. TC position on release cadence ------------------------------------- No updates on this. Discussion is in-progress in ML thread[11]. No other updates from TC on this and we will set up a separate call to continue the discussion. Fixing Zuul config error ---------------------------- Requesting projects with zuul config error to look into those and fix them which should not take much time[12]. Adjutant need maintainers and PTLs ------------------------------------------- We are still waiting to hear from Braden, Albert on permission to work on this project[13]. Project updates ------------------- * masakari Transfer PTL role to suzhengwei[14] * Retire js-openstack-lib (waiting on Adjutant to have new PTL/maintainer) [15] 4. How to contact the TC: ==================== If you would like to discuss or give feedback to TC, you can reach out to us in multiple ways: 1. Email: you can send the email with tag [tc] on openstack-discuss ML[16]. 2. Weekly meeting: The Technical Committee conduct a weekly meeting every Thursday 15 UTC [17] 3. Ping us using 'tc-members' nickname on #openstack-tc IRC channel. Welcome back from the holidays and stay safe! [1] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting [2] https://review.opendev.org/c/openstack/governance/+/823805 [3] https://review.opendev.org/c/openstack/governance/+/824585 [4] https://etherpad.opendev.org/p/tc-yoga-tracker [5] https://review.opendev.org/q/projects:openstack/governance+status:open [6] http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026620.html [7] https://wiki.openstack.org/wiki/Release_Naming/Z_Proposals [8] https://meetings.opendev.org/meetings/tc/2022/tc.2022-01-13-15.00.log.html#l-18 [9] https://review.opendev.org/c/openstack/governance/+/822900 [10] http://lists.openstack.org/pipermail/openstack-discuss/2021-December/026245.html [11] http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025684.html [12] https://etherpad.opendev.org/p/zuul-config-error-openstack [13] http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025786.html [14] https://review.opendev.org/c/openstack/governance/+/824509 [15] https://review.opendev.org/c/openstack/governance/+/798540 [16] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss [17] http://eavesdrop.openstack.org/#Technical_Committee_Meeting -gmann From miguel at mlavalle.com Sat Jan 15 16:24:45 2022 From: miguel at mlavalle.com (Miguel Lavalle) Date: Sat, 15 Jan 2022 10:24:45 -0600 Subject: [Neutron][drivers] Proposing Oleg Bondarev (obondarev) to the Drivers team In-Reply-To: References: Message-ID: Long overdue! 
Big +2 from me On Fri, Jan 14, 2022 at 11:24 AM Rodolfo Alonso Hernandez < ralonsoh at redhat.com> wrote: > Oleg will be a great addition to the drivers team. +1 from me. > > On Fri, Jan 14, 2022 at 4:51 PM Lajos Katona wrote: > >> Hi Neutron Drivers, >> >> I would like to propose Oleg Bondarev to be a member of the Neutron >> Drivers team. >> He has long experience with Neutron, he has been always around to help >> with advice and reviews, and enthusiastically participated in the Drivers >> meeting (big +1 as it is on Friday 1400UTC, quite late afternoon in his >> timezone :-)). >> >> Neutron drivers, please vote before the next Drivers meeting (next >> Friday, 21. January). >> >> Best Regards >> Lajos Katona (lajoskatona) >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From soumplis at admin.grnet.gr Sat Jan 15 22:03:13 2022 From: soumplis at admin.grnet.gr (Alexandros Soumplis) Date: Sun, 16 Jan 2022 00:03:13 +0200 Subject: [ops][kolla][kolla-ansible][neutron] Xena upgrade connectivity issues Message-ID: <8855c543-5fb6-a1e9-46db-6281f41fd261@admin.grnet.gr> Hello all, I am doing an upgrade with kolla ansible from Wallaby to Xena, using source containers on ubuntu and openvswitch. After the upgrade, there is no connectivity from the VMs to the network (both provider networks and self-service) and also neutron services (dhcp, l3-agent, metadata) do not work. For the VMs I had no errors related, while for neutron services, probably due to lack of connectivity a continuous flapping of virtual routers and dhcp agents. I have downgraded neutron_openvswitch_agent, openvswitch_vswitchd and openvswitch_db to wallaby again and it everything works. I suspect there is something related to the flows created from the neutron agent but was not able to identify anything specific. Any help to debug this and identify the root cause is greatly appreciated. Thank you! a. From rlandy at redhat.com Sun Jan 16 10:39:04 2022 From: rlandy at redhat.com (Ronelle Landy) Date: Sun, 16 Jan 2022 05:39:04 -0500 Subject: [TripleO] Gate blocker - tripleo-ci-centos-9-content-provider Message-ID: Hello All, We have gate/check blocker which started late on Friday - and still seems to be hitting jobs in check and gate: 2022-01-16 02:53:58.796730 | primary | Error: 2022-01-16 02:53:58.796795 | primary | Problem: cannot install the best update candidate for package gcc-11.2.1-6.1.el9.x86_64 2022-01-16 02:53:58.796813 | primary | - nothing provides libgomp = 11.2.1-7.4.el9 needed by gcc-11.2.1-7.4.el9.x86_64 2022-01-16 02:53:58.796828 | primary | - nothing provides libgcc >= 11.2.1-7.4.el9 needed by gcc-11.2.1-7.4.el9.x86_64 2022-01-16 02:53:58.796904 | primary | (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages) https://composes.stream.centos.org/production/latest-CentOS-Stream/compose/BaseOS/x86_64/os/Packages/ does show libgomp-11.2.1-7.4.el9.x86_64.rpm 2022-01-11 11:59 283K but http://mirror.iad3.inmotion.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/Packages/ (for example) does not. It looks like the mirrors do not have this latest content. The failure is logged at: https://bugs.launchpad.net/tripleo/+bug/1957950. We will respond to this posting once the failure has been cleared. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cel975 at yahoo.com Sun Jan 16 11:08:20 2022 From: cel975 at yahoo.com (Celinio Fernandes) Date: Sun, 16 Jan 2022 11:08:20 +0000 (UTC) Subject: Cannot ssh/ping instance In-Reply-To: <2012398446.869565.1642202606158@mail.yahoo.com> References: <869855278.629940.1641716238605.ref@mail.yahoo.com> <869855278.629940.1641716238605@mail.yahoo.com> <5690630.DvuYhMxLoT@p1> <2012398446.869565.1642202606158@mail.yahoo.com> Message-ID: <1103131717.1158277.1642331300997@mail.yahoo.com> Hi, I can ssh into the instance now but I noticed the VM does not have any external network access (internet). Before I dig any deeper into that problem, does anyone know what configuration i need to set up for that ? I already added 2 new security rules to make sure I can access HTTP and HTTPS ports (80 and 443), in vain : Ingress?? IPv4? TCP?? 80 (HTTP)?? 0.0.0.0/0 Ingress?? IPv4? TCP?? 443 (HTTPS)?? 0.0.0.0/0 Thanks. On Saturday, January 15, 2022, 12:29:40 AM GMT+1, Celinio Fernandes wrote: Thanks very much for your help. Before you replied, I tried what you wrote but on the wrong interfaces : enp0s3 and virbr0. I had no idea I needed to add the IP address from the public network's subnet on the br-ex interface. So to ping/ssh the floating IP this is what I did : ip link set dev br-ex up ip link set dev br-ex state up sudo ip addr add 172.24.4.254/24 dev br-ex And then I can finally ping the floating IP : ping 172.24.4.133 And I can also ssh into the VM : ssh cirros at 172.24.4.133 Thanks again :) On Sunday, January 9, 2022, 08:21:18 PM GMT+1, Slawek Kaplonski wrote: Hi, On niedziela, 9 stycznia 2022 09:17:18 CET Celinio Fernandes wrote: > Hi, > I am running Ubuntu Server 20.04 LTS on Virtualbox. I installed OpenStack > (Xena release) through Devstack. Here is the content of my > /opt/stack/devstack/local.conf file : > [[local|localrc]] > ADMIN_PASSWORD=secret > DATABASE_PASSWORD=$ADMIN_PASSWORD > RABBIT_PASSWORD=$ADMIN_PASSWORD > SERVICE_PASSWORD=$ADMIN_PASSWORD > HOST_IP=10.0.2.15 > > > I created an instance through Horizon. The security group contains the > 2 rules needed (one to be able to ping and one to be able to ssh the > instance). I also allocated and associated a floating IP address. And a ssh > key pair. > > Here is the configuration : > openstack server list > ---------------------------------+--------------------------+---------+ > > | ID? | Name | Status | Networks | Image? | Flavor? | > > ---------------------------------+--------------------------+---------+ > > | f5f0fdd5-298b-4fa3-9ee9-e6e4288f4327 | InstanceJanvier | ACTIVE | > | shared=172.24.4.133, 192.168.233.165 | cirros-0.5.2-x86_64-disk | m1.nano > | | > ------------------------------------------------------+ > > > openstack network list : > ------------------------------------------------------+ > > | ID? ? | Name? ? | Subnets? ? ? ? ? ? | > > ------------------------------------------------------+ > > | 96a04799-7fc7-4525-b05c-ad57261aed38 | public? | > | 07ce42db-6f3f-4135-ace7-2fc104ea62a0, 6dba13fc-b10c-48b1-b1b4-e1f1afe25b53 > | | c42638dc-fa56-4644-ad34-295fce4811d2 | shared? | > | a4e2d8cc-02b2-42e2-a525-e0eebbb08980? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > | | ffb8a527-266e-4e96-ad60-f7e9aba8f0c1 | private | > | 42e36677-cf3c-4df4-88a1-8cf79b9d6060, e507e6dd-132a-4249-96b1-83761562dd73 > | | > ------------------------------------------------------+ > > openstack router list : > +--------------------------------------+----------------+--------+------ > > | ID? ? | Name? | Status | State | Project? ? ? ? ? ? ? ? ? ? ? ? ? 
| > > +--------------------------------------+----------------+--------+------ > > | b9a15051-a532-4c93-95ad-53c057720c62 | Virtual_router | ACTIVE | UP? ? | > | 6556c02dd88f4c45b535c2dbb8ba1a04 | > +--------------------------------------+----------------+--------+------ > > > I cannot ping/ssh neither the fixed IP address or the floating IP address : > ping -c 3 172.24.4.133 > PING 172.24.4.133 (172.24.4.133) 56(84) bytes of data. > --- 172.24.4.133 ping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > ping -c 3 192.168.233.165 > PING 192.168.233.165 (192.168.233.165) 56(84) bytes of data. > --- 192.168.233.165 ping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > Maybe that has something to do with the network namespaces configuration on > Ubuntu. Does anyone know what could go wrong or what is missing ? > Thanks for helping. If You are trying to ping Floating IP directly from the host where devstack is installed (Virtualbox VM in Your case IIUC) then You should first have those floating IP addresses somehow reachable on the host, otherwise traffic is probably going through default gateway so is going outside the VM. If You are using ML2/OVN (default in Devstack) or ML2/OVS You probably have in the openvswitch bridge called br-ex which is used to send external network traffic from the OpenStack networks in Devstack. In such case You can e.g. add some IP address from the public network's subnet on the br-ex interface, like 192.168.233.254/24 - that will tell Your OS to reach that subnet through br- ex, so traffic will be able to go "into" the OVS managed by Neutron. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Sun Jan 16 13:26:13 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 16 Jan 2022 13:26:13 +0000 Subject: [TripleO] Gate blocker - tripleo-ci-centos-9-content-provider In-Reply-To: References: Message-ID: <20220116132612.dfjpyacx2oko42zq@yuggoth.org> On 2022-01-16 05:39:04 -0500 (-0500), Ronelle Landy wrote: [...] > https://composes.stream.centos.org/production/latest-CentOS-Stream/compose/BaseOS/x86_64/os/Packages/ > does show libgomp-11.2.1-7.4.el9.x86_64.rpm > > 2022-01-11 11:59 283K but > http://mirror.iad3.inmotion.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/Packages/ > (for example) does not. > > It looks like the mirrors do not have this latest content. [...] Our mirrors pull updates for that from rsync://dfw.mirror.rackspace.com/centos-stream/9-stream/ and last succeeded at 2022-01-14 14:07 UTC according to the timestamp.txt file. I can see in our update logs that, starting at 2022-01-14 16:07 UTC, we've been getting the following error: rsync: failed to connect to dfw.mirror.rackspace.com (74.205.112.120): Connection refused (111) I don't know if that site has ceased providing rsync access or is merely suffering from a temporary problem, but we can switch to pulling updates from another official rsync site if someone can suggest an appropriate change to this file: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/centos-stream-mirror-update -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From chkumar at redhat.com Sun Jan 16 14:14:25 2022 From: chkumar at redhat.com (Chandan Kumar) Date: Sun, 16 Jan 2022 19:44:25 +0530 Subject: [TripleO] Gate blocker - tripleo-ci-centos-9-content-provider In-Reply-To: <20220116132612.dfjpyacx2oko42zq@yuggoth.org> References: <20220116132612.dfjpyacx2oko42zq@yuggoth.org> Message-ID: Hello Jeremy, On Sun, Jan 16, 2022 at 7:03 PM Jeremy Stanley wrote: > > On 2022-01-16 05:39:04 -0500 (-0500), Ronelle Landy wrote: > [...] > > https://composes.stream.centos.org/production/latest-CentOS-Stream/compose/BaseOS/x86_64/os/Packages/ > > does show libgomp-11.2.1-7.4.el9.x86_64.rpm > > > > 2022-01-11 11:59 283K but > > http://mirror.iad3.inmotion.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/Packages/ > > (for example) does not. > > > > It looks like the mirrors do not have this latest content. > [...] > > Our mirrors pull updates for that from > rsync://dfw.mirror.rackspace.com/centos-stream/9-stream/ and last > succeeded at 2022-01-14 14:07 UTC according to the timestamp.txt > file. I can see in our update logs that, starting at > 2022-01-14 16:07 UTC, we've been getting the following error: > > rsync: failed to connect to dfw.mirror.rackspace.com > (74.205.112.120): Connection refused (111) > > I don't know if that site has ceased providing rsync access or is > merely suffering from a temporary problem, but we can switch to > pulling updates from another official rsync site if someone can > suggest an appropriate change to this file: > Thank you for looking into this. Based on https://admin.fedoraproject.org/mirrormanager/mirrors/CentOS/9-stream/x86_64, I have proposed a patch to use facebook rsync mirror to pull cs9 contents: https://review.opendev.org/c/opendev/system-config/+/824829 It might be more reliable, The above missing package is also available on facebook mirror: https://mirror.facebook.net/centos-stream/9-stream/BaseOS/x86_64/os/Packages/libgomp-11.2.1-7.4.el9.x86_64.rpm I hope it will help to unblock the CI. Thanks, Chandan Kumar From fungi at yuggoth.org Sun Jan 16 14:41:20 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 16 Jan 2022 14:41:20 +0000 Subject: [TripleO] Gate blocker - tripleo-ci-centos-9-content-provider In-Reply-To: References: <20220116132612.dfjpyacx2oko42zq@yuggoth.org> Message-ID: <20220116144119.icvk7rgjahdaswz3@yuggoth.org> On 2022-01-16 19:44:25 +0530 (+0530), Chandan Kumar wrote: [...] > Thank you for looking into this. > Based on https://admin.fedoraproject.org/mirrormanager/mirrors/CentOS/9-stream/x86_64, > I have proposed a patch to use facebook rsync mirror to pull cs9 > contents: https://review.opendev.org/c/opendev/system-config/+/824829 > > It might be more reliable, The above missing package is also available > on facebook mirror: > https://mirror.facebook.net/centos-stream/9-stream/BaseOS/x86_64/os/Packages/libgomp-11.2.1-7.4.el9.x86_64.rpm > > I hope it will help to unblock the CI. I've update the mirror successfully from the URL you provided, so please check at your earliest convenience that jobs are finding the packages you expect. Thanks! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From rlandy at redhat.com Sun Jan 16 15:48:42 2022 From: rlandy at redhat.com (Ronelle Landy) Date: Sun, 16 Jan 2022 10:48:42 -0500 Subject: [TripleO] Gate blocker - tripleo-ci-centos-9-content-provider In-Reply-To: <20220116144119.icvk7rgjahdaswz3@yuggoth.org> References: <20220116132612.dfjpyacx2oko42zq@yuggoth.org> <20220116144119.icvk7rgjahdaswz3@yuggoth.org> Message-ID: On Sun, Jan 16, 2022 at 9:46 AM Jeremy Stanley wrote: > On 2022-01-16 19:44:25 +0530 (+0530), Chandan Kumar wrote: > [...] > > Thank you for looking into this. > > Based on > https://admin.fedoraproject.org/mirrormanager/mirrors/CentOS/9-stream/x86_64 > , > > I have proposed a patch to use facebook rsync mirror to pull cs9 > > contents: https://review.opendev.org/c/opendev/system-config/+/824829 > > > > It might be more reliable, The above missing package is also available > > on facebook mirror: > > > https://mirror.facebook.net/centos-stream/9-stream/BaseOS/x86_64/os/Packages/libgomp-11.2.1-7.4.el9.x86_64.rpm > > > > I hope it will help to unblock the CI. > > I've update the mirror successfully from the URL you provided, so > please check at your earliest convenience that jobs are finding the > packages you expect. Thanks! Thanks Chandan and Jeremy for taking action on this so quickly! Rekicked a few jobs and it looks like they are proceeding past the failure point. > > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregory.orange at pawsey.org.au Mon Jan 17 01:13:58 2022 From: gregory.orange at pawsey.org.au (Gregory Orange) Date: Mon, 17 Jan 2022 09:13:58 +0800 Subject: [kolla] [horizon] Custom logos (WAS: [Openstack-operators] Horizon Custom Logos (Queens, 13.0.1)) In-Reply-To: References: <5B7AEF19.5010502@soe.ucsc.edu> <4a74f5c6-5da3-d756-1bd2-5b7fa67ce11a@pawsey.org.au> Message-ID: <89b0b5ba-1449-1533-a9dc-d5aa7aa3d283@pawsey.org.au> On 13/1/22 5:06 pm, Mark Goddard wrote: > On Thu, 13 Jan 2022 at 05:19, Gregory Orange > wrote: >> We have been using Ubuntu VMs for the control plane until now, so it was >> a simple matter of inserting our logo-splash.svg and logo.svg into >> /var/lib/openstack-dashboard/static/dashboard/img/ and then restarting >> services. >> >> Now we're switching to Kolla, and the relevant path isn't mounted as is >> the case with the likes of /etc/kolla/horizon and /var/log/kolla. We >> don't (yet?) build our own container images, so I'm wondering what next. >> > Typically what we do is create a theme repository, e.g. > https://github.com/stackhpc/horizon-theme. This is then built into the > image in/etc/openstack-dashboard/themes/. > > There is another approach proposed which does not involve rebuilding > the image, but it is still WIP: > https://review.opendev.org/c/openstack/kolla-ansible/+/761364 Good to know, thank you. For now I have figured out that 'docker cp'ing the files into place works, although of course that doesn't persist across things like reconfigure runs. Curiously though it does persist with a container restart, even though I didn't `commit` the change to the container image. Cheers, Greg. 
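For anyone else following this thread, a minimal sketch of the image-rebuild approach Mark describes, with the theme baked in (the image name/tag, theme directory and settings below are assumptions - adjust them to your registry, release and theme repo):

  # Dockerfile: extend the stock horizon image with the custom theme
  FROM kolla/ubuntu-source-horizon:victoria
  COPY mytheme/ /etc/openstack-dashboard/themes/mytheme/

Then register the theme in horizon's local_settings (however your deployment currently injects settings), for example:

  AVAILABLE_THEMES = [
      ('default', 'Default', 'themes/default'),
      ('mytheme', 'Our Cloud', '/etc/openstack-dashboard/themes/mytheme'),
  ]
  DEFAULT_THEME = 'mytheme'

and point the horizon image variable in globals.yml at the rebuilt image. Unlike 'docker cp', the theme then survives reconfigure and upgrade runs because it lives in the image rather than in the container's writable layer - which is also why the copied files survive a plain container restart but not a redeploy that recreates the container.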
From zhangbailin at inspur.com Mon Jan 17 01:12:44 2022 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Mon, 17 Jan 2022 01:12:44 +0000 Subject: [cyborg] Proposing core reviewers Message-ID: Hello all, Eric xie has been actively contributing to Cyborg in various areas, adding new features, improving quality, reviewing patches. Despite the relatively short time, he has been one of the most prolific contributors, and brings an enthusiastic and active mindset. I would like to thank and acknowledge him for his steady valuable contributions, and propose him as a core reviewer for Cyborg. Some of the currently listed core reviewers have not been participating for a lengthy period of time. It is proposed that those who have had no contributions for the past 18 months ?C i.e. no participation in meetings, no code contributions, not participating in Cyborg open source activities and no reviews ?C be removed from the list of core reviewers. -- The Cyborg team recognizes everyone's contributions, but we need to ensure the activity of the core-reviewer list. If you are interested in rejoining the cyborg team, feel free to ping us to restore the core reviewer for you. If no objections are made known by January 24, I will make the changes proposed above.. Regards, Brin Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Jan 17 07:46:29 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 17 Jan 2022 08:46:29 +0100 Subject: [Neutron][drivers] Proposing Oleg Bondarev (obondarev) to the Drivers team In-Reply-To: References: Message-ID: <4370820.LvFx2qVVIh@p1> Hi, On pi?tek, 14 stycznia 2022 16:43:55 CET Lajos Katona wrote: > Hi Neutron Drivers, > > I would like to propose Oleg Bondarev to be a member of the Neutron Drivers > team. > He has long experience with Neutron, he has been always around to help with > advice and reviews, and enthusiastically participated in the Drivers > meeting (big +1 as it is on Friday 1400UTC, quite late afternoon in his > timezone :-)). > > Neutron drivers, please vote before the next Drivers meeting (next Friday, > 21. January). > > Best Regards > Lajos Katona (lajoskatona) +1 from me. Oleg will be great addition to the drivers team for sure :) -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From syedammad83 at gmail.com Mon Jan 17 07:53:35 2022 From: syedammad83 at gmail.com (Ammad Syed) Date: Mon, 17 Jan 2022 12:53:35 +0500 Subject: [nova] Instance Even Scheduling Message-ID: Hi, I have 5 compute nodes. When I deploy instances, the most of the instances automatically placed in node 1 or node 2. The other compute nodes remain empty or with one or two instances on it. enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter I have enabled the above filters. How to ensure that instances should be scheduled on compute nodes evenly on all compute hosts based on RAM only ? Like scheduler should schedule the instance on compute host which has a large amount of RAM available then other hosts. - Ammad -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at stackhpc.com Mon Jan 17 09:21:01 2022 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 17 Jan 2022 09:21:01 +0000 Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: <2077696744.396726.1642099801140@mail.yahoo.com> References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> <33441648.1434581.1641304881681@mail.yahoo.com> <385929635.1929303.1642011977053@mail.yahoo.com> <326590098.315301.1642089266574@mail.yahoo.com> <2058295726.372026.1642097389964@mail.yahoo.com> <2077696744.396726.1642099801140@mail.yahoo.com> Message-ID: Drop the double quotes around On Thu, 13 Jan 2022 at 18:55, Albert Braden wrote: > > After reading more I realize that "expires" is also set in ms. So it looks like the correct settings are: > > message-ttl: 60000 > expires: 120000 > > This would expire messages in 10 minutes and queues in 20 minutes. > > The only remaining question is, how can I specify these in a variable without generating the "not a valid message TTL" error? > On Thursday, January 13, 2022, 01:22:33 PM EST, Albert Braden wrote: > > > Update: I googled around and found this: https://tickets.puppetlabs.com/browse/MODULES-2986 > > Apparently the " | int " isn't working. I tried '60000' and "60000" but that didn't make a difference. In desperation I tried this: > > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":60000,"expires":1200}, "priority":0}, > > That works, but I'd prefer to use a variable. Has anyone done this successfully? Also, am I understanding correctly that "message-ttl" is set in milliseconds and "expires" is set in seconds? Or do I need to use ms for "expires" too? > On Thursday, January 13, 2022, 11:03:11 AM EST, Albert Braden wrote: > > > After digging further I realized that I'm not setting TTL; only queue expiration. Here's what I see in the GUI when I look at affected queues: > > Policy notifications-expire > Effective policy definition expires: 1200 > > This is what I have in definitions.json.j2: > > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"expires":1200}, "priority":0}, > > I tried this to set both: > > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":"{{ rabbitmq_message_ttl | int }}","expires":1200}, "priority":0}, Drop the double quotes around the jinja expression. It's not YAML, so you don't need them. Please update the upstream patches with any fixes. 
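In other words, something along these lines in definitions.json.j2 (a sketch only - the variable names here are illustrative, not necessarily the ones used in the proposed patches):

  {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl": {{ rabbitmq_message_ttl | int }}, "expires": {{ rabbitmq_queue_expires | int }}}, "priority": 0},

With the quotes gone the values render as bare JSON numbers, which is what RabbitMQ's "is not a valid message TTL" validation is complaining about when the value arrives as a string. Also note that both message-ttl and expires are in milliseconds, and 60000 ms is one minute - ten minutes of message TTL and twenty minutes of queue expiry would be 600000 and 1200000 respectively.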
> > But the RMQ containers restart every 60 seconds and puke this into the log: > > [error] <0.322.0> CRASH REPORT Process <0.322.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"600\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 > > After reading the doc on TTL: https://www.rabbitmq.com/ttl.html I realized that the TTL is set in ms, so I tried "rabbitmq_message_ttl: 60000" > > but that only changes the number in the error: > > [error] <0.318.0> CRASH REPORT Process <0.318.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"60000\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 > > What am I missing? > > > On Wednesday, January 12, 2022, 05:11:41 PM EST, Dale Smith wrote: > > > In the web interface(RabbitMQ 3.8.23, not using Kolla), when looking at the queue you will see the "Policy" listed by name, and "Effective policy definition". > > You can either view the policy definition, and the arguments for the definitions applied, or "effective policy definition" should show you the list. > > > It may be relevant to note: "Each exchange or queue will have at most one policy matching" - https://www.rabbitmq.com/parameters.html#how-policies-work > > I've added a similar comment to the linked patchset. > > > On 13/01/22 7:26 am, Albert Braden wrote: > > This is very helpful. Thank you! It appears that I have successfully set the expire time to 1200, because I no longer see unconsumed messages lingering in my queues, but it's not obvious how to verify. In the web interface, when I look at the queues, I see things like policy, state, features and consumers, but I don't see a timeout or expire value, nor do I find the number 1200 anywhere. Where should I be looking in the web interface to verify that I set the expire time correctly? Or do I need to use the CLI? > On Wednesday, January 5, 2022, 04:23:29 AM EST, Mark Goddard wrote: > > > On Tue, 4 Jan 2022 at 14:08, Albert Braden wrote: > > > > Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl" ? > > John Garbutt proposed a few patches for RabbitMQ in kolla, including > this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191 > > https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+(status:open+OR+status:merged)+project:openstack/kolla-ansible > > Note that they are currently untested. > > Mark > > > > On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden wrote: > > > > > > I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2: > > > > "policies":[ > > {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, > > {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0} > > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0} > > {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} > > {% endif %} > > > > But I still see unconsumed messages lingering in notifications_extractor.info. 
From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing? > > On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden wrote: > > > > > > Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes? > > > > [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud wrote: > > > > > > So, your config snippet LGTM. > > > > Le ven. 10 d?c. 2021 ? 17:50, Albert Braden a ?crit : > > > > Sorry, that was a transcription error. I thought "True" and my fingers typed "False." The correct lines are: > > > > [oslo_messaging_rabbit] > > amqp_durable_queues = True > > > > On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud wrote: > > > > > > If you plan to let `amqp_durable_queues = False` (i.e if you plan to keep this config equal to false), then you don't need to add these config lines as this is already the default value [1]. > > > > [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 > > > > Le jeu. 9 d?c. 2021 ? 22:40, Albert Braden a ?crit : > > > > Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management. > > > > I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf: > > > > [oslo_messaging_rabbit] > > amqp_durable_queues = False > > > > Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? What would that look like in the config? > > > > > > From: Herve Beraud > > Sent: Thursday, December 9, 2021 2:45 AM > > To: Bogdan Dobrelya > > Cc: openstack-discuss at lists.openstack.org > > Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability > > > > > > > > Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. > > > > > > > > > > > > Le mer. 8 d?c. 2021 ? 11:48, Bogdan Dobrelya a ?crit : > > > > Please see inline > > > > >> I read this with great interest because we are seeing this issue. Questions: > > >> > > >> 1. We are running kola-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? > > >> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like? > > >> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into? > > > > > > Note that even having rabbit HA policies adjusted like that and its HA > > > replication factor [0] decreased (e.g. to a 2), there still might be > > > high churn caused by a large enough number of replicated durable RPC > > > topic queues. And that might cripple the cloud down with the incurred > > > I/O overhead because a durable queue requires all messages in it to be > > > persisted to a disk (for all the messaging cluster replicas) before they > > > are ack'ed by the broker. 
> > > > > > Given that said, Oslo messaging would likely require a more granular > > > control for topic exchanges and the durable queues flag - to tell it to > > > declare as durable only the most critical paths of a service. A single > > > config setting and a single control exchange per a service might be not > > > enough. > > > > Also note that therefore, amqp_durable_queue=True requires dedicated > > control exchanges configured for each service. Those that use > > 'openstack' as a default cannot turn the feature ON. Changing it to a > > service specific might also cause upgrade impact, as described in the > > topic [3]. > > > > > > > > The same is true for `amqp_auto_delete=True`. That requires dedicated control exchanges else it won't work if each service defines its own policy on a shared control exchange (e.g `openstack`) and if policies differ from each other. > > > > > > > > [3] https://review.opendev.org/q/topic:scope-config-opts > > > > > > > > There are also race conditions with durable queues enabled, like [1]. A > > > solution could be where each service declare its own dedicated control > > > exchange with its own configuration. > > > > > > Finally, openstack components should add perhaps a *.next CI job to test > > > it with durable queues, like [2] > > > > > > [0] https://www.rabbitmq.com/ha.html#replication-factor > > > > > > [1] > > > https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt > > > > > > [2] https://review.opendev.org/c/openstack/nova/+/820523 > > > > > >> > > >> Does anyone have a sample set of RMQ config files that they can share? > > >> > > >> It looks like my Outlook has ruined the link; reposting: > > >> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > > > > > > > > -- > > > Best regards, > > > Bogdan Dobrelya, > > > Irc #bogdando > > > > > > -- > > Best regards, > > Bogdan Dobrelya, > > Irc #bogdando > > > > > > > > > > -- > > > > Herv? Beraud > > > > Senior Software Engineer at Red Hat > > > > irc: hberaud > > > > https://github.com/4383/ > > > > https://twitter.com/4383hberaud > > > > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > > > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > From mark at stackhpc.com Mon Jan 17 09:23:08 2022 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 17 Jan 2022 09:23:08 +0000 Subject: [kolla-ansible] Gather facts strange error In-Reply-To: References: Message-ID: Failed to connect to the host via ssh: ssh: connect to host 192.168.75.178 port 22: No route to host On Fri, 14 Jan 2022 at 05:35, Satish Patel wrote: > > Folks, > > I have a working cluster but today when I ran the following command I > got very nasty error like the following. Even 192.168.75.144 is > function node but giving super log error output that i can't copy > paste here. > > $ sudo kolla-ansible -i /etc/kolla/multinode deploy -t nova-cell > --limit 192.168.75.144 > ... > .... > ....... 
> PLAY [Gather facts for all hosts] > ******************************************************************************************************************************************************************** > > TASK [Gather facts] > ********************************************************************************************************************************************************************************** > ok: [192.168.75.144] > > TASK [Group hosts to determine when using --limit] > *************************************************************************************************************************************************** > ok: [192.168.75.144] > > PLAY [Gather facts for all hosts (if using --limit)] > ************************************************************************************************************************************************* > > TASK [Gather facts] > ********************************************************************************************************************************************************************************** > skipping: [192.168.75.144] => (item=192.168.75.144) > ok: [192.168.75.144 -> 192.168.75.145] => (item=192.168.75.145) > ok: [192.168.75.144 -> 192.168.75.146] => (item=192.168.75.146) > failed: [192.168.75.144 -> 192.168.75.178] (item=192.168.75.178) => > {"ansible_loop_var": "item", "item": "192.168.75.178", "msg": "Failed > to connect to the host via ssh: ssh: connect to host 192.168.75.178 > port 22: No route to host", "unreachable": true} > ok: [192.168.75.144 -> 192.168.75.179] => (item=192.168.75.179) > failed: [192.168.75.144 -> 192.168.75.180] (item=192.168.75.180) => > {"ansible_loop_var": "item", "item": "192.168.75.180", "msg": "Failed > to connect to the host via ssh: ssh: connect to host 192.168.75.180 > port 22: No route to host", "unreachable": true} > ok: [192.168.75.144 -> 192.168.75.181] => (item=192.168.75.181) > ok: [192.168.75.144 -> 192.168.75.147] => (item=192.168.75.147) > ok: [192.168.75.144 -> localhost] => (item=localhost) > fatal: [192.168.75.144 -> {{ item }}]: UNREACHABLE! 
=> {"changed": > false, "msg": "All items completed", "results": [{"ansible_loop_var": > "item", "changed": false, "item": "192.168.75.144", "skip_reason": > "Conditional result was False", "skipped": true}, {"ansible_facts": > {"ansible_all_ipv4_addresses": ["192.168.75.145"], > "ansible_all_ipv6_addresses": ["fe80::3eec:efff:fe1f:1776", > "fe80::f802:e3ff:fe71:e58c"], "ansible_apparmor": {"status": > "disabled"}, "ansible_architecture": "x86_64", "ansible_bios_date": > "02/27/2020", "ansible_bios_version": "3.3", "ansible_br_ex": > {"active": false, "device": "br-ex", "features": {"esp_hw_offload": > "off [fixed]", "esp_tx_csum_hw_offload": "off [fixed]", "fcoe_mtu": > "off [fixed]", "generic_receive_offload": "on", > "generic_segmentation_offload": "on", "highdma": "on", > "hw_tc_offload": "off [fixed]", "l2_fwd_offload": "off [fixed]", > "large_receive_offload": "off [fixed]", "loopback": "off [fixed]", > "netns_local": "off [fixed]", "ntuple_filters": "off [fixed]", > "receive_hashing": "off [fixed]", "rx_all": "off [fixed]", > "rx_checksumming": "off [fixed]", "rx_fcs": "off [fixed]", > "rx_gro_hw": "off [fixed]", "rx_gro_list": "off", > "rx_udp_tunnel_port_offload": "off [fixed]", "rx_vlan_filter": "off > [fixed]", "rx_vlan_offload": "off [fixed]", "rx_vlan_stag_filter": > "off [fixed]", "rx_vlan_stag_hw_parse": "off [fixed]", > "scatter_gather": "on", "tcp_segmentation_offload": "on", > "tls_hw_record": "off [fixed]", "tls_hw_rx_offload": "off [fixed]", > > .... > .... > END > > PLAY RECAP ******************************************************************************************************************************************************************************************* > 192.168.75.144 : ok=2 changed=0 unreachable=1 > failed=0 skipped=0 rescued=0 ignored=0 > > Command failed ansible-playbook -i /etc/kolla/multinode -e > @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e > CONFIG_DIR=/etc/kolla --tags nova-cell --limit 192.168.75.144 -e > kolla_action=deploy /usr/local/share/kolla-ansible/ansible/site.yml > From thierry at openstack.org Mon Jan 17 11:00:21 2022 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 17 Jan 2022 12:00:21 +0100 Subject: [largescale-sig] Next meeting: Jan 19th, 15utc Message-ID: <9dceae99-a644-8184-430e-b2dedba0235e@openstack.org> Hi everyone, The Large Scale SIG will be meeting this Wednesday in #openstack-operators on OFTC IRC, at 15UTC. You can doublecheck how that time translates locally at: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20220119T15 We will be finalizing the discussion of our next OpenInfra Live episode. You can add other topics to our agenda at: https://etherpad.openstack.org/p/large-scale-sig-meeting Regards, -- Thierry Carrez From eblock at nde.ag Mon Jan 17 12:57:36 2022 From: eblock at nde.ag (Eugen Block) Date: Mon, 17 Jan 2022 12:57:36 +0000 Subject: [horizon] Missing button "create user" in dashboard Message-ID: <20220117125736.Horde.djFrXoev-vMIrVXie5UnauE@webmail.nde.ag> Hi *, I have a fresh Victoria installation where the "create user" button is missing in the dashboard. Creating users via CLI works fine, also when I go to https://controller/identity/users/create I see the dialog to create a new user. I see the same in an Ussuri installation. The clouds are deployed manually (we have our own Salt mechanism) on baremetal, the OS is openSUSE Leap 15.2. What I have tried so far is to apply the keystonev3_policy.json from [1], but without success. 
There are no other custom policies applied, most configs are on default. There are no obvious errors in the dashboard logs, and before I change too much I wanted to ask for your advice. Can anyone help me out? Please let me know if you need more information. Thanks in advance, Eugen [1] https://bugs.launchpad.net/charm-openstack-dashboard/+bug/1775224 From joykechen at 163.com Mon Jan 17 10:44:00 2022 From: joykechen at 163.com (=?GBK?B?s8K/yw==?=) Date: Mon, 17 Jan 2022 18:44:00 +0800 (GMT+08:00) Subject: [cyborg] Proposing core reviewers In-Reply-To: References: Message-ID: <7a260c87.568b.17e67a2f69f.Coremail.joykechen@163.com> Congratulations! ---- ?????? ---- | ??? | Brin Zhang(???) | | ?? | 2022?01?17? 09:12 | | ??? | openstack-discuss at lists.openstack.org | | ??? | xin-ran.wang at intel.com?Alex Song (???)?huangzhipeng at huawei.com?liliueecg at gmail.com?shogo.saito.ac at hco.ntt.co.jp?sundar.nadathur at intel.com?yumeng_bao at yahoo.com?chen.ke14 at zte.com.cn?419546439 at qq.com<419546439 at qq.com>?shaohe.feng at intel.com?wangzhengh at chinatelecom.cn?zhuli2317 at gmail.com | | ?? | [cyborg] Proposing core reviewers | Hello all, Eric xie has been actively contributing to Cyborg in various areas, adding new features, improving quality, reviewing patches. Despite the relatively short time, he has been one of the most prolific contributors, and brings an enthusiastic and active mindset. I would like to thank and acknowledge him for his steady valuable contributions, and propose him as a core reviewer for Cyborg. Some of the currently listed core reviewers have not been participating for a lengthy period of time. It is proposed that those who have had no contributions for the past 18 months ? i.e. no participation in meetings, no code contributions, not participating in Cyborg open source activities and no reviews ? be removed from the list of core reviewers. -- The Cyborg team recognizes everyone's contributions, but we need to ensure the activity of the core-reviewer list. If you are interested in rejoining the cyborg team, feel free to ping us to restore the core reviewer for you. If no objections are made known by January 24, I will make the changes proposed above.. Regards, Brin Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: From strigazi at gmail.com Mon Jan 17 14:37:09 2022 From: strigazi at gmail.com (Spyros Trigazis) Date: Mon, 17 Jan 2022 15:37:09 +0100 Subject: [magnum] New meeting Wednesday Jan 19 09:00 AM UTC 2022 #openstack-containers Message-ID: Hello all, Let's restart the magnum team weekly meetings. The next meeting will happen on Wed Jan 19 09:00: AM UTC 2022 https://www.timeanddate.com/worldclock/converter.html?iso=20220119T090000&p1=1440 We can keep track of the meeting agenda in: https://etherpad.opendev.org/p/magnum-weekly-meeting Cheers, Spyros -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlibosva at redhat.com Mon Jan 17 15:37:28 2022 From: jlibosva at redhat.com (Jakub Libosvar) Date: Mon, 17 Jan 2022 10:37:28 -0500 Subject: [neutron] Bug Deputy Report January 10 - 107 Message-ID: <36792dba-6ce8-cad2-b9a0-898d74073d94@redhat.com> Hi, I was bug deputy last week, there are no bugs that require immediate attention. 
Here is the week summary Critical -------- - [tempest ] neutron-tempest-plugin-dynamic jobs are failing - https://bugs.launchpad.net/neutron/+bug/1957453 - Unassigned - As per Lajos it's fixed in os-ken - https://review.opendev.org/q/58a39392397cd01e174c4736fa31a9a5617f1ff0 High ---- - Functional tests for HA routers fails due to router transitioned to FAULT state - https://bugs.launchpad.net/neutron/+bug/1956958 - Assigned to Slawek - [FT] Test "test_port_dhcp_options" failing - https://bugs.launchpad.net/neutron/+bug/1956965 - Unassigned - Regular user can remove qos from a port despite the policy - https://bugs.launchpad.net/neutron/+bug/1957175 - Assigned to Yatin - OVS: except MappingNotFound instead of KeyError - https://bugs.launchpad.net/neutron/+bug/1957931 - Unassigned - Patch: https://review.opendev.org/c/openstack/neutron/+/824727 - test_network_basic_ops failing on centos-8-stream job - https://bugs.launchpad.net/neutron/+bug/1957941 - Unassigned - fixed in packaging but needs repo update Medium ------ - Wrong ACTIVE status of subports attached to a trunk whose parent is DOWN - https://bugs.launchpad.net/neutron/+bug/1957161 - patch: https://review.opendev.org/c/openstack/neutron/+/824378 - Assigned to Luis - NAT reflection with OVN on xena not working - https://bugs.launchpad.net/neutron/+bug/1957185 - Unassigned - DVR Router Update Error - https://bugs.launchpad.net/neutron/+bug/1957189 - Unassigned - qrouter ns leak while last service port delete because of router gw port - https://bugs.launchpad.net/neutron/+bug/1957794 - Assigned to Krzysztof - Patch: https://review.opendev.org/c/openstack/neutron/+/824008 - ha backup router ipv6 accept_ra broken - https://bugs.launchpad.net/neutron/+bug/1958149 - Unassigned Low --- - Neutron l3 agent keeps restarting (Ubuntu) - https://bugs.launchpad.net/neutron/+bug/1958128 - Unassigned - Related to fwaas and docs Needs more info --------------- - Incorrect openstack xena neutron install repo in documentation - https://bugs.launchpad.net/neutron/+bug/1956847 - Unassigned - Requires input from the reporter From katonalala at gmail.com Mon Jan 17 16:14:51 2022 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 17 Jan 2022 17:14:51 +0100 Subject: Can neutron-fwaas project be revived? In-Reply-To: References: <87f786a0f0714c0a89fb7097e965b1d6@inspur.com> <035f05bc97204972bb975c962d47507c@inspur.com> Message-ID: Hi, Neutron team discussed this question on the Drivers meeting, see the logs: https://meetings.opendev.org/meetings/neutron_drivers/2022/neutron_drivers.2022-01-14-14.01.log.html The agreement is to move neutron-fwaas to x/ namespace, and revive it as x/neutron-fwaas. Best regards Lajos Katona (lajoskatona) Dazhong Qin (???)-??????? ezt ?rta (id?pont: 2022. jan. 6., Cs, 2:30): > Hi?Miguel? > > > > Ok?let?s meet at January 14th. > > > > Best regards > > > > > > *???:* Miguel Lavalle [mailto:miguel at mlavalle.com] > *????:* 2022?1?6? 9:12 > *???:* Dazhong Qin (???)-??????? > *??:* openstack-discuss at lists.openstack.org > *??:* Re: Can neutron-fwaas project be revived? > > > > Hi Qin, > > > > Unfortunately, this coming January 7th several members of the drivers team > will be off on holiday. We won't have a quorum to discuss your proposal. I > hope that January 14th works for you and your team. 
> > > > Best regards > > > > Miguel > > > > On Fri, Dec 24, 2021 at 10:18 AM Miguel Lavalle > wrote: > > Hi Qin, > > > > I have added this topic to the drivers meeting agenda (see on demand > agenda close to the bottom): > https://wiki.openstack.org/wiki/Meetings/NeutronDrivers > > > > Cheers > > > > On Thu, Dec 23, 2021 at 7:42 PM Dazhong Qin (???)-??????? < > qinhaizhong01 at inspur.com> wrote: > > Hi Miguel, > > Thank you for your suggestion. My colleague HengZhou will submit relevant > documents as soon as possible in accordance with the official neutron rules. > > Yes?we will attend the neutron drivers meeting on January 7th. > > Merry Christmas! > > Best wish for you! > > > > *???:* Miguel Lavalle [mailto:miguel at mlavalle.com] > *????:* 2021?12?24? 0:43 > *???:* Dazhong Qin (???)-??????? > *??:* openstack-discuss at lists.openstack.org > *??:* Re: Can neutron-fwaas project be revived? > > > > Hi Qin, > > > > In preparation for your meeting with the drivers team, I suggest we follow > as a starting point the Neutron Stadium Governance rules and processes as > outlined in the official documentation: > https://docs.openstack.org/neutron/latest/contributor/stadium/governance.html. > In the past, we have re-incorporated projects to the Stadium, like for > example in the case of neutron-vpnaas. This document in the Neutron specs > repo summarizes how we assessed the readiness of vpnaas for the stadium: > https://specs.openstack.org/openstack/neutron-specs/specs/stadium/queens/neutron-vpnaas.html > (https://review.opendev.org/c/openstack/neutron-specs/+/506012). I > suggest you start a similar document for fwaas in the folder for the > current cycle: > https://specs.openstack.org/openstack/neutron-specs/specs/yoga/index.html. > As soon as you can, please push it to gerrit, so we can start reviewing it. > > > > Did I understand correctly that you will attend the drivers meeting on > January 7th? > > > > Best regards > > > > Miguel > > > > > > On Wed, Dec 22, 2021 at 8:09 PM Dazhong Qin (???)-??????? < > qinhaizhong01 at inspur.com> wrote: > > Hi Miguel, > > I am glad to hear this news. How about our discussion on January 7th, this > Friday is not convenient, what do I need to prepare before the discussion, > do I need to submit rfe or other descriptions? > > > > *???:* Miguel Lavalle [mailto:miguel at mlavalle.com] > *????:* 2021?12?23? 0:20 > *???:* Dazhong Qin (???)-??????? > *??:* openstack-discuss at lists.openstack.org > *??:* Re: Can neutron-fwaas project be revived? > > > > Hi Qin, > > > > I think that in principle the community will be delighted if you and your > team can reactivate the project and maintain it. Probably the best next > step is for you to attend the next Neutron drivers meeting ( > https://wiki.openstack.org/wiki/Meetings/NeutronDrivers) so we > can discuss the specifics of your proposal. This meeting takes place on > Fridays at 1400 UTC over IRC in oftc.net, channel #openstack-neutron. Due > to the end of year festivities in much of Europe and America, the next > meeting will take place until January 7th. Is that a good next step for > you? If yes, I'll add this topic to the meeting's agenda. > > > > Best regards > > > > On Tue, Dec 21, 2021 at 10:29 AM Dazhong Qin (???)-??????? < > qinhaizhong01 at inspur.com> wrote: > > Hi? > > The firewall project is a necessary function when the project is > delivered. The lack of firewall function after switching OVN is not > acceptable to customers. 
We intend to maintain this project and develop the > fwaas driver based on ovn. Whether the neutron-fwaas project can be > reactivate? What should I do ? > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tonyliu0592 at hotmail.com Mon Jan 17 16:35:46 2022 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Mon, 17 Jan 2022 16:35:46 +0000 Subject: [nova] Instance Even Scheduling In-Reply-To: References: Message-ID: https://docs.openstack.org/nova/latest/admin/scheduling.html Filter gives you a group of valid hosts, assuming they are equally weighted, you may try with these two settings to pick up a host in a more even manner. host_subset_size (increase the size) shuffle_best_same_weighed_hosts (enable the shuffle) https://docs.openstack.org/nova/latest/configuration/config.html Tony ________________________________________ From: Ammad Syed Sent: January 16, 2022 11:53 PM To: openstack-discuss Subject: [nova] Instance Even Scheduling Hi, I have 5 compute nodes. When I deploy instances, the most of the instances automatically placed in node 1 or node 2. The other compute nodes remain empty or with one or two instances on it. enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter I have enabled the above filters. How to ensure that instances should be scheduled on compute nodes evenly on all compute hosts based on RAM only ? Like scheduler should schedule the instance on compute host which has a large amount of RAM available then other hosts. - Ammad From doug at stackhpc.com Mon Jan 17 17:01:57 2022 From: doug at stackhpc.com (Doug Szumski) Date: Mon, 17 Jan 2022 17:01:57 +0000 Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> <33441648.1434581.1641304881681@mail.yahoo.com> <385929635.1929303.1642011977053@mail.yahoo.com> <326590098.315301.1642089266574@mail.yahoo.com> <2058295726.372026.1642097389964@mail.yahoo.com> <2077696744.396726.1642099801140@mail.yahoo.com> Message-ID: On 17/01/2022 09:21, Mark Goddard wrote: > Drop the double quotes around > > On Thu, 13 Jan 2022 at 18:55, Albert Braden wrote: >> After reading more I realize that "expires" is also set in ms. So it looks like the correct settings are: >> >> message-ttl: 60000 >> expires: 120000 >> >> This would expire messages in 10 minutes and queues in 20 minutes. >> >> The only remaining question is, how can I specify these in a variable without generating the "not a valid message TTL" error? >> On Thursday, January 13, 2022, 01:22:33 PM EST, Albert Braden wrote: >> >> >> Update: I googled around and found this: https://tickets.puppetlabs.com/browse/MODULES-2986 >> >> Apparently the " | int " isn't working. I tried '60000' and "60000" but that didn't make a difference. In desperation I tried this: >> >> {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":60000,"expires":1200}, "priority":0}, >> >> That works, but I'd prefer to use a variable. Has anyone done this successfully? 
Also, am I understanding correctly that "message-ttl" is set in milliseconds and "expires" is set in seconds? Or do I need to use ms for "expires" too? >> On Thursday, January 13, 2022, 11:03:11 AM EST, Albert Braden wrote: >> >> >> After digging further I realized that I'm not setting TTL; only queue expiration. Here's what I see in the GUI when I look at affected queues: >> >> Policy notifications-expire >> Effective policy definition expires: 1200 >> >> This is what I have in definitions.json.j2: >> >> {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"expires":1200}, "priority":0}, >> >> I tried this to set both: >> >> {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":"{{ rabbitmq_message_ttl | int }}","expires":1200}, "priority":0}, > Drop the double quotes around the jinja expression. It's not YAML, so > you don't need them. > > Please update the upstream patches with any fixes. > >> But the RMQ containers restart every 60 seconds and puke this into the log: >> >> [error] <0.322.0> CRASH REPORT Process <0.322.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"600\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 >> >> After reading the doc on TTL: https://www.rabbitmq.com/ttl.html I realized that the TTL is set in ms, so I tried "rabbitmq_message_ttl: 60000" >> >> but that only changes the number in the error: >> >> [error] <0.318.0> CRASH REPORT Process <0.318.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"60000\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 >> >> What am I missing? >> >> >> On Wednesday, January 12, 2022, 05:11:41 PM EST, Dale Smith wrote: >> >> >> In the web interface(RabbitMQ 3.8.23, not using Kolla), when looking at the queue you will see the "Policy" listed by name, and "Effective policy definition". >> >> You can either view the policy definition, and the arguments for the definitions applied, or "effective policy definition" should show you the list. >> >> >> It may be relevant to note: "Each exchange or queue will have at most one policy matching" - https://www.rabbitmq.com/parameters.html#how-policies-work >> >> I've added a similar comment to the linked patchset. >> >> >> On 13/01/22 7:26 am, Albert Braden wrote: >> >> This is very helpful. Thank you! It appears that I have successfully set the expire time to 1200, because I no longer see unconsumed messages lingering in my queues, but it's not obvious how to verify. In the web interface, when I look at the queues, I see things like policy, state, features and consumers, but I don't see a timeout or expire value, nor do I find the number 1200 anywhere. Where should I be looking in the web interface to verify that I set the expire time correctly? Or do I need to use the CLI? >> On Wednesday, January 5, 2022, 04:23:29 AM EST, Mark Goddard wrote: >> >> >> On Tue, 4 Jan 2022 at 14:08, Albert Braden wrote: >>> Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl" ? 
>> John Garbutt proposed a few patches for RabbitMQ in kolla, including >> this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191 >> >> https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+(status:open+OR+status:merged)+project:openstack/kolla-ansible >> >> Note that they are currently untested. I've proposed one more as an alternative to reducing the number of queue mirrors (disable all mirroring): https://review.opendev.org/c/openstack/kolla-ansible/+/824994 The reasoning behind it is in the commit message. It's partly justified by the fact that we quite frequently have to 'reset' RabbitMQ with the current transient mirrored configuration by removing all state anyway. >> >> Mark >> >> >>> On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden wrote: >>> >>> >>> I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2: >>> >>> "policies":[ >>> {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, >>> {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0} >>> {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0} >>> {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} >>> {% endif %} >>> >>> But I still see unconsumed messages lingering in notifications_extractor.info. From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing? >>> On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden wrote: >>> >>> >>> Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes? >>> >>> [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit >>> On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud wrote: >>> >>> >>> So, your config snippet LGTM. >>> >>> Le ven. 10 d?c. 2021 ? 17:50, Albert Braden a ?crit : >>> >>> Sorry, that was a transcription error. I thought "True" and my fingers typed "False." The correct lines are: >>> >>> [oslo_messaging_rabbit] >>> amqp_durable_queues = True >>> >>> On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud wrote: >>> >>> >>> If you plan to let `amqp_durable_queues = False` (i.e if you plan to keep this config equal to false), then you don't need to add these config lines as this is already the default value [1]. >>> >>> [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 >>> >>> Le jeu. 9 d?c. 2021 ? 22:40, Albert Braden a ?crit : >>> >>> Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management. >>> >>> I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf: >>> >>> [oslo_messaging_rabbit] >>> amqp_durable_queues = False >>> >>> Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? 
What would that look like in the config? >>> >>> >>> From: Herve Beraud >>> Sent: Thursday, December 9, 2021 2:45 AM >>> To: Bogdan Dobrelya >>> Cc: openstack-discuss at lists.openstack.org >>> Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability >>> >>> >>> >>> Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. >>> >>> >>> >>> >>> >>> Le mer. 8 d?c. 2021 ? 11:48, Bogdan Dobrelya a ?crit : >>> >>> Please see inline >>> >>>>> I read this with great interest because we are seeing this issue. Questions: >>>>> >>>>> 1. We are running kola-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? >>>>> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like? >>>>> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into? >>>> Note that even having rabbit HA policies adjusted like that and its HA >>>> replication factor [0] decreased (e.g. to a 2), there still might be >>>> high churn caused by a large enough number of replicated durable RPC >>>> topic queues. And that might cripple the cloud down with the incurred >>>> I/O overhead because a durable queue requires all messages in it to be >>>> persisted to a disk (for all the messaging cluster replicas) before they >>>> are ack'ed by the broker. >>>> >>>> Given that said, Oslo messaging would likely require a more granular >>>> control for topic exchanges and the durable queues flag - to tell it to >>>> declare as durable only the most critical paths of a service. A single >>>> config setting and a single control exchange per a service might be not >>>> enough. >>> Also note that therefore, amqp_durable_queue=True requires dedicated >>> control exchanges configured for each service. Those that use >>> 'openstack' as a default cannot turn the feature ON. Changing it to a >>> service specific might also cause upgrade impact, as described in the >>> topic [3]. >>> >>> >>> >>> The same is true for `amqp_auto_delete=True`. That requires dedicated control exchanges else it won't work if each service defines its own policy on a shared control exchange (e.g `openstack`) and if policies differ from each other. >>> >>> >>> >>> [3] https://review.opendev.org/q/topic:scope-config-opts >>> >>>> There are also race conditions with durable queues enabled, like [1]. A >>>> solution could be where each service declare its own dedicated control >>>> exchange with its own configuration. >>>> >>>> Finally, openstack components should add perhaps a *.next CI job to test >>>> it with durable queues, like [2] >>>> >>>> [0] https://www.rabbitmq.com/ha.html#replication-factor >>>> >>>> [1] >>>> https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt >>>> >>>> [2] https://review.opendev.org/c/openstack/nova/+/820523 >>>> >>>>> Does anyone have a sample set of RMQ config files that they can share? 
>>>>> >>>>> It looks like my Outlook has ruined the link; reposting: >>>>> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit >>>> >>>> -- >>>> Best regards, >>>> Bogdan Dobrelya, >>>> Irc #bogdando >>> >>> -- >>> Best regards, >>> Bogdan Dobrelya, >>> Irc #bogdando >>> >>> >>> >>> >>> -- >>> >>> Herv? Beraud >>> >>> Senior Software Engineer at Red Hat >>> >>> irc: hberaud >>> >>> https://github.com/4383/ >>> >>> https://twitter.com/4383hberaud >>> >>> >>> >>> -- >>> Herv? Beraud >>> Senior Software Engineer at Red Hat >>> irc: hberaud >>> https://github.com/4383/ >>> https://twitter.com/4383hberaud >>> >>> >>> >>> -- >>> Herv? Beraud >>> Senior Software Engineer at Red Hat >>> irc: hberaud >>> https://github.com/4383/ >>> https://twitter.com/4383hberaud >>> From fungi at yuggoth.org Mon Jan 17 17:02:01 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 17 Jan 2022 17:02:01 +0000 Subject: Can neutron-fwaas project be revived? In-Reply-To: References: <87f786a0f0714c0a89fb7097e965b1d6@inspur.com> <035f05bc97204972bb975c962d47507c@inspur.com> Message-ID: <20220117170200.gdx2czhmjpcn5bxa@yuggoth.org> On 2022-01-17 17:14:51 +0100 (+0100), Lajos Katona wrote: [...] > The agreement is to move neutron-fwaas to x/ namespace, and revive > it as x/neutron-fwaas. [...] Which technically won't be done as a "move" because of https://governance.openstack.org/tc/resolutions/20190711-mandatory-repository-retirement.html but can still be forked in x/ while the original in openstack/ is deprecated and eventually retired. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From smooney at redhat.com Mon Jan 17 17:06:40 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 17 Jan 2022 17:06:40 +0000 Subject: [nova] Instance Even Scheduling In-Reply-To: References: Message-ID: On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote: > https://docs.openstack.org/nova/latest/admin/scheduling.html > > Filter gives you a group of valid hosts, assuming they are equally weighted, > you may try with these two settings to pick up a host in a more even manner. > host_subset_size (increase the size) > shuffle_best_same_weighed_hosts (enable the shuffle) > > https://docs.openstack.org/nova/latest/configuration/config.html yes the weighers are what will blance between the hosts and the filters determin which host are valid so if you want to spread based on ram then you need to adject the https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.ram_weight_multiplier for example set ram_weight_multiplier=10.0 to make it relitivly more important. the way the weigher work is all wheigher calulate the weight for a host, we then add them after multiplying them by the weights and then sort. > > > Tony > ________________________________________ > From: Ammad Syed > Sent: January 16, 2022 11:53 PM > To: openstack-discuss > Subject: [nova] Instance Even Scheduling > > Hi, > > I have 5 compute nodes. When I deploy instances, the most of the instances automatically placed in node 1 or node 2. The other compute nodes remain empty or with one or two instances on it. > > enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter > > I have enabled the above filters. 
How to ensure that instances should be scheduled on compute nodes evenly on all compute hosts based on RAM only ? Like scheduler should schedule the instance on compute host which has a large amount of RAM available then other hosts. > > - Ammad > > From mkopec at redhat.com Mon Jan 17 17:29:09 2022 From: mkopec at redhat.com (Martin Kopec) Date: Mon, 17 Jan 2022 18:29:09 +0100 Subject: [heat-tempest-plugin] call for help from plugin's maintainers Message-ID: Hello, most of the tests of heat-tempest-plugin have started failing. We noticed that in interop [1], however, we reproduced that in the project's gates as well [2]. I suspect it might be an issue with the new gabbi package - there has been an update recently. Could you please have a look. [1] https://review.opendev.org/c/openinfra/ansible-role-refstack-client/+/824832 [2] https://review.opendev.org/c/openstack/heat-tempest-plugin/+/823794 Thank you, -- Martin Kopec Senior Software Quality Engineer Red Hat EMEA -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Mon Jan 17 17:38:12 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 17 Jan 2022 11:38:12 -0600 Subject: [all][tc] Technical Committee next weekly meeting on Jan 20th at 1500 UTC Message-ID: <17e691e2e00.111c2bdad727170.2468892074465147644@ghanshyammann.com> Hello Everyone, Technical Committee's next weekly meeting is scheduled for Jan 20th at 1500 UTC. If you would like to add topics for discussion, please add them to the below wiki page by Wednesday, Jan 19th, at 2100 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting -gmann From satish.txt at gmail.com Mon Jan 17 17:41:33 2022 From: satish.txt at gmail.com (Satish Patel) Date: Mon, 17 Jan 2022 12:41:33 -0500 Subject: Tesla V100 32G GPU with openstack Message-ID: <7CCE77EA-F7EA-45AF-85B5-3566D1DAB1CB@gmail.com> Folk, We have Tesla V100 32G GPU and I?m trying to configure with openstack wallaby. This is first time dealing with GPU so I have couple of question. 1. What is the difference between passthrough vs vGPU? I did google but not very clear yet. 2. If I configure it passthrough then does it only work with single VM ? ( I meant whole GPU will get allocate to single VM correct? 3. Also some document saying Tesla v100 support vGPU but some folks saying you need license. I have no idea where to get that license. What is the deal here? 3. What are the config difference between configure this card with passthrough vs vGPU? Currently I configure it with passthrough based one one article and I am able to spun up with and I can see nvidia card exposed to vm. (I used iommu and vfio based driver) so if this card support vGPU then do I need iommu and vfio or some other driver to make it virtualize ? Sent from my iPhone From tonyliu0592 at hotmail.com Mon Jan 17 17:45:38 2022 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Mon, 17 Jan 2022 17:45:38 +0000 Subject: [nova] Instance Even Scheduling In-Reply-To: References: Message-ID: I recall weight didn't work as what I expected, that's why I used shuffle_best_same_weighed_hosts. Here is what I experienced. With Ussuri and default Nova scheduling settings. All weighers are supposed to be enabled and all multipliers are positive. On 10x empty compute nodes with the same spec, say the first vm is created on compute-2. Because some memory and vCPU are consumed, the second vm should be created on some node other than compute-2, if weighers are working fine. 
But it's still created on compute-2, until I increased host_subset_size and enable shuffle_best_same_weighed_hosts. It seems that all compute nodes are equally weighted, although they don't have the same amount of resource. Am I missing anything there? Thanks! Tony ________________________________________ From: Sean Mooney Sent: January 17, 2022 09:06 AM To: openstack-discuss at lists.openstack.org Subject: Re: [nova] Instance Even Scheduling On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote: > https://docs.openstack.org/nova/latest/admin/scheduling.html > > Filter gives you a group of valid hosts, assuming they are equally weighted, > you may try with these two settings to pick up a host in a more even manner. > host_subset_size (increase the size) > shuffle_best_same_weighed_hosts (enable the shuffle) > > https://docs.openstack.org/nova/latest/configuration/config.html yes the weighers are what will blance between the hosts and the filters determin which host are valid so if you want to spread based on ram then you need to adject the https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.ram_weight_multiplier for example set ram_weight_multiplier=10.0 to make it relitivly more important. the way the weigher work is all wheigher calulate the weight for a host, we then add them after multiplying them by the weights and then sort. > > > Tony > ________________________________________ > From: Ammad Syed > Sent: January 16, 2022 11:53 PM > To: openstack-discuss > Subject: [nova] Instance Even Scheduling > > Hi, > > I have 5 compute nodes. When I deploy instances, the most of the instances automatically placed in node 1 or node 2. The other compute nodes remain empty or with one or two instances on it. > > enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter > > I have enabled the above filters. How to ensure that instances should be scheduled on compute nodes evenly on all compute hosts based on RAM only ? Like scheduler should schedule the instance on compute host which has a large amount of RAM available then other hosts. > > - Ammad > > From smooney at redhat.com Mon Jan 17 17:57:04 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 17 Jan 2022 17:57:04 +0000 Subject: [nova] Instance Even Scheduling In-Reply-To: References: Message-ID: <4bc575f1c56c93b79677f8fc535049fa7b99a976.camel@redhat.com> On Mon, 2022-01-17 at 17:45 +0000, Tony Liu wrote: > I recall weight didn't work as what I expected, that's why I used > shuffle_best_same_weighed_hosts. > > Here is what I experienced. > With Ussuri and default Nova scheduling settings. All weighers are supposed > to be enabled and all multipliers are positive. > yes by default all weighers are enabled and the shcduler spreads by default. > On 10x empty compute nodes > with the same spec, say the first vm is created on compute-2. Because some > memory and vCPU are consumed, the second vm should be created on some > node other than compute-2, if weighers are working fine. But it's still created > on compute-2, until I increased host_subset_size and enable shuffle_best_same_weighed_hosts. > i would guess that either the disk weigher or failed build wiehter is likely what results in teh behaivor different the default behavior is still to speread. 
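For reference, the options discussed in this thread all live in the [filter_scheduler] section of nova.conf on the scheduler nodes. A minimal sketch (values are illustrative, not recommendations) that biases placement toward hosts with the most free RAM and randomises ties:

[filter_scheduler]
# weigh free RAM more heavily so emptier hosts win
ram_weight_multiplier = 10.0
# with a shared Ceph backend every host reports the same free disk,
# so the disk weigher adds no useful signal
disk_weight_multiplier = 0.0
# pick randomly among the top 5 hosts instead of always taking the single best
host_subset_size = 5
# shuffle hosts that end up with identical total weights
shuffle_best_same_weighed_hosts = true
# hosts with recent failed builds are penalised by a very large default
# multiplier (1000000.0); lower it if old failures should not dominate
# build_failure_weight_multiplier = 0.0

In the scheduler debug output each weigher's normalised score is multiplied by its multiplier and the results are summed, so a single very large multiplier, such as the build-failure one, can swamp the RAM and instance-count weighers; large negative totals in the weighed-host log lines are usually a sign of that.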
before assuming there is a but you shoudl enable the schduler in debug mode to look at the weighters that are assinged to each host and determin why you are seeing differnt behavior. shuffle_best_same_weighed_hosts does as the name suggest. it shuffles the result if and only if there is a tie. that means it will only have a effect if 2 hosts were judged by thge weigher as beeing equally good candiates. host_subset_size instalead of looking at only the top host in the list enables you to consider the top n hosts. host_subset_size does a random selection from the host_subset_size top element after the hosts are sorted by the weighers intentionlaly adding randomness to the selection. this should not be needed in general. > It seems that all compute nodes are equally > weighted, although they don't have the same amount of resource. > Am I missing anything there? > > Thanks! > Tony > ________________________________________ > From: Sean Mooney > Sent: January 17, 2022 09:06 AM > To: openstack-discuss at lists.openstack.org > Subject: Re: [nova] Instance Even Scheduling > > On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote: > > https://docs.openstack.org/nova/latest/admin/scheduling.html > > > > Filter gives you a group of valid hosts, assuming they are equally weighted, > > you may try with these two settings to pick up a host in a more even manner. > > host_subset_size (increase the size) > > shuffle_best_same_weighed_hosts (enable the shuffle) > > > > https://docs.openstack.org/nova/latest/configuration/config.html > > yes the weighers are what will blance between the hosts and the filters determin which host are valid > so if you want to spread based on ram then you need to adject the > https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.ram_weight_multiplier > > for example set ram_weight_multiplier=10.0 to make it relitivly more important. > the way the weigher work is all wheigher calulate the weight for a host, > we then add them after multiplying them by the weights and then sort. > > > > > > > > Tony > > ________________________________________ > > From: Ammad Syed > > Sent: January 16, 2022 11:53 PM > > To: openstack-discuss > > Subject: [nova] Instance Even Scheduling > > > > Hi, > > > > I have 5 compute nodes. When I deploy instances, the most of the instances automatically placed in node 1 or node 2. The other compute nodes remain empty or with one or two instances on it. > > > > enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter > > > > I have enabled the above filters. How to ensure that instances should be scheduled on compute nodes evenly on all compute hosts based on RAM only ? Like scheduler should schedule the instance on compute host which has a large amount of RAM available then other hosts. > > > > - Ammad > > > > > > > > From tonyliu0592 at hotmail.com Mon Jan 17 18:11:36 2022 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Mon, 17 Jan 2022 18:11:36 +0000 Subject: [nova] Instance Even Scheduling In-Reply-To: <4bc575f1c56c93b79677f8fc535049fa7b99a976.camel@redhat.com> References: <4bc575f1c56c93b79677f8fc535049fa7b99a976.camel@redhat.com> Message-ID: That disk weigher is a good point. I am using Ceph as the storage backend for all compute nodes. Disk weigher may not handle that properly and cause some failure. Anyways, I will enable debug and look into more details. Thanks! 
Tony ________________________________________ From: Sean Mooney Sent: January 17, 2022 09:57 AM To: Tony Liu; openstack-discuss at lists.openstack.org Subject: Re: [nova] Instance Even Scheduling On Mon, 2022-01-17 at 17:45 +0000, Tony Liu wrote: > I recall weight didn't work as what I expected, that's why I used > shuffle_best_same_weighed_hosts. > > Here is what I experienced. > With Ussuri and default Nova scheduling settings. All weighers are supposed > to be enabled and all multipliers are positive. > yes by default all weighers are enabled and the shcduler spreads by default. > On 10x empty compute nodes > with the same spec, say the first vm is created on compute-2. Because some > memory and vCPU are consumed, the second vm should be created on some > node other than compute-2, if weighers are working fine. But it's still created > on compute-2, until I increased host_subset_size and enable shuffle_best_same_weighed_hosts. > i would guess that either the disk weigher or failed build wiehter is likely what results in teh behaivor different the default behavior is still to speread. before assuming there is a but you shoudl enable the schduler in debug mode to look at the weighters that are assinged to each host and determin why you are seeing differnt behavior. shuffle_best_same_weighed_hosts does as the name suggest. it shuffles the result if and only if there is a tie. that means it will only have a effect if 2 hosts were judged by thge weigher as beeing equally good candiates. host_subset_size instalead of looking at only the top host in the list enables you to consider the top n hosts. host_subset_size does a random selection from the host_subset_size top element after the hosts are sorted by the weighers intentionlaly adding randomness to the selection. this should not be needed in general. > It seems that all compute nodes are equally > weighted, although they don't have the same amount of resource. > Am I missing anything there? > > Thanks! > Tony > ________________________________________ > From: Sean Mooney > Sent: January 17, 2022 09:06 AM > To: openstack-discuss at lists.openstack.org > Subject: Re: [nova] Instance Even Scheduling > > On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote: > > https://docs.openstack.org/nova/latest/admin/scheduling.html > > > > Filter gives you a group of valid hosts, assuming they are equally weighted, > > you may try with these two settings to pick up a host in a more even manner. > > host_subset_size (increase the size) > > shuffle_best_same_weighed_hosts (enable the shuffle) > > > > https://docs.openstack.org/nova/latest/configuration/config.html > > yes the weighers are what will blance between the hosts and the filters determin which host are valid > so if you want to spread based on ram then you need to adject the > https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.ram_weight_multiplier > > for example set ram_weight_multiplier=10.0 to make it relitivly more important. > the way the weigher work is all wheigher calulate the weight for a host, > we then add them after multiplying them by the weights and then sort. > > > > > > > > Tony > > ________________________________________ > > From: Ammad Syed > > Sent: January 16, 2022 11:53 PM > > To: openstack-discuss > > Subject: [nova] Instance Even Scheduling > > > > Hi, > > > > I have 5 compute nodes. When I deploy instances, the most of the instances automatically placed in node 1 or node 2. 
The other compute nodes remain empty or with one or two instances on it. > > > > enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter > > > > I have enabled the above filters. How to ensure that instances should be scheduled on compute nodes evenly on all compute hosts based on RAM only ? Like scheduler should schedule the instance on compute host which has a large amount of RAM available then other hosts. > > > > - Ammad > > > > > > > > From haleyb.dev at gmail.com Mon Jan 17 18:16:44 2022 From: haleyb.dev at gmail.com (Brian Haley) Date: Mon, 17 Jan 2022 13:16:44 -0500 Subject: [Neutron][drivers] Proposing Oleg Bondarev (obondarev) to the Drivers team In-Reply-To: References: Message-ID: <7f84b899-7930-a0a6-6b7e-62a7d9a568a7@gmail.com> +1 from me, Oleg will be a great addition to the team! On 1/14/22 10:43 AM, Lajos Katona wrote: > Hi Neutron Drivers, > > I would like to propose Oleg Bondarev to be a member of the Neutron > Drivers team. > He has long experience with Neutron, he has been always around to help > with advice and reviews, and enthusiastically?participated in the > Drivers meeting (big?+1 as it is on Friday 1400UTC, quite late afternoon > in his timezone :-)). > > Neutron drivers, please vote before the next Drivers meeting (next > Friday, 21. January). > > Best Regards > Lajos Katona (lajoskatona) From gustavofaganello.santos at windriver.com Mon Jan 17 19:07:34 2022 From: gustavofaganello.santos at windriver.com (Gustavo Faganello Santos) Date: Mon, 17 Jan 2022 16:07:34 -0300 Subject: Tesla V100 32G GPU with openstack In-Reply-To: <7CCE77EA-F7EA-45AF-85B5-3566D1DAB1CB@gmail.com> References: <7CCE77EA-F7EA-45AF-85B5-3566D1DAB1CB@gmail.com> Message-ID: Hello, Satish. I've been working with vGPU lately and I believe I can answer your questions: 1. As you pointed out in question #2, the pci-passthrough will allocate the entire physical GPU to one single guest VM, while vGPU allows you to spawn from 1 to several VMs using the same physical GPU, depending on the vGPU type you choose (check NVIDIA docs to see which vGPU types the Tesla V100 supports and their properties); 2. Correct; 3. To use vGPU, you need vGPU drivers installed on the platform where your deployment of OpenStack is running AND in the VMs, so there are two drivers to be installed in order to use the feature. I believe both of them have to be purchased from NVIDIA in order to be used, and you would also have to deploy an NVIDIA licensing server in order to validate the licenses of the drivers running in the VMs. 4. You can see what the instructions are for each of these scenarios in [1] and [2]. There is also extensive documentation on vGPU at NVIDIA's website [3]. [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html [3] https://docs.nvidia.com/grid/13.0/index.html Regards, Gustavo. On 17/01/2022 14:41, Satish Patel wrote: > [Please note: This e-mail is from an EXTERNAL e-mail address] > > Folk, > > We have Tesla V100 32G GPU and I?m trying to configure with openstack wallaby. This is first time dealing with GPU so I have couple of question. > > 1. What is the difference between passthrough vs vGPU? I did google but not very clear yet. > 2. If I configure it passthrough then does it only work with single VM ? ( I meant whole GPU will get allocate to single VM correct? > 3. 
Also some document saying Tesla v100 support vGPU but some folks saying you need license. I have no idea where to get that license. What is the deal here? > 3. What are the config difference between configure this card with passthrough vs vGPU? > > > Currently I configure it with passthrough based one one article and I am able to spun up with and I can see nvidia card exposed to vm. (I used iommu and vfio based driver) so if this card support vGPU then do I need iommu and vfio or some other driver to make it virtualize ? > > Sent from my iPhone > From satish.txt at gmail.com Tue Jan 18 01:46:06 2022 From: satish.txt at gmail.com (Satish Patel) Date: Mon, 17 Jan 2022 20:46:06 -0500 Subject: Tesla V100 32G GPU with openstack In-Reply-To: References: <7CCE77EA-F7EA-45AF-85B5-3566D1DAB1CB@gmail.com> Message-ID: Thank you so much! This is what I was looking for. It is very odd that we buy a pricey card but then we have to buy a license to make those features available. On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos wrote: > > Hello, Satish. > > I've been working with vGPU lately and I believe I can answer your > questions: > > 1. As you pointed out in question #2, the pci-passthrough will allocate > the entire physical GPU to one single guest VM, while vGPU allows you to > spawn from 1 to several VMs using the same physical GPU, depending on > the vGPU type you choose (check NVIDIA docs to see which vGPU types the > Tesla V100 supports and their properties); > 2. Correct; > 3. To use vGPU, you need vGPU drivers installed on the platform where > your deployment of OpenStack is running AND in the VMs, so there are two > drivers to be installed in order to use the feature. I believe both of > them have to be purchased from NVIDIA in order to be used, and you would > also have to deploy an NVIDIA licensing server in order to validate the > licenses of the drivers running in the VMs. > 4. You can see what the instructions are for each of these scenarios in > [1] and [2]. > > There is also extensive documentation on vGPU at NVIDIA's website [3]. > > [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html > [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html > [3] https://docs.nvidia.com/grid/13.0/index.html > > Regards, > Gustavo. > > On 17/01/2022 14:41, Satish Patel wrote: > > [Please note: This e-mail is from an EXTERNAL e-mail address] > > > > Folk, > > > > We have Tesla V100 32G GPU and I?m trying to configure with openstack wallaby. This is first time dealing with GPU so I have couple of question. > > > > 1. What is the difference between passthrough vs vGPU? I did google but not very clear yet. > > 2. If I configure it passthrough then does it only work with single VM ? ( I meant whole GPU will get allocate to single VM correct? > > 3. Also some document saying Tesla v100 support vGPU but some folks saying you need license. I have no idea where to get that license. What is the deal here? > > 3. What are the config difference between configure this card with passthrough vs vGPU? > > > > > > Currently I configure it with passthrough based one one article and I am able to spun up with and I can see nvidia card exposed to vm. (I used iommu and vfio based driver) so if this card support vGPU then do I need iommu and vfio or some other driver to make it virtualize ? 
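As a rough comparison with the passthrough setup described above: for vGPU the card is not bound to vfio-pci at all. The host runs NVIDIA's vGPU manager (GRID) host driver, which exposes mediated device (mdev) types, and nova-compute is pointed at those types instead of at a PCI passthrough whitelist. A sketch of the Wallaby-era configuration, with an illustrative mdev type name and PCI address (the real type names are listed under /sys/class/mdev_bus/<address>/mdev_supported_types, and newer releases renamed the option to enabled_mdev_types):

# nova.conf on the compute node (Wallaby option names)
[devices]
enabled_vgpu_types = nvidia-474

# per-type address mapping, only required when more than one type is enabled
[vgpu_nvidia-474]
device_addresses = 0000:3b:00.0

A flavor then requests the resource with a property such as "resources:VGPU=1", and the guest still needs the matching NVIDIA guest driver plus a reachable license server, as discussed earlier in this thread.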
> > > > Sent from my iPhone > > From tonyliu0592 at hotmail.com Tue Jan 18 02:27:48 2022 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Tue, 18 Jan 2022 02:27:48 +0000 Subject: [nova] Instance Even Scheduling In-Reply-To: References: <4bc575f1c56c93b79677f8fc535049fa7b99a976.camel@redhat.com> Message-ID: Hi, I enabled debug on nova-scheduler and launched 5 VMs. 8 hosts are returned as valid hosts from filter. Here is the weight log. This is from Train release. This is for the first VM. "ram" is the total memory. Is it supposed to be the available or consumed memory? It's the same for all nodes because they all have the same spec. "disk" is also the total. Because all compute nodes are using the same shared Ceph storage, disk is the same for all nodes. "instances" is the current number of instances on that node. I don't see cpu. Is cpu weigher not there yet in Train? Only compute-11 has positive weight, all others have negative weight. How comes the weight is negative for other nodes? Given the logging, they are all the same except for instances. ================ Weighed [WeighedHost [host: (compute-11, compute-11) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: 2.9901550710003333], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462 ================ For the second VM. ================ Weighed [WeighedHost [host: (compute-11, compute-11) ram: 757535MB disk: 114565120MB io_ops: 1 instances: 6, weight: 1.9888744586443294], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462 ================ Given above logging, compute-11 is always the winner of weight. It's just that when weighing for the next VM, the "instances" of compute-11 bump up, all others are the same. 
At the end, all 5 VMs are created on that same node. Is this all expected? Thanks! Tony ________________________________________ From: Tony Liu Sent: January 17, 2022 10:11 AM To: Sean Mooney; openstack-discuss at lists.openstack.org Subject: Re: [nova] Instance Even Scheduling That disk weigher is a good point. I am using Ceph as the storage backend for all compute nodes. Disk weigher may not handle that properly and cause some failure. Anyways, I will enable debug and look into more details. Thanks! Tony ________________________________________ From: Sean Mooney Sent: January 17, 2022 09:57 AM To: Tony Liu; openstack-discuss at lists.openstack.org Subject: Re: [nova] Instance Even Scheduling On Mon, 2022-01-17 at 17:45 +0000, Tony Liu wrote: > I recall weight didn't work as what I expected, that's why I used > shuffle_best_same_weighed_hosts. > > Here is what I experienced. > With Ussuri and default Nova scheduling settings. All weighers are supposed > to be enabled and all multipliers are positive. > yes by default all weighers are enabled and the shcduler spreads by default. > On 10x empty compute nodes > with the same spec, say the first vm is created on compute-2. Because some > memory and vCPU are consumed, the second vm should be created on some > node other than compute-2, if weighers are working fine. But it's still created > on compute-2, until I increased host_subset_size and enable shuffle_best_same_weighed_hosts. > i would guess that either the disk weigher or failed build wiehter is likely what results in teh behaivor different the default behavior is still to speread. before assuming there is a but you shoudl enable the schduler in debug mode to look at the weighters that are assinged to each host and determin why you are seeing differnt behavior. shuffle_best_same_weighed_hosts does as the name suggest. it shuffles the result if and only if there is a tie. that means it will only have a effect if 2 hosts were judged by thge weigher as beeing equally good candiates. host_subset_size instalead of looking at only the top host in the list enables you to consider the top n hosts. host_subset_size does a random selection from the host_subset_size top element after the hosts are sorted by the weighers intentionlaly adding randomness to the selection. this should not be needed in general. > It seems that all compute nodes are equally > weighted, although they don't have the same amount of resource. > Am I missing anything there? > > Thanks! > Tony > ________________________________________ > From: Sean Mooney > Sent: January 17, 2022 09:06 AM > To: openstack-discuss at lists.openstack.org > Subject: Re: [nova] Instance Even Scheduling > > On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote: > > https://docs.openstack.org/nova/latest/admin/scheduling.html > > > > Filter gives you a group of valid hosts, assuming they are equally weighted, > > you may try with these two settings to pick up a host in a more even manner. > > host_subset_size (increase the size) > > shuffle_best_same_weighed_hosts (enable the shuffle) > > > > https://docs.openstack.org/nova/latest/configuration/config.html > > yes the weighers are what will blance between the hosts and the filters determin which host are valid > so if you want to spread based on ram then you need to adject the > https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.ram_weight_multiplier > > for example set ram_weight_multiplier=10.0 to make it relitivly more important. 
> the way the weigher work is all wheigher calulate the weight for a host, > we then add them after multiplying them by the weights and then sort. > > > > > > > > Tony > > ________________________________________ > > From: Ammad Syed > > Sent: January 16, 2022 11:53 PM > > To: openstack-discuss > > Subject: [nova] Instance Even Scheduling > > > > Hi, > > > > I have 5 compute nodes. When I deploy instances, the most of the instances automatically placed in node 1 or node 2. The other compute nodes remain empty or with one or two instances on it. > > > > enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter > > > > I have enabled the above filters. How to ensure that instances should be scheduled on compute nodes evenly on all compute hosts based on RAM only ? Like scheduler should schedule the instance on compute host which has a large amount of RAM available then other hosts. > > > > - Ammad > > > > > > > > From tkajinam at redhat.com Tue Jan 18 02:42:44 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Tue, 18 Jan 2022 11:42:44 +0900 Subject: [heat-tempest-plugin] call for help from plugin's maintainers In-Reply-To: References: Message-ID: I've looked into this but it seems the following error was actually caused by the latest release of gabbi(2.5.0). TypeError: __init__() got an unexpected keyword argument 'server_hostname' I've reported that issue to gabbi in [1] but if my observation is correct the problem should be fixed in urllib3 which gabbi is dependent on. [1] https://github.com/cdent/gabbi/issues/309 On Tue, Jan 18, 2022 at 2:34 AM Martin Kopec wrote: > Hello, > > most of the tests of heat-tempest-plugin have started failing. We noticed > that in interop [1], however, we reproduced that in the project's gates as > well [2]. > > I suspect it might be an issue with the new gabbi package - there has been > an update recently. > Could you please have a look. > > [1] > https://review.opendev.org/c/openinfra/ansible-role-refstack-client/+/824832 > [2] https://review.opendev.org/c/openstack/heat-tempest-plugin/+/823794 > > Thank you, > -- > Martin Kopec > Senior Software Quality Engineer > Red Hat EMEA > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Tue Jan 18 04:44:12 2022 From: amotoki at gmail.com (Akihiro Motoki) Date: Tue, 18 Jan 2022 13:44:12 +0900 Subject: [Neutron][drivers] Proposing Oleg Bondarev (obondarev) to the Drivers team In-Reply-To: References: Message-ID: +1 Oleg would be a great addition to the drivers team. Akihiro Motoki (amotoki) On Sat, Jan 15, 2022 at 12:48 AM Lajos Katona wrote: > > Hi Neutron Drivers, > > I would like to propose Oleg Bondarev to be a member of the Neutron Drivers team. > He has long experience with Neutron, he has been always around to help with advice and reviews, and enthusiastically participated in the Drivers meeting (big +1 as it is on Friday 1400UTC, quite late afternoon in his timezone :-)). > > Neutron drivers, please vote before the next Drivers meeting (next Friday, 21. January). 
> > Best Regards > Lajos Katona (lajoskatona) From chkumar at redhat.com Tue Jan 18 07:27:40 2022 From: chkumar at redhat.com (Chandan Kumar) Date: Tue, 18 Jan 2022 12:57:40 +0530 Subject: [TripleO] Gate/Check blocker - tripleo-ci-centos-9-content-provider Message-ID: Hello All, tripleo-ci-centos-9-content-provider is failing in Check and gate pipeline with following error ``` Last metadata expiration check: 0:01:21 ago on Tue 18 Jan 2022 03:13:29 AM UTC. centos-stream-release-9.0-6.el9.noarch.rpm 7.9 MB/s | 26 kB 00:00 warning: centos-stream-release-9.0-6.el9.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 8483c65d: NOKEY Verifying... ######################################## Preparing... ######################################## package centos-stream-release-9.0-8.el9.noarch (which is newer than centos-stream-release-9.0-6.el9.noarch) is already installed ``` A bug is already logged against the failure: https://bugs.launchpad.net/tripleo/+bug/1958202 and wip fix is here: https://review.opendev.org/c/openstack/tripleo-common/+/825038 Please hold your recheck on the tripleo patches where the CS9 content provider runs. Will update this thread once we have fixes merged. Thanks, Chandan Kumar From tkajinam at redhat.com Tue Jan 18 08:35:00 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Tue, 18 Jan 2022 17:35:00 +0900 Subject: [heat-tempest-plugin] call for help from plugin's maintainers In-Reply-To: References: Message-ID: I have opened an issue for urllib3[1], too, and created a PR to discuss a potential fix. [1] https://github.com/urllib3/urllib3/issues/2534 Because it'd take some time until we get feedback from these two communities, I've proposed a change to pin gabbi to 2.4.0[2]. [2] https://review.opendev.org/c/openstack/requirements/+/825044 The issue might affect other projects using gabbi as well, unless https, instead of http, is used for endpoint access. On Tue, Jan 18, 2022 at 11:42 AM Takashi Kajinami wrote: > I've looked into this but it seems the following error was actually caused > by the latest release of gabbi(2.5.0). > TypeError: __init__() got an unexpected keyword argument 'server_hostname' > > I've reported that issue to gabbi in [1] but if my observation is correct > the problem should be > fixed in urllib3 which gabbi is dependent on. > [1] https://github.com/cdent/gabbi/issues/309 > > On Tue, Jan 18, 2022 at 2:34 AM Martin Kopec wrote: > >> Hello, >> >> most of the tests of heat-tempest-plugin have started failing. We noticed >> that in interop [1], however, we reproduced that in the project's gates as >> well [2]. >> >> I suspect it might be an issue with the new gabbi package - there has >> been an update recently. >> Could you please have a look. >> >> [1] >> https://review.opendev.org/c/openinfra/ansible-role-refstack-client/+/824832 >> [2] https://review.opendev.org/c/openstack/heat-tempest-plugin/+/823794 >> >> Thank you, >> -- >> Martin Kopec >> Senior Software Quality Engineer >> Red Hat EMEA >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From antonio.paulo at cern.ch Tue Jan 18 10:55:52 2022 From: antonio.paulo at cern.ch (=?UTF-8?Q?Ant=c3=b3nio_Paulo?=) Date: Tue, 18 Jan 2022 11:55:52 +0100 Subject: Tesla V100 32G GPU with openstack In-Reply-To: References: <7CCE77EA-F7EA-45AF-85B5-3566D1DAB1CB@gmail.com> Message-ID: Hey Satish, Gustavo, Just to clarify a bit on point 3, you will have to buy a vGPU license per card and this gives you access to all the downloads you need through NVIDIA's web dashboard -- both the host and guest drivers as well as the license server setup files. Cheers, Ant?nio On 18/01/22 02:46, Satish Patel wrote: > Thank you so much! This is what I was looking for. It is very odd that > we buy a pricey card but then we have to buy a license to make those > features available. > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos > wrote: >> >> Hello, Satish. >> >> I've been working with vGPU lately and I believe I can answer your >> questions: >> >> 1. As you pointed out in question #2, the pci-passthrough will allocate >> the entire physical GPU to one single guest VM, while vGPU allows you to >> spawn from 1 to several VMs using the same physical GPU, depending on >> the vGPU type you choose (check NVIDIA docs to see which vGPU types the >> Tesla V100 supports and their properties); >> 2. Correct; >> 3. To use vGPU, you need vGPU drivers installed on the platform where >> your deployment of OpenStack is running AND in the VMs, so there are two >> drivers to be installed in order to use the feature. I believe both of >> them have to be purchased from NVIDIA in order to be used, and you would >> also have to deploy an NVIDIA licensing server in order to validate the >> licenses of the drivers running in the VMs. >> 4. You can see what the instructions are for each of these scenarios in >> [1] and [2]. >> >> There is also extensive documentation on vGPU at NVIDIA's website [3]. >> >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html >> [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html >> [3] https://docs.nvidia.com/grid/13.0/index.html >> >> Regards, >> Gustavo. >> >> On 17/01/2022 14:41, Satish Patel wrote: >>> [Please note: This e-mail is from an EXTERNAL e-mail address] >>> >>> Folk, >>> >>> We have Tesla V100 32G GPU and I?m trying to configure with openstack wallaby. This is first time dealing with GPU so I have couple of question. >>> >>> 1. What is the difference between passthrough vs vGPU? I did google but not very clear yet. >>> 2. If I configure it passthrough then does it only work with single VM ? ( I meant whole GPU will get allocate to single VM correct? >>> 3. Also some document saying Tesla v100 support vGPU but some folks saying you need license. I have no idea where to get that license. What is the deal here? >>> 3. What are the config difference between configure this card with passthrough vs vGPU? >>> >>> >>> Currently I configure it with passthrough based one one article and I am able to spun up with and I can see nvidia card exposed to vm. (I used iommu and vfio based driver) so if this card support vGPU then do I need iommu and vfio or some other driver to make it virtualize ? >>> >>> Sent from my iPhone >>> > From mkopec at redhat.com Tue Jan 18 13:21:20 2022 From: mkopec at redhat.com (Martin Kopec) Date: Tue, 18 Jan 2022 14:21:20 +0100 Subject: [heat-tempest-plugin] call for help from plugin's maintainers In-Reply-To: References: Message-ID: Thank you Takashi for looking into this. 
Should we consider using only https for endpoint access in the future? On Tue, 18 Jan 2022 at 09:35, Takashi Kajinami wrote: > I have opened an issue for urllib3[1], too, and created a PR to discuss a > potential fix. > [1] https://github.com/urllib3/urllib3/issues/2534 > > Because it'd take some time until we get feedback from these two > communities, > I've proposed a change to pin gabbi to 2.4.0[2]. > [2] https://review.opendev.org/c/openstack/requirements/+/825044 > > The issue might affect other projects using gabbi as well, unless https, > instead of http, > is used for endpoint access. > > > On Tue, Jan 18, 2022 at 11:42 AM Takashi Kajinami > wrote: > >> I've looked into this but it seems the following error was actually >> caused by the latest release of gabbi(2.5.0). >> TypeError: __init__() got an unexpected keyword argument >> 'server_hostname' >> >> I've reported that issue to gabbi in [1] but if my observation is correct >> the problem should be >> fixed in urllib3 which gabbi is dependent on. >> [1] https://github.com/cdent/gabbi/issues/309 >> >> On Tue, Jan 18, 2022 at 2:34 AM Martin Kopec wrote: >> >>> Hello, >>> >>> most of the tests of heat-tempest-plugin have started failing. We >>> noticed that in interop [1], however, we reproduced that in the project's >>> gates as well [2]. >>> >>> I suspect it might be an issue with the new gabbi package - there has >>> been an update recently. >>> Could you please have a look. >>> >>> [1] >>> https://review.opendev.org/c/openinfra/ansible-role-refstack-client/+/824832 >>> [2] https://review.opendev.org/c/openstack/heat-tempest-plugin/+/823794 >>> >>> Thank you, >>> -- >>> Martin Kopec >>> Senior Software Quality Engineer >>> Red Hat EMEA >>> >>> >>> >>> -- Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Jan 18 13:34:35 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 18 Jan 2022 14:34:35 +0100 Subject: [neutron] CI meeting agenda Message-ID: <11906918.O9o76ZdvQC@p1> Hi, Please remember that today at 3pm UTC we will have Neutron CI meeting. It will be on both IRC and video: [1] as meetpad.opendev.org seems that is still disabled. Agenda for the meeting is at [2]. [1] https://meet.jit.si/neutron-ci-meetings [2] https://etherpad.opendev.org/p/neutron-ci-meetings -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From fungi at yuggoth.org Tue Jan 18 13:49:55 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 18 Jan 2022 13:49:55 +0000 Subject: [neutron] CI meeting agenda In-Reply-To: <11906918.O9o76ZdvQC@p1> References: <11906918.O9o76ZdvQC@p1> Message-ID: <20220118134954.5a77amtpuraaewqi@yuggoth.org> On 2022-01-18 14:34:35 +0100 (+0100), Slawek Kaplonski wrote: [...] > as meetpad.opendev.org seems that is still disabled. [...] Yes, apologies, we're still waiting for them to release new images, but will consider alternative solutions if this goes on much longer. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From zhouhenglc at inspur.com Tue Jan 18 06:34:58 2022 From: zhouhenglc at inspur.com (=?utf-8?B?SGVuZyBaaG91ICjlkajmgZIpLea1qua9ruaVsOaNrg==?=) Date: Tue, 18 Jan 2022 06:34:58 +0000 Subject: Can neutron-fwaas project be revived? Message-ID: <771f9e50a5f0498caecf3cb892902954@inspur.com> Hi Lajos? I understand that the process is to retire neutron-fwaas first, and then create a new project in the X namespace. If want to retire the project, do we need to wait until the stable version(Victoria) of the project is no longer maintained(Extended Maintenance estimated 2022-04-27)? ???: Lajos Katona [mailto:katonalala at gmail.com] ????: 2022?1?18? 0:15 ???: Dazhong Qin (???)-??????? ??: miguel at mlavalle.com; openstack-discuss at lists.openstack.org; Heng Zhou (??)-???? ??: Re: Can neutron-fwaas project be revived? Hi, Neutron team discussed this question on the Drivers meeting, see the logs: https://meetings.opendev.org/meetings/neutron_drivers/2022/neutron_drivers.2022-01-14-14.01.log.html The agreement is to move neutron-fwaas to x/ namespace, and revive it as x/neutron-fwaas. Best regards Lajos Katona (lajoskatona) Dazhong Qin (???)-??????? > ezt ?rta (id?pont: 2022. jan. 6., Cs, 2:30): Hi?Miguel? Ok?let?s meet at January 14th. Best regards ???: Miguel Lavalle [mailto:miguel at mlavalle.com ] ????: 2022?1?6? 9:12 ???: Dazhong Qin (???)-??????? > ??: openstack-discuss at lists.openstack.org ??: Re: Can neutron-fwaas project be revived? Hi Qin, Unfortunately, this coming January 7th several members of the drivers team will be off on holiday. We won't have a quorum to discuss your proposal. I hope that January 14th works for you and your team. Best regards Miguel On Fri, Dec 24, 2021 at 10:18 AM Miguel Lavalle > wrote: Hi Qin, I have added this topic to the drivers meeting agenda (see on demand agenda close to the bottom): https://wiki.openstack.org/wiki/Meetings/NeutronDrivers Cheers On Thu, Dec 23, 2021 at 7:42 PM Dazhong Qin (???)-??????? > wrote: Hi Miguel, Thank you for your suggestion. My colleague HengZhou will submit relevant documents as soon as possible in accordance with the official neutron rules. Yes?we will attend the neutron drivers meeting on January 7th. Merry Christmas! Best wish for you! ???: Miguel Lavalle [mailto:miguel at mlavalle.com ] ????: 2021?12?24? 0:43 ???: Dazhong Qin (???)-??????? > ??: openstack-discuss at lists.openstack.org ??: Re: Can neutron-fwaas project be revived? Hi Qin, In preparation for your meeting with the drivers team, I suggest we follow as a starting point the Neutron Stadium Governance rules and processes as outlined in the official documentation: https://docs.openstack.org/neutron/latest/contributor/stadium/governance.html. In the past, we have re-incorporated projects to the Stadium, like for example in the case of neutron-vpnaas. This document in the Neutron specs repo summarizes how we assessed the readiness of vpnaas for the stadium: https://specs.openstack.org/openstack/neutron-specs/specs/stadium/queens/neutron-vpnaas.html (https://review.opendev.org/c/openstack/neutron-specs/+/506012). I suggest you start a similar document for fwaas in the folder for the current cycle: https://specs.openstack.org/openstack/neutron-specs/specs/yoga/index.html. As soon as you can, please push it to gerrit, so we can start reviewing it. Did I understand correctly that you will attend the drivers meeting on January 7th? 
Best regards Miguel On Wed, Dec 22, 2021 at 8:09 PM Dazhong Qin (???)-??????? > wrote: Hi Miguel, I am glad to hear this news. How about our discussion on January 7th, this Friday is not convenient, what do I need to prepare before the discussion, do I need to submit rfe or other descriptions? ???: Miguel Lavalle [mailto: miguel at mlavalle.com] ????: 2021?12?23? 0:20 ???: Dazhong Qin (???)-??????? < qinhaizhong01 at inspur.com> ??: openstack-discuss at lists.openstack.org ??: Re: Can neutron-fwaas project be revived? Hi Qin, I think that in principle the community will be delighted if you and your team can reactivate the project and maintain it. Probably the best next step is for you to attend the next Neutron drivers meeting (https://wiki.openstack.org/wiki/Meetings/NeutronDrivers) so we can discuss the specifics of your proposal. This meeting takes place on Fridays at 1400 UTC over IRC in oftc.net , channel #openstack-neutron. Due to the end of year festivities in much of Europe and America, the next meeting will take place until January 7th. Is that a good next step for you? If yes, I'll add this topic to the meeting's agenda. Best regards On Tue, Dec 21, 2021 at 10:29 AM Dazhong Qin (???)-??????? > wrote: Hi? The firewall project is a necessary function when the project is delivered. The lack of firewall function after switching OVN is not acceptable to customers. We intend to maintain this project and develop the fwaas driver based on ovn. Whether the neutron-fwaas project can be reactivate? What should I do ? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3615 bytes Desc: not available URL: From chkumar at redhat.com Tue Jan 18 16:20:41 2022 From: chkumar at redhat.com (Chandan Kumar) Date: Tue, 18 Jan 2022 21:50:41 +0530 Subject: [TripleO] Gate/Check blocker - tripleo-ci-centos-9-content-provider In-Reply-To: References: Message-ID: On Tue, Jan 18, 2022 at 12:57 PM Chandan Kumar wrote: > > Hello All, > > tripleo-ci-centos-9-content-provider is failing in Check and gate > pipeline with following error > > ``` > Last metadata expiration check: 0:01:21 ago on Tue 18 Jan 2022 03:13:29 AM UTC. > centos-stream-release-9.0-6.el9.noarch.rpm 7.9 MB/s | 26 kB 00:00 > warning: centos-stream-release-9.0-6.el9.noarch.rpm: Header V3 > RSA/SHA256 Signature, key ID 8483c65d: NOKEY > Verifying... ######################################## > Preparing... ######################################## > package centos-stream-release-9.0-8.el9.noarch (which is newer than > centos-stream-release-9.0-6.el9.noarch) is already installed > > ``` The package is synced now and https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-9-content-provider&skip=0 is back to green We are still working on proper fix: Please go ahead and recheck patches. :-) Thanks, Chandan Kumar From gmann at ghanshyammann.com Tue Jan 18 16:49:48 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 18 Jan 2022 10:49:48 -0600 Subject: Can neutron-fwaas project be revived? In-Reply-To: <771f9e50a5f0498caecf3cb892902954@inspur.com> References: <771f9e50a5f0498caecf3cb892902954@inspur.com> Message-ID: <17e6e183905.d179775f807310.479674362228455950@ghanshyammann.com> ---- On Tue, 18 Jan 2022 00:34:58 -0600 Heng Zhou (??)-???? wrote ---- > > Hi Lajos? > I understand that the process is to retire neutron-fwaas first, and then create a new project in the X namespace. 
> If want to retire the project, do we need to wait until the stable version(Victoria) of the project is no longer maintained(Extended Maintenance estimated 2022-04-27)? We could do that but I do not think we need to hold you guys to maintain it in x/namespace. As discussed in project-config change[1], you or neutron folks can propose the retirement now itself (considering there is no one to maintain/release stable/victoria for new bug fixes) and TC will merge it as per process. After that, creating it in x/ namespace will be good to do. [1] https://review.opendev.org/c/openstack/project-config/+/824905 -gmann > > ???: Lajos Katona [mailto:katonalala at gmail.com] > ????: 2022?1?18? 0:15 > ???: Dazhong Qin (???)-??????? > ??: miguel at mlavalle.com; openstack-discuss at lists.openstack.org; Heng Zhou (??)-???? > ??: Re: Can neutron-fwaas project be revived? > > Hi, > Neutron team discussed this question on the Drivers meeting, see the logs: > https://meetings.opendev.org/meetings/neutron_drivers/2022/neutron_drivers.2022-01-14-14.01.log.html > > The agreement is to move neutron-fwaas to x/ namespace, and revive it as x/neutron-fwaas. > > Best regards > Lajos Katona (lajoskatona) > > Dazhong Qin (???)-??????? ezt ?rta (id?pont: 2022. jan. 6., Cs, 2:30): > Hi?Miguel? > > Ok?let?s meet at January 14th. > > Best regards > > > ???: Miguel Lavalle [mailto:miguel at mlavalle.com] > ????: 2022?1?6? 9:12 > ???: Dazhong Qin (???)-??????? > ??: openstack-discuss at lists.openstack.org > ??: Re: Can neutron-fwaas project be revived? > > Hi Qin, > > Unfortunately, this coming January 7th several members of the drivers team will be off on holiday. We won't have a quorum to discuss your proposal. I hope that January 14th works for you and your team. > > Best regards > > Miguel > > On Fri, Dec 24, 2021 at 10:18 AM Miguel Lavalle wrote: > Hi Qin, > > I have added this topic to the drivers meeting agenda (see on demand agenda close to the bottom): https://wiki.openstack.org/wiki/Meetings/NeutronDrivers > > Cheers > > On Thu, Dec 23, 2021 at 7:42 PM Dazhong Qin (???)-??????? wrote: > Hi Miguel, > Thank you for your suggestion. My colleague HengZhou will submit relevant documents as soon as possible in accordance with the official neutron rules. > Yes?we will attend the neutron drivers meeting on January 7th. > Merry Christmas! > Best wish for you! > > ???: Miguel Lavalle [mailto:miguel at mlavalle.com] > ????: 2021?12?24? 0:43 > ???: Dazhong Qin (???)-??????? > ??: openstack-discuss at lists.openstack.org > ??: Re: Can neutron-fwaas project be revived? > > Hi Qin, > > In preparation for your meeting with the drivers team, I suggest we follow as a starting point the Neutron Stadium Governance rules and processes as outlined in the official documentation: https://docs.openstack.org/neutron/latest/contributor/stadium/governance.html. In the past, we have re-incorporated projects to the Stadium, like for example in the case of neutron-vpnaas. This document in the Neutron specs repo summarizes how we assessed the readiness of vpnaas for the stadium: https://specs.openstack.org/openstack/neutron-specs/specs/stadium/queens/neutron-vpnaas.html (https://review.opendev.org/c/openstack/neutron-specs/+/506012). I suggest you start a similar document for fwaas in the folder for the current cycle: https://specs.openstack.org/openstack/neutron-specs/specs/yoga/index.html. As soon as you can, please push it to gerrit, so we can start reviewing it. 
> > Did I understand correctly that you will attend the drivers meeting on January 7th? > > Best regards > > Miguel > > > On Wed, Dec 22, 2021 at 8:09 PM Dazhong Qin (???)-??????? wrote: > Hi Miguel, > I am glad to hear this news. How about our discussion on January 7th, this Friday is not convenient, what do I need to prepare before the discussion, do I need to submit rfe or other descriptions? > > ???: Miguel Lavalle [mailto:miguel at mlavalle.com] > ????: 2021?12?23? 0:20 > ???: Dazhong Qin (???)-??????? > ??: openstack-discuss at lists.openstack.org > ??: Re: Can neutron-fwaas project be revived? > > Hi Qin, > > I think that in principle the community will be delighted if you and your team can reactivate the project and maintain it. Probably the best next step is for you to attend the next Neutron drivers meeting (https://wiki.openstack.org/wiki/Meetings/NeutronDrivers) so we can discuss the specifics of your proposal. This meeting takes place on Fridays at 1400 UTC over IRC in oftc.net, channel #openstack-neutron. Due to the end of year festivities in much of Europe and America, the next meeting will take place until January 7th. Is that a good next step for you? If yes, I'll add this topic to the meeting's agenda. > > Best regards > > On Tue, Dec 21, 2021 at 10:29 AM Dazhong Qin (???)-??????? wrote: > Hi? > The firewall project is a necessary function when the project is delivered. The lack of firewall function after switching OVN is not acceptable to customers. We intend to maintain this project and develop the fwaas driver based on ovn. Whether the neutron-fwaas project can be reactivate? What should I do ? > > > > From allison at openinfra.dev Tue Jan 18 17:25:08 2022 From: allison at openinfra.dev (Allison Price) Date: Tue, 18 Jan 2022 11:25:08 -0600 Subject: OpenInfra Live - January 20 at 1500 UTC Message-ID: <18BB4CBC-767E-4D2F-B524-11630D539254@openinfra.dev> Hi everyone, This week?s OpenInfra Live episode is brought to you by the global OpenInfra Community. As we approach the launch of the CFP (call for papers) for the upcoming OpenInfra Summit in Berlin (June 7-9), join members of the 2022 Programming Committee as they share their top tips and tricks for writing a winning submission and answer your questions on what they look for in talks. Erin Disney, Senior Event Marketing Manager at the OpenInfra Foundation will host members of the Berlin Summit Programming committee, including: Armstrong Foundjem (AI, Machine Learning & HPC) Amy Marrich (Getting Started) Shuquan Huang (5G, NFV, and Edge) Arkady Kanevsky (Hardware Enablement) Arne Wiebalck (Private and Hybrid Cloud) Episode: Berlin Summit 2022 CFP 101 Date and time: Thursday, January 20 at 9am CT (1500 UTC) You can watch us live on: YouTube: https://www.youtube.com/watch?v=9s1VPCyuJXI LinkedIn: https://www.linkedin.com/video/event/urn:li:ugcPost:6887809515576815616/ Facebook: https://www.facebook.com/104139126308032/posts/4735936496461582/ WeChat: the recording will be posted on OpenStack WeChat after the live stream Have an idea for a future episode? Share it now at ideas.openinfra.live . Thanks, Allison -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Jan 18 17:51:04 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 18 Jan 2022 17:51:04 +0000 Subject: Can neutron-fwaas project be revived? 
In-Reply-To: <17e6e183905.d179775f807310.479674362228455950@ghanshyammann.com> References: <771f9e50a5f0498caecf3cb892902954@inspur.com> <17e6e183905.d179775f807310.479674362228455950@ghanshyammann.com> Message-ID: <20220118175104.a6ppj2kxpijeztz7@yuggoth.org> On 2022-01-18 10:49:48 -0600 (-0600), Ghanshyam Mann wrote: [...] > As discussed in project-config change[1], you or neutron folks can > propose the retirement now itself (considering there is no one to > maintain/release stable/victoria for new bug fixes) and TC will > merge it as per process. After that, creating it in x/ namespace > will be good to do. [...] Looking at this from a logistical perspective, it's a fair amount of churn in code hosting as well as unwelcoming to the new volunteers, compared to just leaving the repository where it is now and letting them contribute to it there. If the concern is that the Neutron team doesn't want to retain responsibility for it while they evaluate the conviction of the new maintainers for eventual re-inclusion, then the TC would be well within its rights to declare that the repository can remain in place while not having it be part of the Neutron team's responsibilities. There are a number of possible solutions, ranging from making a new category of provisional deliverable, to creating a lightweight project team under the DPL model, to declaring it a pop-up team with a TC-owned repository. There are repositories within the OpenStack namespace which are not an official part of the OpenStack coordinated release, after all. Solutions which don't involve having the new work take place somewhere separate, and the work involved in making that separate place, which will simply be closed down as transient cruft if everything goes as desired. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From masghar.bese15seecs at seecs.edu.pk Tue Jan 18 18:21:21 2022 From: masghar.bese15seecs at seecs.edu.pk (Mahnoor Asghar) Date: Tue, 18 Jan 2022 23:21:21 +0500 Subject: [Ironic][docs] Moving API Documentation closer to code Message-ID: Dear all, I am an Outreachy intern, working with OpenStack Ironic on improving their API reference documentation mechanism. Currently, the API documentation is kept separately from the code (in the Ironic repository), which means it diverges from the code, and is difficult to maintain and manage as development progresses. It would be desirable to auto-document our REST API interactions and data structures in code along with the very classes they come from. This way, we can clearly understand the request, the response, and any other important interaction details from a single point of view inside the code. It will also be a lot easier to maintain. The current Ironic API documentation is being generated by Sphinx, using the os-api-ref extension. It would be a viable solution to move the documentation from the separate sub-directory to docstrings within the code files itself, and develop a Sphinx extension that provides essential features to generate the documentation from docstrings. More details about the proposed extension, and its features can be found on the Ironic Storyboard [1]. 
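As a rough sketch (not a committed design), the docstrings could carry structured sections that the proposed extension would parse and render, along these lines:

class NodesController(object):  # hypothetical controller, purely for illustration
    def get_one(self, node_ident):
        """Retrieve a single node.

        .. rest_method:: GET /v1/nodes/{node_ident}

        :param node_ident: UUID or logical name of the node.
        :response 200: the serialized node representation.
        :response 404: no node matches ``node_ident``.
        """

The ``rest_method`` directive and the ``:response:`` fields above are only placeholder names to illustrate the idea; the actual syntax would be settled as part of the work tracked in [1].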
The wider OpenStack community may be facing similar problems maintaining their documentation, and it would be helpful to understand their perspective and concerns - with respect to using such an extension, their expectations from it, collaborating on it, and whether they have additional ideas about how we may bring API documentation closer to the code. [1]: https://storyboard.openstack.org/#!/story/2009785 Thank you and regards, Mahnoor Asghar From tonyliu0592 at hotmail.com Tue Jan 18 18:22:27 2022 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Tue, 18 Jan 2022 18:22:27 +0000 Subject: [nova] Instance Even Scheduling In-Reply-To: References: <4bc575f1c56c93b79677f8fc535049fa7b99a976.camel@redhat.com> Message-ID: While looking into it, found a bug on this line. It's in all branches. https://github.com/openstack/nova/blob/master/nova/weights.py#L51 It's supposed to return a list of normalized values. Here is the fix. - return ((i - minval) / range_ for i in weight_list) + return ([(i - minval) / range_ for i in weight_list]) The negative weight is from build-failure weigher. Looking into nova-compute... Again, is nova-compute supposed to provide available resource, instead of total resource, for weigher to evaluate? Thanks! Tony ________________________________________ From: Tony Liu Sent: January 17, 2022 06:27 PM To: Sean Mooney; openstack-discuss at lists.openstack.org Subject: Re: [nova] Instance Even Scheduling Hi, I enabled debug on nova-scheduler and launched 5 VMs. 8 hosts are returned as valid hosts from filter. Here is the weight log. This is from Train release. This is for the first VM. "ram" is the total memory. Is it supposed to be the available or consumed memory? It's the same for all nodes because they all have the same spec. "disk" is also the total. Because all compute nodes are using the same shared Ceph storage, disk is the same for all nodes. "instances" is the current number of instances on that node. I don't see cpu. Is cpu weigher not there yet in Train? Only compute-11 has positive weight, all others have negative weight. How comes the weight is negative for other nodes? Given the logging, they are all the same except for instances. ================ Weighed [WeighedHost [host: (compute-11, compute-11) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: 2.9901550710003333], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462 ================ For the second VM. 
================ Weighed [WeighedHost [host: (compute-11, compute-11) ram: 757535MB disk: 114565120MB io_ops: 1 instances: 6, weight: 1.9888744586443294], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462 ================ Given above logging, compute-11 is always the winner of weight. It's just that when weighing for the next VM, the "instances" of compute-11 bump up, all others are the same. At the end, all 5 VMs are created on that same node. Is this all expected? Thanks! Tony ________________________________________ From: Tony Liu Sent: January 17, 2022 10:11 AM To: Sean Mooney; openstack-discuss at lists.openstack.org Subject: Re: [nova] Instance Even Scheduling That disk weigher is a good point. I am using Ceph as the storage backend for all compute nodes. Disk weigher may not handle that properly and cause some failure. Anyways, I will enable debug and look into more details. Thanks! Tony ________________________________________ From: Sean Mooney Sent: January 17, 2022 09:57 AM To: Tony Liu; openstack-discuss at lists.openstack.org Subject: Re: [nova] Instance Even Scheduling On Mon, 2022-01-17 at 17:45 +0000, Tony Liu wrote: > I recall weight didn't work as what I expected, that's why I used > shuffle_best_same_weighed_hosts. > > Here is what I experienced. > With Ussuri and default Nova scheduling settings. All weighers are supposed > to be enabled and all multipliers are positive. > yes by default all weighers are enabled and the shcduler spreads by default. > On 10x empty compute nodes > with the same spec, say the first vm is created on compute-2. Because some > memory and vCPU are consumed, the second vm should be created on some > node other than compute-2, if weighers are working fine. But it's still created > on compute-2, until I increased host_subset_size and enable shuffle_best_same_weighed_hosts. > i would guess that either the disk weigher or failed build wiehter is likely what results in teh behaivor different the default behavior is still to speread. before assuming there is a but you shoudl enable the schduler in debug mode to look at the weighters that are assinged to each host and determin why you are seeing differnt behavior. shuffle_best_same_weighed_hosts does as the name suggest. it shuffles the result if and only if there is a tie. that means it will only have a effect if 2 hosts were judged by thge weigher as beeing equally good candiates. host_subset_size instalead of looking at only the top host in the list enables you to consider the top n hosts. 
host_subset_size does a random selection from the host_subset_size top element after the hosts are sorted by the weighers intentionlaly adding randomness to the selection. this should not be needed in general. > It seems that all compute nodes are equally > weighted, although they don't have the same amount of resource. > Am I missing anything there? > > Thanks! > Tony > ________________________________________ > From: Sean Mooney > Sent: January 17, 2022 09:06 AM > To: openstack-discuss at lists.openstack.org > Subject: Re: [nova] Instance Even Scheduling > > On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote: > > https://docs.openstack.org/nova/latest/admin/scheduling.html > > > > Filter gives you a group of valid hosts, assuming they are equally weighted, > > you may try with these two settings to pick up a host in a more even manner. > > host_subset_size (increase the size) > > shuffle_best_same_weighed_hosts (enable the shuffle) > > > > https://docs.openstack.org/nova/latest/configuration/config.html > > yes the weighers are what will blance between the hosts and the filters determin which host are valid > so if you want to spread based on ram then you need to adject the > https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.ram_weight_multiplier > > for example set ram_weight_multiplier=10.0 to make it relitivly more important. > the way the weigher work is all wheigher calulate the weight for a host, > we then add them after multiplying them by the weights and then sort. > > > > > > > > Tony > > ________________________________________ > > From: Ammad Syed > > Sent: January 16, 2022 11:53 PM > > To: openstack-discuss > > Subject: [nova] Instance Even Scheduling > > > > Hi, > > > > I have 5 compute nodes. When I deploy instances, the most of the instances automatically placed in node 1 or node 2. The other compute nodes remain empty or with one or two instances on it. > > > > enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter > > > > I have enabled the above filters. How to ensure that instances should be scheduled on compute nodes evenly on all compute hosts based on RAM only ? Like scheduler should schedule the instance on compute host which has a large amount of RAM available then other hosts. > > > > - Ammad > > > > > > > > From fungi at yuggoth.org Tue Jan 18 18:45:59 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 18 Jan 2022 18:45:59 +0000 Subject: [Ironic][docs][api-sig] Moving API Documentation closer to code In-Reply-To: References: Message-ID: <20220118184558.l4bkh2xtvlf5j6qe@yuggoth.org> On 2022-01-18 23:21:21 +0500 (+0500), Mahnoor Asghar wrote: [...] > I am an Outreachy intern, working with OpenStack Ironic on improving > their API reference documentation mechanism. [...] Welcome! > The wider OpenStack community may be facing similar problems > maintaining their documentation, and it would be helpful to > understand their perspective and concerns - with respect to using > such an extension, their expectations from it, collaborating on > it, and whether they have additional ideas about how we may bring > API documentation closer to the code. [...] Keep in mind that developers of OpenStack's services have traditionally held the belief that a good public API should be versioned independently from the software which implements it. 
One version of the software may provide multiple API versions, and there's not necessarily a 1:1 correlation between public REST API methods and the internal Python API methods which implement them. I suspect this has driven some of the desire to separate REST API documentation from the source code itself. There have been past attempts to implement self-describing APIs, scraping running services in CI jobs, and so on. I don't recall the details, but folks who were active in the API SIG likely have a clearer recollection and can probably relate some of the challenges encountered with various approaches. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From dtantsur at redhat.com Tue Jan 18 19:10:12 2022 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Tue, 18 Jan 2022 20:10:12 +0100 Subject: [ironic] [release] RFC: stop doing bugfix branches for Bifrost? Message-ID: Hi team! Some time ago we introduced bugfix/X.Y branches [1] to some of the Ironic projects. This has worked pretty well and has been very helpful in ironic/inspector/IPA, but I have second thoughts about Bifrost. First, maintaining Bifrost branches is tedious enough because of how many distros we support and how quickly they change. Second, our recommended approach to using Bifrost is to git-clone master and work from it. I'm honestly unsure if the regular stable branches are used (outside of the Kolla CI), let alone bugfix branches. (I also doubt that Bifrost releases are very popular or even meaningful, but that's another topic.) As one of few people who is maintaining bugfix branches, I suggest we stop making them for Bifrost and switch Bifrost back to normal cycle-with-intermediaries. We can keep releasing 3x per cycle, just to have checkpoints, but only create "normal" stable branches. Thoughts? Dmitry [1] https://specs.openstack.org/openstack/ironic-specs/specs/approved/new-release-model.html -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Tue Jan 18 19:14:59 2022 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 18 Jan 2022 11:14:59 -0800 Subject: [Ironic][docs][api-sig] Moving API Documentation closer to code In-Reply-To: <20220118184558.l4bkh2xtvlf5j6qe@yuggoth.org> References: <20220118184558.l4bkh2xtvlf5j6qe@yuggoth.org> Message-ID: On Tue, Jan 18, 2022 at 10:49 AM Jeremy Stanley wrote: > > On 2022-01-18 23:21:21 +0500 (+0500), Mahnoor Asghar wrote: > [...] > > I am an Outreachy intern, working with OpenStack Ironic on improving > > their API reference documentation mechanism. > [...] > > Welcome! > > > The wider OpenStack community may be facing similar problems > > maintaining their documentation, and it would be helpful to > > understand their perspective and concerns - with respect to using > > such an extension, their expectations from it, collaborating on > > it, and whether they have additional ideas about how we may bring > > API documentation closer to the code. > [...] > > Keep in mind that developers of OpenStack's services have > traditionally held the belief that a good public API should be > versioned independently from the software which implements it. 
One > version of the software may provide multiple API versions, and > there's not necessarily a 1:1 correlation between public REST API > methods and the internal Python API methods which implement them. I > suspect this has driven some of the desire to separate REST API > documentation from the source code itself. And the API version contracts we've tended to offer has been a very linear progression matching the software versions, for the most part. I think as the groups of maintainers have evolved, I believe the traditional belief may no longer really hold true. Or at least, there doesn't seem to be a good reason to continue to keep aspects disjointed and separated, at least that I'm aware of. I guess to put it another way, we have finite capacity/bandwidth, and anything we do to reduce hurdles, can help us further our work. > > There have been past attempts to implement self-describing APIs, > scraping running services in CI jobs, and so on. I don't recall the > details, but folks who were active in the API SIG likely have a > clearer recollection and can probably relate some of the challenges > encountered with various approaches. > -- > Jeremy Stanley From smooney at redhat.com Tue Jan 18 19:15:16 2022 From: smooney at redhat.com (Sean Mooney) Date: Tue, 18 Jan 2022 19:15:16 +0000 Subject: [nova] Instance Even Scheduling In-Reply-To: References: <4bc575f1c56c93b79677f8fc535049fa7b99a976.camel@redhat.com> Message-ID: <371c09db25789eb97c5f87f52aa6a8ceea400090.camel@redhat.com> On Tue, 2022-01-18 at 18:22 +0000, Tony Liu wrote: > While looking into it, found a bug on this line. It's in all branches. > > https://github.com/openstack/nova/blob/master/nova/weights.py#L51 > > It's supposed to return a list of normalized values. Here is the fix. > > - return ((i - minval) / range_ for i in weight_list) > + return ([(i - minval) / range_ for i in weight_list]) its currently returning generator of vaules your change just converts it to a list. i dont think this is a bug nessisarly the orginal code is more effince. it would only be an issue if the calling code tried to loop over it multiple times. looking at how its used https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/weights.py#L132-L140 a generator should work fine i belive. without your change. you shoudl not read too much into the fact the docstring says "Normalize the values in a list between 0 and 1.0." we are not using mypy and type hints in this par tof the code and its asl not using the doc strig syntax to refer to peratmer tyeps and return values. > > The negative weight is from build-failure weigher. > Looking into nova-compute... > > Again, is nova-compute supposed to provide available resource, instead of > total resource, for weigher to evaluate? in genergal yes we use the curreently avaiable resouce when weigheing but it depends on the weigher. quantitive weighers like ram disk and cpu will use the aviable capastiy rather then total. > > Thanks! > Tony > ________________________________________ > From: Tony Liu > Sent: January 17, 2022 06:27 PM > To: Sean Mooney; openstack-discuss at lists.openstack.org > Subject: Re: [nova] Instance Even Scheduling > > Hi, > > I enabled debug on nova-scheduler and launched 5 VMs. 8 hosts are returned > as valid hosts from filter. Here is the weight log. This is from Train release. > > This is for the first VM. > "ram" is the total memory. Is it supposed to be the available or consumed memory? > It's the same for all nodes because they all have the same spec. 
> "disk" is also the total. Because all compute nodes are using the same shared > Ceph storage, disk is the same for all nodes. > "instances" is the current number of instances on that node. > I don't see cpu. Is cpu weigher not there yet in Train? > Only compute-11 has positive weight, all others have negative weight. > How comes the weight is negative for other nodes? Given the logging, > they are all the same except for instances. > ================ > Weighed [WeighedHost [host: (compute-11, compute-11) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: 2.9901550710003333], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462 > ================ > > For the second VM. > ================ > Weighed [WeighedHost [host: (compute-11, compute-11) ram: 757535MB disk: 114565120MB io_ops: 1 instances: 6, weight: 1.9888744586443294], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462 > ================ > > Given above logging, compute-11 is always the winner of weight. It's just that > when weighing for the next VM, the "instances" of compute-11 bump up, all others > are the same. At the end, all 5 VMs are created on that same node. > > Is this all expected? > > Thanks! > Tony > ________________________________________ > From: Tony Liu > Sent: January 17, 2022 10:11 AM > To: Sean Mooney; openstack-discuss at lists.openstack.org > Subject: Re: [nova] Instance Even Scheduling > > That disk weigher is a good point. I am using Ceph as the storage backend > for all compute nodes. Disk weigher may not handle that properly and cause > some failure. Anyways, I will enable debug and look into more details. > > Thanks! 
> Tony > ________________________________________ > From: Sean Mooney > Sent: January 17, 2022 09:57 AM > To: Tony Liu; openstack-discuss at lists.openstack.org > Subject: Re: [nova] Instance Even Scheduling > > On Mon, 2022-01-17 at 17:45 +0000, Tony Liu wrote: > > I recall weight didn't work as what I expected, that's why I used > > shuffle_best_same_weighed_hosts. > > > > Here is what I experienced. > > With Ussuri and default Nova scheduling settings. All weighers are supposed > > to be enabled and all multipliers are positive. > > > yes by default all weighers are enabled and the shcduler spreads by default. > > On 10x empty compute nodes > > with the same spec, say the first vm is created on compute-2. Because some > > memory and vCPU are consumed, the second vm should be created on some > > node other than compute-2, if weighers are working fine. But it's still created > > on compute-2, until I increased host_subset_size and enable shuffle_best_same_weighed_hosts. > > > i would guess that either the disk weigher or failed build wiehter is likely what results in teh behaivor different > the default behavior is still to speread. before assuming there is a but you shoudl enable the schduler > in debug mode to look at the weighters that are assinged to each host and determin why you are seeing differnt behavior. > > > shuffle_best_same_weighed_hosts does as the name suggest. it shuffles the result if and only if there is a tie. > that means it will only have a effect if 2 hosts were judged by thge weigher as beeing equally good candiates. > > host_subset_size instalead of looking at only the top host in the list enables you to consider the top n hosts. > > host_subset_size does a random selection from the host_subset_size top element after the hosts are sorted by the weighers > intentionlaly adding randomness to the selection. > this should not be needed in general. > > > > It seems that all compute nodes are equally > > weighted, although they don't have the same amount of resource. > > Am I missing anything there? > > > > Thanks! > > Tony > > ________________________________________ > > From: Sean Mooney > > Sent: January 17, 2022 09:06 AM > > To: openstack-discuss at lists.openstack.org > > Subject: Re: [nova] Instance Even Scheduling > > > > On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote: > > > https://docs.openstack.org/nova/latest/admin/scheduling.html > > > > > > Filter gives you a group of valid hosts, assuming they are equally weighted, > > > you may try with these two settings to pick up a host in a more even manner. > > > host_subset_size (increase the size) > > > shuffle_best_same_weighed_hosts (enable the shuffle) > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html > > > > yes the weighers are what will blance between the hosts and the filters determin which host are valid > > so if you want to spread based on ram then you need to adject the > > https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.ram_weight_multiplier > > > > for example set ram_weight_multiplier=10.0 to make it relitivly more important. > > the way the weigher work is all wheigher calulate the weight for a host, > > we then add them after multiplying them by the weights and then sort. > > > > > > > > > > > > > Tony > > > ________________________________________ > > > From: Ammad Syed > > > Sent: January 16, 2022 11:53 PM > > > To: openstack-discuss > > > Subject: [nova] Instance Even Scheduling > > > > > > Hi, > > > > > > I have 5 compute nodes. 
When I deploy instances, the most of the instances automatically placed in node 1 or node 2. The other compute nodes remain empty or with one or two instances on it. > > > > > > enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter > > > > > > I have enabled the above filters. How to ensure that instances should be scheduled on compute nodes evenly on all compute hosts based on RAM only ? Like scheduler should schedule the instance on compute host which has a large amount of RAM available then other hosts. > > > > > > - Ammad > > > > > > > > > > > > > > > > > > > From juliaashleykreger at gmail.com Tue Jan 18 19:23:59 2022 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 18 Jan 2022 11:23:59 -0800 Subject: [ironic] [release] RFC: stop doing bugfix branches for Bifrost? In-Reply-To: References: Message-ID: +1, drop the bugfix branches on bifrost. There are two cases where we've seen people want or need to use *stable* branches in bifrost. 1) "I want to run some precise stable branch of all the things because surely the stable branch will have every fix for better experience." 2) "I want to run a precise version and need behavior which has been removed in newer releases. Only the latter has really been a case where they have *had* to use a stable branch of bifrost, since bifrost has long supported specific branch/tag overrides for what to install from source. The same capability has often allowed those with the fromer desire to tune exactly what they want/desire if they know they need that aspect. -Julia On Tue, Jan 18, 2022 at 11:13 AM Dmitry Tantsur wrote: > > Hi team! > > Some time ago we introduced bugfix/X.Y branches [1] to some of the Ironic projects. This has worked pretty well and has been very helpful in ironic/inspector/IPA, but I have second thoughts about Bifrost. > > First, maintaining Bifrost branches is tedious enough because of how many distros we support and how quickly they change. > > Second, our recommended approach to using Bifrost is to git-clone master and work from it. I'm honestly unsure if the regular stable branches are used (outside of the Kolla CI), let alone bugfix branches. (I also doubt that Bifrost releases are very popular or even meaningful, but that's another topic.) > > As one of few people who is maintaining bugfix branches, I suggest we stop making them for Bifrost and switch Bifrost back to normal cycle-with-intermediaries. We can keep releasing 3x per cycle, just to have checkpoints, but only create "normal" stable branches. > > Thoughts? > > Dmitry > > [1] https://specs.openstack.org/openstack/ironic-specs/specs/approved/new-release-model.html > > -- > Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, > Commercial register: Amtsgericht Muenchen, HRB 153243, > Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill From tonyliu0592 at hotmail.com Tue Jan 18 19:41:19 2022 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Tue, 18 Jan 2022 19:41:19 +0000 Subject: [nova] Instance Even Scheduling In-Reply-To: <371c09db25789eb97c5f87f52aa6a8ceea400090.camel@redhat.com> References: <4bc575f1c56c93b79677f8fc535049fa7b99a976.camel@redhat.com> <371c09db25789eb97c5f87f52aa6a8ceea400090.camel@redhat.com> Message-ID: Thank you Sean for all the comments! Returning the generator is not a bug at all. I thought it was related to those negative value, which is actually caused by build-failure weigher. 
Eventually, setting build_failure_weight_multiplier to 0 makes it work as expected. It's just not easy to see the build failure until turn on debug. And no easy way to clear the failure until restart nova-compute. Thanks again! Tony ________________________________________ From: Sean Mooney Sent: January 18, 2022 11:15 AM To: Tony Liu; openstack-discuss at lists.openstack.org Subject: Re: [nova] Instance Even Scheduling On Tue, 2022-01-18 at 18:22 +0000, Tony Liu wrote: > While looking into it, found a bug on this line. It's in all branches. > > https://github.com/openstack/nova/blob/master/nova/weights.py#L51 > > It's supposed to return a list of normalized values. Here is the fix. > > - return ((i - minval) / range_ for i in weight_list) > + return ([(i - minval) / range_ for i in weight_list]) its currently returning generator of vaules your change just converts it to a list. i dont think this is a bug nessisarly the orginal code is more effince. it would only be an issue if the calling code tried to loop over it multiple times. looking at how its used https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/weights.py#L132-L140 a generator should work fine i belive. without your change. you shoudl not read too much into the fact the docstring says "Normalize the values in a list between 0 and 1.0." we are not using mypy and type hints in this par tof the code and its asl not using the doc strig syntax to refer to peratmer tyeps and return values. > > The negative weight is from build-failure weigher. > Looking into nova-compute... > > Again, is nova-compute supposed to provide available resource, instead of > total resource, for weigher to evaluate? in genergal yes we use the curreently avaiable resouce when weigheing but it depends on the weigher. quantitive weighers like ram disk and cpu will use the aviable capastiy rather then total. > > Thanks! > Tony > ________________________________________ > From: Tony Liu > Sent: January 17, 2022 06:27 PM > To: Sean Mooney; openstack-discuss at lists.openstack.org > Subject: Re: [nova] Instance Even Scheduling > > Hi, > > I enabled debug on nova-scheduler and launched 5 VMs. 8 hosts are returned > as valid hosts from filter. Here is the weight log. This is from Train release. > > This is for the first VM. > "ram" is the total memory. Is it supposed to be the available or consumed memory? > It's the same for all nodes because they all have the same spec. > "disk" is also the total. Because all compute nodes are using the same shared > Ceph storage, disk is the same for all nodes. > "instances" is the current number of instances on that node. > I don't see cpu. Is cpu weigher not there yet in Train? > Only compute-11 has positive weight, all others have negative weight. > How comes the weight is negative for other nodes? Given the logging, > they are all the same except for instances. 
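To make the large negative weights shown in the logs earlier in this thread concrete, here is a minimal sketch of the arithmetic, assuming nova's default build_failure_weight_multiplier of 1000000.0 and per-weigher min-max normalization (it mirrors the shape of the weighing, not nova's exact code):

# Sketch only: illustrates why one host with no recent build failures ends up
# orders of magnitude ahead of the rest, and why zeroing the multiplier
# flattens that difference out.
def normalize(values):
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

hosts = ["compute-11", "compute-7", "compute-2"]
failed_builds = [0, 12, 8]                 # hypothetical per-host failure counts
free_ram_mb = [764191, 758047, 743711]     # hypothetical free RAM per host

build_multiplier = -1000000.0   # -1 * default build_failure_weight_multiplier
ram_multiplier = 1.0            # default ram_weight_multiplier

totals = [
    build_multiplier * f + ram_multiplier * r
    for f, r in zip(normalize(failed_builds), normalize(free_ram_mb))
]
# compute-11 (no failures) keeps a small positive weight; the other hosts drop
# by roughly a million times their normalized failure count, which is the same
# shape as the weights in the scheduler logs. With build_multiplier = 0.0 the
# totals collapse to the RAM term alone and the hosts spread as expected.
print(dict(zip(hosts, totals)))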
> ================ > Weighed [WeighedHost [host: (compute-11, compute-11) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: 2.9901550710003333], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462 > ================ > > For the second VM. > ================ > Weighed [WeighedHost [host: (compute-11, compute-11) ram: 757535MB disk: 114565120MB io_ops: 1 instances: 6, weight: 1.9888744586443294], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462 > ================ > > Given above logging, compute-11 is always the winner of weight. It's just that > when weighing for the next VM, the "instances" of compute-11 bump up, all others > are the same. At the end, all 5 VMs are created on that same node. > > Is this all expected? > > Thanks! > Tony > ________________________________________ > From: Tony Liu > Sent: January 17, 2022 10:11 AM > To: Sean Mooney; openstack-discuss at lists.openstack.org > Subject: Re: [nova] Instance Even Scheduling > > That disk weigher is a good point. I am using Ceph as the storage backend > for all compute nodes. Disk weigher may not handle that properly and cause > some failure. Anyways, I will enable debug and look into more details. > > Thanks! > Tony > ________________________________________ > From: Sean Mooney > Sent: January 17, 2022 09:57 AM > To: Tony Liu; openstack-discuss at lists.openstack.org > Subject: Re: [nova] Instance Even Scheduling > > On Mon, 2022-01-17 at 17:45 +0000, Tony Liu wrote: > > I recall weight didn't work as what I expected, that's why I used > > shuffle_best_same_weighed_hosts. > > > > Here is what I experienced. > > With Ussuri and default Nova scheduling settings. 
All weighers are supposed > > to be enabled and all multipliers are positive. > > > yes by default all weighers are enabled and the shcduler spreads by default. > > On 10x empty compute nodes > > with the same spec, say the first vm is created on compute-2. Because some > > memory and vCPU are consumed, the second vm should be created on some > > node other than compute-2, if weighers are working fine. But it's still created > > on compute-2, until I increased host_subset_size and enable shuffle_best_same_weighed_hosts. > > > i would guess that either the disk weigher or failed build wiehter is likely what results in teh behaivor different > the default behavior is still to speread. before assuming there is a but you shoudl enable the schduler > in debug mode to look at the weighters that are assinged to each host and determin why you are seeing differnt behavior. > > > shuffle_best_same_weighed_hosts does as the name suggest. it shuffles the result if and only if there is a tie. > that means it will only have a effect if 2 hosts were judged by thge weigher as beeing equally good candiates. > > host_subset_size instalead of looking at only the top host in the list enables you to consider the top n hosts. > > host_subset_size does a random selection from the host_subset_size top element after the hosts are sorted by the weighers > intentionlaly adding randomness to the selection. > this should not be needed in general. > > > > It seems that all compute nodes are equally > > weighted, although they don't have the same amount of resource. > > Am I missing anything there? > > > > Thanks! > > Tony > > ________________________________________ > > From: Sean Mooney > > Sent: January 17, 2022 09:06 AM > > To: openstack-discuss at lists.openstack.org > > Subject: Re: [nova] Instance Even Scheduling > > > > On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote: > > > https://docs.openstack.org/nova/latest/admin/scheduling.html > > > > > > Filter gives you a group of valid hosts, assuming they are equally weighted, > > > you may try with these two settings to pick up a host in a more even manner. > > > host_subset_size (increase the size) > > > shuffle_best_same_weighed_hosts (enable the shuffle) > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html > > > > yes the weighers are what will blance between the hosts and the filters determin which host are valid > > so if you want to spread based on ram then you need to adject the > > https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.ram_weight_multiplier > > > > for example set ram_weight_multiplier=10.0 to make it relitivly more important. > > the way the weigher work is all wheigher calulate the weight for a host, > > we then add them after multiplying them by the weights and then sort. > > > > > > > > > > > > > Tony > > > ________________________________________ > > > From: Ammad Syed > > > Sent: January 16, 2022 11:53 PM > > > To: openstack-discuss > > > Subject: [nova] Instance Even Scheduling > > > > > > Hi, > > > > > > I have 5 compute nodes. When I deploy instances, the most of the instances automatically placed in node 1 or node 2. The other compute nodes remain empty or with one or two instances on it. > > > > > > enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter > > > > > > I have enabled the above filters. 
How to ensure that instances should be scheduled on compute nodes evenly on all compute hosts based on RAM only ? Like scheduler should schedule the instance on compute host which has a large amount of RAM available then other hosts. > > > > > > - Ammad > > > > > > > > > > > > > > > > > > > From fungi at yuggoth.org Tue Jan 18 19:54:17 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 18 Jan 2022 19:54:17 +0000 Subject: [Ironic][docs][api-sig] Moving API Documentation closer to code In-Reply-To: References: <20220118184558.l4bkh2xtvlf5j6qe@yuggoth.org> Message-ID: <20220118195417.kjxhwdyvkiduajj2@yuggoth.org> On 2022-01-18 11:14:59 -0800 (-0800), Julia Kreger wrote: > On Tue, Jan 18, 2022 at 10:49 AM Jeremy Stanley wrote: > > > > On 2022-01-18 23:21:21 +0500 (+0500), Mahnoor Asghar wrote: > > [...] > > > I am an Outreachy intern, working with OpenStack Ironic on improving > > > their API reference documentation mechanism. > > [...] > > > > Welcome! > > > > > The wider OpenStack community may be facing similar problems > > > maintaining their documentation, and it would be helpful to > > > understand their perspective and concerns - with respect to using > > > such an extension, their expectations from it, collaborating on > > > it, and whether they have additional ideas about how we may bring > > > API documentation closer to the code. > > [...] > > > > Keep in mind that developers of OpenStack's services have > > traditionally held the belief that a good public API should be > > versioned independently from the software which implements it. One > > version of the software may provide multiple API versions, and > > there's not necessarily a 1:1 correlation between public REST API > > methods and the internal Python API methods which implement them. I > > suspect this has driven some of the desire to separate REST API > > documentation from the source code itself. > > And the API version contracts we've tended to offer has been a very > linear progression matching the software versions, for the most part. > I think as the groups of maintainers have evolved, I believe the > traditional belief may no longer really hold true. Or at least, there > doesn't seem to be a good reason to continue to keep aspects > disjointed and separated, at least that I'm aware of. I guess to put > it another way, we have finite capacity/bandwidth, and anything we do > to reduce hurdles, can help us further our work. [...] Absolutely, that's why I qualified it with "traditionally" (many may no longer feel this way, or the people who made those choices may no longer be around/involved in their respective projects). I'm in favor of the Ironic team pursuing whatever makes documenting easier, just noting that the approach may not be applicable OpenStack-wide for these or other reasons. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From iurygregory at gmail.com Tue Jan 18 20:03:23 2022 From: iurygregory at gmail.com (Iury Gregory) Date: Tue, 18 Jan 2022 17:03:23 -0300 Subject: [ironic] [release] RFC: stop doing bugfix branches for Bifrost? In-Reply-To: References: Message-ID: Thanks for starting this Dmitry! +1 to drop the bugfix branches for Bifrost. Em ter., 18 de jan. de 2022 ?s 16:24, Julia Kreger < juliaashleykreger at gmail.com> escreveu: > +1, drop the bugfix branches on bifrost. 
> > There are two cases where we've seen people want or need to use > *stable* branches in bifrost. > > 1) "I want to run some precise stable branch of all the things because > surely the stable branch will have every fix for better experience." > 2) "I want to run a precise version and need behavior which has been > removed in newer releases. > > Only the latter has really been a case where they have *had* to use a > stable branch of bifrost, since bifrost has long supported specific > branch/tag overrides for what to install from source. The same > capability has often allowed those with the fromer desire to tune > exactly what they want/desire if they know they need that aspect. > > -Julia > > On Tue, Jan 18, 2022 at 11:13 AM Dmitry Tantsur > wrote: > > > > Hi team! > > > > Some time ago we introduced bugfix/X.Y branches [1] to some of the > Ironic projects. This has worked pretty well and has been very helpful in > ironic/inspector/IPA, but I have second thoughts about Bifrost. > > > > First, maintaining Bifrost branches is tedious enough because of how > many distros we support and how quickly they change. > > > > Second, our recommended approach to using Bifrost is to git-clone master > and work from it. I'm honestly unsure if the regular stable branches are > used (outside of the Kolla CI), let alone bugfix branches. (I also doubt > that Bifrost releases are very popular or even meaningful, but that's > another topic.) > > > > As one of few people who is maintaining bugfix branches, I suggest we > stop making them for Bifrost and switch Bifrost back to normal > cycle-with-intermediaries. We can keep releasing 3x per cycle, just to have > checkpoints, but only create "normal" stable branches. > I loved this idea! > > Thoughts? > > > > Dmitry > > > > [1] > https://specs.openstack.org/openstack/ironic-specs/specs/approved/new-release-model.html > > > > -- > > Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, > > Commercial register: Amtsgericht Muenchen, HRB 153243, > > Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael > O'Neill > > -- *Att[]'sIury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Part of the ironic-core and puppet-manager-core team in OpenStack* *Software Engineer at Red Hat Czech* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Tue Jan 18 20:11:16 2022 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 18 Jan 2022 20:11:16 +0000 Subject: [ironic] [release] RFC: stop doing bugfix branches for Bifrost? In-Reply-To: References: Message-ID: On Tue, 18 Jan 2022 at 19:27, Julia Kreger wrote: > > +1, drop the bugfix branches on bifrost. > > There are two cases where we've seen people want or need to use > *stable* branches in bifrost. > > 1) "I want to run some precise stable branch of all the things because > surely the stable branch will have every fix for better experience." > 2) "I want to run a precise version and need behavior which has been > removed in newer releases. > > Only the latter has really been a case where they have *had* to use a > stable branch of bifrost, since bifrost has long supported specific > branch/tag overrides for what to install from source. The same > capability has often allowed those with the fromer desire to tune > exactly what they want/desire if they know they need that aspect. Kayobe consumes the bifrost stable branches. 
We often get bitten by changes to bifrost (and its deps) in master, and the stable branches shield us from this while allowing for bug fixes. Secondly, while the version of bifrost & its dependencies isn't tied to those of the cloud infrastructure, it does simplify things somewhat to be able to say everything is running code from series X. We don't use the bugfix branches. Mark > > -Julia > > On Tue, Jan 18, 2022 at 11:13 AM Dmitry Tantsur wrote: > > > > Hi team! > > > > Some time ago we introduced bugfix/X.Y branches [1] to some of the Ironic projects. This has worked pretty well and has been very helpful in ironic/inspector/IPA, but I have second thoughts about Bifrost. > > > > First, maintaining Bifrost branches is tedious enough because of how many distros we support and how quickly they change. > > > > Second, our recommended approach to using Bifrost is to git-clone master and work from it. I'm honestly unsure if the regular stable branches are used (outside of the Kolla CI), let alone bugfix branches. (I also doubt that Bifrost releases are very popular or even meaningful, but that's another topic.) > > > > As one of few people who is maintaining bugfix branches, I suggest we stop making them for Bifrost and switch Bifrost back to normal cycle-with-intermediaries. We can keep releasing 3x per cycle, just to have checkpoints, but only create "normal" stable branches. > > > > Thoughts? > > > > Dmitry > > > > [1] https://specs.openstack.org/openstack/ironic-specs/specs/approved/new-release-model.html > > > > -- > > Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, > > Commercial register: Amtsgericht Muenchen, HRB 153243, > > Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill > From jp.methot at planethoster.info Tue Jan 18 21:27:46 2022 From: jp.methot at planethoster.info (J-P Methot) Date: Tue, 18 Jan 2022 16:27:46 -0500 Subject: [kolla] Calling Kolla-ansible in an ansible playbook Message-ID: Hi, I've been trying to write an ansible-playbook to run some kolla tasks and I've been wondering if this is even possible? I have kolla running inside a virtual environment, so of course? I'm running into issues having ansible running it. The current task looks like this: ?- name: run kolla-ansible upgrade on controllers ?? command: "{{ venv }}/bin/kolla-ansible --limit {{ item }} -i multinode upgrade" ?? loop: ???? - control ???? - monitoring ???? - storage When run, it's complaining that ansible isn't installed in the current virtual environment. I was under the impression that specifying the whole path to my executable would have it automatically run in that virtual environment, but visibly this excludes access to ansible that's installed there. So, basically, is what I'm trying to do even possible? And if so, what would be the best way to do it? -- Jean-Philippe M?thot Senior Openstack system administrator Administrateur syst?me Openstack s?nior PlanetHoster inc. From fungi at yuggoth.org Tue Jan 18 21:52:49 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 18 Jan 2022 21:52:49 +0000 Subject: [neutron] CI meeting agenda In-Reply-To: <20220118134954.5a77amtpuraaewqi@yuggoth.org> References: <11906918.O9o76ZdvQC@p1> <20220118134954.5a77amtpuraaewqi@yuggoth.org> Message-ID: <20220118215248.whixnav6t2i5agws@yuggoth.org> On 2022-01-18 13:49:54 +0000 (+0000), Jeremy Stanley wrote: > On 2022-01-18 14:34:35 +0100 (+0100), Slawek Kaplonski wrote: > [...] > > as meetpad.opendev.org seems that is still disabled. > [...] 
> > Yes, apologies, we're still waiting for them to release new images, > but will consider alternative solutions if this goes on much longer. And now it's back up and running again. New images appeared moments too late to have it going before the meeting, unfortunately. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From eblock at nde.ag Tue Jan 18 21:55:13 2022 From: eblock at nde.ag (Eugen Block) Date: Tue, 18 Jan 2022 21:55:13 +0000 Subject: [horizon] Missing button "create user" in dashboard In-Reply-To: <20220117125736.Horde.djFrXoev-vMIrVXie5UnauE@webmail.nde.ag> Message-ID: <20220118215513.Horde.KLjlMWNK1pXFnPIirnCzBOj@webmail.nde.ag> The issue is resolved, apparently I didn?t have the required changes in local_settings.py in the keystone backend section. Zitat von Eugen Block : > Hi *, > > I have a fresh Victoria installation where the "create user" button > is missing in the dashboard. Creating users via CLI works fine, also > when I go to https://controller/identity/users/create I see the > dialog to create a new user. I see the same in an Ussuri installation. > The clouds are deployed manually (we have our own Salt mechanism) on > baremetal, the OS is openSUSE Leap 15.2. > What I have tried so far is to apply the keystonev3_policy.json from > [1], but without success. There are no other custom policies > applied, most configs are on default. > There are no obvious errors in the dashboard logs, and before I > change too much I wanted to ask for your advice. Can anyone help me > out? > Please let me know if you need more information. > > Thanks in advance, > Eugen > > [1] https://bugs.launchpad.net/charm-openstack-dashboard/+bug/1775224 From marc-antoine.godde at student-cs.fr Tue Jan 18 21:56:24 2022 From: marc-antoine.godde at student-cs.fr (Marc-Antoine Godde (Student at CentraleSupelec)) Date: Tue, 18 Jan 2022 21:56:24 +0000 Subject: Problem with ressource provider Message-ID: Hello, In our cluster, we have 4 computes running and we have an issue with the number 4. We can't create VMs on it, we can't migrate VMs to or from that node. VMs are still perfectly working though. After a first diagnosis, it appears that there's a problem with the ressource provider. Node is declared in the db with: - name: os-compute-4, uuid: d12ea77b-d678-40ce-a813-d8094cabbbd8 Here are the ressource provider: - name: os-compute-4, uuid: a9dc2a56-5b2d-49b1-ac47-6d996d2d029a - name: os-compute-4.openstack.local, uuid: d12ea776-d678-40ce-a813-d8094cabbbd8 In our opinion, os-compute-4.openstack.local shouldn't be there at all. We want to destroy both of the ressource provider and recreate one. I must also precise that os-compute-4 ressource provider has 0 allocation and os-compute-4.openstack.local only 3 (there?s at least 50 VMs running on it?). Moreover, for these 3 allocations, the server uuid doesn't correspond to any existing VMs. Overall, none of the VMs has a ressource allocation on os-compute-4. We found the command nova-manage placement heal_allocations on the Internet but we can't find it in any container, maybe deprecated ? The cluster is running Ussuri installed with Openstack-ansible. If you have any suggestion, any help would be appreciated. Thanks. :) Best, Marc-Antoine Godde -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sbaker at redhat.com Tue Jan 18 22:10:58 2022 From: sbaker at redhat.com (Steve Baker) Date: Wed, 19 Jan 2022 11:10:58 +1300 Subject: [Ironic][docs][api-sig] Moving API Documentation closer to code In-Reply-To: <20220118184558.l4bkh2xtvlf5j6qe@yuggoth.org> References: <20220118184558.l4bkh2xtvlf5j6qe@yuggoth.org> Message-ID: <29553f81-cc4e-3461-8517-190ac48ff7c9@redhat.com> On 19/01/22 07:45, Jeremy Stanley wrote: > On 2022-01-18 23:21:21 +0500 (+0500), Mahnoor Asghar wrote: > [...] >> I am an Outreachy intern, working with OpenStack Ironic on improving >> their API reference documentation mechanism. > [...] > > Welcome! > >> The wider OpenStack community may be facing similar problems >> maintaining their documentation, and it would be helpful to >> understand their perspective and concerns - with respect to using >> such an extension, their expectations from it, collaborating on >> it, and whether they have additional ideas about how we may bring >> API documentation closer to the code. > [...] > > Keep in mind that developers of OpenStack's services have > traditionally held the belief that a good public API should be > versioned independently from the software which implements it. One > version of the software may provide multiple API versions, and > there's not necessarily a 1:1 correlation between public REST API > methods and the internal Python API methods which implement them. I > suspect this has driven some of the desire to separate REST API > documentation from the source code itself. > > There have been past attempts to implement self-describing APIs, > scraping running services in CI jobs, and so on. I don't recall the > details, but folks who were active in the API SIG likely have a > clearer recollection and can probably relate some of the challenges > encountered with various approaches. This proposal is mostly about moving the existing documentation to docstrings rather than introspecting the functions/methods themselves to auto-generate. There may be some auto-generate potential for parameters and output schemas, but whatever is done needs to be able to document the API version evolution. The benefits of this is more about giving developers proximity to the documentation of the REST API they're changing, making discrepancies more obvious, and giving reviewers a clear view on whether a change is documented correctly. From tonyliu0592 at hotmail.com Tue Jan 18 22:17:46 2022 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Tue, 18 Jan 2022 22:17:46 +0000 Subject: Problem with ressource provider In-Reply-To: References: Message-ID: It would be easier to check resource provider by openstack cli, than looking into db. What's the name, short or FDQN, used by other compute nodes? Restart nova-compute and look into log, see which name is used to register resource provider. Tony ________________________________________ From: Marc-Antoine Godde (Student at CentraleSupelec) Sent: January 18, 2022 01:56 PM To: openstack-discuss at lists.openstack.org Subject: Problem with ressource provider Hello, In our cluster, we have 4 computes running and we have an issue with the number 4. We can't create VMs on it, we can't migrate VMs to or from that node. VMs are still perfectly working though. After a first diagnosis, it appears that there's a problem with the ressource provider. 
Node is declared in the db with: - name: os-compute-4, uuid: d12ea77b-d678-40ce-a813-d8094cabbbd8 Here are the ressource provider: - name: os-compute-4, uuid: a9dc2a56-5b2d-49b1-ac47-6d996d2d029a - name: os-compute-4.openstack.local, uuid: d12ea776-d678-40ce-a813-d8094cabbbd8 In our opinion, os-compute-4.openstack.local shouldn't be there at all. We want to destroy both of the ressource provider and recreate one. I must also precise that os-compute-4 ressource provider has 0 allocation and os-compute-4.openstack.local only 3 (there?s at least 50 VMs running on it?). Moreover, for these 3 allocations, the server uuid doesn't correspond to any existing VMs. Overall, none of the VMs has a ressource allocation on os-compute-4. We found the command nova-manage placement heal_allocations on the Internet but we can't find it in any container, maybe deprecated ? The cluster is running Ussuri installed with Openstack-ansible. If you have any suggestion, any help would be appreciated. Thanks. :) Best, Marc-Antoine Godde From satish.txt at gmail.com Tue Jan 18 22:30:40 2022 From: satish.txt at gmail.com (Satish Patel) Date: Tue, 18 Jan 2022 17:30:40 -0500 Subject: Tesla V100 32G GPU with openstack In-Reply-To: References: <7CCE77EA-F7EA-45AF-85B5-3566D1DAB1CB@gmail.com> Message-ID: Thank you for the information. I have a quick question. [root at gpu01 ~]# lspci | grep -i nv 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] (rev a1) d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] (rev a1) In the above output showing two cards does that mean they are physical two or just BUS representation. Also i have the following entry in openstack flavor, does :1 means first GPU card? {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"} On Tue, Jan 18, 2022 at 5:55 AM Ant?nio Paulo wrote: > > Hey Satish, Gustavo, > > Just to clarify a bit on point 3, you will have to buy a vGPU license > per card and this gives you access to all the downloads you need through > NVIDIA's web dashboard -- both the host and guest drivers as well as the > license server setup files. > > Cheers, > Ant?nio > > On 18/01/22 02:46, Satish Patel wrote: > > Thank you so much! This is what I was looking for. It is very odd that > > we buy a pricey card but then we have to buy a license to make those > > features available. > > > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos > > wrote: > >> > >> Hello, Satish. > >> > >> I've been working with vGPU lately and I believe I can answer your > >> questions: > >> > >> 1. As you pointed out in question #2, the pci-passthrough will allocate > >> the entire physical GPU to one single guest VM, while vGPU allows you to > >> spawn from 1 to several VMs using the same physical GPU, depending on > >> the vGPU type you choose (check NVIDIA docs to see which vGPU types the > >> Tesla V100 supports and their properties); > >> 2. Correct; > >> 3. To use vGPU, you need vGPU drivers installed on the platform where > >> your deployment of OpenStack is running AND in the VMs, so there are two > >> drivers to be installed in order to use the feature. I believe both of > >> them have to be purchased from NVIDIA in order to be used, and you would > >> also have to deploy an NVIDIA licensing server in order to validate the > >> licenses of the drivers running in the VMs. > >> 4. You can see what the instructions are for each of these scenarios in > >> [1] and [2]. > >> > >> There is also extensive documentation on vGPU at NVIDIA's website [3]. 
> >> > >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html > >> [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html > >> [3] https://docs.nvidia.com/grid/13.0/index.html > >> > >> Regards, > >> Gustavo. > >> > >> On 17/01/2022 14:41, Satish Patel wrote: > >>> [Please note: This e-mail is from an EXTERNAL e-mail address] > >>> > >>> Folk, > >>> > >>> We have Tesla V100 32G GPU and I?m trying to configure with openstack wallaby. This is first time dealing with GPU so I have couple of question. > >>> > >>> 1. What is the difference between passthrough vs vGPU? I did google but not very clear yet. > >>> 2. If I configure it passthrough then does it only work with single VM ? ( I meant whole GPU will get allocate to single VM correct? > >>> 3. Also some document saying Tesla v100 support vGPU but some folks saying you need license. I have no idea where to get that license. What is the deal here? > >>> 3. What are the config difference between configure this card with passthrough vs vGPU? > >>> > >>> > >>> Currently I configure it with passthrough based one one article and I am able to spun up with and I can see nvidia card exposed to vm. (I used iommu and vfio based driver) so if this card support vGPU then do I need iommu and vfio or some other driver to make it virtualize ? > >>> > >>> Sent from my iPhone > >>> > > From marc-antoine.godde at student-cs.fr Tue Jan 18 22:37:23 2022 From: marc-antoine.godde at student-cs.fr (Marc-Antoine Godde (Student at CentraleSupelec)) Date: Tue, 18 Jan 2022 22:37:23 +0000 Subject: Problem with ressource provider In-Reply-To: References: <4D26E50B-FA85-4CD1-A1C6-77C35C2EF4FB@student-cs.fr> Message-ID: I will do so. Should I let nova recreate the provider by restarting nova and then manually add allocation to VMs on the new provider ? Marc-Antoine > Le 18 janv. 2022 ? 23:30, Tony Liu a ?crit : > > Check /etc/hosts and /etc/hostname in all 4 compute nodes and ensure they > are consistent. > > Check nova-compute logging to see which name is used as the provider. > > If you are sure the provide is safe to be deleted, you can remove it by openstack cli. > > > Tony > > ________________________________________ > From: Marc-Antoine Godde (Student at CentraleSupelec) > Sent: January 18, 2022 02:25 PM > To: Tony Liu > Cc: openstack-discuss at lists.openstack.org > Subject: Re: Problem with ressource provider > > Here is what we get. :) > > Thanks for your help > > [cid:9921FF69-0B8A-456C-969F-2261C88048E0] > > Le 18 janv. 2022 ? 23:17, Tony Liu > a ?crit : > > It would be easier to check resource provider by openstack cli, than looking > into db. > > What's the name, short or FDQN, used by other compute nodes? > Restart nova-compute and look into log, see which name is used > to register resource provider. > > > Tony > ________________________________________ > From: Marc-Antoine Godde (Student at CentraleSupelec) > > Sent: January 18, 2022 01:56 PM > To: openstack-discuss at lists.openstack.org > Subject: Problem with ressource provider > > Hello, > > > In our cluster, we have 4 computes running and we have an issue with the number 4. > > > We can't create VMs on it, we can't migrate VMs to or from that node. VMs are still perfectly working though. After a first diagnosis, it appears that there's a problem with the ressource provider. 
> > Node is declared in the db with: > - name: os-compute-4, uuid: d12ea77b-d678-40ce-a813-d8094cabbbd8 > > > > Here are the ressource provider: > > - name: os-compute-4, uuid: a9dc2a56-5b2d-49b1-ac47-6d996d2d029a > > - name: os-compute-4.openstack.local, uuid: d12ea776-d678-40ce-a813-d8094cabbbd8 > > > In our opinion, os-compute-4.openstack.local shouldn't be there at all. We want to destroy both of the ressource provider and recreate one. > > > I must also precise that os-compute-4 ressource provider has 0 allocation and os-compute-4.openstack.local only 3 (there?s at least 50 VMs running on it?). Moreover, for these 3 allocations, the server uuid doesn't correspond to any existing VMs. Overall, none of the VMs has a ressource allocation on os-compute-4. > > > We found the command nova-manage placement heal_allocations on the Internet but we can't find it in any container, maybe deprecated ? The cluster is running Ussuri installed with Openstack-ansible. > > If you have any suggestion, any help would be appreciated. Thanks. :) > > Best, > Marc-Antoine Godde > > From fungi at yuggoth.org Tue Jan 18 23:03:40 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 18 Jan 2022 23:03:40 +0000 Subject: [Ironic][docs][api-sig] Moving API Documentation closer to code In-Reply-To: <29553f81-cc4e-3461-8517-190ac48ff7c9@redhat.com> References: <20220118184558.l4bkh2xtvlf5j6qe@yuggoth.org> <29553f81-cc4e-3461-8517-190ac48ff7c9@redhat.com> Message-ID: <20220118230340.5tbadv5fzk5gu6hi@yuggoth.org> On 2022-01-19 11:10:58 +1300 (+1300), Steve Baker wrote: [...] > This proposal is mostly about moving the existing documentation to > docstrings rather than introspecting the functions/methods > themselves to auto-generate. There may be some auto-generate > potential for parameters and output schemas, but whatever is done > needs to be able to document the API version evolution. The > benefits of this is more about giving developers proximity to the > documentation of the REST API they're changing, making > discrepancies more obvious, and giving reviewers a clear view on > whether a change is documented correctly. Correct, I don't recall anyone having tried exactly that approach yet. It's more akin to what you'd do for documenting a Python library. I was merely mentioning what other solutions had been attempted, as an explanation for why I was adding the [api-sig] subject tag. Apologies if that was unclear. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From massimo.sgaravatto at gmail.com Wed Jan 19 06:53:57 2022 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Wed, 19 Jan 2022 07:53:57 +0100 Subject: Tesla V100 32G GPU with openstack In-Reply-To: References: <7CCE77EA-F7EA-45AF-85B5-3566D1DAB1CB@gmail.com> Message-ID: If I am not wrong those are 2 GPUs "tesla-v100:1" means 1 GPU So e.g. a flavor with "pci_passthrough:alias": "tesla-v100:2"} will be used to create an instance with 2 GPUs Cheers, Massimo On Tue, Jan 18, 2022 at 11:35 PM Satish Patel wrote: > Thank you for the information. I have a quick question. > > [root at gpu01 ~]# lspci | grep -i nv > 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe > 32GB] (rev a1) > d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe > 32GB] (rev a1) > > In the above output showing two cards does that mean they are physical > two or just BUS representation. 
> > Also i have the following entry in openstack flavor, does :1 means > first GPU card? > > {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"} > > > > > > > On Tue, Jan 18, 2022 at 5:55 AM Ant?nio Paulo > wrote: > > > > Hey Satish, Gustavo, > > > > Just to clarify a bit on point 3, you will have to buy a vGPU license > > per card and this gives you access to all the downloads you need through > > NVIDIA's web dashboard -- both the host and guest drivers as well as the > > license server setup files. > > > > Cheers, > > Ant?nio > > > > On 18/01/22 02:46, Satish Patel wrote: > > > Thank you so much! This is what I was looking for. It is very odd that > > > we buy a pricey card but then we have to buy a license to make those > > > features available. > > > > > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos > > > wrote: > > >> > > >> Hello, Satish. > > >> > > >> I've been working with vGPU lately and I believe I can answer your > > >> questions: > > >> > > >> 1. As you pointed out in question #2, the pci-passthrough will > allocate > > >> the entire physical GPU to one single guest VM, while vGPU allows you > to > > >> spawn from 1 to several VMs using the same physical GPU, depending on > > >> the vGPU type you choose (check NVIDIA docs to see which vGPU types > the > > >> Tesla V100 supports and their properties); > > >> 2. Correct; > > >> 3. To use vGPU, you need vGPU drivers installed on the platform where > > >> your deployment of OpenStack is running AND in the VMs, so there are > two > > >> drivers to be installed in order to use the feature. I believe both of > > >> them have to be purchased from NVIDIA in order to be used, and you > would > > >> also have to deploy an NVIDIA licensing server in order to validate > the > > >> licenses of the drivers running in the VMs. > > >> 4. You can see what the instructions are for each of these scenarios > in > > >> [1] and [2]. > > >> > > >> There is also extensive documentation on vGPU at NVIDIA's website [3]. > > >> > > >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html > > >> [2] > https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html > > >> [3] https://docs.nvidia.com/grid/13.0/index.html > > >> > > >> Regards, > > >> Gustavo. > > >> > > >> On 17/01/2022 14:41, Satish Patel wrote: > > >>> [Please note: This e-mail is from an EXTERNAL e-mail address] > > >>> > > >>> Folk, > > >>> > > >>> We have Tesla V100 32G GPU and I?m trying to configure with > openstack wallaby. This is first time dealing with GPU so I have couple of > question. > > >>> > > >>> 1. What is the difference between passthrough vs vGPU? I did google > but not very clear yet. > > >>> 2. If I configure it passthrough then does it only work with single > VM ? ( I meant whole GPU will get allocate to single VM correct? > > >>> 3. Also some document saying Tesla v100 support vGPU but some folks > saying you need license. I have no idea where to get that license. What is > the deal here? > > >>> 3. What are the config difference between configure this card with > passthrough vs vGPU? > > >>> > > >>> > > >>> Currently I configure it with passthrough based one one article and > I am able to spun up with and I can see nvidia card exposed to vm. (I used > iommu and vfio based driver) so if this card support vGPU then do I need > iommu and vfio or some other driver to make it virtualize ? > > >>> > > >>> Sent from my iPhone > > >>> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
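On the question of whether the two lspci lines are two physical cards: each distinct PCI address (5e:00.0 and d8:00.0) is a separate device, and for the V100S that means two physical boards. A small sketch of how to double-check this and how the alias count is applied, where the flavor name is only illustrative:

  # -nn prints the [vendor:product] IDs used in the nova PCI alias;
  # 10de is NVIDIA's vendor ID, one line per physical card here
  lspci -nn -d 10de:

  # hypothetical flavor handing both whole GPUs to a single instance,
  # using the tesla-v100 alias already defined for passthrough
  openstack flavor set gpu.large --property "pci_passthrough:alias"="tesla-v100:2"

With the cards bound to vfio-pci for passthrough they are only visible to tools like nvidia-smi from inside the guest, not on the host.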
URL: From skaplons at redhat.com Wed Jan 19 07:20:51 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Wed, 19 Jan 2022 08:20:51 +0100 Subject: [neutron] CI meeting agenda In-Reply-To: <20220118215248.whixnav6t2i5agws@yuggoth.org> References: <11906918.O9o76ZdvQC@p1> <20220118134954.5a77amtpuraaewqi@yuggoth.org> <20220118215248.whixnav6t2i5agws@yuggoth.org> Message-ID: <2091788.irdbgypaU6@p1> Hi, On wtorek, 18 stycznia 2022 22:52:49 CET Jeremy Stanley wrote: > On 2022-01-18 13:49:54 +0000 (+0000), Jeremy Stanley wrote: > > On 2022-01-18 14:34:35 +0100 (+0100), Slawek Kaplonski wrote: > > [...] > > > > > as meetpad.opendev.org seems that is still disabled. > > > > [...] > > > > Yes, apologies, we're still waiting for them to release new images, > > but will consider alternative solutions if this goes on much longer. > > And now it's back up and running again. New images appeared moments > too late to have it going before the meeting, unfortunately. Thx a lot. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From katonalala at gmail.com Wed Jan 19 08:23:39 2022 From: katonalala at gmail.com (Lajos Katona) Date: Wed, 19 Jan 2022 09:23:39 +0100 Subject: Can neutron-fwaas project be revived? In-Reply-To: <20220118175104.a6ppj2kxpijeztz7@yuggoth.org> References: <771f9e50a5f0498caecf3cb892902954@inspur.com> <17e6e183905.d179775f807310.479674362228455950@ghanshyammann.com> <20220118175104.a6ppj2kxpijeztz7@yuggoth.org> Message-ID: Hi, Thanks for the advice. The intention from the Neutron team was to make it clear that the team currently has no capacity to help the maintenance of neutron-fwaas, and can't help to maintain it. If there's easier ways for volunteers to keep it maintained other than forking it to x/ namespace that would be really helpful. Lajos Katona (lajoskatona) Jeremy Stanley ezt ?rta (id?pont: 2022. jan. 18., K, 18:58): > On 2022-01-18 10:49:48 -0600 (-0600), Ghanshyam Mann wrote: > [...] > > As discussed in project-config change[1], you or neutron folks can > > propose the retirement now itself (considering there is no one to > > maintain/release stable/victoria for new bug fixes) and TC will > > merge it as per process. After that, creating it in x/ namespace > > will be good to do. > [...] > > Looking at this from a logistical perspective, it's a fair amount of > churn in code hosting as well as unwelcoming to the new volunteers, > compared to just leaving the repository where it is now and letting > them contribute to it there. If the concern is that the Neutron team > doesn't want to retain responsibility for it while they evaluate the > conviction of the new maintainers for eventual re-inclusion, then > the TC would be well within its rights to declare that the > repository can remain in place while not having it be part of the > Neutron team's responsibilities. > > There are a number of possible solutions, ranging from making a new > category of provisional deliverable, to creating a lightweight > project team under the DPL model, to declaring it a pop-up team with > a TC-owned repository. There are repositories within the OpenStack > namespace which are not an official part of the OpenStack > coordinated release, after all. 
Solutions which don't involve having > the new work take place somewhere separate, and the work involved in > making that separate place, which will simply be closed down as > transient cruft if everything goes as desired. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jahson.babel at cc.in2p3.fr Wed Jan 19 08:54:18 2022 From: jahson.babel at cc.in2p3.fr (Jahson Babel) Date: Wed, 19 Jan 2022 09:54:18 +0100 Subject: [VGPU] Multiple VGPUs on one VM Message-ID: <905b6f6e-68c2-43d3-1401-64946bab0443@cc.in2p3.fr> Hello folks, I've seen a thread on vGPUs on this mailing list lately and I was wondering if multiple vGPUs are supported on one VM on OpenStack. From the OpenStack's documentation this seems pretty clear that it's not. ( https://docs.openstack.org/nova/latest/admin/virtual-gpu.html) But on Nvidia's side multiple vGPUs? should be able to be possible on specific Linux KVM version. (https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-red-hat-el-kvm/index.html#multiple-vgpu-support) Only true for RHEL systems maybe. So which one of those pieces of information should I follow ? Presumably the OpenStack's side is the correct one but I would like to be sure. And does anyone has been able to attach multiples vgpus on one VM ? Another side question that I have in mind despite the fact that I can't see how it could possibly work. Is it possible to over-commit the vGPUs resources ? Via the placement API ? Thanks in advance. Kindly, Jahson Babel -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4261 bytes Desc: S/MIME Cryptographic Signature URL: From elfosardo at gmail.com Wed Jan 19 08:57:08 2022 From: elfosardo at gmail.com (Riccardo Pittau) Date: Wed, 19 Jan 2022 09:57:08 +0100 Subject: [ironic] [release] RFC: stop doing bugfix branches for Bifrost? In-Reply-To: References: Message-ID: +1 to stop creating and maintaining bugfix branches for bifrost it's been an interesting experiment but they're not really as useful as expected Riccardo On Tue, Jan 18, 2022 at 9:16 PM Mark Goddard wrote: > On Tue, 18 Jan 2022 at 19:27, Julia Kreger > wrote: > > > > +1, drop the bugfix branches on bifrost. > > > > There are two cases where we've seen people want or need to use > > *stable* branches in bifrost. > > > > 1) "I want to run some precise stable branch of all the things because > > surely the stable branch will have every fix for better experience." > > 2) "I want to run a precise version and need behavior which has been > > removed in newer releases. > > > > Only the latter has really been a case where they have *had* to use a > > stable branch of bifrost, since bifrost has long supported specific > > branch/tag overrides for what to install from source. The same > > capability has often allowed those with the fromer desire to tune > > exactly what they want/desire if they know they need that aspect. > > Kayobe consumes the bifrost stable branches. We often get bitten by > changes to bifrost (and its deps) in master, and the stable branches > shield us from this while allowing for bug fixes. > > Secondly, while the version of bifrost & its dependencies isn't tied > to those of the cloud infrastructure, it does simplify things somewhat > to be able to say everything is running code from series X. > > We don't use the bugfix branches. > > Mark > > > > > -Julia > > > > On Tue, Jan 18, 2022 at 11:13 AM Dmitry Tantsur > wrote: > > > > > > Hi team! 
> > > > > > Some time ago we introduced bugfix/X.Y branches [1] to some of the > Ironic projects. This has worked pretty well and has been very helpful in > ironic/inspector/IPA, but I have second thoughts about Bifrost. > > > > > > First, maintaining Bifrost branches is tedious enough because of how > many distros we support and how quickly they change. > > > > > > Second, our recommended approach to using Bifrost is to git-clone > master and work from it. I'm honestly unsure if the regular stable branches > are used (outside of the Kolla CI), let alone bugfix branches. (I also > doubt that Bifrost releases are very popular or even meaningful, but that's > another topic.) > > > > > > As one of few people who is maintaining bugfix branches, I suggest we > stop making them for Bifrost and switch Bifrost back to normal > cycle-with-intermediaries. We can keep releasing 3x per cycle, just to have > checkpoints, but only create "normal" stable branches. > > > > > > Thoughts? > > > > > > Dmitry > > > > > > [1] > https://specs.openstack.org/openstack/ironic-specs/specs/approved/new-release-model.html > > > > > > -- > > > Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, > > > Commercial register: Amtsgericht Muenchen, HRB 153243, > > > Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, > Michael O'Neill > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Wed Jan 19 09:16:45 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Wed, 19 Jan 2022 18:16:45 +0900 Subject: [heat-tempest-plugin] call for help from plugin's maintainers In-Reply-To: References: Message-ID: The issue triggered by gabbi 2.5.0 was temporarily resolved by pinning it to 2.4.0 . The gate is unblocked, I believe (We found an issue with a functional job for stable/train which is being fixed now). We can remove that pin once the issue with urllib3 is resolved. > Should we consider using only https for endpoint access in the future? Maybe ? But this is a separate topic, IMHO, and I don't have any strong opinion about this. There are several devstack plugins (like heat, aodh, ...) which don't support setting up tls-proxy for https endpoints yet and we need to fix each plugin if we take this direction. On Tue, Jan 18, 2022 at 10:21 PM Martin Kopec wrote: > Thank you Takashi for looking into this. > > Should we consider using only https for endpoint access in the future? > > On Tue, 18 Jan 2022 at 09:35, Takashi Kajinami > wrote: > >> I have opened an issue for urllib3[1], too, and created a PR to discuss a >> potential fix. >> [1] https://github.com/urllib3/urllib3/issues/2534 >> >> Because it'd take some time until we get feedback from these two >> communities, >> I've proposed a change to pin gabbi to 2.4.0[2]. >> [2] https://review.opendev.org/c/openstack/requirements/+/825044 >> >> The issue might affect other projects using gabbi as well, unless https, >> instead of http, >> is used for endpoint access. >> >> >> On Tue, Jan 18, 2022 at 11:42 AM Takashi Kajinami >> wrote: >> >>> I've looked into this but it seems the following error was actually >>> caused by the latest release of gabbi(2.5.0). >>> TypeError: __init__() got an unexpected keyword argument >>> 'server_hostname' >>> >>> I've reported that issue to gabbi in [1] but if my observation is >>> correct the problem should be >>> fixed in urllib3 which gabbi is dependent on. 
>>> [1] https://github.com/cdent/gabbi/issues/309 >>> >>> On Tue, Jan 18, 2022 at 2:34 AM Martin Kopec wrote: >>> >>>> Hello, >>>> >>>> most of the tests of heat-tempest-plugin have started failing. We >>>> noticed that in interop [1], however, we reproduced that in the project's >>>> gates as well [2]. >>>> >>>> I suspect it might be an issue with the new gabbi package - there has >>>> been an update recently. >>>> Could you please have a look. >>>> >>>> [1] >>>> https://review.opendev.org/c/openinfra/ansible-role-refstack-client/+/824832 >>>> [2] https://review.opendev.org/c/openstack/heat-tempest-plugin/+/823794 >>>> >>>> Thank you, >>>> -- >>>> Martin Kopec >>>> Senior Software Quality Engineer >>>> Red Hat EMEA >>>> >>>> >>>> >>>> > > -- > Martin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From masghar.bese15seecs at seecs.edu.pk Wed Jan 19 09:33:11 2022 From: masghar.bese15seecs at seecs.edu.pk (Mahnoor Asghar) Date: Wed, 19 Jan 2022 14:33:11 +0500 Subject: [Ironic][docs][api-sig] Moving API Documentation closer to code In-Reply-To: <20220118230340.5tbadv5fzk5gu6hi@yuggoth.org> References: <20220118184558.l4bkh2xtvlf5j6qe@yuggoth.org> <29553f81-cc4e-3461-8517-190ac48ff7c9@redhat.com> <20220118230340.5tbadv5fzk5gu6hi@yuggoth.org> Message-ID: I understand more clearly now the reason for the initial choice of keeping the documentation and code separate. Any solution to address the maintainability problem that has arisen as a consequence, should be able to maintain API version history. Auto-documentation is another approach we could try, but I think it would not offer many features or be very descriptive. (The output might be very boilerplate-looking). On Wed, Jan 19, 2022 at 4:10 AM Jeremy Stanley wrote: > > On 2022-01-19 11:10:58 +1300 (+1300), Steve Baker wrote: > [...] > > This proposal is mostly about moving the existing documentation to > > docstrings rather than introspecting the functions/methods > > themselves to auto-generate. There may be some auto-generate > > potential for parameters and output schemas, but whatever is done > > needs to be able to document the API version evolution. The > > benefits of this is more about giving developers proximity to the > > documentation of the REST API they're changing, making > > discrepancies more obvious, and giving reviewers a clear view on > > whether a change is documented correctly. > > Correct, I don't recall anyone having tried exactly that approach > yet. It's more akin to what you'd do for documenting a Python > library. I was merely mentioning what other solutions had been > attempted, as an explanation for why I was adding the [api-sig] > subject tag. Apologies if that was unclear. > -- > Jeremy Stanley From mark at stackhpc.com Wed Jan 19 09:37:17 2022 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 19 Jan 2022 09:37:17 +0000 Subject: [kolla] Calling Kolla-ansible in an ansible playbook In-Reply-To: References: Message-ID: Use the shell module, then activate the virualenv first before running kolla-ansible. However, I'd argue against running Ansible in Ansible. It can be had enough to read the output when troubleshooting, without nesting it. Kayobe intentionally avoided this, instead adding a python CLI that calls out to kayobe and kolla-ansible. Mark On Tue, 18 Jan 2022 at 21:32, J-P Methot wrote: > > Hi, > > I've been trying to write an ansible-playbook to run some kolla tasks > and I've been wondering if this is even possible? 
I have kolla running > inside a virtual environment, so of course I'm running into issues > having ansible running it. The current task looks like this: > > - name: run kolla-ansible upgrade on controllers > command: "{{ venv }}/bin/kolla-ansible --limit {{ item }} -i > multinode upgrade" > loop: > - control > - monitoring > - storage > > When run, it's complaining that ansible isn't installed in the current > virtual environment. I was under the impression that specifying the > whole path to my executable would have it automatically run in that > virtual environment, but visibly this excludes access to ansible that's > installed there. > > So, basically, is what I'm trying to do even possible? And if so, what > would be the best way to do it? > > -- > Jean-Philippe M?thot > Senior Openstack system administrator > Administrateur syst?me Openstack s?nior > PlanetHoster inc. > > From mark at stackhpc.com Wed Jan 19 10:02:26 2022 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 19 Jan 2022 10:02:26 +0000 Subject: [rbac][keystone][kolla][openstack-ansible][tripleo] RBAC in Yoga for deployment projects Message-ID: Hi, From mkopec at redhat.com Wed Jan 19 10:16:52 2022 From: mkopec at redhat.com (Martin Kopec) Date: Wed, 19 Jan 2022 11:16:52 +0100 Subject: [heat-tempest-plugin] call for help from plugin's maintainers In-Reply-To: References: Message-ID: I confirm the pin helped, the tests are passing now: https://review.opendev.org/c/openinfra/ansible-role-refstack-client/+/825183 Although I see a strange error in the master job ^^, 2 tests are failing to create a stack. We noticed this some time ago and the tests have been failing consistently since. It's strange that only master job is affected. Any idea what could be causing that? https://zuul.opendev.org/t/openstack/build/c9568ac2e4f140f8b2ff461e2b1f5030 On Wed, 19 Jan 2022 at 10:17, Takashi Kajinami wrote: > The issue triggered by gabbi 2.5.0 was temporarily resolved by pinning it > to 2.4.0 . > The gate is unblocked, I believe (We found an issue with a functional job > for stable/train which is being fixed now). > We can remove that pin once the issue with urllib3 is resolved. > > > Should we consider using only https for endpoint access in the future? > Maybe ? But this is a separate topic, IMHO, and I don't have any strong > opinion about this. > There are several devstack plugins (like heat, aodh, ...) which don't > support > setting up tls-proxy for https endpoints yet and we need to fix each plugin > if we take this direction. > > > On Tue, Jan 18, 2022 at 10:21 PM Martin Kopec wrote: > >> Thank you Takashi for looking into this. >> >> Should we consider using only https for endpoint access in the future? >> >> On Tue, 18 Jan 2022 at 09:35, Takashi Kajinami >> wrote: >> >>> I have opened an issue for urllib3[1], too, and created a PR to discuss >>> a potential fix. >>> [1] https://github.com/urllib3/urllib3/issues/2534 >>> >>> Because it'd take some time until we get feedback from these two >>> communities, >>> I've proposed a change to pin gabbi to 2.4.0[2]. >>> [2] https://review.opendev.org/c/openstack/requirements/+/825044 >>> >>> The issue might affect other projects using gabbi as well, unless https, >>> instead of http, >>> is used for endpoint access. >>> >>> >>> On Tue, Jan 18, 2022 at 11:42 AM Takashi Kajinami >>> wrote: >>> >>>> I've looked into this but it seems the following error was actually >>>> caused by the latest release of gabbi(2.5.0). 
>>>> TypeError: __init__() got an unexpected keyword argument >>>> 'server_hostname' >>>> >>>> I've reported that issue to gabbi in [1] but if my observation is >>>> correct the problem should be >>>> fixed in urllib3 which gabbi is dependent on. >>>> [1] https://github.com/cdent/gabbi/issues/309 >>>> >>>> On Tue, Jan 18, 2022 at 2:34 AM Martin Kopec wrote: >>>> >>>>> Hello, >>>>> >>>>> most of the tests of heat-tempest-plugin have started failing. We >>>>> noticed that in interop [1], however, we reproduced that in the project's >>>>> gates as well [2]. >>>>> >>>>> I suspect it might be an issue with the new gabbi package - there has >>>>> been an update recently. >>>>> Could you please have a look. >>>>> >>>>> [1] >>>>> https://review.opendev.org/c/openinfra/ansible-role-refstack-client/+/824832 >>>>> [2] >>>>> https://review.opendev.org/c/openstack/heat-tempest-plugin/+/823794 >>>>> >>>>> Thank you, >>>>> -- >>>>> Martin Kopec >>>>> Senior Software Quality Engineer >>>>> Red Hat EMEA >>>>> >>>>> >>>>> >>>>> >> >> -- >> Martin >> > -- Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jan 19 10:35:53 2022 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 19 Jan 2022 10:35:53 +0000 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects Message-ID: Hi, If you haven't been paying close attention, it would be easy to miss some of the upcoming RBAC changes which will have an impact on deployment projects. I thought I'd start a thread so that we can share how we are approaching this, get answers to open questions, and ideally all end up with a fairly consistent approach. The secure RBAC work has a long history, and continues to evolve. According to [1], we should start to see some fairly substantial changes over the next few releases. That spec is fairly long, but worth a read. In the yoga timeline [2], there is one change in particular that has an impact on deployment projects, "3. Keystone enforces scope by default". After this change, all of the deprecated policies that many still rely on in Keystone will be removed. In kolla-ansible, we have an etherpad [5] with some notes, questions and half-baked plans. We made some changes in Xena [3] to use system scope in some places when interacting with system APIs in Ansible tasks. The next change we have staged is to add the service role to all service users [4], in preparation for [2]. Question: should the role be added with system scope or in the existing service project? The obvious main use for this is token validation, which seems to allow system or project scope. We anticipate that some service users may still require some project-scoped roles, e.g. when creating resources for octavia. We'll deal with those on a case by case basis. In anticipation of keystone setting enforce_scope=True and removing old default policies (which I assume effectively removes enforce_new_defaults?), we will set this in kolla-ansible, and try to deal with any fallout. Hopefully the previous work will make this minimal. How does that line up with other projects' approaches? What have we missed? 
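To make the system-scope option concrete, a minimal sketch of granting the service role with system scope, with the user and domain names purely illustrative and the config fragments shown only as the knobs involved rather than a recommended setting:

  # create the service role (skip if it already exists), then grant it
  # system-scoped to a service user instead of a role in the service project
  openstack role create service
  openstack role add --user nova --user-domain Default --system all service

  # the service user would then request system-scoped tokens, e.g. via
  #   [keystone_authtoken]
  #   system_scope = all
  # while keystone itself would eventually run with
  #   [oslo_policy]
  #   enforce_scope = True
  #   enforce_new_defaults = True

Project-scoped grants would still be layered on top for cases like the octavia one mentioned above.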
Mark [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible From victoria at vmartinezdelacruz.com Wed Jan 19 11:04:07 2022 From: victoria at vmartinezdelacruz.com (=?UTF-8?Q?Victoria_Mart=C3=ADnez_de_la_Cruz?=) Date: Wed, 19 Jan 2022 12:04:07 +0100 Subject: [manila][cinder][glance][nova] Pop-up team for design and development of a Cephadm DevStack plugin Message-ID: Hi all, I'm reaching out to you to let you know that we will start the design and development of a Cephadm DevStack plugin. Some of the reasons on why we want to take this approach: - devstack-plugin-ceph worked for us for a lot of years, but the development of it relies on several hacks to adapt to the different Ceph versions we use and the different distros we support. This led to a monolithic script that sometimes is hard to debug and break our development environments and our CI - cephadm is the deployment tool developed and maintained by the Ceph community, it allows their users to get specific Ceph versions very easily and enforces good practices for Ceph clusters. From their docs, "Cephadm manages the full lifecycle of a Ceph cluster. It starts by bootstrapping a tiny Ceph cluster on a single node (one monitor and one manager) and then uses the orchestration interface (?day 2? commands) to expand the cluster to include all hosts and to provision all Ceph daemons and services. [0]" - OpenStack deployment tools are starting to use cephadm as their way to deploy Ceph, so it would be nice to include cephadm in our development process to be closer with what is being done in the field I started the development of this in [1], but it might be better to change devstack-plugin-ceph to do this instead of having a new plugin. This is something I would love to discuss in a first meeting. Having said this, I propose using the channel #openstack-cephadm in the OFTC network to talk about this and set up a first meeting with people interested in contributing to this effort. Thanks, Victoria [0] https://docs.ceph.com/en/pacific/cephadm/ [1] https://github.com/vkmc/devstack-plugin-cephadm -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramishra at redhat.com Wed Jan 19 11:11:00 2022 From: ramishra at redhat.com (Rabi Mishra) Date: Wed, 19 Jan 2022 16:41:00 +0530 Subject: [heat-tempest-plugin] call for help from plugin's maintainers In-Reply-To: References: Message-ID: On Wed, Jan 19, 2022 at 3:49 PM Martin Kopec wrote: > I confirm the pin helped, the tests are passing now: > > https://review.opendev.org/c/openinfra/ansible-role-refstack-client/+/825183 > Although I see a strange error in the master job ^^, 2 tests are failing > to create a stack. We noticed this some time ago and the tests have been > failing consistently since. It's strange that only master job is affected. > Any idea what could be causing that? > https://zuul.opendev.org/t/openstack/build/c9568ac2e4f140f8b2ff461e2b1f5030 > > AFAICT it could not build instances with below failures in nova/neutron logs. 
Jan 19 00:41:45.839386 ubuntu-focal-ovh-bhs1-0028063966 nova-compute[111088]: ERROR nova.compute.manager [instance: 862d3350-f712-466c-87d2-b82299b06755] nova.exception.PortBindingFailed: Binding failed for port 7dcc5c11-3e32-491c-8615-044ae78c1507, please check neutron logs for more information. Jan 19 00:41:45.006133 ubuntu-focal-ovh-bhs1-0028063966 neutron-server[105341]: ERROR neutron.plugins.ml2.managers [req-2a879dd1-4da2-435d-a4f9-39d91cde831e req-c57b35fb-525f-4311-9dca-8b0cee02b1d9 service neutron] Port 7dcc5c11-3e32-491c-8615-044ae78c1507 does not have an IP address assigned and there are no driver with 'connectivity' = 'l2'. The port cannot be bound. These tests are running fine in the heat gate and the only difference I could see is that you've tls_proxy enabled in devstack [1] (used with ovn) which we don't[2]. Probably some configuration issue that neutron folks could help. [1] ovn_sb_connection = ssl:158.69.73.104:6642 ovn_nb_connection = ssl:158.69.73.104:6641 [2] ovn_sb_connection = tcp:158.69.74.7:6642 ovn_nb_connection = tcp:158.69.74.7:6641 On Wed, 19 Jan 2022 at 10:17, Takashi Kajinami wrote: > >> The issue triggered by gabbi 2.5.0 was temporarily resolved by pinning it >> to 2.4.0 . >> The gate is unblocked, I believe (We found an issue with a functional job >> for stable/train which is being fixed now). >> We can remove that pin once the issue with urllib3 is resolved. >> >> > Should we consider using only https for endpoint access in the future? >> Maybe ? But this is a separate topic, IMHO, and I don't have any strong >> opinion about this. >> There are several devstack plugins (like heat, aodh, ...) which don't >> support >> setting up tls-proxy for https endpoints yet and we need to fix each >> plugin >> if we take this direction. >> >> >> On Tue, Jan 18, 2022 at 10:21 PM Martin Kopec wrote: >> >>> Thank you Takashi for looking into this. >>> >>> Should we consider using only https for endpoint access in the future? >>> >>> On Tue, 18 Jan 2022 at 09:35, Takashi Kajinami >>> wrote: >>> >>>> I have opened an issue for urllib3[1], too, and created a PR to discuss >>>> a potential fix. >>>> [1] https://github.com/urllib3/urllib3/issues/2534 >>>> >>>> Because it'd take some time until we get feedback from these two >>>> communities, >>>> I've proposed a change to pin gabbi to 2.4.0[2]. >>>> [2] https://review.opendev.org/c/openstack/requirements/+/825044 >>>> >>>> The issue might affect other projects using gabbi as well, unless >>>> https, instead of http, >>>> is used for endpoint access. >>>> >>>> >>>> On Tue, Jan 18, 2022 at 11:42 AM Takashi Kajinami >>>> wrote: >>>> >>>>> I've looked into this but it seems the following error was actually >>>>> caused by the latest release of gabbi(2.5.0). >>>>> TypeError: __init__() got an unexpected keyword argument >>>>> 'server_hostname' >>>>> >>>>> I've reported that issue to gabbi in [1] but if my observation is >>>>> correct the problem should be >>>>> fixed in urllib3 which gabbi is dependent on. >>>>> [1] https://github.com/cdent/gabbi/issues/309 >>>>> >>>>> On Tue, Jan 18, 2022 at 2:34 AM Martin Kopec >>>>> wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> most of the tests of heat-tempest-plugin have started failing. We >>>>>> noticed that in interop [1], however, we reproduced that in the project's >>>>>> gates as well [2]. >>>>>> >>>>>> I suspect it might be an issue with the new gabbi package - there has >>>>>> been an update recently. >>>>>> Could you please have a look. 
>>>>>> >>>>>> [1] >>>>>> https://review.opendev.org/c/openinfra/ansible-role-refstack-client/+/824832 >>>>>> [2] >>>>>> https://review.opendev.org/c/openstack/heat-tempest-plugin/+/823794 >>>>>> >>>>>> Thank you, >>>>>> -- >>>>>> Martin Kopec >>>>>> Senior Software Quality Engineer >>>>>> Red Hat EMEA >>>>>> >>>>>> >>>>>> >>>>>> >>> >>> -- >>> Martin >>> >> > > -- > Martin > -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Wed Jan 19 11:15:05 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Wed, 19 Jan 2022 20:15:05 +0900 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: Message-ID: Hi, (The topic doesn't include puppet but ...) I recently spent some time implementing initial support for SRBAC in Puppet OpenStack. You can find details in the etherpad[1] I created as my working note. It includes some items commonly required by all toolings in addition to ones specific to puppet. [1] https://etherpad.opendev.org/p/puppet-secure-rbac I expect some of them (especially the configuration parameters) would be used by TripleO later. > Question: should the role be added with system scope or in the > existing service project? The obvious main use for this is token > validation, which seems to allow system or project scope. I'd add one more question which is; Which roles should be assigned for the service users ? In the project which already implemented SRBAC, system-admin + system-reader allows any API calls and works like the previous project-admin. For token validations system-reader(or service role) would be enough but there are some system-admin-only APIs (os-server-external-events API in nova called by neutron, Create allocation in placement called by nova or neutron) used for communications between services. If we agree system-admin + system-reader is the right set then I'll update the default role assignment accordingly. This is important for Puppet OpenStack because there are implementations in puppet (which is usually called as providers) to manage some resources like Flavors, and these rely on credentials of service users after trying to look up user credentials. Takashi On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard wrote: > Hi, > > If you haven't been paying close attention, it would be easy to miss > some of the upcoming RBAC changes which will have an impact on > deployment projects. I thought I'd start a thread so that we can share > how we are approaching this, get answers to open questions, and > ideally all end up with a fairly consistent approach. > > The secure RBAC work has a long history, and continues to evolve. > According to [1], we should start to see some fairly substantial > changes over the next few releases. That spec is fairly long, but > worth a read. > > In the yoga timeline [2], there is one change in particular that has > an impact on deployment projects, "3. Keystone enforces scope by > default". After this change, all of the deprecated policies that many > still rely on in Keystone will be removed. > > In kolla-ansible, we have an etherpad [5] with some notes, questions > and half-baked plans. We made some changes in Xena [3] to use system > scope in some places when interacting with system APIs in Ansible > tasks. > > The next change we have staged is to add the service role to all > service users [4], in preparation for [2]. 
> > Question: should the role be added with system scope or in the > existing service project? The obvious main use for this is token > validation, which seems to allow system or project scope. > > We anticipate that some service users may still require some > project-scoped roles, e.g. when creating resources for octavia. We'll > deal with those on a case by case basis. > > In anticipation of keystone setting enforce_scope=True and removing > old default policies (which I assume effectively removes > enforce_new_defaults?), we will set this in kolla-ansible, and try to > deal with any fallout. Hopefully the previous work will make this > minimal. > > How does that line up with other projects' approaches? What have we missed? > > Mark > > [1] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > [2] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > [3] > https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jan 19 12:22:36 2022 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 19 Jan 2022 12:22:36 +0000 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: Message-ID: On Wed, 19 Jan 2022 at 11:15, Takashi Kajinami wrote: > > Hi, > > > (The topic doesn't include puppet but ...) > I recently spent some time implementing initial support for SRBAC > in Puppet OpenStack. You can find details in the etherpad[1] I created > as my working note. It includes some items commonly required by all toolings > in addition to ones specific to puppet. > [1] https://etherpad.opendev.org/p/puppet-secure-rbac Thanks for responding, Takashi - that's useful. > > I expect some of them (especially the configuration parameters) would be used > by TripleO later. > > > Question: should the role be added with system scope or in the > > existing service project? The obvious main use for this is token > > validation, which seems to allow system or project scope. > > I'd add one more question which is; > Which roles should be assigned for the service users ? > > In the project which already implemented SRBAC, system-admin + system-reader > allows any API calls and works like the previous project-admin. IIUC the direction of travel has changed, and now the intention is that system-admin won't have access to project-scoped APIs. > > For token validations system-reader(or service role) would be enough but there are > some system-admin-only APIs (os-server-external-events API in nova called by neutron, > Create allocation in placement called by nova or neutron) used for communications > between services. The token validation API has the following default policy: identity:validate_token: (role:reader and system_scope:all) or rule:service_role or rule:token_subject So system-reader, system-admin or service (any scope) should work. The spec suggests that the service role is intended for use by service to service APIs, in this case the credentials provided in the keystone_authtoken config. 
I would guess that system scope makes most sense here with the service role, although the rule suggests it would work with project scope and the service role. > > If we agree system-admin + system-reader is the right set then I'll update the default role > assignment accordingly. This is important for Puppet OpenStack because there are implementations > in puppet (which is usually called as providers) to manage some resources like Flavors, > and these rely on credentials of service users after trying to look up user credentials. I think one of the outcomes of this work is that authentication will necessarily become a bit more fine-grained. It might not make sense to have the same role assignments for all users. To your example, I would say that registering flavors should be done by a different user with different permissions than a service user. In kolla-ansible we don't really register flavors other than for octavia - this is up to operators. > > Takashi > > On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard wrote: >> >> Hi, >> >> If you haven't been paying close attention, it would be easy to miss >> some of the upcoming RBAC changes which will have an impact on >> deployment projects. I thought I'd start a thread so that we can share >> how we are approaching this, get answers to open questions, and >> ideally all end up with a fairly consistent approach. >> >> The secure RBAC work has a long history, and continues to evolve. >> According to [1], we should start to see some fairly substantial >> changes over the next few releases. That spec is fairly long, but >> worth a read. >> >> In the yoga timeline [2], there is one change in particular that has >> an impact on deployment projects, "3. Keystone enforces scope by >> default". After this change, all of the deprecated policies that many >> still rely on in Keystone will be removed. >> >> In kolla-ansible, we have an etherpad [5] with some notes, questions >> and half-baked plans. We made some changes in Xena [3] to use system >> scope in some places when interacting with system APIs in Ansible >> tasks. >> >> The next change we have staged is to add the service role to all >> service users [4], in preparation for [2]. >> >> Question: should the role be added with system scope or in the >> existing service project? The obvious main use for this is token >> validation, which seems to allow system or project scope. >> >> We anticipate that some service users may still require some >> project-scoped roles, e.g. when creating resources for octavia. We'll >> deal with those on a case by case basis. >> >> In anticipation of keystone setting enforce_scope=True and removing >> old default policies (which I assume effectively removes >> enforce_new_defaults?), we will set this in kolla-ansible, and try to >> deal with any fallout. Hopefully the previous work will make this >> minimal. >> >> How does that line up with other projects' approaches? What have we missed? 
>> >> Mark >> >> [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst >> [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 >> [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 >> [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 >> [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible >> From smooney at redhat.com Wed Jan 19 12:54:50 2022 From: smooney at redhat.com (Sean Mooney) Date: Wed, 19 Jan 2022 12:54:50 +0000 Subject: [manila][cinder][glance][nova] Pop-up team for design and development of a Cephadm DevStack plugin In-Reply-To: References: Message-ID: On Wed, 2022-01-19 at 12:04 +0100, Victoria Mart?nez de la Cruz wrote: > Hi all, > > I'm reaching out to you to let you know that we will start the design and > development of a Cephadm DevStack plugin. > > Some of the reasons on why we want to take this approach: > > - devstack-plugin-ceph worked for us for a lot of years, but the > development of it relies on several hacks to adapt to the different Ceph > versions we use and the different distros we support. This led to a > monolithic script that sometimes is hard to debug and break our development > environments and our CI > - cephadm is the deployment tool developed and maintained by the Ceph > community, it allows their users to get specific Ceph versions very easily > and enforces good practices for Ceph clusters. From their docs, "Cephadm > manages the full lifecycle of a Ceph cluster. It starts by bootstrapping a > tiny Ceph cluster on a single node (one monitor and one manager) and then > uses the orchestration interface (?day 2? commands) to expand the cluster > to include all hosts and to provision all Ceph daemons and services. [0]" > - OpenStack deployment tools are starting to use cephadm as their way to > deploy Ceph, so it would be nice to include cephadm in our development > process to be closer with what is being done in the field > > I started the development of this in [1], but it might be better to change > devstack-plugin-ceph to do this instead of having a new plugin. This is > something I would love to discuss in a first meeting. i would advocate for pivoting devstack-plugin-ceph. i dont think we have the capsity as a comunity to devleop, maintaine and debug/support 2 differnt ways of deploying ceph in our ci system in the long term. to me the way devstack-plugin-ceph install cpeh is jsut an implementaion detail. its contract is that it will install and configure ceph for use with openstack. if you make it use cephadm for that its just and internal detail that should not affect the consomes of the plugin provide you maintain the interface to the devstack pluging mostly the same. i would suggest addign a devstack macro initally to choose the backend but then eventually once the cephadm appoch is stable just swap the default. > > Having said this, I propose using the channel #openstack-cephadm in the > OFTC network to talk about this and set up a first meeting with people > interested in contributing to this effort. ack im not sure i will get involed with this but the other option woudl be to just use #openstack-qa since that is the chanlle for devstack development. 
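As a concrete reference for what the plugin would be wrapping, a rough sketch of the cephadm flow on a single devstack node, assuming cephadm is already installed and using a placeholder monitor IP; the pool name is just an example of what the cinder/glance/nova configuration would then point at:

  # bootstrap a minimal single-node cluster (one mon, one mgr)
  cephadm bootstrap --mon-ip 192.0.2.10

  # run the ceph CLI from the management container: turn the available
  # disks into OSDs, create a pool for the services, list the daemons
  cephadm shell -- ceph orch apply osd --all-available-devices
  cephadm shell -- ceph osd pool create volumes
  cephadm shell -- ceph orch ps

Selecting a specific Ceph version then becomes a matter of pointing cephadm at the matching container image rather than juggling distro packages.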
> > Thanks, > > Victoria > > [0] https://docs.ceph.com/en/pacific/cephadm/ > [1] https://github.com/vkmc/devstack-plugin-cephadm From senrique at redhat.com Wed Jan 19 12:55:31 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 19 Jan 2022 09:55:31 -0300 Subject: [cinder] Bug deputy report for week of 01-19-2022 Message-ID: This is a bug report from 01-12-2022 to 01-19-2022. Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting ----------------------------------------------------------------------------------------- Medium - https://bugs.launchpad.net/cinder/+bug/1957804 "RBD deferred deletion causes undeletable RBD snapshots." In Progress. Assigned to Eric. - https://bugs.launchpad.net/cinder/+bug/1958122 "HPE 3PAR: In multi host env, multi-detach works partially if volume is attached to instances from separate hosts." In Progress. Assigned to Raghavendra Tilay. Low - https://bugs.launchpad.net/cinder/+bug/1958023 "Solidfire: there are some references to the removed parameters left." In Progress. Assigned to kajinamit. - https://bugs.launchpad.net/cinder/+bug/1958245 "NetApp ONTAP driver shows type error exception when replicating FlexGroups." Unassigned. Cheers, -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Wed Jan 19 14:01:00 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Wed, 19 Jan 2022 23:01:00 +0900 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: Message-ID: On Wed, Jan 19, 2022 at 9:22 PM Mark Goddard wrote: > On Wed, 19 Jan 2022 at 11:15, Takashi Kajinami > wrote: > > > > Hi, > > > > > > (The topic doesn't include puppet but ...) > > I recently spent some time implementing initial support for SRBAC > > in Puppet OpenStack. You can find details in the etherpad[1] I created > > as my working note. It includes some items commonly required by all > toolings > > in addition to ones specific to puppet. > > [1] https://etherpad.opendev.org/p/puppet-secure-rbac > > Thanks for responding, Takashi - that's useful. > > > > > I expect some of them (especially the configuration parameters) would be > used > > by TripleO later. > > > > > Question: should the role be added with system scope or in the > > > existing service project? The obvious main use for this is token > > > validation, which seems to allow system or project scope. > > > > I'd add one more question which is; > > Which roles should be assigned for the service users ? > > > > In the project which already implemented SRBAC, system-admin + > system-reader > > allows any API calls and works like the previous project-admin. > > IIUC the direction of travel has changed, and now the intention is > that system-admin won't have access to project-scoped APIs. > > > > > For token validations system-reader(or service role) would be enough but > there are > > some system-admin-only APIs (os-server-external-events API in nova > called by neutron, > > Create allocation in placement called by nova or neutron) used for > communications > > between services. > > The token validation API has the following default policy: > > identity:validate_token: (role:reader and system_scope:all) or > rule:service_role or rule:token_subject > > So system-reader, system-admin or service (any scope) should work. 
The > spec suggests that the service role is intended for use by service to > service APIs, in this case the credentials provided in the > keystone_authtoken config. I would guess that system scope makes most > sense here with the service role, although the rule suggests it would > work with project scope and the service role. > > I noticed I ignored implied roles... Thanks for clarifying that. I understand and I agree with this. Considering the intention of SRBAC this would fix better with system-scoped, as you earlier mentioned but I'll defer to the others. > > > > If we agree system-admin + system-reader is the right set then I'll > update the default role > > assignment accordingly. This is important for Puppet OpenStack because > there are implementations > > in puppet (which is usually called as providers) to manage some > resources like Flavors, > > and these rely on credentials of service users after trying to look up > user credentials. > > I think one of the outcomes of this work is that authentication will > necessarily become a bit more fine-grained. It might not make sense to > have the same role assignments for all users. To your example, I would > say that registering flavors should be done by a different user with > different permissions than a service user. In kolla-ansible we don't > really register flavors other than for octavia - this is up to > operators. > My main concern was that some service users would require system-admin but I should have read this part more carefully. https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#phase-2 So Assigning the service role (for the proper scope which is asked in the original thread) is the right way to go. For the provider stuff I'll look into any available option to replace usage of service user credential but that's specific to Puppet which we can ignore here in this discussion. > > > Takashi > > > > On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard wrote: > >> > >> Hi, > >> > >> If you haven't been paying close attention, it would be easy to miss > >> some of the upcoming RBAC changes which will have an impact on > >> deployment projects. I thought I'd start a thread so that we can share > >> how we are approaching this, get answers to open questions, and > >> ideally all end up with a fairly consistent approach. > >> > >> The secure RBAC work has a long history, and continues to evolve. > >> According to [1], we should start to see some fairly substantial > >> changes over the next few releases. That spec is fairly long, but > >> worth a read. > >> > >> In the yoga timeline [2], there is one change in particular that has > >> an impact on deployment projects, "3. Keystone enforces scope by > >> default". After this change, all of the deprecated policies that many > >> still rely on in Keystone will be removed. > >> > >> In kolla-ansible, we have an etherpad [5] with some notes, questions > >> and half-baked plans. We made some changes in Xena [3] to use system > >> scope in some places when interacting with system APIs in Ansible > >> tasks. > >> > >> The next change we have staged is to add the service role to all > >> service users [4], in preparation for [2]. > >> > >> Question: should the role be added with system scope or in the > >> existing service project? The obvious main use for this is token > >> validation, which seems to allow system or project scope. > >> > >> We anticipate that some service users may still require some > >> project-scoped roles, e.g. 
when creating resources for octavia. We'll > >> deal with those on a case by case basis. > >> > >> In anticipation of keystone setting enforce_scope=True and removing > >> old default policies (which I assume effectively removes > >> enforce_new_defaults?), we will set this in kolla-ansible, and try to > >> deal with any fallout. Hopefully the previous work will make this > >> minimal. > >> > >> How does that line up with other projects' approaches? What have we > missed? > >> > >> Mark > >> > >> [1] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > >> [2] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > >> [3] > https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > >> [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > >> [5] > https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Jan 19 14:02:14 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 19 Jan 2022 08:02:14 -0600 Subject: Can neutron-fwaas project be revived? In-Reply-To: References: <771f9e50a5f0498caecf3cb892902954@inspur.com> <17e6e183905.d179775f807310.479674362228455950@ghanshyammann.com> <20220118175104.a6ppj2kxpijeztz7@yuggoth.org> Message-ID: <17e72a52aee.f17e7143871608.3933408575637218060@ghanshyammann.com> ---- On Wed, 19 Jan 2022 02:23:39 -0600 Lajos Katona wrote ---- > Hi, > Thanks for the advice. > The intention from the Neutron team was to make it clear that the team currently has no capacity to help the maintenance of neutron-fwaas, and can't help to maintain it.If there's easier ways for volunteers to keep it maintained other than forking it to x/ namespace that would be really helpful. Thanks Lajos, Main point here is if it is maintained by current maintainer (inspur team or other developers) whether neutron team will consider that to be in added in neutron stadium? If yes, then it will be extra work to move to x/ namespace now and then bring back to openstack/. If no, then moving to x/ namespace is good option or if maintainer want to be in openstack then we can discuss about a separate new project (but that needs more discussion on host much cost it adds). -gmann > Lajos Katona (lajoskatona) > > Jeremy Stanley ezt ?rta (id?pont: 2022. jan. 18., K, 18:58): > On 2022-01-18 10:49:48 -0600 (-0600), Ghanshyam Mann wrote: > [...] > > As discussed in project-config change[1], you or neutron folks can > > propose the retirement now itself (considering there is no one to > > maintain/release stable/victoria for new bug fixes) and TC will > > merge it as per process. After that, creating it in x/ namespace > > will be good to do. > [...] > > Looking at this from a logistical perspective, it's a fair amount of > churn in code hosting as well as unwelcoming to the new volunteers, > compared to just leaving the repository where it is now and letting > them contribute to it there. If the concern is that the Neutron team > doesn't want to retain responsibility for it while they evaluate the > conviction of the new maintainers for eventual re-inclusion, then > the TC would be well within its rights to declare that the > repository can remain in place while not having it be part of the > Neutron team's responsibilities. 
> > There are a number of possible solutions, ranging from making a new > category of provisional deliverable, to creating a lightweight > project team under the DPL model, to declaring it a pop-up team with > a TC-owned repository. There are repositories within the OpenStack > namespace which are not an official part of the OpenStack > coordinated release, after all. Solutions which don't involve having > the new work take place somewhere separate, and the work involved in > making that separate place, which will simply be closed down as > transient cruft if everything goes as desired. > -- > Jeremy Stanley > From dbengt at redhat.com Wed Jan 19 14:08:40 2022 From: dbengt at redhat.com (Daniel Mats Niklas Bengtsson) Date: Wed, 19 Jan 2022 15:08:40 +0100 Subject: [Oslo] IRC meeting. Message-ID: Hi there, There were no more oslo meetings but I will take over the role of facilitator and organize the meetings. I think it's important and help to have a status and follow some topics. The next meeting will be Monday, then we will resume the old rhythm with a meeting on the first and third Monday of the month. Please feel free to update your name in the pinglist[1] if you want to join it or on the contrary if you do not want to. [1] https://wiki.openstack.org/wiki/Meetings/Oslo#Agenda_Template From mark at stackhpc.com Wed Jan 19 14:40:52 2022 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 19 Jan 2022 14:40:52 +0000 Subject: [manila][cinder][glance][nova] Pop-up team for design and development of a Cephadm DevStack plugin In-Reply-To: References: Message-ID: On Wed, 19 Jan 2022 at 11:08, Victoria Mart?nez de la Cruz wrote: > > Hi all, > > I'm reaching out to you to let you know that we will start the design and development of a Cephadm DevStack plugin. > > Some of the reasons on why we want to take this approach: > > - devstack-plugin-ceph worked for us for a lot of years, but the development of it relies on several hacks to adapt to the different Ceph versions we use and the different distros we support. This led to a monolithic script that sometimes is hard to debug and break our development environments and our CI > - cephadm is the deployment tool developed and maintained by the Ceph community, it allows their users to get specific Ceph versions very easily and enforces good practices for Ceph clusters. From their docs, "Cephadm manages the full lifecycle of a Ceph cluster. It starts by bootstrapping a tiny Ceph cluster on a single node (one monitor and one manager) and then uses the orchestration interface (?day 2? commands) to expand the cluster to include all hosts and to provision all Ceph daemons and services. [0]" > - OpenStack deployment tools are starting to use cephadm as their way to deploy Ceph, so it would be nice to include cephadm in our development process to be closer with what is being done in the field > > I started the development of this in [1], but it might be better to change devstack-plugin-ceph to do this instead of having a new plugin. This is something I would love to discuss in a first meeting. > > Having said this, I propose using the channel #openstack-cephadm in the OFTC network to talk about this and set up a first meeting with people interested in contributing to this effort. 
> > Thanks, > > Victoria > > [0] https://docs.ceph.com/en/pacific/cephadm/ > [1] https://github.com/vkmc/devstack-plugin-cephadm In case it's useful as a reference, we built an Ansible collection that drives cephadm: https://github.com/stackhpc/ansible-collection-cephadm Feedback welcome, of course! Mark From thierry at openstack.org Wed Jan 19 15:36:06 2022 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 19 Jan 2022 16:36:06 +0100 Subject: [largescale-sig] Next meeting: Jan 19th, 15utc In-Reply-To: <9dceae99-a644-8184-430e-b2dedba0235e@openstack.org> References: <9dceae99-a644-8184-430e-b2dedba0235e@openstack.org> Message-ID: We held our meeting today! We discussed the details of our next "Large Scale OpenStack" episode on OpenInfra.Live, which will be more in a podcast style with one special guest each episode. The Feb 3rd one will feature OVHCloud. You can read the meeting logs at: https://meetings.opendev.org/meetings/large_scale_sig/2022/large_scale_sig.2022-01-19-15.02.html Our next IRC meeting will be February 16, at 1500utc on #openstack-operators on OFTC. Regards, -- Thierry Carrez (ttx) From gmann at ghanshyammann.com Wed Jan 19 16:12:51 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 19 Jan 2022 10:12:51 -0600 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: Message-ID: <17e731cbfcc.e2463d54885770.5926410035050303741@ghanshyammann.com> ---- On Wed, 19 Jan 2022 04:35:53 -0600 Mark Goddard wrote ---- > Hi, > > If you haven't been paying close attention, it would be easy to miss > some of the upcoming RBAC changes which will have an impact on > deployment projects. I thought I'd start a thread so that we can share > how we are approaching this, get answers to open questions, and > ideally all end up with a fairly consistent approach. > > The secure RBAC work has a long history, and continues to evolve. > According to [1], we should start to see some fairly substantial > changes over the next few releases. That spec is fairly long, but > worth a read. > > In the yoga timeline [2], there is one change in particular that has > an impact on deployment projects, "3. Keystone enforces scope by > default". After this change, all of the deprecated policies that many > still rely on in Keystone will be removed. > > In kolla-ansible, we have an etherpad [5] with some notes, questions > and half-baked plans. We made some changes in Xena [3] to use system > scope in some places when interacting with system APIs in Ansible > tasks. > > The next change we have staged is to add the service role to all > service users [4], in preparation for [2]. > > Question: should the role be added with system scope or in the > existing service project? The obvious main use for this is token > validation, which seems to allow system or project scope. > > We anticipate that some service users may still require some > project-scoped roles, e.g. when creating resources for octavia. We'll > deal with those on a case by case basis. Service roles are planned for phase2 which is Z release[1]. The Idea here is service to service communication will happen with 'service' role (which keystone need to implement yet) and end users will keep using the what ever role is default (or overridden in policy file) which can be project or system scoped depends on the APIs. 
So at the end service-service APIs policy default will looks like '(role:admin and system:network and project_id:%(project_id)s) or (role:service and project_name:service)' Say nova will use that service role to communicate to cinder and cinder policy will pass as service role is in OR in default policy. But let's see how they are going to be and if any challenges when we will implement it in Z cycle. > > In anticipation of keystone setting enforce_scope=True and removing > old default policies (which I assume effectively removes > enforce_new_defaults?), we will set this in kolla-ansible, and try to > deal with any fallout. Hopefully the previous work will make this > minimal. > > How does that line up with other projects' approaches? What have we missed? Yeah, we want users/deployment projects/horizon etc to use the new policy from keystone as first and we will see feedback how they are (good, bad, really bad) from usage perspective. Why we choose keystone is, because new policy are there since many cycle and ready to use. Other projects needs to work their policy as per new SRBAC design/direction (for example nova needs to modify their policy before we ask users to use new policy and work is under progress[2]). I think trying in kolla will be good way to know if we can move to keystone's new policy completely in yoga. [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#z-release-timeline [2] https://blueprints.launchpad.net/nova/+spec/policy-defaults-refresh-2 -gmann > > Mark > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > From gmann at ghanshyammann.com Wed Jan 19 16:24:43 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 19 Jan 2022 10:24:43 -0600 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: Message-ID: <17e73279ebc.da3cd092886796.8771224917629901204@ghanshyammann.com> ---- On Wed, 19 Jan 2022 08:01:00 -0600 Takashi Kajinami wrote ---- > > On Wed, Jan 19, 2022 at 9:22 PM Mark Goddard wrote: > On Wed, 19 Jan 2022 at 11:15, Takashi Kajinami wrote: > > > > Hi, > > > > > > (The topic doesn't include puppet but ...) > > I recently spent some time implementing initial support for SRBAC > > in Puppet OpenStack. You can find details in the etherpad[1] I created > > as my working note. It includes some items commonly required by all toolings > > in addition to ones specific to puppet. > > [1] https://etherpad.opendev.org/p/puppet-secure-rbac > > Thanks for responding, Takashi - that's useful. > > > > > I expect some of them (especially the configuration parameters) would be used > > by TripleO later. > > > > > Question: should the role be added with system scope or in the > > > existing service project? The obvious main use for this is token > > > validation, which seems to allow system or project scope. > > > > I'd add one more question which is; > > Which roles should be assigned for the service users ? 
> > > > In the project which already implemented SRBAC, system-admin + system-reader > > allows any API calls and works like the previous project-admin. > > IIUC the direction of travel has changed, and now the intention is > that system-admin won't have access to project-scoped APIs. Yes, as mark mentioned. And that is the key change from prevous direction. We are isolating the system and project level APIs. system token will be able to perform only system level operation and not allowed to do project level operation. For example: system user will not be allowed to create the server in nova. To have a quick view on those (we have not finished yet in nova), you can check how it will look like in the below series: - https://review.opendev.org/q/topic:%22bp%252Fpolicy-defaults-refresh-2%22+(status:open%20OR%20status:merged) You can see the test cases for all four possible configuration combination and what all roles are allowed in which configuration (case 4th is end goal we want to be for RBAC): 1. enforce_scope=False + legacy rule (current default policies) 2. enforce_scope=False + No legacy rule (enable scope but remove old policy default) 3. enforce_scope=True + legacy rule (enable scope with old policy default) 4. enforce_scope=True + no legacy rule (end goal of new RBAC) > > > > > For token validations system-reader(or service role) would be enough but there are > > some system-admin-only APIs (os-server-external-events API in nova called by neutron, > > Create allocation in placement called by nova or neutron) used for communications > > between services. > > The token validation API has the following default policy: > > identity:validate_token: (role:reader and system_scope:all) or > rule:service_role or rule:token_subject > > So system-reader, system-admin or service (any scope) should work. The > spec suggests that the service role is intended for use by service to > service APIs, in this case the credentials provided in the > keystone_authtoken config. I would guess that system scope makes most > sense here with the service role, although the rule suggests it would > work with project scope and the service role. > > I noticed I ignored implied roles... Thanks for clarifying that. > I understand and I agree with this. Considering the intention of SRBAC this would fixbetter with system-scoped, as you earlier mentioned but I'll defer to the others. > Another thigns to note here is, in Yoga cycle we are doing only system-admin. system-reader, system-member will be done in phase3 which is for future releases (BB). > > If we agree system-admin + system-reader is the right set then I'll update the default role > > assignment accordingly. This is important for Puppet OpenStack because there are implementations > > in puppet (which is usually called as providers) to manage some resources like Flavors, > > and these rely on credentials of service users after trying to look up user credentials. > > I think one of the outcomes of this work is that authentication will > necessarily become a bit more fine-grained. It might not make sense to > have the same role assignments for all users. To your example, I would > say that registering flavors should be done by a different user with > different permissions than a service user. In kolla-ansible we don't > really register flavors other than for octavia - this is up to > operators. > My main concern was that some service users would require system-admin butI should have read this part more carefully. 
https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#phase-2 > So Assigning the service role (for the proper scope which is asked in the original thread)is the right way to go. For the provider stuff I'll look into any available option to replace usage of serviceuser credential but that's specific to Puppet which we can ignore here in this discussion. right, once we have service role implemented then we will have clear way on how services will be communicating to other services APIs. -gmann > > > > > Takashi > > > > On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard wrote: > >> > >> Hi, > >> > >> If you haven't been paying close attention, it would be easy to miss > >> some of the upcoming RBAC changes which will have an impact on > >> deployment projects. I thought I'd start a thread so that we can share > >> how we are approaching this, get answers to open questions, and > >> ideally all end up with a fairly consistent approach. > >> > >> The secure RBAC work has a long history, and continues to evolve. > >> According to [1], we should start to see some fairly substantial > >> changes over the next few releases. That spec is fairly long, but > >> worth a read. > >> > >> In the yoga timeline [2], there is one change in particular that has > >> an impact on deployment projects, "3. Keystone enforces scope by > >> default". After this change, all of the deprecated policies that many > >> still rely on in Keystone will be removed. > >> > >> In kolla-ansible, we have an etherpad [5] with some notes, questions > >> and half-baked plans. We made some changes in Xena [3] to use system > >> scope in some places when interacting with system APIs in Ansible > >> tasks. > >> > >> The next change we have staged is to add the service role to all > >> service users [4], in preparation for [2]. > >> > >> Question: should the role be added with system scope or in the > >> existing service project? The obvious main use for this is token > >> validation, which seems to allow system or project scope. > >> > >> We anticipate that some service users may still require some > >> project-scoped roles, e.g. when creating resources for octavia. We'll > >> deal with those on a case by case basis. > >> > >> In anticipation of keystone setting enforce_scope=True and removing > >> old default policies (which I assume effectively removes > >> enforce_new_defaults?), we will set this in kolla-ansible, and try to > >> deal with any fallout. Hopefully the previous work will make this > >> minimal. > >> > >> How does that line up with other projects' approaches? What have we missed? 
> >> > >> Mark > >> > >> [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > >> [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > >> [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > >> [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > >> [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > >> > > From erin at openstack.org Wed Jan 19 16:31:28 2022 From: erin at openstack.org (Erin Disney) Date: Wed, 19 Jan 2022 10:31:28 -0600 Subject: OpenInfra Summit Berlin 2022 Programming Committee Nominations Now Open In-Reply-To: <4A25C8ED-9FAE-4844-99FF-EE9368EE45F0@openstack.org> References: <4A25C8ED-9FAE-4844-99FF-EE9368EE45F0@openstack.org> Message-ID: <017D3603-9726-419A-B413-DA5DA6D441B1@openstack.org> Quick reminder that the deadline to nominate yourself or a colleague for the 2022 OpenInfra Programming Committee is today! Form is available here: https://openinfrafoundation.formstack.com/forms/programmingcommittee2022 Erin Disney Sr. Event Marketing Manager Open Infrastructure Foundation > On Dec 21, 2021, at 11:05 AM, Erin Disney wrote: > > Hey everyone- > > Programming Committee nominations for the 2022 OpenInfra Summit in Berlin (June 7-9, 2022) are now open! > > Programming Committees for each Track will help build the Summit schedule, and are made up of individuals working in open infrastructure. > > Responsibilities include: > Help the Summit team put together the best possible content based on your subject matter expertise > Promote the individual Tracks within your networks > Review the submissions and Community voting results in your particular Track > Determine if there are any major content gaps in your Track, and if so, potentially solicit additional speakers directly to submit > Ensure diversity of speakers and companies represented in your Track > Avoid vendor sales pitches, focusing more on real-world user stories and technical, in-the-trenches experiences > Identify which submissions would make good content for OpenInfra Live, the virtual show hosted by the OpenInfra Foundation that encourages live questions from a global audience > > 2022 Summit Tracks: > 5G, NFV & Edge > AI, Machine Learning & HPC > CI/CD > Container Infrastructure > Getting Started > Hands-on Workshops > NEW: Hardware Enablement > Open Development > Private & Hybrid Cloud > Public Cloud > Security > > Full track descriptions are available here . If you?re interested in nominating yourself or someone else to be a member of the Summit Programming Committee for a specific Track, please fill out the nomination form . Nominations will close on January 19, 2022. > > Programming Committee selections will occur before we close the Call for Presentations (CFP) so that the Committees can host office hours to consult on submissions, and help promote the event. CFP will be opening in January, registration and sponsorship information are already available. > > Please email speakersupport at openinfra.dev with any questions or feedback. > > Cheers, > > Erin Disney > Event Marketing > Open Infrastructure Foundation > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From songwenping at inspur.com Wed Jan 19 02:52:47 2022 From: songwenping at inspur.com (=?gb2312?B?QWxleCBTb25nICjLzs7Exr0p?=) Date: Wed, 19 Jan 2022 02:52:47 +0000 Subject: =?gb2312?B?tPC4tDogW2N5Ym9yZ10gUHJvcG9zaW5nIGNvcmUgcmV2aWV3ZXJz?= In-Reply-To: References: Message-ID: +1 Congratulations! Eric, welcome! ??????: Brin Zhang(??????) ????????: 2022??1??17?? 9:13 ??????: openstack-discuss at lists.openstack.org ????: xin-ran.wang at intel.com; Alex Song (??????) ; huangzhipeng at huawei.com; liliueecg at gmail.com; shogo.saito.ac at hco.ntt.co.jp; sundar.nadathur at intel.com; yumeng_bao at yahoo.com; chen.ke14 at zte.com.cn; 419546439 at qq.com; shaohe.feng at intel.com; wangzhengh at chinatelecom.cn; zhuli2317 at gmail.com ????: [cyborg] Proposing core reviewers Hello all, Eric xie has been actively contributing to Cyborg in various areas, adding new features, improving quality, reviewing patches. Despite the relatively short time, he has been one of the most prolific contributors, and brings an enthusiastic and active mindset. I would like to thank and acknowledge him for his steady valuable contributions, and propose him as a core reviewer for Cyborg. Some of the currently listed core reviewers have not been participating for a lengthy period of time. It is proposed that those who have had no contributions for the past 18 months ?C i.e. no participation in meetings, no code contributions, not participating in Cyborg open source activities and no reviews ?C be removed from the list of core reviewers. -- The Cyborg team recognizes everyone's contributions, but we need to ensure the activity of the core-reviewer list. If you are interested in rejoining the cyborg team, feel free to ping us to restore the core reviewer for you. If no objections are made known by January 24, I will make the changes proposed above.. Regards, Brin Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3774 bytes Desc: not available URL: From zhipengh512 at gmail.com Wed Jan 19 13:20:33 2022 From: zhipengh512 at gmail.com (Zhipeng Huang) Date: Wed, 19 Jan 2022 21:20:33 +0800 Subject: [cyborg] Proposing core reviewers In-Reply-To: References: Message-ID: Hi Brin, For the sake of due process, I have two suggestions : 1. Plz provide a link of Eric's full contribution from review.opendev.org. It is a custom to provide such reference. 2. For the removal of non active committers, we've adopt a policy of "voluntarily step down" in the past, for any changes please also submit a patch to create a document for a new governance folder for current core reviewers to have a formal vote. On Mon, Jan 17, 2022, 9:40 AM Brin Zhang(???) wrote: > Hello all, > > Eric xie has been actively contributing to Cyborg in various areas, adding > new features, improving quality, reviewing patches. Despite the relatively > short time, he has been one of the most prolific contributors, and brings > an enthusiastic and active mindset. I would like to thank and acknowledge > him for his steady valuable contributions, and propose him as a core > reviewer for Cyborg. > > > > Some of the currently listed core reviewers have not been participating > for a lengthy period of time. It is proposed that those who have had no > contributions for the past 18 months ? i.e. 
no participation in meetings, > no code contributions, not participating in Cyborg open source activities > and no reviews ? be removed from the list of core reviewers. > > -- The Cyborg team recognizes everyone's contributions, but we need to > ensure the activity of the core-reviewer list. If you are interested in > rejoining the cyborg team, feel free to ping us to restore the core > reviewer for you. > > > > If no objections are made known by January 24, I will make the changes > proposed above.. > > > > Regards, > > Brin Zhang > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc-antoine.godde at student-cs.fr Tue Jan 18 22:25:26 2022 From: marc-antoine.godde at student-cs.fr (Marc-Antoine Godde (Student at CentraleSupelec)) Date: Tue, 18 Jan 2022 22:25:26 +0000 Subject: Problem with ressource provider In-Reply-To: References: Message-ID: <4D26E50B-FA85-4CD1-A1C6-77C35C2EF4FB@student-cs.fr> Here is what we get. :) Thanks for your help [cid:9921FF69-0B8A-456C-969F-2261C88048E0] Le 18 janv. 2022 ? 23:17, Tony Liu > a ?crit : It would be easier to check resource provider by openstack cli, than looking into db. What's the name, short or FDQN, used by other compute nodes? Restart nova-compute and look into log, see which name is used to register resource provider. Tony ________________________________________ From: Marc-Antoine Godde (Student at CentraleSupelec) > Sent: January 18, 2022 01:56 PM To: openstack-discuss at lists.openstack.org Subject: Problem with ressource provider Hello, In our cluster, we have 4 computes running and we have an issue with the number 4. We can't create VMs on it, we can't migrate VMs to or from that node. VMs are still perfectly working though. After a first diagnosis, it appears that there's a problem with the ressource provider. Node is declared in the db with: - name: os-compute-4, uuid: d12ea77b-d678-40ce-a813-d8094cabbbd8 Here are the ressource provider: - name: os-compute-4, uuid: a9dc2a56-5b2d-49b1-ac47-6d996d2d029a - name: os-compute-4.openstack.local, uuid: d12ea776-d678-40ce-a813-d8094cabbbd8 In our opinion, os-compute-4.openstack.local shouldn't be there at all. We want to destroy both of the ressource provider and recreate one. I must also precise that os-compute-4 ressource provider has 0 allocation and os-compute-4.openstack.local only 3 (there?s at least 50 VMs running on it?). Moreover, for these 3 allocations, the server uuid doesn't correspond to any existing VMs. Overall, none of the VMs has a ressource allocation on os-compute-4. We found the command nova-manage placement heal_allocations on the Internet but we can't find it in any container, maybe deprecated ? The cluster is running Ussuri installed with Openstack-ansible. If you have any suggestion, any help would be appreciated. Thanks. :) Best, Marc-Antoine Godde -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: virt.png Type: image/png Size: 859316 bytes Desc: virt.png URL: From tonyliu0592 at hotmail.com Tue Jan 18 22:30:29 2022 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Tue, 18 Jan 2022 22:30:29 +0000 Subject: Problem with ressource provider In-Reply-To: <4D26E50B-FA85-4CD1-A1C6-77C35C2EF4FB@student-cs.fr> References: <4D26E50B-FA85-4CD1-A1C6-77C35C2EF4FB@student-cs.fr> Message-ID: Check /etc/hosts and /etc/hostname in all 4 compute nodes and ensure they are consistent. 
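For reference, the inspection and cleanup by openstack CLI mentioned above can be done with the osc-placement plugin; a rough sketch, with a placeholder UUID for whichever provider turns out to be the orphan, to be run only after confirming it holds no real allocations:

    # spot the duplicate providers and their UUIDs
    openstack resource provider list
    # check what, if anything, is still allocated against the suspect one
    openstack resource provider show <uuid-of-the-orphaned-provider> --allocations
    # only then remove it
    openstack resource provider delete <uuid-of-the-orphaned-provider>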
Check nova-compute logging to see which name is used as the provider. If you are sure the provide is safe to be deleted, you can remove it by openstack cli. Tony ________________________________________ From: Marc-Antoine Godde (Student at CentraleSupelec) Sent: January 18, 2022 02:25 PM To: Tony Liu Cc: openstack-discuss at lists.openstack.org Subject: Re: Problem with ressource provider Here is what we get. :) Thanks for your help [cid:9921FF69-0B8A-456C-969F-2261C88048E0] Le 18 janv. 2022 ? 23:17, Tony Liu > a ?crit : It would be easier to check resource provider by openstack cli, than looking into db. What's the name, short or FDQN, used by other compute nodes? Restart nova-compute and look into log, see which name is used to register resource provider. Tony ________________________________________ From: Marc-Antoine Godde (Student at CentraleSupelec) > Sent: January 18, 2022 01:56 PM To: openstack-discuss at lists.openstack.org Subject: Problem with ressource provider Hello, In our cluster, we have 4 computes running and we have an issue with the number 4. We can't create VMs on it, we can't migrate VMs to or from that node. VMs are still perfectly working though. After a first diagnosis, it appears that there's a problem with the ressource provider. Node is declared in the db with: - name: os-compute-4, uuid: d12ea77b-d678-40ce-a813-d8094cabbbd8 Here are the ressource provider: - name: os-compute-4, uuid: a9dc2a56-5b2d-49b1-ac47-6d996d2d029a - name: os-compute-4.openstack.local, uuid: d12ea776-d678-40ce-a813-d8094cabbbd8 In our opinion, os-compute-4.openstack.local shouldn't be there at all. We want to destroy both of the ressource provider and recreate one. I must also precise that os-compute-4 ressource provider has 0 allocation and os-compute-4.openstack.local only 3 (there?s at least 50 VMs running on it?). Moreover, for these 3 allocations, the server uuid doesn't correspond to any existing VMs. Overall, none of the VMs has a ressource allocation on os-compute-4. We found the command nova-manage placement heal_allocations on the Internet but we can't find it in any container, maybe deprecated ? The cluster is running Ussuri installed with Openstack-ansible. If you have any suggestion, any help would be appreciated. Thanks. :) Best, Marc-Antoine Godde -------------- next part -------------- A non-text attachment was scrubbed... Name: virt.png Type: image/png Size: 859316 bytes Desc: virt.png URL: From ih at imranh.co.uk Wed Jan 19 14:21:14 2022 From: ih at imranh.co.uk (Imran Hussain) Date: Wed, 19 Jan 2022 14:21:14 +0000 Subject: Secure Boot VM issues (libvirt / SMM) | Secure boot requires SMM feature enabled Message-ID: Hi, Deployed Wallaby on Ubuntu 20.04 nodes. Having issues with libvirt XML being incorrect, I need the smm bit () and it isn't being added to the XML. Anyone seen this before? Or any ideas? More info below... 
Error message: : libvirt.libvirtError: unsupported configuration: Secure boot requires SMM feature enabled Versions: libvirt version: 6.0.0, package: 0ubuntu8.15 QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.18) Nova 23.1.1 (deployed via kolla, so kolla/ubuntu-source-nova-compute:wallaby is the image) ovmf 0~20191122.bd85bf54-2ubuntu3.3 Context: https://specs.openstack.org/openstack/nova-specs/specs/wallaby/implemented/allow-secure-boot-for-qemu-kvm-guests.html Image metadata: hw_firmware_type: uefi hw_machine_type: q35 os_secure_boot: required os_hidden: false hw_disk_bus: scsi hw_qemu_guest_agent: yes hw_scsi_model: virtio-scsi hw_video_model: virtio os_require_quiesce: yes os_secure_boot: required os_hidden: false XML snippets taken from nova-compute.log: OpenStack Foundation OpenStack Nova 23.1.1 2798e3fe-ffae-4c26-955b-ef150b849561 2798e3fe-ffae-4c26-955b-ef150b849561 Virtual Machine hvm /usr/share/OVMF/OVMF_CODE.ms.fd Other info: # cat /usr/share/qemu/firmware/40-edk2-x86_64-secure-enrolled.json { "description": "UEFI firmware for x86_64, with Secure Boot and SMM, SB enabled, MS certs enrolled", "interface-types": [ "uefi" ], "mapping": { "device": "flash", "executable": { "filename": "/usr/share/OVMF/OVMF_CODE.ms.fd", "format": "raw" }, "nvram-template": { "filename": "/usr/share/OVMF/OVMF_VARS.ms.fd", "format": "raw" } }, "targets": [ { "architecture": "x86_64", "machines": [ "pc-q35-*" ] } ], "features": [ "acpi-s3", "amd-sev", "enrolled-keys", "requires-smm", "secure-boot", "verbose-dynamic" ], "tags": [ ] } From rosmaita.fossdev at gmail.com Wed Jan 19 16:54:59 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 19 Jan 2022 11:54:59 -0500 Subject: [cinder] festival of XS reviews 21 January 2022 Message-ID: Hello Argonauts, This is a reminder that the most recent edition of the Cinder Festival of XS Reviews will be held at the end of this week on Friday 21 January. who: Everyone! what: The Cinder Festival of XS Reviews when: Friday 21 January 2022 from 1400-1600 UTC where: https://meetpad.opendev.org/cinder-festival-of-reviews etherpad: https://etherpad.opendev.org/p/cinder-festival-of-reviews This recurring meeting can be placed on your calendar by using this handy ICS file: http://eavesdrop.openstack.org/calendars/cinder-festival-of-reviews.ics See you there! brian From jp.methot at planethoster.info Wed Jan 19 17:04:39 2022 From: jp.methot at planethoster.info (J-P Methot) Date: Wed, 19 Jan 2022 12:04:39 -0500 Subject: [kolla] Calling Kolla-ansible in an ansible playbook In-Reply-To: References: Message-ID: <6538b7fd-5251-ff27-76c8-6c5aa2f4c703@planethoster.info> Thank you. To be honest I'm only testing options right now, in part to satisfy our curiosity. I will keep your recommendation in mind though. On 1/19/22 4:37 AM, Mark Goddard wrote: > Use the shell module, then activate the virualenv first before running > kolla-ansible. > > However, I'd argue against running Ansible in Ansible. It can be had > enough to read the output when troubleshooting, without nesting it. > Kayobe intentionally avoided this, instead adding a python CLI that > calls out to kayobe and kolla-ansible. > > Mark > > On Tue, 18 Jan 2022 at 21:32, J-P Methot wrote: >> Hi, >> >> I've been trying to write an ansible-playbook to run some kolla tasks >> and I've been wondering if this is even possible? I have kolla running >> inside a virtual environment, so of course I'm running into issues >> having ansible running it. 
The current task looks like this: >> >> - name: run kolla-ansible upgrade on controllers >> command: "{{ venv }}/bin/kolla-ansible --limit {{ item }} -i >> multinode upgrade" >> loop: >> - control >> - monitoring >> - storage >> >> When run, it's complaining that ansible isn't installed in the current >> virtual environment. I was under the impression that specifying the >> whole path to my executable would have it automatically run in that >> virtual environment, but visibly this excludes access to ansible that's >> installed there. >> >> So, basically, is what I'm trying to do even possible? And if so, what >> would be the best way to do it? >> >> -- >> Jean-Philippe M?thot >> Senior Openstack system administrator >> Administrateur syst?me Openstack s?nior >> PlanetHoster inc. >> >> -- Jean-Philippe M?thot Senior Openstack system administrator Administrateur syst?me Openstack s?nior PlanetHoster inc. From rosmaita.fossdev at gmail.com Wed Jan 19 17:04:46 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 19 Jan 2022 12:04:46 -0500 Subject: [cinder] yoga R-9 virtual midcycle on 26 January Message-ID: <3a8e2d48-743f-5083-a362-de9e9e2f06a5@gmail.com> As decided at today's weekly meeting, the Cinder Yoga R-9 virtual midcycle will be held: DATE: Wednesday 26 January 2022 TIME: 1400-1600 UTC LOCATION: https://bluejeans.com/3228528973 The meeting will be recorded. Please add topics to the midcycle etherpad: https://etherpad.opendev.org/p/cinder-yoga-midcycles cheers, brian From smooney at redhat.com Wed Jan 19 17:31:51 2022 From: smooney at redhat.com (Sean Mooney) Date: Wed, 19 Jan 2022 17:31:51 +0000 Subject: Secure Boot VM issues (libvirt / SMM) | Secure boot requires SMM feature enabled In-Reply-To: References: Message-ID: On Wed, 2022-01-19 at 14:21 +0000, Imran Hussain wrote: > Hi, > > Deployed Wallaby on Ubuntu 20.04 nodes. Having issues with libvirt XML > being incorrect, I need the smm bit () and it isn't > being added to the XML. Anyone seen this before? Or any ideas? More info > below... > > Error message: > : libvirt.libvirtError: unsupported configuration: Secure boot requires > SMM feature enabled > > Versions: > libvirt version: 6.0.0, package: 0ubuntu8.15 > QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.18) > Nova 23.1.1 (deployed via kolla, so > kolla/ubuntu-source-nova-compute:wallaby is the image) > ovmf 0~20191122.bd85bf54-2ubuntu3.3 > > Context: > https://specs.openstack.org/openstack/nova-specs/specs/wallaby/implemented/allow-secure-boot-for-qemu-kvm-guests.html > > Image metadata: > > hw_firmware_type: uefi > hw_machine_type: q35 > os_secure_boot: required ok those d seam to be allinged with the documentaiton https://docs.openstack.org/nova/latest/admin/secure-boot.html how in addtion to those option the uefi firmware image used but qemu which is provide by the ovmf package also need to provide a secure boot capable image waht failing here is the system manamgemt mode feature. when os_secure_boot is set we defien the "secure" attibute on the loader element. https://github.com/openstack/nova/blob/7aa3a0f558ddbcac3cb97a7eef58cd878acc3f7a/nova/virt/libvirt/config.py#L2871-L2873 based on the https://libvirt.org/formatdomain.html#hypervisor-features smm should be enabled by default smm Depending on the state attribute (values on, off, default on) enable or disable System Management Mode. Since 2.1.0 Optional sub-element tseg can be used to specify the amount of memory dedicated to SMM's extended TSEG. 
That offers a fourth option size apart from the existing ones (1 MiB, 2 MiB and 8 MiB) that the guest OS (or rather loader) can choose from. The size can be specified as a value of that element, optional attribute unit can be used to specify the unit of the aforementioned value (defaults to 'MiB'). If set to 0 the extended size is not advertised and only the default ones (see above) are available. If the VM is booting you should leave this option alone, unless you are very certain you know what you are doing. This value is configurable due to the fact that the calculation cannot be done right with the guarantee that it will work correctly. In QEMU, the user-configurable extended TSEG feature was unavailable up to and including pc-q35-2.9. Starting with pc-q35-2.10 the feature is available, with default size 16 MiB. That should suffice for up to roughly 272 vCPUs, 5 GiB guest RAM in total, no hotplug memory range, and 32 GiB of 64-bit PCI MMIO aperture. Or for 48 vCPUs, with 1TB of guest RAM, no hotplug DIMM range, and 32GB of 64-bit PCI MMIO aperture. The values may also vary based on the loader the VM is using. Additional size might be needed for significantly higher vCPU counts or increased address space (that can be memory, maxMemory, 64-bit PCI MMIO aperture size; roughly 8 MiB of TSEG per 1 TiB of address space) which can also be rounded up. Due to the nature of this setting being similar to "how much RAM should the guest have" users are advised to either consult the documentation of the guest OS or loader (if there is any), or test this by trial-and-error changing the value until the VM boots successfully. Yet another guiding value for users might be the fact that 48 MiB should be enough for pretty large guests (240 vCPUs and 4TB guest RAM), but it is on purpose not set as default as 48 MiB of unavailable RAM might be too much for small guests (e.g. with 512 MiB of RAM). See Memory Allocation for more details about the unit attribute. Since 4.5.0 (QEMU only) so my guess is you are missing the secure boot capable ovmf image on the host or there is a bug in your libvirt and smm is not being enabled by default. 
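A quick way to check both of those guesses on the compute host (a sketch, assuming the stock Ubuntu packaging described in the report): first confirm that a secure-boot capable firmware descriptor is actually visible,

    grep -l '"secure-boot"' /usr/share/qemu/firmware/*.json

and then compare the generated guest XML against the element the error message is asking for, which would normally appear as:

    <features>
      ...
      <smm state='on'/>
    </features>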
> os_hidden: false > > hw_disk_bus: scsi > hw_qemu_guest_agent: yes > hw_scsi_model: virtio-scsi > hw_video_model: virtio > os_require_quiesce: yes > os_secure_boot: required > os_hidden: false > > XML snippets taken from nova-compute.log: > > > OpenStack Foundation > OpenStack Nova > 23.1.1 > 2798e3fe-ffae-4c26-955b-ef150b849561 > 2798e3fe-ffae-4c26-955b-ef150b849561 > Virtual Machine > > > > hvm > secure="yes">/usr/share/OVMF/OVMF_CODE.ms.fd > > > > > > > > > > Other info: > # cat /usr/share/qemu/firmware/40-edk2-x86_64-secure-enrolled.json > { > "description": "UEFI firmware for x86_64, with Secure Boot and SMM, > SB enabled, MS certs enrolled", > "interface-types": [ > "uefi" > ], > "mapping": { > "device": "flash", > "executable": { > "filename": "/usr/share/OVMF/OVMF_CODE.ms.fd", > "format": "raw" > }, > "nvram-template": { > "filename": "/usr/share/OVMF/OVMF_VARS.ms.fd", > "format": "raw" > } > }, > "targets": [ > { > "architecture": "x86_64", > "machines": [ > "pc-q35-*" > ] > } > ], > "features": [ > "acpi-s3", > "amd-sev", > "enrolled-keys", > "requires-smm", > "secure-boot", > "verbose-dynamic" > ], > "tags": [ > > ] > } > > > From franck.vedel at univ-grenoble-alpes.fr Wed Jan 19 17:46:24 2022 From: franck.vedel at univ-grenoble-alpes.fr (Franck VEDEL) Date: Wed, 19 Jan 2022 18:46:24 +0100 Subject: [kolla-anbsible][centos-stream][novnc] Message-ID: <40524444-1DA5-4D61-A45A-6844398FA61B@univ-grenoble-alpes.fr> Hello, what is the solution to adjust the size of novnc consoles. I found for ubuntu (https://docs.openstack.org/nova/rocky/admin/remote-console-access.html) but not for centos-stream which I use with kolla-ansible. thanks in advance Franck -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnaud.morin at gmail.com Wed Jan 19 17:48:34 2022 From: arnaud.morin at gmail.com (Arnaud) Date: Wed, 19 Jan 2022 18:48:34 +0100 Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> <33441648.1434581.1641304881681@mail.yahoo.com> <385929635.1929303.1642011977053@mail.yahoo.com> <326590098.315301.1642089266574@mail.yahoo.com> <2058295726.372026.1642097389964@mail.yahoo.com> <2077696744.396726.1642099801140@mail.yahoo.com> Message-ID: <7E57792E-45C7-4F53-A658-9D2F6AE3A636@gmail.com> Hey Doug, That's nice piece of information! Would it be possible for you to update the wiki at [1] with your last data if you think this could be relevant? Cheers, Arnaud [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit Le 17 janvier 2022 18:01:57 GMT+01:00, Doug Szumski a ?crit?: > >On 17/01/2022 09:21, Mark Goddard wrote: >> Drop the double quotes around >> >> On Thu, 13 Jan 2022 at 18:55, Albert Braden wrote: >>> After reading more I realize that "expires" is also set in ms. So it looks like the correct settings are: >>> >>> message-ttl: 60000 >>> expires: 120000 >>> >>> This would expire messages in 10 minutes and queues in 20 minutes. >>> >>> The only remaining question is, how can I specify these in a variable without generating the "not a valid message TTL" error? 
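Picking up Mark's point above about dropping the quotes: rendered without them, the Jinja expression becomes a bare JSON number, which is what RabbitMQ expects (and note these values are milliseconds, so 60000/120000 are one and two minutes). A sketch of the corrected template line, reusing the rabbitmq_message_ttl variable from the thread and a literal expiry:

    {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl": {{ rabbitmq_message_ttl | int }}, "expires": 120000}, "priority": 0},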
>>> On Thursday, January 13, 2022, 01:22:33 PM EST, Albert Braden wrote: >>> >>> >>> Update: I googled around and found this: https://tickets.puppetlabs.com/browse/MODULES-2986 >>> >>> Apparently the " | int " isn't working. I tried '60000' and "60000" but that didn't make a difference. In desperation I tried this: >>> >>> {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":60000,"expires":1200}, "priority":0}, >>> >>> That works, but I'd prefer to use a variable. Has anyone done this successfully? Also, am I understanding correctly that "message-ttl" is set in milliseconds and "expires" is set in seconds? Or do I need to use ms for "expires" too? >>> On Thursday, January 13, 2022, 11:03:11 AM EST, Albert Braden wrote: >>> >>> >>> After digging further I realized that I'm not setting TTL; only queue expiration. Here's what I see in the GUI when I look at affected queues: >>> >>> Policy notifications-expire >>> Effective policy definition expires: 1200 >>> >>> This is what I have in definitions.json.j2: >>> >>> {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"expires":1200}, "priority":0}, >>> >>> I tried this to set both: >>> >>> {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":"{{ rabbitmq_message_ttl | int }}","expires":1200}, "priority":0}, >> Drop the double quotes around the jinja expression. It's not YAML, so >> you don't need them. >> >> Please update the upstream patches with any fixes. >> >>> But the RMQ containers restart every 60 seconds and puke this into the log: >>> >>> [error] <0.322.0> CRASH REPORT Process <0.322.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"600\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 >>> >>> After reading the doc on TTL: https://www.rabbitmq.com/ttl.html I realized that the TTL is set in ms, so I tried "rabbitmq_message_ttl: 60000" >>> >>> but that only changes the number in the error: >>> >>> [error] <0.318.0> CRASH REPORT Process <0.318.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"60000\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 >>> >>> What am I missing? >>> >>> >>> On Wednesday, January 12, 2022, 05:11:41 PM EST, Dale Smith wrote: >>> >>> >>> In the web interface(RabbitMQ 3.8.23, not using Kolla), when looking at the queue you will see the "Policy" listed by name, and "Effective policy definition". >>> >>> You can either view the policy definition, and the arguments for the definitions applied, or "effective policy definition" should show you the list. >>> >>> >>> It may be relevant to note: "Each exchange or queue will have at most one policy matching" - https://www.rabbitmq.com/parameters.html#how-policies-work >>> >>> I've added a similar comment to the linked patchset. >>> >>> >>> On 13/01/22 7:26 am, Albert Braden wrote: >>> >>> This is very helpful. Thank you! It appears that I have successfully set the expire time to 1200, because I no longer see unconsumed messages lingering in my queues, but it's not obvious how to verify. 
In the web interface, when I look at the queues, I see things like policy, state, features and consumers, but I don't see a timeout or expire value, nor do I find the number 1200 anywhere. Where should I be looking in the web interface to verify that I set the expire time correctly? Or do I need to use the CLI? >>> On Wednesday, January 5, 2022, 04:23:29 AM EST, Mark Goddard wrote: >>> >>> >>> On Tue, 4 Jan 2022 at 14:08, Albert Braden wrote: >>>> Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl" ? >>> John Garbutt proposed a few patches for RabbitMQ in kolla, including >>> this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191 >>> >>> https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+(status:open+OR+status:merged)+project:openstack/kolla-ansible >>> >>> Note that they are currently untested. > >I've proposed one more as an alternative to reducing the number of queue >mirrors (disable all mirroring): > >https://review.opendev.org/c/openstack/kolla-ansible/+/824994 > >The reasoning behind it is in the commit message. It's partly justified >by the fact that we quite frequently have to 'reset' RabbitMQ with the >current transient mirrored configuration by removing all state anyway. > >>> >>> Mark >>> >>> >>>> On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden wrote: >>>> >>>> >>>> I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2: >>>> >>>> "policies":[ >>>> {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, >>>> {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0} >>>> {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0} >>>> {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} >>>> {% endif %} >>>> >>>> But I still see unconsumed messages lingering in notifications_extractor.info. From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing? >>>> On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden wrote: >>>> >>>> >>>> Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes? >>>> >>>> [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit >>>> On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud wrote: >>>> >>>> >>>> So, your config snippet LGTM. >>>> >>>> Le ven. 10 d?c. 2021 ? 17:50, Albert Braden a ?crit : >>>> >>>> Sorry, that was a transcription error. I thought "True" and my fingers typed "False." 
The correct lines are: >>>> >>>> [oslo_messaging_rabbit] >>>> amqp_durable_queues = True >>>> >>>> On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud wrote: >>>> >>>> >>>> If you plan to let `amqp_durable_queues = False` (i.e if you plan to keep this config equal to false), then you don't need to add these config lines as this is already the default value [1]. >>>> >>>> [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 >>>> >>>> Le jeu. 9 d?c. 2021 ? 22:40, Albert Braden a ?crit : >>>> >>>> Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management. >>>> >>>> I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf: >>>> >>>> [oslo_messaging_rabbit] >>>> amqp_durable_queues = False >>>> >>>> Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? What would that look like in the config? >>>> >>>> >>>> From: Herve Beraud >>>> Sent: Thursday, December 9, 2021 2:45 AM >>>> To: Bogdan Dobrelya >>>> Cc: openstack-discuss at lists.openstack.org >>>> Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability >>>> >>>> >>>> >>>> Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. >>>> >>>> >>>> >>>> >>>> >>>> Le mer. 8 d?c. 2021 ? 11:48, Bogdan Dobrelya a ?crit : >>>> >>>> Please see inline >>>> >>>>>> I read this with great interest because we are seeing this issue. Questions: >>>>>> >>>>>> 1. We are running kola-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? >>>>>> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like? >>>>>> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into? >>>>> Note that even having rabbit HA policies adjusted like that and its HA >>>>> replication factor [0] decreased (e.g. to a 2), there still might be >>>>> high churn caused by a large enough number of replicated durable RPC >>>>> topic queues. And that might cripple the cloud down with the incurred >>>>> I/O overhead because a durable queue requires all messages in it to be >>>>> persisted to a disk (for all the messaging cluster replicas) before they >>>>> are ack'ed by the broker. >>>>> >>>>> Given that said, Oslo messaging would likely require a more granular >>>>> control for topic exchanges and the durable queues flag - to tell it to >>>>> declare as durable only the most critical paths of a service. A single >>>>> config setting and a single control exchange per a service might be not >>>>> enough. >>>> Also note that therefore, amqp_durable_queue=True requires dedicated >>>> control exchanges configured for each service. Those that use >>>> 'openstack' as a default cannot turn the feature ON. Changing it to a >>>> service specific might also cause upgrade impact, as described in the >>>> topic [3]. >>>> >>>> >>>> >>>> The same is true for `amqp_auto_delete=True`. 
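To make the "dedicated control exchange" point concrete: this is the oslo.messaging control_exchange option, which defaults to the shared 'openstack' exchange. A sketch of a per-service override, with nova used purely as an example:

    [DEFAULT]
    # give the service its own control exchange instead of the shared default
    control_exchange = nova

    [oslo_messaging_rabbit]
    amqp_durable_queues = True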
That requires dedicated control exchanges else it won't work if each service defines its own policy on a shared control exchange (e.g `openstack`) and if policies differ from each other. >>>> >>>> >>>> >>>> [3] https://review.opendev.org/q/topic:scope-config-opts >>>> >>>>> There are also race conditions with durable queues enabled, like [1]. A >>>>> solution could be where each service declare its own dedicated control >>>>> exchange with its own configuration. >>>>> >>>>> Finally, openstack components should add perhaps a *.next CI job to test >>>>> it with durable queues, like [2] >>>>> >>>>> [0] https://www.rabbitmq.com/ha.html#replication-factor >>>>> >>>>> [1] >>>>> https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt >>>>> >>>>> [2] https://review.opendev.org/c/openstack/nova/+/820523 >>>>> >>>>>> Does anyone have a sample set of RMQ config files that they can share? >>>>>> >>>>>> It looks like my Outlook has ruined the link; reposting: >>>>>> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit >>>>> >>>>> -- >>>>> Best regards, >>>>> Bogdan Dobrelya, >>>>> Irc #bogdando >>>> >>>> -- >>>> Best regards, >>>> Bogdan Dobrelya, >>>> Irc #bogdando >>>> >>>> >>>> >>>> >>>> -- >>>> >>>> Herv? Beraud >>>> >>>> Senior Software Engineer at Red Hat >>>> >>>> irc: hberaud >>>> >>>> https://github.com/4383/ >>>> >>>> https://twitter.com/4383hberaud >>>> >>>> >>>> >>>> -- >>>> Herv? Beraud >>>> Senior Software Engineer at Red Hat >>>> irc: hberaud >>>> https://github.com/4383/ >>>> https://twitter.com/4383hberaud >>>> >>>> >>>> >>>> -- >>>> Herv? Beraud >>>> Senior Software Engineer at Red Hat >>>> irc: hberaud >>>> https://github.com/4383/ >>>> https://twitter.com/4383hberaud >>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at ya.ru Wed Jan 19 19:33:01 2022 From: noonedeadpunk at ya.ru (Dmitriy Rabotyagov) Date: Wed, 19 Jan 2022 21:33:01 +0200 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: Message-ID: <257831642620613@mail.yandex.ru> An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Wed Jan 19 20:10:43 2022 From: satish.txt at gmail.com (Satish Patel) Date: Wed, 19 Jan 2022 15:10:43 -0500 Subject: Tesla V100 32G GPU with openstack In-Reply-To: References: <7CCE77EA-F7EA-45AF-85B5-3566D1DAB1CB@gmail.com> Message-ID: should i need to create a flavor to target both GPU. is it possible to have single flavor cover both GPU because end users don't understand which flavor to use. On Wed, Jan 19, 2022 at 1:54 AM Massimo Sgaravatto wrote: > > If I am not wrong those are 2 GPUs > > "tesla-v100:1" means 1 GPU > > So e.g. a flavor with "pci_passthrough:alias": "tesla-v100:2"} will be used to create an instance with 2 GPUs > > Cheers, Massimo > > On Tue, Jan 18, 2022 at 11:35 PM Satish Patel wrote: >> >> Thank you for the information. I have a quick question. >> >> [root at gpu01 ~]# lspci | grep -i nv >> 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe >> 32GB] (rev a1) >> d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe >> 32GB] (rev a1) >> >> In the above output showing two cards does that mean they are physical >> two or just BUS representation. >> >> Also i have the following entry in openstack flavor, does :1 means >> first GPU card? 
>> >> {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"} >> >> >> >> >> >> >> On Tue, Jan 18, 2022 at 5:55 AM Ant?nio Paulo wrote: >> > >> > Hey Satish, Gustavo, >> > >> > Just to clarify a bit on point 3, you will have to buy a vGPU license >> > per card and this gives you access to all the downloads you need through >> > NVIDIA's web dashboard -- both the host and guest drivers as well as the >> > license server setup files. >> > >> > Cheers, >> > Ant?nio >> > >> > On 18/01/22 02:46, Satish Patel wrote: >> > > Thank you so much! This is what I was looking for. It is very odd that >> > > we buy a pricey card but then we have to buy a license to make those >> > > features available. >> > > >> > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos >> > > wrote: >> > >> >> > >> Hello, Satish. >> > >> >> > >> I've been working with vGPU lately and I believe I can answer your >> > >> questions: >> > >> >> > >> 1. As you pointed out in question #2, the pci-passthrough will allocate >> > >> the entire physical GPU to one single guest VM, while vGPU allows you to >> > >> spawn from 1 to several VMs using the same physical GPU, depending on >> > >> the vGPU type you choose (check NVIDIA docs to see which vGPU types the >> > >> Tesla V100 supports and their properties); >> > >> 2. Correct; >> > >> 3. To use vGPU, you need vGPU drivers installed on the platform where >> > >> your deployment of OpenStack is running AND in the VMs, so there are two >> > >> drivers to be installed in order to use the feature. I believe both of >> > >> them have to be purchased from NVIDIA in order to be used, and you would >> > >> also have to deploy an NVIDIA licensing server in order to validate the >> > >> licenses of the drivers running in the VMs. >> > >> 4. You can see what the instructions are for each of these scenarios in >> > >> [1] and [2]. >> > >> >> > >> There is also extensive documentation on vGPU at NVIDIA's website [3]. >> > >> >> > >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html >> > >> [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html >> > >> [3] https://docs.nvidia.com/grid/13.0/index.html >> > >> >> > >> Regards, >> > >> Gustavo. >> > >> >> > >> On 17/01/2022 14:41, Satish Patel wrote: >> > >>> [Please note: This e-mail is from an EXTERNAL e-mail address] >> > >>> >> > >>> Folk, >> > >>> >> > >>> We have Tesla V100 32G GPU and I?m trying to configure with openstack wallaby. This is first time dealing with GPU so I have couple of question. >> > >>> >> > >>> 1. What is the difference between passthrough vs vGPU? I did google but not very clear yet. >> > >>> 2. If I configure it passthrough then does it only work with single VM ? ( I meant whole GPU will get allocate to single VM correct? >> > >>> 3. Also some document saying Tesla v100 support vGPU but some folks saying you need license. I have no idea where to get that license. What is the deal here? >> > >>> 3. What are the config difference between configure this card with passthrough vs vGPU? >> > >>> >> > >>> >> > >>> Currently I configure it with passthrough based one one article and I am able to spun up with and I can see nvidia card exposed to vm. (I used iommu and vfio based driver) so if this card support vGPU then do I need iommu and vfio or some other driver to make it virtualize ? 
>> > >>> >> > >>> Sent from my iPhone >> > >>> >> > > >> From nate.johnston at redhat.com Wed Jan 19 20:58:11 2022 From: nate.johnston at redhat.com (Nate Johnston) Date: Wed, 19 Jan 2022 15:58:11 -0500 Subject: [Neutron][drivers] Proposing Oleg Bondarev (obondarev) to the Drivers team In-Reply-To: References: Message-ID: <20220119205811.iy3y6ooabyg2br4r@grind.home> Big +1 from me. Nate On Fri, Jan 14, 2022 at 04:43:55PM +0100, Lajos Katona wrote: > Hi Neutron Drivers, > > I would like to propose Oleg Bondarev to be a member of the Neutron Drivers > team. > He has long experience with Neutron, he has been always around to help with > advice and reviews, and enthusiastically participated in the Drivers > meeting (big +1 as it is on Friday 1400UTC, quite late afternoon in his > timezone :-)). > > Neutron drivers, please vote before the next Drivers meeting (next Friday, > 21. January). > > Best Regards > Lajos Katona (lajoskatona) From satish.txt at gmail.com Wed Jan 19 21:28:28 2022 From: satish.txt at gmail.com (Satish Patel) Date: Wed, 19 Jan 2022 16:28:28 -0500 Subject: Tesla V100 32G GPU with openstack In-Reply-To: References: <7CCE77EA-F7EA-45AF-85B5-3566D1DAB1CB@gmail.com> Message-ID: Hi Massimo, Ignore my last email, my requirement is to have a single VM with a single GPU ("tesla-v100:1") but I would like to create a second VM on the same compute node which uses the second GPU but I am getting the following error when I create a second VM and vm error out. looks like it's not allowing me to create a second vm and bind to a second GPU card. error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error: Hostdev already exists in the domain configuration On Wed, Jan 19, 2022 at 3:10 PM Satish Patel wrote: > > should i need to create a flavor to target both GPU. is it possible to > have single flavor cover both GPU because end users don't understand > which flavor to use. > > On Wed, Jan 19, 2022 at 1:54 AM Massimo Sgaravatto > wrote: > > > > If I am not wrong those are 2 GPUs > > > > "tesla-v100:1" means 1 GPU > > > > So e.g. a flavor with "pci_passthrough:alias": "tesla-v100:2"} will be used to create an instance with 2 GPUs > > > > Cheers, Massimo > > > > On Tue, Jan 18, 2022 at 11:35 PM Satish Patel wrote: > >> > >> Thank you for the information. I have a quick question. > >> > >> [root at gpu01 ~]# lspci | grep -i nv > >> 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe > >> 32GB] (rev a1) > >> d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe > >> 32GB] (rev a1) > >> > >> In the above output showing two cards does that mean they are physical > >> two or just BUS representation. > >> > >> Also i have the following entry in openstack flavor, does :1 means > >> first GPU card? > >> > >> {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"} > >> > >> > >> > >> > >> > >> > >> On Tue, Jan 18, 2022 at 5:55 AM Ant?nio Paulo wrote: > >> > > >> > Hey Satish, Gustavo, > >> > > >> > Just to clarify a bit on point 3, you will have to buy a vGPU license > >> > per card and this gives you access to all the downloads you need through > >> > NVIDIA's web dashboard -- both the host and guest drivers as well as the > >> > license server setup files. > >> > > >> > Cheers, > >> > Ant?nio > >> > > >> > On 18/01/22 02:46, Satish Patel wrote: > >> > > Thank you so much! This is what I was looking for. It is very odd that > >> > > we buy a pricey card but then we have to buy a license to make those > >> > > features available. 
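On the "single flavor for both GPUs" question: in a pci_passthrough alias the number after the colon is a requested count, not a card index, so one flavor with "tesla-v100:1" already lets nova pick whichever whitelisted card is free, and a ":2" flavor hands both cards to one instance. A hedged sketch reusing the alias name from this thread (the flavor names are made up):

    # one GPU per instance; nova chooses any free matching device
    openstack flavor set gpu.1x --property "pci_passthrough:alias"="tesla-v100:1"

    # both GPUs in a single instance
    openstack flavor set gpu.2x --property "pci_passthrough:alias"="tesla-v100:2"

The "Hostdev already exists in the domain configuration" libvirt error literally means the same host device was listed twice in that one guest's XML, which suggests checking the whitelist/alias matching rather than the flavor itself.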
> >> > > > >> > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos > >> > > wrote: > >> > >> > >> > >> Hello, Satish. > >> > >> > >> > >> I've been working with vGPU lately and I believe I can answer your > >> > >> questions: > >> > >> > >> > >> 1. As you pointed out in question #2, the pci-passthrough will allocate > >> > >> the entire physical GPU to one single guest VM, while vGPU allows you to > >> > >> spawn from 1 to several VMs using the same physical GPU, depending on > >> > >> the vGPU type you choose (check NVIDIA docs to see which vGPU types the > >> > >> Tesla V100 supports and their properties); > >> > >> 2. Correct; > >> > >> 3. To use vGPU, you need vGPU drivers installed on the platform where > >> > >> your deployment of OpenStack is running AND in the VMs, so there are two > >> > >> drivers to be installed in order to use the feature. I believe both of > >> > >> them have to be purchased from NVIDIA in order to be used, and you would > >> > >> also have to deploy an NVIDIA licensing server in order to validate the > >> > >> licenses of the drivers running in the VMs. > >> > >> 4. You can see what the instructions are for each of these scenarios in > >> > >> [1] and [2]. > >> > >> > >> > >> There is also extensive documentation on vGPU at NVIDIA's website [3]. > >> > >> > >> > >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html > >> > >> [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html > >> > >> [3] https://docs.nvidia.com/grid/13.0/index.html > >> > >> > >> > >> Regards, > >> > >> Gustavo. > >> > >> > >> > >> On 17/01/2022 14:41, Satish Patel wrote: > >> > >>> [Please note: This e-mail is from an EXTERNAL e-mail address] > >> > >>> > >> > >>> Folk, > >> > >>> > >> > >>> We have Tesla V100 32G GPU and I?m trying to configure with openstack wallaby. This is first time dealing with GPU so I have couple of question. > >> > >>> > >> > >>> 1. What is the difference between passthrough vs vGPU? I did google but not very clear yet. > >> > >>> 2. If I configure it passthrough then does it only work with single VM ? ( I meant whole GPU will get allocate to single VM correct? > >> > >>> 3. Also some document saying Tesla v100 support vGPU but some folks saying you need license. I have no idea where to get that license. What is the deal here? > >> > >>> 3. What are the config difference between configure this card with passthrough vs vGPU? > >> > >>> > >> > >>> > >> > >>> Currently I configure it with passthrough based one one article and I am able to spun up with and I can see nvidia card exposed to vm. (I used iommu and vfio based driver) so if this card support vGPU then do I need iommu and vfio or some other driver to make it virtualize ? > >> > >>> > >> > >>> Sent from my iPhone > >> > >>> > >> > > > >> From cel975 at yahoo.com Wed Jan 19 22:29:35 2022 From: cel975 at yahoo.com (Celinio Fernandes) Date: Wed, 19 Jan 2022 22:29:35 +0000 (UTC) Subject: Cannot ssh/ping instance In-Reply-To: <1103131717.1158277.1642331300997@mail.yahoo.com> References: <869855278.629940.1641716238605.ref@mail.yahoo.com> <869855278.629940.1641716238605@mail.yahoo.com> <5690630.DvuYhMxLoT@p1> <2012398446.869565.1642202606158@mail.yahoo.com> <1103131717.1158277.1642331300997@mail.yahoo.com> Message-ID: <1237801494.398789.1642631375283@mail.yahoo.com> Hi,still trying to reach the external network from inside the VM.I have not set up any DNS server on any of the interfaces (shared and public).Do i need to add one ? 
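Two things worth separating here: the added security group rules are ingress rules, and security groups allow all egress by default, so they do not affect outbound access from the VM; outbound access needs NAT on the devstack host (see the MASQUERADE pointer later in this thread) plus a resolver on the subnet if names, rather than plain IPs, fail. Adding a nameserver to the tenant subnet would look roughly like this (the subnet has to be looked up first, so the ID below is a placeholder):

    openstack subnet list
    openstack subnet set --dns-nameserver 8.8.8.8 <subnet-id>

A quick way to tell DNS problems from routing problems inside the VM is to try "ping 8.8.8.8" before "ping example.com".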
On Sunday, January 16, 2022, 02:12:00 PM GMT+1, Celinio Fernandes wrote: Hi, I can ssh into the instance now but I noticed the VM does not have any external network access (internet). Before I dig any deeper into that problem, does anyone know what configuration i need to set up for that ? I already added 2 new security rules to make sure I can access HTTP and HTTPS ports (80 and 443), in vain : Ingress?? IPv4? TCP?? 80 (HTTP)?? 0.0.0.0/0 Ingress?? IPv4? TCP?? 443 (HTTPS)?? 0.0.0.0/0 Thanks. On Saturday, January 15, 2022, 12:29:40 AM GMT+1, Celinio Fernandes wrote: Thanks very much for your help. Before you replied, I tried what you wrote but on the wrong interfaces : enp0s3 and virbr0. I had no idea I needed to add the IP address from the public network's subnet on the br-ex interface. So to ping/ssh the floating IP this is what I did : ip link set dev br-ex up ip link set dev br-ex state up sudo ip addr add 172.24.4.254/24 dev br-ex And then I can finally ping the floating IP : ping 172.24.4.133 And I can also ssh into the VM : ssh cirros at 172.24.4.133 Thanks again :) On Sunday, January 9, 2022, 08:21:18 PM GMT+1, Slawek Kaplonski wrote: Hi, On niedziela, 9 stycznia 2022 09:17:18 CET Celinio Fernandes wrote: > Hi, > I am running Ubuntu Server 20.04 LTS on Virtualbox. I installed OpenStack > (Xena release) through Devstack. Here is the content of my > /opt/stack/devstack/local.conf file : > [[local|localrc]] > ADMIN_PASSWORD=secret > DATABASE_PASSWORD=$ADMIN_PASSWORD > RABBIT_PASSWORD=$ADMIN_PASSWORD > SERVICE_PASSWORD=$ADMIN_PASSWORD > HOST_IP=10.0.2.15 > > > I created an instance through Horizon. The security group contains the > 2 rules needed (one to be able to ping and one to be able to ssh the > instance). I also allocated and associated a floating IP address. And a ssh > key pair. > > Here is the configuration : > openstack server list > ---------------------------------+--------------------------+---------+ > > | ID? | Name | Status | Networks | Image? | Flavor? | > > ---------------------------------+--------------------------+---------+ > > | f5f0fdd5-298b-4fa3-9ee9-e6e4288f4327 | InstanceJanvier | ACTIVE | > | shared=172.24.4.133, 192.168.233.165 | cirros-0.5.2-x86_64-disk | m1.nano > | | > ------------------------------------------------------+ > > > openstack network list : > ------------------------------------------------------+ > > | ID? ? | Name? ? | Subnets? ? ? ? ? ? | > > ------------------------------------------------------+ > > | 96a04799-7fc7-4525-b05c-ad57261aed38 | public? | > | 07ce42db-6f3f-4135-ace7-2fc104ea62a0, 6dba13fc-b10c-48b1-b1b4-e1f1afe25b53 > | | c42638dc-fa56-4644-ad34-295fce4811d2 | shared? | > | a4e2d8cc-02b2-42e2-a525-e0eebbb08980? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > | | ffb8a527-266e-4e96-ad60-f7e9aba8f0c1 | private | > | 42e36677-cf3c-4df4-88a1-8cf79b9d6060, e507e6dd-132a-4249-96b1-83761562dd73 > | | > ------------------------------------------------------+ > > openstack router list : > +--------------------------------------+----------------+--------+------ > > | ID? ? | Name? | Status | State | Project? ? ? ? ? ? ? ? ? ? ? ? ? | > > +--------------------------------------+----------------+--------+------ > > | b9a15051-a532-4c93-95ad-53c057720c62 | Virtual_router | ACTIVE | UP? ? 
| > | 6556c02dd88f4c45b535c2dbb8ba1a04 | > +--------------------------------------+----------------+--------+------ > > > I cannot ping/ssh neither the fixed IP address or the floating IP address : > ping -c 3 172.24.4.133 > PING 172.24.4.133 (172.24.4.133) 56(84) bytes of data. > --- 172.24.4.133 ping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > ping -c 3 192.168.233.165 > PING 192.168.233.165 (192.168.233.165) 56(84) bytes of data. > --- 192.168.233.165 ping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > Maybe that has something to do with the network namespaces configuration on > Ubuntu. Does anyone know what could go wrong or what is missing ? > Thanks for helping. If You are trying to ping Floating IP directly from the host where devstack is installed (Virtualbox VM in Your case IIUC) then You should first have those floating IP addresses somehow reachable on the host, otherwise traffic is probably going through default gateway so is going outside the VM. If You are using ML2/OVN (default in Devstack) or ML2/OVS You probably have in the openvswitch bridge called br-ex which is used to send external network traffic from the OpenStack networks in Devstack. In such case You can e.g. add some IP address from the public network's subnet on the br-ex interface, like 192.168.233.254/24 - that will tell Your OS to reach that subnet through br- ex, so traffic will be able to go "into" the OVS managed by Neutron. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Jan 19 23:40:21 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 19 Jan 2022 17:40:21 -0600 Subject: [all][tc] Technical Committee next weekly meeting on Jan 20th at 1500 UTC In-Reply-To: <17e691e2e00.111c2bdad727170.2468892074465147644@ghanshyammann.com> References: <17e691e2e00.111c2bdad727170.2468892074465147644@ghanshyammann.com> Message-ID: <17e74b675b1.f81cfa74902732.6685352892817495136@ghanshyammann.com> Hello Everyone, Below is the agenda for Tomorrow's TC IRC meeting schedule at 1500 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting == Agenda for tomorrow's TC meeting == * Roll call * Follow up on past action items * Gate health check ** Fixing Zuul config error in OpenStack *** https://etherpad.opendev.org/p/zuul-config-error-openstack * Z Release Cycle Name ** It is needed for Release Management team to this week's task "Plan the next release cycle schedule" (elodilles) * Z cycle Technical Elections ** Need two volunteers to run the election. * Adjutant need PTLs and maintainers ** http://lists.openstack.org/pipermail/openstack-discuss/2021-October/025555.html * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open -gmann ---- On Mon, 17 Jan 2022 11:38:12 -0600 Ghanshyam Mann wrote ---- > Hello Everyone, > > Technical Committee's next weekly meeting is scheduled for Jan 20th at 1500 UTC. > > If you would like to add topics for discussion, please add them to the below wiki page by > Wednesday, Jan 19th, at 2100 UTC. 
> > https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting > > -gmann > > From tkajinam at redhat.com Thu Jan 20 00:05:11 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 20 Jan 2022 09:05:11 +0900 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: <17e73279ebc.da3cd092886796.8771224917629901204@ghanshyammann.com> References: <17e73279ebc.da3cd092886796.8771224917629901204@ghanshyammann.com> Message-ID: Thank you, Ghanshyam, for your inputs. These are helpful to understand the latest plan. So I think our question comes back to the original one. Currently keystone allows any of 1. system-service 2. domain-service 3. project-service 4. system-admin 5. system-member 6. system-reader to validate token but which one is the appropriate one to be used by authtoken middleware ? Considering the purpose of the service role, the service role is appropriate but it's not yet clear which scope should be used (as is pointed out by Mark from the beginning). AFAIK token is not a resource belonging to projects so system scope looks appropriate but what is the main intention is to allow project/domain scope ? By the way, in Puppet OpenStack, we have been using the service"s" project instead of the service project for some reason(which I'm not aware of). So it's helpful for us if we avoid implementing strict limitations to use the service project. On Thu, Jan 20, 2022 at 1:29 AM Ghanshyam Mann wrote: > ---- On Wed, 19 Jan 2022 08:01:00 -0600 Takashi Kajinami < > tkajinam at redhat.com> wrote ---- > > > > On Wed, Jan 19, 2022 at 9:22 PM Mark Goddard wrote: > > On Wed, 19 Jan 2022 at 11:15, Takashi Kajinami > wrote: > > > > > > Hi, > > > > > > > > > (The topic doesn't include puppet but ...) > > > I recently spent some time implementing initial support for SRBAC > > > in Puppet OpenStack. You can find details in the etherpad[1] I created > > > as my working note. It includes some items commonly required by all > toolings > > > in addition to ones specific to puppet. > > > [1] https://etherpad.opendev.org/p/puppet-secure-rbac > > > > Thanks for responding, Takashi - that's useful. > > > > > > > > I expect some of them (especially the configuration parameters) would > be used > > > by TripleO later. > > > > > > > Question: should the role be added with system scope or in the > > > > existing service project? The obvious main use for this is token > > > > validation, which seems to allow system or project scope. > > > > > > I'd add one more question which is; > > > Which roles should be assigned for the service users ? > > > > > > In the project which already implemented SRBAC, system-admin + > system-reader > > > allows any API calls and works like the previous project-admin. > > > > IIUC the direction of travel has changed, and now the intention is > > that system-admin won't have access to project-scoped APIs. > > Yes, as mark mentioned. And that is the key change from prevous direction. > We are isolating the system and project level APIs. system token will be > able > to perform only system level operation and not allowed to do project level > operation. For example: system user will not be allowed to create the > server > in nova. 
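For reference, the enforcement switches being discussed map onto two oslo.policy options that every service exposes; shown here for keystone.conf with the values of the intended end state (new defaults enabled, scope enforced):

    [oslo_policy]
    enforce_new_defaults = true
    enforce_scope = true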
To have a quick view on those (we have not finished yet in nova), > you > can check how it will look like in the below series: > > - > https://review.opendev.org/q/topic:%22bp%252Fpolicy-defaults-refresh-2%22+(status:open%20OR%20status:merged) > > You can see the test cases for all four possible configuration combination > and what all > roles are allowed in which configuration (case 4th is end goal we want to > be for RBAC): > > 1. enforce_scope=False + legacy rule (current default policies) > 2. enforce_scope=False + No legacy rule (enable scope but remove old > policy default) > 3. enforce_scope=True + legacy rule (enable scope with old policy default) > 4. enforce_scope=True + no legacy rule (end goal of new RBAC) > > > > > > > > > For token validations system-reader(or service role) would be enough > but there are > > > some system-admin-only APIs (os-server-external-events API in nova > called by neutron, > > > Create allocation in placement called by nova or neutron) used for > communications > > > between services. > > > > The token validation API has the following default policy: > > > > identity:validate_token: (role:reader and system_scope:all) or > > rule:service_role or rule:token_subject > > > > So system-reader, system-admin or service (any scope) should work. The > > spec suggests that the service role is intended for use by service to > > service APIs, in this case the credentials provided in the > > keystone_authtoken config. I would guess that system scope makes most > > sense here with the service role, although the rule suggests it would > > work with project scope and the service role. > > > > I noticed I ignored implied roles... Thanks for clarifying that. > > I understand and I agree with this. Considering the intention of SRBAC > this would fixbetter with system-scoped, as you earlier mentioned but I'll > defer to the others. > > > Another thigns to note here is, in Yoga cycle we are doing only > system-admin. system-reader, > system-member will be done in phase3 which is for future releases (BB). > > > > If we agree system-admin + system-reader is the right set then I'll > update the default role > > > assignment accordingly. This is important for Puppet OpenStack > because there are implementations > > > in puppet (which is usually called as providers) to manage some > resources like Flavors, > > > and these rely on credentials of service users after trying to look > up user credentials. > > > > I think one of the outcomes of this work is that authentication will > > necessarily become a bit more fine-grained. It might not make sense to > > have the same role assignments for all users. To your example, I would > > say that registering flavors should be done by a different user with > > different permissions than a service user. In kolla-ansible we don't > > really register flavors other than for octavia - this is up to > > operators. > > My main concern was that some service users would require system-admin > butI should have read this part more carefully. > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#phase-2 > > So Assigning the service role (for the proper scope which is asked in > the original thread)is the right way to go. For the provider stuff I'll > look into any available option to replace usage of serviceuser credential > but that's specific to Puppet which we can ignore here in this discussion. 
> > right, once we have service role implemented then we will have clear way > on how services will be > communicating to other services APIs. > > -gmann > > > > > > > > > Takashi > > > > > > On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard > wrote: > > >> > > >> Hi, > > >> > > >> If you haven't been paying close attention, it would be easy to miss > > >> some of the upcoming RBAC changes which will have an impact on > > >> deployment projects. I thought I'd start a thread so that we can > share > > >> how we are approaching this, get answers to open questions, and > > >> ideally all end up with a fairly consistent approach. > > >> > > >> The secure RBAC work has a long history, and continues to evolve. > > >> According to [1], we should start to see some fairly substantial > > >> changes over the next few releases. That spec is fairly long, but > > >> worth a read. > > >> > > >> In the yoga timeline [2], there is one change in particular that has > > >> an impact on deployment projects, "3. Keystone enforces scope by > > >> default". After this change, all of the deprecated policies that many > > >> still rely on in Keystone will be removed. > > >> > > >> In kolla-ansible, we have an etherpad [5] with some notes, questions > > >> and half-baked plans. We made some changes in Xena [3] to use system > > >> scope in some places when interacting with system APIs in Ansible > > >> tasks. > > >> > > >> The next change we have staged is to add the service role to all > > >> service users [4], in preparation for [2]. > > >> > > >> Question: should the role be added with system scope or in the > > >> existing service project? The obvious main use for this is token > > >> validation, which seems to allow system or project scope. > > >> > > >> We anticipate that some service users may still require some > > >> project-scoped roles, e.g. when creating resources for octavia. We'll > > >> deal with those on a case by case basis. > > >> > > >> In anticipation of keystone setting enforce_scope=True and removing > > >> old default policies (which I assume effectively removes > > >> enforce_new_defaults?), we will set this in kolla-ansible, and try to > > >> deal with any fallout. Hopefully the previous work will make this > > >> minimal. > > >> > > >> How does that line up with other projects' approaches? What have we > missed? > > >> > > >> Mark > > >> > > >> [1] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > >> [2] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > >> [3] > https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > >> [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > >> [5] > https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > >> > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
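To make the open question concrete, the two candidate ways of granting the role to a service user look like this with the openstack CLI; the "service" role and the "nova" user are placeholders, and keystone itself may end up creating the role as part of this work:

    openstack role create service   # only if it does not exist yet

    # option 1: project-scoped, in the existing service project
    openstack role add --user nova --user-domain Default --project service service

    # option 2: system-scoped
    openstack role add --user nova --user-domain Default --system all service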
URL: From gmann at ghanshyammann.com Thu Jan 20 01:38:48 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 19 Jan 2022 19:38:48 -0600 Subject: [all] Nomination open for OpenStack "Z" Release Naming In-Reply-To: <17e49d142da.1232180f6369644.468276914290122896@ghanshyammann.com> References: <17e49d142da.1232180f6369644.468276914290122896@ghanshyammann.com> Message-ID: <17e7522e7a3.d804cf84903822.6365624460716990044@ghanshyammann.com> ---- On Tue, 11 Jan 2022 09:45:57 -0600 Ghanshyam Mann wrote ---- > Hello Everyone, > > We are now starting the process for the OpenStack 'Z' release name. We are a little late > to start it, sorry for that. I have proposed to close the nomination on 24th Jan[1]. I am hoping that is > enough time to collect the names, If not please reply to this thread or in gerrit review[1]. > > Once the governance patch is merged I will update the final dates of nomination close > and polls here, meanwhile in parallel please start proposing the name on the wiki page. Below is the schedule for the Z release process: Nomination close: 24th Jan 2022 Election start: 25th Jan 2022 Election end: 1st Feb 2022 -gmann > > Criteria: > ====== > - Refer to the below governance page for the naming criteria: > > https://governance.openstack.org/tc/reference/release-naming.html#release-name-criteria > > - Any community members can propose the name to the below wiki page: > > https://wiki.openstack.org/wiki/Release_Naming/Z_Proposals > > We encourage all community members to participate in this process. > > [1] https://review.opendev.org/c/openstack/governance/+/824201 > > -gmann > > > > From zhangbailin at inspur.com Thu Jan 20 02:07:54 2022 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Thu, 20 Jan 2022 02:07:54 +0000 Subject: =?utf-8?B?562U5aSNOiBbY3lib3JnXSBQcm9wb3NpbmcgY29yZSByZXZpZXdlcnM=?= In-Reply-To: References: Message-ID: <216f44447909460495f2f41ebacbe8a9@inspur.com> For the sake of due process, I have two suggestions : 1. Plz provide a link of Eric's full contribution from review.opendev.org. It is a custom to provide such reference. >> https://review.opendev.org/q/owner:eric_xiett%2540163.com+status:merged+project:openstack/cyborg >> https://review.opendev.org/c/openstack/openstack-helm/+/816786 2. For the removal of non active committers, we've adopt a policy of "voluntarily step down" in the past, for any changes please also submit a patch to create a document for a new governance folder for current core reviewers to have a formal vote. >> Where is the "voluntarily step down" policy? I don't seem to see this file, if so we can keep it the same, but what are the benefits of this? For the current Core member maintenance file, you can refer to https://docs.openstack.org/project-team-guide/ptl.html#core-member-maintenance On Mon, Jan 17, 2022, 9:40 AM Brin Zhang(???) > wrote: Hello all, Eric xie has been actively contributing to Cyborg in various areas, adding new features, improving quality, reviewing patches. Despite the relatively short time, he has been one of the most prolific contributors, and brings an enthusiastic and active mindset. I would like to thank and acknowledge him for his steady valuable contributions, and propose him as a core reviewer for Cyborg. Some of the currently listed core reviewers have not been participating for a lengthy period of time. It is proposed that those who have had no contributions for the past 18 months ? i.e. 
no participation in meetings, no code contributions, not participating in Cyborg open source activities and no reviews ? be removed from the list of core reviewers. -- The Cyborg team recognizes everyone's contributions, but we need to ensure the activity of the core-reviewer list. If you are interested in rejoining the cyborg team, feel free to ping us to restore the core reviewer for you. If no objections are made known by January 24, I will make the changes proposed above.. Regards, Brin Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Thu Jan 20 07:04:45 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 20 Jan 2022 08:04:45 +0100 Subject: Cannot ssh/ping instance In-Reply-To: <1237801494.398789.1642631375283@mail.yahoo.com> References: <869855278.629940.1641716238605.ref@mail.yahoo.com> <1103131717.1158277.1642331300997@mail.yahoo.com> <1237801494.398789.1642631375283@mail.yahoo.com> Message-ID: <21337927.EfDdHjke4D@p1> Hi, In devstack, we are configureing MASQUARADE for the FLOATINT_RANGE to be able to go outside the devstack node. See https://github.com/openstack/devstack/ blob/24b65adc9cedff9c7a8ab412fb39613ef5d4a627/lib/neutron-legacy#L704 for the details. Maybe You need to configure something like that on Your setup? On ?roda, 19 stycznia 2022 23:29:35 CET Celinio Fernandes wrote: > Hi,still trying to reach the external network from inside the VM.I have not > set up any DNS server on any of the interfaces (shared and public).Do i need > to add one ? > > > > > On Sunday, January 16, 2022, 02:12:00 PM GMT+1, Celinio Fernandes > wrote: > > Hi, > I can ssh into the instance now but I noticed the VM does not have any > external network access (internet). Before I dig any deeper into that > problem, does anyone know what configuration i need to set up for that ? I > already added 2 new security rules to make sure I can access HTTP and HTTPS > ports (80 and 443), in vain : Ingress IPv4 TCP 80 (HTTP) 0.0.0.0/0 > Ingress IPv4 TCP 443 (HTTPS) 0.0.0.0/0 > > > Thanks. > > On Saturday, January 15, 2022, 12:29:40 AM GMT+1, Celinio Fernandes > wrote: > > Thanks very much for your help. > Before you replied, I tried what you wrote but on the wrong interfaces : > enp0s3 and virbr0. > I had no idea I needed to add the IP address from the public network's subnet > on the br-ex interface. So to ping/ssh the floating IP this is what I did : > ip link set dev br-ex up > ip link set dev br-ex state up > sudo ip addr add 172.24.4.254/24 dev br-ex > And then I can finally ping the floating IP : > ping 172.24.4.133 > And I can also ssh into the VM : > ssh cirros at 172.24.4.133 > > Thanks again :) > > > > On Sunday, January 9, 2022, 08:21:18 PM GMT+1, Slawek Kaplonski > wrote: > > Hi, > > On niedziela, 9 stycznia 2022 09:17:18 CET Celinio Fernandes wrote: > > Hi, > > I am running Ubuntu Server 20.04 LTS on Virtualbox. I installed OpenStack > > (Xena release) through Devstack. Here is the content of my > > /opt/stack/devstack/local.conf file : > > [[local|localrc]] > > ADMIN_PASSWORD=secret > > DATABASE_PASSWORD=$ADMIN_PASSWORD > > RABBIT_PASSWORD=$ADMIN_PASSWORD > > SERVICE_PASSWORD=$ADMIN_PASSWORD > > HOST_IP=10.0.2.15 > > > > > > I created an instance through Horizon. The security group contains the > > 2 rules needed (one to be able to ping and one to be able to ssh the > > instance). I also allocated and associated a floating IP address. And a ssh > > key pair. 
> > > > Here is the configuration : > > openstack server list > > ---------------------------------+--------------------------+---------+ > > > > | ID | Name | Status | Networks | Image | Flavor | > > > > ---------------------------------+--------------------------+---------+ > > > > | f5f0fdd5-298b-4fa3-9ee9-e6e4288f4327 | InstanceJanvier | ACTIVE | > > | shared=172.24.4.133, 192.168.233.165 | cirros-0.5.2-x86_64-disk | m1.nano > > > > ------------------------------------------------------+ > > > > > > openstack network list : > > ------------------------------------------------------+ > > > > | ID | Name | Subnets | > > > > ------------------------------------------------------+ > > > > | 96a04799-7fc7-4525-b05c-ad57261aed38 | public | > > | 07ce42db-6f3f-4135-ace7-2fc104ea62a0, > > | 6dba13fc-b10c-48b1-b1b4-e1f1afe25b53 > > | > > | | c42638dc-fa56-4644-ad34-295fce4811d2 | shared | > > | > > | a4e2d8cc-02b2-42e2-a525-e0eebbb08980 > > | > > | | ffb8a527-266e-4e96-ad60-f7e9aba8f0c1 | private | > > | > > | 42e36677-cf3c-4df4-88a1-8cf79b9d6060, > > | e507e6dd-132a-4249-96b1-83761562dd73 > > > > ------------------------------------------------------+ > > > > openstack router list : > > +--------------------------------------+----------------+--------+------ > > > > | ID | Name | Status | State | Project | > > > > +--------------------------------------+----------------+--------+------ > > > > | b9a15051-a532-4c93-95ad-53c057720c62 | Virtual_router | ACTIVE | UP | > > | 6556c02dd88f4c45b535c2dbb8ba1a04 | > > > > +--------------------------------------+----------------+--------+------ > > > > > > I cannot ping/ssh neither the fixed IP address or the floating IP address : > > ping -c 3 172.24.4.133 > > PING 172.24.4.133 (172.24.4.133) 56(84) bytes of data. > > --- 172.24.4.133 ping statistics --- > > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > > > ping -c 3 192.168.233.165 > > PING 192.168.233.165 (192.168.233.165) 56(84) bytes of data. > > --- 192.168.233.165 ping statistics --- > > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > > > Maybe that has something to do with the network namespaces configuration on > > Ubuntu. Does anyone know what could go wrong or what is missing ? > > Thanks for helping. > > If You are trying to ping Floating IP directly from the host where devstack > is installed (Virtualbox VM in Your case IIUC) then You should first have > those floating IP addresses somehow reachable on the host, otherwise traffic > is probably going through default gateway so is going outside the VM. > If You are using ML2/OVN (default in Devstack) or ML2/OVS You probably have > in the openvswitch bridge called br-ex which is used to send external > network traffic from the OpenStack networks in Devstack. In such case You > can e.g. add some IP address from the public network's subnet on the br-ex > interface, like 192.168.233.254/24 - that will tell Your OS to reach that > subnet through br- ex, so traffic will be able to go "into" the OVS managed > by Neutron. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. 
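For the "no external network access" part, the devstack code referenced above essentially NATs the floating range out of the host's default interface. Reproducing that by hand would look something like the following, using the 172.24.4.0/24 public range and the enp0s3 uplink mentioned earlier in this thread (adjust both to the actual environment):

    sudo sysctl -w net.ipv4.ip_forward=1
    sudo iptables -t nat -A POSTROUTING -s 172.24.4.0/24 -o enp0s3 -j MASQUERADE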
URL: From fpantano at redhat.com Thu Jan 20 07:42:45 2022 From: fpantano at redhat.com (Francesco Pantano) Date: Thu, 20 Jan 2022 08:42:45 +0100 Subject: [manila][cinder][glance][nova] Pop-up team for design and development of a Cephadm DevStack plugin In-Reply-To: References: Message-ID: Hi Victoria, thanks for starting this thread. On Wed, Jan 19, 2022 at 2:03 PM Sean Mooney wrote: > On Wed, 2022-01-19 at 12:04 +0100, Victoria Mart?nez de la Cruz wrote: > > Hi all, > > > > I'm reaching out to you to let you know that we will start the design and > > development of a Cephadm DevStack plugin. > > > > Some of the reasons on why we want to take this approach: > > > > - devstack-plugin-ceph worked for us for a lot of years, but the > > development of it relies on several hacks to adapt to the different Ceph > > versions we use and the different distros we support. This led to a > > monolithic script that sometimes is hard to debug and break our > development > > environments and our CI > > - cephadm is the deployment tool developed and maintained by the Ceph > > community, it allows their users to get specific Ceph versions very > easily > > and enforces good practices for Ceph clusters. From their docs, "Cephadm > > manages the full lifecycle of a Ceph cluster. It starts by bootstrapping > a > > tiny Ceph cluster on a single node (one monitor and one manager) and then > > uses the orchestration interface (?day 2? commands) to expand the cluster > > to include all hosts and to provision all Ceph daemons and services. [0]" > > - OpenStack deployment tools are starting to use cephadm as their way to > > deploy Ceph, so it would be nice to include cephadm in our development > > process to be closer with what is being done in the field > > > > I started the development of this in [1], but it might be better to > change > > devstack-plugin-ceph to do this instead of having a new plugin. This is > > something I would love to discuss in a first meeting. > i would advocate for pivoting devstack-plugin-ceph. > i dont think we have the capsity as a comunity to devleop, maintaine and > debug/support > 2 differnt ways of deploying ceph in our ci system in the long term. > > to me the way devstack-plugin-ceph install cpeh is jsut an implementaion > detail. > its contract is that it will install and configure ceph for use with > openstack. > if you make it use cephadm for that its just and internal detail that > should not > affect the consomes of the plugin provide you maintain the interface to > the devstack pluging > mostly the same. > Starting with pacific the deployment of Ceph is moved from ceph-ansible to cephadm: the implication of this change it's not just on the deployment side but this new component (which interacts with the ceph orchestrator module) is able to maintain the lifecycle of the deployed containers, so I'd say the new approach it's not just an implementation detail but also changes the way some components interact with Ceph. Manila using ganesha, for instance, it's the first component that should start using the orchestrator interface, so I guess it's worth aligning (and extending) the most popular dev installer to support the new way (like other projects already did). > > i would suggest addign a devstack macro initally to choose the backend but > then eventually > once the cephadm appoch is stable just swap the default. 
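To make the "devstack macro to choose the backend" suggestion concrete, the consumer-facing contract could stay a normal enable_plugin line plus one switch; the CEPH_DEPLOY_TOOL variable below is purely hypothetical and only sketches what such a knob might look like:

    # local.conf
    [[local|localrc]]
    enable_plugin devstack-plugin-ceph https://opendev.org/openstack/devstack-plugin-ceph

    # hypothetical selector, defaulting to the existing scripted install
    CEPH_DEPLOY_TOOL=cephadm

That way jobs and developers keep the same plugin name and only flip one variable while the cephadm path stabilizes.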
> +1 on choosing the backend and plan the switch when the cephadm approach is ready and works for all the openstack components > > > > > Having said this, I propose using the channel #openstack-cephadm in the > > OFTC network to talk about this and set up a first meeting with people > > interested in contributing to this effort. > ack im not sure i will get involed with this but the other option woudl be > to > just use #openstack-qa since that is the chanlle for devstack development. > Either #openstack-qa or a dedicated one works well , maybe #openstack-qa is useful to reach more people who can help / review the relevant changes .. wdyt > > > > > Thanks, > > > > Victoria > > > > [0] https://docs.ceph.com/en/pacific/cephadm/ > > [1] https://github.com/vkmc/devstack-plugin-cephadm > > > -- Francesco Pantano GPG KEY: F41BD75C -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.sgaravatto at gmail.com Thu Jan 20 07:51:54 2022 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Thu, 20 Jan 2022 08:51:54 +0100 Subject: Tesla V100 32G GPU with openstack In-Reply-To: References: <7CCE77EA-F7EA-45AF-85B5-3566D1DAB1CB@gmail.com> Message-ID: Hi Satish I am not able to understand what is wrong with your environment, but I can describe my setting. I have a compute node with 4 Tesla V100S. They have the same vendor-id (10de) and the same product id (13d6) [*] In nova.conf I defined this stuff in the [pci] section: [pci] passthrough_whitelist = {"vendor_id":"10de"} alias={"name":"V100","product_id":"1df6","vendor_id":"10de","device_type":"type-PCI"} I then created a flavor with this property: pci_passthrough:alias='V100:1' Using this flavor I can instantiate 4 VMs: each one can see a single V100 Hope this helps Cheers, Massimo [*] # lspci -nnk -d 10de: 60:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) Subsystem: NVIDIA Corporation Device [10de:13d6] Kernel driver in use: vfio-pci Kernel modules: nouveau 61:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) Subsystem: NVIDIA Corporation Device [10de:13d6] Kernel driver in use: vfio-pci Kernel modules: nouveau da:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) Subsystem: NVIDIA Corporation Device [10de:13d6] Kernel driver in use: vfio-pci Kernel modules: nouveau db:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) Subsystem: NVIDIA Corporation Device [10de:13d6] Kernel driver in use: vfio-pci Kernel modules: nouveau [root at cld-np-gpu-01 ~]# On Wed, Jan 19, 2022 at 10:28 PM Satish Patel wrote: > Hi Massimo, > > Ignore my last email, my requirement is to have a single VM with a > single GPU ("tesla-v100:1") but I would like to create a second VM on > the same compute node which uses the second GPU but I am getting the > following error when I create a second VM and vm error out. looks like > it's not allowing me to create a second vm and bind to a second GPU > card. > > error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error: > Hostdev already exists in the domain configuration > > On Wed, Jan 19, 2022 at 3:10 PM Satish Patel wrote: > > > > should i need to create a flavor to target both GPU. is it possible to > > have single flavor cover both GPU because end users don't understand > > which flavor to use. 
> > > > On Wed, Jan 19, 2022 at 1:54 AM Massimo Sgaravatto > > wrote: > > > > > > If I am not wrong those are 2 GPUs > > > > > > "tesla-v100:1" means 1 GPU > > > > > > So e.g. a flavor with "pci_passthrough:alias": "tesla-v100:2"} will be > used to create an instance with 2 GPUs > > > > > > Cheers, Massimo > > > > > > On Tue, Jan 18, 2022 at 11:35 PM Satish Patel > wrote: > > >> > > >> Thank you for the information. I have a quick question. > > >> > > >> [root at gpu01 ~]# lspci | grep -i nv > > >> 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe > > >> 32GB] (rev a1) > > >> d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe > > >> 32GB] (rev a1) > > >> > > >> In the above output showing two cards does that mean they are physical > > >> two or just BUS representation. > > >> > > >> Also i have the following entry in openstack flavor, does :1 means > > >> first GPU card? > > >> > > >> {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"} > > >> > > >> > > >> > > >> > > >> > > >> > > >> On Tue, Jan 18, 2022 at 5:55 AM Ant?nio Paulo > wrote: > > >> > > > >> > Hey Satish, Gustavo, > > >> > > > >> > Just to clarify a bit on point 3, you will have to buy a vGPU > license > > >> > per card and this gives you access to all the downloads you need > through > > >> > NVIDIA's web dashboard -- both the host and guest drivers as well > as the > > >> > license server setup files. > > >> > > > >> > Cheers, > > >> > Ant?nio > > >> > > > >> > On 18/01/22 02:46, Satish Patel wrote: > > >> > > Thank you so much! This is what I was looking for. It is very odd > that > > >> > > we buy a pricey card but then we have to buy a license to make > those > > >> > > features available. > > >> > > > > >> > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos > > >> > > wrote: > > >> > >> > > >> > >> Hello, Satish. > > >> > >> > > >> > >> I've been working with vGPU lately and I believe I can answer > your > > >> > >> questions: > > >> > >> > > >> > >> 1. As you pointed out in question #2, the pci-passthrough will > allocate > > >> > >> the entire physical GPU to one single guest VM, while vGPU > allows you to > > >> > >> spawn from 1 to several VMs using the same physical GPU, > depending on > > >> > >> the vGPU type you choose (check NVIDIA docs to see which vGPU > types the > > >> > >> Tesla V100 supports and their properties); > > >> > >> 2. Correct; > > >> > >> 3. To use vGPU, you need vGPU drivers installed on the platform > where > > >> > >> your deployment of OpenStack is running AND in the VMs, so there > are two > > >> > >> drivers to be installed in order to use the feature. I believe > both of > > >> > >> them have to be purchased from NVIDIA in order to be used, and > you would > > >> > >> also have to deploy an NVIDIA licensing server in order to > validate the > > >> > >> licenses of the drivers running in the VMs. > > >> > >> 4. You can see what the instructions are for each of these > scenarios in > > >> > >> [1] and [2]. > > >> > >> > > >> > >> There is also extensive documentation on vGPU at NVIDIA's > website [3]. > > >> > >> > > >> > >> [1] > https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html > > >> > >> [2] > https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html > > >> > >> [3] https://docs.nvidia.com/grid/13.0/index.html > > >> > >> > > >> > >> Regards, > > >> > >> Gustavo. 
> > >> > >> > > >> > >> On 17/01/2022 14:41, Satish Patel wrote: > > >> > >>> [Please note: This e-mail is from an EXTERNAL e-mail address] > > >> > >>> > > >> > >>> Folk, > > >> > >>> > > >> > >>> We have Tesla V100 32G GPU and I?m trying to configure with > openstack wallaby. This is first time dealing with GPU so I have couple of > question. > > >> > >>> > > >> > >>> 1. What is the difference between passthrough vs vGPU? I did > google but not very clear yet. > > >> > >>> 2. If I configure it passthrough then does it only work with > single VM ? ( I meant whole GPU will get allocate to single VM correct? > > >> > >>> 3. Also some document saying Tesla v100 support vGPU but some > folks saying you need license. I have no idea where to get that license. > What is the deal here? > > >> > >>> 3. What are the config difference between configure this card > with passthrough vs vGPU? > > >> > >>> > > >> > >>> > > >> > >>> Currently I configure it with passthrough based one one article > and I am able to spun up with and I can see nvidia card exposed to vm. (I > used iommu and vfio based driver) so if this card support vGPU then do I > need iommu and vfio or some other driver to make it virtualize ? > > >> > >>> > > >> > >>> Sent from my iPhone > > >> > >>> > > >> > > > > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Thu Jan 20 09:07:35 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 20 Jan 2022 10:07:35 +0100 Subject: Can neutron-fwaas project be revived? In-Reply-To: <17e72a52aee.f17e7143871608.3933408575637218060@ghanshyammann.com> References: <771f9e50a5f0498caecf3cb892902954@inspur.com> <17e6e183905.d179775f807310.479674362228455950@ghanshyammann.com> <20220118175104.a6ppj2kxpijeztz7@yuggoth.org> <17e72a52aee.f17e7143871608.3933408575637218060@ghanshyammann.com> Message-ID: Hi, Neutron team is open to include projects to the stadium group (that was the feeling during the meeting also when we discussed this topic) if there is a stable maintainer team behind the project. So as you mentioned it would be easier to avoid the back and forth movement of fwaas if possible. Lajos Ghanshyam Mann ezt ?rta (id?pont: 2022. jan. 19., Sze, 15:02): > ---- On Wed, 19 Jan 2022 02:23:39 -0600 Lajos Katona < > katonalala at gmail.com> wrote ---- > > Hi, > > Thanks for the advice. > > The intention from the Neutron team was to make it clear that the team > currently has no capacity to help the maintenance of neutron-fwaas, and > can't help to maintain it.If there's easier ways for volunteers to keep it > maintained other than forking it to x/ namespace that would be really > helpful. > > Thanks Lajos, > > Main point here is if it is maintained by current maintainer (inspur team > or other developers) whether neutron team will consider that > to be in added in neutron stadium? > > If yes, then it will be extra work to move to x/ namespace now and then > bring back to openstack/. > If no, then moving to x/ namespace is good option or if maintainer want to > be in openstack then we can discuss about > a separate new project (but that needs more discussion on host much cost > it adds). > > -gmann > > > Lajos Katona (lajoskatona) > > > > Jeremy Stanley ezt ?rta (id?pont: 2022. jan. 18., > K, 18:58): > > On 2022-01-18 10:49:48 -0600 (-0600), Ghanshyam Mann wrote: > > [...] 
> > > As discussed in project-config change[1], you or neutron folks can > > > propose the retirement now itself (considering there is no one to > > > maintain/release stable/victoria for new bug fixes) and TC will > > > merge it as per process. After that, creating it in x/ namespace > > > will be good to do. > > [...] > > > > Looking at this from a logistical perspective, it's a fair amount of > > churn in code hosting as well as unwelcoming to the new volunteers, > > compared to just leaving the repository where it is now and letting > > them contribute to it there. If the concern is that the Neutron team > > doesn't want to retain responsibility for it while they evaluate the > > conviction of the new maintainers for eventual re-inclusion, then > > the TC would be well within its rights to declare that the > > repository can remain in place while not having it be part of the > > Neutron team's responsibilities. > > > > There are a number of possible solutions, ranging from making a new > > category of provisional deliverable, to creating a lightweight > > project team under the DPL model, to declaring it a pop-up team with > > a TC-owned repository. There are repositories within the OpenStack > > namespace which are not an official part of the OpenStack > > coordinated release, after all. Solutions which don't involve having > > the new work take place somewhere separate, and the work involved in > > making that separate place, which will simply be closed down as > > transient cruft if everything goes as desired. > > -- > > Jeremy Stanley > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Thu Jan 20 09:35:33 2022 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 20 Jan 2022 09:35:33 +0000 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: <17e731cbfcc.e2463d54885770.5926410035050303741@ghanshyammann.com> References: <17e731cbfcc.e2463d54885770.5926410035050303741@ghanshyammann.com> Message-ID: On Wed, 19 Jan 2022 at 16:12, Ghanshyam Mann wrote: > > ---- On Wed, 19 Jan 2022 04:35:53 -0600 Mark Goddard wrote ---- > > Hi, > > > > If you haven't been paying close attention, it would be easy to miss > > some of the upcoming RBAC changes which will have an impact on > > deployment projects. I thought I'd start a thread so that we can share > > how we are approaching this, get answers to open questions, and > > ideally all end up with a fairly consistent approach. > > > > The secure RBAC work has a long history, and continues to evolve. > > According to [1], we should start to see some fairly substantial > > changes over the next few releases. That spec is fairly long, but > > worth a read. > > > > In the yoga timeline [2], there is one change in particular that has > > an impact on deployment projects, "3. Keystone enforces scope by > > default". After this change, all of the deprecated policies that many > > still rely on in Keystone will be removed. > > > > In kolla-ansible, we have an etherpad [5] with some notes, questions > > and half-baked plans. We made some changes in Xena [3] to use system > > scope in some places when interacting with system APIs in Ansible > > tasks. > > > > The next change we have staged is to add the service role to all > > service users [4], in preparation for [2]. > > > > Question: should the role be added with system scope or in the > > existing service project? 
The obvious main use for this is token > > validation, which seems to allow system or project scope. > > > > We anticipate that some service users may still require some > > project-scoped roles, e.g. when creating resources for octavia. We'll > > deal with those on a case by case basis. > > Service roles are planned for phase2 which is Z release[1]. The Idea here is > service to service communication will happen with 'service' role (which keystone > need to implement yet) and end users will keep using the what ever role > is default (or overridden in policy file) which can be project or system scoped > depends on the APIs. > > So at the end service-service APIs policy default will looks like > > '(role:admin and system:network and project_id:%(project_id)s) or (role:service and project_name:service)' > > Say nova will use that service role to communicate to cinder and cinder policy will pass > as service role is in OR in default policy. > > But let's see how they are going to be and if any challenges when we will implement > it in Z cycle. I'm not 100% on our reasoning for using the service role in yoga (I wasn't in the discussion when we made the switch, although John Garbutt was), although I can provide at least one reason. Currently, we have a bunch of service users doing things like keystone token validation using the admin role in the service project. If we enforce scopes & new defaults in keystone, this will no longer work, due to the default policy: identity:validate_token: (role:reader and system_scope:all) or rule:service_role or rule:token_subject Now we could go and assign system-reader to all these users, but if the end goal is to give them all the service role, and that allows token validation, then to me that seems like a better path. Currently, we're creating the service role during deploy & upgrade, then assigning it to users. Keystone is supposed to create the service role in yoga, so we can eventually drop that part. Does this seem reasonable? Is keystone still on track to create the service role in yoga? > > > > > In anticipation of keystone setting enforce_scope=True and removing > > old default policies (which I assume effectively removes > > enforce_new_defaults?), we will set this in kolla-ansible, and try to > > deal with any fallout. Hopefully the previous work will make this > > minimal. > > > > How does that line up with other projects' approaches? What have we missed? > > Yeah, we want users/deployment projects/horizon etc to use the new policy from > keystone as first and we will see feedback how they are (good, bad, really bad) from > usage perspective. Why we choose keystone is, because new policy are there since > many cycle and ready to use. Other projects needs to work their policy as per new > SRBAC design/direction (for example nova needs to modify their policy before we ask > users to use new policy and work is under progress[2]). > > I think trying in kolla will be good way to know if we can move to keystone's new policy > completely in yoga. We have a scope-enforcing preview patch [1], and it's passing our base set of tests. I have another that triggers all of the jobs. 
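For reference, the two options we're weighing up for granting the service role would look roughly like this with the openstack client (nova is just an example user here, and the role still has to be created by hand until keystone ships it):

  # create the role if keystone has not created it yet
  openstack role create service

  # option A: grant it with system scope
  openstack role add --user nova --user-domain Default --system all service

  # option B: grant it on the existing service project
  openstack role add --user nova --user-domain Default \
      --project service --project-domain Default service

Either form should satisfy the rule:service_role part of the token validation policy quoted above, which is why the scope question is still open.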
[1] https://review.opendev.org/c/openstack/kolla-ansible/+/825406 > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#z-release-timeline > [2] https://blueprints.launchpad.net/nova/+spec/policy-defaults-refresh-2 > > -gmann > > > > > Mark > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > From mark at stackhpc.com Thu Jan 20 09:44:21 2022 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 20 Jan 2022 09:44:21 +0000 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: <257831642620613@mail.yandex.ru> References: <257831642620613@mail.yandex.ru> Message-ID: On Wed, 19 Jan 2022 at 19:38, Dmitriy Rabotyagov wrote: > > - ??? > > Hi! > > In OSA I've already started topic [1] quite a while ago, that adds service role in addition to admin one for the migration purposes. > It was intended, that admin can be revoked later on if needed. > > So I based on my understanding of the thread, that is exactly the plan for keystone as well. > > > [1] https://review.opendev.org/q/topic:%22osa%252Fservice_tokens%22+(status:open%20OR%20status:merged) Hi Dmitriy, Thanks for the update. I see that those patches are adding the service role. They're also configuring service tokens, which IIUC are unrelated, and allow for long-running operations to outlive the original user's token lifetime. Is it intentional? Mark > > 19.01.2022, 12:43, "Mark Goddard" : > > Hi, > > If you haven't been paying close attention, it would be easy to miss > some of the upcoming RBAC changes which will have an impact on > deployment projects. I thought I'd start a thread so that we can share > how we are approaching this, get answers to open questions, and > ideally all end up with a fairly consistent approach. > > The secure RBAC work has a long history, and continues to evolve. > According to [1], we should start to see some fairly substantial > changes over the next few releases. That spec is fairly long, but > worth a read. > > In the yoga timeline [2], there is one change in particular that has > an impact on deployment projects, "3. Keystone enforces scope by > default". After this change, all of the deprecated policies that many > still rely on in Keystone will be removed. > > In kolla-ansible, we have an etherpad [5] with some notes, questions > and half-baked plans. We made some changes in Xena [3] to use system > scope in some places when interacting with system APIs in Ansible > tasks. > > The next change we have staged is to add the service role to all > service users [4], in preparation for [2]. > > Question: should the role be added with system scope or in the > existing service project? The obvious main use for this is token > validation, which seems to allow system or project scope. > > We anticipate that some service users may still require some > project-scoped roles, e.g. when creating resources for octavia. We'll > deal with those on a case by case basis. 
> > In anticipation of keystone setting enforce_scope=True and removing > old default policies (which I assume effectively removes > enforce_new_defaults?), we will set this in kolla-ansible, and try to > deal with any fallout. Hopefully the previous work will make this > minimal. > > How does that line up with other projects' approaches? What have we missed? > > Mark > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > > -- > Kind Regards, > Dmitriy Rabotyagov > From chkumar at redhat.com Thu Jan 20 09:49:45 2022 From: chkumar at redhat.com (Chandan Kumar) Date: Thu, 20 Jan 2022 15:19:45 +0530 Subject: [Infra]CentOS Stream 8 based jobs are hitting RETRY or RETRY_LIMIT Message-ID: Hello, Currently CentOS Stream 8 based jobs are hitting RETRY or RETRY_LIMIT. https://zuul.opendev.org/t/openstack/builds?result=RETRY&result=RETRY_LIMIT&skip=0 Based on the logs, all are hitting with : ``` rrors during downloading metadata for repository 'appstream': 2022-01-20 09:34:18.119377 | primary | - Status code: 404 for https://mirror-int.iad.rax.opendev.org/centos/8-stream/AppStream/x86_64/os/repodata/09255ba7c10e01afeb0d343667190f9c3e42d0a6099f887619abcb92ea0378db-filelists.xml.gz (IP: 10.208.224.52) 2022-01-20 09:34:18.119443 | primary | - Status code: 404 for https://mirror-int.iad.rax.opendev.org/centos/8-stream/AppStream/x86_64/os/repodata/678056e5b64153ca221196673208730234dd72f03397a3ab2d30fea01392bd87-primary.xml.gz (IP: 10.208.224.52) ``` While taking a look at https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/centos-mirror-update#L42 ``` 404 for http://mirror.dal10.us.leaseweb.net/centos/8-stream/AppStream/x86_64/os/re podata/09255ba7c10e01afeb0d343667190f9c3e42d0a6099f887619abcb92ea0378db-filelists.xml.gz (IP: 209.58.153.1) ``` It seems the centos mirror have issues. https://review.opendev.org/c/opendev/system-config/+/825446 make the switch to facebook mirror. It might fix the issue. Thanks Alfredo for debugging it. Thanks, Chandan Kumar From pierre at stackhpc.com Thu Jan 20 11:24:17 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 20 Jan 2022 12:24:17 +0100 Subject: [all][infra][kayobe][kolla] ping not permitted on latest centos-8-stream images In-Reply-To: <20220114155231.zkqlje3bsj7st6dv@yuggoth.org> References: <20220114145054.6bcnmu2335jepbvq@yuggoth.org> <192ecffc-4919-4576-805a-927f9fbd60f5@www.fastmail.com> <20220114155231.zkqlje3bsj7st6dv@yuggoth.org> Message-ID: On Fri, 14 Jan 2022 at 16:57, Jeremy Stanley wrote: > > On 2022-01-14 07:35:05 -0800 (-0800), Clark Boylan wrote: > [...] > > I don't think we should update DIB or our images to fix this. The > > distro is broken and our images accurately represent that state. > > If the software in CI fails as a result that is because our CI > > system is properly catching this problem. The software needs to > > work around this to ensure that it is deployable in the real world > > and not just on our systems. > > > > This approach of fixing it in the software itself appears to be > > the one TripleO took and is the correct approach. 
> > Thanks, in reflection I agree. It's good to keep reminding ourselves > that what we're testing is that the software works on the target > platform. Unfortunate and temporary as it may be, the current state > of CentOS Stream 8 is that you need root privileges in order to use > the ping utility. If we work around this in our testing, then users > who are trying to deploy that software onto the current state of > CentOS Stream 8 will not get the benefit of the workaround. > > It's good to be reminded that the goal is not to make tests pass no > matter the cost, it's to make sure the software will work for its > users. > -- > Jeremy Stanley We have applied the workaround in Kolla Ansible and backported it to stable branches. A fixed systemd package is hopefully coming to CentOS Stream 8 soon, as it was imported in Git yesterday: https://git.centos.org/rpms/systemd/c/3d3dc89fb25868e8038ecac8d5aef0603bdfaaa2?branch=c8s From alex.kavanagh at canonical.com Thu Jan 20 12:25:02 2022 From: alex.kavanagh at canonical.com (Alex Kavanagh) Date: Thu, 20 Jan 2022 12:25:02 +0000 Subject: [charms] Welcome Hemanth Nakkina to the charms core Message-ID: Hello Just a quick note to let you know that Hemanth has joined the core charms team, so a huge welcome to him! Thanks for joining the team. Hemanth has been a trusted and helpful reviewer and contributor to the charms, and received many +1s and comments on his proposal. All the best Alex. -- Alex Kavanagh - Software Engineer OpenStack Engineering - Data Centre Development - Canonical Ltd -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Thu Jan 20 12:40:56 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Thu, 20 Jan 2022 09:40:56 -0300 Subject: [manila][cinder][glance][nova] Pop-up team for design and development of a Cephadm DevStack plugin In-Reply-To: References: Message-ID: Sounds good. I wanna join . I haven't tried cephadm yet but It would help us to make ceph new features more transparent to Cinder in the future. Thanks On Thu, Jan 20, 2022 at 4:45 AM Francesco Pantano wrote: > Hi Victoria, > thanks for starting this thread. > > On Wed, Jan 19, 2022 at 2:03 PM Sean Mooney wrote: > >> On Wed, 2022-01-19 at 12:04 +0100, Victoria Mart?nez de la Cruz wrote: >> > Hi all, >> > >> > I'm reaching out to you to let you know that we will start the design >> and >> > development of a Cephadm DevStack plugin. >> > >> > Some of the reasons on why we want to take this approach: >> > >> > - devstack-plugin-ceph worked for us for a lot of years, but the >> > development of it relies on several hacks to adapt to the different Ceph >> > versions we use and the different distros we support. This led to a >> > monolithic script that sometimes is hard to debug and break our >> development >> > environments and our CI >> > - cephadm is the deployment tool developed and maintained by the Ceph >> > community, it allows their users to get specific Ceph versions very >> easily >> > and enforces good practices for Ceph clusters. From their docs, "Cephadm >> > manages the full lifecycle of a Ceph cluster. It starts by >> bootstrapping a >> > tiny Ceph cluster on a single node (one monitor and one manager) and >> then >> > uses the orchestration interface (?day 2? commands) to expand the >> cluster >> > to include all hosts and to provision all Ceph daemons and services. 
>> [0]" >> > - OpenStack deployment tools are starting to use cephadm as their way to >> > deploy Ceph, so it would be nice to include cephadm in our development >> > process to be closer with what is being done in the field >> > >> > I started the development of this in [1], but it might be better to >> change >> > devstack-plugin-ceph to do this instead of having a new plugin. This is >> > something I would love to discuss in a first meeting. >> i would advocate for pivoting devstack-plugin-ceph. >> i dont think we have the capsity as a comunity to devleop, maintaine and >> debug/support >> 2 differnt ways of deploying ceph in our ci system in the long term. >> >> to me the way devstack-plugin-ceph install cpeh is jsut an implementaion >> detail. >> its contract is that it will install and configure ceph for use with >> openstack. >> if you make it use cephadm for that its just and internal detail that >> should not >> affect the consomes of the plugin provide you maintain the interface to >> the devstack pluging >> mostly the same. >> > Starting with pacific the deployment of Ceph is moved from ceph-ansible to > cephadm: the implication of this change it's not just > on the deployment side but this new component (which interacts with the > ceph orchestrator module) is able to maintain the lifecycle > of the deployed containers, so I'd say the new approach it's not just an > implementation detail but also changes the way some components > interact with Ceph. > Manila using ganesha, for instance, it's the first component that should > start using the orchestrator interface, so I guess it's worth > aligning (and extending) the most popular dev installer to support the new > way (like other projects already did). > > > >> >> i would suggest addign a devstack macro initally to choose the backend >> but then eventually >> once the cephadm appoch is stable just swap the default. >> > +1 on choosing the backend and plan the switch when the cephadm approach > is ready and works for all the openstack components > >> >> > >> > Having said this, I propose using the channel #openstack-cephadm in the >> > OFTC network to talk about this and set up a first meeting with people >> > interested in contributing to this effort. >> ack im not sure i will get involed with this but the other option woudl >> be to >> just use #openstack-qa since that is the chanlle for devstack development. >> > > Either #openstack-qa or a dedicated one works well , maybe #openstack-qa > is useful to reach more people > who can help / review the relevant changes .. wdyt > >> >> > >> > Thanks, >> > >> > Victoria >> > >> > [0] https://docs.ceph.com/en/pacific/cephadm/ >> > [1] https://github.com/vkmc/devstack-plugin-cephadm >> >> >> > > -- > Francesco Pantano > GPG KEY: F41BD75C > -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Thu Jan 20 12:55:01 2022 From: satish.txt at gmail.com (Satish Patel) Date: Thu, 20 Jan 2022 07:55:01 -0500 Subject: Tesla V100 32G GPU with openstack In-Reply-To: References: Message-ID: Thank you! That is what I?m also trying to do to give each gpu card to each vm. I do have exact same setting in my nova.conf. What version of libvirt are you running? Did you install any special nvidia driver etc on your compute node for passthrough (I doubt because it straightforward). Do you have any NUMA setting in your flavor or compute? 
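For what it's worth, this is how I can check which NUMA node each card sits on (addresses taken from the lspci output quoted below), just as a sanity check - I'm not sure it's related to the error:

  cat /sys/bus/pci/devices/0000:5e:00.0/numa_node
  cat /sys/bus/pci/devices/0000:d8:00.0/numa_node

If they turn out to be on different nodes, I guess a flavor property like hw:pci_numa_affinity_policy=preferred could be worth trying.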
Sent from my iPhone > On Jan 20, 2022, at 2:52 AM, Massimo Sgaravatto wrote: > > ? > Hi Satish > > I am not able to understand what is wrong with your environment, but I can describe my setting. > > I have a compute node with 4 Tesla V100S. > They have the same vendor-id (10de) and the same product id (13d6) [*] > In nova.conf I defined this stuff in the [pci] section: > > [pci] > passthrough_whitelist = {"vendor_id":"10de"} > alias={"name":"V100","product_id":"1df6","vendor_id":"10de","device_type":"type-PCI"} > > > I then created a flavor with this property: > > pci_passthrough:alias='V100:1' > > Using this flavor I can instantiate 4 VMs: each one can see a single V100 > > Hope this helps > > Cheers, Massimo > > > [*] > # lspci -nnk -d 10de: > 60:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) > Subsystem: NVIDIA Corporation Device [10de:13d6] > Kernel driver in use: vfio-pci > Kernel modules: nouveau > 61:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) > Subsystem: NVIDIA Corporation Device [10de:13d6] > Kernel driver in use: vfio-pci > Kernel modules: nouveau > da:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) > Subsystem: NVIDIA Corporation Device [10de:13d6] > Kernel driver in use: vfio-pci > Kernel modules: nouveau > db:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) > Subsystem: NVIDIA Corporation Device [10de:13d6] > Kernel driver in use: vfio-pci > Kernel modules: nouveau > [root at cld-np-gpu-01 ~]# > > >> On Wed, Jan 19, 2022 at 10:28 PM Satish Patel wrote: >> Hi Massimo, >> >> Ignore my last email, my requirement is to have a single VM with a >> single GPU ("tesla-v100:1") but I would like to create a second VM on >> the same compute node which uses the second GPU but I am getting the >> following error when I create a second VM and vm error out. looks like >> it's not allowing me to create a second vm and bind to a second GPU >> card. >> >> error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error: >> Hostdev already exists in the domain configuration >> >> On Wed, Jan 19, 2022 at 3:10 PM Satish Patel wrote: >> > >> > should i need to create a flavor to target both GPU. is it possible to >> > have single flavor cover both GPU because end users don't understand >> > which flavor to use. >> > >> > On Wed, Jan 19, 2022 at 1:54 AM Massimo Sgaravatto >> > wrote: >> > > >> > > If I am not wrong those are 2 GPUs >> > > >> > > "tesla-v100:1" means 1 GPU >> > > >> > > So e.g. a flavor with "pci_passthrough:alias": "tesla-v100:2"} will be used to create an instance with 2 GPUs >> > > >> > > Cheers, Massimo >> > > >> > > On Tue, Jan 18, 2022 at 11:35 PM Satish Patel wrote: >> > >> >> > >> Thank you for the information. I have a quick question. >> > >> >> > >> [root at gpu01 ~]# lspci | grep -i nv >> > >> 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe >> > >> 32GB] (rev a1) >> > >> d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe >> > >> 32GB] (rev a1) >> > >> >> > >> In the above output showing two cards does that mean they are physical >> > >> two or just BUS representation. >> > >> >> > >> Also i have the following entry in openstack flavor, does :1 means >> > >> first GPU card? 
>> > >> >> > >> {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"} >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> On Tue, Jan 18, 2022 at 5:55 AM Ant?nio Paulo wrote: >> > >> > >> > >> > Hey Satish, Gustavo, >> > >> > >> > >> > Just to clarify a bit on point 3, you will have to buy a vGPU license >> > >> > per card and this gives you access to all the downloads you need through >> > >> > NVIDIA's web dashboard -- both the host and guest drivers as well as the >> > >> > license server setup files. >> > >> > >> > >> > Cheers, >> > >> > Ant?nio >> > >> > >> > >> > On 18/01/22 02:46, Satish Patel wrote: >> > >> > > Thank you so much! This is what I was looking for. It is very odd that >> > >> > > we buy a pricey card but then we have to buy a license to make those >> > >> > > features available. >> > >> > > >> > >> > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos >> > >> > > wrote: >> > >> > >> >> > >> > >> Hello, Satish. >> > >> > >> >> > >> > >> I've been working with vGPU lately and I believe I can answer your >> > >> > >> questions: >> > >> > >> >> > >> > >> 1. As you pointed out in question #2, the pci-passthrough will allocate >> > >> > >> the entire physical GPU to one single guest VM, while vGPU allows you to >> > >> > >> spawn from 1 to several VMs using the same physical GPU, depending on >> > >> > >> the vGPU type you choose (check NVIDIA docs to see which vGPU types the >> > >> > >> Tesla V100 supports and their properties); >> > >> > >> 2. Correct; >> > >> > >> 3. To use vGPU, you need vGPU drivers installed on the platform where >> > >> > >> your deployment of OpenStack is running AND in the VMs, so there are two >> > >> > >> drivers to be installed in order to use the feature. I believe both of >> > >> > >> them have to be purchased from NVIDIA in order to be used, and you would >> > >> > >> also have to deploy an NVIDIA licensing server in order to validate the >> > >> > >> licenses of the drivers running in the VMs. >> > >> > >> 4. You can see what the instructions are for each of these scenarios in >> > >> > >> [1] and [2]. >> > >> > >> >> > >> > >> There is also extensive documentation on vGPU at NVIDIA's website [3]. >> > >> > >> >> > >> > >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html >> > >> > >> [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html >> > >> > >> [3] https://docs.nvidia.com/grid/13.0/index.html >> > >> > >> >> > >> > >> Regards, >> > >> > >> Gustavo. >> > >> > >> >> > >> > >> On 17/01/2022 14:41, Satish Patel wrote: >> > >> > >>> [Please note: This e-mail is from an EXTERNAL e-mail address] >> > >> > >>> >> > >> > >>> Folk, >> > >> > >>> >> > >> > >>> We have Tesla V100 32G GPU and I?m trying to configure with openstack wallaby. This is first time dealing with GPU so I have couple of question. >> > >> > >>> >> > >> > >>> 1. What is the difference between passthrough vs vGPU? I did google but not very clear yet. >> > >> > >>> 2. If I configure it passthrough then does it only work with single VM ? ( I meant whole GPU will get allocate to single VM correct? >> > >> > >>> 3. Also some document saying Tesla v100 support vGPU but some folks saying you need license. I have no idea where to get that license. What is the deal here? >> > >> > >>> 3. What are the config difference between configure this card with passthrough vs vGPU? 
>> > >> > >>> >> > >> > >>> >> > >> > >>> Currently I configure it with passthrough based one one article and I am able to spun up with and I can see nvidia card exposed to vm. (I used iommu and vfio based driver) so if this card support vGPU then do I need iommu and vfio or some other driver to make it virtualize ? >> > >> > >>> >> > >> > >>> Sent from my iPhone >> > >> > >>> >> > >> > > >> > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Thu Jan 20 13:04:26 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Thu, 20 Jan 2022 10:04:26 -0300 Subject: [outreachy] Stepping down as Outreachy co-organizer In-Reply-To: References: Message-ID: Thank you so much Samuel. You really helped me when I was both and intern and a mentor :P On Mon, Jan 10, 2022 at 5:40 PM Goutham Pacha Ravi wrote: > On Mon, Jan 3, 2022 at 9:58 AM Samuel de Medeiros Queiroz > wrote: > > > > Hi all, > > > > Outreachy is a wonderful program that promotes diversity in open source > communities by giving opportunities to people in underrepresented groups. > > > > This was a hard decision to make, but I have not been committing the > time this project deserves. > > For that reason, I would like to give visibility that I am stepping down > as an Outreachy organizer. > > > > It was a great honor to serve as co-organizer since late 2018, and we > had 19 internships since then. > > I also had the pleasure to serve twice (2016 and 2017) as a mentor. > > Wow - and these internships have had a tremendous impact over these > years! Thank you so much for your service, Samuel! > > > > > Mahati, it was a great pleasure co-organizing Outreachy in this > community with you. > > > > Thanks! > > Samuel Queiroz > > -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.rohmann at inovex.de Thu Jan 20 13:06:25 2022 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Thu, 20 Jan 2022 14:06:25 +0100 Subject: [ops][nova][victoria] Power State = Suspended? In-Reply-To: <9e5c73c2a2b9513c8c44c19f6d370411c016b169.camel@redhat.com> References: <0670B960225633449A24709C291A52525125F40D@COM01.performair.local> <9e5c73c2a2b9513c8c44c19f6d370411c016b169.camel@redhat.com> Message-ID: Hey there, On 04/08/2021 19:37, Sean Mooney wrote: >> I had something unusual happen this morning; one of my VMs was showing "Suspended" under the Power State in the Horizon dashboard. >> >> I've never seen that. What does it mean? >> >> Any search that I do points me to a bunch of resources for Status Suspended. > suspened is like hibernate in windows. in the libvirt driver we call libvirt managed_save api > this pauses the guests, snapshots the guest ram and saves it to disk then stops the instance. > so this frees the guest ram on the host and save it to a file so that we can recreate the vm and resume it > as if nothing happened. Sorry to hijack such an old thread. Looking into these features, I was just wondering if it was possible to: ? 1) Disable the support for pause / suspend altogether and not allow anyone to place instances in such states? ? 2) Change the storage location of the saved guest RAM to a shared storage to allow the instance to be migrated while being suspended/paused. As far as I can see currently this data is saved on the host disk. Regards Christian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From smooney at redhat.com Thu Jan 20 13:23:39 2022 From: smooney at redhat.com (Sean Mooney) Date: Thu, 20 Jan 2022 13:23:39 +0000 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: <17e73279ebc.da3cd092886796.8771224917629901204@ghanshyammann.com> Message-ID: On Thu, 2022-01-20 at 09:05 +0900, Takashi Kajinami wrote: > Thank you, Ghanshyam, for your inputs. > These are helpful to understand the latest plan. > > So I think our question comes back to the original one. > Currently keystone allows any of > 1. system-service > 2. domain-service > 3. project-service > 4. system-admin > 5. system-member > 6. system-reader > to validate token but which one is the appropriate one to be used by > authtoken middleware ? > > Considering the purpose of the service role, the service role is > appropriate but it's not yet > clear which scope should be used (as is pointed out by Mark from the > beginning). > > AFAIK token is not a resource belonging to projects so system scope looks > appropriate > but what is the main intention is to allow project/domain scope ? a token is really a resouce belownign to a user and that user can be a member of a project or doamin. do we need to enfoce any scope on this endpoint? the token that the midelware is vlaidating shoudl be suffenct to do the validation since that user shoudl be able to retirve the list of roles and project/domain membership its is part of so i guess im confusted why we woudl not just pass the token to be ckeck as the token the middelware uses and not enforece any scope or role reqiruement on the token validation endpoint perhaps im missunderstanding and the authtoken middleware is not the middleware that validate the toke is valid and populates the project_id and domain_id /roles in the oslo context object? > > By the way, in Puppet OpenStack, we have been using the service"s" project > instead of > the service project for some reason(which I'm not aware of). > So it's helpful for us if we avoid implementing strict limitations to use > the service project. > > > On Thu, Jan 20, 2022 at 1:29 AM Ghanshyam Mann > wrote: > > > ---- On Wed, 19 Jan 2022 08:01:00 -0600 Takashi Kajinami < > > tkajinam at redhat.com> wrote ---- > > > > > > On Wed, Jan 19, 2022 at 9:22 PM Mark Goddard wrote: > > > On Wed, 19 Jan 2022 at 11:15, Takashi Kajinami > > wrote: > > > > > > > > Hi, > > > > > > > > > > > > (The topic doesn't include puppet but ...) > > > > I recently spent some time implementing initial support for SRBAC > > > > in Puppet OpenStack. You can find details in the etherpad[1] I created > > > > as my working note. It includes some items commonly required by all > > toolings > > > > in addition to ones specific to puppet. > > > > [1] https://etherpad.opendev.org/p/puppet-secure-rbac > > > > > > Thanks for responding, Takashi - that's useful. > > > > > > > > > > > I expect some of them (especially the configuration parameters) would > > be used > > > > by TripleO later. > > > > > > > > > Question: should the role be added with system scope or in the > > > > > existing service project? The obvious main use for this is token > > > > > validation, which seems to allow system or project scope. > > > > > > > > I'd add one more question which is; > > > > Which roles should be assigned for the service users ? > > > > > > > > In the project which already implemented SRBAC, system-admin + > > system-reader > > > > allows any API calls and works like the previous project-admin. 
> > > > > > IIUC the direction of travel has changed, and now the intention is > > > that system-admin won't have access to project-scoped APIs. > > > > Yes, as mark mentioned. And that is the key change from prevous direction. > > We are isolating the system and project level APIs. system token will be > > able > > to perform only system level operation and not allowed to do project level > > operation. For example: system user will not be allowed to create the > > server > > in nova. To have a quick view on those (we have not finished yet in nova), > > you > > can check how it will look like in the below series: > > > > - > > https://review.opendev.org/q/topic:%22bp%252Fpolicy-defaults-refresh-2%22+(status:open%20OR%20status:merged) > > > > You can see the test cases for all four possible configuration combination > > and what all > > roles are allowed in which configuration (case 4th is end goal we want to > > be for RBAC): > > > > 1. enforce_scope=False + legacy rule (current default policies) > > 2. enforce_scope=False + No legacy rule (enable scope but remove old > > policy default) > > 3. enforce_scope=True + legacy rule (enable scope with old policy default) > > 4. enforce_scope=True + no legacy rule (end goal of new RBAC) > > > > > > > > > > > > > For token validations system-reader(or service role) would be enough > > but there are > > > > some system-admin-only APIs (os-server-external-events API in nova > > called by neutron, > > > > Create allocation in placement called by nova or neutron) used for > > communications > > > > between services. > > > > > > The token validation API has the following default policy: > > > > > > identity:validate_token: (role:reader and system_scope:all) or > > > rule:service_role or rule:token_subject > > > > > > So system-reader, system-admin or service (any scope) should work. The > > > spec suggests that the service role is intended for use by service to > > > service APIs, in this case the credentials provided in the > > > keystone_authtoken config. I would guess that system scope makes most > > > sense here with the service role, although the rule suggests it would > > > work with project scope and the service role. > > > > > > I noticed I ignored implied roles... Thanks for clarifying that. > > > I understand and I agree with this. Considering the intention of SRBAC > > this would fixbetter with system-scoped, as you earlier mentioned but I'll > > defer to the others. > > > > > Another thigns to note here is, in Yoga cycle we are doing only > > system-admin. system-reader, > > system-member will be done in phase3 which is for future releases (BB). > > > > > > If we agree system-admin + system-reader is the right set then I'll > > update the default role > > > > assignment accordingly. This is important for Puppet OpenStack > > because there are implementations > > > > in puppet (which is usually called as providers) to manage some > > resources like Flavors, > > > > and these rely on credentials of service users after trying to look > > up user credentials. > > > > > > I think one of the outcomes of this work is that authentication will > > > necessarily become a bit more fine-grained. It might not make sense to > > > have the same role assignments for all users. To your example, I would > > > say that registering flavors should be done by a different user with > > > different permissions than a service user. In kolla-ansible we don't > > > really register flavors other than for octavia - this is up to > > > operators. 
> > > My main concern was that some service users would require system-admin > > butI should have read this part more carefully. > > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#phase-2 > > > So Assigning the service role (for the proper scope which is asked in > > the original thread)is the right way to go. For the provider stuff I'll > > look into any available option to replace usage of serviceuser credential > > but that's specific to Puppet which we can ignore here in this discussion. > > > > right, once we have service role implemented then we will have clear way > > on how services will be > > communicating to other services APIs. > > > > -gmann > > > > > > > > > > > > > Takashi > > > > > > > > On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard > > wrote: > > > >> > > > >> Hi, > > > >> > > > >> If you haven't been paying close attention, it would be easy to miss > > > >> some of the upcoming RBAC changes which will have an impact on > > > >> deployment projects. I thought I'd start a thread so that we can > > share > > > >> how we are approaching this, get answers to open questions, and > > > >> ideally all end up with a fairly consistent approach. > > > >> > > > >> The secure RBAC work has a long history, and continues to evolve. > > > >> According to [1], we should start to see some fairly substantial > > > >> changes over the next few releases. That spec is fairly long, but > > > >> worth a read. > > > >> > > > >> In the yoga timeline [2], there is one change in particular that has > > > >> an impact on deployment projects, "3. Keystone enforces scope by > > > >> default". After this change, all of the deprecated policies that many > > > >> still rely on in Keystone will be removed. > > > >> > > > >> In kolla-ansible, we have an etherpad [5] with some notes, questions > > > >> and half-baked plans. We made some changes in Xena [3] to use system > > > >> scope in some places when interacting with system APIs in Ansible > > > >> tasks. > > > >> > > > >> The next change we have staged is to add the service role to all > > > >> service users [4], in preparation for [2]. > > > >> > > > >> Question: should the role be added with system scope or in the > > > >> existing service project? The obvious main use for this is token > > > >> validation, which seems to allow system or project scope. > > > >> > > > >> We anticipate that some service users may still require some > > > >> project-scoped roles, e.g. when creating resources for octavia. We'll > > > >> deal with those on a case by case basis. > > > >> > > > >> In anticipation of keystone setting enforce_scope=True and removing > > > >> old default policies (which I assume effectively removes > > > >> enforce_new_defaults?), we will set this in kolla-ansible, and try to > > > >> deal with any fallout. Hopefully the previous work will make this > > > >> minimal. > > > >> > > > >> How does that line up with other projects' approaches? What have we > > missed? 
> > > >> > > > >> Mark > > > >> > > > >> [1] > > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > >> [2] > > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > >> [3] > > https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > >> [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > >> [5] > > https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > >> > > > > > > > > > > From ih at imranh.co.uk Thu Jan 20 13:25:32 2022 From: ih at imranh.co.uk (Imran Hussain) Date: Thu, 20 Jan 2022 13:25:32 +0000 Subject: [nova][libvirt] Secure boot with SMM support Message-ID: <557e84483c8c1fa64cbd5c2928862bd4@imranh.co.uk> Hi, I posted yesterday about my issue http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026796.html but having looked in to it I think it's a Nova problem and should be solved by Nova Compute. I've pushed up a quick patch I wrote https://review.opendev.org/c/openstack/nova/+/825496 that's running on my systems. Thoughts? Thanks, Imran From massimo.sgaravatto at gmail.com Thu Jan 20 13:28:34 2022 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Thu, 20 Jan 2022 14:28:34 +0100 Subject: Tesla V100 32G GPU with openstack In-Reply-To: References: Message-ID: I am using libvirt 7.0 on centos8 stream, Openstack Train nvidia drivers are installed only on the VMs (not on the compute node) I am not using any numa setting in the flavor But do you have the problem only when instantiating the second VM (while everything is ok with the first one using 1 GPU ) ? Cheers, Massimo PS: When I configured the GPUs on openstack using pci passthrough, I referred to these guides: https://docs.openstack.org/nova/pike/admin/pci-passthrough.html https://gist.github.com/claudiok/890ab6dfe76fa45b30081e58038a9215 On Thu, Jan 20, 2022 at 1:55 PM Satish Patel wrote: > Thank you! > > That is what I?m also trying to do to give each gpu card to each vm. I do > have exact same setting in my nova.conf. What version of libvirt are you > running? > > Did you install any special nvidia driver etc on your compute node for > passthrough (I doubt because it straightforward). > > Do you have any NUMA setting in your flavor or compute? > > Sent from my iPhone > > On Jan 20, 2022, at 2:52 AM, Massimo Sgaravatto < > massimo.sgaravatto at gmail.com> wrote: > > ? > Hi Satish > > I am not able to understand what is wrong with your environment, but I can > describe my setting. > > I have a compute node with 4 Tesla V100S. 
> They have the same vendor-id (10de) and the same product id (13d6) [*] > In nova.conf I defined this stuff in the [pci] section: > > [pci] > passthrough_whitelist = {"vendor_id":"10de"} > > alias={"name":"V100","product_id":"1df6","vendor_id":"10de","device_type":"type-PCI"} > > > I then created a flavor with this property: > > pci_passthrough:alias='V100:1' > > Using this flavor I can instantiate 4 VMs: each one can see a single V100 > > Hope this helps > > Cheers, Massimo > > > [*] > # lspci -nnk -d 10de: > 60:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe > 32GB] [10de:1df6] (rev a1) > Subsystem: NVIDIA Corporation Device [10de:13d6] > Kernel driver in use: vfio-pci > Kernel modules: nouveau > 61:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe > 32GB] [10de:1df6] (rev a1) > Subsystem: NVIDIA Corporation Device [10de:13d6] > Kernel driver in use: vfio-pci > Kernel modules: nouveau > da:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe > 32GB] [10de:1df6] (rev a1) > Subsystem: NVIDIA Corporation Device [10de:13d6] > Kernel driver in use: vfio-pci > Kernel modules: nouveau > db:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe > 32GB] [10de:1df6] (rev a1) > Subsystem: NVIDIA Corporation Device [10de:13d6] > Kernel driver in use: vfio-pci > Kernel modules: nouveau > [root at cld-np-gpu-01 ~]# > > > On Wed, Jan 19, 2022 at 10:28 PM Satish Patel > wrote: > >> Hi Massimo, >> >> Ignore my last email, my requirement is to have a single VM with a >> single GPU ("tesla-v100:1") but I would like to create a second VM on >> the same compute node which uses the second GPU but I am getting the >> following error when I create a second VM and vm error out. looks like >> it's not allowing me to create a second vm and bind to a second GPU >> card. >> >> error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error: >> Hostdev already exists in the domain configuration >> >> On Wed, Jan 19, 2022 at 3:10 PM Satish Patel >> wrote: >> > >> > should i need to create a flavor to target both GPU. is it possible to >> > have single flavor cover both GPU because end users don't understand >> > which flavor to use. >> > >> > On Wed, Jan 19, 2022 at 1:54 AM Massimo Sgaravatto >> > wrote: >> > > >> > > If I am not wrong those are 2 GPUs >> > > >> > > "tesla-v100:1" means 1 GPU >> > > >> > > So e.g. a flavor with "pci_passthrough:alias": "tesla-v100:2"} will >> be used to create an instance with 2 GPUs >> > > >> > > Cheers, Massimo >> > > >> > > On Tue, Jan 18, 2022 at 11:35 PM Satish Patel >> wrote: >> > >> >> > >> Thank you for the information. I have a quick question. >> > >> >> > >> [root at gpu01 ~]# lspci | grep -i nv >> > >> 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe >> > >> 32GB] (rev a1) >> > >> d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe >> > >> 32GB] (rev a1) >> > >> >> > >> In the above output showing two cards does that mean they are >> physical >> > >> two or just BUS representation. >> > >> >> > >> Also i have the following entry in openstack flavor, does :1 means >> > >> first GPU card? 
>> > >> >> > >> {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"} >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> On Tue, Jan 18, 2022 at 5:55 AM Ant?nio Paulo >> wrote: >> > >> > >> > >> > Hey Satish, Gustavo, >> > >> > >> > >> > Just to clarify a bit on point 3, you will have to buy a vGPU >> license >> > >> > per card and this gives you access to all the downloads you need >> through >> > >> > NVIDIA's web dashboard -- both the host and guest drivers as well >> as the >> > >> > license server setup files. >> > >> > >> > >> > Cheers, >> > >> > Ant?nio >> > >> > >> > >> > On 18/01/22 02:46, Satish Patel wrote: >> > >> > > Thank you so much! This is what I was looking for. It is very >> odd that >> > >> > > we buy a pricey card but then we have to buy a license to make >> those >> > >> > > features available. >> > >> > > >> > >> > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos >> > >> > > wrote: >> > >> > >> >> > >> > >> Hello, Satish. >> > >> > >> >> > >> > >> I've been working with vGPU lately and I believe I can answer >> your >> > >> > >> questions: >> > >> > >> >> > >> > >> 1. As you pointed out in question #2, the pci-passthrough will >> allocate >> > >> > >> the entire physical GPU to one single guest VM, while vGPU >> allows you to >> > >> > >> spawn from 1 to several VMs using the same physical GPU, >> depending on >> > >> > >> the vGPU type you choose (check NVIDIA docs to see which vGPU >> types the >> > >> > >> Tesla V100 supports and their properties); >> > >> > >> 2. Correct; >> > >> > >> 3. To use vGPU, you need vGPU drivers installed on the platform >> where >> > >> > >> your deployment of OpenStack is running AND in the VMs, so >> there are two >> > >> > >> drivers to be installed in order to use the feature. I believe >> both of >> > >> > >> them have to be purchased from NVIDIA in order to be used, and >> you would >> > >> > >> also have to deploy an NVIDIA licensing server in order to >> validate the >> > >> > >> licenses of the drivers running in the VMs. >> > >> > >> 4. You can see what the instructions are for each of these >> scenarios in >> > >> > >> [1] and [2]. >> > >> > >> >> > >> > >> There is also extensive documentation on vGPU at NVIDIA's >> website [3]. >> > >> > >> >> > >> > >> [1] >> https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html >> > >> > >> [2] >> https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html >> > >> > >> [3] https://docs.nvidia.com/grid/13.0/index.html >> > >> > >> >> > >> > >> Regards, >> > >> > >> Gustavo. >> > >> > >> >> > >> > >> On 17/01/2022 14:41, Satish Patel wrote: >> > >> > >>> [Please note: This e-mail is from an EXTERNAL e-mail address] >> > >> > >>> >> > >> > >>> Folk, >> > >> > >>> >> > >> > >>> We have Tesla V100 32G GPU and I?m trying to configure with >> openstack wallaby. This is first time dealing with GPU so I have couple of >> question. >> > >> > >>> >> > >> > >>> 1. What is the difference between passthrough vs vGPU? I did >> google but not very clear yet. >> > >> > >>> 2. If I configure it passthrough then does it only work with >> single VM ? ( I meant whole GPU will get allocate to single VM correct? >> > >> > >>> 3. Also some document saying Tesla v100 support vGPU but some >> folks saying you need license. I have no idea where to get that license. >> What is the deal here? >> > >> > >>> 3. What are the config difference between configure this card >> with passthrough vs vGPU? 
>> > >> > >>> >> > >> > >>> >> > >> > >>> Currently I configure it with passthrough based one one >> article and I am able to spun up with and I can see nvidia card exposed to >> vm. (I used iommu and vfio based driver) so if this card support vGPU then >> do I need iommu and vfio or some other driver to make it virtualize ? >> > >> > >>> >> > >> > >>> Sent from my iPhone >> > >> > >>> >> > >> > > >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Jan 20 13:37:51 2022 From: smooney at redhat.com (Sean Mooney) Date: Thu, 20 Jan 2022 13:37:51 +0000 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: <17e73279ebc.da3cd092886796.8771224917629901204@ghanshyammann.com> Message-ID: <9dc233e4d22afff6c6890acdd6bba1850c8f28f0.camel@redhat.com> On Thu, 2022-01-20 at 13:23 +0000, Sean Mooney wrote: > On Thu, 2022-01-20 at 09:05 +0900, Takashi Kajinami wrote: > > Thank you, Ghanshyam, for your inputs. > > These are helpful to understand the latest plan. > > > > So I think our question comes back to the original one. > > Currently keystone allows any of > > 1. system-service > > 2. domain-service > > 3. project-service > > 4. system-admin > > 5. system-member > > 6. system-reader > > to validate token but which one is the appropriate one to be used by > > authtoken middleware ? > > > > Considering the purpose of the service role, the service role is > > appropriate but it's not yet > > clear which scope should be used (as is pointed out by Mark from the > > beginning). > > > > AFAIK token is not a resource belonging to projects so system scope looks > > appropriate > > but what is the main intention is to allow project/domain scope ? > a token is really a resouce belownign to a user and that user can be a member of a project or doamin. > do we need to enfoce any scope on this endpoint? > > the token that the midelware is vlaidating shoudl be suffenct to do the validation since that user > shoudl be able to retirve the list of roles and project/domain membership its is part of so i guess > im confusted why we woudl not just pass the token to be ckeck as the token the middelware uses and not enforece > any scope or role reqiruement on the token validation endpoint > > perhaps im missunderstanding and the authtoken middleware is not the middleware that validate the toke is valid and populates > the project_id and domain_id /roles in the oslo context object? by the way im asserting that a GET or HEAD query to /v3/auth/tokens should not require any scope or role to complete https://docs.openstack.org/api-ref/identity/v3/?expanded=check-token-detail,validate-and-show-information-for-token-detail#check-token if the token that is being vlaidated is valid then the request can use the permission on that token to authrise the return of the info if the token is not valid hten it woudl return a 403 openstack uses bearer tokens so the fact that you posess it entirles you to use teh permission allowed by that token. even a *reader token with no other roles shoudl be able to use the current token to validate itself. so really i dont think this api shoudl require a second token to vouch for it. i can see some pushing back saying this would weaken the current security model which is valid in which case i woudl proably go with allowing a system-reader token with service roles or something similar. since tokens are nto really proejct or domaing owned but user owned. 
> > > > By the way, in Puppet OpenStack, we have been using the service"s" project > > instead of > > the service project for some reason(which I'm not aware of). > > So it's helpful for us if we avoid implementing strict limitations to use > > the service project. > > > > > > On Thu, Jan 20, 2022 at 1:29 AM Ghanshyam Mann > > wrote: > > > > > ---- On Wed, 19 Jan 2022 08:01:00 -0600 Takashi Kajinami < > > > tkajinam at redhat.com> wrote ---- > > > > > > > > On Wed, Jan 19, 2022 at 9:22 PM Mark Goddard wrote: > > > > On Wed, 19 Jan 2022 at 11:15, Takashi Kajinami > > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > > > > > > (The topic doesn't include puppet but ...) > > > > > I recently spent some time implementing initial support for SRBAC > > > > > in Puppet OpenStack. You can find details in the etherpad[1] I created > > > > > as my working note. It includes some items commonly required by all > > > toolings > > > > > in addition to ones specific to puppet. > > > > > [1] https://etherpad.opendev.org/p/puppet-secure-rbac > > > > > > > > Thanks for responding, Takashi - that's useful. > > > > > > > > > > > > > > I expect some of them (especially the configuration parameters) would > > > be used > > > > > by TripleO later. > > > > > > > > > > > Question: should the role be added with system scope or in the > > > > > > existing service project? The obvious main use for this is token > > > > > > validation, which seems to allow system or project scope. > > > > > > > > > > I'd add one more question which is; > > > > > Which roles should be assigned for the service users ? > > > > > > > > > > In the project which already implemented SRBAC, system-admin + > > > system-reader > > > > > allows any API calls and works like the previous project-admin. > > > > > > > > IIUC the direction of travel has changed, and now the intention is > > > > that system-admin won't have access to project-scoped APIs. > > > > > > Yes, as mark mentioned. And that is the key change from prevous direction. > > > We are isolating the system and project level APIs. system token will be > > > able > > > to perform only system level operation and not allowed to do project level > > > operation. For example: system user will not be allowed to create the > > > server > > > in nova. To have a quick view on those (we have not finished yet in nova), > > > you > > > can check how it will look like in the below series: > > > > > > - > > > https://review.opendev.org/q/topic:%22bp%252Fpolicy-defaults-refresh-2%22+(status:open%20OR%20status:merged) > > > > > > You can see the test cases for all four possible configuration combination > > > and what all > > > roles are allowed in which configuration (case 4th is end goal we want to > > > be for RBAC): > > > > > > 1. enforce_scope=False + legacy rule (current default policies) > > > 2. enforce_scope=False + No legacy rule (enable scope but remove old > > > policy default) > > > 3. enforce_scope=True + legacy rule (enable scope with old policy default) > > > 4. enforce_scope=True + no legacy rule (end goal of new RBAC) > > > > > > > > > > > > > > > > > For token validations system-reader(or service role) would be enough > > > but there are > > > > > some system-admin-only APIs (os-server-external-events API in nova > > > called by neutron, > > > > > Create allocation in placement called by nova or neutron) used for > > > communications > > > > > between services. 
> > > > > > > > The token validation API has the following default policy: > > > > > > > > identity:validate_token: (role:reader and system_scope:all) or > > > > rule:service_role or rule:token_subject > > > > > > > > So system-reader, system-admin or service (any scope) should work. The > > > > spec suggests that the service role is intended for use by service to > > > > service APIs, in this case the credentials provided in the > > > > keystone_authtoken config. I would guess that system scope makes most > > > > sense here with the service role, although the rule suggests it would > > > > work with project scope and the service role. > > > > > > > > I noticed I ignored implied roles... Thanks for clarifying that. > > > > I understand and I agree with this. Considering the intention of SRBAC > > > this would fixbetter with system-scoped, as you earlier mentioned but I'll > > > defer to the others. > > > > > > > Another thigns to note here is, in Yoga cycle we are doing only > > > system-admin. system-reader, > > > system-member will be done in phase3 which is for future releases (BB). > > > > > > > > If we agree system-admin + system-reader is the right set then I'll > > > update the default role > > > > > assignment accordingly. This is important for Puppet OpenStack > > > because there are implementations > > > > > in puppet (which is usually called as providers) to manage some > > > resources like Flavors, > > > > > and these rely on credentials of service users after trying to look > > > up user credentials. > > > > > > > > I think one of the outcomes of this work is that authentication will > > > > necessarily become a bit more fine-grained. It might not make sense to > > > > have the same role assignments for all users. To your example, I would > > > > say that registering flavors should be done by a different user with > > > > different permissions than a service user. In kolla-ansible we don't > > > > really register flavors other than for octavia - this is up to > > > > operators. > > > > My main concern was that some service users would require system-admin > > > butI should have read this part more carefully. > > > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#phase-2 > > > > So Assigning the service role (for the proper scope which is asked in > > > the original thread)is the right way to go. For the provider stuff I'll > > > look into any available option to replace usage of serviceuser credential > > > but that's specific to Puppet which we can ignore here in this discussion. > > > > > > right, once we have service role implemented then we will have clear way > > > on how services will be > > > communicating to other services APIs. > > > > > > -gmann > > > > > > > > > > > > > > > > > Takashi > > > > > > > > > > On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard > > > wrote: > > > > >> > > > > >> Hi, > > > > >> > > > > >> If you haven't been paying close attention, it would be easy to miss > > > > >> some of the upcoming RBAC changes which will have an impact on > > > > >> deployment projects. I thought I'd start a thread so that we can > > > share > > > > >> how we are approaching this, get answers to open questions, and > > > > >> ideally all end up with a fairly consistent approach. > > > > >> > > > > >> The secure RBAC work has a long history, and continues to evolve. > > > > >> According to [1], we should start to see some fairly substantial > > > > >> changes over the next few releases. 
That spec is fairly long, but > > > > >> worth a read. > > > > >> > > > > >> In the yoga timeline [2], there is one change in particular that has > > > > >> an impact on deployment projects, "3. Keystone enforces scope by > > > > >> default". After this change, all of the deprecated policies that many > > > > >> still rely on in Keystone will be removed. > > > > >> > > > > >> In kolla-ansible, we have an etherpad [5] with some notes, questions > > > > >> and half-baked plans. We made some changes in Xena [3] to use system > > > > >> scope in some places when interacting with system APIs in Ansible > > > > >> tasks. > > > > >> > > > > >> The next change we have staged is to add the service role to all > > > > >> service users [4], in preparation for [2]. > > > > >> > > > > >> Question: should the role be added with system scope or in the > > > > >> existing service project? The obvious main use for this is token > > > > >> validation, which seems to allow system or project scope. > > > > >> > > > > >> We anticipate that some service users may still require some > > > > >> project-scoped roles, e.g. when creating resources for octavia. We'll > > > > >> deal with those on a case by case basis. > > > > >> > > > > >> In anticipation of keystone setting enforce_scope=True and removing > > > > >> old default policies (which I assume effectively removes > > > > >> enforce_new_defaults?), we will set this in kolla-ansible, and try to > > > > >> deal with any fallout. Hopefully the previous work will make this > > > > >> minimal. > > > > >> > > > > >> How does that line up with other projects' approaches? What have we > > > missed? > > > > >> > > > > >> Mark > > > > >> > > > > >> [1] > > > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > > >> [2] > > > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > > >> [3] > > > https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > > >> [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > > >> [5] > > > https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > >> > > > > > > > > > > > > > > > From rosmaita.fossdev at gmail.com Thu Jan 20 14:03:22 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 20 Jan 2022 09:03:22 -0500 Subject: [cinder] review priorities for the next 2 days Message-ID: <311021fc-d11e-f715-d4bb-d18e857494c1@gmail.com> Not surprisingly, the review priority for the next 2 days is to review new driver patches. The merge deadline is tomorrow (21 January) at 20:00 UTC. To be more specific: We have 2 new drivers that depend on the following patches that generalize the hitachi driver: - https://review.opendev.org/c/openstack/cinder/+/786873 - https://review.opendev.org/c/openstack/cinder/+/815461 Then there are the 2 drivers that depend on the above; they are small patches and each of their third-party CIs is running: - https://review.opendev.org/c/openstack/cinder/+/815582 - https://review.opendev.org/c/openstack/cinder/+/815614 So those should be fairly quick reviews. The teams have been quick to respond to comments. There is also the Lightbits driver; it has both cinder and os-brick patches: - https://review.opendev.org/c/openstack/cinder/+/821602 - https://review.opendev.org/c/openstack/os-brick/+/821603 Their third-party CI is running and they have been very responsive to reviews. 
So really, only 3 new drivers to review! Other drivers: - TOYOU NetStor - merged! \o/ - Pure NVMe-RoCE - vendor is now aiming to merge in Z - Dell EMC PowerStore NFS - vendor is now aiming to merge in Z - YADRO Tatlin - initial commit posted today; has not yet passed Zuul. (Look at it if you have spare time, but please prioritize the patches listed above.) From fungi at yuggoth.org Thu Jan 20 14:09:09 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 20 Jan 2022 14:09:09 +0000 Subject: [Infra]CentOS Stream 8 based jobs are hitting RETRY or RETRY_LIMIT In-Reply-To: References: Message-ID: <20220120140908.ry5tqr7txwrrctcu@yuggoth.org> On 2022-01-20 15:19:45 +0530 (+0530), Chandan Kumar wrote: > Currently CentOS Stream 8 based jobs are hitting RETRY or RETRY_LIMIT. > > https://zuul.opendev.org/t/openstack/builds?result=RETRY&result=RETRY_LIMIT&skip=0 > Based on the logs, all are hitting with : > ``` > rrors during downloading metadata for repository 'appstream': > 2022-01-20 09:34:18.119377 | primary | - Status code: 404 for > https://mirror-int.iad.rax.opendev.org/centos/8-stream/AppStream/x86_64/os/repodata/09255ba7c10e01afeb0d343667190f9c3e42d0a6099f887619abcb92ea0378db-filelists.xml.gz > (IP: 10.208.224.52) > 2022-01-20 09:34:18.119443 | primary | - Status code: 404 for > https://mirror-int.iad.rax.opendev.org/centos/8-stream/AppStream/x86_64/os/repodata/678056e5b64153ca221196673208730234dd72f03397a3ab2d30fea01392bd87-primary.xml.gz > (IP: 10.208.224.52) > > ``` > While taking a look at > https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/centos-mirror-update#L42 > > ``` > 404 for http://mirror.dal10.us.leaseweb.net/centos/8-stream/AppStream/x86_64/os/re > podata/09255ba7c10e01afeb0d343667190f9c3e42d0a6099f887619abcb92ea0378db-filelists.xml.gz > (IP: 209.58.153.1) > ``` > It seems the centos mirror have issues. > https://review.opendev.org/c/opendev/system-config/+/825446 make the > switch to facebook mirror. > > It might fix the issue. [...] The content of /centos/8-stream/AppStream/x86_64/os/repodata/ on mirror.facebook.net is identical to what we're serving already. I checked some other mirrors, e.g. linuxsoft.cern.ch, and see the same. The repomd.xml indices on them all match too. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Thu Jan 20 14:48:28 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 20 Jan 2022 14:48:28 +0000 Subject: [Infra]CentOS Stream 8 based jobs are hitting RETRY or RETRY_LIMIT In-Reply-To: <20220120140908.ry5tqr7txwrrctcu@yuggoth.org> References: <20220120140908.ry5tqr7txwrrctcu@yuggoth.org> Message-ID: <20220120144828.iwjpa4vjevg7e7pa@yuggoth.org> On 2022-01-20 14:09:09 +0000 (+0000), Jeremy Stanley wrote: [...] > The content of /centos/8-stream/AppStream/x86_64/os/repodata/ on > mirror.facebook.net is identical to what we're serving already. I > checked some other mirrors, e.g. linuxsoft.cern.ch, and see the > same. The repomd.xml indices on them all match too. Further investigation of our mirror update logs indicates there was some (likely global) upheaval for CentOS Stream 8 package indices, which we then mirrored on what was probably a several hour delay as we're multiple mirror "hops" from their primary. 
The mirror at LeaseWeb, which we pull from, had an index update around 06:00 UTC which seems to roughly coincide with when the problems began, and then we saw those indices switch back around 12:00 UTC to what they had been previously. The timeframe where the suspected problem indices were being served from our mirrors was approximately 06:55-12:57 UTC. We also saw a failure to upload updated centos-8-stream images to a significant proportion of our providers shortly prior to this, so out of an abundance of caution I've issued a delete for that image (falling back to the one built yesterday), and our builders are presently refreshing it from what is hopefully now a sane mirror of the packages and indices. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From cel975 at yahoo.com Thu Jan 20 14:58:07 2022 From: cel975 at yahoo.com (Celinio Fernandes) Date: Thu, 20 Jan 2022 14:58:07 +0000 (UTC) Subject: Cannot ssh/ping instance In-Reply-To: <21337927.EfDdHjke4D@p1> References: <869855278.629940.1641716238605.ref@mail.yahoo.com> <1103131717.1158277.1642331300997@mail.yahoo.com> <1237801494.398789.1642631375283@mail.yahoo.com> <21337927.EfDdHjke4D@p1> Message-ID: <564023378.615168.1642690687367@mail.yahoo.com> Thanks. I tried this on the host : sudo ifconfig br-ex 172.24.4.1? netmask 255.255.255.0 up sudo iptables -t nat -A POSTROUTING -s 172.24.4.254/24? -o wlo1 -j MASQUERADE I then connect to the VM through ssh and still no internet: sudo apt-get update returns : W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/focal/InRelease? Temporary failure resolving 'archive.ubuntu.com' Any other suggestion please ? On Thursday, January 20, 2022, 11:46:12 AM GMT+1, Slawek Kaplonski wrote: Hi, In devstack, we are configureing MASQUARADE for the FLOATINT_RANGE to be able to go outside the devstack node. See https://github.com/openstack/devstack/ blob/24b65adc9cedff9c7a8ab412fb39613ef5d4a627/lib/neutron-legacy#L704 for the details. Maybe You need to configure something like that on Your setup? On ?roda, 19 stycznia 2022 23:29:35 CET Celinio Fernandes wrote: >? Hi,still trying to reach the external network from inside the VM.I have not > set up any DNS server on any of the interfaces (shared and public).Do i need > to add one ? > > > > >? ? On Sunday, January 16, 2022, 02:12:00 PM GMT+1, Celinio Fernandes > wrote: > >? Hi, > I can ssh into the instance now but I noticed the VM does not have any > external network access (internet). Before I dig any deeper into that > problem, does anyone know what configuration i need to set up for that ? I > already added 2 new security rules to make sure I can access HTTP and HTTPS > ports (80 and 443), in vain : Ingress? IPv4? TCP? 80 (HTTP)? 0.0.0.0/0 > Ingress? IPv4? TCP? 443 (HTTPS)? 0.0.0.0/0 > > > Thanks. > >? ? On Saturday, January 15, 2022, 12:29:40 AM GMT+1, Celinio Fernandes > wrote: > >? Thanks very much for your help. > Before you replied, I tried what you wrote but on the wrong interfaces : > enp0s3 and virbr0. > I had no idea I needed to add the IP address from the public network's subnet > on the br-ex interface. 
So to ping/ssh the floating IP this is what I did : > ip link set dev br-ex up > ip link set dev br-ex state up > sudo ip addr add 172.24.4.254/24 dev br-ex > And then I can finally ping the floating IP : > ping 172.24.4.133 > And I can also ssh into the VM : > ssh cirros at 172.24.4.133 > > Thanks again :) > > > >? ? On Sunday, January 9, 2022, 08:21:18 PM GMT+1, Slawek Kaplonski > wrote: > >? Hi, > > On niedziela, 9 stycznia 2022 09:17:18 CET Celinio Fernandes wrote: > > Hi, > > I am running Ubuntu Server 20.04 LTS on Virtualbox. I installed OpenStack > > (Xena release) through Devstack. Here is the content of my > > /opt/stack/devstack/local.conf file : > > [[local|localrc]] > > ADMIN_PASSWORD=secret > > DATABASE_PASSWORD=$ADMIN_PASSWORD > > RABBIT_PASSWORD=$ADMIN_PASSWORD > > SERVICE_PASSWORD=$ADMIN_PASSWORD > > HOST_IP=10.0.2.15 > > > > > > I created an instance through Horizon. The security group contains the > > 2 rules needed (one to be able to ping and one to be able to ssh the > > instance). I also allocated and associated a floating IP address. And a ssh > > key pair. > > > > Here is the configuration : > > openstack server list > > ---------------------------------+--------------------------+---------+ > > > > | ID? | Name | Status | Networks | Image? | Flavor? | > > > > ---------------------------------+--------------------------+---------+ > > > > | f5f0fdd5-298b-4fa3-9ee9-e6e4288f4327 | InstanceJanvier | ACTIVE | > > | shared=172.24.4.133, 192.168.233.165 | cirros-0.5.2-x86_64-disk | m1.nano > > > > ------------------------------------------------------+ > > > > > > openstack network list : > > ------------------------------------------------------+ > > > > | ID? ? | Name? ? | Subnets? ? ? ? ? ? | > > > > ------------------------------------------------------+ > > > > | 96a04799-7fc7-4525-b05c-ad57261aed38 | public? | > > | 07ce42db-6f3f-4135-ace7-2fc104ea62a0, > > | 6dba13fc-b10c-48b1-b1b4-e1f1afe25b53 > > | > > | | c42638dc-fa56-4644-ad34-295fce4811d2 | shared? | > > | > > | a4e2d8cc-02b2-42e2-a525-e0eebbb08980? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > > | > > | | ffb8a527-266e-4e96-ad60-f7e9aba8f0c1 | private | > > | > > | 42e36677-cf3c-4df4-88a1-8cf79b9d6060, > > | e507e6dd-132a-4249-96b1-83761562dd73 > > > > ------------------------------------------------------+ > > > > openstack router list : > > +--------------------------------------+----------------+--------+------ > > > > | ID? ? | Name? | Status | State | Project? ? ? ? ? ? ? ? ? ? ? ? ? | > > > > +--------------------------------------+----------------+--------+------ > > > > | b9a15051-a532-4c93-95ad-53c057720c62 | Virtual_router | ACTIVE | UP? ? | > > | 6556c02dd88f4c45b535c2dbb8ba1a04 | > > > > +--------------------------------------+----------------+--------+------ > > > > > > I cannot ping/ssh neither the fixed IP address or the floating IP address : > > ping -c 3 172.24.4.133 > > PING 172.24.4.133 (172.24.4.133) 56(84) bytes of data. > > --- 172.24.4.133 ping statistics --- > > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > > > ping -c 3 192.168.233.165 > > PING 192.168.233.165 (192.168.233.165) 56(84) bytes of data. > > --- 192.168.233.165 ping statistics --- > > 3 packets transmitted, 0 received, 100% packet loss, time 2035ms > > > > Maybe that has something to do with the network namespaces configuration on > > Ubuntu. Does anyone know what could go wrong or what is missing ? > > Thanks for helping. 
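Pulling the host-side pieces of this thread together, the setup being described amounts to roughly the sketch below. The interface names and the 172.24.4.0/24 floating range are the ones used earlier in the thread, the nameserver and subnet name are placeholders, and IP forwarding is assumed to be required for the MASQUERADE rule to be useful.

    # on the devstack host: put an address from the floating range on br-ex
    # and NAT that range out through the host uplink (wlo1 in this thread)
    sudo ip link set br-ex up
    sudo ip addr add 172.24.4.1/24 dev br-ex
    sudo sysctl -w net.ipv4.ip_forward=1
    sudo iptables -t nat -A POSTROUTING -s 172.24.4.0/24 -o wlo1 -j MASQUERADE

    # name resolution inside the guest still needs a DNS server on the tenant
    # subnet (subnet name is a placeholder - use the subnet of the "shared" network)
    openstack subnet set --dns-nameserver 8.8.8.8 shared-subnet

With routing in place, a remaining "Temporary failure resolving" error usually points at the missing DNS server rather than at the NAT setup.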
> > If You are trying to ping Floating IP directly from the host where devstack > is installed (Virtualbox VM in Your case IIUC) then You should first have > those floating IP addresses somehow reachable on the host, otherwise traffic > is probably going through default gateway so is going outside the VM. > If You are using ML2/OVN (default in Devstack) or ML2/OVS You probably have > in the openvswitch bridge called br-ex which is used to send external > network traffic from the OpenStack networks in Devstack. In such case You > can e.g. add some IP address from the public network's subnet on the br-ex > interface, like 192.168.233.254/24 - that will tell Your OS to reach that > subnet through br- ex, so traffic will be able to go "into" the OVS managed > by Neutron. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Thu Jan 20 15:17:25 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 20 Jan 2022 16:17:25 +0100 Subject: [nova] [placement] Proposing Sean Mooney as nova-core In-Reply-To: References: Message-ID: Le mer. 12 janv. 2022 ? 17:19, Sylvain Bauza a ?crit : > Hi all, > I would like to propose Sean as an addition to the nova-core team (which > includes placement merge rights as nova-core is implicitly a subgroup). > > As we know, he's around for a long time, is already a nova-specs-core and > has proven solid experience in reviews. > > Cores, please vote (-1, 0, +1) before next Wednesday Jan 19th 1600UTC. > > FWIW, saw no negative votes, so added Sean to nova-core group. Welcome Sean on board and thanks for helping the community ! -S > Cheers, > -Sylvain > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Thu Jan 20 15:32:03 2022 From: satish.txt at gmail.com (Satish Patel) Date: Thu, 20 Jan 2022 10:32:03 -0500 Subject: Tesla V100 32G GPU with openstack In-Reply-To: References: Message-ID: Hi Massimo, My problem got resolved :( it was very stupid problem. I have glusterfs mounted on /var/lib/nova and somehow after reboot node that mount point disappears and /var/lib/nova endup on a local disk which has only a 50G partition. My flavor has disk 40G so the first vm always works but second vm i get a strange error :( after fixing my mount point everything works :) Mostly when you don't have enough space i should get an error like No Valid Host but i was getting a very different error so that mislead me :( Thank you for your help though. now time to play with InfiniBand configuration (are you guys using InfiniBand in your cloud?) On Thu, Jan 20, 2022 at 8:28 AM Massimo Sgaravatto wrote: > > I am using libvirt 7.0 on centos8 stream, Openstack Train > > nvidia drivers are installed only on the VMs (not on the compute node) > I am not using any numa setting in the flavor > > But do you have the problem only when instantiating the second VM (while everything is ok with the first one using 1 GPU ) ? > > > Cheers, Massimo > > PS: When I configured the GPUs on openstack using pci passthrough, I referred to these guides: > > https://docs.openstack.org/nova/pike/admin/pci-passthrough.html > https://gist.github.com/claudiok/890ab6dfe76fa45b30081e58038a9215 > > > On Thu, Jan 20, 2022 at 1:55 PM Satish Patel wrote: >> >> Thank you! >> >> That is what I?m also trying to do to give each gpu card to each vm. I do have exact same setting in my nova.conf. What version of libvirt are you running? 
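For reference, the one-GPU-per-VM passthrough setup being compared in this exchange boils down to roughly the following sketch. The vendor/product IDs and the alias name are the ones shown later in the thread for the Tesla V100S; the flavor name and sizing are only examples, and the alias normally has to be configured on the nova-api hosts as well as on the computes.

    # nova.conf on the compute node (alias also on the nova-api hosts):
    #   [pci]
    #   passthrough_whitelist = {"vendor_id":"10de"}
    #   alias = {"name":"V100","product_id":"1df6","vendor_id":"10de","device_type":"type-PCI"}

    # flavor that requests exactly one whitelisted device per instance
    openstack flavor create --vcpus 8 --ram 32768 --disk 40 gpu.v100
    openstack flavor set --property "pci_passthrough:alias"="V100:1" gpu.v100

    # sanity check that the cards are bound to vfio-pci on the host
    lspci -nnk -d 10de: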
>> >> Did you install any special nvidia driver etc on your compute node for passthrough (I doubt because it straightforward). >> >> Do you have any NUMA setting in your flavor or compute? >> >> Sent from my iPhone >> >> On Jan 20, 2022, at 2:52 AM, Massimo Sgaravatto wrote: >> >> ? >> Hi Satish >> >> I am not able to understand what is wrong with your environment, but I can describe my setting. >> >> I have a compute node with 4 Tesla V100S. >> They have the same vendor-id (10de) and the same product id (13d6) [*] >> In nova.conf I defined this stuff in the [pci] section: >> >> [pci] >> passthrough_whitelist = {"vendor_id":"10de"} >> alias={"name":"V100","product_id":"1df6","vendor_id":"10de","device_type":"type-PCI"} >> >> >> I then created a flavor with this property: >> >> pci_passthrough:alias='V100:1' >> >> Using this flavor I can instantiate 4 VMs: each one can see a single V100 >> >> Hope this helps >> >> Cheers, Massimo >> >> >> [*] >> # lspci -nnk -d 10de: >> 60:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) >> Subsystem: NVIDIA Corporation Device [10de:13d6] >> Kernel driver in use: vfio-pci >> Kernel modules: nouveau >> 61:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) >> Subsystem: NVIDIA Corporation Device [10de:13d6] >> Kernel driver in use: vfio-pci >> Kernel modules: nouveau >> da:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) >> Subsystem: NVIDIA Corporation Device [10de:13d6] >> Kernel driver in use: vfio-pci >> Kernel modules: nouveau >> db:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1) >> Subsystem: NVIDIA Corporation Device [10de:13d6] >> Kernel driver in use: vfio-pci >> Kernel modules: nouveau >> [root at cld-np-gpu-01 ~]# >> >> >> On Wed, Jan 19, 2022 at 10:28 PM Satish Patel wrote: >>> >>> Hi Massimo, >>> >>> Ignore my last email, my requirement is to have a single VM with a >>> single GPU ("tesla-v100:1") but I would like to create a second VM on >>> the same compute node which uses the second GPU but I am getting the >>> following error when I create a second VM and vm error out. looks like >>> it's not allowing me to create a second vm and bind to a second GPU >>> card. >>> >>> error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error: >>> Hostdev already exists in the domain configuration >>> >>> On Wed, Jan 19, 2022 at 3:10 PM Satish Patel wrote: >>> > >>> > should i need to create a flavor to target both GPU. is it possible to >>> > have single flavor cover both GPU because end users don't understand >>> > which flavor to use. >>> > >>> > On Wed, Jan 19, 2022 at 1:54 AM Massimo Sgaravatto >>> > wrote: >>> > > >>> > > If I am not wrong those are 2 GPUs >>> > > >>> > > "tesla-v100:1" means 1 GPU >>> > > >>> > > So e.g. a flavor with "pci_passthrough:alias": "tesla-v100:2"} will be used to create an instance with 2 GPUs >>> > > >>> > > Cheers, Massimo >>> > > >>> > > On Tue, Jan 18, 2022 at 11:35 PM Satish Patel wrote: >>> > >> >>> > >> Thank you for the information. I have a quick question. 
>>> > >> >>> > >> [root at gpu01 ~]# lspci | grep -i nv >>> > >> 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe >>> > >> 32GB] (rev a1) >>> > >> d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe >>> > >> 32GB] (rev a1) >>> > >> >>> > >> In the above output showing two cards does that mean they are physical >>> > >> two or just BUS representation. >>> > >> >>> > >> Also i have the following entry in openstack flavor, does :1 means >>> > >> first GPU card? >>> > >> >>> > >> {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"} >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> On Tue, Jan 18, 2022 at 5:55 AM Ant?nio Paulo wrote: >>> > >> > >>> > >> > Hey Satish, Gustavo, >>> > >> > >>> > >> > Just to clarify a bit on point 3, you will have to buy a vGPU license >>> > >> > per card and this gives you access to all the downloads you need through >>> > >> > NVIDIA's web dashboard -- both the host and guest drivers as well as the >>> > >> > license server setup files. >>> > >> > >>> > >> > Cheers, >>> > >> > Ant?nio >>> > >> > >>> > >> > On 18/01/22 02:46, Satish Patel wrote: >>> > >> > > Thank you so much! This is what I was looking for. It is very odd that >>> > >> > > we buy a pricey card but then we have to buy a license to make those >>> > >> > > features available. >>> > >> > > >>> > >> > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos >>> > >> > > wrote: >>> > >> > >> >>> > >> > >> Hello, Satish. >>> > >> > >> >>> > >> > >> I've been working with vGPU lately and I believe I can answer your >>> > >> > >> questions: >>> > >> > >> >>> > >> > >> 1. As you pointed out in question #2, the pci-passthrough will allocate >>> > >> > >> the entire physical GPU to one single guest VM, while vGPU allows you to >>> > >> > >> spawn from 1 to several VMs using the same physical GPU, depending on >>> > >> > >> the vGPU type you choose (check NVIDIA docs to see which vGPU types the >>> > >> > >> Tesla V100 supports and their properties); >>> > >> > >> 2. Correct; >>> > >> > >> 3. To use vGPU, you need vGPU drivers installed on the platform where >>> > >> > >> your deployment of OpenStack is running AND in the VMs, so there are two >>> > >> > >> drivers to be installed in order to use the feature. I believe both of >>> > >> > >> them have to be purchased from NVIDIA in order to be used, and you would >>> > >> > >> also have to deploy an NVIDIA licensing server in order to validate the >>> > >> > >> licenses of the drivers running in the VMs. >>> > >> > >> 4. You can see what the instructions are for each of these scenarios in >>> > >> > >> [1] and [2]. >>> > >> > >> >>> > >> > >> There is also extensive documentation on vGPU at NVIDIA's website [3]. >>> > >> > >> >>> > >> > >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html >>> > >> > >> [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html >>> > >> > >> [3] https://docs.nvidia.com/grid/13.0/index.html >>> > >> > >> >>> > >> > >> Regards, >>> > >> > >> Gustavo. >>> > >> > >> >>> > >> > >> On 17/01/2022 14:41, Satish Patel wrote: >>> > >> > >>> [Please note: This e-mail is from an EXTERNAL e-mail address] >>> > >> > >>> >>> > >> > >>> Folk, >>> > >> > >>> >>> > >> > >>> We have Tesla V100 32G GPU and I?m trying to configure with openstack wallaby. This is first time dealing with GPU so I have couple of question. >>> > >> > >>> >>> > >> > >>> 1. What is the difference between passthrough vs vGPU? I did google but not very clear yet. >>> > >> > >>> 2. 
If I configure it passthrough then does it only work with single VM ? ( I meant whole GPU will get allocate to single VM correct? >>> > >> > >>> 3. Also some document saying Tesla v100 support vGPU but some folks saying you need license. I have no idea where to get that license. What is the deal here? >>> > >> > >>> 3. What are the config difference between configure this card with passthrough vs vGPU? >>> > >> > >>> >>> > >> > >>> >>> > >> > >>> Currently I configure it with passthrough based one one article and I am able to spun up with and I can see nvidia card exposed to vm. (I used iommu and vfio based driver) so if this card support vGPU then do I need iommu and vfio or some other driver to make it virtualize ? >>> > >> > >>> >>> > >> > >>> Sent from my iPhone >>> > >> > >>> >>> > >> > > >>> > >> From strigazi at gmail.com Thu Jan 20 15:56:10 2022 From: strigazi at gmail.com (Spyros Trigazis) Date: Thu, 20 Jan 2022 16:56:10 +0100 Subject: [magnum][tc] Proposing Jake Yip for core-reviewer Message-ID: Dear all, I would like to nominate Jake Yip for core reviewer in the magnum project. Jake has been doing code reviews for the past years in magnum at a steady rate and is running magnum as well in their organization. With very good knowledge of the magnum codebase, I am confident that Jake will help increase the pace of development in the project. Cheers, Spyros -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jan 20 16:14:45 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 20 Jan 2022 10:14:45 -0600 Subject: [magnum][tc] Proposing Jake Yip for core-reviewer In-Reply-To: References: Message-ID: <17e7844dbe1.c0976f59964466.8920758580000981793@ghanshyammann.com> ---- On Thu, 20 Jan 2022 09:56:10 -0600 Spyros Trigazis wrote ---- > Dear all, > I would like to nominate Jake Yip for core reviewer in the magnum project. > Jake has been doing code reviews for the past years in magnum at a steady rate > and is running magnum as well in their organization. With very good knowledgeof the magnum codebase, I am confident that Jake will help increase the pace > of development in the project. Thanks a lot. Surly, it will help. Really appreciate your action on this. -gmann > Cheers,Spyros > > From gmann at ghanshyammann.com Thu Jan 20 18:40:31 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 20 Jan 2022 12:40:31 -0600 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: <17e731cbfcc.e2463d54885770.5926410035050303741@ghanshyammann.com> Message-ID: <17e78ca4f43.11df902c0973397.2476553495581610085@ghanshyammann.com> ---- On Thu, 20 Jan 2022 03:35:33 -0600 Mark Goddard wrote ---- > On Wed, 19 Jan 2022 at 16:12, Ghanshyam Mann wrote: > > > > ---- On Wed, 19 Jan 2022 04:35:53 -0600 Mark Goddard wrote ---- > > > Hi, > > > > > > If you haven't been paying close attention, it would be easy to miss > > > some of the upcoming RBAC changes which will have an impact on > > > deployment projects. I thought I'd start a thread so that we can share > > > how we are approaching this, get answers to open questions, and > > > ideally all end up with a fairly consistent approach. > > > > > > The secure RBAC work has a long history, and continues to evolve. > > > According to [1], we should start to see some fairly substantial > > > changes over the next few releases. That spec is fairly long, but > > > worth a read. 
> > > > > > In the yoga timeline [2], there is one change in particular that has > > > an impact on deployment projects, "3. Keystone enforces scope by > > > default". After this change, all of the deprecated policies that many > > > still rely on in Keystone will be removed. > > > > > > In kolla-ansible, we have an etherpad [5] with some notes, questions > > > and half-baked plans. We made some changes in Xena [3] to use system > > > scope in some places when interacting with system APIs in Ansible > > > tasks. > > > > > > The next change we have staged is to add the service role to all > > > service users [4], in preparation for [2]. > > > > > > Question: should the role be added with system scope or in the > > > existing service project? The obvious main use for this is token > > > validation, which seems to allow system or project scope. > > > > > > We anticipate that some service users may still require some > > > project-scoped roles, e.g. when creating resources for octavia. We'll > > > deal with those on a case by case basis. > > > > Service roles are planned for phase2 which is Z release[1]. The Idea here is > > service to service communication will happen with 'service' role (which keystone > > need to implement yet) and end users will keep using the what ever role > > is default (or overridden in policy file) which can be project or system scoped > > depends on the APIs. > > > > So at the end service-service APIs policy default will looks like > > > > '(role:admin and system:network and project_id:%(project_id)s) or (role:service and project_name:service)' > > > > Say nova will use that service role to communicate to cinder and cinder policy will pass > > as service role is in OR in default policy. > > > > But let's see how they are going to be and if any challenges when we will implement > > it in Z cycle. > > I'm not 100% on our reasoning for using the service role in yoga (I > wasn't in the discussion when we made the switch, although John > Garbutt was), although I can provide at least one reason. > > Currently, we have a bunch of service users doing things like keystone > token validation using the admin role in the service project. If we > enforce scopes & new defaults in keystone, this will no longer work, > due to the default policy: > > identity:validate_token: (role:reader and system_scope:all) or > rule:service_role or rule:token_subject > > Now we could go and assign system-reader to all these users, but if > the end goal is to give them all the service role, and that allows > token validation, then to me that seems like a better path. > > Currently, we're creating the service role during deploy & upgrade, > then assigning it to users. Keystone is supposed to create the service > role in yoga, so we can eventually drop that part. > > Does this seem reasonable? Is keystone still on track to create the > service role in yoga? I think this is a reasonable plan and once we have service roles implemented in keystone as well as in all the services to request other service APIs then deployment project (Kolla here) can update them from system_reader to actual service role. And yes that can be done for token validation as well as the service-to-service API calls for example nova to cinder or neutron to nova APIs call. 
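For reference, the keystone-side switch this exchange keeps coming back to is controlled by two oslo.policy options; a minimal sketch is below. The file path and the grep target are only illustrative, deployment tools template this rather than editing the file by hand, and removing the deprecated rules themselves is a separate keystone-side change.

    # keystone.conf:
    #   [oslo_policy]
    #   enforce_scope = True
    #   enforce_new_defaults = True
    #
    # then inspect what the effective defaults become, e.g. for token validation:
    oslopolicy-policy-generator --namespace keystone | grep identity:validate_token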
I do not think we can migrate everything (service tokens) together for all the services in deployment projects until all these services are ready with the 'service' role implementation (implementation means changing their default roles to add 'service' role for service-to-service APIs). Regarding the keystone track on service role work in Yoga or not, I do not have clear answer may be Lance or keystone team can answer it. But Lance has spec up[1] but not yet merged. [1] https://review.opendev.org/c/openstack/keystone-specs/+/818616 -gmann > > > > > > > > > In anticipation of keystone setting enforce_scope=True and removing > > > old default policies (which I assume effectively removes > > > enforce_new_defaults?), we will set this in kolla-ansible, and try to > > > deal with any fallout. Hopefully the previous work will make this > > > minimal. > > > > > > How does that line up with other projects' approaches? What have we missed? > > > > Yeah, we want users/deployment projects/horizon etc to use the new policy from > > keystone as first and we will see feedback how they are (good, bad, really bad) from > > usage perspective. Why we choose keystone is, because new policy are there since > > many cycle and ready to use. Other projects needs to work their policy as per new > > SRBAC design/direction (for example nova needs to modify their policy before we ask > > users to use new policy and work is under progress[2]). > > > > I think trying in kolla will be good way to know if we can move to keystone's new policy > > completely in yoga. > > We have a scope-enforcing preview patch [1], and it's passing our base > set of tests. I have another that triggers all of the jobs. > > [1] https://review.opendev.org/c/openstack/kolla-ansible/+/825406 > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#z-release-timeline > > [2] https://blueprints.launchpad.net/nova/+spec/policy-defaults-refresh-2 > > > > -gmann > > > > > > > > Mark > > > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > > > > > From gmann at ghanshyammann.com Thu Jan 20 18:53:26 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 20 Jan 2022 12:53:26 -0600 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: <17e73279ebc.da3cd092886796.8771224917629901204@ghanshyammann.com> Message-ID: <17e78d6230b.da0e3f8e973917.8408987284990732554@ghanshyammann.com> ---- On Wed, 19 Jan 2022 18:05:11 -0600 Takashi Kajinami wrote ---- > Thank you, Ghanshyam, for your inputs.These are helpful to understand the latest plan. > So I think our question comes back to the original one.Currently keystone allows any of 1. system-service 2. domain-service > 3. project-service 4. system-admin 5. system-member 6. system-readerto validate token but which one is the appropriate one to be used by authtoken middleware ? 
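A quick way to see where this lands in a given deployment is to list what a service user actually ended up with; the user name below is just an example, and the output includes project-, domain- and system-scoped assignments.

    openstack role assignment list --user nova --names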
> Considering the purpose of the service role, the service role is appropriate but it's not yetclear which scope should be used (as is pointed out by Mark from the beginning). > AFAIK token is not a resource belonging to projects so system scope looks appropriatebut what is the main intention is to allow project/domain scope ? IMO, general service role enforcement will look like: - They will enforce the same scope as APIs. For example, neutrons call nova APIs X (server external event APIs). Nova APIs default policy will add service role like below: policy.DocumentedRuleDefault( name='os_compute_api:os-server-external-events:create', check_str='role:admin or role:service', scope_types=['project'] ) and neutron will call the service role token with project scoped. Same applies to token validation APIs also, as you know, it is allowed to any scope (system, domain, project) so allowed service role in any scope can be used. Answer to your question on which one is appropriate is that you can use any of mentioned one as they are allowed (that is how users will be accessing it). I hope it answer your query but again service roles are not implemented yet so policy default may change especially from project side policy, hoping keystone policy are all good and will not change but let's wait until this spec - https://review.opendev.org/c/openstack/keystone-specs/+/818616 -gmann > > By the way, in Puppet OpenStack, we have been using the service"s" project instead ofthe service project for some reason(which I'm not aware of).So it's helpful for us if we avoid implementing strict limitations to use the service project. > > > On Thu, Jan 20, 2022 at 1:29 AM Ghanshyam Mann wrote: > ---- On Wed, 19 Jan 2022 08:01:00 -0600 Takashi Kajinami wrote ---- > > > > On Wed, Jan 19, 2022 at 9:22 PM Mark Goddard wrote: > > On Wed, 19 Jan 2022 at 11:15, Takashi Kajinami wrote: > > > > > > Hi, > > > > > > > > > (The topic doesn't include puppet but ...) > > > I recently spent some time implementing initial support for SRBAC > > > in Puppet OpenStack. You can find details in the etherpad[1] I created > > > as my working note. It includes some items commonly required by all toolings > > > in addition to ones specific to puppet. > > > [1] https://etherpad.opendev.org/p/puppet-secure-rbac > > > > Thanks for responding, Takashi - that's useful. > > > > > > > > I expect some of them (especially the configuration parameters) would be used > > > by TripleO later. > > > > > > > Question: should the role be added with system scope or in the > > > > existing service project? The obvious main use for this is token > > > > validation, which seems to allow system or project scope. > > > > > > I'd add one more question which is; > > > Which roles should be assigned for the service users ? > > > > > > In the project which already implemented SRBAC, system-admin + system-reader > > > allows any API calls and works like the previous project-admin. > > > > IIUC the direction of travel has changed, and now the intention is > > that system-admin won't have access to project-scoped APIs. > > Yes, as mark mentioned. And that is the key change from prevous direction. > We are isolating the system and project level APIs. system token will be able > to perform only system level operation and not allowed to do project level > operation. For example: system user will not be allowed to create the server > in nova. 
To have a quick view on those (we have not finished yet in nova), you > can check how it will look like in the below series: > > - https://review.opendev.org/q/topic:%22bp%252Fpolicy-defaults-refresh-2%22+(status:open%20OR%20status:merged) > > You can see the test cases for all four possible configuration combination and what all > roles are allowed in which configuration (case 4th is end goal we want to be for RBAC): > > 1. enforce_scope=False + legacy rule (current default policies) > 2. enforce_scope=False + No legacy rule (enable scope but remove old policy default) > 3. enforce_scope=True + legacy rule (enable scope with old policy default) > 4. enforce_scope=True + no legacy rule (end goal of new RBAC) > > > > > > > > > For token validations system-reader(or service role) would be enough but there are > > > some system-admin-only APIs (os-server-external-events API in nova called by neutron, > > > Create allocation in placement called by nova or neutron) used for communications > > > between services. > > > > The token validation API has the following default policy: > > > > identity:validate_token: (role:reader and system_scope:all) or > > rule:service_role or rule:token_subject > > > > So system-reader, system-admin or service (any scope) should work. The > > spec suggests that the service role is intended for use by service to > > service APIs, in this case the credentials provided in the > > keystone_authtoken config. I would guess that system scope makes most > > sense here with the service role, although the rule suggests it would > > work with project scope and the service role. > > > > I noticed I ignored implied roles... Thanks for clarifying that. > > I understand and I agree with this. Considering the intention of SRBAC this would fixbetter with system-scoped, as you earlier mentioned but I'll defer to the others. > > > Another thigns to note here is, in Yoga cycle we are doing only system-admin. system-reader, > system-member will be done in phase3 which is for future releases (BB). > > > > If we agree system-admin + system-reader is the right set then I'll update the default role > > > assignment accordingly. This is important for Puppet OpenStack because there are implementations > > > in puppet (which is usually called as providers) to manage some resources like Flavors, > > > and these rely on credentials of service users after trying to look up user credentials. > > > > I think one of the outcomes of this work is that authentication will > > necessarily become a bit more fine-grained. It might not make sense to > > have the same role assignments for all users. To your example, I would > > say that registering flavors should be done by a different user with > > different permissions than a service user. In kolla-ansible we don't > > really register flavors other than for octavia - this is up to > > operators. > > My main concern was that some service users would require system-admin butI should have read this part more carefully. https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#phase-2 > > So Assigning the service role (for the proper scope which is asked in the original thread)is the right way to go. For the provider stuff I'll look into any available option to replace usage of serviceuser credential but that's specific to Puppet which we can ignore here in this discussion. > > right, once we have service role implemented then we will have clear way on how services will be > communicating to other services APIs. 
> > -gmann > > > > > > > > > Takashi > > > > > > On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard wrote: > > >> > > >> Hi, > > >> > > >> If you haven't been paying close attention, it would be easy to miss > > >> some of the upcoming RBAC changes which will have an impact on > > >> deployment projects. I thought I'd start a thread so that we can share > > >> how we are approaching this, get answers to open questions, and > > >> ideally all end up with a fairly consistent approach. > > >> > > >> The secure RBAC work has a long history, and continues to evolve. > > >> According to [1], we should start to see some fairly substantial > > >> changes over the next few releases. That spec is fairly long, but > > >> worth a read. > > >> > > >> In the yoga timeline [2], there is one change in particular that has > > >> an impact on deployment projects, "3. Keystone enforces scope by > > >> default". After this change, all of the deprecated policies that many > > >> still rely on in Keystone will be removed. > > >> > > >> In kolla-ansible, we have an etherpad [5] with some notes, questions > > >> and half-baked plans. We made some changes in Xena [3] to use system > > >> scope in some places when interacting with system APIs in Ansible > > >> tasks. > > >> > > >> The next change we have staged is to add the service role to all > > >> service users [4], in preparation for [2]. > > >> > > >> Question: should the role be added with system scope or in the > > >> existing service project? The obvious main use for this is token > > >> validation, which seems to allow system or project scope. > > >> > > >> We anticipate that some service users may still require some > > >> project-scoped roles, e.g. when creating resources for octavia. We'll > > >> deal with those on a case by case basis. > > >> > > >> In anticipation of keystone setting enforce_scope=True and removing > > >> old default policies (which I assume effectively removes > > >> enforce_new_defaults?), we will set this in kolla-ansible, and try to > > >> deal with any fallout. Hopefully the previous work will make this > > >> minimal. > > >> > > >> How does that line up with other projects' approaches? What have we missed? > > >> > > >> Mark > > >> > > >> [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > >> [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > >> [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > >> [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > >> [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > >> > > > > > > From gmann at ghanshyammann.com Thu Jan 20 18:55:31 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 20 Jan 2022 12:55:31 -0600 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: <9dc233e4d22afff6c6890acdd6bba1850c8f28f0.camel@redhat.com> References: <17e73279ebc.da3cd092886796.8771224917629901204@ghanshyammann.com> <9dc233e4d22afff6c6890acdd6bba1850c8f28f0.camel@redhat.com> Message-ID: <17e78d808cd.10adc3b9e974017.1955512596653528133@ghanshyammann.com> ---- On Thu, 20 Jan 2022 07:37:51 -0600 Sean Mooney wrote ---- > On Thu, 2022-01-20 at 13:23 +0000, Sean Mooney wrote: > > On Thu, 2022-01-20 at 09:05 +0900, Takashi Kajinami wrote: > > > Thank you, Ghanshyam, for your inputs. 
> > > These are helpful to understand the latest plan. > > > > > > So I think our question comes back to the original one. > > > Currently keystone allows any of > > > 1. system-service > > > 2. domain-service > > > 3. project-service > > > 4. system-admin > > > 5. system-member > > > 6. system-reader > > > to validate token but which one is the appropriate one to be used by > > > authtoken middleware ? > > > > > > Considering the purpose of the service role, the service role is > > > appropriate but it's not yet > > > clear which scope should be used (as is pointed out by Mark from the > > > beginning). > > > > > > AFAIK token is not a resource belonging to projects so system scope looks > > > appropriate > > > but what is the main intention is to allow project/domain scope ? > > a token is really a resouce belownign to a user and that user can be a member of a project or doamin. > > do we need to enfoce any scope on this endpoint? > > > > the token that the midelware is vlaidating shoudl be suffenct to do the validation since that user > > shoudl be able to retirve the list of roles and project/domain membership its is part of so i guess > > im confusted why we woudl not just pass the token to be ckeck as the token the middelware uses and not enforece > > any scope or role reqiruement on the token validation endpoint > > > > perhaps im missunderstanding and the authtoken middleware is not the middleware that validate the toke is valid and populates > > the project_id and domain_id /roles in the oslo context object? > > by the way im asserting that a GET or HEAD query to > > /v3/auth/tokens should not require any scope or role to complete > > https://docs.openstack.org/api-ref/identity/v3/?expanded=check-token-detail,validate-and-show-information-for-token-detail#check-token > > if the token that is being vlaidated is valid then the request can use the permission on that token to authrise the return of the info > if the token is not valid hten it woudl return a 403 > > openstack uses bearer tokens so the fact that you posess it entirles you to use teh permission allowed by that token. > > even a *reader token with no other roles shoudl be able to use the current token to validate itself. > so really i dont think this api shoudl require a second token to vouch for it. > > i can see some pushing back saying this would weaken the current security model which is valid > in which case i woudl proably go with allowing a system-reader token with service roles or something similar. > since tokens are nto really proejct or domaing owned but user owned. As you mentioned, token valdiation is allowed to system, domain, and project scope token so I think main idea here to have and check scope is to block the unscoped token which is good. IMO, Not forcing the scope seems more risky. -gmann > > > > > > > By the way, in Puppet OpenStack, we have been using the service"s" project > > > instead of > > > the service project for some reason(which I'm not aware of). > > > So it's helpful for us if we avoid implementing strict limitations to use > > > the service project. > > > > > > > > > On Thu, Jan 20, 2022 at 1:29 AM Ghanshyam Mann > > > wrote: > > > > > > > ---- On Wed, 19 Jan 2022 08:01:00 -0600 Takashi Kajinami < > > > > tkajinam at redhat.com> wrote ---- > > > > > > > > > > On Wed, Jan 19, 2022 at 9:22 PM Mark Goddard wrote: > > > > > On Wed, 19 Jan 2022 at 11:15, Takashi Kajinami > > > > wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > (The topic doesn't include puppet but ...) 
> > > > > > I recently spent some time implementing initial support for SRBAC > > > > > > in Puppet OpenStack. You can find details in the etherpad[1] I created > > > > > > as my working note. It includes some items commonly required by all > > > > toolings > > > > > > in addition to ones specific to puppet. > > > > > > [1] https://etherpad.opendev.org/p/puppet-secure-rbac > > > > > > > > > > Thanks for responding, Takashi - that's useful. > > > > > > > > > > > > > > > > > I expect some of them (especially the configuration parameters) would > > > > be used > > > > > > by TripleO later. > > > > > > > > > > > > > Question: should the role be added with system scope or in the > > > > > > > existing service project? The obvious main use for this is token > > > > > > > validation, which seems to allow system or project scope. > > > > > > > > > > > > I'd add one more question which is; > > > > > > Which roles should be assigned for the service users ? > > > > > > > > > > > > In the project which already implemented SRBAC, system-admin + > > > > system-reader > > > > > > allows any API calls and works like the previous project-admin. > > > > > > > > > > IIUC the direction of travel has changed, and now the intention is > > > > > that system-admin won't have access to project-scoped APIs. > > > > > > > > Yes, as mark mentioned. And that is the key change from prevous direction. > > > > We are isolating the system and project level APIs. system token will be > > > > able > > > > to perform only system level operation and not allowed to do project level > > > > operation. For example: system user will not be allowed to create the > > > > server > > > > in nova. To have a quick view on those (we have not finished yet in nova), > > > > you > > > > can check how it will look like in the below series: > > > > > > > > - > > > > https://review.opendev.org/q/topic:%22bp%252Fpolicy-defaults-refresh-2%22+(status:open%20OR%20status:merged) > > > > > > > > You can see the test cases for all four possible configuration combination > > > > and what all > > > > roles are allowed in which configuration (case 4th is end goal we want to > > > > be for RBAC): > > > > > > > > 1. enforce_scope=False + legacy rule (current default policies) > > > > 2. enforce_scope=False + No legacy rule (enable scope but remove old > > > > policy default) > > > > 3. enforce_scope=True + legacy rule (enable scope with old policy default) > > > > 4. enforce_scope=True + no legacy rule (end goal of new RBAC) > > > > > > > > > > > > > > > > > > > > > For token validations system-reader(or service role) would be enough > > > > but there are > > > > > > some system-admin-only APIs (os-server-external-events API in nova > > > > called by neutron, > > > > > > Create allocation in placement called by nova or neutron) used for > > > > communications > > > > > > between services. > > > > > > > > > > The token validation API has the following default policy: > > > > > > > > > > identity:validate_token: (role:reader and system_scope:all) or > > > > > rule:service_role or rule:token_subject > > > > > > > > > > So system-reader, system-admin or service (any scope) should work. The > > > > > spec suggests that the service role is intended for use by service to > > > > > service APIs, in this case the credentials provided in the > > > > > keystone_authtoken config. I would guess that system scope makes most > > > > > sense here with the service role, although the rule suggests it would > > > > > work with project scope and the service role. 
> > > > > > > > > > I noticed I ignored implied roles... Thanks for clarifying that. > > > > > I understand and I agree with this. Considering the intention of SRBAC > > > > this would fixbetter with system-scoped, as you earlier mentioned but I'll > > > > defer to the others. > > > > > > > > > Another thigns to note here is, in Yoga cycle we are doing only > > > > system-admin. system-reader, > > > > system-member will be done in phase3 which is for future releases (BB). > > > > > > > > > > If we agree system-admin + system-reader is the right set then I'll > > > > update the default role > > > > > > assignment accordingly. This is important for Puppet OpenStack > > > > because there are implementations > > > > > > in puppet (which is usually called as providers) to manage some > > > > resources like Flavors, > > > > > > and these rely on credentials of service users after trying to look > > > > up user credentials. > > > > > > > > > > I think one of the outcomes of this work is that authentication will > > > > > necessarily become a bit more fine-grained. It might not make sense to > > > > > have the same role assignments for all users. To your example, I would > > > > > say that registering flavors should be done by a different user with > > > > > different permissions than a service user. In kolla-ansible we don't > > > > > really register flavors other than for octavia - this is up to > > > > > operators. > > > > > My main concern was that some service users would require system-admin > > > > butI should have read this part more carefully. > > > > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#phase-2 > > > > > So Assigning the service role (for the proper scope which is asked in > > > > the original thread)is the right way to go. For the provider stuff I'll > > > > look into any available option to replace usage of serviceuser credential > > > > but that's specific to Puppet which we can ignore here in this discussion. > > > > > > > > right, once we have service role implemented then we will have clear way > > > > on how services will be > > > > communicating to other services APIs. > > > > > > > > -gmann > > > > > > > > > > > > > > > > > > > > > Takashi > > > > > > > > > > > > On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard > > > > wrote: > > > > > >> > > > > > >> Hi, > > > > > >> > > > > > >> If you haven't been paying close attention, it would be easy to miss > > > > > >> some of the upcoming RBAC changes which will have an impact on > > > > > >> deployment projects. I thought I'd start a thread so that we can > > > > share > > > > > >> how we are approaching this, get answers to open questions, and > > > > > >> ideally all end up with a fairly consistent approach. > > > > > >> > > > > > >> The secure RBAC work has a long history, and continues to evolve. > > > > > >> According to [1], we should start to see some fairly substantial > > > > > >> changes over the next few releases. That spec is fairly long, but > > > > > >> worth a read. > > > > > >> > > > > > >> In the yoga timeline [2], there is one change in particular that has > > > > > >> an impact on deployment projects, "3. Keystone enforces scope by > > > > > >> default". After this change, all of the deprecated policies that many > > > > > >> still rely on in Keystone will be removed. > > > > > >> > > > > > >> In kolla-ansible, we have an etherpad [5] with some notes, questions > > > > > >> and half-baked plans. 
We made some changes in Xena [3] to use system > > > > > >> scope in some places when interacting with system APIs in Ansible > > > > > >> tasks. > > > > > >> > > > > > >> The next change we have staged is to add the service role to all > > > > > >> service users [4], in preparation for [2]. > > > > > >> > > > > > >> Question: should the role be added with system scope or in the > > > > > >> existing service project? The obvious main use for this is token > > > > > >> validation, which seems to allow system or project scope. > > > > > >> > > > > > >> We anticipate that some service users may still require some > > > > > >> project-scoped roles, e.g. when creating resources for octavia. We'll > > > > > >> deal with those on a case by case basis. > > > > > >> > > > > > >> In anticipation of keystone setting enforce_scope=True and removing > > > > > >> old default policies (which I assume effectively removes > > > > > >> enforce_new_defaults?), we will set this in kolla-ansible, and try to > > > > > >> deal with any fallout. Hopefully the previous work will make this > > > > > >> minimal. > > > > > >> > > > > > >> How does that line up with other projects' approaches? What have we > > > > missed? > > > > > >> > > > > > >> Mark > > > > > >> > > > > > >> [1] > > > > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > > > >> [2] > > > > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > > > >> [3] > > > > https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > > > >> [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > > > >> [5] > > > > https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > From mark at stackhpc.com Thu Jan 20 19:36:53 2022 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 20 Jan 2022 19:36:53 +0000 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: <17e78ca4f43.11df902c0973397.2476553495581610085@ghanshyammann.com> References: <17e731cbfcc.e2463d54885770.5926410035050303741@ghanshyammann.com> <17e78ca4f43.11df902c0973397.2476553495581610085@ghanshyammann.com> Message-ID: On Thu, 20 Jan 2022 at 18:40, Ghanshyam Mann wrote: > > > ---- On Thu, 20 Jan 2022 03:35:33 -0600 Mark Goddard wrote ---- > > On Wed, 19 Jan 2022 at 16:12, Ghanshyam Mann wrote: > > > > > > ---- On Wed, 19 Jan 2022 04:35:53 -0600 Mark Goddard wrote ---- > > > > Hi, > > > > > > > > If you haven't been paying close attention, it would be easy to miss > > > > some of the upcoming RBAC changes which will have an impact on > > > > deployment projects. I thought I'd start a thread so that we can share > > > > how we are approaching this, get answers to open questions, and > > > > ideally all end up with a fairly consistent approach. > > > > > > > > The secure RBAC work has a long history, and continues to evolve. > > > > According to [1], we should start to see some fairly substantial > > > > changes over the next few releases. That spec is fairly long, but > > > > worth a read. > > > > > > > > In the yoga timeline [2], there is one change in particular that has > > > > an impact on deployment projects, "3. Keystone enforces scope by > > > > default". After this change, all of the deprecated policies that many > > > > still rely on in Keystone will be removed. 
> > > > > > > > In kolla-ansible, we have an etherpad [5] with some notes, questions > > > > and half-baked plans. We made some changes in Xena [3] to use system > > > > scope in some places when interacting with system APIs in Ansible > > > > tasks. > > > > > > > > The next change we have staged is to add the service role to all > > > > service users [4], in preparation for [2]. > > > > > > > > Question: should the role be added with system scope or in the > > > > existing service project? The obvious main use for this is token > > > > validation, which seems to allow system or project scope. > > > > > > > > We anticipate that some service users may still require some > > > > project-scoped roles, e.g. when creating resources for octavia. We'll > > > > deal with those on a case by case basis. > > > > > > Service roles are planned for phase2 which is Z release[1]. The Idea here is > > > service to service communication will happen with 'service' role (which keystone > > > need to implement yet) and end users will keep using the what ever role > > > is default (or overridden in policy file) which can be project or system scoped > > > depends on the APIs. > > > > > > So at the end service-service APIs policy default will looks like > > > > > > '(role:admin and system:network and project_id:%(project_id)s) or (role:service and project_name:service)' > > > > > > Say nova will use that service role to communicate to cinder and cinder policy will pass > > > as service role is in OR in default policy. > > > > > > But let's see how they are going to be and if any challenges when we will implement > > > it in Z cycle. > > > > I'm not 100% on our reasoning for using the service role in yoga (I > > wasn't in the discussion when we made the switch, although John > > Garbutt was), although I can provide at least one reason. > > > > Currently, we have a bunch of service users doing things like keystone > > token validation using the admin role in the service project. If we > > enforce scopes & new defaults in keystone, this will no longer work, > > due to the default policy: > > > > identity:validate_token: (role:reader and system_scope:all) or > > rule:service_role or rule:token_subject > > > > Now we could go and assign system-reader to all these users, but if > > the end goal is to give them all the service role, and that allows > > token validation, then to me that seems like a better path. > > > > Currently, we're creating the service role during deploy & upgrade, > > then assigning it to users. Keystone is supposed to create the service > > role in yoga, so we can eventually drop that part. > > > > Does this seem reasonable? Is keystone still on track to create the > > service role in yoga? > > I think this is a reasonable plan and once we have service roles implemented > in keystone as well as in all the services to request other service APIs then > deployment project (Kolla here) can update them from system_reader to > actual service role. To be clear, I am proposing to skip system-reader, and go straight to the service role in yoga. > > And yes that can be done for token validation as well as > the service-to-service API calls for example nova to cinder or neutron to nova > APIs call. I do not think we can migrate everything (service tokens) together for all > the services in deployment projects until all these services are ready with the 'service' > role implementation (implementation means changing their default roles > to add 'service' role for service-to-service APIs). 
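
As a rough sketch of the mechanics (plain openstackclient commands, with the usual user and project names as placeholders, nothing kolla-ansible specific), creating and granting the service role ahead of keystone providing it itself would look something like:

    # Create the role until keystone starts creating it during bootstrap.
    openstack role create service
    # Grant it to a service user in the existing service project ...
    openstack role add --user nova --project service service
    # ... or, alternatively, with system scope.
    openstack role add --user nova --system all service
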
> > Regarding the keystone track on service role work in Yoga or not, I do not > have clear answer may be Lance or keystone team can answer it. But Lance > has spec up[1] but not yet merged. > > [1] https://review.opendev.org/c/openstack/keystone-specs/+/818616 > > -gmann > > > > > > > > > > > > > > In anticipation of keystone setting enforce_scope=True and removing > > > > old default policies (which I assume effectively removes > > > > enforce_new_defaults?), we will set this in kolla-ansible, and try to > > > > deal with any fallout. Hopefully the previous work will make this > > > > minimal. > > > > > > > > How does that line up with other projects' approaches? What have we missed? > > > > > > Yeah, we want users/deployment projects/horizon etc to use the new policy from > > > keystone as first and we will see feedback how they are (good, bad, really bad) from > > > usage perspective. Why we choose keystone is, because new policy are there since > > > many cycle and ready to use. Other projects needs to work their policy as per new > > > SRBAC design/direction (for example nova needs to modify their policy before we ask > > > users to use new policy and work is under progress[2]). > > > > > > I think trying in kolla will be good way to know if we can move to keystone's new policy > > > completely in yoga. > > > > We have a scope-enforcing preview patch [1], and it's passing our base > > set of tests. I have another that triggers all of the jobs. > > > > [1] https://review.opendev.org/c/openstack/kolla-ansible/+/825406 > > > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#z-release-timeline > > > [2] https://blueprints.launchpad.net/nova/+spec/policy-defaults-refresh-2 > > > > > > -gmann > > > > > > > > > > > Mark > > > > > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > > [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > > [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > > [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > > [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > > > > > > > > > From katonalala at gmail.com Thu Jan 20 19:41:24 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 20 Jan 2022 20:41:24 +0100 Subject: [neutron] Drivers meeting - Friday 21.01.2022 - cancelled Message-ID: Hi Neutron Drivers! Due to the lack of agenda, let's cancel tomorrow's drivers meeting. See You on the meeting next week. [0]: https://meetings.opendev.org/meetings/neutron_drivers/2021/neutron_drivers.2021-12-03-14.01.log.html#l-76 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at stackhpc.com Thu Jan 20 19:43:03 2022 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 20 Jan 2022 19:43:03 +0000 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: <17e78d6230b.da0e3f8e973917.8408987284990732554@ghanshyammann.com> References: <17e73279ebc.da3cd092886796.8771224917629901204@ghanshyammann.com> <17e78d6230b.da0e3f8e973917.8408987284990732554@ghanshyammann.com> Message-ID: On Thu, 20 Jan 2022 at 18:53, Ghanshyam Mann wrote: > > ---- On Wed, 19 Jan 2022 18:05:11 -0600 Takashi Kajinami wrote ---- > > Thank you, Ghanshyam, for your inputs.These are helpful to understand the latest plan. > > So I think our question comes back to the original one.Currently keystone allows any of 1. system-service 2. domain-service > > 3. project-service 4. system-admin 5. system-member 6. system-readerto validate token but which one is the appropriate one to be used by authtoken middleware ? > > Considering the purpose of the service role, the service role is appropriate but it's not yetclear which scope should be used (as is pointed out by Mark from the beginning). > > AFAIK token is not a resource belonging to projects so system scope looks appropriatebut what is the main intention is to allow project/domain scope ? > > IMO, general service role enforcement will look like: > > - They will enforce the same scope as APIs. For example, neutrons call nova APIs X (server external event APIs). Nova > APIs default policy will add service role like below: > > policy.DocumentedRuleDefault( > name='os_compute_api:os-server-external-events:create', > check_str='role:admin or role:service', > scope_types=['project'] > ) > > and neutron will call the service role token with project scoped. Which project would be in scope here? I don't think it makes sense to use the project of the resource that the event is for, since these calls are generally asynchronous, so we won't have the user's context. Currently for these service API calls AFAIK we're using a service user (e.g. nova) which has the admin role in the service project. > > Same applies to token validation APIs also, as you know, it is allowed to > any scope (system, domain, project) so allowed service role in any scope > can be used. Answer to your question on which one is appropriate is that > you can use any of mentioned one as they are allowed (that is how users > will be accessing it). > > I hope it answer your query but again service roles are not implemented yet > so policy default may change especially from project side policy, hoping keystone > policy are all good and will not change but let's wait until this spec > - https://review.opendev.org/c/openstack/keystone-specs/+/818616 > > -gmann > > > > > By the way, in Puppet OpenStack, we have been using the service"s" project instead ofthe service project for some reason(which I'm not aware of).So it's helpful for us if we avoid implementing strict limitations to use the service project. > > > > > > On Thu, Jan 20, 2022 at 1:29 AM Ghanshyam Mann wrote: > > ---- On Wed, 19 Jan 2022 08:01:00 -0600 Takashi Kajinami wrote ---- > > > > > > On Wed, Jan 19, 2022 at 9:22 PM Mark Goddard wrote: > > > On Wed, 19 Jan 2022 at 11:15, Takashi Kajinami wrote: > > > > > > > > Hi, > > > > > > > > > > > > (The topic doesn't include puppet but ...) > > > > I recently spent some time implementing initial support for SRBAC > > > > in Puppet OpenStack. You can find details in the etherpad[1] I created > > > > as my working note. 
It includes some items commonly required by all toolings > > > > in addition to ones specific to puppet. > > > > [1] https://etherpad.opendev.org/p/puppet-secure-rbac > > > > > > Thanks for responding, Takashi - that's useful. > > > > > > > > > > > I expect some of them (especially the configuration parameters) would be used > > > > by TripleO later. > > > > > > > > > Question: should the role be added with system scope or in the > > > > > existing service project? The obvious main use for this is token > > > > > validation, which seems to allow system or project scope. > > > > > > > > I'd add one more question which is; > > > > Which roles should be assigned for the service users ? > > > > > > > > In the project which already implemented SRBAC, system-admin + system-reader > > > > allows any API calls and works like the previous project-admin. > > > > > > IIUC the direction of travel has changed, and now the intention is > > > that system-admin won't have access to project-scoped APIs. > > > > Yes, as mark mentioned. And that is the key change from prevous direction. > > We are isolating the system and project level APIs. system token will be able > > to perform only system level operation and not allowed to do project level > > operation. For example: system user will not be allowed to create the server > > in nova. To have a quick view on those (we have not finished yet in nova), you > > can check how it will look like in the below series: > > > > - https://review.opendev.org/q/topic:%22bp%252Fpolicy-defaults-refresh-2%22+(status:open%20OR%20status:merged) > > > > You can see the test cases for all four possible configuration combination and what all > > roles are allowed in which configuration (case 4th is end goal we want to be for RBAC): > > > > 1. enforce_scope=False + legacy rule (current default policies) > > 2. enforce_scope=False + No legacy rule (enable scope but remove old policy default) > > 3. enforce_scope=True + legacy rule (enable scope with old policy default) > > 4. enforce_scope=True + no legacy rule (end goal of new RBAC) > > > > > > > > > > > > > For token validations system-reader(or service role) would be enough but there are > > > > some system-admin-only APIs (os-server-external-events API in nova called by neutron, > > > > Create allocation in placement called by nova or neutron) used for communications > > > > between services. > > > > > > The token validation API has the following default policy: > > > > > > identity:validate_token: (role:reader and system_scope:all) or > > > rule:service_role or rule:token_subject > > > > > > So system-reader, system-admin or service (any scope) should work. The > > > spec suggests that the service role is intended for use by service to > > > service APIs, in this case the credentials provided in the > > > keystone_authtoken config. I would guess that system scope makes most > > > sense here with the service role, although the rule suggests it would > > > work with project scope and the service role. > > > > > > I noticed I ignored implied roles... Thanks for clarifying that. > > > I understand and I agree with this. Considering the intention of SRBAC this would fixbetter with system-scoped, as you earlier mentioned but I'll defer to the others. > > > > > Another thigns to note here is, in Yoga cycle we are doing only system-admin. system-reader, > > system-member will be done in phase3 which is for future releases (BB). 
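
To make the keystone_authtoken point above concrete, the service user's credentials could then be system scoped rather than project scoped. This is a sketch only, assuming the password auth plugin and placeholder names (the exact option names come from keystoneauth's plugin loading):

    [keystone_authtoken]
    auth_type = password
    auth_url = http://controller:5000/v3
    username = nova
    user_domain_name = Default
    password = <service password>
    # system_scope replaces project_name/project_domain_name here.
    system_scope = all

With the service role (or a system-scoped reader role) granted to that user, the identity:validate_token rule quoted above is satisfied.
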
> > > > > > If we agree system-admin + system-reader is the right set then I'll update the default role > > > > assignment accordingly. This is important for Puppet OpenStack because there are implementations > > > > in puppet (which is usually called as providers) to manage some resources like Flavors, > > > > and these rely on credentials of service users after trying to look up user credentials. > > > > > > I think one of the outcomes of this work is that authentication will > > > necessarily become a bit more fine-grained. It might not make sense to > > > have the same role assignments for all users. To your example, I would > > > say that registering flavors should be done by a different user with > > > different permissions than a service user. In kolla-ansible we don't > > > really register flavors other than for octavia - this is up to > > > operators. > > > My main concern was that some service users would require system-admin butI should have read this part more carefully. https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#phase-2 > > > So Assigning the service role (for the proper scope which is asked in the original thread)is the right way to go. For the provider stuff I'll look into any available option to replace usage of serviceuser credential but that's specific to Puppet which we can ignore here in this discussion. > > > > right, once we have service role implemented then we will have clear way on how services will be > > communicating to other services APIs. > > > > -gmann > > > > > > > > > > > > > Takashi > > > > > > > > On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard wrote: > > > >> > > > >> Hi, > > > >> > > > >> If you haven't been paying close attention, it would be easy to miss > > > >> some of the upcoming RBAC changes which will have an impact on > > > >> deployment projects. I thought I'd start a thread so that we can share > > > >> how we are approaching this, get answers to open questions, and > > > >> ideally all end up with a fairly consistent approach. > > > >> > > > >> The secure RBAC work has a long history, and continues to evolve. > > > >> According to [1], we should start to see some fairly substantial > > > >> changes over the next few releases. That spec is fairly long, but > > > >> worth a read. > > > >> > > > >> In the yoga timeline [2], there is one change in particular that has > > > >> an impact on deployment projects, "3. Keystone enforces scope by > > > >> default". After this change, all of the deprecated policies that many > > > >> still rely on in Keystone will be removed. > > > >> > > > >> In kolla-ansible, we have an etherpad [5] with some notes, questions > > > >> and half-baked plans. We made some changes in Xena [3] to use system > > > >> scope in some places when interacting with system APIs in Ansible > > > >> tasks. > > > >> > > > >> The next change we have staged is to add the service role to all > > > >> service users [4], in preparation for [2]. > > > >> > > > >> Question: should the role be added with system scope or in the > > > >> existing service project? The obvious main use for this is token > > > >> validation, which seems to allow system or project scope. > > > >> > > > >> We anticipate that some service users may still require some > > > >> project-scoped roles, e.g. when creating resources for octavia. We'll > > > >> deal with those on a case by case basis. 
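
For the octavia-style cases, the extra grant is just a normal project-scoped assignment on top of the service role, for example (project and role names are illustrative, the exact ones depend on how octavia is configured in a given deployment):

    # Let the octavia service user create its load balancer resources in a project.
    openstack role add --user octavia --project service member
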
> > > >> > > > >> In anticipation of keystone setting enforce_scope=True and removing > > > >> old default policies (which I assume effectively removes > > > >> enforce_new_defaults?), we will set this in kolla-ansible, and try to > > > >> deal with any fallout. Hopefully the previous work will make this > > > >> minimal. > > > >> > > > >> How does that line up with other projects' approaches? What have we missed? > > > >> > > > >> Mark > > > >> > > > >> [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > >> [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > >> [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > >> [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > >> [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > >> > > > > > > > > > > From gmann at ghanshyammann.com Thu Jan 20 19:55:35 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 20 Jan 2022 13:55:35 -0600 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: <17e731cbfcc.e2463d54885770.5926410035050303741@ghanshyammann.com> <17e78ca4f43.11df902c0973397.2476553495581610085@ghanshyammann.com> Message-ID: <17e790f0a61.fa031a2a976064.4934623486848755469@ghanshyammann.com> ---- On Thu, 20 Jan 2022 13:36:53 -0600 Mark Goddard wrote ---- > On Thu, 20 Jan 2022 at 18:40, Ghanshyam Mann wrote: > > > > > > ---- On Thu, 20 Jan 2022 03:35:33 -0600 Mark Goddard wrote ---- > > > On Wed, 19 Jan 2022 at 16:12, Ghanshyam Mann wrote: > > > > > > > > ---- On Wed, 19 Jan 2022 04:35:53 -0600 Mark Goddard wrote ---- > > > > > Hi, > > > > > > > > > > If you haven't been paying close attention, it would be easy to miss > > > > > some of the upcoming RBAC changes which will have an impact on > > > > > deployment projects. I thought I'd start a thread so that we can share > > > > > how we are approaching this, get answers to open questions, and > > > > > ideally all end up with a fairly consistent approach. > > > > > > > > > > The secure RBAC work has a long history, and continues to evolve. > > > > > According to [1], we should start to see some fairly substantial > > > > > changes over the next few releases. That spec is fairly long, but > > > > > worth a read. > > > > > > > > > > In the yoga timeline [2], there is one change in particular that has > > > > > an impact on deployment projects, "3. Keystone enforces scope by > > > > > default". After this change, all of the deprecated policies that many > > > > > still rely on in Keystone will be removed. > > > > > > > > > > In kolla-ansible, we have an etherpad [5] with some notes, questions > > > > > and half-baked plans. We made some changes in Xena [3] to use system > > > > > scope in some places when interacting with system APIs in Ansible > > > > > tasks. > > > > > > > > > > The next change we have staged is to add the service role to all > > > > > service users [4], in preparation for [2]. > > > > > > > > > > Question: should the role be added with system scope or in the > > > > > existing service project? The obvious main use for this is token > > > > > validation, which seems to allow system or project scope. > > > > > > > > > > We anticipate that some service users may still require some > > > > > project-scoped roles, e.g. when creating resources for octavia. 
We'll > > > > > deal with those on a case by case basis. > > > > > > > > Service roles are planned for phase2 which is Z release[1]. The Idea here is > > > > service to service communication will happen with 'service' role (which keystone > > > > need to implement yet) and end users will keep using the what ever role > > > > is default (or overridden in policy file) which can be project or system scoped > > > > depends on the APIs. > > > > > > > > So at the end service-service APIs policy default will looks like > > > > > > > > '(role:admin and system:network and project_id:%(project_id)s) or (role:service and project_name:service)' > > > > > > > > Say nova will use that service role to communicate to cinder and cinder policy will pass > > > > as service role is in OR in default policy. > > > > > > > > But let's see how they are going to be and if any challenges when we will implement > > > > it in Z cycle. > > > > > > I'm not 100% on our reasoning for using the service role in yoga (I > > > wasn't in the discussion when we made the switch, although John > > > Garbutt was), although I can provide at least one reason. > > > > > > Currently, we have a bunch of service users doing things like keystone > > > token validation using the admin role in the service project. If we > > > enforce scopes & new defaults in keystone, this will no longer work, > > > due to the default policy: > > > > > > identity:validate_token: (role:reader and system_scope:all) or > > > rule:service_role or rule:token_subject > > > > > > Now we could go and assign system-reader to all these users, but if > > > the end goal is to give them all the service role, and that allows > > > token validation, then to me that seems like a better path. > > > > > > Currently, we're creating the service role during deploy & upgrade, > > > then assigning it to users. Keystone is supposed to create the service > > > role in yoga, so we can eventually drop that part. > > > > > > Does this seem reasonable? Is keystone still on track to create the > > > service role in yoga? > > > > I think this is a reasonable plan and once we have service roles implemented > > in keystone as well as in all the services to request other service APIs then > > deployment project (Kolla here) can update them from system_reader to > > actual service role. > > To be clear, I am proposing to skip system-reader, and go straight to > the service role in yoga. But that would not be doable until services implement service roles which is Yoga cycle target for keystone and Z cyle target for other projects. Or you mean to re-consider to target the service role for all projects also in Yoga so that deployment projects can go with service role directly? -gmann > > > > > And yes that can be done for token validation as well as > > the service-to-service API calls for example nova to cinder or neutron to nova > > APIs call. I do not think we can migrate everything (service tokens) together for all > > the services in deployment projects until all these services are ready with the 'service' > > role implementation (implementation means changing their default roles > > to add 'service' role for service-to-service APIs). > > > > Regarding the keystone track on service role work in Yoga or not, I do not > > have clear answer may be Lance or keystone team can answer it. But Lance > > has spec up[1] but not yet merged. 
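
A quick way to sanity-check that intermediate state (again plain openstackclient, nothing deployment-tool specific) is to list what a service user actually holds and confirm it can still obtain a token with the scope you expect:

    # Show every assignment the user holds, with names rather than IDs.
    openstack role assignment list --user nova --names
    # Issue a system-scoped token as that user to confirm the credentials work
    # (auth URL and the rest come from the environment or clouds.yaml as usual).
    openstack --os-username nova --os-user-domain-name Default \
              --os-system-scope all --os-password <service password> token issue
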
> > > > [1] https://review.opendev.org/c/openstack/keystone-specs/+/818616 > > > > -gmann > > > > > > > > > > > > > > > > > > > In anticipation of keystone setting enforce_scope=True and removing > > > > > old default policies (which I assume effectively removes > > > > > enforce_new_defaults?), we will set this in kolla-ansible, and try to > > > > > deal with any fallout. Hopefully the previous work will make this > > > > > minimal. > > > > > > > > > > How does that line up with other projects' approaches? What have we missed? > > > > > > > > Yeah, we want users/deployment projects/horizon etc to use the new policy from > > > > keystone as first and we will see feedback how they are (good, bad, really bad) from > > > > usage perspective. Why we choose keystone is, because new policy are there since > > > > many cycle and ready to use. Other projects needs to work their policy as per new > > > > SRBAC design/direction (for example nova needs to modify their policy before we ask > > > > users to use new policy and work is under progress[2]). > > > > > > > > I think trying in kolla will be good way to know if we can move to keystone's new policy > > > > completely in yoga. > > > > > > We have a scope-enforcing preview patch [1], and it's passing our base > > > set of tests. I have another that triggers all of the jobs. > > > > > > [1] https://review.opendev.org/c/openstack/kolla-ansible/+/825406 > > > > > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#z-release-timeline > > > > [2] https://blueprints.launchpad.net/nova/+spec/policy-defaults-refresh-2 > > > > > > > > -gmann > > > > > > > > > > > > > > Mark > > > > > > > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > > > [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > > > [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > > > [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > > > [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > > > > > > > > > > > > > > > From amy at demarco.com Thu Jan 20 20:38:04 2022 From: amy at demarco.com (Amy Marrich) Date: Thu, 20 Jan 2022 14:38:04 -0600 Subject: [Diversity] Diversity and Inclusion WG new meeting time Message-ID: Thank you to those who responded to the Doodle for our meeting time. The day and time selected was Friday at 15:00 UTC so we will move to that in February with our first meeting being on 2/4. As usual, should we need to cancel the first Friday of the month we have the backup date of the third week. Should we end up having to do that too frequently ,as Fridays are popular for holidays and vacations, we'll readdress the meeting time again. Also chosen was to switch to a video platform and we will work on choosing one before the meeting reminder goes out Thanks, Amy (spotz) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at stackhpc.com Thu Jan 20 20:41:00 2022 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 20 Jan 2022 20:41:00 +0000 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: <17e790f0a61.fa031a2a976064.4934623486848755469@ghanshyammann.com> References: <17e731cbfcc.e2463d54885770.5926410035050303741@ghanshyammann.com> <17e78ca4f43.11df902c0973397.2476553495581610085@ghanshyammann.com> <17e790f0a61.fa031a2a976064.4934623486848755469@ghanshyammann.com> Message-ID: On Thu, 20 Jan 2022 at 19:55, Ghanshyam Mann wrote: > > ---- On Thu, 20 Jan 2022 13:36:53 -0600 Mark Goddard wrote ---- > > On Thu, 20 Jan 2022 at 18:40, Ghanshyam Mann wrote: > > > > > > > > > ---- On Thu, 20 Jan 2022 03:35:33 -0600 Mark Goddard wrote ---- > > > > On Wed, 19 Jan 2022 at 16:12, Ghanshyam Mann wrote: > > > > > > > > > > ---- On Wed, 19 Jan 2022 04:35:53 -0600 Mark Goddard wrote ---- > > > > > > Hi, > > > > > > > > > > > > If you haven't been paying close attention, it would be easy to miss > > > > > > some of the upcoming RBAC changes which will have an impact on > > > > > > deployment projects. I thought I'd start a thread so that we can share > > > > > > how we are approaching this, get answers to open questions, and > > > > > > ideally all end up with a fairly consistent approach. > > > > > > > > > > > > The secure RBAC work has a long history, and continues to evolve. > > > > > > According to [1], we should start to see some fairly substantial > > > > > > changes over the next few releases. That spec is fairly long, but > > > > > > worth a read. > > > > > > > > > > > > In the yoga timeline [2], there is one change in particular that has > > > > > > an impact on deployment projects, "3. Keystone enforces scope by > > > > > > default". After this change, all of the deprecated policies that many > > > > > > still rely on in Keystone will be removed. > > > > > > > > > > > > In kolla-ansible, we have an etherpad [5] with some notes, questions > > > > > > and half-baked plans. We made some changes in Xena [3] to use system > > > > > > scope in some places when interacting with system APIs in Ansible > > > > > > tasks. > > > > > > > > > > > > The next change we have staged is to add the service role to all > > > > > > service users [4], in preparation for [2]. > > > > > > > > > > > > Question: should the role be added with system scope or in the > > > > > > existing service project? The obvious main use for this is token > > > > > > validation, which seems to allow system or project scope. > > > > > > > > > > > > We anticipate that some service users may still require some > > > > > > project-scoped roles, e.g. when creating resources for octavia. We'll > > > > > > deal with those on a case by case basis. > > > > > > > > > > Service roles are planned for phase2 which is Z release[1]. The Idea here is > > > > > service to service communication will happen with 'service' role (which keystone > > > > > need to implement yet) and end users will keep using the what ever role > > > > > is default (or overridden in policy file) which can be project or system scoped > > > > > depends on the APIs. 
> > > > > > > > > > So at the end service-service APIs policy default will looks like > > > > > > > > > > '(role:admin and system:network and project_id:%(project_id)s) or (role:service and project_name:service)' > > > > > > > > > > Say nova will use that service role to communicate to cinder and cinder policy will pass > > > > > as service role is in OR in default policy. > > > > > > > > > > But let's see how they are going to be and if any challenges when we will implement > > > > > it in Z cycle. > > > > > > > > I'm not 100% on our reasoning for using the service role in yoga (I > > > > wasn't in the discussion when we made the switch, although John > > > > Garbutt was), although I can provide at least one reason. > > > > > > > > Currently, we have a bunch of service users doing things like keystone > > > > token validation using the admin role in the service project. If we > > > > enforce scopes & new defaults in keystone, this will no longer work, > > > > due to the default policy: > > > > > > > > identity:validate_token: (role:reader and system_scope:all) or > > > > rule:service_role or rule:token_subject > > > > > > > > Now we could go and assign system-reader to all these users, but if > > > > the end goal is to give them all the service role, and that allows > > > > token validation, then to me that seems like a better path. > > > > > > > > Currently, we're creating the service role during deploy & upgrade, > > > > then assigning it to users. Keystone is supposed to create the service > > > > role in yoga, so we can eventually drop that part. > > > > > > > > Does this seem reasonable? Is keystone still on track to create the > > > > service role in yoga? > > > > > > I think this is a reasonable plan and once we have service roles implemented > > > in keystone as well as in all the services to request other service APIs then > > > deployment project (Kolla here) can update them from system_reader to > > > actual service role. > > > > To be clear, I am proposing to skip system-reader, and go straight to > > the service role in yoga. > > But that would not be doable until services implement service roles which is > Yoga cycle target for keystone and Z cyle target for other projects. Or you mean > to re-consider to target the service role for all projects also in Yoga so that > deployment projects can go with service role directly? Our current plan is to add the service role to all service users in yoga. This will allow keystone token validation to work when keystone drops the deprecated policies. We will not remove the admin role from service users in the service project during yoga. This will allow projects other than keystone to continue to work as before. At some later point, we will remove the admin role from service users in the service project, hopefully relying on the service role for most service-service communication. There may be other roles we need to assign in order to drop admin, but we'll assess that as we go. Hopefully that's a bit more of a clear picture, and it seems sensible? > > -gmann > > > > > > > > > And yes that can be done for token validation as well as > > > the service-to-service API calls for example nova to cinder or neutron to nova > > > APIs call. I do not think we can migrate everything (service tokens) together for all > > > the services in deployment projects until all these services are ready with the 'service' > > > role implementation (implementation means changing their default roles > > > to add 'service' role for service-to-service APIs). 
> > > > > > Regarding the keystone track on service role work in Yoga or not, I do not > > > have clear answer may be Lance or keystone team can answer it. But Lance > > > has spec up[1] but not yet merged. > > > > > > [1] https://review.opendev.org/c/openstack/keystone-specs/+/818616 > > > > > > -gmann > > > > > > > > > > > > > > > > > > > > > > > > In anticipation of keystone setting enforce_scope=True and removing > > > > > > old default policies (which I assume effectively removes > > > > > > enforce_new_defaults?), we will set this in kolla-ansible, and try to > > > > > > deal with any fallout. Hopefully the previous work will make this > > > > > > minimal. > > > > > > > > > > > > How does that line up with other projects' approaches? What have we missed? > > > > > > > > > > Yeah, we want users/deployment projects/horizon etc to use the new policy from > > > > > keystone as first and we will see feedback how they are (good, bad, really bad) from > > > > > usage perspective. Why we choose keystone is, because new policy are there since > > > > > many cycle and ready to use. Other projects needs to work their policy as per new > > > > > SRBAC design/direction (for example nova needs to modify their policy before we ask > > > > > users to use new policy and work is under progress[2]). > > > > > > > > > > I think trying in kolla will be good way to know if we can move to keystone's new policy > > > > > completely in yoga. > > > > > > > > We have a scope-enforcing preview patch [1], and it's passing our base > > > > set of tests. I have another that triggers all of the jobs. > > > > > > > > [1] https://review.opendev.org/c/openstack/kolla-ansible/+/825406 > > > > > > > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#z-release-timeline > > > > > [2] https://blueprints.launchpad.net/nova/+spec/policy-defaults-refresh-2 > > > > > > > > > > -gmann > > > > > > > > > > > > > > > > > Mark > > > > > > > > > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > > > > [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > > > > [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > > > > [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > > > > [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > > > > > > > > > > > > > > > > > > > > > From gmann at ghanshyammann.com Thu Jan 20 20:42:18 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 20 Jan 2022 14:42:18 -0600 Subject: [gate][nova][qa] tempest-*-centos-8-stream job failing consistently since yesterday In-Reply-To: <17e59990993.b47647d1605836.2392080917091176150@ghanshyammann.com> References: <17e59990993.b47647d1605836.2392080917091176150@ghanshyammann.com> Message-ID: <17e7939cfb1.b336e5ff977318.6284505705360244259@ghanshyammann.com> ---- On Fri, 14 Jan 2022 11:18:28 -0600 Ghanshyam Mann wrote ---- > Hello Everyone, > > You might have noticed that 'tempest-integrated-compute-centos-8-stream' > and 'tempest tempest-full-py3-centos-8-stream' job started failing the below two > tests consistently since yesterday (~7 PM CST) > > - https://169dddc67bd535a0361f-0632fd6194b48b475d9eb0d8f7720c6c.ssl.cf2.rackcdn.com/824478/5/check/tempest-integrated-compute-centos-8-stream/e0db6a9/testr_results.html > > I have filed the bug and 
to unblock the gate (nova & tempest), I have pushed patch > to make these job non voting until bug is fixed. > > - https://review.opendev.org/c/openstack/tempest/+/824740 > > Please hold the recheck on nova or tempest (or any other effected project). With the devstack workaround (https://review.opendev.org/c/openstack/devstack/+/824862), jobs are made voting again and working fine https://review.opendev.org/c/openstack/tempest/+/824962 Thanks Yatin. -gmann > > ralonsoh mentioned that this is not the same issue which is raised in > - http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026682.html > > or may be triggered due to same root cause? > > > -gmann > > From cboylan at sapwetik.org Fri Jan 21 00:17:50 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 20 Jan 2022 16:17:50 -0800 Subject: [gate][nova][qa] tempest-*-centos-8-stream job failing consistently since yesterday In-Reply-To: <17e7939cfb1.b336e5ff977318.6284505705360244259@ghanshyammann.com> References: <17e59990993.b47647d1605836.2392080917091176150@ghanshyammann.com> <17e7939cfb1.b336e5ff977318.6284505705360244259@ghanshyammann.com> Message-ID: <369db78f-eec2-4767-a64d-189465a57cc1@www.fastmail.com> On Thu, Jan 20, 2022, at 12:42 PM, Ghanshyam Mann wrote: > ---- On Fri, 14 Jan 2022 11:18:28 -0600 Ghanshyam Mann > wrote ---- > > Hello Everyone, > > > > You might have noticed that > 'tempest-integrated-compute-centos-8-stream' > > and 'tempest tempest-full-py3-centos-8-stream' job started failing > the below two > > tests consistently since yesterday (~7 PM CST) > > > > - > https://169dddc67bd535a0361f-0632fd6194b48b475d9eb0d8f7720c6c.ssl.cf2.rackcdn.com/824478/5/check/tempest-integrated-compute-centos-8-stream/e0db6a9/testr_results.html > > > > I have filed the bug and to unblock the gate (nova & tempest), I > have pushed patch > > to make these job non voting until bug is fixed. > > > > - https://review.opendev.org/c/openstack/tempest/+/824740 > > > > Please hold the recheck on nova or tempest (or any other effected > project). > > With the devstack workaround > (https://review.opendev.org/c/openstack/devstack/+/824862), > jobs are made voting again and working fine > https://review.opendev.org/c/openstack/tempest/+/824962 Looks like http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/systemd-239-55.el8.x86_64.rpm exists upstream of us now. Our mirrors haven't updated to pull that in yet but should soon. Then we will also need new centos 8 stream images built as systemd is including in them and I'm not sure that systemd will get updated later. Once that happens you should be able to revert the various workarounds that have been made. > > Thanks Yatin. > > -gmann > > > > > ralonsoh mentioned that this is not the same issue which is raised in > > - > http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026682.html > > > > or may be triggered due to same root cause? > > > > > > -gmann > > > > From cboylan at sapwetik.org Fri Jan 21 00:18:40 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 20 Jan 2022 16:18:40 -0800 Subject: [all][infra][kayobe][kolla] ping not permitted on latest centos-8-stream images In-Reply-To: References: <20220114145054.6bcnmu2335jepbvq@yuggoth.org> <192ecffc-4919-4576-805a-927f9fbd60f5@www.fastmail.com> <20220114155231.zkqlje3bsj7st6dv@yuggoth.org> Message-ID: On Thu, Jan 20, 2022, at 3:24 AM, Pierre Riteau wrote: > On Fri, 14 Jan 2022 at 16:57, Jeremy Stanley wrote: >> >> On 2022-01-14 07:35:05 -0800 (-0800), Clark Boylan wrote: >> [...] 
>> > I don't think we should update DIB or our images to fix this. The >> > distro is broken and our images accurately represent that state. >> > If the software in CI fails as a result that is because our CI >> > system is properly catching this problem. The software needs to >> > work around this to ensure that it is deployable in the real world >> > and not just on our systems. >> > >> > This approach of fixing it in the software itself appears to be >> > the one TripleO took and is the correct approach. >> >> Thanks, in reflection I agree. It's good to keep reminding ourselves >> that what we're testing is that the software works on the target >> platform. Unfortunate and temporary as it may be, the current state >> of CentOS Stream 8 is that you need root privileges in order to use >> the ping utility. If we work around this in our testing, then users >> who are trying to deploy that software onto the current state of >> CentOS Stream 8 will not get the benefit of the workaround. >> >> It's good to be reminded that the goal is not to make tests pass no >> matter the cost, it's to make sure the software will work for its >> users. >> -- >> Jeremy Stanley > > We have applied the workaround in Kolla Ansible and backported it to > stable branches. > > A fixed systemd package is hopefully coming to CentOS Stream 8 soon, > as it was imported in Git yesterday: > https://git.centos.org/rpms/systemd/c/3d3dc89fb25868e8038ecac8d5aef0603bdfaaa2?branch=c8s Looks like http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/systemd-239-55.el8.x86_64.rpm exists upstream of us now. Our mirrors haven't updated to pull that in yet but should soon. Then we will also need new centos 8 stream images built as systemd is included in them, and I'm not sure that systemd will get updated later. Once that happens you should be able to revert the various workarounds that have been made. From aschultz at redhat.com Fri Jan 21 03:20:15 2022 From: aschultz at redhat.com (Alex Schultz) Date: Thu, 20 Jan 2022 20:20:15 -0700 Subject: [gate][nova][qa] tempest-*-centos-8-stream job failing consistently since yesterday In-Reply-To: <369db78f-eec2-4767-a64d-189465a57cc1@www.fastmail.com> References: <17e59990993.b47647d1605836.2392080917091176150@ghanshyammann.com> <17e7939cfb1.b336e5ff977318.6284505705360244259@ghanshyammann.com> <369db78f-eec2-4767-a64d-189465a57cc1@www.fastmail.com> Message-ID: On Thu, Jan 20, 2022 at 5:20 PM Clark Boylan wrote: > > On Thu, Jan 20, 2022, at 12:42 PM, Ghanshyam Mann wrote: > > ---- On Fri, 14 Jan 2022 11:18:28 -0600 Ghanshyam Mann > > wrote ---- > > > Hello Everyone, > > > > > > You might have noticed that > > 'tempest-integrated-compute-centos-8-stream' > > > and 'tempest tempest-full-py3-centos-8-stream' job started failing > > the below two > > > tests consistently since yesterday (~7 PM CST) > > > > > > - > > https://169dddc67bd535a0361f-0632fd6194b48b475d9eb0d8f7720c6c.ssl.cf2.rackcdn.com/824478/5/check/tempest-integrated-compute-centos-8-stream/e0db6a9/testr_results.html > > > > > > I have filed the bug and to unblock the gate (nova & tempest), I > > have pushed patch > > > to make these job non voting until bug is fixed. > > > > > > - https://review.opendev.org/c/openstack/tempest/+/824740 > > > > > > Please hold the recheck on nova or tempest (or any other effected > > project). 
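
For anyone hitting the ping regression in the meantime, the general shape of the workaround (an assumption about the approach, not necessarily the exact change Kolla Ansible or TripleO merged) is to restore unprivileged ICMP instead of waiting for the fixed systemd package:

    # Allow unprivileged ICMP (ping) sockets for all groups ...
    sysctl -w net.ipv4.ping_group_range="0 2147483647"
    # ... or grant the binary the raw-socket capability directly.
    setcap cap_net_raw+ep /usr/bin/ping
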
> > > > With the devstack workaround > > (https://review.opendev.org/c/openstack/devstack/+/824862), > > jobs are made voting again and working fine > > https://review.opendev.org/c/openstack/tempest/+/824962 > > Looks like http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/systemd-239-55.el8.x86_64.rpm exists upstream of us now. Our mirrors haven't updated to pull that in yet but should soon. Then we will also need new centos 8 stream images built as systemd is including in them and I'm not sure that systemd will get updated later. > > Once that happens you should be able to revert the various workarounds that have been made. > Unfortunately per the bz (https://bugzilla.redhat.com/show_bug.cgi?id=2037807) that package might not fix it. The fix they applied incorrectly included a -. Looks like we need to wait for a different version. Thanks, -Alex > > > > Thanks Yatin. > > > > -gmann > > > > > > > > ralonsoh mentioned that this is not the same issue which is raised in > > > - > > http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026682.html > > > > > > or may be triggered due to same root cause? > > > > > > > > > -gmann > > > > > > > From chkumar at redhat.com Fri Jan 21 04:01:20 2022 From: chkumar at redhat.com (Chandan Kumar) Date: Fri, 21 Jan 2022 09:31:20 +0530 Subject: [Infra]CentOS Stream 8 based jobs are hitting RETRY or RETRY_LIMIT In-Reply-To: <20220120144828.iwjpa4vjevg7e7pa@yuggoth.org> References: <20220120140908.ry5tqr7txwrrctcu@yuggoth.org> <20220120144828.iwjpa4vjevg7e7pa@yuggoth.org> Message-ID: On Thu, Jan 20, 2022 at 8:25 PM Jeremy Stanley wrote: > > On 2022-01-20 14:09:09 +0000 (+0000), Jeremy Stanley wrote: > [...] > > The content of /centos/8-stream/AppStream/x86_64/os/repodata/ on > > mirror.facebook.net is identical to what we're serving already. I > > checked some other mirrors, e.g. linuxsoft.cern.ch, and see the > > same. The repomd.xml indices on them all match too. > > Further investigation of our mirror update logs indicates there was > some (likely global) upheaval for CentOS Stream 8 package indices, > which we then mirrored on what was probably a several hour delay as > we're multiple mirror "hops" from their primary. The mirror at > LeaseWeb, which we pull from, had an index update around 06:00 UTC > which seems to roughly coincide with when the problems began, and > then we saw those indices switch back around 12:00 UTC to what they > had been previously. > > The timeframe where the suspected problem indices were being served > from our mirrors was approximately 06:55-12:57 UTC. We also saw a > failure to upload updated centos-8-stream images to a significant > proportion of our providers shortly prior to this, so out of an > abundance of caution I've issued a delete for that image (falling > back to the one built yesterday), and our builders are presently > refreshing it from what is hopefully now a sane mirror of the > packages and indices. Thank you Jeremy for looking into it. It seems the issue was from the CentOS side itself. Maybe the RDO team can help here to avoid these kinds of issues in future. 
Thanks, Chandan Kumar From victoria at vmartinezdelacruz.com Fri Jan 21 10:27:22 2022 From: victoria at vmartinezdelacruz.com (=?UTF-8?Q?Victoria_Mart=C3=ADnez_de_la_Cruz?=) Date: Fri, 21 Jan 2022 11:27:22 +0100 Subject: [outreachy] Stepping down as Outreachy co-organizer In-Reply-To: References: Message-ID: Joining a bit late to this convo, but I didn't want to miss the opportunity to say thanks Samuel Thanks to contributors like you, this internship program can continue to be alive in OpenStack, having both a great impact to us and to the newcomers starting their careers in open source. All the best, Victoria On Thu, Jan 20, 2022 at 2:24 PM Sofia Enriquez wrote: > Thank you so much Samuel. You really helped me when I was both and intern > and a mentor :P > > > > > On Mon, Jan 10, 2022 at 5:40 PM Goutham Pacha Ravi > wrote: > >> On Mon, Jan 3, 2022 at 9:58 AM Samuel de Medeiros Queiroz >> wrote: >> > >> > Hi all, >> > >> > Outreachy is a wonderful program that promotes diversity in open source >> communities by giving opportunities to people in underrepresented groups. >> > >> > This was a hard decision to make, but I have not been committing the >> time this project deserves. >> > For that reason, I would like to give visibility that I am stepping >> down as an Outreachy organizer. >> > >> > It was a great honor to serve as co-organizer since late 2018, and we >> had 19 internships since then. >> > I also had the pleasure to serve twice (2016 and 2017) as a mentor. >> >> Wow - and these internships have had a tremendous impact over these >> years! Thank you so much for your service, Samuel! >> >> > >> > Mahati, it was a great pleasure co-organizing Outreachy in this >> community with you. >> > >> > Thanks! >> > Samuel Queiroz >> >> > > -- > > Sof?a Enriquez > > she/her > > Software Engineer > > Red Hat PnT > > IRC: @enriquetaso > @RedHat Red Hat > Red Hat > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victoria at vmartinezdelacruz.com Fri Jan 21 10:58:13 2022 From: victoria at vmartinezdelacruz.com (=?UTF-8?Q?Victoria_Mart=C3=ADnez_de_la_Cruz?=) Date: Fri, 21 Jan 2022 11:58:13 +0100 Subject: [manila][cinder][glance][nova] Pop-up team for design and development of a Cephadm DevStack plugin In-Reply-To: References: Message-ID: Thanks everybody for your responses! I'll send an initial path with a macro to pick between implementations and we can go from there. Let's have a meeting next Friday at 3pm UTC in #openstack-qa at OFTC. Would that work for you? Thanks, Victoria On Thu, Jan 20, 2022 at 2:04 PM Sofia Enriquez wrote: > Sounds good. I wanna join . > I haven't tried cephadm yet but It would help us to make ceph new features > more transparent to Cinder in the future. > Thanks > ++ > > > On Thu, Jan 20, 2022 at 4:45 AM Francesco Pantano > wrote: > >> Hi Victoria, >> thanks for starting this thread. >> >> On Wed, Jan 19, 2022 at 2:03 PM Sean Mooney wrote: >> >>> On Wed, 2022-01-19 at 12:04 +0100, Victoria Mart?nez de la Cruz wrote: >>> > Hi all, >>> > >>> > I'm reaching out to you to let you know that we will start the design >>> and >>> > development of a Cephadm DevStack plugin. >>> > >>> > Some of the reasons on why we want to take this approach: >>> > >>> > - devstack-plugin-ceph worked for us for a lot of years, but the >>> > development of it relies on several hacks to adapt to the different >>> Ceph >>> > versions we use and the different distros we support. 
This led to a >>> > monolithic script that sometimes is hard to debug and break our >>> development >>> > environments and our CI >>> > - cephadm is the deployment tool developed and maintained by the Ceph >>> > community, it allows their users to get specific Ceph versions very >>> easily >>> > and enforces good practices for Ceph clusters. From their docs, >>> "Cephadm >>> > manages the full lifecycle of a Ceph cluster. It starts by >>> bootstrapping a >>> > tiny Ceph cluster on a single node (one monitor and one manager) and >>> then >>> > uses the orchestration interface (?day 2? commands) to expand the >>> cluster >>> > to include all hosts and to provision all Ceph daemons and services. >>> [0]" >>> > - OpenStack deployment tools are starting to use cephadm as their way >>> to >>> > deploy Ceph, so it would be nice to include cephadm in our development >>> > process to be closer with what is being done in the field >>> > >>> > I started the development of this in [1], but it might be better to >>> change >>> > devstack-plugin-ceph to do this instead of having a new plugin. This is >>> > something I would love to discuss in a first meeting. >>> i would advocate for pivoting devstack-plugin-ceph. >>> i dont think we have the capsity as a comunity to devleop, maintaine and >>> debug/support >>> 2 differnt ways of deploying ceph in our ci system in the long term. >>> >>> ++ let's pivot devstack-plugin-ceph to me the way devstack-plugin-ceph install cpeh is jsut an implementaion >>> detail. >>> its contract is that it will install and configure ceph for use with >>> openstack. >>> if you make it use cephadm for that its just and internal detail that >>> should not >>> affect the consomes of the plugin provide you maintain the interface to >>> the devstack pluging >>> mostly the same. >>> >> Starting with pacific the deployment of Ceph is moved from ceph-ansible >> to cephadm: the implication of this change it's not just >> on the deployment side but this new component (which interacts with the >> ceph orchestrator module) is able to maintain the lifecycle >> of the deployed containers, so I'd say the new approach it's not just an >> implementation detail but also changes the way some components >> interact with Ceph. >> Manila using ganesha, for instance, it's the first component that should >> start using the orchestrator interface, so I guess it's worth >> aligning (and extending) the most popular dev installer to support the >> new way (like other projects already did). >> >> >> >>> >>> i would suggest addign a devstack macro initally to choose the backend >>> but then eventually >>> once the cephadm appoch is stable just swap the default. >>> >> +1 on choosing the backend and plan the switch when the cephadm approach >> is ready and works for all the openstack components >> >>> >>> > >>> > Having said this, I propose using the channel #openstack-cephadm in the >>> > OFTC network to talk about this and set up a first meeting with people >>> > interested in contributing to this effort. >>> ack im not sure i will get involed with this but the other option woudl >>> be to >>> just use #openstack-qa since that is the chanlle for devstack >>> development. >>> >> >> Either #openstack-qa or a dedicated one works well , maybe #openstack-qa >> is useful to reach more people >> who can help / review the relevant changes .. 
wdyt >> > ++ let's meet in #openstack-qa > >>> > >>> > Thanks, >>> > >>> > Victoria >>> > >>> > [0] https://docs.ceph.com/en/pacific/cephadm/ >>> > [1] https://github.com/vkmc/devstack-plugin-cephadm >>> >>> >>> >> >> -- >> Francesco Pantano >> GPG KEY: F41BD75C >> > > > -- > > Sof?a Enriquez > > she/her > > Software Engineer > > Red Hat PnT > > IRC: @enriquetaso > @RedHat Red Hat > Red Hat > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From syedammad83 at gmail.com Fri Jan 21 11:02:05 2022 From: syedammad83 at gmail.com (Ammad Syed) Date: Fri, 21 Jan 2022 16:02:05 +0500 Subject: [xena][nova] Libvirt Live Migration Lock Message-ID: Hi, I have a weird error while performing live migration of trove instances. The instance has a root disk residing on a local compute node and a second a second disk is attached from shared storage via cinder. There are two networks attached with the instance, one is vlan backed and other one is geneve. When I try to perform its live migration, it fails. I see below errors in libvirt logs and the same in nova-compute logs. Jan 19 15:56:54 host03.cloud.com libvirtd[731030]: Cannot start job (query, none, none) for domain instance-00000819; current job is (none, none, migration in) owned by (0 , 0 , 0 remoteDispatchDomainMigratePrepare3Params (flags=0x9b)) for (0s, 0s, 61s) Jan 19 15:56:54 host03.cloud.com libvirtd[731030]: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params) Jan 19 15:36:30 host03.cloud.com libvirtd[731030]: migration successfully aborted Any advise how to fix it ? I am using below libvirt version. libvirt 6.0 qemu-kvm 4.2 kernel 5.4 nova 24.0 Ammad -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Fri Jan 21 11:46:04 2022 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 21 Jan 2022 06:46:04 -0500 Subject: [magnum][tc] Proposing Jake Yip for core-reviewer In-Reply-To: References: Message-ID: non core +1 ? On Thu, Jan 20, 2022 at 11:09 AM Spyros Trigazis wrote: > Dear all, > > I would like to nominate Jake Yip for core reviewer in the magnum project. > > Jake has been doing code reviews for the past years in magnum at a steady > rate > and is running magnum as well in their organization. With very good > knowledge > of the magnum codebase, I am confident that Jake will help increase the > pace > of development in the project. > > Cheers, > Spyros > > > -- Mohammed Naser VEXXHOST, Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Fri Jan 21 11:48:21 2022 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 21 Jan 2022 06:48:21 -0500 Subject: [ops][nova][victoria] Power State = Suspended? In-Reply-To: References: <0670B960225633449A24709C291A52525125F40D@COM01.performair.local> <9e5c73c2a2b9513c8c44c19f6d370411c016b169.camel@redhat.com> Message-ID: On Thu, Jan 20, 2022 at 8:20 AM Christian Rohmann < christian.rohmann at inovex.de> wrote: > Hey there, > > On 04/08/2021 19:37, Sean Mooney wrote: > > I had something unusual happen this morning; one of my VMs was showing "Suspended" under the Power State in the Horizon dashboard. > > I've never seen that. What does it mean? > > Any search that I do points me to a bunch of resources for Status Suspended. > > suspened is like hibernate in windows. 
in the libvirt driver we call libvirt managed_save api > this pauses the guests, snapshots the guest ram and saves it to disk then stops the instance. > so this frees the guest ram on the host and save it to a file so that we can recreate the vm and resume it > as if nothing happened. > > Sorry to hijack such an old thread. Looking into these features, I was > just wondering if it was possible to: > > 1) Disable the support for pause / suspend altogether and not allow > anyone to place instances in such states? > you can use policy to disable suspending vms via the api > 2) Change the storage location of the saved guest RAM to a shared > storage to allow the instance to be migrated while being suspended/paused. > As far as I can see currently this data is saved on the host disk. > you can mount the path where things get saved at where ever you want (I think it?s somewhere inside /var/lib/nova/instances) > > Regards > > > Christian > > > -- Mohammed Naser VEXXHOST, Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsteinmuller at vexxhost.com Fri Jan 21 12:13:08 2022 From: gsteinmuller at vexxhost.com (=?UTF-8?Q?Guilherme_Steinm=C3=BCller?=) Date: Fri, 21 Jan 2022 09:13:08 -0300 Subject: [magnum][tc] Proposing Jake Yip for core-reviewer In-Reply-To: References: Message-ID: +1 ! On Fri, 21 Jan 2022 at 08:50 Mohammed Naser wrote: > non core +1 ? > > On Thu, Jan 20, 2022 at 11:09 AM Spyros Trigazis > wrote: > >> Dear all, >> >> I would like to nominate Jake Yip for core reviewer in the magnum project. >> >> Jake has been doing code reviews for the past years in magnum at a steady >> rate >> and is running magnum as well in their organization. With very good >> knowledge >> of the magnum codebase, I am confident that Jake will help increase the >> pace >> of development in the project. >> >> Cheers, >> Spyros >> >> >> -- > Mohammed Naser > VEXXHOST, Inc. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Fri Jan 21 14:43:21 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Fri, 21 Jan 2022 23:43:21 +0900 Subject: [cinder] Bug deputy report for week of 01-19-2022 In-Reply-To: References: Message-ID: There is one more bug I've reported under cinderlib. https://bugs.launchpad.net/cinderlib/+bug/1958159 "stable/ussuri: cinderlib-lvm-functional job is failing" . Unassigned AS reported, currently there is one CI job broken. The issue is currently observed in stable/ussuri only, and is a potential blocker for CI migration from CentOS8 to CentOS8 Stream[1]. [1] https://review.opendev.org/q/topic:%2522remove-centos-8%2522+project:openstack/cinderlib The error looks strange and unfortunately I've not yet got time to identify the cause. I would appreciate it if anyone can take a look. On Wed, Jan 19, 2022 at 9:59 PM Sofia Enriquez wrote: > This is a bug report from 01-12-2022 to 01-19-2022. > Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting > > ----------------------------------------------------------------------------------------- > > Medium > > - https://bugs.launchpad.net/cinder/+bug/1957804 "RBD deferred > deletion causes undeletable RBD snapshots." In Progress. Assigned to Eric. > - https://bugs.launchpad.net/cinder/+bug/1958122 "HPE 3PAR: In multi > host env, multi-detach works partially if volume is attached to instances > from separate hosts." In Progress. Assigned to Raghavendra Tilay. 
> > Low > > - https://bugs.launchpad.net/cinder/+bug/1958023 "Solidfire: there are > some references to the removed parameters left." In Progress. Assigned to > kajinamit. > - https://bugs.launchpad.net/cinder/+bug/1958245 "NetApp ONTAP driver > shows type error exception when replicating FlexGroups." Unassigned. > > Cheers, > > -- > > Sof?a Enriquez > > she/her > > Software Engineer > > Red Hat PnT > > IRC: @enriquetaso > @RedHat Red Hat > Red Hat > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Fri Jan 21 15:26:35 2022 From: katonalala at gmail.com (Lajos Katona) Date: Fri, 21 Jan 2022 16:26:35 +0100 Subject: [neutron] Welcome Oleg Bondarev in the drivers team Message-ID: Hi, Oleg is one of the most active and experienced developers around Neutron, and long served to be a member of the Drivers team (see: [0]). We all agreed that Oleg deserves to be part of the team and he will be a great member (see [1]), so I just added him to the neutron-drivers group. Welcome to the team Oleg! [0]: https://review.opendev.org/admin/groups/5b063c96511f090638652067cf0939da1cb6efa7,members [1]: http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026690.html Lajos Katona (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From anyrude10 at gmail.com Fri Jan 21 09:34:44 2022 From: anyrude10 at gmail.com (Anirudh Gupta) Date: Fri, 21 Jan 2022 15:04:44 +0530 Subject: [TripleO] Horizon login failed with Something went wrong error in IPv6 Message-ID: Hi Team, We are trying to deploy the Tripleo Train with IPv6. All the overcloud control plane networks - internal, management etc are also on the IPv6 subnet. Upon successful completion of overcloud, when I am trying to open the page, it does open. But when I enter the correct login credentials, it says something went wrong. [image: image.png] Upon looking into error logs, I found 2022-01-21 12:33:34.825 324 WARNING keystone.server.flask.application [req-0660e62c-dff7-4609-89fe-225e177a84f8 f908417368a24cc685818bb5fc54fe12 - - default -] *Authorization failed. The request you have made requires authentication. from fd00:fd00:fd00:2000::359: keystone.exception.Unauthorized: The request you have made requires authentication.* where *fd00:fd00:fd00:2000::359 is my internal IP address which is reachable.* Regards Anirudh Gupta -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 108018 bytes Desc: not available URL: From miguel at mlavalle.com Fri Jan 21 16:00:41 2022 From: miguel at mlavalle.com (Miguel Lavalle) Date: Fri, 21 Jan 2022 10:00:41 -0600 Subject: [neutron] Welcome Oleg Bondarev in the drivers team In-Reply-To: References: Message-ID: Welcome! On Fri, Jan 21, 2022 at 9:39 AM Lajos Katona wrote: > Hi, > > Oleg is one of the most active and experienced developers around Neutron, > and long served to be a member of the Drivers team (see: [0]). > We all agreed that Oleg deserves to be part of the team and he will be a > great member (see [1]), > so I just added him to the neutron-drivers group. > Welcome to the team Oleg! 
> > [0]: > https://review.opendev.org/admin/groups/5b063c96511f090638652067cf0939da1cb6efa7,members > [1]: > http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026690.html > > Lajos Katona (lajoskatona) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Jan 21 16:50:33 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 21 Jan 2022 10:50:33 -0600 Subject: [gate][nova][qa] tempest-*-centos-8-stream job failing consistently since yesterday In-Reply-To: References: <17e59990993.b47647d1605836.2392080917091176150@ghanshyammann.com> <17e7939cfb1.b336e5ff977318.6284505705360244259@ghanshyammann.com> <369db78f-eec2-4767-a64d-189465a57cc1@www.fastmail.com> Message-ID: <17e7d8bfdc3.e44280f842736.893825997748646676@ghanshyammann.com> ---- On Thu, 20 Jan 2022 21:20:15 -0600 Alex Schultz wrote ---- > On Thu, Jan 20, 2022 at 5:20 PM Clark Boylan wrote: > > > > On Thu, Jan 20, 2022, at 12:42 PM, Ghanshyam Mann wrote: > > > ---- On Fri, 14 Jan 2022 11:18:28 -0600 Ghanshyam Mann > > > wrote ---- > > > > Hello Everyone, > > > > > > > > You might have noticed that > > > 'tempest-integrated-compute-centos-8-stream' > > > > and 'tempest tempest-full-py3-centos-8-stream' job started failing > > > the below two > > > > tests consistently since yesterday (~7 PM CST) > > > > > > > > - > > > https://169dddc67bd535a0361f-0632fd6194b48b475d9eb0d8f7720c6c.ssl.cf2.rackcdn.com/824478/5/check/tempest-integrated-compute-centos-8-stream/e0db6a9/testr_results.html > > > > > > > > I have filed the bug and to unblock the gate (nova & tempest), I > > > have pushed patch > > > > to make these job non voting until bug is fixed. > > > > > > > > - https://review.opendev.org/c/openstack/tempest/+/824740 > > > > > > > > Please hold the recheck on nova or tempest (or any other effected > > > project). > > > > > > With the devstack workaround > > > (https://review.opendev.org/c/openstack/devstack/+/824862), > > > jobs are made voting again and working fine > > > https://review.opendev.org/c/openstack/tempest/+/824962 > > > > Looks like http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/systemd-239-55.el8.x86_64.rpm exists upstream of us now. Our mirrors haven't updated to pull that in yet but should soon. Then we will also need new centos 8 stream images built as systemd is including in them and I'm not sure that systemd will get updated later. > > > > Once that happens you should be able to revert the various workarounds that have been made. > > > > Unfortunately per the bz > (https://bugzilla.redhat.com/show_bug.cgi?id=2037807) that package > might not fix it. The fix they applied incorrectly included a -. Looks > like we need to wait for a different version. As jobs are again failing on RETRY_LIMIT (404 from CentOS-Stream - AppStream), I made them n-v again until they are stable - https://review.opendev.org/c/openstack/tempest/+/825730 -gmann > > Thanks, > -Alex > > > > > > > Thanks Yatin. > > > > > > -gmann > > > > > > > > > > > ralonsoh mentioned that this is not the same issue which is raised in > > > > - > > > http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026682.html > > > > > > > > or may be triggered due to same root cause? 
> > > > > > > > > > > > -gmann > > > > > > > > > > > > > From hjensas at redhat.com Fri Jan 21 17:05:31 2022 From: hjensas at redhat.com (Harald Jensas) Date: Fri, 21 Jan 2022 18:05:31 +0100 Subject: [TripleO] Horizon login failed with Something went wrong error in IPv6 In-Reply-To: References: Message-ID: On 1/21/22 10:34, Anirudh Gupta wrote: > Hi Team, > > We are trying to deploy the Tripleo Train with IPv6. > All the overcloud control plane networks - internal, management etc are > also on the IPv6 subnet. > > Upon successful completion of overcloud, when I am trying to open the > page, it does open. > But when I enter the correct login credentials, it says something went > wrong. > > image.png > > > Upon looking into error logs, I found > 2022-01-21 12:33:34.825 324 WARNING keystone.server.flask.application > [req-0660e62c-dff7-4609-89fe-225e177a84f8 > f908417368a24cc685818bb5fc54fe12 - - default -] *Authorization failed. > The request you have made requires authentication. from > fd00:fd00:fd00:2000::359: keystone.exception.Unauthorized: The request > you have made requires authentication.*** > * > * > where *fd00:fd00:fd00:2000::359 is my internal IP address which?is > reachable.* > > Regards > Anirudh Gupta Can you share more details from the deployment? Maby open a bug in Launcpad against TripleO and attach logs, templates used for deployment, and config files for Horizon? Did you set the parameter MemcachedIPv6 to true in your environment files? Does the CLI work? Regards, Harald From fungi at yuggoth.org Fri Jan 21 18:13:54 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 21 Jan 2022 18:13:54 +0000 Subject: [gate][nova][qa] tempest-*-centos-8-stream job failing consistently since yesterday In-Reply-To: <17e7d8bfdc3.e44280f842736.893825997748646676@ghanshyammann.com> References: <17e59990993.b47647d1605836.2392080917091176150@ghanshyammann.com> <17e7939cfb1.b336e5ff977318.6284505705360244259@ghanshyammann.com> <369db78f-eec2-4767-a64d-189465a57cc1@www.fastmail.com> <17e7d8bfdc3.e44280f842736.893825997748646676@ghanshyammann.com> Message-ID: <20220121181353.pcqmeqsgv3peiwma@yuggoth.org> On 2022-01-21 10:50:33 -0600 (-0600), Ghanshyam Mann wrote: [...] > jobs are again failing on RETRY_LIMIT (404 from CentOS-Stream - > AppStream) [...] Yes, reviewing the mirror-update log[*], it appears we copied a problem state from an official mirror again, which we were publishing from 00:45:01 to 06:57:37 UTC today, so similar to yesterday's event. There's a proposal[**] to switch to a different mirror, asserting that it's somehow got additional verification measures to make it immune to these sorts of inconsistencies, so I suppose it's worth trying. [*] https://static.opendev.org/mirror/logs/rsync-mirrors/centos.log [**] https://review.opendev.org/825446 -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From abdullahzamanbabar2019 at gmail.com Fri Jan 21 18:19:38 2022 From: abdullahzamanbabar2019 at gmail.com (Abdullah Zaman Babar) Date: Fri, 21 Jan 2022 23:19:38 +0500 Subject: Bug Fix - openstack-archive/syntribos Message-ID: Dear Sir/Madam, I hope this email finds you well. Recently I was working with syntribos and I found a bug in the code which I wanted to report but the repository is archived and I'm unable to generate a pull request. Could you please guide me to contribute to the repository? 
It will be my first ever contribution to open-source. Regards Abdullah Zaman Babar -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Jan 21 18:51:53 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 21 Jan 2022 18:51:53 +0000 Subject: [security-sig] Bug Fix - openstack-archive/syntribos In-Reply-To: References: Message-ID: <20220121185153.wqlteoxywsrochh3@yuggoth.org> [I'm keeping you in Cc because it looked like you're not subscribed to this mailing list, but please reply to it rather than to me directly. Thanks!] On 2022-01-21 23:19:38 +0500 (+0500), Abdullah Zaman Babar wrote: > I hope this email finds you well. Recently I was working with > syntribos and I found a bug in the code which I wanted to report > but the repository is archived and I'm unable to generate a pull > request. Could you please guide me to contribute to the > repository? It will be my first ever contribution to open-source. See https://opendev.org/openstack/syntribos for details, but development on Syntribos ceased in 2018 and it is no longer maintained. If you would like to start a new project based on Syntribos, you're welcome to create a fork and build on it to apply your improvements in compliance with its copyright license (Apache License, Version 2.0 as found in its LICENSE file). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gagehugo at gmail.com Fri Jan 21 18:52:44 2022 From: gagehugo at gmail.com (Gage Hugo) Date: Fri, 21 Jan 2022 12:52:44 -0600 Subject: Bug Fix - openstack-archive/syntribos In-Reply-To: References: Message-ID: Hey Abdullah, Syntribos has been retired as an openstack project so there's no official contributions going to it at this time. Also as far as I know the repos in github for openstack projects are a read-only mirror, the contributions get submitted via gerrit. If you want to make a change to syntribos however, feel free to fork the repo and make the change there. On Fri, Jan 21, 2022 at 12:44 PM Abdullah Zaman Babar < abdullahzamanbabar2019 at gmail.com> wrote: > Dear Sir/Madam, > I hope this email finds you well. Recently I was working with syntribos > and I found a bug in the code which I wanted to report but the repository > is archived and I'm unable to generate a pull request. Could you please > guide me to contribute to the repository? > It will be my first ever contribution to open-source. > > Regards > Abdullah Zaman Babar > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Fri Jan 21 22:25:26 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 21 Jan 2022 17:25:26 -0500 Subject: [cinder] new driver freeze & exceptions Message-ID: <72ed9f96-7f1e-0d0c-63fd-aa3602d703b6@gmail.com> Hello Argonauts, The new driver merge deadline passed at 20:00 UTC today. I'm extending the new driver merge deadline to Friday 28 January at 20:00 UTC for two drivers: 1. Lightbits: It has both a cinder and os-brick patch, and I want the team to have more time to look at the os-brick patch. I think the driver patch is close to ready; the developers have been quick to respond to comments and make revisions. Also, the third-party CI is functioning and responding on patches. cinder: https://review.opendev.org/c/openstack/cinder/+/821602 os-brick: https://review.opendev.org/c/openstack/os-brick/+/821603 2. 
NEC Storage V Series: the driver patch has one +2 and the third party CI is functioning and responding on patches; I don't see any reason to make this one wait for the Z release. https://review.opendev.org/c/openstack/cinder/+/815614 Cinder core reviewers: please continue to review the above patches. With respect to the other proposed drivers: - Pure Storage NVMe-RoCE: vendor decided to hold it for Z - Dell EMC PowerStore NFS driver: vendor decided to hold it for Z - HPE XP Storage FC & iSCSI: the code is very straightforward, but the CI is not ready yet, so this will have to wait for Z - YADRO Tatlin.UNIFIED driver: initial patch arrived yesterday (and has a -1 from Zuul); the CI appears to be running, but the cinder team needs to move on to reviewing os-brick and Yoga feature patches, so this will have to wait for Z also cheers, brian From gmann at ghanshyammann.com Fri Jan 21 23:03:26 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 21 Jan 2022 17:03:26 -0600 Subject: [all] Nomination open for OpenStack "Z" Release Naming In-Reply-To: <17e7522e7a3.d804cf84903822.6365624460716990044@ghanshyammann.com> References: <17e49d142da.1232180f6369644.468276914290122896@ghanshyammann.com> <17e7522e7a3.d804cf84903822.6365624460716990044@ghanshyammann.com> Message-ID: <17e7ee15ef8.d9eaa8fe52465.365410538673369733@ghanshyammann.com> ---- On Wed, 19 Jan 2022 19:38:48 -0600 Ghanshyam Mann wrote ---- > ---- On Tue, 11 Jan 2022 09:45:57 -0600 Ghanshyam Mann wrote ---- > > Hello Everyone, > > > > We are now starting the process for the OpenStack 'Z' release name. We are a little late > > to start it, sorry for that. I have proposed to close the nomination on 24th Jan[1]. I am hoping that is > > enough time to collect the names, If not please reply to this thread or in gerrit review[1]. > > > > Once the governance patch is merged I will update the final dates of nomination close > > and polls here, meanwhile in parallel please start proposing the name on the wiki page. > > Below is the schedule for the Z release process: > > Nomination close: 24th Jan 2022 > Election start: 25th Jan 2022 > Election end: 1st Feb 2022 Before election start on 25th, I would like to request community members to filter out the names which has any objection from cultural, historical point of view. Please NOTE: legal checks on names will be performed by the foundation which will be done after the election. The idea here is to filter out any cultural, historical objective names in advance to avoid trademark check cost. And along with TC (who will check name criteria also), any Community members can move such names to 'Proposed Names that do not meet the criteria' section in below wiki with reasoning or bring it in this ML thread. - https://wiki.openstack.org/wiki/Release_Naming/Z_Proposals -gmann > > -gmann > > > > > Criteria: > > ====== > > - Refer to the below governance page for the naming criteria: > > > > https://governance.openstack.org/tc/reference/release-naming.html#release-name-criteria > > > > - Any community members can propose the name to the below wiki page: > > > > https://wiki.openstack.org/wiki/Release_Naming/Z_Proposals > > > > We encourage all community members to participate in this process. 
> > > > [1] https://review.opendev.org/c/openstack/governance/+/824201 > > > > -gmann > > > > > > > > > > From gmann at ghanshyammann.com Fri Jan 21 23:28:10 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 21 Jan 2022 17:28:10 -0600 Subject: [all][tc] What's happening in Technical Committee: summary 21st Jan, 21: Reading: 5 min Message-ID: <17e7ef8037a.ce52445052734.8300213236832369504@ghanshyammann.com> Hello Everyone, Here is this week's summary of the Technical Committee activities. 1. TC Meetings: ============ * We had this week meeting yesterday. Most of the meeting discussions are summarized below (Completed or in-progress activities section). Meeting full logs are available @https://meetings.opendev.org/meetings/tc/2022/tc.2022-01-20-15.00.log.html * Next week's meeting is on 27th Jan Thursday 15:00 UTC, feel free add the topic on the agenda[1] by 26th Jan. 2. What we completed this week: ========================= * [masakari] Transfer PTL role to suzhengwei[2] 3. Activities In progress: ================== TC Tracker for Yoga cycle ------------------------------ * This etherpad includes the Yoga cycle targets/working items for TC[3]. 2 out of 9 items are completed. Open Reviews ----------------- * 4 open reviews for ongoing activities[4]. Z release cycle name ------------------------- Z release cycle naming process schedule is merged[5]. Follow the ML thread for updates[6]. Z release cycle Technical Election --------------------------------------- Its time for Z release Technical Election, I have proposed the election dates (combined PTL + TC) [7], please review and add your feedback. Remove the tags framework --------------------------------- yoctozepto has proposed the WIP patch to remove the tag framework, feel free to review and provide early feedback[8]. OpenStack Pain points discussion ---------------------------------------- No updates in this week. Stay tuned on the ML thread[9] and Rico will inform you about the next meeting details. TC position on release cadence ------------------------------------- No updates on this. Discussion is in-progress in ML thread[10]. No other updates from TC on this and we will set up a separate call to continue the discussion. Fixing Zuul config error ---------------------------- Requesting projects with zuul config error to look into those and fix them which should not take much time[11]. Adjutant need maintainers and PTLs ------------------------------------------- We are still waiting to hear from Braden, Albert on permission to work on this project[12], we discussed it in this week TC meeting and there is blocker for Yoga release. We will leave this item from monitoring now and check the situation at the time of Z release election. I hope till that time, we will have more maintainer/leaders to lead this project. Project updates ------------------- * Retire js-openstack-lib (waiting on Adjutant to have new PTL/maintainer) [13] 4. How to contact the TC: ==================== If you would like to discuss or give feedback to TC, you can reach out to us in multiple ways: 1. Email: you can send the email with tag [tc] on openstack-discuss ML[14]. 2. Weekly meeting: The Technical Committee conduct a weekly meeting every Thursday 15 UTC [15] 3. Ping us using 'tc-members' nickname on #openstack-tc IRC channel. Welcome back from the holidays and stay safe! 
[1] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting [2] https://review.opendev.org/c/openstack/governance/+/824509 [3] https://etherpad.opendev.org/p/tc-yoga-tracker [4] https://review.opendev.org/q/projects:openstack/governance+status:open [5] https://review.opendev.org/c/openstack/governance/+/824201 [6] http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026810.html [7] https://review.opendev.org/c/openstack/election/+/825017 [8] https://review.opendev.org/c/openstack/governance/+/822900 [9] http://lists.openstack.org/pipermail/openstack-discuss/2021-December/026245.html [10] http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025684.html [11] https://etherpad.opendev.org/p/zuul-config-error-openstack [12] http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025786.html [13] https://review.opendev.org/c/openstack/governance/+/798540 [14] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss [15] http://eavesdrop.openstack.org/#Technical_Committee_Meeting -gmann From gmann at ghanshyammann.com Fri Jan 21 23:53:21 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 21 Jan 2022 17:53:21 -0600 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: <17e731cbfcc.e2463d54885770.5926410035050303741@ghanshyammann.com> <17e78ca4f43.11df902c0973397.2476553495581610085@ghanshyammann.com> <17e790f0a61.fa031a2a976064.4934623486848755469@ghanshyammann.com> Message-ID: <17e7f0f14fd.114ff118752948.3643237598290287986@ghanshyammann.com> ---- On Thu, 20 Jan 2022 14:41:00 -0600 Mark Goddard wrote ---- > On Thu, 20 Jan 2022 at 19:55, Ghanshyam Mann wrote: > > > > ---- On Thu, 20 Jan 2022 13:36:53 -0600 Mark Goddard wrote ---- > > > On Thu, 20 Jan 2022 at 18:40, Ghanshyam Mann wrote: > > > > > > > > > > > > ---- On Thu, 20 Jan 2022 03:35:33 -0600 Mark Goddard wrote ---- > > > > > On Wed, 19 Jan 2022 at 16:12, Ghanshyam Mann wrote: > > > > > > > > > > > > ---- On Wed, 19 Jan 2022 04:35:53 -0600 Mark Goddard wrote ---- > > > > > > > Hi, > > > > > > > > > > > > > > If you haven't been paying close attention, it would be easy to miss > > > > > > > some of the upcoming RBAC changes which will have an impact on > > > > > > > deployment projects. I thought I'd start a thread so that we can share > > > > > > > how we are approaching this, get answers to open questions, and > > > > > > > ideally all end up with a fairly consistent approach. > > > > > > > > > > > > > > The secure RBAC work has a long history, and continues to evolve. > > > > > > > According to [1], we should start to see some fairly substantial > > > > > > > changes over the next few releases. That spec is fairly long, but > > > > > > > worth a read. > > > > > > > > > > > > > > In the yoga timeline [2], there is one change in particular that has > > > > > > > an impact on deployment projects, "3. Keystone enforces scope by > > > > > > > default". After this change, all of the deprecated policies that many > > > > > > > still rely on in Keystone will be removed. > > > > > > > > > > > > > > In kolla-ansible, we have an etherpad [5] with some notes, questions > > > > > > > and half-baked plans. We made some changes in Xena [3] to use system > > > > > > > scope in some places when interacting with system APIs in Ansible > > > > > > > tasks. > > > > > > > > > > > > > > The next change we have staged is to add the service role to all > > > > > > > service users [4], in preparation for [2]. 
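As a rough sketch of the two options this thread keeps weighing (giving
service users the service role in the existing service project, or with
system scope), the plain CLI equivalents would look roughly like this; the
user name "nova" and the "service" project are the usual conventions and
should be adjusted for your deployment:

    openstack role create service                               # only if keystone has not created the role yet
    openstack role add --user nova --project service service    # project-scoped assignment
    openstack role add --user nova --system all service         # system-scoped assignment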
> > > > > > > > > > > > > > Question: should the role be added with system scope or in the > > > > > > > existing service project? The obvious main use for this is token > > > > > > > validation, which seems to allow system or project scope. > > > > > > > > > > > > > > We anticipate that some service users may still require some > > > > > > > project-scoped roles, e.g. when creating resources for octavia. We'll > > > > > > > deal with those on a case by case basis. > > > > > > > > > > > > Service roles are planned for phase2 which is Z release[1]. The Idea here is > > > > > > service to service communication will happen with 'service' role (which keystone > > > > > > need to implement yet) and end users will keep using the what ever role > > > > > > is default (or overridden in policy file) which can be project or system scoped > > > > > > depends on the APIs. > > > > > > > > > > > > So at the end service-service APIs policy default will looks like > > > > > > > > > > > > '(role:admin and system:network and project_id:%(project_id)s) or (role:service and project_name:service)' > > > > > > > > > > > > Say nova will use that service role to communicate to cinder and cinder policy will pass > > > > > > as service role is in OR in default policy. > > > > > > > > > > > > But let's see how they are going to be and if any challenges when we will implement > > > > > > it in Z cycle. > > > > > > > > > > I'm not 100% on our reasoning for using the service role in yoga (I > > > > > wasn't in the discussion when we made the switch, although John > > > > > Garbutt was), although I can provide at least one reason. > > > > > > > > > > Currently, we have a bunch of service users doing things like keystone > > > > > token validation using the admin role in the service project. If we > > > > > enforce scopes & new defaults in keystone, this will no longer work, > > > > > due to the default policy: > > > > > > > > > > identity:validate_token: (role:reader and system_scope:all) or > > > > > rule:service_role or rule:token_subject > > > > > > > > > > Now we could go and assign system-reader to all these users, but if > > > > > the end goal is to give them all the service role, and that allows > > > > > token validation, then to me that seems like a better path. > > > > > > > > > > Currently, we're creating the service role during deploy & upgrade, > > > > > then assigning it to users. Keystone is supposed to create the service > > > > > role in yoga, so we can eventually drop that part. > > > > > > > > > > Does this seem reasonable? Is keystone still on track to create the > > > > > service role in yoga? > > > > > > > > I think this is a reasonable plan and once we have service roles implemented > > > > in keystone as well as in all the services to request other service APIs then > > > > deployment project (Kolla here) can update them from system_reader to > > > > actual service role. > > > > > > To be clear, I am proposing to skip system-reader, and go straight to > > > the service role in yoga. > > > > But that would not be doable until services implement service roles which is > > Yoga cycle target for keystone and Z cyle target for other projects. Or you mean > > to re-consider to target the service role for all projects also in Yoga so that > > deployment projects can go with service role directly? > > Our current plan is to add the service role to all service users in > yoga. This will allow keystone token validation to work when keystone > drops the deprecated policies. 
> > We will not remove the admin role from service users in the service > project during yoga. This will allow projects other than keystone to > continue to work as before. > > At some later point, we will remove the admin role from service users > in the service project, hopefully relying on the service role for most > service-service communication. There may be other roles we need to > assign in order to drop admin, but we'll assess that as we go. > > Hopefully that's a bit more of a clear picture, and it seems sensible? +1, sounds good to me. Hopefully we will get in better shape by Z release when all (or maximum) services will be migrated to new RBAC. Till than your plan sounds reasonable. -gmann > > > > > -gmann > > > > > > > > > > > > > And yes that can be done for token validation as well as > > > > the service-to-service API calls for example nova to cinder or neutron to nova > > > > APIs call. I do not think we can migrate everything (service tokens) together for all > > > > the services in deployment projects until all these services are ready with the 'service' > > > > role implementation (implementation means changing their default roles > > > > to add 'service' role for service-to-service APIs). > > > > > > > > Regarding the keystone track on service role work in Yoga or not, I do not > > > > have clear answer may be Lance or keystone team can answer it. But Lance > > > > has spec up[1] but not yet merged. > > > > > > > > [1] https://review.opendev.org/c/openstack/keystone-specs/+/818616 > > > > > > > > -gmann > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In anticipation of keystone setting enforce_scope=True and removing > > > > > > > old default policies (which I assume effectively removes > > > > > > > enforce_new_defaults?), we will set this in kolla-ansible, and try to > > > > > > > deal with any fallout. Hopefully the previous work will make this > > > > > > > minimal. > > > > > > > > > > > > > > How does that line up with other projects' approaches? What have we missed? > > > > > > > > > > > > Yeah, we want users/deployment projects/horizon etc to use the new policy from > > > > > > keystone as first and we will see feedback how they are (good, bad, really bad) from > > > > > > usage perspective. Why we choose keystone is, because new policy are there since > > > > > > many cycle and ready to use. Other projects needs to work their policy as per new > > > > > > SRBAC design/direction (for example nova needs to modify their policy before we ask > > > > > > users to use new policy and work is under progress[2]). > > > > > > > > > > > > I think trying in kolla will be good way to know if we can move to keystone's new policy > > > > > > completely in yoga. > > > > > > > > > > We have a scope-enforcing preview patch [1], and it's passing our base > > > > > set of tests. I have another that triggers all of the jobs. 
> > > > > > > > > > [1] https://review.opendev.org/c/openstack/kolla-ansible/+/825406 > > > > > > > > > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#z-release-timeline > > > > > > [2] https://blueprints.launchpad.net/nova/+spec/policy-defaults-refresh-2 > > > > > > > > > > > > -gmann > > > > > > > > > > > > > > > > > > > > Mark > > > > > > > > > > > > > > [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > > > > > [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > > > > > [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > > > > > [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > > > > > [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From gmann at ghanshyammann.com Fri Jan 21 23:55:22 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 21 Jan 2022 17:55:22 -0600 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: <17e73279ebc.da3cd092886796.8771224917629901204@ghanshyammann.com> <17e78d6230b.da0e3f8e973917.8408987284990732554@ghanshyammann.com> Message-ID: <17e7f10eb8b.ca47b84f52959.5755187650170046642@ghanshyammann.com> ---- On Thu, 20 Jan 2022 13:43:03 -0600 Mark Goddard wrote ---- > On Thu, 20 Jan 2022 at 18:53, Ghanshyam Mann wrote: > > > > ---- On Wed, 19 Jan 2022 18:05:11 -0600 Takashi Kajinami wrote ---- > > > Thank you, Ghanshyam, for your inputs.These are helpful to understand the latest plan. > > > So I think our question comes back to the original one.Currently keystone allows any of 1. system-service 2. domain-service > > > 3. project-service 4. system-admin 5. system-member 6. system-readerto validate token but which one is the appropriate one to be used by authtoken middleware ? > > > Considering the purpose of the service role, the service role is appropriate but it's not yetclear which scope should be used (as is pointed out by Mark from the beginning). > > > AFAIK token is not a resource belonging to projects so system scope looks appropriatebut what is the main intention is to allow project/domain scope ? > > > > IMO, general service role enforcement will look like: > > > > - They will enforce the same scope as APIs. For example, neutrons call nova APIs X (server external event APIs). Nova > > APIs default policy will add service role like below: > > > > policy.DocumentedRuleDefault( > > name='os_compute_api:os-server-external-events:create', > > check_str='role:admin or role:service', > > scope_types=['project'] > > ) > > > > and neutron will call the service role token with project scoped. > > Which project would be in scope here? I don't think it makes sense to > use the project of the resource that the event is for, since these > calls are generally asynchronous, so we won't have the user's context. > Currently for these service API calls AFAIK we're using a service user > (e.g. nova) which has the admin role in the service project. Those are good points and honestly saying I do not have answer to those yet. We will see while implementing those service role and per APIs/Service case by case. 
-gmann > > > > > Same applies to token validation APIs also, as you know, it is allowed to > > any scope (system, domain, project) so allowed service role in any scope > > can be used. Answer to your question on which one is appropriate is that > > you can use any of mentioned one as they are allowed (that is how users > > will be accessing it). > > > > I hope it answer your query but again service roles are not implemented yet > > so policy default may change especially from project side policy, hoping keystone > > policy are all good and will not change but let's wait until this spec > > - https://review.opendev.org/c/openstack/keystone-specs/+/818616 > > > > -gmann > > > > > > > > By the way, in Puppet OpenStack, we have been using the service"s" project instead ofthe service project for some reason(which I'm not aware of).So it's helpful for us if we avoid implementing strict limitations to use the service project. > > > > > > > > > On Thu, Jan 20, 2022 at 1:29 AM Ghanshyam Mann wrote: > > > ---- On Wed, 19 Jan 2022 08:01:00 -0600 Takashi Kajinami wrote ---- > > > > > > > > On Wed, Jan 19, 2022 at 9:22 PM Mark Goddard wrote: > > > > On Wed, 19 Jan 2022 at 11:15, Takashi Kajinami wrote: > > > > > > > > > > Hi, > > > > > > > > > > > > > > > (The topic doesn't include puppet but ...) > > > > > I recently spent some time implementing initial support for SRBAC > > > > > in Puppet OpenStack. You can find details in the etherpad[1] I created > > > > > as my working note. It includes some items commonly required by all toolings > > > > > in addition to ones specific to puppet. > > > > > [1] https://etherpad.opendev.org/p/puppet-secure-rbac > > > > > > > > Thanks for responding, Takashi - that's useful. > > > > > > > > > > > > > > I expect some of them (especially the configuration parameters) would be used > > > > > by TripleO later. > > > > > > > > > > > Question: should the role be added with system scope or in the > > > > > > existing service project? The obvious main use for this is token > > > > > > validation, which seems to allow system or project scope. > > > > > > > > > > I'd add one more question which is; > > > > > Which roles should be assigned for the service users ? > > > > > > > > > > In the project which already implemented SRBAC, system-admin + system-reader > > > > > allows any API calls and works like the previous project-admin. > > > > > > > > IIUC the direction of travel has changed, and now the intention is > > > > that system-admin won't have access to project-scoped APIs. > > > > > > Yes, as mark mentioned. And that is the key change from prevous direction. > > > We are isolating the system and project level APIs. system token will be able > > > to perform only system level operation and not allowed to do project level > > > operation. For example: system user will not be allowed to create the server > > > in nova. To have a quick view on those (we have not finished yet in nova), you > > > can check how it will look like in the below series: > > > > > > - https://review.opendev.org/q/topic:%22bp%252Fpolicy-defaults-refresh-2%22+(status:open%20OR%20status:merged) > > > > > > You can see the test cases for all four possible configuration combination and what all > > > roles are allowed in which configuration (case 4th is end goal we want to be for RBAC): > > > > > > 1. enforce_scope=False + legacy rule (current default policies) > > > 2. enforce_scope=False + No legacy rule (enable scope but remove old policy default) > > > 3. 
enforce_scope=True + legacy rule (enable scope with old policy default) > > > 4. enforce_scope=True + no legacy rule (end goal of new RBAC) > > > > > > > > > > > > > > > > > For token validations system-reader(or service role) would be enough but there are > > > > > some system-admin-only APIs (os-server-external-events API in nova called by neutron, > > > > > Create allocation in placement called by nova or neutron) used for communications > > > > > between services. > > > > > > > > The token validation API has the following default policy: > > > > > > > > identity:validate_token: (role:reader and system_scope:all) or > > > > rule:service_role or rule:token_subject > > > > > > > > So system-reader, system-admin or service (any scope) should work. The > > > > spec suggests that the service role is intended for use by service to > > > > service APIs, in this case the credentials provided in the > > > > keystone_authtoken config. I would guess that system scope makes most > > > > sense here with the service role, although the rule suggests it would > > > > work with project scope and the service role. > > > > > > > > I noticed I ignored implied roles... Thanks for clarifying that. > > > > I understand and I agree with this. Considering the intention of SRBAC this would fixbetter with system-scoped, as you earlier mentioned but I'll defer to the others. > > > > > > > Another thigns to note here is, in Yoga cycle we are doing only system-admin. system-reader, > > > system-member will be done in phase3 which is for future releases (BB). > > > > > > > > If we agree system-admin + system-reader is the right set then I'll update the default role > > > > > assignment accordingly. This is important for Puppet OpenStack because there are implementations > > > > > in puppet (which is usually called as providers) to manage some resources like Flavors, > > > > > and these rely on credentials of service users after trying to look up user credentials. > > > > > > > > I think one of the outcomes of this work is that authentication will > > > > necessarily become a bit more fine-grained. It might not make sense to > > > > have the same role assignments for all users. To your example, I would > > > > say that registering flavors should be done by a different user with > > > > different permissions than a service user. In kolla-ansible we don't > > > > really register flavors other than for octavia - this is up to > > > > operators. > > > > My main concern was that some service users would require system-admin butI should have read this part more carefully. https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#phase-2 > > > > So Assigning the service role (for the proper scope which is asked in the original thread)is the right way to go. For the provider stuff I'll look into any available option to replace usage of serviceuser credential but that's specific to Puppet which we can ignore here in this discussion. > > > > > > right, once we have service role implemented then we will have clear way on how services will be > > > communicating to other services APIs. > > > > > > -gmann > > > > > > > > > > > > > > > > > Takashi > > > > > > > > > > On Wed, Jan 19, 2022 at 7:40 PM Mark Goddard wrote: > > > > >> > > > > >> Hi, > > > > >> > > > > >> If you haven't been paying close attention, it would be easy to miss > > > > >> some of the upcoming RBAC changes which will have an impact on > > > > >> deployment projects. 
I thought I'd start a thread so that we can share > > > > >> how we are approaching this, get answers to open questions, and > > > > >> ideally all end up with a fairly consistent approach. > > > > >> > > > > >> The secure RBAC work has a long history, and continues to evolve. > > > > >> According to [1], we should start to see some fairly substantial > > > > >> changes over the next few releases. That spec is fairly long, but > > > > >> worth a read. > > > > >> > > > > >> In the yoga timeline [2], there is one change in particular that has > > > > >> an impact on deployment projects, "3. Keystone enforces scope by > > > > >> default". After this change, all of the deprecated policies that many > > > > >> still rely on in Keystone will be removed. > > > > >> > > > > >> In kolla-ansible, we have an etherpad [5] with some notes, questions > > > > >> and half-baked plans. We made some changes in Xena [3] to use system > > > > >> scope in some places when interacting with system APIs in Ansible > > > > >> tasks. > > > > >> > > > > >> The next change we have staged is to add the service role to all > > > > >> service users [4], in preparation for [2]. > > > > >> > > > > >> Question: should the role be added with system scope or in the > > > > >> existing service project? The obvious main use for this is token > > > > >> validation, which seems to allow system or project scope. > > > > >> > > > > >> We anticipate that some service users may still require some > > > > >> project-scoped roles, e.g. when creating resources for octavia. We'll > > > > >> deal with those on a case by case basis. > > > > >> > > > > >> In anticipation of keystone setting enforce_scope=True and removing > > > > >> old default policies (which I assume effectively removes > > > > >> enforce_new_defaults?), we will set this in kolla-ansible, and try to > > > > >> deal with any fallout. Hopefully the previous work will make this > > > > >> minimal. > > > > >> > > > > >> How does that line up with other projects' approaches? What have we missed? > > > > >> > > > > >> Mark > > > > >> > > > > >> [1] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > > >> [2] https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > > >> [3] https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > > >> [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > > >> [5] https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > >> > > > > > > > > > > > > > > > > From tkajinam at redhat.com Mon Jan 24 01:55:20 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Mon, 24 Jan 2022 10:55:20 +0900 Subject: [puppet][tc] Propose changing the release model to cycle-with-rc Message-ID: Hello, I already discussed this with a few cores last year but I'm raising this topic in ML to make an official decision. Currently Puppet OpenStack is following cycle-with-intermediary and creates a release every milestone. However our code is tightly related to the actual service implementations and having only puppet releases is not very useful. Considering the above point, and effort required to cut off releases per milestone, I'll propose changing our release model to cycle-with-rc , and creating a single release. Because we already created milestone releases for Yoga, I'm thinking of applying the change from next cycle(Z). 
Please let me know if you have any concerns. Thank you, Takashi -------------- next part -------------- An HTML attachment was scrubbed... URL: From oleg.bondarev at huawei.com Mon Jan 24 07:10:01 2022 From: oleg.bondarev at huawei.com (Oleg Bondarev) Date: Mon, 24 Jan 2022 07:10:01 +0000 Subject: [neutron] Welcome Oleg Bondarev in the drivers team In-Reply-To: References: Message-ID: <885c09e96ddd4a6ea0d308c32af9aa64@huawei.com> Thanks Team! For proposal, voting and warm welcome!? Oleg From: Miguel Lavalle [mailto:miguel at mlavalle.com] Sent: Friday, January 21, 2022 7:01 PM To: Lajos Katona Cc: openstack-discuss Subject: Re: [neutron] Welcome Oleg Bondarev in the drivers team Welcome! On Fri, Jan 21, 2022 at 9:39 AM Lajos Katona > wrote: Hi, Oleg is one of the most active and experienced developers around Neutron, and long served to be a member of the Drivers team (see: [0]). We all agreed that Oleg deserves to be part of the team and he will be a great member (see [1]), so I just added him to the neutron-drivers group. Welcome to the team Oleg! [0]: https://review.opendev.org/admin/groups/5b063c96511f090638652067cf0939da1cb6efa7,members [1]: http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026690.html Lajos Katona (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From syedammad83 at gmail.com Mon Jan 24 07:21:15 2022 From: syedammad83 at gmail.com (Ammad Syed) Date: Mon, 24 Jan 2022 12:21:15 +0500 Subject: [xena][nova] Libvirt Live Migration Lock In-Reply-To: References: Message-ID: Hi, I have tried to troubleshoot this issue further. - created a vm from image on local compute storage (local disk not shared between compute nodes). - Tried to live migrate that vm, but it failed with the same error. I have two environments, one is on wallaby and other is on xena. The above case works fine in wallaby but not working in xena. Is there something related to live block migration changed in xena that is causing trouble ? Ammad On Fri, Jan 21, 2022 at 4:02 PM Ammad Syed wrote: > Hi, > > I have a weird error while performing live migration of trove instances. > The instance has a root disk residing on a local compute node and a second > a second disk is attached from shared storage via cinder. There are two > networks attached with the instance, one is vlan backed and other one is > geneve. > > When I try to perform its live migration, it fails. I see below errors in > libvirt logs and the same in nova-compute logs. > > Jan 19 15:56:54 host03.cloud.com libvirtd[731030]: Cannot start job > (query, none, none) for domain instance-00000819; current job is (none, > none, migration in) owned by (0 , 0 , 0 > remoteDispatchDomainMigratePrepare3Params (flags=0x9b)) for (0s, 0s, 61s) > Jan 19 15:56:54 host03.cloud.com libvirtd[731030]: Timed out during > operation: cannot acquire state change lock (held by > monitor=remoteDispatchDomainMigratePrepare3Params) > Jan 19 15:36:30 host03.cloud.com libvirtd[731030]: migration successfully > aborted > > Any advise how to fix it ? I am using below libvirt version. > > libvirt 6.0 > qemu-kvm 4.2 > kernel 5.4 > nova 24.0 > > Ammad > -- Regards, Syed Ammad Ali -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tkajinam at redhat.com Mon Jan 24 07:50:42 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Mon, 24 Jan 2022 16:50:42 +0900 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: <17e7f0f14fd.114ff118752948.3643237598290287986@ghanshyammann.com> References: <17e731cbfcc.e2463d54885770.5926410035050303741@ghanshyammann.com> <17e78ca4f43.11df902c0973397.2476553495581610085@ghanshyammann.com> <17e790f0a61.fa031a2a976064.4934623486848755469@ghanshyammann.com> <17e7f0f14fd.114ff118752948.3643237598290287986@ghanshyammann.com> Message-ID: On Sat, Jan 22, 2022 at 8:57 AM Ghanshyam Mann wrote: > ---- On Thu, 20 Jan 2022 14:41:00 -0600 Mark Goddard > wrote ---- > > On Thu, 20 Jan 2022 at 19:55, Ghanshyam Mann > wrote: > > > > > > ---- On Thu, 20 Jan 2022 13:36:53 -0600 Mark Goddard < > mark at stackhpc.com> wrote ---- > > > > On Thu, 20 Jan 2022 at 18:40, Ghanshyam Mann < > gmann at ghanshyammann.com> wrote: > > > > > > > > > > > > > > > ---- On Thu, 20 Jan 2022 03:35:33 -0600 Mark Goddard < > mark at stackhpc.com> wrote ---- > > > > > > On Wed, 19 Jan 2022 at 16:12, Ghanshyam Mann < > gmann at ghanshyammann.com> wrote: > > > > > > > > > > > > > > ---- On Wed, 19 Jan 2022 04:35:53 -0600 Mark Goddard < > mark at stackhpc.com> wrote ---- > > > > > > > > Hi, > > > > > > > > > > > > > > > > If you haven't been paying close attention, it would be > easy to miss > > > > > > > > some of the upcoming RBAC changes which will have an > impact on > > > > > > > > deployment projects. I thought I'd start a thread so > that we can share > > > > > > > > how we are approaching this, get answers to open > questions, and > > > > > > > > ideally all end up with a fairly consistent approach. > > > > > > > > > > > > > > > > The secure RBAC work has a long history, and continues > to evolve. > > > > > > > > According to [1], we should start to see some fairly > substantial > > > > > > > > changes over the next few releases. That spec is fairly > long, but > > > > > > > > worth a read. > > > > > > > > > > > > > > > > In the yoga timeline [2], there is one change in > particular that has > > > > > > > > an impact on deployment projects, "3. Keystone enforces > scope by > > > > > > > > default". After this change, all of the deprecated > policies that many > > > > > > > > still rely on in Keystone will be removed. > > > > > > > > > > > > > > > > In kolla-ansible, we have an etherpad [5] with some > notes, questions > > > > > > > > and half-baked plans. We made some changes in Xena [3] > to use system > > > > > > > > scope in some places when interacting with system APIs > in Ansible > > > > > > > > tasks. > > > > > > > > > > > > > > > > The next change we have staged is to add the service > role to all > > > > > > > > service users [4], in preparation for [2]. > > > > > > > > > > > > > > > > Question: should the role be added with system scope or > in the > > > > > > > > existing service project? The obvious main use for this > is token > > > > > > > > validation, which seems to allow system or project scope. > > > > > > > > > > > > > > > > We anticipate that some service users may still require > some > > > > > > > > project-scoped roles, e.g. when creating resources for > octavia. We'll > > > > > > > > deal with those on a case by case basis. > > > > > > > > > > > > > > Service roles are planned for phase2 which is Z release[1]. 
> The Idea here is > > > > > > > service to service communication will happen with 'service' > role (which keystone > > > > > > > need to implement yet) and end users will keep using the > what ever role > > > > > > > is default (or overridden in policy file) which can be > project or system scoped > > > > > > > depends on the APIs. > > > > > > > > > > > > > > So at the end service-service APIs policy default will > looks like > > > > > > > > > > > > > > '(role:admin and system:network and > project_id:%(project_id)s) or (role:service and project_name:service)' > > > > > > > > > > > > > > Say nova will use that service role to communicate to > cinder and cinder policy will pass > > > > > > > as service role is in OR in default policy. > > > > > > > > > > > > > > But let's see how they are going to be and if any > challenges when we will implement > > > > > > > it in Z cycle. > > > > > > > > > > > > I'm not 100% on our reasoning for using the service role in > yoga (I > > > > > > wasn't in the discussion when we made the switch, although > John > > > > > > Garbutt was), although I can provide at least one reason. > > > > > > > > > > > > Currently, we have a bunch of service users doing things like > keystone > > > > > > token validation using the admin role in the service project. > If we > > > > > > enforce scopes & new defaults in keystone, this will no > longer work, > > > > > > due to the default policy: > > > > > > > > > > > > identity:validate_token: (role:reader and system_scope:all) or > > > > > > rule:service_role or rule:token_subject > > > > > > > > > > > > Now we could go and assign system-reader to all these users, > but if > > > > > > the end goal is to give them all the service role, and that > allows > > > > > > token validation, then to me that seems like a better path. > > > > > > > > > > > > Currently, we're creating the service role during deploy & > upgrade, > > > > > > then assigning it to users. Keystone is supposed to create > the service > > > > > > role in yoga, so we can eventually drop that part. > > > > > > > > > > > > Does this seem reasonable? Is keystone still on track to > create the > > > > > > service role in yoga? > > > > > > > > > > I think this is a reasonable plan and once we have service roles > implemented > > > > > in keystone as well as in all the services to request other > service APIs then > > > > > deployment project (Kolla here) can update them from > system_reader to > > > > > actual service role. > > > > > > > > To be clear, I am proposing to skip system-reader, and go straight > to > > > > the service role in yoga. > > > > > > But that would not be doable until services implement service roles > which is > > > Yoga cycle target for keystone and Z cyle target for other projects. > Or you mean > > > to re-consider to target the service role for all projects also in > Yoga so that > > > deployment projects can go with service role directly? > > > > Our current plan is to add the service role to all service users in > > yoga. This will allow keystone token validation to work when keystone > > drops the deprecated policies. > > > > We will not remove the admin role from service users in the service > > project during yoga. This will allow projects other than keystone to > > continue to work as before. > > > > At some later point, we will remove the admin role from service users > > in the service project, hopefully relying on the service role for most > > service-service communication. 
There may be other roles we need to > > assign in order to drop admin, but we'll assess that as we go. > > > > Hopefully that's a bit more of a clear picture, and it seems sensible? > > +1, sounds good to me. Hopefully we will get in better shape by Z release > when all (or maximum) services will be migrated to new RBAC. Till than > your plan sounds reasonable. > > -gmann > I'll follow the same approach in Puppet OpenStack and will add the project-scoped 'service' role to each service user by default. IIUC This is consistent with the current devstack which assigns the project-scoped service role to each service user, so I expect this approach will be tested in dsvm jobs [1]. [1] https://github.com/openstack/devstack/blob/d5d0bed479497560489983ae1fc80444b44fe029/lib/keystone#L421 The same was already implemented in TripleO by [2] [2] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/819250 > > > > > > > > > -gmann > > > > > > > > > > > > > > > > > And yes that can be done for token validation as well as > > > > > the service-to-service API calls for example nova to cinder or > neutron to nova > > > > > APIs call. I do not think we can migrate everything (service > tokens) together for all > > > > > the services in deployment projects until all these services are > ready with the 'service' > > > > > role implementation (implementation means changing their default > roles > > > > > to add 'service' role for service-to-service APIs). > > > > > > > > > > Regarding the keystone track on service role work in Yoga or > not, I do not > > > > > have clear answer may be Lance or keystone team can answer it. > But Lance > > > > > has spec up[1] but not yet merged. > > > > > > > > > > [1] > https://review.opendev.org/c/openstack/keystone-specs/+/818616 > > > > > > > > > > -gmann > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In anticipation of keystone setting enforce_scope=True > and removing > > > > > > > > old default policies (which I assume effectively removes > > > > > > > > enforce_new_defaults?), we will set this in > kolla-ansible, and try to > > > > > > > > deal with any fallout. Hopefully the previous work will > make this > > > > > > > > minimal. > > > > > > > > > > > > > > > > How does that line up with other projects' approaches? > What have we missed? > > > > > > > > > > > > > > Yeah, we want users/deployment projects/horizon etc to use > the new policy from > > > > > > > keystone as first and we will see feedback how they are > (good, bad, really bad) from > > > > > > > usage perspective. Why we choose keystone is, because new > policy are there since > > > > > > > many cycle and ready to use. Other projects needs to work > their policy as per new > > > > > > > SRBAC design/direction (for example nova needs to modify > their policy before we ask > > > > > > > users to use new policy and work is under progress[2]). > > > > > > > > > > > > > > I think trying in kolla will be good way to know if we can > move to keystone's new policy > > > > > > > completely in yoga. > > > > > > > > > > > > We have a scope-enforcing preview patch [1], and it's passing > our base > > > > > > set of tests. I have another that triggers all of the jobs. 
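For readers who want to see what the scope enforcement being previewed actually toggles, the relevant oslo.policy switches in keystone.conf are the following. This is a sketch of the anticipated end state, not the content of the preview patch referenced below:

[oslo_policy]
# reject tokens whose scope does not match what the policy expects
enforce_scope = true
# use the new secure-RBAC defaults instead of the deprecated rules
enforce_new_defaults = true

Once keystone removes the deprecated policies entirely, the second option effectively stops mattering, which is the assumption Mark makes above.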
> > > > > > > > > > > > [1] > https://review.opendev.org/c/openstack/kolla-ansible/+/825406 > > > > > > > > > > > > > > [1] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#z-release-timeline > > > > > > > [2] > https://blueprints.launchpad.net/nova/+spec/policy-defaults-refresh-2 > > > > > > > > > > > > > > -gmann > > > > > > > > > > > > > > > > > > > > > > > Mark > > > > > > > > > > > > > > > > [1] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > > > > > > > > [2] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > > > > > > > > [3] > https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > > > > > > > > [4] > https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > > > > > > > > [5] > https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From munnaeebd at gmail.com Mon Jan 24 08:09:10 2022 From: munnaeebd at gmail.com (Md. Hejbul Tawhid MUNNA) Date: Mon, 24 Jan 2022 14:09:10 +0600 Subject: Neutron issue || Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached Message-ID: Hi, Suddenly we have observed few VM down . then we have found some agent are getting down (XXX) , agents are getting UP and down randomly. Please check the attachment. //////////////////////////////////////////////////////////////////////////////////////////////////// /sqlalchemy/pool.py", line 788, in _checkout\n fairy = _ConnectionRecord.checkout(pool)\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 532, in checkout\n rec = pool._do_get()\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1186, in _do_get\n (self.size(), self.overflow(), self._timeout), code="3o7r")\n', u'TimeoutError: QueuePool limit of size 5 overflow 50 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/3o7r)\n']. 
2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent Traceback (most recent call last): 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 837, in _report_state 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent True) 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 97, in report_state 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent return method(context, 'report_state', **kwargs) 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, in call 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent retry=self.retry) 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, in _send 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent retry=retry) 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 645, in send 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent call_monitor_timeout, retry=retry) 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 636, in _send 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent raise result 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent RemoteError: Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/3o7r) 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming\n res = self.dispatcher.dispatch(message)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch\n ///////////////////////////////////////////////////////////////////////////////// Is there anything related with the following default configuration. /etc/neutron/neutron.conf #max_pool_size = 5 #max_overflow = 50 regards, Munna -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Jan 24 08:43:46 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 24 Jan 2022 09:43:46 +0100 Subject: Neutron issue || Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached In-Reply-To: References: Message-ID: <4248952.ejJDZkT8p0@p1> Hi, On poniedzia?ek, 24 stycznia 2022 09:09:10 CET Md. Hejbul Tawhid MUNNA wrote: > Hi, > > Suddenly we have observed few VM down . then we have found some agent are > getting down (XXX) , agents are getting UP and down randomly. Please check > the attachment. 
> > ///////////////////////////////////////////////////////////////////////////// > /////////////////////// /sqlalchemy/pool.py", line 788, in _checkout\n > fairy = > _ConnectionRecord.checkout(pool)\n', u' File > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 532, in > checkout\n rec = pool._do_get()\n', u' File > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1186, in > _do_get\n (self.size(), self.overflow(), self._timeout), > code="3o7r")\n', u'TimeoutError: QueuePool limit of size 5 overflow 50 > reached, connection timed out, timeout 30 (Background on this error at: > http://sqlalche.me/e/3o7r)\n']. > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent Traceback (most > recent call last): > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 837, in > _report_state > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent True) > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 97, in > report_state > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent return > method(context, 'report_state', **kwargs) > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, > in call > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent > retry=self.retry) > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, > in _send > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent retry=retry) > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", > line 645, in send > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent > call_monitor_timeout, retry=retry) > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", > line 636, in _send > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent raise result > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent RemoteError: > Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached, > connection timed out, timeout 30 (Background on this error at: > http://sqlalche.me/e/3o7r) > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent [u'Traceback > (most recent call last):\n', u' File > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, > in _process_incoming\n res = self.dispatcher.dispatch(message)\n', u' > File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 265, in dispatch\n return self._do_dispatch(endpoint, method, ctxt, > args)\n', u' File > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line > 194, in _do_dispatch\n > > ///////////////////////////////////////////////////////////////////////////// > //// > > Is there anything related with the following default configuration. > > /etc/neutron/neutron.conf > #max_pool_size = 5 > #max_overflow = 50 Yes. You probably have busy environment and You need to increase those values to have more connections from the neutron server to the database. > > regards, > Munna -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From munnaeebd at gmail.com Mon Jan 24 09:42:00 2022 From: munnaeebd at gmail.com (Md. Hejbul Tawhid MUNNA) Date: Mon, 24 Jan 2022 15:42:00 +0600 Subject: Neutron issue || Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached In-Reply-To: <4248952.ejJDZkT8p0@p1> References: <4248952.ejJDZkT8p0@p1> Message-ID: Hi, Currently we have running 500+VM and total network is 383 including HA-network. Can you advice the appropriate value and is there any chance of service impact? Should we change the configuration in the neutron.conf on controller node? Regards, Munna On Mon, Jan 24, 2022 at 2:47 PM Slawek Kaplonski wrote: > Hi, > > On poniedzia?ek, 24 stycznia 2022 09:09:10 CET Md. Hejbul Tawhid MUNNA > wrote: > > Hi, > > > > Suddenly we have observed few VM down . then we have found some agent are > > getting down (XXX) , agents are getting UP and down randomly. Please > check > > the attachment. > > > > > > ///////////////////////////////////////////////////////////////////////////// > > /////////////////////// /sqlalchemy/pool.py", line 788, in _checkout\n > > fairy = > > _ConnectionRecord.checkout(pool)\n', u' File > > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 532, in > > checkout\n rec = pool._do_get()\n', u' File > > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1186, in > > _do_get\n (self.size(), self.overflow(), self._timeout), > > code="3o7r")\n', u'TimeoutError: QueuePool limit of size 5 overflow 50 > > reached, connection timed out, timeout 30 (Background on this error at: > > http://sqlalche.me/e/3o7r)\n']. > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent Traceback > (most > > recent call last): > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > > "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 837, > in > > _report_state > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent True) > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > > "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 97, in > > report_state > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent return > > method(context, 'report_state', **kwargs) > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line > 179, > > in call > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent > > retry=self.retry) > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > > "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, > > in _send > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent > retry=retry) > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > > "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", > > line 645, in send > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent > > call_monitor_timeout, retry=retry) > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File > > "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", > > line 636, in _send > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent raise > result > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent RemoteError: > > Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached, > > connection timed out, timeout 30 (Background on this 
error at: > > http://sqlalche.me/e/3o7r) > > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent [u'Traceback > > (most recent call last):\n', u' File > > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line > 163, > > in _process_incoming\n res = self.dispatcher.dispatch(message)\n', u' > > File > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > > line 265, in dispatch\n return self._do_dispatch(endpoint, method, > ctxt, > > args)\n', u' File > > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line > > 194, in _do_dispatch\n > > > > > > ///////////////////////////////////////////////////////////////////////////// > > //// > > > > Is there anything related with the following default configuration. > > > > /etc/neutron/neutron.conf > > #max_pool_size = 5 > > #max_overflow = 50 > > Yes. You probably have busy environment and You need to increase those > values > to have more connections from the neutron server to the database. > > > > > regards, > > Munna > > > > -- > Slawek Kaplonski > Principal Software Engineer > Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.rohmann at inovex.de Mon Jan 24 09:59:06 2022 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Mon, 24 Jan 2022 10:59:06 +0100 Subject: [ops][nova][victoria] Power State = Suspended? In-Reply-To: References: <0670B960225633449A24709C291A52525125F40D@COM01.performair.local> <9e5c73c2a2b9513c8c44c19f6d370411c016b169.camel@redhat.com> Message-ID: Hey Mohammed, thanks for the input! On 21/01/2022 12:48, Mohammed Naser wrote: > Sorry to hijack such an old thread. Looking into these features, I was > just wondering if it was possible to: > > > ? 1) Disable the support for pause / suspend altogether and not > allow anyone to place instances in such states? > > you can use policy to disable suspending vms via the api Good point, thanks. > ? 2) Change the storage location of the saved guest RAM to a > shared storage to allow the instance to be migrated while being > suspended/paused. As far as I can see currently this data is saved > on the host disk. > > you can mount the path where things get saved at where ever you want > (I think it?s somewhere inside /var/lib/nova/instances) That's true, but this would require some multi-mountable shared storage like CephFS or some NFS to remove the dependency from a single node. It's not like Nova would store this data as e.g. a RBD image in Ceph via some config option, right? Regards Christian -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnaud.morin at gmail.com Mon Jan 24 12:29:47 2022 From: arnaud.morin at gmail.com (Arnaud) Date: Mon, 24 Jan 2022 13:29:47 +0100 Subject: =?US-ASCII?Q?Re=3A_Neutron_issue_=7C=7C_Remote_error=3A_TimeoutErro?= =?US-ASCII?Q?r_QueuePool_limit_of_size_5_overflow_50_reached?= In-Reply-To: References: <4248952.ejJDZkT8p0@p1> Message-ID: Hi, I would also consider checking the number of RPC workers you have in neutron.conf, this is maybe a better option to increase this before the comnection pool params. Also, check your database, is it under load? Updating agent state should not be long. Cheers, Arnaud Le 24 janvier 2022 10:42:00 GMT+01:00, "Md. Hejbul Tawhid MUNNA" a ?crit?: >Hi, > >Currently we have running 500+VM and total network is 383 including >HA-network. > >Can you advice the appropriate value and is there any chance of service >impact? 
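Pulling together the tuning advice in this thread, the knobs under discussion all live in neutron.conf on the hosts running neutron-server. The numbers below are purely illustrative of the try-and-observe approach suggested here, not recommended values for any particular deployment:

[DEFAULT]
# more workers to handle RPC calls and agent state reports
rpc_workers = 4
rpc_state_report_workers = 4

[database]
# more SQLAlchemy connections per worker before QueuePool errors appear
max_pool_size = 10
max_overflow = 100

Raising these multiplies the total number of database connections neutron-server can open, so the database's own connection limit needs headroom as well, and neutron-server has to be restarted for the changes to take effect.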
> >Should we change the configuration in the neutron.conf on controller node? > >Regards, >Munna > > > >On Mon, Jan 24, 2022 at 2:47 PM Slawek Kaplonski >wrote: > >> Hi, >> >> On poniedzia?ek, 24 stycznia 2022 09:09:10 CET Md. Hejbul Tawhid MUNNA >> wrote: >> > Hi, >> > >> > Suddenly we have observed few VM down . then we have found some agent are >> > getting down (XXX) , agents are getting UP and down randomly. Please >> check >> > the attachment. >> > >> > >> >> ///////////////////////////////////////////////////////////////////////////// >> > /////////////////////// /sqlalchemy/pool.py", line 788, in _checkout\n >> > fairy = >> > _ConnectionRecord.checkout(pool)\n', u' File >> > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 532, in >> > checkout\n rec = pool._do_get()\n', u' File >> > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1186, in >> > _do_get\n (self.size(), self.overflow(), self._timeout), >> > code="3o7r")\n', u'TimeoutError: QueuePool limit of size 5 overflow 50 >> > reached, connection timed out, timeout 30 (Background on this error at: >> > http://sqlalche.me/e/3o7r)\n']. >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent Traceback >> (most >> > recent call last): >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >> > "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 837, >> in >> > _report_state >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent True) >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >> > "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 97, in >> > report_state >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent return >> > method(context, 'report_state', **kwargs) >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line >> 179, >> > in call >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >> > retry=self.retry) >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >> > "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, >> > in _send >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >> retry=retry) >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >> > "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", >> > line 645, in send >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >> > call_monitor_timeout, retry=retry) >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >> > "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", >> > line 636, in _send >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent raise >> result >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent RemoteError: >> > Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached, >> > connection timed out, timeout 30 (Background on this error at: >> > http://sqlalche.me/e/3o7r) >> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent [u'Traceback >> > (most recent call last):\n', u' File >> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line >> 163, >> > in _process_incoming\n res = self.dispatcher.dispatch(message)\n', u' >> > File >> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", >> > line 265, in dispatch\n return self._do_dispatch(endpoint, method, >> ctxt, >> > args)\n', u' File >> > 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line >> > 194, in _do_dispatch\n >> > >> > >> >> ///////////////////////////////////////////////////////////////////////////// >> > //// >> > >> > Is there anything related with the following default configuration. >> > >> > /etc/neutron/neutron.conf >> > #max_pool_size = 5 >> > #max_overflow = 50 >> >> Yes. You probably have busy environment and You need to increase those >> values >> to have more connections from the neutron server to the database. >> >> > >> > regards, >> > Munna >> >> >> >> -- >> Slawek Kaplonski >> Principal Software Engineer >> Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Jan 24 13:44:22 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 24 Jan 2022 13:44:22 +0000 Subject: [ops][nova][victoria] Power State = Suspended? In-Reply-To: References: <0670B960225633449A24709C291A52525125F40D@COM01.performair.local> <9e5c73c2a2b9513c8c44c19f6d370411c016b169.camel@redhat.com> Message-ID: <08c41f9d73d137d2473831ac9a297ff248591a3f.camel@redhat.com> On Mon, 2022-01-24 at 10:59 +0100, Christian Rohmann wrote: > Hey Mohammed, > > > thanks for the input! > > On 21/01/2022 12:48, Mohammed Naser wrote: > > Sorry to hijack such an old thread. Looking into these features, I was > > just wondering if it was possible to: > > > > > > ? 1) Disable the support for pause / suspend altogether and not > > allow anyone to place instances in such states? > > > > you can use policy to disable suspending vms via the api > > Good point, thanks. > > > > ? 2) Change the storage location of the saved guest RAM to a > > shared storage to allow the instance to be migrated while being > > suspended/paused. As far as I can see currently this data is saved > > on the host disk. > > > > you can mount the path where things get saved at where ever you want > > (I think it?s somewhere inside /var/lib/nova/instances) > > > That's true, but this would require some multi-mountable shared storage > like CephFS or some NFS to remove the dependency from a single node. > It's not like Nova would store this data as e.g. a RBD image in Ceph via > some config option, right? the instance state dir contence no. so wehn you use the rbd images backend and you suspend the ram is saved to disk on the local system i could see adding a feature to nova to poteally upload that as an addtional rbd image to ceph. or maybe store it itn swift or something like that but nova cannot do it today. you could certenly put the instance state dir on a cephfs share or nfs we dont realy like dealing with shared storage for the state dir but the generic code we have for nfs should work with cephfs. just be aware de dont currently test that but i dont see why we could not extned the ceph job to mount /var/lib/nova on cephfs to get coverage in the ci. in the short term i agree that using a cephfs mount is likely the best way to avoid the guest ram from beign stored on the comptue node disk. in terms fo disabling suspend the simplelst way to do that is to alter the policy.json and make that either admin only or preferably require a new role then just dont give that role to your tenants. > > > > Regards > > > Christian > > From smooney at redhat.com Mon Jan 24 13:48:53 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 24 Jan 2022 13:48:53 +0000 Subject: [ops][nova][victoria] Power State = Suspended? 
In-Reply-To: <08c41f9d73d137d2473831ac9a297ff248591a3f.camel@redhat.com> References: <0670B960225633449A24709C291A52525125F40D@COM01.performair.local> <9e5c73c2a2b9513c8c44c19f6d370411c016b169.camel@redhat.com> <08c41f9d73d137d2473831ac9a297ff248591a3f.camel@redhat.com> Message-ID: <77bb0088fd1afdb46ca086e0cd2e6de6433a7201.camel@redhat.com> On Mon, 2022-01-24 at 13:44 +0000, Sean Mooney wrote: > On Mon, 2022-01-24 at 10:59 +0100, Christian Rohmann wrote: > > Hey Mohammed, > > > > > > thanks for the input! > > > > On 21/01/2022 12:48, Mohammed Naser wrote: > > > Sorry to hijack such an old thread. Looking into these features, I was > > > just wondering if it was possible to: > > > > > > > > > ? 1) Disable the support for pause / suspend altogether and not > > > allow anyone to place instances in such states? > > > > > > you can use policy to disable suspending vms via the api > > > > Good point, thanks. > > > > > > > ? 2) Change the storage location of the saved guest RAM to a > > > shared storage to allow the instance to be migrated while being > > > suspended/paused. As far as I can see currently this data is saved > > > on the host disk. > > > > > > you can mount the path where things get saved at where ever you want > > > (I think it?s somewhere inside /var/lib/nova/instances) > > > > > > That's true, but this would require some multi-mountable shared storage > > like CephFS or some NFS to remove the dependency from a single node. > > It's not like Nova would store this data as e.g. a RBD image in Ceph via > > some config option, right? > > the instance state dir contence no. > so wehn you use the rbd images backend and you suspend the ram is saved to disk on the local system > i could see adding a feature to nova to poteally upload that as an addtional rbd image to ceph. > or maybe store it itn swift or something like that but nova cannot do it today. > > you could certenly put the instance state dir on a cephfs share or nfs > we dont realy like dealing with shared storage for the state dir but the generic code we have for nfs > should work with cephfs. just be aware de dont currently test that but i dont see why we could not > extned the ceph job to mount /var/lib/nova on cephfs to get coverage in the ci. > > > in the short term i agree that using a cephfs mount is likely the best way to avoid the guest ram from beign stored > on the comptue node disk. in terms fo disabling suspend the simplelst way to do that is to alter the policy.json > and make that either admin only or preferably require a new role then just dont give that role to your tenants. the polices in question are defined here https://github.com/openstack/nova/blob/master/nova/policies/suspend_server.py so you woudl override os_compute_api:os-suspend-server:suspend with check_str='is_admin:True' or check_str='project_id:%(project_id)s and role:suspend' it really depends on if you want to still allow suspend for some user or just admins if you want to block even admins then set check_str='!' that will alwasys reject the request. > > > > > > > > > Regards > > > > > > Christian > > > > > From artem.goncharov at gmail.com Mon Jan 24 14:31:14 2022 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Mon, 24 Jan 2022 15:31:14 +0100 Subject: [sdk] R1 branch merged into master Message-ID: <3B224E71-A9CE-4208-9AAB-A874FAE3BE70@gmail.com> Hi everybody, Just wanted to give a hint that feature/r1 branch of the OpenStackSDK is now merged into the master. 
While most of the challenging things are now completed the work is not finished yet, but will soon be. Most important: most of the work is backward compatible, but few things were not. I took care on keeping things working and testing, but there might be side effects that I have not expected. Please let me know if jobs in your projects start failing. Regards, Artem From aschultz at redhat.com Mon Jan 24 15:14:54 2022 From: aschultz at redhat.com (Alex Schultz) Date: Mon, 24 Jan 2022 08:14:54 -0700 Subject: [puppet][tc] Propose changing the release model to cycle-with-rc In-Reply-To: References: Message-ID: On Sun, Jan 23, 2022 at 6:58 PM Takashi Kajinami wrote: > > Hello, > > > I already discussed this with a few cores last year but I'm raising this topic > in ML to make an official decision. > > Currently Puppet OpenStack is following cycle-with-intermediary and > creates a release every milestone. However our code is tightly related > to the actual service implementations and having only puppet releases > is not very useful. > > Considering the above point, and effort required to cut off releases per milestone, > I'll propose changing our release model to cycle-with-rc , and creating a single release. > > Because we already created milestone releases for Yoga, I'm thinking of applying > the change from next cycle(Z). > Works for me. > Please let me know if you have any concerns. > > Thank you, > Takashi From lokendrarathour at gmail.com Mon Jan 24 07:31:42 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Mon, 24 Jan 2022 13:01:42 +0530 Subject: [TripleO] Horizon login failed with Something went wrong error in IPv6 In-Reply-To: References: Message-ID: Hi Harald, Thanks for the response. we have CLI working perfectly. For Query: "Did you set the parameter MemcachedIPv6 to true in your environment files?" we have passed this in our environment defaults, so yes this is getting passed. About the Deployment: 3 Node Controller + 1 Compute for the Triple of Stein. it is working perfectly in the ipv4 segment but when we change the Networking to ipv6 we start seeing this error as reported. Please find the attached : 1. network_data.yaml 2. roles_data.yaml using these YAML files we are rendering the remaining config files and are using bond-with-vlan configs. 
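To follow the advice above about monitoring how many connections the database allows and is actually serving, a quick check on a MySQL/MariaDB backend could look like this. These are standard status queries, not commands taken from the thread:

# connections open right now
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"
# high-water mark since the server started
mysql -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections';"
# the server-side ceiling that neutron-server's pool settings must stay under
mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_connections';"

If Max_used_connections is already close to max_connections, raising the neutron pool sizes alone will just move the failure into the database.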
Also for your reference please find the keystone end point details: (overcloud) [stack at undercloud ~]$ openstack endpoint list | grep "keystone" | 5ab65d3322654c3eaa9a868c547eec1e | regionOne | keystone | identity | True | public | http://[fd00:fd00:fd00:9900::351]:5000 | | d31bd34d0ac942a59eb086c4a5d3079f | regionOne | keystone | identity | True | internal | http://[fd00:fd00:fd00:2000::7d]:5000 | | e27f7195a48045d0ab14ceb065c67512 | regionOne | keystone | identity | True | admin | http://10.10.0.213:35357 # sudo pcs status : [heat-admin at overcloud-controller-2 ~]$ sudo pcs status Cluster name: tripleo_cluster Cluster Summary: * Stack: corosync * Current DC: overcloud-controller-1 (version 2.1.2-1.el8-ada5c3b36e2) - partition with quorum * Last updated: Mon Jan 24 12:39:04 2022 * Last change: Mon Jan 24 11:20:45 2022 by root via cibadmin on overcloud-controller-0 * 15 nodes configured * 46 resource instances configured Node List: * Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ] * GuestOnline: [ galera-bundle-0 at overcloud-controller-2 galera-bundle-1 at overcloud-controller-0 galera-bundle-2 at overcloud-controller-1 ovn-dbs-bundle-0 at overcloud-controller-2 ovn-dbs-bundle-1 at overcloud-controller-0 ovn-dbs-bundle-2 at overcloud-controller-1 rabbitmq-bundle-0 at overcloud-controller-2 rabbitmq-bundle-1 at overcloud-controller-0 rabbitmq-bundle-2 at overcloud-controller-1 redis-bundle-0 at overcloud-controller-2 redis-bundle-1 at overcloud-controller-0 redis-bundle-2 at overcloud-controller-1 ] Full List of Resources: * ip-10.10.30.213 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 * ip-fd00.fd00.fd00.9900..351 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1 * ip-fd00.fd00.fd00.2000..3a0 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 * ip-fd00.fd00.fd00.2000..7d (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 * ip-fd80.fd00.fd00.2000..6d (ocf::heartbeat:IPaddr2): Started overcloud-controller-1 * Container bundle set: haproxy-bundle [cluster.common.tag/centos-binary-haproxy:pcmklatest]: * haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started overcloud-controller-2 * haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started overcloud-controller-0 * haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started overcloud-controller-1 * Container bundle set: galera-bundle [cluster.common.tag/centos-binary-mariadb:pcmklatest]: * galera-bundle-0 (ocf::heartbeat:galera): Master overcloud-controller-2 * galera-bundle-1 (ocf::heartbeat:galera): Master overcloud-controller-0 * galera-bundle-2 (ocf::heartbeat:galera): Master overcloud-controller-1 * Container bundle set: rabbitmq-bundle [cluster.common.tag/centos-binary-rabbitmq:pcmklatest]: * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-2 * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-0 * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-1 * Container bundle set: redis-bundle [cluster.common.tag/centos-binary-redis:pcmklatest]: * redis-bundle-0 (ocf::heartbeat:redis): Master overcloud-controller-2 * redis-bundle-1 (ocf::heartbeat:redis): Slave overcloud-controller-0 * redis-bundle-2 (ocf::heartbeat:redis): Slave overcloud-controller-1 * Container bundle set: ovn-dbs-bundle [cluster.common.tag/centos-binary-ovn-northd:pcmklatest]: * ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master overcloud-controller-2 * ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave 
overcloud-controller-0 * ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave overcloud-controller-1 * ip-fd00.fd00.fd00.2000..ea (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 * Container bundle: openstack-cinder-volume [cluster.common.tag/centos-binary-cinder-volume:pcmklatest]: * openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Started overcloud-controller-0 Failed Resource Actions: * ovndb_servers_monitor_30000 on ovn-dbs-bundle-2 'not running' (7): call=23, status='complete', last-rc-change='Mon Jan 24 11:17:07 2022', queued=0ms, exec=0ms Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled please check and let me know in case it can help us somehow. On Fri, Jan 21, 2022 at 10:44 PM Harald Jensas wrote: > On 1/21/22 10:34, Anirudh Gupta wrote: > > Hi Team, > > > > We are trying to deploy the Tripleo Train with IPv6. > > All the overcloud control plane networks - internal, management etc are > > also on the IPv6 subnet. > > > > Upon successful completion of overcloud, when I am trying to open the > > page, it does open. > > But when I enter the correct login credentials, it says something went > > wrong. > > > > image.png > > > > > > Upon looking into error logs, I found > > 2022-01-21 12:33:34.825 324 WARNING keystone.server.flask.application > > [req-0660e62c-dff7-4609-89fe-225e177a84f8 > > f908417368a24cc685818bb5fc54fe12 - - default -] *Authorization failed. > > The request you have made requires authentication. from > > fd00:fd00:fd00:2000::359: keystone.exception.Unauthorized: The request > > you have made requires authentication.*** > > * > > * > > where *fd00:fd00:fd00:2000::359 is my internal IP address which is > > reachable.* > > > > Regards > > Anirudh Gupta > > Can you share more details from the deployment? > Maby open a bug in Launcpad against TripleO and attach logs, templates > used for deployment, and config files for Horizon? > > Did you set the parameter MemcachedIPv6 to true in your environment files? > > Does the CLI work? > > > Regards, > Harald > > > -- ~ Lokendra www.inertiaspeaks.com www.inertiagroups.com skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: network_data.yaml Type: application/octet-stream Size: 7208 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: roles_data.yaml Type: application/octet-stream Size: 11626 bytes Desc: not available URL: From gouthampravi at gmail.com Mon Jan 24 17:10:26 2022 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 24 Jan 2022 09:10:26 -0800 Subject: [sdk] R1 branch merged into master In-Reply-To: <3B224E71-A9CE-4208-9AAB-A874FAE3BE70@gmail.com> References: <3B224E71-A9CE-4208-9AAB-A874FAE3BE70@gmail.com> Message-ID: On Mon, Jan 24, 2022 at 6:38 AM Artem Goncharov wrote: > > Hi everybody, > > Just wanted to give a hint that feature/r1 branch of the OpenStackSDK is now merged into the master. While most of the challenging things are now completed the work is not finished yet, but will soon be. > Thanks for the heads up Artem. So will this remaining work continue to happen on the feature/r1 branch? I've a bunch of featureful/open reviews on feature/r1 - do they still need to merge into feature/r1 or would you prefer them proposed against the main branch? > Most important: most of the work is backward compatible, but few things were not. 
I took care on keeping things working and testing, but there might be side effects that I have not expected. Please let me know if jobs in your projects start failing. > > Regards, > Artem From artem.goncharov at gmail.com Mon Jan 24 17:13:27 2022 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Mon, 24 Jan 2022 18:13:27 +0100 Subject: [sdk] R1 branch merged into master In-Reply-To: References: <3B224E71-A9CE-4208-9AAB-A874FAE3BE70@gmail.com> Message-ID: <93A46FF9-D438-4240-9B60-C5B744631450@gmail.com> > On 24. Jan 2022, at 18:10, Goutham Pacha Ravi wrote: > > On Mon, Jan 24, 2022 at 6:38 AM Artem Goncharov > wrote: >> >> Hi everybody, >> >> Just wanted to give a hint that feature/r1 branch of the OpenStackSDK is now merged into the master. While most of the challenging things are now completed the work is not finished yet, but will soon be. >> > > Thanks for the heads up Artem. So will this remaining work continue to > happen on the feature/r1 branch? I've a bunch of featureful/open > reviews on feature/r1 - do they still need to merge into feature/r1 or > would you prefer them proposed against the main branch? > That is exactly what I meant - we have some leftovers in r1 and we need to work on finishing that. Everything new should be started on master and open changes of r1 need to be finalised. Then we do another round of sync r1 => master. > >> Most important: most of the work is backward compatible, but few things were not. I took care on keeping things working and testing, but there might be side effects that I have not expected. Please let me know if jobs in your projects start failing. >> >> Regards, >> Artem From kennelson11 at gmail.com Mon Jan 24 19:10:33 2022 From: kennelson11 at gmail.com (Kendall Nelson) Date: Mon, 24 Jan 2022 11:10:33 -0800 Subject: [PTG] April 2022 Virtual PTG Dates & Registration Message-ID: Hello Everyone! We're happy to announce the next virtual PTG[1] will take place April 4-8, 2022! Registration is now open[2]. The virtual PTG is free to attend, but make sure to register so you receive important communications like schedules, passwords, and other relevant updates. Next week, keep an eye out for info regarding team sign-ups. Can't wait to see you all there! -Kendall (diablo_rojo) [1] https://www.openstack.org/ptg/ [2] https://openinfra-ptg.eventbrite.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From felipe.reyes at canonical.com Mon Jan 24 20:30:14 2022 From: felipe.reyes at canonical.com (Felipe Reyes) Date: Mon, 24 Jan 2022 17:30:14 -0300 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: Message-ID: Hi Mark and all, On Wed, 2022-01-19 at 10:35 +0000, Mark Goddard wrote: > Hi, > > If you haven't been paying close attention, it would be easy to miss > some of the upcoming RBAC changes which will have an impact on > deployment projects. I thought I'd start a thread so that we can share > how we are approaching this, get answers to open questions, and > ideally all end up with a fairly consistent approach. Thanks for highlighting this. In the Charms we are evaluating what needs to be changed. On a first pass these are the changes needed that were identified, it's an etherpad that's still evolving though :-) https://etherpad.opendev.org/p/charms-secure-rbac > > The secure RBAC work has a long history, and continues to evolve. > According to [1], we should start to see some fairly substantial > changes over the next few releases. 
That spec is fairly long, but > worth a read. > > In the yoga timeline [2], there is one change in particular that has > an impact on deployment projects, "3. Keystone enforces scope by > default". After this change, all of the deprecated policies that many > still rely on in Keystone will be removed. > > In kolla-ansible, we have an etherpad [5] with some notes, questions > and half-baked plans. We made some changes in Xena [3] to use system > scope in some places when interacting with system APIs in Ansible > tasks. > > The next change we have staged is to add the service role to all > service users [4], in preparation for [2]. > > Question: should the role be added with system scope or in the > existing service project? The obvious main use for this is token > validation, which seems to allow system or project scope. > > We anticipate that some service users may still require some > project-scoped roles, e.g. when creating resources for octavia. We'll > deal with those on a case by case basis. > > In anticipation of keystone setting enforce_scope=True and removing > old default policies (which I assume effectively removes > enforce_new_defaults?), we will set this in kolla-ansible, and try to > deal with any fallout. Hopefully the previous work will make this > minimal. > > How does that line up with other projects' approaches? What have we > missed? > > Mark > > [1] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst > [2] > https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 > [3] > https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 > [4] https://review.opendev.org/c/openstack/kolla-ansible/+/815577 > [5] > https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible > -- Felipe Reyes Software Engineer @ Canonical # Email: felipe.reyes at canonical.com (GPG:0x9B1FFF39) # Launchpad: ~freyes | IRC: freyes From gmann at ghanshyammann.com Mon Jan 24 21:58:54 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 24 Jan 2022 15:58:54 -0600 Subject: [all][tc] Technical Committee next weekly meeting on Jan 27th at 1500 UTC Message-ID: <17e8e195dd4.1181e0a8f53874.3883263648236970911@ghanshyammann.com> Hello Everyone, Technical Committee's next weekly meeting is scheduled for Jan 27th at 1500 UTC. If you would like to add topics for discussion, please add them to the below wiki page by Wednesday, Jan 26th, at 2100 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting -gmann From zhangbailin at inspur.com Tue Jan 25 01:21:06 2022 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Tue, 25 Jan 2022 01:21:06 +0000 Subject: =?gb2312?B?tPC4tDogW2N5Ym9yZ10gUHJvcG9zaW5nIGNvcmUgcmV2aWV3ZXJz?= Message-ID: > ??????: Brin Zhang(??????) > ????????: 2022??1??17?? 9:13 > ??????: 'openstack-discuss at lists.openstack.org' > ????: 'xin-ran.wang at intel.com' ; Alex Song (??????) 
; 'huangzhipeng at huawei.com' ; 'liliueecg at gmail.com' ; 'shogo.saito.ac at hco.ntt.co.jp' ; 'sundar.nadathur at intel.com' ; 'yumeng_bao at yahoo.com' ; 'chen.ke14 at zte.com.cn' ; '419546439 at qq.com' <419546439 at qq.com>; 'shaohe.feng at intel.com' ; 'wangzhengh at chinatelecom.cn' ; 'zhuli2317 at gmail.com' >????: [cyborg] Proposing core reviewers > Hello all, > Eric xie has been actively contributing to Cyborg in various areas, adding new features, improving quality, reviewing patches. Despite the relatively short time, he has been one of the most prolific contributors, > and brings an enthusiastic and active mindset. I would like to thank and acknowledge him for his steady valuable contributions, and propose him as a core reviewer for Cyborg. fwiw, saw no negative votes, so added Eric to cyborg-core group. Welcome Eric on board and thanks for helping the community ! > Some of the currently listed core reviewers have not been participating for a lengthy period of time. It is proposed that those who have had no contributions for the past 18 months > ?C i.e. no participation in meetings, no code contributions, not participating in Cyborg open source activities and no reviews ?C be removed from the list of core reviewers. > -- The Cyborg team recognizes everyone's contributions, but we need to ensure the activity of the core-reviewer list. If you are interested in rejoining the cyborg team, feel free to ping us to restore the core reviewer for you. > If no objections are made known by January 24, I will make the changes proposed above.. > Regards, > Brin Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: From gagehugo at gmail.com Tue Jan 25 02:20:44 2022 From: gagehugo at gmail.com (Gage Hugo) Date: Mon, 24 Jan 2022 20:20:44 -0600 Subject: [openstack-helm] No meeting tomorrow Message-ID: Hey team, Since there's nothing on the agenda for tomorrow's meeting, it has been cancelled. We will meet again next week. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccamacho at redhat.com Tue Jan 25 06:42:54 2022 From: ccamacho at redhat.com (Carlos Camacho Gonzalez) Date: Tue, 25 Jan 2022 07:42:54 +0100 Subject: [TripleO] Proposing Juan Badia Payno for TripleO core reviewer. In-Reply-To: References: Message-ID: Thanks Juan for your effort! On Fri, Jan 14, 2022 at 4:11 PM Alex Schultz wrote: > +1 > > On Wed, Jan 12, 2022 at 6:16 AM Carlos Camacho Gonzalez > wrote: > > > > Hi everyone! > > > > I'd like to propose Juan Badia Paino [1][2][3] as a core reviewer on the > TripleO repositories that are or might be related to the backup and restore > efforts (openstack/tripleo-ci, openstack/tripleo-ansible, > openstack/python-tripleoclient, openstack/tripleo-quickstart-extras, > openstack/tripleo-quickstart). > > > > Juan has been around since 2016 making useful contributions and code > reviews to the community and I believe adding him to our core reviewer > group will help us improve the review and coding speed for the backup and > restore codebase. > > > > As usual, consider this email as an initial +1 from my side, I will keep > an eye on this thread for a week, and based on your feedback and if there > are no objections I will add him as a core reviewer in two weeks. 
> > > > [1]: https://review.opendev.org/q/owner:jbadiapa%2540redhat.com > > [2]: > https://www.stackalytics.io/?project_type=all&metric=commits&user_id=jbadiapa&release=all > > [3]: > https://www.stackalytics.io/report/contribution?module=tripleo-group&project_type=openstack&days=180 > > > > Cheers, > > Carlos. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vikarnatathe at gmail.com Tue Jan 25 07:37:16 2022 From: vikarnatathe at gmail.com (Vikarna Tathe) Date: Tue, 25 Jan 2022 13:07:16 +0530 Subject: Openstack+Magnum for edge Message-ID: Hello All, Is there a solution available for OpenStack controllers at a central location and computes at the edge (far from central)? If not what's the alternative? Any help would be appreciated. Thanks, Vikarna -------------- next part -------------- An HTML attachment was scrubbed... URL: From dbengt at redhat.com Tue Jan 25 08:41:26 2022 From: dbengt at redhat.com (Daniel Mats Niklas Bengtsson) Date: Tue, 25 Jan 2022 09:41:26 +0100 Subject: [Oslo] First IRC meeting of the year. Message-ID: Hi there, Yesterday we had the first oslo IRC meeting. You can find the log[1] on the meeting. Sorry I forget the ping and I confused the hour, but I will do better job next time. [1] https://meetings.opendev.org/meetings/oslo/2022/ From munnaeebd at gmail.com Tue Jan 25 08:52:26 2022 From: munnaeebd at gmail.com (Md. Hejbul Tawhid MUNNA) Date: Tue, 25 Jan 2022 14:52:26 +0600 Subject: Neutron issue || Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached In-Reply-To: References: <4248952.ejJDZkT8p0@p1> Message-ID: Hello Arnaud, Thank you for your valuable reply. we did not modify default config of RPC worker . /etc/neutron/neutron.conf # Number of separate API worker processes for service. If not specified, the # default is equal to the number of CPUs available for best performance. # (integer value) #api_workers = # Number of RPC worker processes for service. (integer value) #rpc_workers = 1 # Number of RPC worker processes dedicated to state reports queue. (integer # value) #rpc_state_report_workers = 1 how to check load on database. RAM/CPU/Disk-IO utilization is low on the database server. Please guide us further Regards, Munna On Mon, Jan 24, 2022 at 6:29 PM Arnaud wrote: > Hi, > > I would also consider checking the number of RPC workers you have in > neutron.conf, this is maybe a better option to increase this before the > comnection pool params. > > Also, check your database, is it under load? > Updating agent state should not be long. > > Cheers, > Arnaud > > > > Le 24 janvier 2022 10:42:00 GMT+01:00, "Md. Hejbul Tawhid MUNNA" < > munnaeebd at gmail.com> a ?crit : >> >> Hi, >> >> Currently we have running 500+VM and total network is 383 including >> HA-network. >> >> Can you advice the appropriate value and is there any chance of service >> impact? >> >> Should we change the configuration in the neutron.conf on controller node? >> >> Regards, >> Munna >> >> >> >> On Mon, Jan 24, 2022 at 2:47 PM Slawek Kaplonski >> wrote: >> >>> Hi, >>> >>> On poniedzia?ek, 24 stycznia 2022 09:09:10 CET Md. Hejbul Tawhid MUNNA >>> wrote: >>> > Hi, >>> > >>> > Suddenly we have observed few VM down . then we have found some agent >>> are >>> > getting down (XXX) , agents are getting UP and down randomly. Please >>> check >>> > the attachment. 
>>> > >>> > >>> >>> ///////////////////////////////////////////////////////////////////////////// >>> > /////////////////////// /sqlalchemy/pool.py", line 788, in >>> _checkout\n >>> > fairy = >>> > _ConnectionRecord.checkout(pool)\n', u' File >>> > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 532, in >>> > checkout\n rec = pool._do_get()\n', u' File >>> > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1186, in >>> > _do_get\n (self.size(), self.overflow(), self._timeout), >>> > code="3o7r")\n', u'TimeoutError: QueuePool limit of size 5 overflow 50 >>> > reached, connection timed out, timeout 30 (Background on this error at: >>> > http://sqlalche.me/e/3o7r)\n']. >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent Traceback >>> (most >>> > recent call last): >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>> > "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line >>> 837, in >>> > _report_state >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent True) >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>> > "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 97, in >>> > report_state >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent return >>> > method(context, 'report_state', **kwargs) >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line >>> 179, >>> > in call >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>> > retry=self.retry) >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>> > "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line >>> 133, >>> > in _send >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>> retry=retry) >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>> > >>> "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>> > line 645, in send >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>> > call_monitor_timeout, retry=retry) >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>> > >>> "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>> > line 636, in _send >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent raise >>> result >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>> RemoteError: >>> > Remote error: TimeoutError QueuePool limit of size 5 overflow 50 >>> reached, >>> > connection timed out, timeout 30 (Background on this error at: >>> > http://sqlalche.me/e/3o7r) >>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>> [u'Traceback >>> > (most recent call last):\n', u' File >>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line >>> 163, >>> > in _process_incoming\n res = self.dispatcher.dispatch(message)\n', >>> u' >>> > File >>> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", >>> > line 265, in dispatch\n return self._do_dispatch(endpoint, method, >>> ctxt, >>> > args)\n', u' File >>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", >>> line >>> > 194, in _do_dispatch\n >>> > >>> > >>> >>> ///////////////////////////////////////////////////////////////////////////// >>> > //// >>> > >>> > Is there anything related with the following default configuration. 
>>> > >>> > /etc/neutron/neutron.conf >>> > #max_pool_size = 5 >>> > #max_overflow = 50 >>> >>> Yes. You probably have busy environment and You need to increase those >>> values >>> to have more connections from the neutron server to the database. >>> >>> > >>> > regards, >>> > Munna >>> >>> >>> >>> -- >>> Slawek Kaplonski >>> Principal Software Engineer >>> Red Hat >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From oleg.bondarev at huawei.com Tue Jan 25 10:33:03 2022 From: oleg.bondarev at huawei.com (Oleg Bondarev) Date: Tue, 25 Jan 2022 10:33:03 +0000 Subject: [neutron] Bug Deputy Report Jan 17 - 23 Message-ID: Hi everyone, Please find Bug Deputy report for the week Jan 17 - 23rd below. 1 Critical bug (with a related logs-improving patch from Slawek) looks for volunteers. A couple of Medium bugs: an l3-bgp bug to be verified by bgp team - 1958627, and an ipv6+ovs firewall bug - 1958643. Also several OVN bug needs some triage from OVN folks + assignees. Details: Critical -------- - https://bugs.launchpad.net/neutron/+bug/1958363 - Notifications to nova disabled causing tests failures - Confirmed - Related patch: https://review.opendev.org/c/openstack/neutron/+/825513 - Unassigned! Medium ---------- - https://bugs.launchpad.net/neutron/+bug/1958627 - Incomplete ARP entries on L3 gw namespace - Opinion - Unassigned! - https://bugs.launchpad.net/neutron/+bug/1958149 - ha backup router ipv6 accept_ra broken - In progress: https://review.opendev.org/c/openstack/neutron/+/824947 - Assigned to Maximilian Stinsky - https://bugs.launchpad.net/neutron/+bug/1958364 - [ovn]Set NB/SB connection inactivity_probe does not work for cluster - In progress: https://review.opendev.org/c/openstack/neutron/+/825269 - Assigned to ZhouHeng - https://bugs.launchpad.net/neutron/+bug/1958225 - [ovn]chassis available zone changed, not reschedule router gateway_chassis - In progress: https://review.opendev.org/c/openstack/neutron/+/825073 - Assigned to ZhouHeng - https://bugs.launchpad.net/neutron/+bug/1958353 - [ovn]Gateway port is down after gateway chassis changed - In progress: https://review.opendev.org/c/openstack/neutron/+/825041 - Assigned to ZhouHeng - https://bugs.launchpad.net/neutron/+bug/1958501 - [ovn]Refusing to bind port to dead agent - In progress: https://review.opendev.org/c/openstack/neutron/+/825428 - Assigned to ZhouHeng - https://bugs.launchpad.net/neutron/+bug/1958643 - Unicast RA messages for a VM are filtered out by ovs rules - Confirmed - Unassigned! Low ----- - https://bugs.launchpad.net/neutron/+bug/1958513 - [test]The unit test file ovn_l3.test_plugin cannot be run alone - In progress: https://review.opendev.org/c/openstack/neutron/+/825478 - Assigned to ZhouHeng Undecided ------------- - https://bugs.launchpad.net/neutron/+bug/1958355 - Undecided - Unassigned! - https://bugs.launchpad.net/neutron/+bug/1958561 - [ovn-octavia-provider] Listner/Pool Stuck in PENDING_CREATE Using Fully Populated LB API - Undecided - Unassigned! - https://bugs.launchpad.net/neutron/+bug/1958593 - [ovn]neutron-server allow more efficient reconnections when connecting to clustered OVSDB servers - Undecided - Unassigned! 
Invalid -------- - https://bugs.launchpad.net/neutron/+bug/1958128 - Neutron l3 agent keeps restarting (Ubuntu) Expired --------- - https://bugs.launchpad.net/neutron/+bug/1939680 - Neutron - set "dns_domain" for existing network is passively failing when Designate DNS extension is not enabled for Neutron - Undecided - Unassigned Thanks, Oleg --- Advanced Software Technology Lab Huawei -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Tue Jan 25 10:51:20 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Tue, 25 Jan 2022 11:51:20 +0100 Subject: [release][infra] SSH host key issue - RedFlag stop approving Message-ID: <405abfa7-373c-fd4f-5b87-b57f77d64648@est.tech> Hi, tag-release jobs started to fail today and what i see is mostly that ECDSA host key for IP address X not in list of known hosts [1][2][3] Please don't approve release patches until we haven't figure out what is the problem. [1] https://zuul.opendev.org/t/openstack/build/a490703c83514eb7ba9d0ff6307f7371/log/job-output.txt#641-647 [2] https://zuul.opendev.org/t/openstack/build/00d3f765ea6847a3bf7ea53294cf2a88/log/job-output.txt#979-980 [3] https://zuul.opendev.org/t/openstack/build/d8aaeff52dac442e89fd3e253596e0c2/log/job-output.txt#980 Thanks, El?d irc:elodilles @ #openstack-release -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnaud.morin at gmail.com Tue Jan 25 12:26:42 2022 From: arnaud.morin at gmail.com (Arnaud) Date: Tue, 25 Jan 2022 13:26:42 +0100 Subject: =?US-ASCII?Q?Re=3A_Neutron_issue_=7C=7C_Remote_error=3A_TimeoutErro?= =?US-ASCII?Q?r_QueuePool_limit_of_size_5_overflow_50_reached?= In-Reply-To: References: <4248952.ejJDZkT8p0@p1> Message-ID: Hi, I am not an db expert, but openstack tends to open a LOT of connection to the db. So one of the thing to monitor is the number of connection you have/allow on the db side. Also, raising the number of RPC (and report states) workers will solve your issue. The good number is not easy to calculate, and depends on each deployment. A good approach is to the try/improve loop. Cheers, Arnaud. Le 25 janvier 2022 09:52:26 GMT+01:00, "Md. Hejbul Tawhid MUNNA" a ?crit?: >Hello Arnaud, > >Thank you for your valuable reply. > >we did not modify default config of RPC worker . > >/etc/neutron/neutron.conf > ># Number of separate API worker processes for service. If not specified, the ># default is equal to the number of CPUs available for best performance. ># (integer value) >#api_workers = > ># Number of RPC worker processes for service. (integer value) >#rpc_workers = 1 > ># Number of RPC worker processes dedicated to state reports queue. (integer ># value) >#rpc_state_report_workers = 1 > >how to check load on database. RAM/CPU/Disk-IO utilization is low on the >database server. > >Please guide us further > >Regards, >Munna > >On Mon, Jan 24, 2022 at 6:29 PM Arnaud wrote: > >> Hi, >> >> I would also consider checking the number of RPC workers you have in >> neutron.conf, this is maybe a better option to increase this before the >> comnection pool params. >> >> Also, check your database, is it under load? >> Updating agent state should not be long. >> >> Cheers, >> Arnaud >> >> >> >> Le 24 janvier 2022 10:42:00 GMT+01:00, "Md. Hejbul Tawhid MUNNA" < >> munnaeebd at gmail.com> a ?crit : >>> >>> Hi, >>> >>> Currently we have running 500+VM and total network is 383 including >>> HA-network. 
>>> >>> Can you advice the appropriate value and is there any chance of service >>> impact? >>> >>> Should we change the configuration in the neutron.conf on controller node? >>> >>> Regards, >>> Munna >>> >>> >>> >>> On Mon, Jan 24, 2022 at 2:47 PM Slawek Kaplonski >>> wrote: >>> >>>> Hi, >>>> >>>> On poniedzia?ek, 24 stycznia 2022 09:09:10 CET Md. Hejbul Tawhid MUNNA >>>> wrote: >>>> > Hi, >>>> > >>>> > Suddenly we have observed few VM down . then we have found some agent >>>> are >>>> > getting down (XXX) , agents are getting UP and down randomly. Please >>>> check >>>> > the attachment. >>>> > >>>> > >>>> >>>> ///////////////////////////////////////////////////////////////////////////// >>>> > /////////////////////// /sqlalchemy/pool.py", line 788, in >>>> _checkout\n >>>> > fairy = >>>> > _ConnectionRecord.checkout(pool)\n', u' File >>>> > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 532, in >>>> > checkout\n rec = pool._do_get()\n', u' File >>>> > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1186, in >>>> > _do_get\n (self.size(), self.overflow(), self._timeout), >>>> > code="3o7r")\n', u'TimeoutError: QueuePool limit of size 5 overflow 50 >>>> > reached, connection timed out, timeout 30 (Background on this error at: >>>> > http://sqlalche.me/e/3o7r)\n']. >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent Traceback >>>> (most >>>> > recent call last): >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>> > "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line >>>> 837, in >>>> > _report_state >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent True) >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>> > "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 97, in >>>> > report_state >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent return >>>> > method(context, 'report_state', **kwargs) >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line >>>> 179, >>>> > in call >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>> > retry=self.retry) >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line >>>> 133, >>>> > in _send >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>> retry=retry) >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>> > >>>> "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>>> > line 645, in send >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>> > call_monitor_timeout, retry=retry) >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>> > >>>> "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>>> > line 636, in _send >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent raise >>>> result >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>> RemoteError: >>>> > Remote error: TimeoutError QueuePool limit of size 5 overflow 50 >>>> reached, >>>> > connection timed out, timeout 30 (Background on this error at: >>>> > http://sqlalche.me/e/3o7r) >>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>> [u'Traceback >>>> > (most recent call last):\n', u' File >>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line >>>> 163, 
>>>> > in _process_incoming\n res = self.dispatcher.dispatch(message)\n', >>>> u' >>>> > File >>>> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", >>>> > line 265, in dispatch\n return self._do_dispatch(endpoint, method, >>>> ctxt, >>>> > args)\n', u' File >>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", >>>> line >>>> > 194, in _do_dispatch\n >>>> > >>>> > >>>> >>>> ///////////////////////////////////////////////////////////////////////////// >>>> > //// >>>> > >>>> > Is there anything related with the following default configuration. >>>> > >>>> > /etc/neutron/neutron.conf >>>> > #max_pool_size = 5 >>>> > #max_overflow = 50 >>>> >>>> Yes. You probably have busy environment and You need to increase those >>>> values >>>> to have more connections from the neutron server to the database. >>>> >>>> > >>>> > regards, >>>> > Munna >>>> >>>> >>>> >>>> -- >>>> Slawek Kaplonski >>>> Principal Software Engineer >>>> Red Hat >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Jan 25 15:40:21 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 25 Jan 2022 15:40:21 +0000 Subject: [release][infra] Tag permission regression (was: SSH host key issue - RedFlag stop approving) In-Reply-To: <405abfa7-373c-fd4f-5b87-b57f77d64648@est.tech> References: <405abfa7-373c-fd4f-5b87-b57f77d64648@est.tech> Message-ID: <20220125153822.yexqplti3kqy7t4t@yuggoth.org> On 2022-01-25 11:51:20 +0100 (+0100), El?d Ill?s wrote: > tag-release jobs started to fail today and what i see is mostly that ECDSA > host key for IP address X not in list of known hosts [1][2][3] [...] Just a quick update, that message about an unrecognized host key was a red herring. The actual error appears after that, indicating the signed tag push was rejected for insufficient permissions. We suspect this may be a regression in Gerrit 3.4 (to which we upgraded in yesterday's maintenance) and are testing a couple of potential workarounds while preparing to raise it with the Gerrit developers. We'll keep everyone posted. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ozzzo at yahoo.com Tue Jan 25 16:57:18 2022 From: ozzzo at yahoo.com (Albert Braden) Date: Tue, 25 Jan 2022 16:57:18 +0000 (UTC) Subject: [ops] [kolla] RabbitMQ High Availability In-Reply-To: References: <5dd6d28f-9955-7ca5-0ab8-0c5ce11ceb7e@redhat.com> <14084e87df22458caa7668eea8b496b6@verisign.com> <1147779219.1274196.1639086048233@mail.yahoo.com> <986294621.1553814.1639155002132@mail.yahoo.com> <169252651.2819859.1639516226823@mail.yahoo.com> <1335760337.3548170.1639680236968@mail.yahoo.com> <33441648.1434581.1641304881681@mail.yahoo.com> <385929635.1929303.1642011977053@mail.yahoo.com> <326590098.315301.1642089266574@mail.yahoo.com> <2058295726.372026.1642097389964@mail.yahoo.com> <2077696744.396726.1642099801140@mail.yahoo.com> Message-ID: <1261227838.2328665.1643129838762@mail.yahoo.com> That fixed the problem, thank you! I was able to successfully set TTL and expiration. I added 2 comments to [1] and my co-worker updated [2] with the correct ms values. 
[1] https://review.opendev.org/c/openstack/kolla-ansible/+/822191 [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit On Monday, January 17, 2022, 04:49:18 AM EST, Mark Goddard wrote: Drop the double quotes around On Thu, 13 Jan 2022 at 18:55, Albert Braden wrote: > > After reading more I realize that "expires" is also set in ms. So it looks like the correct settings are: > > message-ttl: 60000 > expires: 120000 > > This would expire messages in 10 minutes and queues in 20 minutes. > > The only remaining question is, how can I specify these in a variable without generating the "not a valid message TTL" error? > On Thursday, January 13, 2022, 01:22:33 PM EST, Albert Braden wrote: > > > Update: I googled around and found this: https://tickets.puppetlabs.com/browse/MODULES-2986 > > Apparently the " | int " isn't working. I tried '60000' and "60000" but that didn't make a difference. In desperation I tried this: > > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":60000,"expires":1200}, "priority":0}, > > That works, but I'd prefer to use a variable. Has anyone done this successfully? Also, am I understanding correctly that "message-ttl" is set in milliseconds and "expires" is set in seconds? Or do I need to use ms for "expires" too? > On Thursday, January 13, 2022, 11:03:11 AM EST, Albert Braden wrote: > > > After digging further I realized that I'm not setting TTL; only queue expiration. Here's what I see in the GUI when I look at affected queues: > > Policy notifications-expire > Effective policy definition expires: 1200 > > This is what I have in definitions.json.j2: > > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"expires":1200}, "priority":0}, > > I tried this to set both: > > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications).*", "apply-to": "queues", "definition": {"message-ttl":"{{ rabbitmq_message_ttl | int }}","expires":1200}, "priority":0}, Drop the double quotes around the jinja expression. It's not YAML, so you don't need them. Please update the upstream patches with any fixes. > > But the RMQ containers restart every 60 seconds and puke this into the log: > > [error] <0.322.0> CRASH REPORT Process <0.322.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"600\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 > > After reading the doc on TTL: https://www.rabbitmq.com/ttl.html I realized that the TTL is set in ms, so I tried "rabbitmq_message_ttl: 60000" > > but that only changes the number in the error: > > [error] <0.318.0> CRASH REPORT Process <0.318.0> with 0 neighbours exited with reason: {error,<<"<<\"Validation failed\\n\\n<<\\\"60000\\\">> is not a valid message TTL\\n (//notifications-expire)\">>">>} in application_master:init/4 line 138 > > What am I missing? > > > On Wednesday, January 12, 2022, 05:11:41 PM EST, Dale Smith wrote: > > > In the web interface(RabbitMQ 3.8.23, not using Kolla), when looking at the queue you will see the "Policy" listed by name, and "Effective policy definition". > > You can either view the policy definition, and the arguments for the definitions applied, or "effective policy definition" should show you the list. 
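For illustration, the entry quoted just above with the double quotes dropped would come out roughly like this (a sketch only, not a tested snippet; it keeps the rabbitmq_message_ttl variable and the expires value from that attempt, and RabbitMQ interprets both message-ttl and expires as milliseconds):

  {"vhost": "/", "name": "notifications-expire",
   "pattern": "^(notifications|versioned_notifications).*",
   "apply-to": "queues",
   "definition": {"message-ttl": {{ rabbitmq_message_ttl | int }}, "expires": 1200},
   "priority": 0},

Because the template is rendered as JSON rather than YAML, the unquoted Jinja expression produces a bare number, which is what the message TTL validation expects.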
> > > It may be relevant to note: "Each exchange or queue will have at most one policy matching" - https://www.rabbitmq.com/parameters.html#how-policies-work > > I've added a similar comment to the linked patchset. > > > On 13/01/22 7:26 am, Albert Braden wrote: > > This is very helpful. Thank you! It appears that I have successfully set the expire time to 1200, because I no longer see unconsumed messages lingering in my queues, but it's not obvious how to verify. In the web interface, when I look at the queues, I see things like policy, state, features and consumers, but I don't see a timeout or expire value, nor do I find the number 1200 anywhere. Where should I be looking in the web interface to verify that I set the expire time correctly? Or do I need to use the CLI? > On Wednesday, January 5, 2022, 04:23:29 AM EST, Mark Goddard wrote: > > > On Tue, 4 Jan 2022 at 14:08, Albert Braden wrote: > > > > Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl" ? > > John Garbutt proposed a few patches for RabbitMQ in kolla, including > this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191 > > https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+(status:open+OR+status:merged)+project:openstack/kolla-ansible > > Note that they are currently untested. > > Mark > > > > On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden wrote: > > > > > > I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2: > > > > "policies":[ > > {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, > > {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0} > > {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0} > > {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} > > {% endif %} > > > > But I still see unconsumed messages lingering in notifications_extractor.info. From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing? > > On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden wrote: > > > > > > Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes? > > > > [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud wrote: > > > > > > So, your config snippet LGTM. > > > > Le ven. 10 d?c. 2021 ? 17:50, Albert Braden a ?crit : > > > > Sorry, that was a transcription error. I thought "True" and my fingers typed "False." 
The correct lines are: > > > > [oslo_messaging_rabbit] > > amqp_durable_queues = True > > > > On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud wrote: > > > > > > If you plan to let `amqp_durable_queues = False` (i.e if you plan to keep this config equal to false), then you don't need to add these config lines as this is already the default value [1]. > > > > [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34 > > > > Le jeu. 9 d?c. 2021 ? 22:40, Albert Braden a ?crit : > > > > Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management. > > > > I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf: > > > > [oslo_messaging_rabbit] > > amqp_durable_queues = False > > > > Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? What would that look like in the config? > > > > > > From: Herve Beraud > > Sent: Thursday, December 9, 2021 2:45 AM > > To: Bogdan Dobrelya > > Cc: openstack-discuss at lists.openstack.org > > Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability > > > > > > > > Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. > > > > > > > > > > > > Le mer. 8 d?c. 2021 ? 11:48, Bogdan Dobrelya a ?crit : > > > > Please see inline > > > > >> I read this with great interest because we are seeing this issue. Questions: > > >> > > >> 1. We are running kola-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x? > > >> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like? > > >> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into? > > > > > > Note that even having rabbit HA policies adjusted like that and its HA > > > replication factor [0] decreased (e.g. to a 2), there still might be > > > high churn caused by a large enough number of replicated durable RPC > > > topic queues. And that might cripple the cloud down with the incurred > > > I/O overhead because a durable queue requires all messages in it to be > > > persisted to a disk (for all the messaging cluster replicas) before they > > > are ack'ed by the broker. > > > > > > Given that said, Oslo messaging would likely require a more granular > > > control for topic exchanges and the durable queues flag - to tell it to > > > declare as durable only the most critical paths of a service. A single > > > config setting and a single control exchange per a service might be not > > > enough. > > > > Also note that therefore, amqp_durable_queue=True requires dedicated > > control exchanges configured for each service. Those that use > > 'openstack' as a default cannot turn the feature ON. Changing it to a > > service specific might also cause upgrade impact, as described in the > > topic [3]. > > > > > > > > The same is true for `amqp_auto_delete=True`. That requires dedicated control exchanges else it won't work if each service defines its own policy on a shared control exchange (e.g `openstack`) and if policies differ from each other. 
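As a rough sketch of what the per-service layout described above could look like in a service's config (the option names are standard oslo.messaging options, the values are placeholders and not a tested recommendation):

  [DEFAULT]
  control_exchange = neutron

  [oslo_messaging_rabbit]
  amqp_durable_queues = true

The idea is that each service declares its own control exchange, so its durability and auto-delete behaviour never has to be reconciled with other services sharing the default 'openstack' exchange.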
> > > > > > > > [3] https://review.opendev.org/q/topic:scope-config-opts > > > > > > > > There are also race conditions with durable queues enabled, like [1]. A > > > solution could be where each service declare its own dedicated control > > > exchange with its own configuration. > > > > > > Finally, openstack components should add perhaps a *.next CI job to test > > > it with durable queues, like [2] > > > > > > [0] https://www.rabbitmq.com/ha.html#replication-factor > > > > > > [1] > > > https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt > > > > > > [2] https://review.opendev.org/c/openstack/nova/+/820523 > > > > > >> > > >> Does anyone have a sample set of RMQ config files that they can share? > > >> > > >> It looks like my Outlook has ruined the link; reposting: > > >> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit > > > > > > > > > -- > > > Best regards, > > > Bogdan Dobrelya, > > > Irc #bogdando > > > > > > -- > > Best regards, > > Bogdan Dobrelya, > > Irc #bogdando > > > > > > > > > > -- > > > > Herv? Beraud > > > > Senior Software Engineer at Red Hat > > > > irc: hberaud > > > > https://github.com/4383/ > > > > https://twitter.com/4383hberaud > > > > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > > > > > > > -- > > Herv? Beraud > > Senior Software Engineer at Red Hat > > irc: hberaud > > https://github.com/4383/ > > https://twitter.com/4383hberaud > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Jan 25 21:02:20 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 25 Jan 2022 21:02:20 +0000 Subject: [release][infra] Tag permission regression (was: SSH host key issue - RedFlag stop approving) In-Reply-To: <20220125153822.yexqplti3kqy7t4t@yuggoth.org> References: <405abfa7-373c-fd4f-5b87-b57f77d64648@est.tech> <20220125153822.yexqplti3kqy7t4t@yuggoth.org> Message-ID: <20220125210219.2aazqxoojzgskyw3@yuggoth.org> On 2022-01-25 15:40:21 +0000 (+0000), Jeremy Stanley wrote: [...] > The actual error appears after that, indicating the signed tag > push was rejected for insufficient permissions. We suspect this > may be a regression in Gerrit 3.4 (to which we upgraded in > yesterday's maintenance) and are testing a couple of potential > workarounds while preparing to raise it with the Gerrit > developers. > > We'll keep everyone posted. We've merged a workaround so people won't be blocked: http://lists.opendev.org/pipermail/service-announce/2022-January/000031.html It's okay to resume approving automated release requests. If anyone is pushing tags directly to Gerrit for some reason, please be very careful to make sure they're signed until we can get the situation properly solved. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From rosmaita.fossdev at gmail.com Wed Jan 26 02:45:08 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 25 Jan 2022 21:45:08 -0500 Subject: [sdk][devstack] openstacksdk-functional-devstack job failing on stable branches Message-ID: <72c3aff1-1c18-24fc-9869-3e992cc518e0@gmail.com> openstacksdk-functional-devstack began failing in the stable branches (xena through ussuri) shortly after 2022-01-24 11:53:49 with the error openstack.exceptions.ResourceNotFound: ResourceNotFound: 404: Client Error for url: https://{ip_address}:9696/v2.0/qos/policies/69e472c0-37d2-46e0-9b9e-727c962c5511/minimum_packet_rate_rules, The resource could not be found. for these four tests: openstack.tests.functional.network.v2.test_qos_minimum_packet_rate_rule.TestQoSMinimumPacketRateRule.test_find openstack.tests.functional.network.v2.test_qos_minimum_packet_rate_rule.TestQoSMinimumPacketRateRule.test_get openstack.tests.functional.network.v2.test_qos_minimum_packet_rate_rule.TestQoSMinimumPacketRateRule.test_list openstack.tests.functional.network.v2.test_qos_minimum_packet_rate_rule.TestQoSMinimumPacketRateRule.test_update job build history: https://zuul.opendev.org/t/openstack/builds?job_name=openstacksdk-functional-devstack&branch=stable%2Fxena&branch=stable%2Fwallaby&branch=stable%2Fvictoria&branch=stable%2Fussuri It's passing in master and stable/{train,stein,rocky}. Anyone know what's up? From munnaeebd at gmail.com Wed Jan 26 05:49:10 2022 From: munnaeebd at gmail.com (Md. Hejbul Tawhid MUNNA) Date: Wed, 26 Jan 2022 11:49:10 +0600 Subject: Neutron issue || Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached In-Reply-To: References: <4248952.ejJDZkT8p0@p1> Message-ID: Hello Arnaud, Thank you so much for your valuable feedback. As per our neutron.conf file, #rpc_workers = 1 and #rpc_state_report_workers = 1 , is it default value neutron taken? What is the procedure to increase the value? remove # and increase the worker from 1 to 2 and restart neutron-server service? Regards, Munna On Tue, Jan 25, 2022 at 6:26 PM Arnaud wrote: > Hi, > > I am not an db expert, but openstack tends to open a LOT of connection to > the db. So one of the thing to monitor is the number of connection you > have/allow on the db side. > > Also, raising the number of RPC (and report states) workers will solve > your issue. > The good number is not easy to calculate, and depends on each deployment. > > A good approach is to the try/improve loop. > > Cheers, > Arnaud. > > > Le 25 janvier 2022 09:52:26 GMT+01:00, "Md. Hejbul Tawhid MUNNA" < > munnaeebd at gmail.com> a ?crit : >> >> Hello Arnaud, >> >> Thank you for your valuable reply. >> >> we did not modify default config of RPC worker . >> >> /etc/neutron/neutron.conf >> >> # Number of separate API worker processes for service. If not specified, >> the >> # default is equal to the number of CPUs available for best performance. >> # (integer value) >> #api_workers = >> >> # Number of RPC worker processes for service. (integer value) >> #rpc_workers = 1 >> >> # Number of RPC worker processes dedicated to state reports queue. >> (integer >> # value) >> #rpc_state_report_workers = 1 >> >> how to check load on database. RAM/CPU/Disk-IO utilization is low on the >> database server. 
>> >> Please guide us further >> >> Regards, >> Munna >> >> On Mon, Jan 24, 2022 at 6:29 PM Arnaud wrote: >> >>> Hi, >>> >>> I would also consider checking the number of RPC workers you have in >>> neutron.conf, this is maybe a better option to increase this before the >>> comnection pool params. >>> >>> Also, check your database, is it under load? >>> Updating agent state should not be long. >>> >>> Cheers, >>> Arnaud >>> >>> >>> >>> Le 24 janvier 2022 10:42:00 GMT+01:00, "Md. Hejbul Tawhid MUNNA" < >>> munnaeebd at gmail.com> a ?crit : >>>> >>>> Hi, >>>> >>>> Currently we have running 500+VM and total network is 383 including >>>> HA-network. >>>> >>>> Can you advice the appropriate value and is there any chance of service >>>> impact? >>>> >>>> Should we change the configuration in the neutron.conf on controller >>>> node? >>>> >>>> Regards, >>>> Munna >>>> >>>> >>>> >>>> On Mon, Jan 24, 2022 at 2:47 PM Slawek Kaplonski >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> On poniedzia?ek, 24 stycznia 2022 09:09:10 CET Md. Hejbul Tawhid MUNNA >>>>> wrote: >>>>> > Hi, >>>>> > >>>>> > Suddenly we have observed few VM down . then we have found some >>>>> agent are >>>>> > getting down (XXX) , agents are getting UP and down randomly. Please >>>>> check >>>>> > the attachment. >>>>> > >>>>> > >>>>> >>>>> ///////////////////////////////////////////////////////////////////////////// >>>>> > /////////////////////// /sqlalchemy/pool.py", line 788, in >>>>> _checkout\n >>>>> > fairy = >>>>> > _ConnectionRecord.checkout(pool)\n', u' File >>>>> > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 532, in >>>>> > checkout\n rec = pool._do_get()\n', u' File >>>>> > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1186, >>>>> in >>>>> > _do_get\n (self.size(), self.overflow(), self._timeout), >>>>> > code="3o7r")\n', u'TimeoutError: QueuePool limit of size 5 overflow >>>>> 50 >>>>> > reached, connection timed out, timeout 30 (Background on this error >>>>> at: >>>>> > http://sqlalche.me/e/3o7r)\n']. 
>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>> Traceback (most >>>>> > recent call last): >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>> > "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line >>>>> 837, in >>>>> > _report_state >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent True) >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>> > "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 97, in >>>>> > report_state >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>> return >>>>> > method(context, 'report_state', **kwargs) >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", >>>>> line 179, >>>>> > in call >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>> > retry=self.retry) >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line >>>>> 133, >>>>> > in _send >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>> retry=retry) >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>> > >>>>> "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>>>> > line 645, in send >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>> > call_monitor_timeout, retry=retry) >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>> > >>>>> "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>>>> > line 636, in _send >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>> raise result >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>> RemoteError: >>>>> > Remote error: TimeoutError QueuePool limit of size 5 overflow 50 >>>>> reached, >>>>> > connection timed out, timeout 30 (Background on this error at: >>>>> > http://sqlalche.me/e/3o7r) >>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>> [u'Traceback >>>>> > (most recent call last):\n', u' File >>>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", >>>>> line 163, >>>>> > in _process_incoming\n res = >>>>> self.dispatcher.dispatch(message)\n', u' >>>>> > File >>>>> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", >>>>> > line 265, in dispatch\n return self._do_dispatch(endpoint, >>>>> method, ctxt, >>>>> > args)\n', u' File >>>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", >>>>> line >>>>> > 194, in _do_dispatch\n >>>>> > >>>>> > >>>>> >>>>> ///////////////////////////////////////////////////////////////////////////// >>>>> > //// >>>>> > >>>>> > Is there anything related with the following default configuration. >>>>> > >>>>> > /etc/neutron/neutron.conf >>>>> > #max_pool_size = 5 >>>>> > #max_overflow = 50 >>>>> >>>>> Yes. You probably have busy environment and You need to increase those >>>>> values >>>>> to have more connections from the neutron server to the database. >>>>> >>>>> > >>>>> > regards, >>>>> > Munna >>>>> >>>>> >>>>> >>>>> -- >>>>> Slawek Kaplonski >>>>> Principal Software Engineer >>>>> Red Hat >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... 
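To make that advice concrete, the kind of change being suggested for the controller's neutron.conf would look something like the sketch below; the numbers are only placeholders to be tuned per deployment, and neutron-server has to be restarted for them to take effect:

  [DEFAULT]
  api_workers = 4
  rpc_workers = 4
  rpc_state_report_workers = 4

  [database]
  max_pool_size = 10
  max_overflow = 100

Note that the pool options normally live in the [database] section, and each worker process keeps its own connection pool, so the database's allowed connection count should be checked as well.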
URL: From skaplons at redhat.com Wed Jan 26 08:00:05 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Wed, 26 Jan 2022 09:00:05 +0100 Subject: [sdk][devstack] openstacksdk-functional-devstack job failing on stable branches In-Reply-To: <72c3aff1-1c18-24fc-9869-3e992cc518e0@gmail.com> References: <72c3aff1-1c18-24fc-9869-3e992cc518e0@gmail.com> Message-ID: <5787685.lOV4Wx5bFT@p1> Hi, On ?roda, 26 stycznia 2022 03:45:08 CET Brian Rosmaita wrote: > openstacksdk-functional-devstack began failing in the stable branches > (xena through ussuri) shortly after 2022-01-24 11:53:49 with the error > > openstack.exceptions.ResourceNotFound: ResourceNotFound: 404: Client > Error for url: > https://{ip_address}:9696/v2.0/qos/policies/ 69e472c0-37d2-46e0-9b9e-727c962c5 > 511/minimum_packet_rate_rules, The resource could not be found. > > for these four tests: > > openstack.tests.functional.network.v2.test_qos_minimum_packet_rate_rule.TestQ > oSMinimumPacketRateRule.test_find > openstack.tests.functional.network.v2.test_qos_minimum_packet_rate_rule.Test > QoSMinimumPacketRateRule.test_get > openstack.tests.functional.network.v2.test_qos_minimum_packet_rate_rule.Test > QoSMinimumPacketRateRule.test_list > openstack.tests.functional.network.v2.test_qos_minimum_packet_rate_rule.Test > QoSMinimumPacketRateRule.test_update > > job build history: > > https://zuul.opendev.org/t/openstack/builds?job_name=openstacksdk-functional-> devstack&branch=stable%2Fxena&branch=stable%2Fwallaby&branch=stable%2Fvictori > a&branch=stable%2Fussuri > > It's passing in master and stable/{train,stein,rocky}. Anyone know > what's up? Those tests are available only in master branch: [1] and they not exists in e.g. stable/xena [2]. I'm not sure how those jobs are defined exactly but my guess is that on those stable branches where it fails it runs openstacksdk from master branch. So either it should be changed and proper stable branch of the openstacksdk should be used there or we should skip those tests when API extension in neuron is not present. [1] https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/ tests/functional/network/v2/test_qos_minimum_packet_rate_rule.py [2] https://opendev.org/openstack/openstacksdk/src/branch/stable/xena/ openstack/tests/functional/network/v2/test_qos_minimum_packet_rate_rule.py -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From mbultel at redhat.com Wed Jan 26 08:41:53 2022 From: mbultel at redhat.com (Mathieu Bultel) Date: Wed, 26 Jan 2022 09:41:53 +0100 Subject: [TripleO] Proposing Juan Badia Payno for TripleO core reviewer. In-Reply-To: References: Message-ID: +1 On Tue, Jan 25, 2022 at 7:51 AM Carlos Camacho Gonzalez wrote: > Thanks Juan for your effort! > > On Fri, Jan 14, 2022 at 4:11 PM Alex Schultz wrote: > >> +1 >> >> On Wed, Jan 12, 2022 at 6:16 AM Carlos Camacho Gonzalez >> wrote: >> > >> > Hi everyone! >> > >> > I'd like to propose Juan Badia Paino [1][2][3] as a core reviewer on >> the TripleO repositories that are or might be related to the backup and >> restore efforts (openstack/tripleo-ci, openstack/tripleo-ansible, >> openstack/python-tripleoclient, openstack/tripleo-quickstart-extras, >> openstack/tripleo-quickstart). 
>> > >> > Juan has been around since 2016 making useful contributions and code >> reviews to the community and I believe adding him to our core reviewer >> group will help us improve the review and coding speed for the backup and >> restore codebase. >> > >> > As usual, consider this email as an initial +1 from my side, I will >> keep an eye on this thread for a week, and based on your feedback and if >> there are no objections I will add him as a core reviewer in two weeks. >> > >> > [1]: https://review.opendev.org/q/owner:jbadiapa%2540redhat.com >> > [2]: >> https://www.stackalytics.io/?project_type=all&metric=commits&user_id=jbadiapa&release=all >> > [3]: >> https://www.stackalytics.io/report/contribution?module=tripleo-group&project_type=openstack&days=180 >> > >> > Cheers, >> > Carlos. >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbadiapa at redhat.com Wed Jan 26 08:46:55 2022 From: jbadiapa at redhat.com (Juan Badia Payno) Date: Wed, 26 Jan 2022 09:46:55 +0100 Subject: [TripleO] Proposing Juan Badia Payno for TripleO core reviewer. In-Reply-To: References: Message-ID: Thank you all !!!! :) On Wed, Jan 26, 2022 at 9:42 AM Mathieu Bultel wrote: > +1 > > On Tue, Jan 25, 2022 at 7:51 AM Carlos Camacho Gonzalez < > ccamacho at redhat.com> wrote: > >> Thanks Juan for your effort! >> >> On Fri, Jan 14, 2022 at 4:11 PM Alex Schultz wrote: >> >>> +1 >>> >>> On Wed, Jan 12, 2022 at 6:16 AM Carlos Camacho Gonzalez >>> wrote: >>> > >>> > Hi everyone! >>> > >>> > I'd like to propose Juan Badia Paino [1][2][3] as a core reviewer on >>> the TripleO repositories that are or might be related to the backup and >>> restore efforts (openstack/tripleo-ci, openstack/tripleo-ansible, >>> openstack/python-tripleoclient, openstack/tripleo-quickstart-extras, >>> openstack/tripleo-quickstart). >>> > >>> > Juan has been around since 2016 making useful contributions and code >>> reviews to the community and I believe adding him to our core reviewer >>> group will help us improve the review and coding speed for the backup and >>> restore codebase. >>> > >>> > As usual, consider this email as an initial +1 from my side, I will >>> keep an eye on this thread for a week, and based on your feedback and if >>> there are no objections I will add him as a core reviewer in two weeks. >>> > >>> > [1]: https://review.opendev.org/q/owner:jbadiapa%2540redhat.com >>> > [2]: >>> https://www.stackalytics.io/?project_type=all&metric=commits&user_id=jbadiapa&release=all >>> > [3]: >>> https://www.stackalytics.io/report/contribution?module=tripleo-group&project_type=openstack&days=180 >>> > >>> > Cheers, >>> > Carlos. >>> >>> >>> -- Juan Badia Payno Senior Software Engineer Red Hat EMEA ENG Openstack Infrastructure -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Wed Jan 26 10:10:20 2022 From: mkopec at redhat.com (Martin Kopec) Date: Wed, 26 Jan 2022 11:10:20 +0100 Subject: [heat][interop] StackBuildErrorException on a master job In-Reply-To: References: Message-ID: In the neutron logs we can see: Jan 25 23:57:53 devstack neutron-server[155870]: ERROR neutron.plugins.ml2.managers [req-6189506a-aad6-400b-bb80-e358d9695ddf req-75afd428-da64-4562-9401-135906d2d0f8 service neutron] Port db268d9e-2a24-45bd-8bfc-5b06f5597927 does not have an IP address assigned and there are no driver with 'connectivity' = 'l2'. The port cannot be bound. 
Jan 25 23:57:53 devstack neutron-server[155870]: ERROR neutron.plugins.ml2.managers [req-6189506a-aad6-400b-bb80-e358d9695ddf req-75afd428-da64-4562-9401-135906d2d0f8 service neutron] Failed to bind port db268d9e-2a24-45bd-8bfc-5b06f5597927 on host devstack for vnic_type normal using segments [{'id': '84199152-f782-4b64-9e6a-fc0dce2ec606', 'network_type': 'geneve', 'physical_network': None, 'segmentation_id': 3273, 'network_id': 'e5238eb0-54bd-40a6-bfa2-575464ec42fd'}] We continue investigating possible root causes. On Mon, 10 Jan 2022 at 09:19, Martin Kopec wrote: > Hi there, > > our interop master job (jobs for older OpenStack releases are fine) is > constantly failing on 2 heat tests: > > heat_tempest_plugin.tests.functional.test_software_config.ParallelDeploymentsTest.test_deployments_metadata > heat_tempest_plugin.tests.functional.test_software_config.ParallelDeploymentsTest.test_deployments_timeout_failed > > Full log at: > https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c51/819918/1/check/refstack-client-devstack-master/c51816e/job-output.txt > > Patch where you can see all the jobs - master one failing and the rest passing: > https://review.opendev.org/c/openinfra/refstack-client/+/819918 > > We can't figure out what's going on there. Maybe additional configuration > of the deployment is required? New requirements? > Any ideas what the job is missing? > > Thank you, > -- > Martin Kopec > Senior Software Quality Engineer > Red Hat EMEA > > > > -- Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Wed Jan 26 11:49:11 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Wed, 26 Jan 2022 12:49:11 +0100 Subject: [release][infra] Tag permission regression - GreenFlag - approving can be continued In-Reply-To: <20220125210219.2aazqxoojzgskyw3@yuggoth.org> References: <405abfa7-373c-fd4f-5b87-b57f77d64648@est.tech> <20220125153822.yexqplti3kqy7t4t@yuggoth.org> <20220125210219.2aazqxoojzgskyw3@yuggoth.org> Message-ID: <0f47ae0f-bb6e-5c4d-202f-2753b0c8dc6e@est.tech> Thanks for the workaround! The first tag-release jobs started to pass, so it works. The Red Flag for approving patches is now officially removed :) The release management team started to approve release patches. Thanks again, El?d irc:elodilles @ #openstack-release On 2022. 01. 25. 22:02, Jeremy Stanley wrote: > On 2022-01-25 15:40:21 +0000 (+0000), Jeremy Stanley wrote: > [...] >> The actual error appears after that, indicating the signed tag >> push was rejected for insufficient permissions. We suspect this >> may be a regression in Gerrit 3.4 (to which we upgraded in >> yesterday's maintenance) and are testing a couple of potential >> workarounds while preparing to raise it with the Gerrit >> developers. >> >> We'll keep everyone posted. > We've merged a workaround so people won't be blocked: > > http://lists.opendev.org/pipermail/service-announce/2022-January/000031.html > > It's okay to resume approving automated release requests. If anyone > is pushing tags directly to Gerrit for some reason, please be very > careful to make sure they're signed until we can get the situation > properly solved. 
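For anyone pushing a tag by hand in the meantime, a minimal sketch of the signed-tag workflow is (the remote name "gerrit" and the version are placeholders):

  git tag -s 1.2.3 -m "Release 1.2.3"   # create a GPG-signed, annotated tag
  git tag -v 1.2.3                      # check the signature locally before pushing
  git push gerrit 1.2.3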
From manchandavishal143 at gmail.com Wed Jan 26 13:07:35 2022 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Wed, 26 Jan 2022 18:37:35 +0530 Subject: [horizon] Cancelling Today's weekly meeting Message-ID: Hello Team, Since there are no agenda items [1] to discuss for today's horizon weekly meeting. Also, Today is a holiday for me. So let's cancel today's weekly meeting. The next weekly meeting will be on 2nd February. Thanks & Regards, Vishal Manchanda [1] https://etherpad.opendev.org/p/horizon-release-priorities -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Jan 26 13:26:55 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 26 Jan 2022 13:26:55 +0000 Subject: [sdk][devstack] openstacksdk-functional-devstack job failing on stable branches In-Reply-To: <5787685.lOV4Wx5bFT@p1> References: <72c3aff1-1c18-24fc-9869-3e992cc518e0@gmail.com> <5787685.lOV4Wx5bFT@p1> Message-ID: <20220126132655.tm633dhzwdguum27@yuggoth.org> On 2022-01-26 09:00:05 +0100 (+0100), Slawek Kaplonski wrote: [...] > Those tests are available only in master branch: [1] and they not > exists in e.g. stable/xena [2]. > > I'm not sure how those jobs are defined exactly but my guess is > that on those stable branches where it fails it runs openstacksdk > from master branch. So either it should be changed and proper > stable branch of the openstacksdk should be used there or we > should skip those tests when API extension in neuron is not > present. [...] I find it intriguing that the SDK has stable branches at all. Isn't the goal that the latest version of the SDK be able to interface with multiple versions of OpenStack services, new and old? If we don't test that the current SDK code works with Neutron from Xena, then that opens it up to growing serious problems for users. Having the SDK or tests work out what features are expected to work sounds like the only sensible solution. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From smooney at redhat.com Wed Jan 26 13:35:55 2022 From: smooney at redhat.com (Sean Mooney) Date: Wed, 26 Jan 2022 13:35:55 +0000 Subject: [sdk][devstack] openstacksdk-functional-devstack job failing on stable branches In-Reply-To: <20220126132655.tm633dhzwdguum27@yuggoth.org> References: <72c3aff1-1c18-24fc-9869-3e992cc518e0@gmail.com> <5787685.lOV4Wx5bFT@p1> <20220126132655.tm633dhzwdguum27@yuggoth.org> Message-ID: <36f75a014497335c2ccb3c31c376a341625fa62b.camel@redhat.com> On Wed, 2022-01-26 at 13:26 +0000, Jeremy Stanley wrote: > On 2022-01-26 09:00:05 +0100 (+0100), Slawek Kaplonski wrote: > [...] > > Those tests are available only in master branch: [1] and they not > > exists in e.g. stable/xena [2]. > > > > I'm not sure how those jobs are defined exactly but my guess is > > that on those stable branches where it fails it runs openstacksdk > > from master branch. So either it should be changed and proper > > stable branch of the openstacksdk should be used there or we > > should skip those tests when API extension in neuron is not > > present. > [...] > > I find it intriguing that the SDK has stable branches at all. Isn't > the goal that the latest version of the SDK be able to interface > with multiple versions of OpenStack services, new and old? 
If we > don't test that the current SDK code works with Neutron from Xena, > then that opens it up to growing serious problems for users. > ya that seam odd to me too. i would expect sdk to be release independent and work simialr to tempest where master sdk should be useable with all stable branches. > Having the SDK or tests work out what features are expected to work > sounds like the only sensible solution. From tobias.urdin at binero.com Wed Jan 26 13:51:33 2022 From: tobias.urdin at binero.com (Tobias Urdin) Date: Wed, 26 Jan 2022 13:51:33 +0000 Subject: [puppet][tc] Propose changing the release model to cycle-with-rc In-Reply-To: References: Message-ID: <5680715D-5BB2-42D4-80F9-1BC5E2715384@binero.com> +1, let?s get this changed for Z cycle. > On 24 Jan 2022, at 02:55, Takashi Kajinami wrote: > > Hello, > > > I already discussed this with a few cores last year but I'm raising this topic > in ML to make an official decision. > > Currently Puppet OpenStack is following cycle-with-intermediary and > creates a release every milestone. However our code is tightly related > to the actual service implementations and having only puppet releases > is not very useful. > > Considering the above point, and effort required to cut off releases per milestone, > I'll propose changing our release model to cycle-with-rc , and creating a single release. > > Because we already created milestone releases for Yoga, I'm thinking of applying > the change from next cycle(Z). > > Please let me know if you have any concerns. > > Thank you, > Takashi From zigo at debian.org Wed Jan 26 14:01:34 2022 From: zigo at debian.org (Thomas Goirand) Date: Wed, 26 Jan 2022 15:01:34 +0100 Subject: [puppet][tc] Propose changing the release model to cycle-with-rc In-Reply-To: References: Message-ID: <714118a4-fb77-017b-ec93-575ba541f15a@debian.org> On 1/24/22 02:55, Takashi Kajinami wrote: > Hello, > > > I already discussed this with a few cores last year but I'm raising this > topic > in ML to make an official decision. > > Currently Puppet OpenStack is following cycle-with-intermediary and > creates a release every milestone. However our code is tightly related > to the actual service implementations and having only puppet releases > is not very useful. > > Considering the above point, and effort required to cut off releases per > milestone, > I'll propose changing our release model to cycle-with-rc , and creating > a single release. > > Because we already created milestone releases for Yoga, I'm thinking of > applying > the change from next cycle(Z). > > Please let me know if you have any concerns. > > Thank you, > Takashi +1 From artem.goncharov at gmail.com Wed Jan 26 14:23:58 2022 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Wed, 26 Jan 2022 15:23:58 +0100 Subject: [sdk][devstack] openstacksdk-functional-devstack job failing on stable branches In-Reply-To: <36f75a014497335c2ccb3c31c376a341625fa62b.camel@redhat.com> References: <72c3aff1-1c18-24fc-9869-3e992cc518e0@gmail.com> <5787685.lOV4Wx5bFT@p1> <20220126132655.tm633dhzwdguum27@yuggoth.org> <36f75a014497335c2ccb3c31c376a341625fa62b.camel@redhat.com> Message-ID: <4D3A881A-4EA8-457A-ACFD-48E76BED0919@gmail.com> Hi > On 26. Jan 2022, at 14:35, Sean Mooney wrote: > > On Wed, 2022-01-26 at 13:26 +0000, Jeremy Stanley wrote: >> On 2022-01-26 09:00:05 +0100 (+0100), Slawek Kaplonski wrote: >> [...] >>> Those tests are available only in master branch: [1] and they not >>> exists in e.g. stable/xena [2]. 
>>> >>> I'm not sure how those jobs are defined exactly but my guess is >>> that on those stable branches where it fails it runs openstacksdk >>> from master branch. So either it should be changed and proper >>> stable branch of the openstacksdk should be used there or we >>> should skip those tests when API extension in neuron is not >>> present. >> [...] >> >> I find it intriguing that the SDK has stable branches at all. Isn't >> the goal that the latest version of the SDK be able to interface >> with multiple versions of OpenStack services, new and old? If we >> don't test that the current SDK code works with Neutron from Xena, >> then that opens it up to growing serious problems for users. >> > ya that seam odd to me too. > i would expect sdk to be release independent and work simialr to tempest > where master sdk should be useable with all stable branches. >> Having the SDK or tests work out what features are expected to work >> sounds like the only sensible solution. Well, actually it should be working. Cause of absence of real tests this way I can not guarantee it will work in every case, but we do our best to keep it this way (and most of the tests are checking whether feature is available or not). Honestly I have no clue why it evolved this way and therefore can?t really comment on that. Maybe (just maybe) there was something from the Debian (or other distro) packaging pov (stable, not-stable, etc) that somehow leaded to this setup. Otherwise there is absolutely no possibility to limit which version of sdk need to go into which older distro. And since i.e. ansible and openstackclient depend on the sdk things are getting even more interesting. Sdk in this sense is comparable with keystoneauth lib (and actually depend on it). So once we say there are no stable branches of sdk anymore we open even worse can of worms. Since we are currently anyway in a phase of big rework in sdk I would say addressing tests that do not work on older branches can be done as well. Artem -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Wed Jan 26 14:33:23 2022 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 26 Jan 2022 08:33:23 -0600 Subject: Z Release Naming ... Message-ID: <935942c0-28dc-22f5-ca88-f2c38592b4b2@gmail.com> All, As in the past, I would like the community's input on the 'Z' release name [1] .? I have narrowed the list down to my top 4 choices.? Please take a few minutes to vote.? The poll closes around 14:30 UTC on 1/28/22. Thanks for your input! Jay (jungleboyj) [1] https://twitter.com/jungleboyj/status/1486344910693441541 From elod.illes at est.tech Wed Jan 26 14:40:03 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Wed, 26 Jan 2022 15:40:03 +0100 Subject: [monasca] [Release-job-failures] Release of openstack/monasca-agent for ref refs/tags/4.0.1 failed In-Reply-To: References: Message-ID: <3a245f33-2cdf-5076-3a38-0ce3390ed5e5@est.tech> Hi Monasca team, The forwarded mail contains a release job failure, specifically the publish-monasca-agent-docker-images job failed [1] with a missing package during docker image build. Can you please have a look at the issue? 
[1] https://zuul.opendev.org/t/openstack/build/2ebcbfec634c4c068e9f7625473ea726 Thanks in advance, El?d irc:elodilles @ #openstack-release -------- Forwarded Message -------- Subject: [Release-job-failures] Release of openstack/monasca-agent for ref refs/tags/4.0.1 failed Date: Wed, 26 Jan 2022 11:57:09 +0000 From: zuul at openstack.org Reply-To: openstack-discuss at lists.openstack.org To: release-job-failures at lists.openstack.org Build failed. - openstack-upload-github-mirror https://zuul.opendev.org/t/openstack/build/bba91d0025a2496087a29803451a314f : SUCCESS in 45s - release-openstack-python https://zuul.opendev.org/t/openstack/build/a749eb25de1d426e845740cae1876afe : SUCCESS in 3m 50s - announce-release https://zuul.opendev.org/t/openstack/build/551dd5bd118347908b7113a5e2fa5c5e : SUCCESS in 5m 37s - propose-update-constraints https://zuul.opendev.org/t/openstack/build/e95a5bee9817491d84c89f89758872d6 : SUCCESS in 4m 18s - publish-monasca-agent-docker-images https://zuul.opendev.org/t/openstack/build/2ebcbfec634c4c068e9f7625473ea726 : POST_FAILURE in 7m 08s _______________________________________________ Release-job-failures mailing list Release-job-failures at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/release-job-failures -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Jan 26 16:27:05 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 26 Jan 2022 16:27:05 +0000 Subject: [monasca] [Release-job-failures] Release of openstack/monasca-agent for ref refs/tags/4.0.1 failed In-Reply-To: <3a245f33-2cdf-5076-3a38-0ce3390ed5e5@est.tech> References: <3a245f33-2cdf-5076-3a38-0ce3390ed5e5@est.tech> Message-ID: <20220126162705.2e7hb72ivjbzo5dw@yuggoth.org> On 2022-01-26 15:40:03 +0100 (+0100), El?d Ill?s wrote: > The forwarded mail contains a release job failure, specifically the > publish-monasca-agent-docker-images job failed [1] with a missing package > during docker image build. Can you please have a look at the issue? > > [1] https://zuul.opendev.org/t/openstack/build/2ebcbfec634c4c068e9f7625473ea726 [...] The error seems to stem from the attempt to install py2-psutil from Alpine 3.15 finding no suitable package. This is done here in the image build: https://opendev.org/openstack/monasca-agent/src/commit/8c8a85a1d89dc430f831d31a36d2e282d1c2a387/docker/collector/Dockerfile#L32 If Python 2.7 libraries are actually required for that image, an older version of Alpine may be needed since there seems to be work in progress to drop support for EOL Python versions: https://gitlab.alpinelinux.org/alpine/aports/-/issues/12740 -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed Jan 26 17:18:34 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 26 Jan 2022 17:18:34 +0000 Subject: [release][infra] Tag permission regression - GreenFlag - approving can be continued In-Reply-To: <0f47ae0f-bb6e-5c4d-202f-2753b0c8dc6e@est.tech> References: <405abfa7-373c-fd4f-5b87-b57f77d64648@est.tech> <20220125153822.yexqplti3kqy7t4t@yuggoth.org> <20220125210219.2aazqxoojzgskyw3@yuggoth.org> <0f47ae0f-bb6e-5c4d-202f-2753b0c8dc6e@est.tech> Message-ID: <20220126171833.iq4julazg6kua4d7@yuggoth.org> On 2022-01-26 12:49:11 +0100 (+0100), El?d Ill?s wrote: > Thanks for the workaround! The first tag-release jobs started to > pass, so it works. 
> > The Red Flag for approving patches is now officially removed :) > The release management team started to approve release patches. [...] The three failed releases from yesterday have been rerun now and seem to have worked as well. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ralonsoh at redhat.com Wed Jan 26 17:54:49 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 26 Jan 2022 18:54:49 +0100 Subject: [heat][interop] StackBuildErrorException on a master job In-Reply-To: References: Message-ID: Hi Martin: I've opened https://bugs.launchpad.net/neutron/+bug/1959125 explaining the problem. I've pushed the corresponding patches for Neutron and neutron-lib. Thank you for the information provided today via IRC. Regards. On Wed, Jan 26, 2022 at 11:17 AM Martin Kopec wrote: > In the neutron logs we can see: > > Jan 25 23:57:53 devstack neutron-server[155870]: ERROR > neutron.plugins.ml2.managers [req-6189506a-aad6-400b-bb80-e358d9695ddf > req-75afd428-da64-4562-9401-135906d2d0f8 service neutron] Port > db268d9e-2a24-45bd-8bfc-5b06f5597927 does not have an IP address assigned > and there are no driver with 'connectivity' = 'l2'. The port cannot be > bound. > Jan 25 23:57:53 devstack neutron-server[155870]: ERROR > neutron.plugins.ml2.managers [req-6189506a-aad6-400b-bb80-e358d9695ddf > req-75afd428-da64-4562-9401-135906d2d0f8 service neutron] Failed to bind > port db268d9e-2a24-45bd-8bfc-5b06f5597927 on host devstack for vnic_type > normal using segments [{'id': '84199152-f782-4b64-9e6a-fc0dce2ec606', > 'network_type': 'geneve', 'physical_network': None, 'segmentation_id': > 3273, 'network_id': 'e5238eb0-54bd-40a6-bfa2-575464ec42fd'}] > > We continue investigating possible root causes. > > On Mon, 10 Jan 2022 at 09:19, Martin Kopec wrote: > >> Hi there, >> >> our interop master job (jobs for older OpenStack releases are fine) is >> constantly failing on 2 heat tests: >> >> heat_tempest_plugin.tests.functional.test_software_config.ParallelDeploymentsTest.test_deployments_metadata >> heat_tempest_plugin.tests.functional.test_software_config.ParallelDeploymentsTest.test_deployments_timeout_failed >> >> Full log at: >> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c51/819918/1/check/refstack-client-devstack-master/c51816e/job-output.txt >> >> Patch where you can see all the jobs - master one failing and the rest passing: >> https://review.opendev.org/c/openinfra/refstack-client/+/819918 >> >> We can't figure out what's going on there. Maybe additional configuration >> of the deployment is required? New requirements? >> Any ideas what the job is missing? >> >> Thank you, >> -- >> Martin Kopec >> Senior Software Quality Engineer >> Red Hat EMEA >> >> >> >> > > -- > Martin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pradeepantil at gmail.com Tue Jan 25 12:15:32 2022 From: pradeepantil at gmail.com (Pradeep Antil) Date: Tue, 25 Jan 2022 17:45:32 +0530 Subject: Compute Node Not Getting PXE boot During Scale-In/Out Activity Message-ID: Hello Folks, Hope you are doing well ! I have a compute node (SRIOV) in my RedHat Openstack 13 Lab. I have a task to remove the sriov compute and add it back as a dpdk compute node. After removing this node as scale-in activity via overcloud stack when I tried to add it again, introspection for this node is failing. 
After looking into the logs in undercloud , it appears like it is not getting the ip address from pxe. # cat /var/log/messages|grep -i "0c:42:a1:45:03:10" --color [image: image.png] [root at director ~]# tcpdump -n -i br-ctlplane -e | grep -i "0c:42:a1:45:03:10" --color tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on br-ctlplane, link-type EN10MB (Ethernet), capture size 262144 bytes 23:54:08.653057 0c:42:a1:45:03:10 > Broadcast, ethertype IPv4 (0x0800), length 436: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0c:42:a1:45:03:10, length 394 23:54:12.669877 0c:42:a1:45:03:10 > Broadcast, ethertype IPv4 (0x0800), length 436: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0c:42:a1:45:03:10, length 394 23:54:20.744605 0c:42:a1:45:03:10 > Broadcast, ethertype IPv4 (0x0800), length 436: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0c:42:a1:45:03:10, length 394 23:54:36.839226 0c:42:a1:45:03:10 > Broadcast, ethertype IPv4 (0x0800), length 436: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0c:42:a1:45:03:10, length 394 23:56:48.967623 0c:42:a1:45:03:10 > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 90: :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 23:56:49.044752 0c:42:a1:45:03:10 > Broadcast, ethertype IPv4 (0x0800), length 342: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0c:42:a1:45:03:10, length 300 23:56:49.269645 0c:42:a1:45:03:10 > 33:33:ff:45:03:10, ethertype IPv6 (0x86dd), length 86: :: > ff02::1:ff45:310: ICMP6, neighbor solicitation, who has fe80::e42:a1ff:fe45:310, length 32 23:56:49.781577 0c:42:a1:45:03:10 > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 90: :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 23:56:50.271663 0c:42:a1:45:03:10 > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 90: fe80::e42:a1ff:fe45:310 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 23:56:50.271922 0c:42:a1:45:03:10 > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::e42:a1ff:fe45:310 > ff02::2: ICMP6, router solicitation, length 16 23:56:50.368576 0c:42:a1:45:03:10 > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 90: fe80::e42:a1ff:fe45:310 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 23:56:54.274049 0c:42:a1:45:03:10 > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::e42:a1ff:fe45:310 > ff02::2: ICMP6, router solicitation, length 16 23:56:54.675024 0c:42:a1:45:03:10 > Broadcast, ethertype IPv4 (0x0800), length 342: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0c:42:a1:45:03:10, length 300 23:56:58.282332 0c:42:a1:45:03:10 > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::e42:a1ff:fe45:310 > ff02::2: ICMP6, router solicitation, length 16 23:57:07.638252 0c:42:a1:45:03:10 > Broadcast, ethertype IPv4 (0x0800), length 342: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0c:42:a1:45:03:10, length 300 23:57:10.053205 0c:42:a1:45:03:10 > Broadcast, ethertype IPv4 (0x0800), length 342: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0c:42:a1:45:03:10, length 300 23:57:18.720085 0c:42:a1:45:03:10 > Broadcast, ethertype IPv4 (0x0800), length 342: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0c:42:a1:45:03:10, length 300 23:57:32.817306 0c:42:a1:45:03:10 > Broadcast, ethertype IPv4 (0x0800), length 
342: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0c:42:a1:45:03:10, length 300 Physical connectivity of this node is intact. Any idea and suggestions ? how this issue can be fixed. -- Best Regards Pradeep Kumar -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 147668 bytes Desc: not available URL: From kennelson11 at gmail.com Wed Jan 26 18:47:14 2022 From: kennelson11 at gmail.com (Kendall Nelson) Date: Wed, 26 Jan 2022 10:47:14 -0800 Subject: 2021 OpenInfra Foundation Annual Report Now Available Message-ID: Hello Everyone! The 2021 OpenInfra Annual Report is live! It includes sections on all OpenInfra projects, Foundation updates, open infrastructure growth, SIG and WG updates, and more. You can find the 2021 OpenInfra Annual Report here: https://openinfra.dev/annual-report/2021 Thanks! Kendall (diablo_rojo) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Wed Jan 26 19:31:45 2022 From: mkopec at redhat.com (Martin Kopec) Date: Wed, 26 Jan 2022 20:31:45 +0100 Subject: [heat][interop] StackBuildErrorException on a master job In-Reply-To: References: Message-ID: Thank you Rodolfo for following up and fixing the problem, it's very much appreciated. On Wed, 26 Jan 2022 at 18:55, Rodolfo Alonso Hernandez wrote: > Hi Martin: > > I've opened https://bugs.launchpad.net/neutron/+bug/1959125 explaining > the problem. I've pushed the corresponding patches for Neutron and > neutron-lib. > > Thank you for the information provided today via IRC. > > Regards. > > > On Wed, Jan 26, 2022 at 11:17 AM Martin Kopec wrote: > >> In the neutron logs we can see: >> >> Jan 25 23:57:53 devstack neutron-server[155870]: ERROR >> neutron.plugins.ml2.managers [req-6189506a-aad6-400b-bb80-e358d9695ddf >> req-75afd428-da64-4562-9401-135906d2d0f8 service neutron] Port >> db268d9e-2a24-45bd-8bfc-5b06f5597927 does not have an IP address assigned >> and there are no driver with 'connectivity' = 'l2'. The port cannot be >> bound. >> Jan 25 23:57:53 devstack neutron-server[155870]: ERROR >> neutron.plugins.ml2.managers [req-6189506a-aad6-400b-bb80-e358d9695ddf >> req-75afd428-da64-4562-9401-135906d2d0f8 service neutron] Failed to bind >> port db268d9e-2a24-45bd-8bfc-5b06f5597927 on host devstack for vnic_type >> normal using segments [{'id': '84199152-f782-4b64-9e6a-fc0dce2ec606', >> 'network_type': 'geneve', 'physical_network': None, 'segmentation_id': >> 3273, 'network_id': 'e5238eb0-54bd-40a6-bfa2-575464ec42fd'}] >> >> We continue investigating possible root causes. >> >> On Mon, 10 Jan 2022 at 09:19, Martin Kopec wrote: >> >>> Hi there, >>> >>> our interop master job (jobs for older OpenStack releases are fine) is >>> constantly failing on 2 heat tests: >>> >>> heat_tempest_plugin.tests.functional.test_software_config.ParallelDeploymentsTest.test_deployments_metadata >>> heat_tempest_plugin.tests.functional.test_software_config.ParallelDeploymentsTest.test_deployments_timeout_failed >>> >>> Full log at: >>> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c51/819918/1/check/refstack-client-devstack-master/c51816e/job-output.txt >>> >>> Patch where you can see all the jobs - master one failing and the rest passing: >>> https://review.opendev.org/c/openinfra/refstack-client/+/819918 >>> >>> We can't figure out what's going on there. 
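Back on the undercloud PXE question above (DHCP requests are captured but never answered), a minimal checklist one could run on the director node is sketched below; br-ctlplane is taken from the capture above, the rest is generic and not tied to a particular OSP release:

  # watch for DHCP replies, not just requests, on the provisioning bridge
  sudo tcpdump -i br-ctlplane -nn port 67 or port 68

  # confirm an introspection DHCP server (dnsmasq) is running, and that the
  # node is registered and introspection has actually been started for it
  sudo pgrep -af dnsmasq
  openstack baremetal node list
  openstack baremetal introspection list

If dnsmasq is up but never replies, the usual suspects are the node not being in a state where introspection is active, or the inspector's PXE filter not yet allowing that MAC.
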
Maybe additional >>> configuration of the deployment is required? New requirements? >>> Any ideas what the job is missing? >>> >>> Thank you, >>> -- >>> Martin Kopec >>> Senior Software Quality Engineer >>> Red Hat EMEA >>> >>> >>> >>> >> >> -- >> Martin >> > -- Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Wed Jan 26 20:41:56 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 26 Jan 2022 17:41:56 -0300 Subject: [cinder] Bug deputy report for week of 01-19-2022 In-Reply-To: References: Message-ID: Thanks Takashi for pointing that out. I missed this message. I'm going PTO the next two weeks and I'm not sure if there will be a bug meeting. I see that the CI problem in stable/ussuri persists. I'll try to take a look but if anyone can take a look at it is welcome too. Maybe you can point this bug out at the cinder meeting next week. Thanks in advance Sofia On Fri, Jan 21, 2022 at 11:43 AM Takashi Kajinami wrote: > There is one more bug I've reported under cinderlib. > https://bugs.launchpad.net/cinderlib/+bug/1958159 "stable/ussuri: > cinderlib-lvm-functional job is failing" . Unassigned > > AS reported, currently there is one CI job broken. The issue is currently > observed in stable/ussuri only, and > is a potential blocker for CI migration from CentOS8 to CentOS8 Stream[1]. > [1] > https://review.opendev.org/q/topic:%2522remove-centos-8%2522+project:openstack/cinderlib > > The error looks strange and unfortunately I've not yet got time to > identify the cause. > I would appreciate it if anyone can take a look. > > On Wed, Jan 19, 2022 at 9:59 PM Sofia Enriquez > wrote: > >> This is a bug report from 01-12-2022 to 01-19-2022. >> Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting >> >> ----------------------------------------------------------------------------------------- >> >> Medium >> >> - https://bugs.launchpad.net/cinder/+bug/1957804 "RBD deferred >> deletion causes undeletable RBD snapshots." In Progress. Assigned to Eric. >> - https://bugs.launchpad.net/cinder/+bug/1958122 "HPE 3PAR: In multi >> host env, multi-detach works partially if volume is attached to instances >> from separate hosts." In Progress. Assigned to Raghavendra Tilay. >> >> Low >> >> - https://bugs.launchpad.net/cinder/+bug/1958023 "Solidfire: there >> are some references to the removed parameters left." In Progress. Assigned >> to kajinamit. >> - https://bugs.launchpad.net/cinder/+bug/1958245 "NetApp ONTAP driver >> shows type error exception when replicating FlexGroups." Unassigned. >> >> Cheers, >> >> -- >> >> Sof?a Enriquez >> >> she/her >> >> Software Engineer >> >> Red Hat PnT >> >> IRC: @enriquetaso >> @RedHat Red Hat >> Red Hat >> >> >> >> -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From tburke at nvidia.com Wed Jan 26 21:14:16 2022 From: tburke at nvidia.com (Timothy Burke) Date: Wed, 26 Jan 2022 21:14:16 +0000 Subject: [swift] Dropping support for py27 Message-ID: The Swift team plans for the coming Yoga release to be the final Swift release to support Python 2.7. This is long overdue; it is now two years since the official Python 2.7 end-of-life. We were relatively late in adding Python 3 support, however, and wanted to ensure a long-enough period to root out Python 3 bugs and transition existing Python 2 clusters. We've used that time effectively; now it's time to move forward. 
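In packaging terms, a Python 2.7 drop like the one announced above is usually signalled to pip through metadata rather than code; a minimal sketch of what that tends to look like in a pbr/setuptools setup.cfg follows (the 3.6 floor here is illustrative, not necessarily what Swift will pick):

  [metadata]
  # refuse installation on interpreters older than this
  python-requires = >=3.6

  [bdist_wheel]
  # stop publishing universal (py2.py3) wheels
  universal = 0
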
We'll do what we can to ensure subsequent releases are *not* considered for installation on unsupported versions of Python. Notably, we'll ensure wheels are no longer flagged as universal, and `python_requires` will be set. If you have a continued interest in running latest Swift on Python 2.7, please get in touch with us, either here on the mailing list or in #openstack-swift on OFTC. Tim Burke Swift PTL -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Wed Jan 26 21:16:50 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 26 Jan 2022 18:16:50 -0300 Subject: [cinder] Bug deputy report for week of 01-26-2022 Message-ID: This is a bug report from 01-19-2022 to 01-26-2022. Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting ----------------------------------------------------------------------------------------- The cinder bug meeting will return on Feb 16th. If you have an urgent bug, please bring it to the main cinder meeting! Medium - https://bugs.launchpad.net/cinderlib/+bug/1958159 "stable/ussuri: cinderlib-lvm-functional job is failing." Fix proposed to stable/ussuri and master. Assigned to Takashi Kajinami. - https://bugs.launchpad.net/cinder/+bug/1958632 "SVC storwize regression after fixing 1913363." Unassigned. Incompleted - https://bugs.launchpad.net/cinder/+bug/1958845 "Rbd: possibility of data loss during revert-to-snap." Fix proposed to master. Cheers -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From james at openstack.org Wed Jan 26 21:44:22 2022 From: james at openstack.org (James Cole) Date: Wed, 26 Jan 2022 15:44:22 -0600 Subject: [Skyline] Project mascot to review Message-ID: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> Greetings everyone, I?m James, a designer with the OpenInfra Foundation. We have a been working on a new project mascot for Skyline and wanted to share a couple of options with you. The reference we were provided alluded to a nine-color-deer, so we created a deer illustration in the style of the other project mascots. The two options are essentially the same, but one is white with spots and the other is yellow with spots. Let me know if you like one or the other?or if we?re totally off on the theme?and we will add the text to the logo and finalize the assets. There is a PDF attached to this message or you can view the document by visiting this link . Thank you! -James -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 20220120 Mascots Project.pdf Type: application/pdf Size: 83953 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Thu Jan 27 00:03:38 2022 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 26 Jan 2022 18:03:38 -0600 Subject: [Skyline] Project mascot to review In-Reply-To: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> References: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> Message-ID: <0398f27a-c808-2811-df9e-31bf0127cdd4@gmail.com> I On 1/26/2022 3:44 PM, James Cole wrote: > Subject: > [Skyline] Project mascot to review > From: > James Cole > Date: > 1/26/2022, 3:44 PM > > To: > openstack-discuss at lists.openstack.org > > > Greetings everyone, > > I?m James, a designer with the OpenInfra Foundation. 
We have a been > working on a new project mascot for Skyline and wanted to share a > couple of options with you. > > The reference we were provided alluded to a?nine-color-deer, so we > created a deer illustration in the style of the other project mascots. > The two options are essentially the same, but one is white with spots > and the other is yellow with spots. Let me know if you like one or the > other?or if we?re totally off on the theme?and we will add?the text to > the logo and finalize the assets. > James, I am partial to the white one, especially if it is supposed to be a deer.? The yellow one looks more like a giraffe to me. Thanks! Jay > There is a PDF attached to this message or you can view the document > by visiting this link > . > > > Thank you! > > -James > > > Attachments: > > 20220120 Mascots Project.pdf 82.0 KB > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Thu Jan 27 00:29:35 2022 From: amy at demarco.com (Amy Marrich) Date: Wed, 26 Jan 2022 18:29:35 -0600 Subject: [Skyline] Project mascot to review In-Reply-To: <0398f27a-c808-2811-df9e-31bf0127cdd4@gmail.com> References: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> <0398f27a-c808-2811-df9e-31bf0127cdd4@gmail.com> Message-ID: I'm with Jay on this but look forward to reading what the Skyline team thinks. Amy On Wed, Jan 26, 2022 at 6:06 PM Jay Bryant wrote: > I > On 1/26/2022 3:44 PM, James Cole wrote: > > Subject: > [Skyline] Project mascot to review > > From: > James Cole > > Date: > 1/26/2022, 3:44 PM > > To: > openstack-discuss at lists.openstack.org > Greetings everyone, > > I?m James, a designer with the OpenInfra Foundation. We have a been > working on a new project mascot for Skyline and wanted to share a couple of > options with you. > > The reference we were provided alluded to a nine-color-deer, so we created > a deer illustration in the style of the other project mascots. The two > options are essentially the same, but one is white with spots and the other > is yellow with spots. Let me know if you like one or the other?or if we?re > totally off on the theme?and we will add the text to the logo and finalize > the assets. > > James, > > I am partial to the white one, especially if it is supposed to be a deer. > The yellow one looks more like a giraffe to me. > > Thanks! > > Jay > > > There is a PDF attached to this message or you can view the document by visiting > this link > > . > > Thank you! > > -James > > > Attachments: > 20220120 Mascots Project.pdf 82.0 KB > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jan 27 01:24:38 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 26 Jan 2022 19:24:38 -0600 Subject: [all][tc] Technical Committee next weekly meeting on Jan 27th at 1500 UTC In-Reply-To: <17e8e195dd4.1181e0a8f53874.3883263648236970911@ghanshyammann.com> References: <17e8e195dd4.1181e0a8f53874.3883263648236970911@ghanshyammann.com> Message-ID: <17e99227259.da8edbf0212381.7652655453033481910@ghanshyammann.com> Hello Everyone, Below is the agenda for Tomorrow's TC IRC meeting schedule at 1500 UTC. 
https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting == Agenda for tomorrow's TC meeting == * Roll call * Follow up on past action items * Gate health check ** Fixing Zuul config error in OpenStack *** https://etherpad.opendev.org/p/zuul-config-error-openstack * Z Release Cycle Name ** It is needed for Release Management team to this week's task "Plan the next release cycle schedule" (elodilles) * Z cycle Technical Elections ** Need two volunteers to run the election. * Adjutant need PTLs and maintainers ** http://lists.openstack.org/pipermail/openstack-discuss/2021-October/025555.html * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open * Renaming all `master` branches on all repos to `main` ** lourot: just wanted to kick off a discussion about this -gmann ---- On Mon, 24 Jan 2022 15:58:54 -0600 Ghanshyam Mann wrote ---- > Hello Everyone, > > Technical Committee's next weekly meeting is scheduled for Jan 27th at 1500 UTC. > > If you would like to add topics for discussion, please add them to the below wiki page by > Wednesday, Jan 26th, at 2100 UTC. > > https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting > > -gmann > > From wu.wenxiang at 99cloud.net Thu Jan 27 03:59:15 2022 From: wu.wenxiang at 99cloud.net (=?UTF-8?B?5ZC05paH55u4?=) Date: Thu, 27 Jan 2022 11:59:15 +0800 Subject: [Skyline] Project mascot to review In-Reply-To: References: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> <0398f27a-c808-2811-df9e-31bf0127cdd4@gmail.com> Message-ID: <89E63267-FDFA-45A6-8A4B-4A125F7E7A29@99cloud.net> Hello, Skyline dev team prefer ?White deer? :-) BTW, the original "Dunhuang Nine Color King Deer" is also a white deer. Thanks Best Regards Wenxiang Wu From: on behalf of Amy Marrich Date: Thursday, January 27, 2022 at 8:30 AM To: , James Cole Cc: Subject: Re: [Skyline] Project mascot to review I'm with Jay on this but look forward to reading what the Skyline team thinks. Amy On Wed, Jan 26, 2022 at 6:06 PM Jay Bryant wrote: I On 1/26/2022 3:44 PM, James Cole wrote: Subject: [Skyline] Project mascot to review From: James Cole Date: 1/26/2022, 3:44 PM To: openstack-discuss at lists.openstack.org Greetings everyone, I?m James, a designer with the OpenInfra Foundation. We have a been working on a new project mascot for Skyline and wanted to share a couple of options with you. The reference we were provided alluded to a nine-color-deer, so we created a deer illustration in the style of the other project mascots. The two options are essentially the same, but one is white with spots and the other is yellow with spots. Let me know if you like one or the other?or if we?re totally off on the theme?and we will add the text to the logo and finalize the assets. James, I am partial to the white one, especially if it is supposed to be a deer. The yellow one looks more like a giraffe to me. Thanks! Jay There is a PDF attached to this message or you can view the document by visiting this link. Thank you! -James Attachments: 20220120 Mascots Project.pdf82.0 KB -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Thu Jan 27 08:30:38 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 27 Jan 2022 09:30:38 +0100 Subject: [Skyline] Project mascot to review In-Reply-To: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> References: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> Message-ID: Wow! First of all, it's beautiful! Great job! 
Like all the others so far, I also strongly prefer the white one. -yoctozepto On Thu, 27 Jan 2022 at 00:56, James Cole wrote: > > Greetings everyone, > > I?m James, a designer with the OpenInfra Foundation. We have a been working on a new project mascot for Skyline and wanted to share a couple of options with you. > > The reference we were provided alluded to a nine-color-deer, so we created a deer illustration in the style of the other project mascots. The two options are essentially the same, but one is white with spots and the other is yellow with spots. Let me know if you like one or the other?or if we?re totally off on the theme?and we will add the text to the logo and finalize the assets. > > There is a PDF attached to this message or you can view the document by visiting this link. > > Thank you! > > -James > From katonalala at gmail.com Thu Jan 27 08:42:25 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 27 Jan 2022 09:42:25 +0100 Subject: Neutron issue || Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached In-Reply-To: References: <4248952.ejJDZkT8p0@p1> Message-ID: Hi, Yes the default value for rpc_state_report_workers is 1. There are some basic hint for these values in the config: *api_workers*: https://docs.openstack.org/neutron/latest/configuration/neutron.html#DEFAULT.api_workers : "Number of separate API worker processes for service. If not specified, the *default is equal to the number of CPUs available for best performance, capped by potential RAM usage.*" *rpc_workers*: https://docs.openstack.org/neutron/latest/configuration/neutron.html#DEFAULT.rpc_workers : "Number of RPC worker processes for service. If not specified, the *default is equal to half the number of API workers.*" I think this can be useful to start playing with these settings, and find which combination is best for your environment. Regards Lajos Md. Hejbul Tawhid MUNNA ezt ?rta (id?pont: 2022. jan. 26., Sze, 6:56): > Hello Arnaud, > > Thank you so much for your valuable feedback. > > As per our neutron.conf file, #rpc_workers = 1 and #rpc_state_report_workers > = 1 , is it default value neutron taken? > > What is the procedure to increase the value? remove # and increase the > worker from 1 to 2 and restart neutron-server service? > > Regards, > Munna > > > > On Tue, Jan 25, 2022 at 6:26 PM Arnaud wrote: > >> Hi, >> >> I am not an db expert, but openstack tends to open a LOT of connection to >> the db. So one of the thing to monitor is the number of connection you >> have/allow on the db side. >> >> Also, raising the number of RPC (and report states) workers will solve >> your issue. >> The good number is not easy to calculate, and depends on each deployment. >> >> A good approach is to the try/improve loop. >> >> Cheers, >> Arnaud. >> >> >> Le 25 janvier 2022 09:52:26 GMT+01:00, "Md. Hejbul Tawhid MUNNA" < >> munnaeebd at gmail.com> a ?crit : >>> >>> Hello Arnaud, >>> >>> Thank you for your valuable reply. >>> >>> we did not modify default config of RPC worker . >>> >>> /etc/neutron/neutron.conf >>> >>> # Number of separate API worker processes for service. If not specified, >>> the >>> # default is equal to the number of CPUs available for best performance. >>> # (integer value) >>> #api_workers = >>> >>> # Number of RPC worker processes for service. (integer value) >>> #rpc_workers = 1 >>> >>> # Number of RPC worker processes dedicated to state reports queue. >>> (integer >>> # value) >>> #rpc_state_report_workers = 1 >>> >>> how to check load on database. 
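On the two questions just above (how to raise the worker counts and how to check load on the database), a minimal sketch follows; the values are examples rather than recommendations, and service/unit names vary per distro:

  # uncomment and raise the workers in /etc/neutron/neutron.conf, e.g.
  #   [DEFAULT]
  #   rpc_workers = 4
  #   rpc_state_report_workers = 2
  # then restart the server so the extra workers are forked
  sudo systemctl restart neutron-server

  # a rough view of connection load on the database side
  mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"
  mysql -u root -p -e "SHOW PROCESSLIST;" | grep -c neutron
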
RAM/CPU/Disk-IO utilization is low on the >>> database server. >>> >>> Please guide us further >>> >>> Regards, >>> Munna >>> >>> On Mon, Jan 24, 2022 at 6:29 PM Arnaud wrote: >>> >>>> Hi, >>>> >>>> I would also consider checking the number of RPC workers you have in >>>> neutron.conf, this is maybe a better option to increase this before the >>>> comnection pool params. >>>> >>>> Also, check your database, is it under load? >>>> Updating agent state should not be long. >>>> >>>> Cheers, >>>> Arnaud >>>> >>>> >>>> >>>> Le 24 janvier 2022 10:42:00 GMT+01:00, "Md. Hejbul Tawhid MUNNA" < >>>> munnaeebd at gmail.com> a ?crit : >>>>> >>>>> Hi, >>>>> >>>>> Currently we have running 500+VM and total network is 383 including >>>>> HA-network. >>>>> >>>>> Can you advice the appropriate value and is there any chance of >>>>> service impact? >>>>> >>>>> Should we change the configuration in the neutron.conf on controller >>>>> node? >>>>> >>>>> Regards, >>>>> Munna >>>>> >>>>> >>>>> >>>>> On Mon, Jan 24, 2022 at 2:47 PM Slawek Kaplonski >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> On poniedzia?ek, 24 stycznia 2022 09:09:10 CET Md. Hejbul Tawhid >>>>>> MUNNA wrote: >>>>>> > Hi, >>>>>> > >>>>>> > Suddenly we have observed few VM down . then we have found some >>>>>> agent are >>>>>> > getting down (XXX) , agents are getting UP and down randomly. >>>>>> Please check >>>>>> > the attachment. >>>>>> > >>>>>> > >>>>>> >>>>>> ///////////////////////////////////////////////////////////////////////////// >>>>>> > /////////////////////// /sqlalchemy/pool.py", line 788, in >>>>>> _checkout\n >>>>>> > fairy = >>>>>> > _ConnectionRecord.checkout(pool)\n', u' File >>>>>> > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 532, >>>>>> in >>>>>> > checkout\n rec = pool._do_get()\n', u' File >>>>>> > "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1186, >>>>>> in >>>>>> > _do_get\n (self.size(), self.overflow(), self._timeout), >>>>>> > code="3o7r")\n', u'TimeoutError: QueuePool limit of size 5 overflow >>>>>> 50 >>>>>> > reached, connection timed out, timeout 30 (Background on this error >>>>>> at: >>>>>> > http://sqlalche.me/e/3o7r)\n']. 
>>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>>> Traceback (most >>>>>> > recent call last): >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>>> > "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line >>>>>> 837, in >>>>>> > _report_state >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>>> True) >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>>> > "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 97, in >>>>>> > report_state >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>>> return >>>>>> > method(context, 'report_state', **kwargs) >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", >>>>>> line 179, >>>>>> > in call >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>>> > retry=self.retry) >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", >>>>>> line 133, >>>>>> > in _send >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>>> retry=retry) >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>>> > >>>>>> "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>>>>> > line 645, in send >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>>> > call_monitor_timeout, retry=retry) >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent File >>>>>> > >>>>>> "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>>>>> > line 636, in _send >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>>> raise result >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>>> RemoteError: >>>>>> > Remote error: TimeoutError QueuePool limit of size 5 overflow 50 >>>>>> reached, >>>>>> > connection timed out, timeout 30 (Background on this error at: >>>>>> > http://sqlalche.me/e/3o7r) >>>>>> > 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent >>>>>> [u'Traceback >>>>>> > (most recent call last):\n', u' File >>>>>> > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", >>>>>> line 163, >>>>>> > in _process_incoming\n res = >>>>>> self.dispatcher.dispatch(message)\n', u' >>>>>> > File >>>>>> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", >>>>>> > line 265, in dispatch\n return self._do_dispatch(endpoint, >>>>>> method, ctxt, >>>>>> > args)\n', u' File >>>>>> > >>>>>> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line >>>>>> > 194, in _do_dispatch\n >>>>>> > >>>>>> > >>>>>> >>>>>> ///////////////////////////////////////////////////////////////////////////// >>>>>> > //// >>>>>> > >>>>>> > Is there anything related with the following default configuration. >>>>>> > >>>>>> > /etc/neutron/neutron.conf >>>>>> > #max_pool_size = 5 >>>>>> > #max_overflow = 50 >>>>>> >>>>>> Yes. You probably have busy environment and You need to increase >>>>>> those values >>>>>> to have more connections from the neutron server to the database. >>>>>> >>>>>> > >>>>>> > regards, >>>>>> > Munna >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Slawek Kaplonski >>>>>> Principal Software Engineer >>>>>> Red Hat >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... 
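For the QueuePool error in this thread, max_pool_size and max_overflow are oslo.db options, so they live in the [database] section of neutron.conf; a minimal sketch with purely illustrative numbers, which still have to fit under the database server's max_connections:

  # /etc/neutron/neutron.conf
  [database]
  max_pool_size = 10
  max_overflow = 100
  # restart neutron-server afterwards; budget roughly
  # workers x (max_pool_size + max_overflow) connections on the DB side
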
URL: From marios at redhat.com Thu Jan 27 09:47:06 2022 From: marios at redhat.com (Marios Andreou) Date: Thu, 27 Jan 2022 11:47:06 +0200 Subject: [TripleO] gate blocker centos-9 please hold rechecks Message-ID: o/ Hi folks please hold rechecks especially if you see failing centos9 jobs (should be all of master tripleo repos now) we have a gate blocker there https://bugs.launchpad.net/tripleo/+bug/1959181 thank you, marios (on behalf of tripleo-ci team - #oooq and #tripleo channels) From marios at redhat.com Thu Jan 27 11:59:12 2022 From: marios at redhat.com (Marios Andreou) Date: Thu, 27 Jan 2022 13:59:12 +0200 Subject: [TripleO] gate blocker centos-9 please hold rechecks In-Reply-To: References: Message-ID: should be clear now - ok to start recheck mirrors should be synced Alfredo patch at https://review.opendev.org/c/zuul/zuul-jobs/+/826603 to add AFS mirror will help us in future thank you! On Thu, Jan 27, 2022 at 11:47 AM Marios Andreou wrote: > > o/ Hi folks > > please hold rechecks especially if you see failing centos9 jobs > (should be all of master tripleo repos now) > > we have a gate blocker there https://bugs.launchpad.net/tripleo/+bug/1959181 > > thank you, marios > (on behalf of tripleo-ci team - #oooq and #tripleo channels) From victoria at vmartinezdelacruz.com Thu Jan 27 12:22:03 2022 From: victoria at vmartinezdelacruz.com (=?UTF-8?Q?Victoria_Mart=C3=ADnez_de_la_Cruz?=) Date: Thu, 27 Jan 2022 13:22:03 +0100 Subject: [manila][cinder][glance][nova] Pop-up team for design and development of a Cephadm DevStack plugin In-Reply-To: References: Message-ID: Hi everyone, Just a reminder about this meeting, which will be next Friday at 3pm UTC in #openstack-qa at OFTC. Also, first patch is submitted in [0] Thanks, Victoria [0] https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484 On Fri, Jan 21, 2022 at 11:58 AM Victoria Mart?nez de la Cruz < victoria at vmartinezdelacruz.com> wrote: > Thanks everybody for your responses! > > I'll send an initial path with a macro to pick between implementations and > we can go from there. > > Let's have a meeting next Friday at 3pm UTC in #openstack-qa at OFTC. > Would that work for you? > > Thanks, > > Victoria > > On Thu, Jan 20, 2022 at 2:04 PM Sofia Enriquez > wrote: > >> Sounds good. I wanna join . >> I haven't tried cephadm yet but It would help us to make ceph new >> features more transparent to Cinder in the future. >> Thanks >> > > > ++ > > >> >> >> On Thu, Jan 20, 2022 at 4:45 AM Francesco Pantano >> wrote: >> >>> Hi Victoria, >>> thanks for starting this thread. >>> >>> On Wed, Jan 19, 2022 at 2:03 PM Sean Mooney wrote: >>> >>>> On Wed, 2022-01-19 at 12:04 +0100, Victoria Mart?nez de la Cruz wrote: >>>> > Hi all, >>>> > >>>> > I'm reaching out to you to let you know that we will start the design >>>> and >>>> > development of a Cephadm DevStack plugin. >>>> > >>>> > Some of the reasons on why we want to take this approach: >>>> > >>>> > - devstack-plugin-ceph worked for us for a lot of years, but the >>>> > development of it relies on several hacks to adapt to the different >>>> Ceph >>>> > versions we use and the different distros we support. This led to a >>>> > monolithic script that sometimes is hard to debug and break our >>>> development >>>> > environments and our CI >>>> > - cephadm is the deployment tool developed and maintained by the Ceph >>>> > community, it allows their users to get specific Ceph versions very >>>> easily >>>> > and enforces good practices for Ceph clusters. 
From their docs, >>>> "Cephadm >>>> > manages the full lifecycle of a Ceph cluster. It starts by >>>> bootstrapping a >>>> > tiny Ceph cluster on a single node (one monitor and one manager) and >>>> then >>>> > uses the orchestration interface (?day 2? commands) to expand the >>>> cluster >>>> > to include all hosts and to provision all Ceph daemons and services. >>>> [0]" >>>> > - OpenStack deployment tools are starting to use cephadm as their way >>>> to >>>> > deploy Ceph, so it would be nice to include cephadm in our development >>>> > process to be closer with what is being done in the field >>>> > >>>> > I started the development of this in [1], but it might be better to >>>> change >>>> > devstack-plugin-ceph to do this instead of having a new plugin. This >>>> is >>>> > something I would love to discuss in a first meeting. >>>> i would advocate for pivoting devstack-plugin-ceph. >>>> i dont think we have the capsity as a comunity to devleop, maintaine >>>> and debug/support >>>> 2 differnt ways of deploying ceph in our ci system in the long term. >>>> >>>> > ++ let's pivot devstack-plugin-ceph > > to me the way devstack-plugin-ceph install cpeh is jsut an implementaion >>>> detail. >>>> its contract is that it will install and configure ceph for use with >>>> openstack. >>>> if you make it use cephadm for that its just and internal detail that >>>> should not >>>> affect the consomes of the plugin provide you maintain the interface to >>>> the devstack pluging >>>> mostly the same. >>>> >>> Starting with pacific the deployment of Ceph is moved from ceph-ansible >>> to cephadm: the implication of this change it's not just >>> on the deployment side but this new component (which interacts with the >>> ceph orchestrator module) is able to maintain the lifecycle >>> of the deployed containers, so I'd say the new approach it's not just an >>> implementation detail but also changes the way some components >>> interact with Ceph. >>> Manila using ganesha, for instance, it's the first component that should >>> start using the orchestrator interface, so I guess it's worth >>> aligning (and extending) the most popular dev installer to support the >>> new way (like other projects already did). >>> >>> >>> >>>> >>>> i would suggest addign a devstack macro initally to choose the backend >>>> but then eventually >>>> once the cephadm appoch is stable just swap the default. >>>> >>> +1 on choosing the backend and plan the switch when the cephadm approach >>> is ready and works for all the openstack components >>> >>>> >>>> > >>>> > Having said this, I propose using the channel #openstack-cephadm in >>>> the >>>> > OFTC network to talk about this and set up a first meeting with people >>>> > interested in contributing to this effort. >>>> ack im not sure i will get involed with this but the other option woudl >>>> be to >>>> just use #openstack-qa since that is the chanlle for devstack >>>> development. >>>> >>> >>> Either #openstack-qa or a dedicated one works well , maybe #openstack-qa >>> is useful to reach more people >>> who can help / review the relevant changes .. 
wdyt >>> >> > ++ let's meet in #openstack-qa > > >> >>>> > >>>> > Thanks, >>>> > >>>> > Victoria >>>> > >>>> > [0] https://docs.ceph.com/en/pacific/cephadm/ >>>> > [1] https://github.com/vkmc/devstack-plugin-cephadm >>>> >>>> >>>> >>> >>> -- >>> Francesco Pantano >>> GPG KEY: F41BD75C >>> >> >> >> -- >> >> Sof?a Enriquez >> >> she/her >> >> Software Engineer >> >> Red Hat PnT >> >> IRC: @enriquetaso >> @RedHat Red Hat >> Red Hat >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Jan 27 14:12:42 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 27 Jan 2022 23:12:42 +0900 Subject: [rbac][keystone][kolla][osa][tripleo][charms] RBAC in Yoga for deployment projects In-Reply-To: References: <17e731cbfcc.e2463d54885770.5926410035050303741@ghanshyammann.com> <17e78ca4f43.11df902c0973397.2476553495581610085@ghanshyammann.com> <17e790f0a61.fa031a2a976064.4934623486848755469@ghanshyammann.com> <17e7f0f14fd.114ff118752948.3643237598290287986@ghanshyammann.com> Message-ID: On Mon, Jan 24, 2022 at 4:50 PM Takashi Kajinami wrote: > > > > > On Sat, Jan 22, 2022 at 8:57 AM Ghanshyam Mann > wrote: > >> ---- On Thu, 20 Jan 2022 14:41:00 -0600 Mark Goddard >> wrote ---- >> > On Thu, 20 Jan 2022 at 19:55, Ghanshyam Mann >> wrote: >> > > >> > > ---- On Thu, 20 Jan 2022 13:36:53 -0600 Mark Goddard < >> mark at stackhpc.com> wrote ---- >> > > > On Thu, 20 Jan 2022 at 18:40, Ghanshyam Mann < >> gmann at ghanshyammann.com> wrote: >> > > > > >> > > > > >> > > > > ---- On Thu, 20 Jan 2022 03:35:33 -0600 Mark Goddard < >> mark at stackhpc.com> wrote ---- >> > > > > > On Wed, 19 Jan 2022 at 16:12, Ghanshyam Mann < >> gmann at ghanshyammann.com> wrote: >> > > > > > > >> > > > > > > ---- On Wed, 19 Jan 2022 04:35:53 -0600 Mark Goddard < >> mark at stackhpc.com> wrote ---- >> > > > > > > > Hi, >> > > > > > > > >> > > > > > > > If you haven't been paying close attention, it would be >> easy to miss >> > > > > > > > some of the upcoming RBAC changes which will have an >> impact on >> > > > > > > > deployment projects. I thought I'd start a thread so >> that we can share >> > > > > > > > how we are approaching this, get answers to open >> questions, and >> > > > > > > > ideally all end up with a fairly consistent approach. >> > > > > > > > >> > > > > > > > The secure RBAC work has a long history, and continues >> to evolve. >> > > > > > > > According to [1], we should start to see some fairly >> substantial >> > > > > > > > changes over the next few releases. That spec is fairly >> long, but >> > > > > > > > worth a read. >> > > > > > > > >> > > > > > > > In the yoga timeline [2], there is one change in >> particular that has >> > > > > > > > an impact on deployment projects, "3. Keystone enforces >> scope by >> > > > > > > > default". After this change, all of the deprecated >> policies that many >> > > > > > > > still rely on in Keystone will be removed. >> > > > > > > > >> > > > > > > > In kolla-ansible, we have an etherpad [5] with some >> notes, questions >> > > > > > > > and half-baked plans. We made some changes in Xena [3] >> to use system >> > > > > > > > scope in some places when interacting with system APIs >> in Ansible >> > > > > > > > tasks. >> > > > > > > > >> > > > > > > > The next change we have staged is to add the service >> role to all >> > > > > > > > service users [4], in preparation for [2]. 
>> > > > > > > > >> > > > > > > > Question: should the role be added with system scope or >> in the >> > > > > > > > existing service project? The obvious main use for this >> is token >> > > > > > > > validation, which seems to allow system or project >> scope. >> > > > > > > > >> > > > > > > > We anticipate that some service users may still require >> some >> > > > > > > > project-scoped roles, e.g. when creating resources for >> octavia. We'll >> > > > > > > > deal with those on a case by case basis. >> > > > > > > >> > > > > > > Service roles are planned for phase2 which is Z >> release[1]. The Idea here is >> > > > > > > service to service communication will happen with >> 'service' role (which keystone >> > > > > > > need to implement yet) and end users will keep using the >> what ever role >> > > > > > > is default (or overridden in policy file) which can be >> project or system scoped >> > > > > > > depends on the APIs. >> > > > > > > >> > > > > > > So at the end service-service APIs policy default will >> looks like >> > > > > > > >> > > > > > > '(role:admin and system:network and >> project_id:%(project_id)s) or (role:service and project_name:service)' >> > > > > > > >> > > > > > > Say nova will use that service role to communicate to >> cinder and cinder policy will pass >> > > > > > > as service role is in OR in default policy. >> > > > > > > >> > > > > > > But let's see how they are going to be and if any >> challenges when we will implement >> > > > > > > it in Z cycle. >> > > > > > >> > > > > > I'm not 100% on our reasoning for using the service role in >> yoga (I >> > > > > > wasn't in the discussion when we made the switch, although >> John >> > > > > > Garbutt was), although I can provide at least one reason. >> > > > > > >> > > > > > Currently, we have a bunch of service users doing things >> like keystone >> > > > > > token validation using the admin role in the service >> project. If we >> > > > > > enforce scopes & new defaults in keystone, this will no >> longer work, >> > > > > > due to the default policy: >> > > > > > >> > > > > > identity:validate_token: (role:reader and system_scope:all) >> or >> > > > > > rule:service_role or rule:token_subject >> > > > > > >> > > > > > Now we could go and assign system-reader to all these users, >> but if >> > > > > > the end goal is to give them all the service role, and that >> allows >> > > > > > token validation, then to me that seems like a better path. >> > > > > > >> > > > > > Currently, we're creating the service role during deploy & >> upgrade, >> > > > > > then assigning it to users. Keystone is supposed to create >> the service >> > > > > > role in yoga, so we can eventually drop that part. >> > > > > > >> > > > > > Does this seem reasonable? Is keystone still on track to >> create the >> > > > > > service role in yoga? >> > > > > >> > > > > I think this is a reasonable plan and once we have service >> roles implemented >> > > > > in keystone as well as in all the services to request other >> service APIs then >> > > > > deployment project (Kolla here) can update them from >> system_reader to >> > > > > actual service role. >> > > > >> > > > To be clear, I am proposing to skip system-reader, and go >> straight to >> > > > the service role in yoga. >> > > >> > > But that would not be doable until services implement service roles >> which is >> > > Yoga cycle target for keystone and Z cyle target for other projects. 
>> Or you mean >> > > to re-consider to target the service role for all projects also in >> Yoga so that >> > > deployment projects can go with service role directly? >> > >> > Our current plan is to add the service role to all service users in >> > yoga. This will allow keystone token validation to work when keystone >> > drops the deprecated policies. >> > >> > We will not remove the admin role from service users in the service >> > project during yoga. This will allow projects other than keystone to >> > continue to work as before. >> > >> > At some later point, we will remove the admin role from service users >> > in the service project, hopefully relying on the service role for most >> > service-service communication. There may be other roles we need to >> > assign in order to drop admin, but we'll assess that as we go. >> > >> > Hopefully that's a bit more of a clear picture, and it seems sensible? >> >> +1, sounds good to me. Hopefully we will get in better shape by Z release >> when all (or maximum) services will be migrated to new RBAC. Till than >> your plan sounds reasonable. >> >> -gmann >> > > I'll follow the same approach in Puppet OpenStack and will add the > project-scoped 'service' role > to each service user by default. IIUC This is consistent with the current > devstack which assigns > the project-scoped service role to each service user, so I expect this > approach will be tested > in dsvm jobs [1]. > [1] > https://github.com/openstack/devstack/blob/d5d0bed479497560489983ae1fc80444b44fe029/lib/keystone#L421 > > The same was already implemented in TripleO by [2] > [2] > https://review.opendev.org/c/openstack/tripleo-heat-templates/+/819250 > > I've spent some time going through all the keystone credentials managed by puppet modules and I recorded my observations in my working note. https://etherpad.opendev.org/p/puppet-secure-rbac#L122 If my observation is correct, credentials in the following sections/services are used to access APIs which are not allowed for the service role and require an additional privilege like system-reader when Keystone is running with only new policies and scope enforcement. glance [oslo_limit] This calls get limits API to obtain the limit for the project where a resource is being created. This requires a system-reader. nova [keystone] This calls get project API to verify the project id passed in flavor access or quota sets. This operation requires a system-reader. swift [s3api] This calls get EC2 credential API to cache credentials for the request user. This requires a system-reader. swift [ceilometer] This calls list project API when ignore_projects is set, to look up these projects. This requires a system-reader. >> > >> > > >> > > -gmann >> > > >> > > > >> > > > > >> > > > > And yes that can be done for token validation as well as >> > > > > the service-to-service API calls for example nova to cinder or >> neutron to nova >> > > > > APIs call. I do not think we can migrate everything (service >> tokens) together for all >> > > > > the services in deployment projects until all these services >> are ready with the 'service' >> > > > > role implementation (implementation means changing their >> default roles >> > > > > to add 'service' role for service-to-service APIs). >> > > > > >> > > > > Regarding the keystone track on service role work in Yoga or >> not, I do not >> > > > > have clear answer may be Lance or keystone team can answer it. >> But Lance >> > > > > has spec up[1] but not yet merged. 
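For reference, the step described above (granting each service user the project-scoped 'service' role, plus a system-scoped reader where an API still requires it) comes down to a few openstackclient calls; this is a sketch using the conventional user/project names, which may differ per deployment, and it assumes the role is not pre-created by keystone yet:

  openstack role create service
  openstack role add --user nova --project service service
  openstack role add --user glance --project service service

  # e.g. for keystone token validation under the new defaults, before the
  # service role is honoured everywhere
  openstack role add --user nova --system all reader
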
>> > > > > >> > > > > [1] >> https://review.opendev.org/c/openstack/keystone-specs/+/818616 >> > > > > >> > > > > -gmann >> > > > > >> > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > > In anticipation of keystone setting enforce_scope=True >> and removing >> > > > > > > > old default policies (which I assume effectively removes >> > > > > > > > enforce_new_defaults?), we will set this in >> kolla-ansible, and try to >> > > > > > > > deal with any fallout. Hopefully the previous work will >> make this >> > > > > > > > minimal. >> > > > > > > > >> > > > > > > > How does that line up with other projects' approaches? >> What have we missed? >> > > > > > > >> > > > > > > Yeah, we want users/deployment projects/horizon etc to use >> the new policy from >> > > > > > > keystone as first and we will see feedback how they are >> (good, bad, really bad) from >> > > > > > > usage perspective. Why we choose keystone is, because new >> policy are there since >> > > > > > > many cycle and ready to use. Other projects needs to work >> their policy as per new >> > > > > > > SRBAC design/direction (for example nova needs to modify >> their policy before we ask >> > > > > > > users to use new policy and work is under progress[2]). >> > > > > > > >> > > > > > > I think trying in kolla will be good way to know if we can >> move to keystone's new policy >> > > > > > > completely in yoga. >> > > > > > >> > > > > > We have a scope-enforcing preview patch [1], and it's >> passing our base >> > > > > > set of tests. I have another that triggers all of the jobs. >> > > > > > >> > > > > > [1] >> https://review.opendev.org/c/openstack/kolla-ansible/+/825406 >> > > > > > > >> > > > > > > [1] >> https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#z-release-timeline >> > > > > > > [2] >> https://blueprints.launchpad.net/nova/+spec/policy-defaults-refresh-2 >> > > > > > > >> > > > > > > -gmann >> > > > > > > >> > > > > > > > >> > > > > > > > Mark >> > > > > > > > >> > > > > > > > [1] >> https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst >> > > > > > > > [2] >> https://opendev.org/openstack/governance/src/branch/master/goals/selected/consistent-and-secure-rbac.rst#yoga-timeline-7th-mar-2022 >> > > > > > > > [3] >> https://opendev.org/openstack/kolla-ansible/commit/2e933dceb591c3505f35c2c1de924f3978fb81a7 >> > > > > > > > [4] >> https://review.opendev.org/c/openstack/kolla-ansible/+/815577 >> > > > > > > > [5] >> https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible >> > > > > > > > >> > > > > > > > >> > > > > > >> > > > > > >> > > > >> > > > >> > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Jan 27 14:44:18 2022 From: smooney at redhat.com (Sean Mooney) Date: Thu, 27 Jan 2022 14:44:18 +0000 Subject: [Skyline] Project mascot to review In-Reply-To: References: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> Message-ID: <7530ed0a03c6e3ac8f5980c7156608b5bcebdfc5.camel@redhat.com> On Thu, 2022-01-27 at 09:30 +0100, Rados?aw Piliszek wrote: > Wow! First of all, it's beautiful! > Great job! > > Like all the others so far, I also strongly prefer the white one. what what its worth i think the white deer is nicer too but i slightly perfer the triangel patter form the yellow/gold one since its non overlapping. 
If we were to keep the white coloration and grey shading but use the coloured triangle pattern from the other one, I think it would be slightly nicer, but they both look good. > > -yoctozepto > > On Thu, 27 Jan 2022 at 00:56, James Cole wrote: > > > > Greetings everyone, > > > > I'm James, a designer with the OpenInfra Foundation. We have a been working on a new project mascot for Skyline and wanted to share a couple of options with you. > > > > The reference we were provided alluded to a nine-color-deer, so we created a deer illustration in the style of the other project mascots. The two options are essentially the same, but one is white with spots and the other is yellow with spots. Let me know if you like one or the other, or if we're totally off on the theme, and we will add the text to the logo and finalize the assets. > > > > There is a PDF attached to this message or you can view the document by visiting this link. > > > > Thank you! > > > > -James > > >
From jungleboyj at gmail.com Thu Jan 27 17:15:20 2022 From: jungleboyj at gmail.com (Jay Bryant) Date: Thu, 27 Jan 2022 11:15:20 -0600 Subject: [Skyline] Project mascot to review In-Reply-To: References: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> Message-ID: <0398f27a-c808-2811-df9e-31bf0127cdd4@gmail.com> Yes, I forgot to mention that it is a beautiful design! Very graceful! Jay On 1/27/2022 2:30 AM, Radosław Piliszek wrote: > Wow! First of all, it's beautiful! > Great job! > > Like all the others so far, I also strongly prefer the white one. > > -yoctozepto > > On Thu, 27 Jan 2022 at 00:56, James Cole wrote: >> Greetings everyone, >> >> I'm James, a designer with the OpenInfra Foundation. We have a been working on a new project mascot for Skyline and wanted to share a couple of options with you. >> >> The reference we were provided alluded to a nine-color-deer, so we created a deer illustration in the style of the other project mascots. The two options are essentially the same, but one is white with spots and the other is yellow with spots. Let me know if you like one or the other, or if we're totally off on the theme, and we will add the text to the logo and finalize the assets. >> >> There is a PDF attached to this message or you can view the document by visiting this link. >> >> Thank you! >> >> -James >>
From raubvogel at gmail.com Thu Jan 27 18:14:03 2022 From: raubvogel at gmail.com (Mauricio Tavares) Date: Thu, 27 Jan 2022 13:14:03 -0500 Subject: [Skyline] Project mascot to review In-Reply-To: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> References: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> Message-ID: IMHO the white deer looks better because a.
> > There is a PDF attached to this message or you can view the document by visiting this link. > > Thank you! > > -James > From katonalala at gmail.com Thu Jan 27 18:45:56 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 27 Jan 2022 19:45:56 +0100 Subject: [neutron] Drivers meeting agenda - 28.01.2022 Message-ID: Hi Neutron Drivers, The agenda for tomorrow's drivers meeting is at [1]. There is no RFE proposed this week, but I would like to discuss again how to handle neutron-fwaas. To simplify things TC asked to keep neutron-fwaas under openstack/ namespace, see the discussion on #openstack-tc: https://meetings.opendev.org/irclogs/%23openstack-tc/%23openstack-tc.2022-01-24.log.html [1] https://wiki.openstack.org/wiki/Meetings/NeutronDrivers#Agenda See you at the meeting tomorrow. Lajos Katona (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangbailin at inspur.com Fri Jan 28 07:55:15 2022 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Fri, 28 Jan 2022 07:55:15 +0000 Subject: [cyborg] cancel weekly meeting in on February 4, 2022 Message-ID: Hi all, February 1~ February 6, 2022 are the Spring Festival of China, so we will cancel next weekly meeting, if you have some questions you can ping us by email. best regard brinzhang -------------- next part -------------- An HTML attachment was scrubbed... URL: From will at stackhpc.com Fri Jan 28 09:26:16 2022 From: will at stackhpc.com (William Szumski) Date: Fri, 28 Jan 2022 09:26:16 +0000 Subject: [requirements] The version of zeroconf in upper constraints no longer supports python3.6 Message-ID: The zeroconf package seems to have dropped support for python3.6 since 0.38.0, see: https://github.com/jstasiak/python-zeroconf#0380. We currently have 0.38.1 in upper-constraints. My understanding was that the yoga release would still support python3.6. This is important for RHEL8 based distributions that only ship python3.6. Do we need to constrain zeroconf to a version <=0.37.0? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Fri Jan 28 10:27:08 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 28 Jan 2022 11:27:08 +0100 Subject: [requirements] The version of zeroconf in upper constraints no longer supports python3.6 In-Reply-To: References: Message-ID: Hi William: RHEL8 provides support for python3.8, if you are manually deploying OpenStack. In case of using OSP, the next version, 17, will be delivered in RHEL9, where the native version is python3.9. In any case, the OSP installation is provided using containers with the needed requirements, including the python binary. Regards. On Fri, Jan 28, 2022 at 10:36 AM William Szumski wrote: > The zeroconf package seems to have dropped support for python3.6 since > 0.38.0, see: https://github.com/jstasiak/python-zeroconf#0380. We > currently have 0.38.1 in upper-constraints. My understanding was that the > yoga release would still support python3.6. This is important for RHEL8 > based distributions that only ship python3.6. Do we need to constrain > zeroconf to a version <=0.37.0? > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at stackhpc.com Fri Jan 28 10:32:56 2022 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 28 Jan 2022 10:32:56 +0000 Subject: [requirements] The version of zeroconf in upper constraints no longer supports python3.6 In-Reply-To: References: Message-ID: On Fri, 28 Jan 2022 at 09:30, William Szumski wrote: > > The zeroconf package seems to have dropped support for python3.6 since 0.38.0, see: https://github.com/jstasiak/python-zeroconf#0380. We currently have 0.38.1 in upper-constraints. My understanding was that the yoga release would still support python3.6. This is important for RHEL8 based distributions that only ship python3.6. Do we need to constrain zeroconf to a version <=0.37.0? > > It should be possible to add a Python version specific requirement to upper constraints. Mark From sbauza at redhat.com Fri Jan 28 10:49:37 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 28 Jan 2022 11:49:37 +0100 Subject: [sdk][devstack] openstacksdk-functional-devstack job failing on stable branches In-Reply-To: <4D3A881A-4EA8-457A-ACFD-48E76BED0919@gmail.com> References: <72c3aff1-1c18-24fc-9869-3e992cc518e0@gmail.com> <5787685.lOV4Wx5bFT@p1> <20220126132655.tm633dhzwdguum27@yuggoth.org> <36f75a014497335c2ccb3c31c376a341625fa62b.camel@redhat.com> <4D3A881A-4EA8-457A-ACFD-48E76BED0919@gmail.com> Message-ID: Le mer. 26 janv. 2022 ? 15:28, Artem Goncharov a ?crit : > Hi > > On 26. Jan 2022, at 14:35, Sean Mooney wrote: > > On Wed, 2022-01-26 at 13:26 +0000, Jeremy Stanley wrote: > > On 2022-01-26 09:00:05 +0100 (+0100), Slawek Kaplonski wrote: > [...] > > Those tests are available only in master branch: [1] and they not > exists in e.g. stable/xena [2]. > > I'm not sure how those jobs are defined exactly but my guess is > that on those stable branches where it fails it runs openstacksdk > from master branch. So either it should be changed and proper > stable branch of the openstacksdk should be used there or we > should skip those tests when API extension in neuron is not > present. > > [...] > > I find it intriguing that the SDK has stable branches at all. Isn't > the goal that the latest version of the SDK be able to interface > with multiple versions of OpenStack services, new and old? If we > don't test that the current SDK code works with Neutron from Xena, > then that opens it up to growing serious problems for users. > > ya that seam odd to me too. > i would expect sdk to be release independent and work simialr to tempest > where master sdk should be useable with all stable branches. > > Having the SDK or tests work out what features are expected to work > sounds like the only sensible solution. > > > Well, actually it should be working. Cause of absence of real tests this > way I can not guarantee it will work in every case, but we do our best to > keep it this way (and most of the tests are checking whether feature is > available or not). > > Honestly I have no clue why it evolved this way and therefore can?t really > comment on that. Maybe (just maybe) there was something from the Debian (or > other distro) packaging pov (stable, not-stable, etc) that somehow leaded > to this setup. Otherwise there is absolutely no possibility to limit which > version of sdk need to go into which older distro. And since i.e. ansible > and openstackclient depend on the sdk things are getting even more > interesting. Sdk in this sense is comparable with keystoneauth lib (and > actually depend on it). 
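To make the feature-availability point above concrete: a functional test that exercises a newer neutron API is expected to probe the cloud first and skip otherwise. A minimal, hand-written sketch (assuming an authenticated connection and that the network proxy exposes extensions() objects with an alias field, as in current SDK releases; the "address-group" alias is only an example of a newer extension) could look like:

    import unittest

    import openstack


    class NewerNetworkFeatureTest(unittest.TestCase):
        def setUp(self):
            super().setUp()
            # Assumes a clouds.yaml entry named "devstack-admin" exists.
            self.conn = openstack.connect(cloud="devstack-admin")

        def require_network_extension(self, alias):
            # Ask neutron which API extensions it actually ships.
            aliases = {ext.alias for ext in self.conn.network.extensions()}
            if alias not in aliases:
                self.skipTest("neutron does not expose %s on this cloud" % alias)

        def test_newer_feature(self):
            # Older branches (e.g. stable/xena) skip here instead of failing.
            self.require_network_extension("address-group")
            # ... exercise the feature-specific proxy calls here ...

With a guard like that in place, the same master-branch test suite can run against older stable devstacks without needing branch-specific test code.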
So once we say there are no stable branches of sdk > anymore we open even worse can of worms. > > Since we are currently anyway in a phase of big rework in sdk I would say > addressing tests that do not work on older branches can be done as well. > FWIW, we recently discussed on #openstack-nova about this [1] and the consensus was that we should try to avoid as much as possible any attempt to test stable releases of Nova against any stable branch of the SDK. I haven't seen yet any bug report, does anyone know about it ? As our conclusion was that we would refrain pulling stable branches of the SDK, we'll need to make the job non-voting on our repositories as it holds merging backports. More to come in the next hours, -Sylvain [1] https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2022-01-28.log.html#t2022-01-28T10:42:31 > Artem > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephenfin at redhat.com Fri Jan 28 12:24:54 2022 From: stephenfin at redhat.com (Stephen Finucane) Date: Fri, 28 Jan 2022 12:24:54 +0000 Subject: [nova][powervm][zvm] Time to mark powervm and zvm drivers as deprecated for removal? In-Reply-To: References: Message-ID: <717603dda17a129219a9aadabd29bd6e289c6b1a.camel@redhat.com> On Thu, 2021-11-25 at 18:45 +0000, Stephen Finucane wrote: > I've had a PR open against the pypowervm library for over a fortnight now [1] > with no activity. This has prompted me to go looking into the state of the > powervm driver and what I've found isn't great. It doesn't seem there has been > any feature development on the driver for many years, and the last change I can > see that wasn't simply of a case of "I need to do this to get tests to pass" was > over three years ago [2]. The CI also doesn't appear to have been touched in > years. > > The situation for zvm isn't any better. The last functional change to that > driver was nearly 2 years ago and I wasn't able to spot any CI running (though > it's possible I just missed this). > > This means the powervm and zvm drivers stand out from the other non-libvirt > drivers in-tree, each of which are getting at least some activity (albeit out- > of-tree for Hyper-V, from what I recall). It also begs the question: are these > drivers something we want to keep around? If we do, how are we going to maintain > them long term? If not, how aggressive should we be in removing that now dead > code? > > I'll open patches to mark both powervm and zvm as deprecated shortly so that, > assuming nothing changes in the interim, we can look to remove it early in the Z > cycle. Please respond here or on the reviews if you have concerns. Reviving this thread. Since I posted about this, jichenjc has helpfully piped up and noted that ZVM is very much alive and all the logic simply lives in a separate library [1]. However, the pypowervm PR is still stalled and it seems no one is monitoring this or cares. It also seems this is hindering multiple projects [2]. [3] and [4] will mark the driver as deprecated and move dependencies to 'extras' (meaning they're not installed by default). I suggest we merge these sooner rather than later to signal to potential users that this driver is not long for the world unless something changes. 
Cheers, Stephen [1] https://review.opendev.org/c/openstack/nova/+/819365 [2] https://github.com/powervm/pypowervm/pull/17#issuecomment-978122930 [3] https://review.opendev.org/c/openstack/nova/+/819366 [4] https://review.opendev.org/c/openstack/nova/+/822749 > > Stephen > > [1] https://github.com/powervm/pypowervm/pull/17 > [2] Change ID I89ad36f19672368a1f795e1f29c5af6368ccfeec > From smooney at redhat.com Fri Jan 28 13:39:22 2022 From: smooney at redhat.com (Sean Mooney) Date: Fri, 28 Jan 2022 13:39:22 +0000 Subject: [requirements] The version of zeroconf in upper constraints no longer supports python3.6 In-Reply-To: References: Message-ID: <1cdbdd8c19ffb097ee8937910ccfe911f0c813fe.camel@redhat.com> On Fri, 2022-01-28 at 10:32 +0000, Mark Goddard wrote: > On Fri, 28 Jan 2022 at 09:30, William Szumski wrote: > > > > The zeroconf package seems to have dropped support for python3.6 since 0.38.0, see: https://github.com/jstasiak/python-zeroconf#0380. We currently have 0.38.1 in upper-constraints. My understanding was that the yoga release would still support python3.6. This is important for RHEL8 based distributions that only ship python3.6. Do we need to constrain zeroconf to a version <=0.37.0? > > > > > > It should be possible to add a Python version specific requirement to > upper constraints. yes you can bascially you just list the package twice with a version sepcify i dont know if we still have example in master but we used to do this for py2.7 in the past for our devstack testing and ooo i think we do not need this but it can be done if needed for other reasons i think https://github.com/openstack/requirements/blob/master/upper-constraints.txt#L20-L21 we have an example for sphinks rather then <=3.8 i would just clamp it for 3.6 > Mark > From mthode at mthode.org Fri Jan 28 13:45:53 2022 From: mthode at mthode.org (Matthew Thode) Date: Fri, 28 Jan 2022 07:45:53 -0600 Subject: [requirements] The version of zeroconf in upper constraints no longer supports python3.6 In-Reply-To: <1cdbdd8c19ffb097ee8937910ccfe911f0c813fe.camel@redhat.com> References: <1cdbdd8c19ffb097ee8937910ccfe911f0c813fe.camel@redhat.com> Message-ID: <20220128134553.t6g2bmfcbklpsj2p@mthode.org> On 22-01-28 13:39:22, Sean Mooney wrote: > On Fri, 2022-01-28 at 10:32 +0000, Mark Goddard wrote: > > On Fri, 28 Jan 2022 at 09:30, William Szumski wrote: > > > > > > The zeroconf package seems to have dropped support for python3.6 since 0.38.0, see: https://github.com/jstasiak/python-zeroconf#0380. We currently have 0.38.1 in upper-constraints. My understanding was that the yoga release would still support python3.6. This is important for RHEL8 based distributions that only ship python3.6. Do we need to constrain zeroconf to a version <=0.37.0? > > > > > > > > > > It should be possible to add a Python version specific requirement to > > upper constraints. > yes you can bascially you just list the package twice with a version sepcify > i dont know if we still have example in master but we used to do this for py2.7 in the past > for our devstack testing and ooo i think we do not need this but it can be done if needed for other reasons i think > https://github.com/openstack/requirements/blob/master/upper-constraints.txt#L20-L21 > we have an example for sphinks > > rather then <=3.8 i would just clamp it for 3.6 > > > Mark > > > > Yes, that's how it's normally done, but they haven't officially dropped the support tag for it. https://pypi.org/project/zeroconf/0.38.1/ still says they support py35/36. So that's what reqs thinks. 
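For reference, the "list the package twice" approach uses PEP 508 environment markers in upper-constraints.txt, one line per python version range. A sketch of what such an exception could look like (the exact versions are illustrative, not a proposed requirements change):

    zeroconf===0.37.0;python_version=='3.6'
    zeroconf===0.38.1;python_version>='3.7'

pip applies whichever marker matches the running interpreter, so py36 environments would keep 0.37.0 while newer ones stay on 0.38.1.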
(uppoer-constraints.txt is a machine generated file). -- Matthew Thode From pierre at stackhpc.com Fri Jan 28 14:05:36 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 28 Jan 2022 15:05:36 +0100 Subject: [requirements] The version of zeroconf in upper constraints no longer supports python3.6 In-Reply-To: <20220128134553.t6g2bmfcbklpsj2p@mthode.org> References: <1cdbdd8c19ffb097ee8937910ccfe911f0c813fe.camel@redhat.com> <20220128134553.t6g2bmfcbklpsj2p@mthode.org> Message-ID: Not only these classifiers are still present, but they don't use python_requires, which means the latest version installs even on Python 2.7. I opened an issue on their GitHub: https://github.com/jstasiak/python-zeroconf/issues/1051 On Fri, 28 Jan 2022 at 14:50, Matthew Thode wrote: > > On 22-01-28 13:39:22, Sean Mooney wrote: > > On Fri, 2022-01-28 at 10:32 +0000, Mark Goddard wrote: > > > On Fri, 28 Jan 2022 at 09:30, William Szumski wrote: > > > > > > > > The zeroconf package seems to have dropped support for python3.6 since 0.38.0, see: https://github.com/jstasiak/python-zeroconf#0380. We currently have 0.38.1 in upper-constraints. My understanding was that the yoga release would still support python3.6. This is important for RHEL8 based distributions that only ship python3.6. Do we need to constrain zeroconf to a version <=0.37.0? > > > > > > > > > > > > > > It should be possible to add a Python version specific requirement to > > > upper constraints. > > yes you can bascially you just list the package twice with a version sepcify > > i dont know if we still have example in master but we used to do this for py2.7 in the past > > for our devstack testing and ooo i think we do not need this but it can be done if needed for other reasons i think > > https://github.com/openstack/requirements/blob/master/upper-constraints.txt#L20-L21 > > we have an example for sphinks > > > > rather then <=3.8 i would just clamp it for 3.6 > > > > > Mark > > > > > > > > > Yes, that's how it's normally done, but they haven't officially dropped > the support tag for it. https://pypi.org/project/zeroconf/0.38.1/ still > says they support py35/36. So that's what reqs thinks. > (uppoer-constraints.txt is a machine generated file). > > -- > Matthew Thode > From artem.goncharov at gmail.com Fri Jan 28 15:24:11 2022 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Fri, 28 Jan 2022 16:24:11 +0100 Subject: [sdk][devstack] openstacksdk-functional-devstack job failing on stable branches In-Reply-To: References: <72c3aff1-1c18-24fc-9869-3e992cc518e0@gmail.com> <5787685.lOV4Wx5bFT@p1> <20220126132655.tm633dhzwdguum27@yuggoth.org> <36f75a014497335c2ccb3c31c376a341625fa62b.camel@redhat.com> <4D3A881A-4EA8-457A-ACFD-48E76BED0919@gmail.com> Message-ID: > On 28. Jan 2022, at 11:49, Sylvain Bauza wrote: > > > > Le mer. 26 janv. 2022 ? 15:28, Artem Goncharov > a ?crit : > Hi > >> On 26. Jan 2022, at 14:35, Sean Mooney > wrote: >> >> On Wed, 2022-01-26 at 13:26 +0000, Jeremy Stanley wrote: >>> On 2022-01-26 09:00:05 +0100 (+0100), Slawek Kaplonski wrote: >>> [...] >>>> Those tests are available only in master branch: [1] and they not >>>> exists in e.g. stable/xena [2]. >>>> >>>> I'm not sure how those jobs are defined exactly but my guess is >>>> that on those stable branches where it fails it runs openstacksdk >>>> from master branch. So either it should be changed and proper >>>> stable branch of the openstacksdk should be used there or we >>>> should skip those tests when API extension in neuron is not >>>> present. >>> [...] 
>>> >>> I find it intriguing that the SDK has stable branches at all. Isn't >>> the goal that the latest version of the SDK be able to interface >>> with multiple versions of OpenStack services, new and old? If we >>> don't test that the current SDK code works with Neutron from Xena, >>> then that opens it up to growing serious problems for users. >>> >> ya that seam odd to me too. >> i would expect sdk to be release independent and work simialr to tempest >> where master sdk should be useable with all stable branches. >>> Having the SDK or tests work out what features are expected to work >>> sounds like the only sensible solution. > > Well, actually it should be working. Cause of absence of real tests this way I can not guarantee it will work in every case, but we do our best to keep it this way (and most of the tests are checking whether feature is available or not). > > Honestly I have no clue why it evolved this way and therefore can?t really comment on that. Maybe (just maybe) there was something from the Debian (or other distro) packaging pov (stable, not-stable, etc) that somehow leaded to this setup. Otherwise there is absolutely no possibility to limit which version of sdk need to go into which older distro. And since i.e. ansible and openstackclient depend on the sdk things are getting even more interesting. Sdk in this sense is comparable with keystoneauth lib (and actually depend on it). So once we say there are no stable branches of sdk anymore we open even worse can of worms. > > Since we are currently anyway in a phase of big rework in sdk I would say addressing tests that do not work on older branches can be done as well. > > FWIW, we recently discussed on #openstack-nova about this [1] and the consensus was that we should try to avoid as much as possible any attempt to test stable releases of Nova against any stable branch of the SDK. > > I haven't seen yet any bug report, does anyone know about it ? > > As our conclusion was that we would refrain pulling stable branches of the SDK, we'll need to make the job non-voting on our repositories as it holds merging backports. JFYI: We started working on running our functional tests on old stable branches with latest SDK to enable others using sdk master branch. > > More to come in the next hours, > -Sylvain > > [1] https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2022-01-28.log.html#t2022-01-28T10:42:31 > > > Artem -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Fri Jan 28 16:55:01 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 28 Jan 2022 11:55:01 -0500 Subject: [cinder] new driver freeze & exceptions In-Reply-To: <72ed9f96-7f1e-0d0c-63fd-aa3602d703b6@gmail.com> References: <72ed9f96-7f1e-0d0c-63fd-aa3602d703b6@gmail.com> Message-ID: <82a1b19c-477c-2fd0-87d7-b1473ebf760c@gmail.com> Update: The NEC Storage V Series driver has merged. The Lightbits driver has three +2s. The Lightbits os-brick connector it depends on has not yet merged, but has been actively revised to address comments. I am extending the new driver freeze exception until 20:00 Wednesday 2 February. On 1/21/22 5:25 PM, Brian Rosmaita wrote: > Hello Argonauts, > > The new driver merge deadline passed at 20:00 UTC today. > > I'm extending the new driver merge deadline to Friday 28 January at > 20:00 UTC for two drivers: > > 1. Lightbits: It has both a cinder and os-brick patch, and I want the > team to have more time to look at the os-brick patch.? 
I think the > driver patch is close to ready; the developers have been quick to > respond to comments and make revisions.? Also, the third-party CI is > functioning and responding on patches. > ? cinder: https://review.opendev.org/c/openstack/cinder/+/821602 > ? os-brick: https://review.opendev.org/c/openstack/os-brick/+/821603 > > 2. NEC Storage V Series: the driver patch has one +2 and the third party > CI is functioning and responding on patches; I don't see any reason to > make this one wait for the Z release. > ? https://review.opendev.org/c/openstack/cinder/+/815614 > > Cinder core reviewers: please continue to review the above patches. > > With respect to the other proposed drivers: > - Pure Storage NVMe-RoCE: vendor decided to hold it for Z > - Dell EMC PowerStore NFS driver: vendor decided to hold it for Z > - HPE XP Storage FC & iSCSI: the code is very straightforward, but the > CI is not ready yet, so this will have to wait for Z > - YADRO Tatlin.UNIFIED driver: initial patch arrived yesterday (and has > a -1 from Zuul); the CI appears to be running, but the cinder team needs > to move on to reviewing os-brick and Yoga feature patches, so this will > have to wait for Z also > > > cheers, > brian From katonalala at gmail.com Fri Jan 28 16:59:28 2022 From: katonalala at gmail.com (Lajos Katona) Date: Fri, 28 Jan 2022 17:59:28 +0100 Subject: Can neutron-fwaas project be revived? In-Reply-To: References: <771f9e50a5f0498caecf3cb892902954@inspur.com> <17e6e183905.d179775f807310.479674362228455950@ghanshyammann.com> <20220118175104.a6ppj2kxpijeztz7@yuggoth.org> <17e72a52aee.f17e7143871608.3933408575637218060@ghanshyammann.com> Message-ID: Hi, Today Neutron drivers team again discussed the topic of the maintenance of neutron-fwaas. The team agreed to include neutron-fwaas again to Neutron stadium, with the maintenance of Inspur and the guidance of Neutron core team, and with +2 rights to Inspur developers. The logs of the meeting: https://meetings.opendev.org/meetings/neutron_drivers/2022/neutron_drivers.2022-01-28-14.00.log.html#l-14 The process for Stadium projects: https://docs.openstack.org/neutron/latest/contributor/stadium/governance.html Thanks for stepping in to maintaining and developing neutron-fwaas. Regards Lajos Katona (lajoskatona) Lajos Katona ezt ?rta (id?pont: 2022. jan. 20., Cs, 10:07): > Hi, > Neutron team is open to include projects to the stadium group (that was > the feeling during the meeting also when we discussed this topic) if there > is a stable maintainer team behind the project. > So as you mentioned it would be easier to avoid the back and forth > movement of fwaas if possible. > > Lajos > > > Ghanshyam Mann ezt ?rta (id?pont: 2022. jan. > 19., Sze, 15:02): > >> ---- On Wed, 19 Jan 2022 02:23:39 -0600 Lajos Katona < >> katonalala at gmail.com> wrote ---- >> > Hi, >> > Thanks for the advice. >> > The intention from the Neutron team was to make it clear that the team >> currently has no capacity to help the maintenance of neutron-fwaas, and >> can't help to maintain it.If there's easier ways for volunteers to keep it >> maintained other than forking it to x/ namespace that would be really >> helpful. >> >> Thanks Lajos, >> >> Main point here is if it is maintained by current maintainer (inspur team >> or other developers) whether neutron team will consider that >> to be in added in neutron stadium? >> >> If yes, then it will be extra work to move to x/ namespace now and then >> bring back to openstack/. 
>> If no, then moving to x/ namespace is good option or if maintainer want >> to be in openstack then we can discuss about >> a separate new project (but that needs more discussion on host much cost >> it adds). >> >> -gmann >> >> > Lajos Katona (lajoskatona) >> > >> > Jeremy Stanley ezt ?rta (id?pont: 2022. jan. 18., >> K, 18:58): >> > On 2022-01-18 10:49:48 -0600 (-0600), Ghanshyam Mann wrote: >> > [...] >> > > As discussed in project-config change[1], you or neutron folks can >> > > propose the retirement now itself (considering there is no one to >> > > maintain/release stable/victoria for new bug fixes) and TC will >> > > merge it as per process. After that, creating it in x/ namespace >> > > will be good to do. >> > [...] >> > >> > Looking at this from a logistical perspective, it's a fair amount of >> > churn in code hosting as well as unwelcoming to the new volunteers, >> > compared to just leaving the repository where it is now and letting >> > them contribute to it there. If the concern is that the Neutron team >> > doesn't want to retain responsibility for it while they evaluate the >> > conviction of the new maintainers for eventual re-inclusion, then >> > the TC would be well within its rights to declare that the >> > repository can remain in place while not having it be part of the >> > Neutron team's responsibilities. >> > >> > There are a number of possible solutions, ranging from making a new >> > category of provisional deliverable, to creating a lightweight >> > project team under the DPL model, to declaring it a pop-up team with >> > a TC-owned repository. There are repositories within the OpenStack >> > namespace which are not an official part of the OpenStack >> > coordinated release, after all. Solutions which don't involve having >> > the new work take place somewhere separate, and the work involved in >> > making that separate place, which will simply be closed down as >> > transient cruft if everything goes as desired. >> > -- >> > Jeremy Stanley >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From james at openstack.org Fri Jan 28 18:20:46 2022 From: james at openstack.org (James Cole) Date: Fri, 28 Jan 2022 12:20:46 -0600 Subject: [Skyline] Project mascot to review In-Reply-To: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> References: <54D2CC84-6134-4B62-801D-3C6CB93CBC79@openstack.org> Message-ID: Hi again everyone, Thank you for all the feedback on this! The white background version has a definitive lead at the moment, but I will leave the discussion open over the weekend in case anyone else wants to chime in. @Sean Moony mentioned a preference for the triangle pattern on the yellow version since they aren?t overlapping. I?m curious if anyone else shares a preference for that triangle pattern button on the white background. Thank you again! -James > On Jan 26, 2022, at 3:44 PM, James Cole wrote: > > Greetings everyone, > > I?m James, a designer with the OpenInfra Foundation. We have a been working on a new project mascot for Skyline and wanted to share a couple of options with you. > > The reference we were provided alluded to a nine-color-deer, so we created a deer illustration in the style of the other project mascots. The two options are essentially the same, but one is white with spots and the other is yellow with spots. Let me know if you like one or the other?or if we?re totally off on the theme?and we will add the text to the logo and finalize the assets. 
> > There is a PDF attached to this message or you can view the document by visiting this link . > > Thank you! > > -James > > <20220120 Mascots Project.pdf> -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Fri Jan 28 18:57:22 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Fri, 28 Jan 2022 19:57:22 +0100 Subject: [release] Release countdown for week R-8, Jan 31 - Feb 04 Message-ID: <608abf60-8b42-2839-cdb8-b333dd0f47c6@est.tech> General Information ------------------- The following cycle-with-intermediary deliverables have not done any intermediary release yet during this cycle. The cycle-with-rc release model is more suited for deliverables that plan to be released only once per cycle. As a result, we have proposed[1] to change the release model for the following deliverables: ovn-octavia-provider ironic-prometheus-exporter ironic-python-agent-builder ironic-ui networking-baremetal networking-generic-switch patrole swift [1] https://review.opendev.org/q/topic:not-yet-released-yoga PTLs and release liaisons for each of those deliverables can either +1 the release model change, or propose an intermediary release for that deliverable. In absence of answer by the end of R-8week we'll consider that the switch to cycle-with-rc is preferable. We also published a proposed release schedule for the upcoming Zcycle. Please check out the separate thread: http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026696.html Upcoming Deadlines & Dates -------------------------- Non-client library freeze: February 17th, 2022(R-6 week) Client library freeze: February 24th, 2022(R-5 week) Yoga-3 milestone: February 24th, 2022(R-5 week) Yoga final release: March 30th, 2022 Next PTG: April 4 - 8, 2022 (virtual) El?d Ill?s irc: elodilles -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Jan 28 19:08:44 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 28 Jan 2022 13:08:44 -0600 Subject: Can neutron-fwaas project be revived? In-Reply-To: References: <771f9e50a5f0498caecf3cb892902954@inspur.com> <17e6e183905.d179775f807310.479674362228455950@ghanshyammann.com> <20220118175104.a6ppj2kxpijeztz7@yuggoth.org> <17e72a52aee.f17e7143871608.3933408575637218060@ghanshyammann.com> Message-ID: <17ea21703ef.1109b0e8c100623.2274090376639502022@ghanshyammann.com> ---- On Fri, 28 Jan 2022 10:59:28 -0600 Lajos Katona wrote ---- > Hi,Today Neutron drivers team again discussed the topic of the maintenance of neutron-fwaas. > The team agreed to include neutron-fwaas again to Neutron stadium, with the maintenance of Inspur and the guidance of Neutron core team, and with +2 rights to Inspur developers. > The logs of the meeting:https://meetings.opendev.org/meetings/neutron_drivers/2022/neutron_drivers.2022-01-28-14.00.log.html#l-14 > The process for Stadium projects:https://docs.openstack.org/neutron/latest/contributor/stadium/governance.html > Thanks for stepping in to maintaining and developing neutron-fwaas. Thanks lajoskatona, neutron team for reconsidering it. @ Inspur team, Plese propose the revert of deprecation of neutron-fwaas in governance. and after that we can setup the project-config, jobs things there. -gmann > RegardsLajos Katona (lajoskatona) > > > Lajos Katona ezt ?rta (id?pont: 2022. jan. 
20., Cs, 10:07): > Hi,Neutron team is open to include projects to the stadium group (that was the feeling during the meeting also when we discussed this topic) if there is a stable maintainer team behind the project.So as you mentioned it would be easier to avoid the back and forth movement of fwaas if possible. > Lajos > > > Ghanshyam Mann ezt ?rta (id?pont: 2022. jan. 19., Sze, 15:02): > ---- On Wed, 19 Jan 2022 02:23:39 -0600 Lajos Katona wrote ---- > > Hi, > > Thanks for the advice. > > The intention from the Neutron team was to make it clear that the team currently has no capacity to help the maintenance of neutron-fwaas, and can't help to maintain it.If there's easier ways for volunteers to keep it maintained other than forking it to x/ namespace that would be really helpful. > > Thanks Lajos, > > Main point here is if it is maintained by current maintainer (inspur team or other developers) whether neutron team will consider that > to be in added in neutron stadium? > > If yes, then it will be extra work to move to x/ namespace now and then bring back to openstack/. > If no, then moving to x/ namespace is good option or if maintainer want to be in openstack then we can discuss about > a separate new project (but that needs more discussion on host much cost it adds). > > -gmann > > > Lajos Katona (lajoskatona) > > > > Jeremy Stanley ezt ?rta (id?pont: 2022. jan. 18., K, 18:58): > > On 2022-01-18 10:49:48 -0600 (-0600), Ghanshyam Mann wrote: > > [...] > > > As discussed in project-config change[1], you or neutron folks can > > > propose the retirement now itself (considering there is no one to > > > maintain/release stable/victoria for new bug fixes) and TC will > > > merge it as per process. After that, creating it in x/ namespace > > > will be good to do. > > [...] > > > > Looking at this from a logistical perspective, it's a fair amount of > > churn in code hosting as well as unwelcoming to the new volunteers, > > compared to just leaving the repository where it is now and letting > > them contribute to it there. If the concern is that the Neutron team > > doesn't want to retain responsibility for it while they evaluate the > > conviction of the new maintainers for eventual re-inclusion, then > > the TC would be well within its rights to declare that the > > repository can remain in place while not having it be part of the > > Neutron team's responsibilities. > > > > There are a number of possible solutions, ranging from making a new > > category of provisional deliverable, to creating a lightweight > > project team under the DPL model, to declaring it a pop-up team with > > a TC-owned repository. There are repositories within the OpenStack > > namespace which are not an official part of the OpenStack > > coordinated release, after all. Solutions which don't involve having > > the new work take place somewhere separate, and the work involved in > > making that separate place, which will simply be closed down as > > transient cruft if everything goes as desired. > > -- > > Jeremy Stanley > > > From smooney at redhat.com Fri Jan 28 19:11:59 2022 From: smooney at redhat.com (Sean Mooney) Date: Fri, 28 Jan 2022 19:11:59 +0000 Subject: Secure Boot VM issues (libvirt / SMM) | Secure boot requires SMM feature enabled In-Reply-To: References: Message-ID: <27dc42588601b285a7f2ec0ac6ad88e41701fad5.camel@redhat.com> On Wed, 2022-01-19 at 17:31 +0000, Sean Mooney wrote: > On Wed, 2022-01-19 at 14:21 +0000, Imran Hussain wrote: > > Hi, > > > > Deployed Wallaby on Ubuntu 20.04 nodes. 
Having issues with libvirt XML > > being incorrect, I need the smm bit () and it isn't > > being added to the XML. Anyone seen this before? Or any ideas? More info > > below... > > > > Error message: > > : libvirt.libvirtError: unsupported configuration: Secure boot requires > > SMM feature enabled > > > > Versions: > > libvirt version: 6.0.0, package: 0ubuntu8.15 > > QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.18) > > Nova 23.1.1 (deployed via kolla, so > > kolla/ubuntu-source-nova-compute:wallaby is the image) > > ovmf 0~20191122.bd85bf54-2ubuntu3.3 > > > > Context: > > https://specs.openstack.org/openstack/nova-specs/specs/wallaby/implemented/allow-secure-boot-for-qemu-kvm-guests.html > > > > Image metadata: > > > > hw_firmware_type: uefi > > hw_machine_type: q35 > > os_secure_boot: required > > ok those d seam to be allinged with the documentaiton > https://docs.openstack.org/nova/latest/admin/secure-boot.html > how in addtion to those option the uefi firmware image used but qemu which is provide by the ovmf package also need > to provide a secure boot capable image > > waht failing here is the system manamgemt mode feature. > > when os_secure_boot is set > we defien the "secure" attibute on the loader element. > > https://github.com/openstack/nova/blob/7aa3a0f558ddbcac3cb97a7eef58cd878acc3f7a/nova/virt/libvirt/config.py#L2871-L2873 > > based on the > https://libvirt.org/formatdomain.html#hypervisor-features > > smm should be enabled by default ok so i quickly hacked togheter a patch to test this. https://review.opendev.org/c/openstack/nova/+/826931 i was able to repoduce the secure boot failure on ubuntu 20.04 this looks like its a ubuntu libvirt bug or upstram libvirts docs are wrong but with that patch i can boot with secure boot enabeld on ubuntu 20.04 full final xml instance-0000000c fc2284e5-a46d-4ee9-aedc-f6d0058eb797 test 2022-01-28 19:01:22 2048 20 0 0 2 admin demo 2097152 2097152 2 2048 /machine OpenStack Foundation OpenStack Nova 24.1.0 fc2284e5-a46d-4ee9-aedc-f6d0058eb797 fc2284e5-a46d-4ee9-aedc-f6d0058eb797 Virtual Machine hvm /usr/share/OVMF/OVMF_CODE.ms.fd /var/lib/libvirt/qemu/nvram/instance-0000000c_VARS.fd Nehalem destroy restart destroy /usr/bin/qemu-system-x86_64