From zigo at debian.org Sun Oct 2 15:53:51 2022 From: zigo at debian.org (Thomas Goirand) Date: Sun, 2 Oct 2022 17:53:51 +0200 Subject: suggestion goal for zed: getting swift and glance to support uwsgi Message-ID: <1dcfb330-c3c0-c82e-3b09-1fa2c7c15087@debian.org> Hi, As you may know, almost all of OpenStack now supports running on UWSGI. However, 2 projects remain incompatible with it: Glance and Swift. I've heard that it's now fixed with Glance, but I haven't checked the fact for myself. Has anyone already run Glance (with Swift as backend) under uwsgi, instead of eventlet? What's the status? Has all of the remaining issues been tackled? As for Swift, while upstream has even examples on how to run Swift over uwsgi, experience (in heavy load production) demonstrated that there are many issues running under uwsgi. That's a shame, because the uwsgi server makes most services (proxy, object, container and account servers) run twice as fast. Currently, the proxy and object servers aren't following the RFCs, and are incompatible with uwsgi, especially when chunks are involved (SLO/DLO in the pipeline). Also, it doesn't look like the Swift servers are thread safe. Switching to more than one thread just fails under heavy load. As a result, in production, we had to run double the amount of servers to handle the load. That's a huge waste of resources, IMO. It'd be great if upstream took it seriously to propose a uwsgi binary by default, and if the CI was using it. Note that I've already proposed such a patch [1], but it received zero votes, and not really getting attention upstream. Your thoughts anyone? Cheers, Thomas Goirand (zigo) P.S: This is just a suggestion for the work in the next cycle, I don't think I have the bandwidth to work on each individual project, and that my time is better spent on what I do best: Debian packaging of OpenStack and cluster deployment integration. [1] https://review.opendev.org/c/openstack/swift/+/821192 From noonedeadpunk at gmail.com Sun Oct 2 17:33:00 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Sun, 2 Oct 2022 19:33:00 +0200 Subject: suggestion goal for zed: getting swift and glance to support uwsgi In-Reply-To: <1dcfb330-c3c0-c82e-3b09-1fa2c7c15087@debian.org> References: <1dcfb330-c3c0-c82e-3b09-1fa2c7c15087@debian.org> Message-ID: Well, at very least in Xena I still do experience issues with uwsgi. It mostly depends on clients though and if they do support chunking properly. For example python-openstackclient works properly, but if you use python-glanceclient with ceph backend, as example, you will hit issues. And other projects, like heat, do still use glanceclient. Also interoperable import still was not working with uwsgi. I have not tested it on Yoga, but haven't saw patches that was aiming to fix it either. I think you have also missed to mention neutron, that does not really work with uwsgi and ovn as a ml2 driver. Not sure if that was fixed on Zed, but as of Yoga it was known not to work. ??, 2 ???. 2022 ?., 18:00 Thomas Goirand : > I've heard that it's now fixed with Glance, but I haven't checked the > fact for myself. Has anyone already run Glance (with Swift as backend) > under uwsgi, instead of eventlet? What's the status? Has all of the > remaining issues been tackled? > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From clay.gerrard at gmail.com Sun Oct 2 21:41:14 2022 From: clay.gerrard at gmail.com (Clay Gerrard) Date: Sun, 2 Oct 2022 16:41:14 -0500 Subject: [Swift][Ussuri] Erasure Coding Quarantines In-Reply-To: <20220930165217.2901f9cf@niphredil.zaitcev.lan> References: <20220930165217.2901f9cf@niphredil.zaitcev.lan> Message-ID: On Fri, Sep 30, 2022 at 4:56 PM Pete Zaitcev wrote: > > Unfortunately, I'm not familiar with the exact details of this. > There was a window where depending on how linker worked, our > code could get linked with an incorrect zlib crc routine randomly. > > # When upgrading from liberasurecode<=1.5.0, you may want to continue writing # legacy CRCs until all nodes are upgraded and capabale of reading fragments # with zlib CRCs. liberasurecode>=1.6.2 checks for the environment variable # LIBERASURECODE_WRITE_LEGACY_CRC; if set (value doesn't matter), it will use # its legacy CRC. Set this option to true or false to ensure the environment # variable is or is not set. Leave the option blank or absent to not touch # the environment (default). For more information, see # https://bugs.launchpad.net/liberasurecode/+bug/1886088 # write_legacy_ec_crc = https://github.com/NVIDIA/swift/blob/master/etc/proxy-server.conf-sample#L326-L334 set it in your object-server [DEFAULT] confs too -- Clay Gerrard -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Sun Oct 2 22:02:51 2022 From: zigo at debian.org (Thomas Goirand) Date: Mon, 3 Oct 2022 00:02:51 +0200 Subject: suggestion goal for zed: getting swift and glance to support uwsgi In-Reply-To: References: <1dcfb330-c3c0-c82e-3b09-1fa2c7c15087@debian.org> Message-ID: <41eeb380-1df9-10d0-10de-d2f50e03cc46@debian.org> On 10/2/22 19:33, Dmitriy Rabotyagov wrote: > I think you have also missed to mention neutron, that does not really > work with uwsgi and ovn as a ml2 driver. Not sure if that was fixed on > Zed, but as of Yoga it was known not to work. To be honest, I never tried using OVN, so I didn't know. We've been using Neutron with uwsgi since at least rocky, without a glitch though. Why such a regression then? :( Cheers, Thomas Goirand (zigo) From ltomasbo at redhat.com Mon Oct 3 05:44:41 2022 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Mon, 3 Oct 2022 07:44:41 +0200 Subject: [ovn][neutron] RE: OVN BGP Agent query In-Reply-To: References: Message-ID: On Fri, Sep 30, 2022 at 6:20 PM Ihtisham ul Haq wrote: > Hi Luis and Daniel, > > Please see inline response. > > > From: Daniel Alvarez Sanchez > > Sent: 29 September 2022 11:37 > > Subject: Re: OVN BGP Agent query > > > > Hi Ihtisham and Luis, > > > > On Thu, Sep 29, 2022 at 7:42 AM Luis Tomas Bolivar > wrote: > > > Some comments and questions inline > > > > > > On Tue, Sep 27, 2022 at 1:39 PM Ihtisham ul haq < > ihtisham.uh at hotmail.com> wrote: > > > > Hi Luis, > > > > > > > > Thanks for your work on the OVN BGP Agent. We are planning > > > > to use it in our OVN deployment, but have a question regarding it. > > > > > > Great to hear! Can you share a bit more info about this environment? > like > > > openstack version, target workload, etc. > > We plan to use this with Yoga version. Our workload consist of enterprise > users > with VMs running on Openstack and connected to their enterprise network via > transfer network(to which the customer neutron router is attached to). > And we also have public workload but with the ovn-bgp we only want to > we want to advertise the former. 
> > > > > > > > The way our current setup with ML2/OVS works is that our customer VM > IP routes > > > > are announced via the router IP(of the that customer) to the leaf > switch instead of > > > > the IP of the host where the neutron BGP agent runs. And then even > if the > > > > router fails over, the IP of the router stays the same and thus the > BGP route > > > > doesn't need to be updated. > > > > > > Is this with Neutron Dynamic Routing? When you say Router IP, do you > mean the virtual neutron router and its IP associated with the provider > network? What type of IPs are you announcing with BGP? IPs on provider > network or on tenant networks (or both)? > > Yes, that's with Neutron DR agent, and I meant virtual neutron router with > IP from the provider network. We announce IPs of our tenant network via the > virtual routers external address. > > > > If the router fails over, the route needs to be updated, doesn't it? > Same IP, but exposed in the new location of the router? > > Correct. > > > The route to the tenant network doesn't change, ie. > > 192.168.0.0 via 172.24.4.100 (this route remains the same regardless of > where 172.24.4.100 is). > > If there's L2 in the 172.24.4.0 network, the new location of > 172.24.4.100 will be learnt via GARP announcement. In our case, this won't > happen as we don't have L2 so we expose directly connected routes to > overcome this "limitation". > > Right, in our case we have a stretched L2 transfer network(mentioned above) > to which our gateway nodes and customer routers are connected to, so we can > advertise the IPs from the tenant network via the virtual router external > IP > and thus the location of the router isn't relevant in case of failover as > its > address will be relearned. > > > In the case of Neutron Dynamic Routing, there's no assumption that > everything is L3 so GARPs are needed to learn the new location. > > > > > > We see that the routes are announced by the ovn-bgp-agent via the > host IP(GTW) in our > > > > switch peers. If that's the case then how do you make sure that > during failover > > > > of a router, the BGP routes gets updated with the new host IP(where > the router > > > > failed over to)? > > > > > > The local FRR running at each node is in charge of exposing the IPs. > For the IPs on the provider network, the traffic is directly exposed where > the VMs are, without having to go through the virtual router, so a router > failover won't change the route. > > > In the case of VMs on tenant networks, the traffic is exposed on the > node where the virtual router gateway port is associated (I suppose this is > what you refer to with router IP). In the case of a failover the agent is > in charge of making FRR to withdraw the exposed routes on the old node, and > re-advertise them on the new router IP location > > > > > > > Can we accomplish the same route advertisement as our ML2/OVS setup, > using the ovn-bgp-agent? > > > > I think this is technically possible, and perhaps you want to contribute > that functionality or even help integrating the agent as a driver of > Neutron Dynamic Routing? > > Sounds good, our plan currently is to add this to the ovn-bgp-agent, > so we can announce our tenant routes via virtual routers external address > on > a stretched L2 network, to make it work with our use case. > Great to hear!! 
Just to make it clear, the ovn-bgp-agent current solution is to expose the tenant VM IPs through the host that has the OVN router gateway port, so for example, if the VM IP (10.0.0.5) is connected to the neutron virtual router, which in turns is connected to your provider network (your transfer network) with IP 172.24.4.10, and hosted in a physical server with IP 192.168.100.100, the route will be exposed as: - 10.0.0.5 nexthop 192.168.100.100 - 172.24.4.10 nexthop 192.168.100.100 As we are using FRR config "redistributed connected". As the traffic to the tenant networks needs to be injected into the OVN overlay through the gateway node hosting that ovn virtual router gateway port (cr-lrp), would it be ok if, besides those route we also advertise? - 10.0.0.5 nexthop 172.24.4.10 Cheers, Luis > > > -- > Ihtisham ul Haq > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r > die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht > der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich > in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie > hier. > > -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Mon Oct 3 11:58:08 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 3 Oct 2022 13:58:08 +0200 Subject: [neutron] Bug deputy 26 September to 2 October Message-ID: Hello Neutrinos: This is the list of bugs of the past week: Medium: * https://bugs.launchpad.net/neutron/+bug/1991092: Retry port provisioning for "nova:xxx" device_owner ports only. Assigned * https://bugs.launchpad.net/neutron/+bug/1991222: neutron.provisioningblocks WSREP: referenced FK check fail. Assigned Low: * https://bugs.launchpad.net/neutron/+bug/1990999: Install and configure compute node in Neutron. Unassigned * https://bugs.launchpad.net/neutron/+bug/1991398: Update port with given IPv6 address on SLAAC/stateless_dhpc subnets fails always when IP address is given. Assigned Wishlist: * https://bugs.launchpad.net/neutron/+bug/1990842: [RFE] Expose Open vSwitch other_config column in the API. Assigned * https://bugs.launchpad.net/neutron/+bug/1991000: [tripleo] Provide a tag to the container that will be used to kill it. Assigned Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Mon Oct 3 13:05:50 2022 From: mkopec at redhat.com (Martin Kopec) Date: Mon, 3 Oct 2022 15:05:50 +0200 Subject: [neutron] PTG topics and scheduling In-Reply-To: References: Message-ID: perfect, thank you Lajos On Fri, 30 Sept 2022 at 17:39, Lajos Katona wrote: > Hi > We have this topic I think on the Neutron etherpad: > https://etherpad.opendev.org/p/neutron-antelope-ptg#L74 > So this can be a common topic which we can discuss for sure. > > Lajos > > Martin Kopec ezt ?rta (id?pont: 2022. szept. 30., P, > 16:41): > >> Hi Rodolfo, >> >> in QA we have "Clean up deprecated lib/neutron code" [3] topic up for a >> discussion. >> Is that something you plan to discuss? Would you like to? We can either >> move that topic to the neutron's schedule or keep it in ours and you may >> just comment/or attend our session - depending on how much input from the >> neutron team is required there. 
>> >> [3] https://etherpad.opendev.org/p/qa-antelope-ptg >> >> Thank you, >> >> On Fri, 30 Sept 2022 at 12:06, Rodolfo Alonso Hernandez < >> ralonsoh at redhat.com> wrote: >> >>> Hello all: >>> >>> Based on the schedule polls results, I've booked the Neutron meetings >>> from Tuesday to Thursday in Mitaka channel [1], from 13UTC to 16UTC. Of >>> course, if that is not enough, we can always use Friday to continue any >>> pending conversation. >>> >>> The operator hour (actually 2), will be on Friday (pending for >>> reservation), from 13UTC to 15UTC. >>> >>> Please continue adding any topic you want to discuss in the Neutron >>> etherpad [2]. There is a specific section for the Nova-Neutron cross-team >>> meeting. >>> >>> If you have any doubt or question, do not hesitate to let me know (IRC: >>> ralonsoh, mail: ralonsoh at redhat.com). You can also ping any core >>> reviewer in #openstack-neutron channel. >>> >>> See you in a few weeks! >>> >>> [1]https://ptg.opendev.org/ptg.html >>> [2]https://etherpad.opendev.org/p/neutron-antelope-ptg >>> >>> >> >> -- >> Martin Kopec >> Senior Software Quality Engineer >> Red Hat EMEA >> IM: kopecmartin >> >> >> >> -- Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Mon Oct 3 16:21:12 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 3 Oct 2022 18:21:12 +0200 Subject: [cloudkitty] Another core team cleanup Message-ID: Hello, Almost exactly two years since the last core team cleanup [1], it's probably time to have another one. I don't think we have heard from these contributors in the last couple of years: Justin Ferrieu jferrieu at objectif-libre.com Luis Ramirez luis.ramirez at opencloud.es Luka Peschke mail at lukapeschke.com Maxime Cottret maxime.cottret at gmail.com St?phane Albert sheeprine at nullplace.com Jeremy Liu liuj285 at chinaunicom.cn Is everyone okay with removing cloudkitty-core membership for these users? Cheers, Pierre Riteau (priteau) [1] https://lists.openstack.org/pipermail/openstack-discuss/2020-October/017751.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Mon Oct 3 17:34:06 2022 From: kennelson11 at gmail.com (Kendall Nelson) Date: Mon, 3 Oct 2022 12:34:06 -0500 Subject: PTG Schedule and Reminders Message-ID: Hello Everyone! The October 2022 Project Teams Gathering is right around the corner and the schedule is being setup by your team leads! Slots are going fast, so make sure to get your time booked ASAP! You can find the schedule and available slots on the PTGbot website [1]. The PTGbot site is the during-event website to keep track of what's being discussed and any last-minute schedule changes. It is driven via commands in the #openinfra-events IRC channel (on the OFTC network) where the PTGbot listens. If you have questions about the commands that you can give the bot, check out the documentation here[2]. Also, if you haven?t connected to IRC before, here are some docs on how to get setup![3] Lastly, please don't forget to register[4] (it is free after all!). Please let us know if you have any questions via email to ptg at openinfra.dev. Thanks! 
-Kendall (diablo_rojo) [1] PTGbot Site: https://ptg.opendev.org/ptg.html [2] PTGbot Documentation: https://github.com/openstack/ptgbot#open-infrastructure-ptg-bot [3] Setup IRC: https://docs.openstack.org/contributors/common/irc.html [4] PTG Registration: https://openinfra-ptg.eventbrite.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Mon Oct 3 18:02:18 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Mon, 3 Oct 2022 15:02:18 -0300 Subject: [cloudkitty] Another core team cleanup In-Reply-To: References: Message-ID: I guess it is fine as they are not participating in the project anymore, and this has been a constant for the past two years or so. On Mon, Oct 3, 2022 at 1:26 PM Pierre Riteau wrote: > Hello, > > Almost exactly two years since the last core team cleanup [1], it's > probably time to have another one. I don't think we have heard from these > contributors in the last couple of years: > > Justin Ferrieu jferrieu at objectif-libre.com > Luis Ramirez luis.ramirez at opencloud.es > Luka Peschke mail at lukapeschke.com > Maxime Cottret maxime.cottret at gmail.com > St?phane Albert sheeprine at nullplace.com > Jeremy Liu liuj285 at chinaunicom.cn > > Is everyone okay with removing cloudkitty-core membership for these users? > > Cheers, > Pierre Riteau (priteau) > > [1] > https://lists.openstack.org/pipermail/openstack-discuss/2020-October/017751.html > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Mon Oct 3 19:59:14 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 03 Oct 2022 12:59:14 -0700 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Oct 6 at 1500 UTC Message-ID: <1839f6e6053.e2f4eedf40797.2956987166661707844@ghanshyammann.com> Hello Everyone, The technical Committee's next weekly meeting is scheduled for 2022 Oct 6, at 1500 UTC. If you would like to add topics for discussion, please add them to the below wiki page by Wednesday, Oct 5 at 2100 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting -gmann From jay at gr-oss.io Mon Oct 3 20:06:06 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Mon, 3 Oct 2022 13:06:06 -0700 Subject: [release][ironic] Release desired for Ironic bugfix/N branches Message-ID: Hey all, I was attempting to perform a release of all maintained Ironic branches. The long term support branches cut as part of the integrated OpenStack release have been requested via gerrit, as documented ( https://review.opendev.org/c/openstack/releases/+/860125 ). We have intermediate bugfix releases as well, and I was hoping to get patch releases cut from these as well. As far as I can tell, there is no automation for performing these releases, or an official place to request them. If this is wrong; please correct me and I'm happy to go through the proper process. We know that the majority of consumers pull these from git directly; but checking pypi release notes, we had over 6000 downloads of the 21.0.0 and 20.2.0 releases during the 2/4 months of the Zed cycle they had respectively been released, so I do think there's value in releasing these even if it might be a small amount of manual effort. 
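For reference, a release request against the openstack/releases repository is normally just a small addition to the deliverable YAML; the fragment below is only a hypothetical sketch (the version and hash are placeholders), and whether the current tooling accepts such entries for bugfix/ branches is part of what needs clarifying:

```yaml
# deliverables/zed/ironic.yaml -- hypothetical fragment, values are placeholders
releases:
  - version: 21.0.1
    projects:
      - repo: openstack/ironic
        hash: 0000000000000000000000000000000000000000  # tip of bugfix/21.0
```
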
Below is a list of the Ironic projects, and associated bugfix/ branches I'd like to have a patch (bugfix) release cut for: ironic - bugfix/21.0 - bugfix/20.2 - bugfix/19.0 - bugfix/18.1 ironic-inspector - bugfix/11.0 - bugfix/10.12 - bugfix/10.9 - bugfix/10.7 ironic-python-agent - bugfix/9.0 - bugfix/8.6 - bugfix/8.3 - bugfix/8.1 Thanks, Jay Faulkner -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Mon Oct 3 20:50:24 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Mon, 03 Oct 2022 13:50:24 -0700 Subject: [release][ironic] Release desired for Ironic bugfix/N branches In-Reply-To: References: Message-ID: <7334a347-452b-40d6-9832-2c4c019e7370@app.fastmail.com> On Mon, Oct 3, 2022, at 1:06 PM, Jay Faulkner wrote: > Hey all, > > I was attempting to perform a release of all maintained Ironic > branches. The long term support branches cut as part of the integrated > OpenStack release have been requested via gerrit, as documented ( > https://review.opendev.org/c/openstack/releases/+/860125 ). > > We have intermediate bugfix releases as well, and I was hoping to get > patch releases cut from these as well. As far as I can tell, there is > no automation for performing these releases, or an official place to > request them. If this is wrong; please correct me and I'm happy to go > through the proper process. > > We know that the majority of consumers pull these from git directly; > but checking pypi release notes, we had over 6000 downloads of the > 21.0.0 and 20.2.0 releases during the 2/4 months of the Zed cycle they > had respectively been released, so I do think there's value in > releasing these even if it might be a small amount of manual effort. > > Below is a list of the Ironic projects, and associated bugfix/ branches > I'd like to have a patch (bugfix) release cut for: > > ironic > - bugfix/21.0 > - bugfix/20.2 > - bugfix/19.0 > - bugfix/18.1 > > ironic-inspector > - bugfix/11.0 > - bugfix/10.12 > - bugfix/10.9 > - bugfix/10.7 > > ironic-python-agent > - bugfix/9.0 > - bugfix/8.6 > - bugfix/8.3 > - bugfix/8.1 As mentioned in the openstack-releases IRC channel it seems that the Ironic project doesn't have ACLs to push their own manual tags [0]. It was also mentioned that the release team thought releases off of the bugfix branches would need manual releases. I think the lack of ACLs to do this by the Ironic project means that the release team needs to do it, or we need to modify the ACLs to allow the Ironic team to do the work. If the release team ends up doing the work, it would probably be a good idea to very explicitly list the branch, commit sha1, and version number for each of the needed releases. This way the release team doesn't have to guess if they are getting it correct when they make and push those tags. Separately, it seems like some of the intention here is to ensure that users of bugfix branches don't end up with stale installations. Updating the release tooling to handle releases off of these branches or delegating access to the Ironic team seem like an important piece of making that happen. Otherwise the overhead for doing this will be large enough that it is unlikely to happen often enough. Unfortunately, I don't know what is currently missing in the tooling to make that possible. 
[0] https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/openstack/ironic.config > > Thanks, > Jay Faulkner From smooney at redhat.com Tue Oct 4 00:48:16 2022 From: smooney at redhat.com (Sean Mooney) Date: Tue, 04 Oct 2022 01:48:16 +0100 Subject: [release][ironic] Release desired for Ironic bugfix/N branches In-Reply-To: <7334a347-452b-40d6-9832-2c4c019e7370@app.fastmail.com> References: <7334a347-452b-40d6-9832-2c4c019e7370@app.fastmail.com> Message-ID: On Mon, 2022-10-03 at 13:50 -0700, Clark Boylan wrote: > On Mon, Oct 3, 2022, at 1:06 PM, Jay Faulkner wrote: > > Hey all, > > > > I was attempting to perform a release of all maintained Ironic > > branches. The long term support branches cut as part of the integrated > > OpenStack release have been requested via gerrit, as documented ( > > https://review.opendev.org/c/openstack/releases/+/860125 ). > > > > We have intermediate bugfix releases as well, and I was hoping to get > > patch releases cut from these as well. As far as I can tell, there is > > no automation for performing these releases, or an official place to > > request them. If this is wrong; please correct me and I'm happy to go > > through the proper process. > > > > We know that the majority of consumers pull these from git directly; > > but checking pypi release notes, we had over 6000 downloads of the > > 21.0.0 and 20.2.0 releases during the 2/4 months of the Zed cycle they > > had respectively been released, so I do think there's value in > > releasing these even if it might be a small amount of manual effort. > > > > Below is a list of the Ironic projects, and associated bugfix/ branches > > I'd like to have a patch (bugfix) release cut for: > > > > ironic > > - bugfix/21.0 > > - bugfix/20.2 > > - bugfix/19.0 > > - bugfix/18.1 > > > > ironic-inspector > > - bugfix/11.0 > > - bugfix/10.12 > > - bugfix/10.9 > > - bugfix/10.7 > > > > ironic-python-agent > > - bugfix/9.0 > > - bugfix/8.6 > > - bugfix/8.3 > > - bugfix/8.1 > > As mentioned in the openstack-releases IRC channel it seems that the Ironic project doesn't have ACLs to push their own manual tags [0]. It was also mentioned that the release team thought releases off of the bugfix branches would need manual releases. I think the lack of ACLs to do this by the Ironic project means that the release team needs to do it, or we need to modify the ACLs to allow the Ironic team to do the work. > > If the release team ends up doing the work, it would probably be a good idea to very explicitly list the branch, commit sha1, and version number for each of the needed releases. This way the release team doesn't have to guess if they are getting it correct when they make and push those tags. > > Separately, it seems like some of the intention here is to ensure that users of bugfix branches don't end up with stale installations. Updating the release tooling to handle releases off of these branches or delegating access to the Ironic team seem like an important piece of making that happen. Otherwise the overhead for doing this will be large enough that it is unlikely to happen often enough. Unfortunately, I don't know what is currently missing in the tooling to make that possible. > > [0] https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/openstack/ironic.config its litrally been about 7 or 8 years since i did this but i had configred networkign-ovs-dpdk so that we could push sgined tags teh networking-ovs-dpdk-release group has the required acls to be able to create branches and tags. 
[access "refs/heads/*"] create = group networking-ovs-dpdk-release allows creating branches and enable branch creation and [access "refs/tags/*"] createSignedTag = group networking-ovs-dpdk-release enables pushing signed tags which used to get mirrored to github as well. this wont auto push the content to pypi i sued to also build the package locally an push it manually but that would enabel ironich to actully tag and push the content if they are added to the pypi repo as well. ideally however i think it woudl be better long term to just do this via the releases repo. im not entirly sure what prevent you just adding a new bugfix branch and sha there https://github.com/openstack/releases/blob/master/deliverables/zed/ironic.yaml#L9-L28 you can have one patch taht update all the bug fix branches acroos multipel release and then the release team just need to review one path. presumable this wont happen more frequently then say 1 a quater or once a month so that is proably doable vai the normal release process. > > > > > Thanks, > > Jay Faulkner > From ramishra at redhat.com Tue Oct 4 04:06:42 2022 From: ramishra at redhat.com (Rabi Mishra) Date: Tue, 4 Oct 2022 09:36:42 +0530 Subject: [TripleO] TripleO Antelope PTG Topics In-Reply-To: References: Message-ID: Hi All, Thanks for all the session proposals. I've moved the contents to the etherpad linked at the PTGBot site and added a draft schedule[1]. Please let me know if there is any conflict and we need to reschedule any of the sessions. Also, we can still accommodate a couple of sessions, if there are more topics to discuss. [1] https://etherpad.opendev.org/p/oct2022-ptg-tripleo On Thu, Sep 29, 2022 at 2:13 PM Rabi Mishra wrote: > Hi All, > > Gentle reminder. > > As we're only three weeks away, if you've any topics to discuss at the > PTG, please add them to the etherpad by this weekend. > > > -- > Regards, > Rabi Mishra > > > On Tue, Sep 13, 2022 at 5:28 PM Rabi Mishra wrote: > >> Hi All, >> >> Please add your session proposals to this etherpad[1]. We'll reserve >> time allocation based on proposed topics and work out a schedule in the >> coming weeks. >> >> [1] https://etherpad.opendev.org/p/tripleo-antelope-topics >> >> -- >> Regards, >> Rabi Mishra >> >> > > > -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From clay.gerrard at gmail.com Tue Oct 4 13:28:12 2022 From: clay.gerrard at gmail.com (Clay Gerrard) Date: Tue, 4 Oct 2022 08:28:12 -0500 Subject: [Swift][Ussuri] Erasure Coding Quarantines In-Reply-To: References: <20220930165217.2901f9cf@niphredil.zaitcev.lan> Message-ID: On Mon, Oct 3, 2022 at 3:37 PM Reid Guyett wrote: > > Thanks for the follow-up. [...] From there the files were downloadable > again. > Nice work! > We are going to try to create a new liberasurecode package 1.6.2 for 20.04 > so we can set the environment variable to write legacy CRC headers until > all the nodes in the cluster can be upgraded. > I'm not sure if you need a new package, I think you have to set the env at runtime - but there's also a swift config option that will force the env to get set that you can turn off after full upgrade. > This is why we have testing environments. > This is why *competent* deployers and operators have testing environments - and it's the only thing that makes the terrible terrible reality of building and releasing software actually a net good. Couldn't do it without you; go FOSS! -- Clay Gerrard -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdhasman at redhat.com Tue Oct 4 13:56:38 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Tue, 4 Oct 2022 19:26:38 +0530 Subject: [cinder] cancelling this week's meeting Message-ID: Hello Argonauts, I won't be around to take the cinder meeting tomorrow (i.e. 05th October, 2022) and we are close to the PTG so there aren't many things to discuss. The agenda as of now is empty[1] so it shouldn't be a problem to skip this week's meeting. Also I would like to take this opportunity to remind everyone to add topics to the PTG etherpad[2] and that would be a better utilization of the cinder upstream meeting time. [1] https://etherpad.opendev.org/p/cinder-zed-meetings#L102 [2] https://etherpad.opendev.org/p/antelope-ptg-cinder-planning Thanks and regards Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From rguyett at datto.com Mon Oct 3 20:37:50 2022 From: rguyett at datto.com (Reid Guyett) Date: Mon, 3 Oct 2022 20:37:50 +0000 Subject: [Swift][Ussuri] Erasure Coding Quarantines In-Reply-To: References: <20220930165217.2901f9cf@niphredil.zaitcev.lan> Message-ID: Hi, Thanks for the follow-up. I was able to find this cause in the IRC channel. I ultimately upgraded the other nodes to 20.04 in our test clusters and moved the quarantined objects back to where they belonged. From there the files were downloadable again. We are going to try to create a new liberasurecode package 1.6.2 for 20.04 so we can set the environment variable to write legacy CRC headers until all the nodes in the cluster can be upgraded. It is hard to find the information about the bug pre-upgrade. I didn't see it in the release notes for 2.25.2 (well they don't exist) and I don't see anything about it in the main Ubuntu Release notes. This is why we have testing environments. Reid ________________________________ From: Clay Gerrard Sent: Sunday, October 2, 2022 17:41 To: Pete Zaitcev Cc: Reid Guyett ; openstack-discuss at lists.openstack.org ; Matthew Grinnell Subject: Re: [Swift][Ussuri] Erasure Coding Quarantines On Fri, Sep 30, 2022 at 4:56 PM Pete Zaitcev < zaitcev at redhat.com> wrote: Unfortunately, I'm not familiar with the exact details of this. There was a window where depending on how linker worked, our code could get linked with an incorrect zlib crc routine randomly. # When upgrading from liberasurecode<=1.5.0, you may want to continue writing # legacy CRCs until all nodes are upgraded and capabale of reading fragments # with zlib CRCs. liberasurecode>=1.6.2 checks for the environment variable # LIBERASURECODE_WRITE_LEGACY_CRC; if set (value doesn't matter), it will use # its legacy CRC. Set this option to true or false to ensure the environment # variable is or is not set. Leave the option blank or absent to not touch # the environment (default). For more information, see # https://bugs.launchpad.net/liberasurecode/+bug/1886088 # write_legacy_ec_crc = https://github.com/NVIDIA/swift/blob/master/etc/proxy-server.conf-sample#L326-L334 set it in your object-server [DEFAULT] confs too -- Clay Gerrard
-------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-francois.taltavull at elca.ch Tue Oct 4 12:32:52 2022 From: jean-francois.taltavull at elca.ch (=?utf-8?B?VGFsdGF2dWxsIEplYW4tRnJhbsOnb2lz?=) Date: Tue, 4 Oct 2022 12:32:52 +0000 Subject: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number In-Reply-To: References: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> <1b17c23f8982480db73cf50d04d51af7@elca.ch> <86f048d7931c4cc482f6785437c9b5ea@elca.ch> <671023b5ab3846dfb3a39ef313018eac@elca.ch> <33f69d386462450b9964b2ed78284d57@elca.ch> <1d1c1c3cc6184b529819bb8f3598813f@elca.ch> <3516ab2892694a17a76b56ccacc463f1@elca.ch> Message-ID: <104fac9ebe84471eb338e80d995b97fd@elca.ch> Hello Rapha?l, I restored the RGW keystone authentication and did some more tests. The problem is that the S3 request signature provided by ceilometer and the one computed by keystone mismatch. OpenStack release is Wallaby. keystone/api/s3tokens.py: ```` class S3Resource(EC2_S3_Resource.ResourceBase): @staticmethod def _check_signature(creds_ref, credentials): string_to_sign = base64.urlsafe_b64decode(str(credentials['token'])) if string_to_sign[0:4] != b'AWS4': signature = _calculate_signature_v1(string_to_sign, creds_ref['secret']) else: signature = _calculate_signature_v4(string_to_sign, creds_ref['secret']) if not utils.auth_str_equal(credentials['signature'], signature): raise exception.Unauthorized( <<<------------------------------------------we fall there message=_('Credential signature mismatch')) ```` From: Taltavull Jean-Fran?ois Sent: vendredi, 30 septembre 2022 14:48 To: 'Rafael Weing?rtner' Cc: openstack-discuss Subject: RE: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number ``` $ sudo /usr/bin/radosgw --version ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable) ``` From: Rafael Weing?rtner > Sent: vendredi, 30 septembre 2022 12:37 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies.
This is the signature used by the `awsauth` library: ``` def get_signature(self, r): canonical_string = self.get_canonical_string( r.url, r.headers, r.method) if py3k: key = self.secret_key.encode('utf-8') msg = canonical_string.encode('utf-8') else: key = self.secret_key msg = canonical_string h = hmac.new(key, msg, digestmod=sha) return encodestring(h.digest()).strip() ``` After that is generated, it is added in the headers: # Create date header if it is not created yet. if 'date' not in r.headers and 'x-amz-date' not in r.headers: r.headers['date'] = formatdate( timeval=None, localtime=False, usegmt=True) signature = self.get_signature(r) if py3k: signature = signature.decode('utf-8') r.headers['Authorization'] = 'AWS %s:%s' % (self.access_key, signature) On Thu, Sep 29, 2022 at 9:15 AM Taltavull Jean-Fran?ois > wrote: ``` $ python test_creds.py Executing test on: [FQDN/object-store/]. Rados GW admin context [/admin] and path [/usage?stats=True] used. Rados GW request URL [http://FQDN/object-store/admin/bucket?stats=True]. Rados GW host: FQDN Traceback (most recent call last): File "test_creds.py", line 45, in raise RGWAdminAPIFailed( __main__.RGWAdminAPIFailed: RGW AdminOps API returned 403 Forbidden ``` So the same as with ceilometer. Auth is done by RGW, not by keystone, and the ceph ?admin? user exists and owns the right privileges: ``` $ sudo radosgw-admin user info --uid admin [22/296]{ "user_id": "admin", "display_name": "admin user", "email": "", "suspended": 0, "max_buckets": 1000, "subusers": [], "keys": [ { "user": "admin", "access_key": ?admin_access_key", "secret_key": "admin_secret_key" } ], "swift_keys": [], "caps": [ { "type": "buckets", "perm": "*" }, { "type": "metadata", "perm": "*" }, { "type": "usage", "perm": "*" }, { "type": "users", "perm": "*" } ], ``` From: Rafael Weing?rtner > Sent: jeudi, 29 septembre 2022 12:32 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Can you test you credentials with the following code? ``` import json import requests import os import six.moves.urllib.parse as urlparse class RGWAdminAPIFailed(Exception): pass if __name__ == '__main__': rados_gw_base_url = "put your RGW URL here. E.g. http://server.com:port/something" print("Executing test on: [%s]." % rados_gw_base_url) rados_gw_admin_context = "/admin" rados_gw_path = "/usage?stats=True" print("Rados GW admin context [%s] and path [%s] used." % (rados_gw_admin_context, rados_gw_path)) rados_gw_request_url = urlparse.urljoin(rados_gw_base_url, '/admin') + '/bucket?stats=True' print("Rados GW request URL [%s]." 
% rados_gw_request_url) rados_gw_access_key_to_use = "put your access key here" rados_gw_secret_key_to_use = "put your secret key here" rados_gw_host_name = urlparse.urlparse(rados_gw_request_url).netloc print("Rados GW host: %s" % rados_gw_host_name) module_name = "awsauth" class_name = "S3Auth" arguments = [rados_gw_access_key_to_use, rados_gw_secret_key_to_use, rados_gw_host_name] module = __import__(module_name) class_ = getattr(module, class_name) instance = class_(*arguments) r = requests.get( rados_gw_request_url, auth=instance, timeout=30) #auth=awsauth.S3Auth(*arguments)) if r.status_code != 200: raise RGWAdminAPIFailed( ('RGW AdminOps API returned %(status)s %(reason)s') % {'status': r.status_code, 'reason': r.reason}) response_body = r.text parsed_json = json.loads(response_body) print("Response cookies: [%s]." % r.cookies) radosGw_output_file = "/home//Downloads/radosGw-usage.json" if os.path.exists(radosGw_output_file): os.remove(radosGw_output_file) with open(radosGw_output_file, "w") as file1: file1.writelines(json.dumps(parsed_json, indent=4, sort_keys=True)) file1.flush() exit(0) ``` On Thu, Sep 29, 2022 at 4:09 AM Taltavull Jean-Fran?ois > wrote: python Python 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import awsauth >>> awsauth >>> From: Rafael Weing?rtner > Sent: mercredi, 28 septembre 2022 18:40 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Can you also execute the following: ``` python import awsauth awsauth ``` That will output a path, and then you can `cat `, example: `cat /var/lib/kolla/venv/lib/python3.8/site-packages/awsauth.py` On Wed, Sep 28, 2022 at 1:21 PM Taltavull Jean-Fran?ois > wrote: I removed trailing ?/object-store/? from the last value of authentication_parameters I also: - disabled s3 keystone auth in RGW - created a RGW ?admin? user with the right privileges to allow admin API calls - put RGW in debug mode And here is what I get in RGW logs: get_usage string_to_sign=GET Wed, 28 Sep 2022 16:15:45 GMT /admin/usage get_usage server signature=BlaBlaBlaBla get_usage client signature=BloBloBlo get_usage compare=-75 get_usage rgw::auth::s3::LocalEngine denied with reason=-2027 get_usage rgw::auth::s3::AWSAuthStrategy denied with reason=-2027 get_usage rgw::auth::StrategyRegistry::s3_main_strategy_t: trying rgw::auth::s3::AWSAuthStrategy get_usage rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::LocalEngine From: Rafael Weing?rtner > Sent: mercredi, 28 septembre 2022 13:15 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. I think that the last parameter "/object-store/", should be only "". Can you test it? You are using EC2 credentials to authenticate in RGW. Did you enable the Keystone integration in RGW? Also, as far as I know, this admin endpoint needs a RGW admin. I am not sure if the Keystone and RGW integration would enable/make it possible for someone to authenticate as an admin in RGW. Can you check it? To see if you can call that endpoint with these credentials. 
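One quick way to confirm that on the RGW side is to check the caps on the user that owns those credentials; a rough sketch with radosgw-admin (the uid is only an example):

```
radosgw-admin user info --uid ceilometer
radosgw-admin caps add --uid ceilometer \
  --caps "usage=read,write;buckets=read;metadata=read;users=read"
```
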
On Wed, Sep 28, 2022 at 6:01 AM Taltavull Jean-Fran?ois > wrote: Pollster YML configuration : --- - name: "dynamic.radosgw.usage" sample_type: "gauge" unit: "B" value_attribute: "total.size" url_path: http:///object-store/admin/usage module: "awsauth" authentication_object: "S3Auth" authentication_parameters: ,,/object-store/ user_id_attribute: "user" project_id_attribute: "user" resource_id_attribute: "user" response_entries_key: "summary" ACCESS_KEY and SECRET_KEY have been created with ?openstack ec2 credentials create?. Ceilometer central is deployed with OSA and it uses awsauth.py module. From: Rafael Weing?rtner > Sent: mercredi, 28 septembre 2022 02:01 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Can you show your YML configuration? Also, did you install the AWS authentication module in the container/host where Ceilometer central is running? On Mon, Sep 26, 2022 at 12:58 PM Taltavull Jean-Fran?ois > wrote: Hello Rafael, Thanks for the information about ceilometer patches but for now I?m testing with the credentials in the dynamic pollster config file. I will use barbican when I push all this to production. The keystone authentication performed by the rados gw with the credentials provided by ceilometer still does not work. I wonder if this could be a S3 signature version issue on ceilometer side, that is on S3 client side. This kind of issue exists with the s3 client ?s3cmd? and you have to add ??signature-v2? so that ?s3cmd? works well. What do you think ? Do you know which version of S3 signature ceilometer uses while authenticating ? From: Rafael Weing?rtner > Sent: mercredi, 7 septembre 2022 19:23 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Jean, there are two problems with the Ceilometer. I just opened the patches to resolve it: - https://review.opendev.org/c/openstack/ceilometer/+/856305 - https://review.opendev.org/c/openstack/ceilometer/+/856304 Without these patches, you might have problems to use Ceilometer with Non-OpenStack dynamic pollsters and barbican credentials. On Wed, Aug 31, 2022 at 3:55 PM Rafael Weing?rtner > wrote: It is the RGW user that you have. This user must have the role that is needed to access the usage feature in RGW. If I am not mistaken, it required an admin user. On Wed, Aug 31, 2022 at 1:54 PM Taltavull Jean-Fran?ois > wrote: Thanks to your help, I am close to the goal. Dynamic pollster is loaded and triggered. But I get a ?Status[403] and reason [Forbidden]? in ceilometer logs while requesting admin/usage. I?m not sure to understand well the auth mechanism. Are we talking about keystone credentials, ec2 credentials, Rados GW user ?... For now, in testing phase, I use ?authentication_parameters?, not barbican. -JF From: Rafael Weing?rtner > Sent: mardi, 30 ao?t 2022 14:17 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Yes, you will need to enable the metric/pollster to be processed. That is done via "polling.yml" file. 
Also, do not forget that you will need to configure Ceilometer to push this new metric. If you use Gnocchi as the backend, you will need to change/update the gnocchi resource YML file. That file maps resources and metrics in the Gnocchi backend. The configuration resides in Ceilometer. You can create/define new resource types and map them to specific metrics. It depends on how you structure your solution. P.S. You do not need to use "authentication_parameters". You can use the barbican integration to avoid setting your credentials in a file. On Tue, Aug 30, 2022 at 9:11 AM Taltavull Jean-Fran?ois > wrote: Hello, I tried to define a Rados GW dynamic pollster and I can see, in Ceilometer logs, that it?s actually loaded. But it looks like it was not triggered, I see no trace of ceilometer connection in Rados GW logs. My definition: - name: "dynamic.radosgw.usage" sample_type: "gauge" unit: "B" value_attribute: "total.size" url_path: http:///object-store/swift/v1/admin/usage module: "awsauth" authentication_object: "S3Auth" authentication_parameters: xxxxxxxxxxxxx,yyyyyyyyyyyyy, user_id_attribute: "admin" project_id_attribute: "admin" resource_id_attribute: "admin" response_entries_key: "summary" Do I have to set an option in ceilometer.conf, or elsewhere, to get my Rados GW dynamic pollster triggered ? -JF From: Taltavull Jean-Fran?ois Sent: lundi, 29 ao?t 2022 18:41 To: 'Rafael Weing?rtner' > Cc: openstack-discuss > Subject: RE: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number Thanks a lot for your quick answer, Rafael ! I will explore this approach. Jean-Francois From: Rafael Weing?rtner > Sent: lundi, 29 ao?t 2022 17:54 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. You could use a different approach. You can use Dynamic pollster [1], and create your own mechanism to collect data, without needing to change Ceilometer code. Basically all hard-coded pollsters can be converted to a dynamic pollster that is defined in YML. [1] https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html#the-dynamic-pollsters-system-configuration-for-non-openstack-apis On Mon, Aug 29, 2022 at 12:51 PM Taltavull Jean-Fran?ois > wrote: Hi All, In our OpenStack deployment, API endpoints are defined by using URLs instead of port numbers and HAProxy forwards requests to the right bakend after having ACLed the URL. In the case of our object-store service, based on RadosGW, the internal API endpoint is "https:///object-store/swift/v1/AUTH_" When Ceilometer RadosGW pollster tries to connect to the RadosGW admin API with the object-store internal endpoint, the URL becomes https:///admin, as shown by HAProxy logs. This URL does not match any API endpoint from HAProxy point of view. The line of code that rewrites the URL is this one: https://opendev.org/openstack/ceilometer/src/branch/stable/wallaby/ceilometer/objectstore/rgw.py#L81 What would you think of adding a mechanism based on new Ceilometer configuration option(s) to control the URL rewriting ? 
Our deployment characteristics: - OpenStack release: Wallaby - Ceph and RadosGW version: 15.2.16 - deployment tool: OSA 23.2.1 and ceph-ansible Best regards, Jean-Francois -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Oct 4 14:45:35 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Oct 2022 14:45:35 +0000 Subject: [release][ironic] Release desired for Ironic bugfix/N branches In-Reply-To: References: Message-ID: <20221004144534.6xsc3ysyybuznpsd@yuggoth.org> On 2022-10-03 13:06:06 -0700 (-0700), Jay Faulkner wrote: [...] > We have intermediate bugfix releases as well, and I was hoping to get patch > releases cut from these as well. As far as I can tell, there is no > automation for performing these releases, or an official place to request > them. If this is wrong; please correct me and I'm happy to go through the > proper process. > > We know that the majority of consumers pull these from git directly; but > checking pypi release notes, we had over 6000 downloads of the 21.0.0 and > 20.2.0 releases during the 2/4 months of the Zed cycle they had > respectively been released, so I do think there's value in releasing these > even if it might be a small amount of manual effort. [...] It sounds like no releases have ever been made from any of the "bugfix" branches going back over two years since their creation. Is this a change in how the Ironic team views those branches, or was the intention always to tag releases on them but nobody had gotten around to it? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Tue Oct 4 14:53:32 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Oct 2022 14:53:32 +0000 Subject: [TripleO] TripleO Antelope PTG Topics In-Reply-To: References: Message-ID: <20221004145332.jsu5vqckqxlsruyq@yuggoth.org> On 2022-10-04 09:36:42 +0530 (+0530), Rabi Mishra wrote: [...] > I've moved the contents to the etherpad linked at the PTGBot site [...] For future reference, you can also simply inform PTGBot that you have a different Etherpad URL, and it will update the site to list that one for you instead. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From peter.matulis at canonical.com Tue Oct 4 15:13:35 2022 From: peter.matulis at canonical.com (Peter Matulis) Date: Tue, 4 Oct 2022 11:13:35 -0400 Subject: [charms] Team Delegation proposal In-Reply-To: References: Message-ID: What is the status of this proposal? 
On Wed, Aug 31, 2022 at 3:53 PM Peter Matulis wrote: > > > On Mon, Aug 8, 2022 at 4:25 PM Alex Kavanagh > wrote: > >> Hi Chris >> >> On Thu, 28 Jul 2022 at 21:46, Chris MacNaughton < >> chris.macnaughton at canonical.com> wrote: >> >>> Hello All, >>> >>> >>> I would like to propose some new ACLs in Gerrit for the openstack-charms >>> project: >>> >>> - openstack-core-charms >>> - ceph-charms >>> - network-charms >>> - stable-maintenance >>> >>> >> >> I think the names need to be tweaked slightly: >> >> - charms-openstack >> - charms-ceph >> - charms-ovn >> - charms-maintenance >> > > We would also need an ACL for the documentation: > > - charms-docs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdratlif at globalnoc.iu.edu Tue Oct 4 15:45:45 2022 From: jdratlif at globalnoc.iu.edu (John Ratliff) Date: Tue, 04 Oct 2022 11:45:45 -0400 Subject: OpenStack Ansible Service troubleshooting Message-ID: <3041961ec183a2a9f3d037ff0ec9019aa5a5d0c6.camel@globalnoc.iu.edu> We've started deploying new Xena clusters with openstack-ansible. We keep running into problems with some parts of openstack not working. A service will fail or need restarted, but it's not clear which one or why. Recently, one of our test clusters (2 hosts) stopped working. I could login to horizon, but I could not create instances. At first it told me that a message wasn't answered quick enough. I assumed the problem was rabbitmq and restarted the container, but this didn't help. I eventually restarted every container and the nova- compute and haproxy services on the host. But this didn't help either. I eventually rebooted both hosts, but this made things worse (I think I broke the galera cluster doing this). After bootstrapping the galera cluster, I can log back into horizon, but I still cannot create hosts. It tells me "Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance [UUID]" If I look at the journal for nova-compute, I see this error: "libvirt.libvirtError: Failed to activate service 'org.freedesktop.machine1': timed out " Looking at systemd-machined, it won't start due to "systemd- machined.service: Job systemd-machined.service/start failed with result 'dependency'." I'm not sure what "dependency" it's referring to. In the cluster that does work, this service is running. But on both hosts on the cluster that do not, this service is not running. What should I be looking at here to fix? -- John Ratliff Systems Automation Engineer GlobalNOC @ Indiana University -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5598 bytes Desc: not available URL: From noonedeadpunk at gmail.com Tue Oct 4 16:21:13 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 4 Oct 2022 18:21:13 +0200 Subject: OpenStack Ansible Service troubleshooting In-Reply-To: <3041961ec183a2a9f3d037ff0ec9019aa5a5d0c6.camel@globalnoc.iu.edu> References: <3041961ec183a2a9f3d037ff0ec9019aa5a5d0c6.camel@globalnoc.iu.edu> Message-ID: Hi John. Well, it seems you've made a bunch of operations that were not required in the first place. However, I believe that at the end you've identified the problem correctly. systemd-machined service should be active and running on nova-compute hosts with kvm driver. I'd suggest looking deeper at why this service systemd-machined can't be started. What does journalctl says about that? 
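For example, something along these lines usually shows which dependency is failing (unit names assume a standard systemd host):

```
systemctl status systemd-machined
systemctl list-dependencies systemd-machined
journalctl -b -u systemd-machined
systemctl status systemd-tmpfiles-setup.service var-lib-machines.mount
```
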
As one of dependency systemd-machined requires to have /var/lib/machines. And I do have 2 assumptions there: 1. Was systemd-tmpfiles-setup.service activated? As we have seen sometimes that upon node boot due to some race condition it was not, which resulted in all kind of weirdness 2. Don't you happen to run nova-compute on the same set of hosts where LXC containers are placed? As for example, in AIO setup we do manage /var/lib/machines/ mount with systemd var-lib-machines.mount. So if you happen to run nova-computes on controller host or AIO - this is another thing to check. ??, 4 ???. 2022 ?. ? 17:48, John Ratliff : > > We've started deploying new Xena clusters with openstack-ansible. We > keep running into problems with some parts of openstack not working. A > service will fail or need restarted, but it's not clear which one or > why. > > Recently, one of our test clusters (2 hosts) stopped working. I could > login to horizon, but I could not create instances. > > At first it told me that a message wasn't answered quick enough. I > assumed the problem was rabbitmq and restarted the container, but this > didn't help. I eventually restarted every container and the nova- > compute and haproxy services on the host. But this didn't help either. > I eventually rebooted both hosts, but this made things worse (I think I > broke the galera cluster doing this). > > After bootstrapping the galera cluster, I can log back into horizon, > but I still cannot create hosts. It tells me > > "Exceeded maximum number of retries. Exhausted all hosts available for > retrying build failures for instance [UUID]" > > If I look at the journal for nova-compute, I see this error: > > "libvirt.libvirtError: Failed to activate service > 'org.freedesktop.machine1': timed out " > > Looking at systemd-machined, it won't start due to "systemd- > machined.service: Job systemd-machined.service/start failed with result > 'dependency'." > > I'm not sure what "dependency" it's referring to. In the cluster that > does work, this service is running. But on both hosts on the cluster > that do not, this service is not running. > > What should I be looking at here to fix? > > -- > John Ratliff > Systems Automation Engineer > GlobalNOC @ Indiana University From ianyrchoi at gmail.com Tue Oct 4 16:46:03 2022 From: ianyrchoi at gmail.com (Ian Y. Choi) Date: Wed, 5 Oct 2022 01:46:03 +0900 Subject: [I18n] Zanata status + call for volunteers on Weblate migration Message-ID: Hi all, First, thank you all for contributing to OpenStack with globalization by contributing translations, coordinating translations with artifacts, and making sure that those translations are shipped with releases to all over the world (we are calling it I18n - Internationalization). While OpenStack I18n could be healthier with the tremendous help from translators as well as many OpenStack upstream contributors & teams such as infrastructure, release management, and documentation, the current translation platform we are using on https://translate.openstack.org relies on Zanata, an open-source translation platform which upstream activities were stopped [1]. 
Currently, there are several issues reported - most have been resolved, but they keep recurring and worsening as time goes on: - OpenID authentication issues for new registration users on translate.openstack.org [2] - Translation job failure issues since the Zanata client did not work with a newer Java / job compatibility issues with Python versions [3] - Missing translation jobs for Xena/Yoga stable versions on Zanata [4]
Considering the situation, and to address its root cause, I am calling for volunteers for the Weblate migration. The detailed activities would continue from what the previous I18n PTL already investigated, identified, and documented in [5] [6]. There will be diverse work items from a developer perspective as well as for translators and operators. I hope that many volunteers will step up to move this forward.
Meanwhile, I updated the "Translations & Priority" part on the https://translate.openstack.org homepage. Note that stable versions are minimized to Horizon and dashboard projects, and ping me if there is a lot of translation work, especially during R-3 to R-1.
Looking forward to an enhanced open source translation platform landing soon with OpenInfra.
With many thanks, /Ian
[1] https://lists.fedoraproject.org/archives/list/trans at lists.fedoraproject.org/thread/F2JZTYSK3L5JAZY6VSVGDGNNQ4ATG4HP/ [2] https://lists.openstack.org/pipermail/openstack-i18n/2022-February/003550.html [3] https://review.opendev.org/c/openstack/project-config/+/850962 [4] https://lists.openstack.org/pipermail/openstack-discuss/2021-December/026441.html [5] https://blueprints.launchpad.net/openstack-i18n/+spec/renew-translation-platform [6] https://etherpad.opendev.org/p/I18n-weblate-migration
From jay at gr-oss.io Tue Oct 4 16:57:16 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 4 Oct 2022 09:57:16 -0700 Subject: [release][ironic] Release desired for Ironic bugfix/N branches In-Reply-To: <20221004144534.6xsc3ysyybuznpsd@yuggoth.org> References: <20221004144534.6xsc3ysyybuznpsd@yuggoth.org> Message-ID:
It sounds like no releases have ever been made from any of the > "bugfix" branches going back over two years since their creation. Is > this a change in how the Ironic team views those branches, or was > the intention always to tag releases on them but nobody had gotten > around to it? > The primary consumer historically of bugfix branches has been downstream packagers of Ironic -- e.g. openshift. I'm pursuing creating releases of these because after doing some research, we have customers consuming these releases from pypi (6000 downloads of our Zed-cycle releases in 4 months). I do not want those deployers, consuming upstream artifacts, to miss out on backported bugfixes and patches that would be provided in a vendor release artifact.
There is no urgency behind getting the bugfix releases made, but the current situation where the Ironic community maintains additional branches for longer term support but does not provide stable releases for customers pulling upstream release artifacts is not one I wish to perpetuate.
-- Jay Faulkner -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Oct 4 17:01:36 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Oct 2022 17:01:36 +0000 Subject: [release][ironic] Release desired for Ironic bugfix/N branches In-Reply-To: References: <20221004144534.6xsc3ysyybuznpsd@yuggoth.org> Message-ID: <20221004170135.ocpvfbuvz3vpzlja@yuggoth.org> On 2022-10-04 09:57:16 -0700 (-0700), Jay Faulkner wrote: [...] 
> There is no urgency behind getting the bugfix releases made, but > the current situation where the Ironic community maintains > additional branches for longer term support but does not provide > stable releases for customers puiling upstream release artifacts > is not one I wish to perpetuate. That makes sense, but it's worth noting what you have now is not entirely dissimilar to how "extended maintenance" works for our official stable branches as well (continuing to merge some backported fixes, but not tagging any further point releases). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jay at gr-oss.io Tue Oct 4 17:22:35 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 4 Oct 2022 10:22:35 -0700 Subject: [release][ironic] Release desired for Ironic bugfix/N branches In-Reply-To: <20221004170135.ocpvfbuvz3vpzlja@yuggoth.org> References: <20221004144534.6xsc3ysyybuznpsd@yuggoth.org> <20221004170135.ocpvfbuvz3vpzlja@yuggoth.org> Message-ID: > That makes sense, but it's worth noting what you have now is not > entirely dissimilar to how "extended maintenance" works for our > official stable branches as well (continuing to merge some > backported fixes, but not tagging any further point releases). > > Yeah, I understand. This is all working towards a revision of Ironic release policy to ensure it's documented when/how these releases are supported. Right now we create bugfix releases as months 2 and 4 into the cycle, with no indication of support length. In fact; there are leftover bugfix/[] branches that are not maintained and have not yet been retired. Posts like this (and my efforts to get releases done) is part of my fact-finding for trying to ensure how Ironic manages releases is well documented and understood. -Jay Faulkner -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Tue Oct 4 18:18:32 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 4 Oct 2022 11:18:32 -0700 Subject: [ironic][stable] Proposing EOL of ironic project branches older than Wallaby Message-ID: Hi all, Ironic has a large amount of stable branches still in EM. We need to take action to ensure those branches are either retired or have CI repaired to the point of being usable. Specifically, I'm looking at these branches across all Ironic projects: - stable/queens - stable/rocky - stable/stein - stable/train - stable/ussuri - stable/victoria In lieu of any volunteers to maintain the CI, my recommendation for all the branches listed above is that they be marked EOL. If someone wants to volunteer to maintain CI for those branches, they can propose one of the below paths be taken instead: 1 - Someone volunteers to maintain these branches, and also report the status of CI of these older branches periodically on the Ironic whiteboard and in Ironic meetings. If you feel strongly that one of these branches needs to continue to be in service; volunteering in this way is how to save them. 2 - We seriously reduce CI. Basically removing all tempest tests to ensure that CI remains reliable and able to merge emergency or security fixes when needed. In some cases; this still requires CI fixes as some older inspector branches are failing *installing packages* in unit tests. I would still like, in this case, that someone volunteers to ensure the minimalist CI remains happy. 
My intention is to let this message serve as notice and a waiting period; and if I've not heard any response here or in Monday's Ironic meeting (in 6 days), I will begin taking action on retiring these branches. This is simply a start; other branches (including bugfix branches) are also in bad shape in CI, but getting these retired will significantly reduce the surface area of projects and branches to evaluate. I know it's painful to drop support for these branches; but we've provided good EM support for these branches for a long time and by pruning them away, we'll be able to save time to dedicate to other items. Thanks, Jay Faulkner -------------- next part -------------- An HTML attachment was scrubbed... URL: From iurygregory at gmail.com Tue Oct 4 18:27:46 2022 From: iurygregory at gmail.com (Iury Gregory) Date: Tue, 4 Oct 2022 15:27:46 -0300 Subject: [ironic][stable] Proposing EOL of ironic project branches older than Wallaby In-Reply-To: References: Message-ID: Hi Jay, We had a discussion a few months ago about closing pre-train branches https://lists.openstack.org/pipermail/openstack-discuss/2022-June/029274.html Train Ussuri and Victoria we should probably raise this in the upstream meeting (to see what people will also think about it, in case we don't have responses here) Thanks! Em ter., 4 de out. de 2022 ?s 15:20, Jay Faulkner escreveu: > Hi all, > > Ironic has a large amount of stable branches still in EM. We need to take > action to ensure those branches are either retired or have CI repaired to > the point of being usable. > > Specifically, I'm looking at these branches across all Ironic projects: > - stable/queens > - stable/rocky > - stable/stein > - stable/train > - stable/ussuri > - stable/victoria > > In lieu of any volunteers to maintain the CI, my recommendation for all > the branches listed above is that they be marked EOL. If someone wants to > volunteer to maintain CI for those branches, they can propose one of the > below paths be taken instead: > > 1 - Someone volunteers to maintain these branches, and also report the > status of CI of these older branches periodically on the Ironic whiteboard > and in Ironic meetings. If you feel strongly that one of these branches > needs to continue to be in service; volunteering in this way is how to save > them. > > 2 - We seriously reduce CI. Basically removing all tempest tests to ensure > that CI remains reliable and able to merge emergency or security fixes when > needed. In some cases; this still requires CI fixes as some older inspector > branches are failing *installing packages* in unit tests. I would still > like, in this case, that someone volunteers to ensure the minimalist CI > remains happy. > > My intention is to let this message serve as notice and a waiting period; > and if I've not heard any response here or in Monday's Ironic meeting (in 6 > days), I will begin taking action on retiring these branches. > > This is simply a start; other branches (including bugfix branches) are > also in bad shape in CI, but getting these retired will significantly > reduce the surface area of projects and branches to evaluate. > > I know it's painful to drop support for these branches; but we've provided > good EM support for these branches for a long time and by pruning them > away, we'll be able to save time to dedicate to other items. 
> > Thanks, > Jay Faulkner > -- *Att[]'s* *Iury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Ironic PTL * *Senior Software Engineer at Red Hat Brazil* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdratlif at globalnoc.iu.edu Tue Oct 4 18:56:34 2022 From: jdratlif at globalnoc.iu.edu (John Ratliff) Date: Tue, 04 Oct 2022 14:56:34 -0400 Subject: OpenStack Ansible Service troubleshooting In-Reply-To: References: <3041961ec183a2a9f3d037ff0ec9019aa5a5d0c6.camel@globalnoc.iu.edu> Message-ID: On Tue, 2022-10-04 at 18:21 +0200, Dmitriy Rabotyagov wrote: > Hi John. > > Well, it seems you've made a bunch of operations that were not > required in the first place. However, I believe that at the end > you've > identified the problem correctly. systemd-machined service should be > active and running on nova-compute hosts with kvm driver. > I'd suggest looking deeper at why this service systemd-machined can't > be started. What does journalctl says about that? It's not very chatty, though I think your next question might answer the why. $ sudo journalctl -u systemd-machined -- Logs begin at Tue 2022-10-04 17:45:02 UTC, end at Tue 2022-10-04 18:43:45 UTC. -- Oct 04 18:43:37 os-comp1 systemd[1]: Dependency failed for Virtual Machine and Container Registration Service. Oct 04 18:43:37 os-comp1 systemd[1]: systemd-machined.service: Job systemd-machined.service/start failed with result 'dependency'. > > As one of dependency systemd-machined requires to have > /var/lib/machines. And I do have 2 assumptions there: > 1. Was systemd-tmpfiles-setup.service activated? As we have seen > sometimes that upon node boot due to some race condition it was not, > which resulted in all kind of weirdness It appears to be. The output looks very similar between the broken and working clusters. $ sudo systemctl status systemd-tmpfiles-setup ? systemd-tmpfiles-setup.service - Create Volatile Files and Directories Loaded: loaded (/lib/systemd/system/systemd-tmpfiles- setup.service; static; vendor preset: enabled) Active: active (exited) since Mon 2022-10-03 18:23:53 UTC; 24h ago Docs: man:tmpfiles.d(5) man:systemd-tmpfiles(8) Main PID: 1460 (code=exited, status=0/SUCCESS) Tasks: 0 (limit: 8192) Memory: 0B CGroup: /system.slice/systemd-tmpfiles-setup.service Warning: journal has been rotated since unit was started, output may be incomplete. However, /var/lib/machines does not appear to be correct. On the working cluster, this is mounted as an ext4 filesystem and has a lost+found directory along with a directory for a defined instance. There is no mount listed on the broken cluster, and the directory is empty. > 2. Don't you happen to run nova-compute on the same set of hosts > where > LXC containers are placed? As for example, in AIO setup we do manage > /var/lib/machines/ mount with systemd var-lib-machines.mount. So if > you happen to run nova-computes on controller host or AIO - this is > another thing to check. $ sudo journalctl -u var-lib-machines.mount -- Logs begin at Tue 2022-10-04 18:01:46 UTC, end at Tue 2022-10-04 18:52:53 UTC. -- Oct 04 18:43:37 os-comp1 systemd[1]: Mounting Virtual Machine and Container Storage (Compatibility)... Oct 04 18:43:37 os-comp1 mount[1272300]: mount: /var/lib/machines: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error. 
Oct 04 18:43:37 os-comp1 systemd[1]: var-lib-machines.mount: Mount process exited, code=exited, status=32/n/a Oct 04 18:43:37 os-comp1 systemd[1]: var-lib-machines.mount: Failed with result 'exit-code'. Oct 04 18:43:37 os-comp1 systemd[1]: Failed to mount Virtual Machine and Container Storage (Compatibility). This appears to be the problem. It looks like /dev/loop0 is probably supposed to reference /var/lib/machines.raw. I tried running fsck on /dev/loop0, but it doesn't think there is a valid extX filesystem on any of the superblocks. Maybe /dev/loop0 is not really pointing to /var/lib/machines.raw? Not sure how to tell if that's the case. Maybe I should try to loopback this, or create a blank filesystem image. -- John Ratliff Systems Automation Engineer GlobalNOC @ Indiana University -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5598 bytes Desc: not available URL: From jdratlif at globalnoc.iu.edu Tue Oct 4 20:22:58 2022 From: jdratlif at globalnoc.iu.edu (John Ratliff) Date: Tue, 04 Oct 2022 16:22:58 -0400 Subject: OpenStack Ansible Service troubleshooting In-Reply-To: References: <3041961ec183a2a9f3d037ff0ec9019aa5a5d0c6.camel@globalnoc.iu.edu> Message-ID: <529d89ba8e2f63fd711701473e12f335726e6073.camel@globalnoc.iu.edu> On Tue, 2022-10-04 at 14:56 -0400, John Ratliff wrote: > On Tue, 2022-10-04 at 18:21 +0200, Dmitriy Rabotyagov wrote: > > Hi John. > > > > Well, it seems you've made a bunch of operations that were not > > required in the first place. However, I believe that at the end > > you've > > identified the problem correctly. systemd-machined service should > > be > > active and running on nova-compute hosts with kvm driver. > > I'd suggest looking deeper at why this service systemd-machined > > can't > > be started. What does journalctl says about that? > > It's not very chatty, though I think your next question might answer > the why. > > $ sudo journalctl -u systemd-machined > -- Logs begin at Tue 2022-10-04 17:45:02 UTC, end at Tue 2022-10-04 > 18:43:45 UTC. -- > Oct 04 18:43:37 os-comp1 systemd[1]: Dependency failed for Virtual > Machine and Container Registration Service. > Oct 04 18:43:37 os-comp1 systemd[1]: systemd-machined.service: Job > systemd-machined.service/start failed with result 'dependency'. > > > > > As one of dependency systemd-machined requires to have > > /var/lib/machines. And I do have 2 assumptions there: > > 1. Was systemd-tmpfiles-setup.service activated? As we have seen > > sometimes that upon node boot due to some race condition it was > > not, > > which resulted in all kind of weirdness > > It appears to be. The output looks very similar between the broken > and > working clusters. > > $ sudo systemctl status systemd-tmpfiles-setup??????????????????????? > ? systemd-tmpfiles-setup.service - Create Volatile Files and > Directories > ???? Loaded: loaded (/lib/systemd/system/systemd-tmpfiles- > setup.service; static; vendor preset: enabled) > ???? Active: active (exited) since Mon 2022-10-03 18:23:53 UTC; 24h > ago > ?????? Docs: man:tmpfiles.d(5) > ???????????? man:systemd-tmpfiles(8) > ?? Main PID: 1460 (code=exited, status=0/SUCCESS) > ????? Tasks: 0 (limit: 8192) > ???? Memory: 0B > ???? CGroup: /system.slice/systemd-tmpfiles-setup.service > > Warning: journal has been rotated since unit was started, output may > be > incomplete. > > However, /var/lib/machines does not appear to be correct. 
On the > working cluster, this is mounted as an ext4 filesystem and has a > lost+found directory along with a directory for a defined instance. > > There is no mount listed on the broken cluster, and the directory is > empty. > > > 2. Don't you happen to run nova-compute on the same set of hosts > > where > > LXC containers are placed? As for example, in AIO setup we do > > manage > > /var/lib/machines/ mount with systemd var-lib-machines.mount. So if > > you happen to run nova-computes on controller host or AIO - this is > > another thing to check. > > $ sudo journalctl -u var-lib-machines.mount > -- Logs begin at Tue 2022-10-04 18:01:46 UTC, end at Tue 2022-10-04 > 18:52:53 UTC. -- > Oct 04 18:43:37 os-comp1 systemd[1]: Mounting Virtual Machine and > Container Storage (Compatibility)... > Oct 04 18:43:37 os-comp1 mount[1272300]: mount: /var/lib/machines: > wrong fs type, bad option, bad superblock on /dev/loop0, missing > codepage or helper program, or other error. > Oct 04 18:43:37 os-comp1 systemd[1]: var-lib-machines.mount: Mount > process exited, code=exited, status=32/n/a > Oct 04 18:43:37 os-comp1 systemd[1]: var-lib-machines.mount: Failed > with result 'exit-code'. > Oct 04 18:43:37 os-comp1 systemd[1]: Failed to mount Virtual Machine > and Container Storage (Compatibility). > > This appears to be the problem. It looks like /dev/loop0 is probably > supposed to reference /var/lib/machines.raw. I tried running fsck on > /dev/loop0, but it doesn't think there is a valid extX filesystem on > any of the superblocks. Maybe /dev/loop0 is not really pointing to > /var/lib/machines.raw? Not sure how to tell if that's the case. > > Maybe I should try to loopback this, or create a blank filesystem > image. > > > Okay, I'm not sure what happened here. The systemd unit mount file for var-lib-machines is different on the broken cluster than the working cluster. It talks about a btrfs system, but the /var/lib/machines.raw file is an ext4 filesystem, like the one on the working cluster. I copied the unit file from the working cluster to the broken cluster, and I could mount /var/lib/machines, get systemd-machined working, and create machines now. I have no idea what happened. I feel like there must have been a system update that changed (reverted from openstack-ansible?) something, but I'm just not sure. In any event, you helped me figure it out. Thanks. -- John Ratliff Systems Automation Engineer GlobalNOC @ Indiana University -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5598 bytes Desc: not available URL: From noonedeadpunk at gmail.com Tue Oct 4 20:45:01 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 4 Oct 2022 22:45:01 +0200 Subject: OpenStack Ansible Service troubleshooting In-Reply-To: <529d89ba8e2f63fd711701473e12f335726e6073.camel@globalnoc.iu.edu> References: <3041961ec183a2a9f3d037ff0ec9019aa5a5d0c6.camel@globalnoc.iu.edu> <529d89ba8e2f63fd711701473e12f335726e6073.camel@globalnoc.iu.edu> Message-ID: Oh, well, I do recall now that package update could brake systemd mount, as in prior releases we placed our own systemd unit file in place and now we just leverage systemd overrides functionality [1]. I think what you can do is find out what package does provide this mount file and mark it for hold. Or cherry-pick and apply mentioned change. [1] https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/834183 ??, 4 ???. 
2022 ?., 22:23 John Ratliff : > On Tue, 2022-10-04 at 14:56 -0400, John Ratliff wrote: > > On Tue, 2022-10-04 at 18:21 +0200, Dmitriy Rabotyagov wrote: > > > Hi John. > > > > > > Well, it seems you've made a bunch of operations that were not > > > required in the first place. However, I believe that at the end > > > you've > > > identified the problem correctly. systemd-machined service should > > > be > > > active and running on nova-compute hosts with kvm driver. > > > I'd suggest looking deeper at why this service systemd-machined > > > can't > > > be started. What does journalctl says about that? > > > > It's not very chatty, though I think your next question might answer > > the why. > > > > $ sudo journalctl -u systemd-machined > > -- Logs begin at Tue 2022-10-04 17:45:02 UTC, end at Tue 2022-10-04 > > 18:43:45 UTC. -- > > Oct 04 18:43:37 os-comp1 systemd[1]: Dependency failed for Virtual > > Machine and Container Registration Service. > > Oct 04 18:43:37 os-comp1 systemd[1]: systemd-machined.service: Job > > systemd-machined.service/start failed with result 'dependency'. > > > > > > > > As one of dependency systemd-machined requires to have > > > /var/lib/machines. And I do have 2 assumptions there: > > > 1. Was systemd-tmpfiles-setup.service activated? As we have seen > > > sometimes that upon node boot due to some race condition it was > > > not, > > > which resulted in all kind of weirdness > > > > It appears to be. The output looks very similar between the broken > > and > > working clusters. > > > > $ sudo systemctl status systemd-tmpfiles-setup > > ? systemd-tmpfiles-setup.service - Create Volatile Files and > > Directories > > Loaded: loaded (/lib/systemd/system/systemd-tmpfiles- > > setup.service; static; vendor preset: enabled) > > Active: active (exited) since Mon 2022-10-03 18:23:53 UTC; 24h > > ago > > Docs: man:tmpfiles.d(5) > > man:systemd-tmpfiles(8) > > Main PID: 1460 (code=exited, status=0/SUCCESS) > > Tasks: 0 (limit: 8192) > > Memory: 0B > > CGroup: /system.slice/systemd-tmpfiles-setup.service > > > > Warning: journal has been rotated since unit was started, output may > > be > > incomplete. > > > > However, /var/lib/machines does not appear to be correct. On the > > working cluster, this is mounted as an ext4 filesystem and has a > > lost+found directory along with a directory for a defined instance. > > > > There is no mount listed on the broken cluster, and the directory is > > empty. > > > > > 2. Don't you happen to run nova-compute on the same set of hosts > > > where > > > LXC containers are placed? As for example, in AIO setup we do > > > manage > > > /var/lib/machines/ mount with systemd var-lib-machines.mount. So if > > > you happen to run nova-computes on controller host or AIO - this is > > > another thing to check. > > > > $ sudo journalctl -u var-lib-machines.mount > > -- Logs begin at Tue 2022-10-04 18:01:46 UTC, end at Tue 2022-10-04 > > 18:52:53 UTC. -- > > Oct 04 18:43:37 os-comp1 systemd[1]: Mounting Virtual Machine and > > Container Storage (Compatibility)... > > Oct 04 18:43:37 os-comp1 mount[1272300]: mount: /var/lib/machines: > > wrong fs type, bad option, bad superblock on /dev/loop0, missing > > codepage or helper program, or other error. > > Oct 04 18:43:37 os-comp1 systemd[1]: var-lib-machines.mount: Mount > > process exited, code=exited, status=32/n/a > > Oct 04 18:43:37 os-comp1 systemd[1]: var-lib-machines.mount: Failed > > with result 'exit-code'. 
> > Oct 04 18:43:37 os-comp1 systemd[1]: Failed to mount Virtual Machine > > and Container Storage (Compatibility). > > > > This appears to be the problem. It looks like /dev/loop0 is probably > > supposed to reference /var/lib/machines.raw. I tried running fsck on > > /dev/loop0, but it doesn't think there is a valid extX filesystem on > > any of the superblocks. Maybe /dev/loop0 is not really pointing to > > /var/lib/machines.raw? Not sure how to tell if that's the case. > > > > Maybe I should try to loopback this, or create a blank filesystem > > image. > > > > > > > > Okay, I'm not sure what happened here. > > The systemd unit mount file for var-lib-machines is different on the > broken cluster than the working cluster. It talks about a btrfs system, > but the /var/lib/machines.raw file is an ext4 filesystem, like the one > on the working cluster. > > I copied the unit file from the working cluster to the broken cluster, > and I could mount /var/lib/machines, get systemd-machined working, and > create machines now. > > I have no idea what happened. I feel like there must have been a system > update that changed (reverted from openstack-ansible?) something, but > I'm just not sure. > > In any event, you helped me figure it out. Thanks. > > -- > John Ratliff > Systems Automation Engineer > GlobalNOC @ Indiana University > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew at andrewboring.com Tue Oct 4 22:28:23 2022 From: andrew at andrewboring.com (Andrew Boring) Date: Tue, 4 Oct 2022 18:28:23 -0400 Subject: [Keystone][Swift] Using policy.json to prohibit specific API operations by policy? Message-ID: <6DED637A-A6C0-4DB6-B1CE-00095A8069D0@andrewboring.com> Hi all, I'm looking to support a situation where one class of Keystone users in a given domain can create Swift containers (either within a single, dedicated project or within their own projects) but *cannot* change ACLs on those containers, while a second class of users *can* alter ACLs on their own containers. For example, User A is in the first class (defined by role) and can perform all CRUD operations, EXCEPT update pre-defined ACLmetadata on those containers. User B is in the second class and CAN update ACLs on their respecitive containers, like any other standard user. Something like this AWS policy condition ("Granting permissions to multiple accounts with added conditions") is directionally what I'm trying to achieve: https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html#example-bucket-policies-use-case-1 Keystone docs imply that I can create policy.json files for all services: "You can define actions for OpenStack service roles in the /etc/PROJECT/policy.yaml files. For example, define actions for Compute service roles in the /etc/nova/policy.yaml file." -https://docs.openstack.org/keystone/yoga/admin/cli-manage-projects-users-and-roles.html But I can't find any indication that Swift actually supports this. So, does Swift support the Oslo policy.json stuff, and if so, is it documented anywhere? Is it simply a "install oslo policy and add it to the pipeline in proxy-server.conf"? If not, is there another/preferred way to achieve the desired restrictions on Swift API operations by policy for a given Keystone domain? Thanks. 
-- Andrew Boring andrew at andrewboring.com From ppiyakk2 at printf.kr Wed Oct 5 02:14:21 2022 From: ppiyakk2 at printf.kr (Seongsoo Cho) Date: Wed, 5 Oct 2022 11:14:21 +0900 Subject: [I18n] Zanata status + call for volunteers on Weblate migration In-Reply-To: References: Message-ID: Hi Ian. Thanks for letting us know the situation. I'd like to volunteer on Weblate migration. 2022? 10? 5? (?) 01:51, Ian Y. Choi ?? ??: > Hi all, > > First, thank you all for contributing to OpenStack with globalization > by contributing translations, coordinating translations with > artifacts, and making sure that those translations are shipped with > releases to all over the world (we are calling it I18n - > Internationalization). > > While OpenStack I18n could be healthier with the tremendous help from > translators as well as many OpenStack upstream contributors & teams > such as infrastructure, release management, and documentation, the > current translation platform we are using on > https://translate.openstack.org relies on Zanata, an open-source > translation platform which upstream activities were stopped [1]. > Currently, there are several issues reported - most things have been > resolved while it repeats / worses as time goes: > > - OpenID authentication issues for new registration users on > translate.openstack.org [2] > - Translation job failure issues since Zanata client did not work with > a newer Java / job compatibility issues with Python versions [3] > - Missing translation jobs for Xena/Yoga stable versions on Zanata [4] > > Considering the situation, to solve the root cause of the situation, I > am calling for volunteers on Weblate migration. The detailed > activities would continue from what the previous I18n PTL already > investigated, identified, and documented to [5] [6]. There would be > diverse work items with developer perspective as well as translators > and operators. Hope that there are many volunteers to move forward. > > Meanwhile, I updated the "Translations & Priority" part on > https://translate.openstack.org homepage. > Note that stable versions are minimized to Horizon and dashboard > projects, and ping me if there would be lots of translation work > especially during R-3 to R-1. > > Looking forward to an enhanced open source translation platform > landing soon with OpenInfra. > > > With many thanks, > > /Ian > > [1] > https://lists.fedoraproject.org/archives/list/trans at lists.fedoraproject.org/thread/F2JZTYSK3L5JAZY6VSVGDGNNQ4ATG4HP/ > [2] > https://lists.openstack.org/pipermail/openstack-i18n/2022-February/003550.html > [3] https://review.opendev.org/c/openstack/project-config/+/850962 > [4] > https://lists.openstack.org/pipermail/openstack-discuss/2021-December/026441.html > [5] > https://blueprints.launchpad.net/openstack-i18n/+spec/renew-translation-platform > [6] https://etherpad.opendev.org/p/I18n-weblate-migration > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramishra at redhat.com Wed Oct 5 03:16:25 2022 From: ramishra at redhat.com (Rabi Mishra) Date: Wed, 5 Oct 2022 08:46:25 +0530 Subject: [TripleO] TripleO Antelope PTG Topics In-Reply-To: <20221004145332.jsu5vqckqxlsruyq@yuggoth.org> References: <20221004145332.jsu5vqckqxlsruyq@yuggoth.org> Message-ID: On Tue, Oct 4, 2022 at 8:25 PM Jeremy Stanley wrote: > On 2022-10-04 09:36:42 +0530 (+0530), Rabi Mishra wrote: > [...] > > I've moved the contents to the etherpad linked at the PTGBot site > [...] 
> > For future reference, you can also simply inform PTGBot that you > have a different Etherpad URL, and it will update the site to list > that one for you instead. > Thanks Jeremy. I somehow missed these details in PTGBot docs and thought using an auto-generated etherpad link would be better for consistency:) > -- > Jeremy Stanley > -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From manchandavishal143 at gmail.com Wed Oct 5 05:47:53 2022 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Wed, 5 Oct 2022 11:17:53 +0530 Subject: [horizon] Cancelling Today's Weekly meeting Message-ID: Hello Team, Since there are no agenda items [1] to discuss for today's horizon weekly meeting. Also, Today is a holiday for me. So let's cancel today's weekly meeting. Thanks & regards, Vishal Manchanda(irc:vishalmanchanda) [1] https://etherpad.opendev.org/p/horizon-release-priorities#L38 -------------- next part -------------- An HTML attachment was scrubbed... URL: From xek at redhat.com Wed Oct 5 09:18:42 2022 From: xek at redhat.com (Grzegorz Grasza) Date: Wed, 5 Oct 2022 11:18:42 +0200 Subject: [barbican] New team meeting time Message-ID: Hi Team, At our last meeting, it was proposed to move the meeting forward 1 hour (an hour early), from 1300 to 1200 UTC. [1] If there are no objections, I'll be making the change by the end of the week, before the next meeting takes place. / Greg [1] https://meetings.opendev.org/meetings/barbican/2022/barbican.2022-10-04-13.00.log.html#l-112 -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Wed Oct 5 09:38:22 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Wed, 5 Oct 2022 10:38:22 +0100 Subject: [kolla-ansible] How to recover from unfinished upgrade In-Reply-To: References: Message-ID: Hi, Any one??? Regards. Le mer. 28 sept. 2022 ? 16:51, wodel youchi a ?crit : > Hi, > > I am testing the upgrade from xena to yoga and I made a mistake in > globals.yml, I forgot to specify the correct name of gnocchi ceph pool, so > my deployment went wrong. > I interrupted the deployment Ctrl+C I corrected the mistake, then I > restarted the upgrade and it got stuck somewhere else, but I couldn't find > where, I interrupted then restarted the deployment again and it got stuck > at the same place. > My questions : > - Is there a way to rollback the upgrade in case of a problem, then start > over? > - What is the best way to restart a broken upgrade process? > > > Regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Wed Oct 5 09:57:51 2022 From: eblock at nde.ag (Eugen Block) Date: Wed, 05 Oct 2022 09:57:51 +0000 Subject: [kolla-ansible] How to recover from unfinished upgrade In-Reply-To: References: Message-ID: <20221005095751.Horde.HRGeD0jS1HpzORRQKau0omH@webmail.nde.ag> Hi, the only thing I can provide is this page: https://docs.openstack.org/operations-guide/ops-upgrades.html#rolling-back-a-failed-upgrade But I'm not sure if they apply to kolla-ansible which I'm not familiar with. >> I interrupted the deployment Ctrl+C I corrected the mistake, then I >> restarted the upgrade and it got stuck somewhere else, but I couldn't find >> where, I interrupted then restarted the deployment again and it got stuck >> at the same place. Wouldn't it be a better idea to let the upgrade fail, maybe changes are rolled back automatically? Where does it fail? 
Apparently you can reproduce it, so I'd say paste the output of the failure. Regards, Eugen Zitat von wodel youchi : > Hi, > > Any one??? > > Regards. > > Le mer. 28 sept. 2022 ? 16:51, wodel youchi a > ?crit : > >> Hi, >> >> I am testing the upgrade from xena to yoga and I made a mistake in >> globals.yml, I forgot to specify the correct name of gnocchi ceph pool, so >> my deployment went wrong. >> I interrupted the deployment Ctrl+C I corrected the mistake, then I >> restarted the upgrade and it got stuck somewhere else, but I couldn't find >> where, I interrupted then restarted the deployment again and it got stuck >> at the same place. >> My questions : >> - Is there a way to rollback the upgrade in case of a problem, then start >> over? >> - What is the best way to restart a broken upgrade process? >> >> >> Regards. >> From stephenfin at redhat.com Wed Oct 5 12:45:37 2022 From: stephenfin at redhat.com (Stephen Finucane) Date: Wed, 05 Oct 2022 13:45:37 +0100 Subject: [nova][cinder][glance][manila][masakari][tacker][oslo] Configurable soft-delete Message-ID: ? I'm planning on bringing this up in the nova rooms at the PTG in a few weeks, but I'm also raising it here since this potentially affects other service projects and I can't attend all of those room :) Many projects use the concept of "soft delete" in their database models. A soft deletable model typically has two additional columns, 'deleted' and 'deleted_at'. When deleting such a model, instead of actually deleting the database row (i.e. 'DELETE FROM table WHERE condition'), we set 'deleted' to 'True' and populate the 'deleted_at' column. This is helpful for auditing purposes (e.g. you can inspect all resources ever created, even after they've been "deleted") but bad for database performance (your tables can grow without bound). To work around the performance issues, most projects implement some kind of archive or purge command that will allow operators to periodically clean up these deleted resources. However, at least in nova, we've long since come to the conclusion that soft deleting isn't as useful as initially suspected and the need to run these commands is additional work for no benefit. We've moved toward not using it for all new models. With this said, it's going to be difficult to get away from soft-delete quickly. Not only are there database migrations involved, but operators will need to rework their tooling to adapt to a new, no-soft-delete world. As such, I'd like to propose a half-way measure of making soft-delete configurable. To do this, I'd like to add a new flag in oslo.db, '[database] enable_soft_delete'. When set to 'False' anyone using the 'SoftDeleteMixin' from oslo.db would see these models hard deleted rather than soft deleted when calling 'soft_delete'. This would avoid the need for operators to run the various project-specific purge tooling. The RFC patch for this is available for review [1]. I can also do this on a project-specific basis and have proposed a similar patch for nova [2], however, doing it in oslo.db means every project that uses 'SoftDeleteMixin' in their models will get this for free. Projects that don't (glance, cinder) can switch to using this mixin and also get it for free. As noted above, I intend to discuss this in the nova room at the PTG, but I'd be interested in people's thoughts ahead of time. Do you think this is a good idea? Should we proceed with it? Perhaps there are there better ways to do this? Let me know! 
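For anyone who hasn't looked at the mixin recently, here's a rough sketch of the pattern (illustrative only - the 'Widget' model is made up, not any project's real schema - and the config option shown is just the behaviour proposed in [1]):

from oslo_db.sqlalchemy import models
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Widget(models.SoftDeleteMixin, models.TimestampMixin,
             models.ModelBase, Base):
    __tablename__ = 'widgets'
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


# widget.soft_delete(session) today results in roughly:
#     UPDATE widgets SET deleted = id, deleted_at = NOW() WHERE id = ...
# while with the proposed '[database] enable_soft_delete = False' the same
# call would instead issue a plain:
#     DELETE FROM widgets WHERE id = ...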
Cheers, Stephen
[1] https://review.opendev.org/c/openstack/oslo.db/+/860407 [2] https://review.opendev.org/c/openstack/nova/+/860401
From senrique at redhat.com Wed Oct 5 12:55:10 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 5 Oct 2022 09:55:10 -0300 Subject: Bug Report - 10-05-2022 Message-ID:
This is a bug report from 09-28-2022 to 10-05-2022. Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting ----------------------------------------------------------------------------------------- Low - https://bugs.launchpad.net/cinder/+bug/1991634 "Toyou drivers report allocated_capacity_gb." Fix proposed to master. - https://bugs.launchpad.net/cinder/+bug/1991217 "[docs] Unsupported option in docs for cinder-manage quota check/sync." Fix proposed to master. - https://bugs.launchpad.net/cinder/+bug/1991154 "[docs] Service tokens documentation is misleading." Fix proposed to master. Cheers, -- Sofía Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat -------------- next part -------------- An HTML attachment was scrubbed... URL: From clay.gerrard at gmail.com Wed Oct 5 13:20:40 2022 From: clay.gerrard at gmail.com (Clay Gerrard) Date: Wed, 5 Oct 2022 08:20:40 -0500 Subject: [Swift][Ussuri] Erasure Coding Quarantines In-Reply-To: References: <20220930165217.2901f9cf@niphredil.zaitcev.lan> Message-ID:
On Wed, Oct 5, 2022 at 7:58 AM Reid Guyett wrote: > the env var only works in 1.6.2 but 20.04 ships with 1.6.1. > Oh shoot, yeah I have no idea what version is packaged downstream. Maybe we can get Thomas to backport the jammy package https://packages.ubuntu.com/jammy/liberasurecode1 to focal https://packages.ubuntu.com/focal/liberasurecode1 -- Clay Gerrard -------------- next part -------------- An HTML attachment was scrubbed... URL: From rguyett at datto.com Wed Oct 5 12:58:39 2022 From: rguyett at datto.com (Reid Guyett) Date: Wed, 5 Oct 2022 12:58:39 +0000 Subject: [Swift][Ussuri] Erasure Coding Quarantines In-Reply-To: References: <20220930165217.2901f9cf@niphredil.zaitcev.lan> Message-ID:
I'm not sure how to create a double quote in Outlook web app... We are going to try to create a new liberasurecode package 1.6.2 for 20.04 so we can set the environment variable to write legacy CRC headers until all the nodes in the cluster can be upgraded. I'm not sure if you need a new package, I think you have to set the env at runtime - but there's also a swift config option that will force the env to get set that you can turn off after full upgrade. In the IRC response, the env var only works in 1.6.2 but 20.04 ships with 1.6.1. The application setting you mentioned is in Swift 2.27 and we are still in Ussuri (2.25.2) but still requires the compatible liberasurecode1 package. I'm not sure how to go about requesting this version to be available in the Focal repos. It seems like it should belong there since upgrading from 18.04 to 20.04 is a contributor to this problem.
________________________________ From: Clay Gerrard Sent: Tuesday, October 4, 2022 09:28 To: Reid Guyett Cc: Pete Zaitcev ; openstack-discuss at lists.openstack.org ; Matthew Grinnell Subject: Re: [Swift][Ussuri] Erasure Coding Quarantines
On Mon, Oct 3, 2022 at 3:37 PM Reid Guyett < rguyett at datto.com> wrote: Thanks for the follow-up. [...] From there the files were downloadable again. 
Nice work! We are going to try to create a new liberasurecode package 1.6.2 for 20.04 so we can set the environment variable to write legacy CRC headers until all the nodes in the cluster can be upgraded. I'm not sure if you need a new package, I think you have to set the env at runtime - but there's also a swift config option that will force the env to get set that you can turn off after full upgrade.
This is why we have testing environments. This is why *competent* deployers and operators have testing environments - and it's the only thing that makes the terrible terrible reality of building and releasing software actually a net good. Couldn't do it without you; go FOSS!
-- Clay Gerrard -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openinfra.dev Wed Oct 5 13:58:59 2022 From: thierry at openinfra.dev (Thierry Carrez) Date: Wed, 5 Oct 2022 15:58:59 +0200 Subject: [release][ironic] Release desired for Ironic bugfix/N branches In-Reply-To: <7334a347-452b-40d6-9832-2c4c019e7370@app.fastmail.com> References: <7334a347-452b-40d6-9832-2c4c019e7370@app.fastmail.com> Message-ID: <83e88d25-bc8a-76a3-f9cb-9b5339205721@openinfra.dev>
Clark Boylan wrote: > [...] > If the release team ends up doing the work, it would probably be a good idea to very explicitly list the branch, commit sha1, and version number for each of the needed releases. This way the release team doesn't have to guess if they are getting it correct when they make and push those tags.
Yes please.
> Separately, it seems like some of the intention here is to ensure that users of bugfix branches don't end up with stale installations. Updating the release tooling to handle releases off of these branches or delegating access to the Ironic team seem like an important piece of making that happen. Otherwise the overhead for doing this will be large enough that it is unlikely to happen often enough. Unfortunately, I don't know what is currently missing in the tooling to make that possible.
I'd say it's an unknown and the current release team members may not have bandwidth to explore what releasing on bugfix branches using a patch to openstack/releases could look like. Avoiding collisions between "normal" stable branch point updates and those bugfix branch point releases sounds tricky at best. We should try one manually and see how it goes first :)
-- Thierry Carrez From jay at gr-oss.io Wed Oct 5 14:42:19 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Wed, 5 Oct 2022 07:42:19 -0700 Subject: [release][ironic] Release desired for Ironic bugfix/N branches In-Reply-To: <83e88d25-bc8a-76a3-f9cb-9b5339205721@openinfra.dev> References: <7334a347-452b-40d6-9832-2c4c019e7370@app.fastmail.com> <83e88d25-bc8a-76a3-f9cb-9b5339205721@openinfra.dev> Message-ID:
On Wed, Oct 5, 2022 at 7:18 AM Thierry Carrez wrote: > Clark Boylan wrote: > > [...] 
> > If the release team ends up doing the work, it would probably be a good > idea to very explicitly list the branch, commit sha1, and version number > for each of the needed releases. This way the release team doesn't have to > guess if they are getting it correct when they make and push those tags. > > Yes please. > > I'm cleaning up some CI right now; I'll make sure we get this information to the list soon so we can give it a shot :). > > Separately, it seems like some of the intention here is to ensure that > users of bugfix branches don't end up with stale installations. Updating > the release tooling to handle releases off of these branches or delegating > access to the Ironic team seem like an important piece of making that > happen. Otherwise the overhead for doing this will be large enough that it > is unlikely to happen often enough. Unfortunately, I don't know what is > currently missing in the tooling to make that possible. > > I'd say it's an unknown and the current release team members may not > have bandwidth to explore what releasing on bugfix branches using a > patch to openstack/releases could look like. Avoiding collisions between > "normal" stable branch point updates and those bugfix branch point > releases sounds tricky at best. > > I'm hoping this won't be an issue. Ironic policy (and it seems to be true in practice), says any release from master (including bugfix/x branches) must bump either the major or minor release number ( https://specs.openstack.org/openstack/ironic-specs/specs/15.1/new-release-model.html#releasing ). Thanks Thierry and Clark, I'll get you the information you all need to move forward soon! -Jay Faulkner -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Wed Oct 5 15:00:28 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Wed, 5 Oct 2022 17:00:28 +0200 Subject: OpenStack Zed is officially released! Message-ID: <53d79ec4-8a16-08eb-ce32-f0ec773706ee@est.tech> Hello OpenStack community, The official OpenStack Zed release announcement has been sent out: http://lists.openstack.org/pipermail/openstack-announce/2022-October/002061.html Thanks to all who were a part of the Zed development cycle! This marks the official opening of the openstack/releases repository for 2023.1 Antelope releases, and freezes are now lifted. stable/zed is now a fully normal stable branch, and the normal stable policy applies from now on. Thanks, El?d Ill?s and the Release Management team From allison at openinfra.dev Wed Oct 5 15:12:52 2022 From: allison at openinfra.dev (Allison Price) Date: Wed, 5 Oct 2022 10:12:52 -0500 Subject: OpenStack Zed is officially released! In-Reply-To: <53d79ec4-8a16-08eb-ce32-f0ec773706ee@est.tech> References: <53d79ec4-8a16-08eb-ce32-f0ec773706ee@est.tech> Message-ID: Congratulations to all of the contributors! > On Oct 5, 2022, at 10:00 AM, El?d Ill?s wrote: > > Hello OpenStack community, > > The official OpenStack Zed release announcement has been sent out: > > http://lists.openstack.org/pipermail/openstack-announce/2022-October/002061.html > > Thanks to all who were a part of the Zed development cycle! > > This marks the official opening of the openstack/releases repository for > 2023.1 Antelope releases, and freezes are now lifted. stable/zed is now a fully normal stable branch, > and the normal stable policy applies from now on. 
> > Thanks, > > El?d Ill?s and the Release Management team From fungi at yuggoth.org Wed Oct 5 16:06:41 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Oct 2022 16:06:41 +0000 Subject: [dev][infra][tact-sig] Updating Zuul's default-ansible-version to 6 Message-ID: <20221005160640.buu6aevydtkgs4ly@yuggoth.org> Just a heads up for folks not following the OpenDev Collaboratory's service-announce mailing list... now that Zed is officially released, we'll be increasing the default Ansible version for Zuul jobs from 5 to 6 in preparation for Zuul to drop Ansible 5 support in coming weeks. See the full announcement here: https://lists.opendev.org/pipermail/service-announce/2022-October/000046.html -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gmann at ghanshyammann.com Wed Oct 5 17:46:13 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 05 Oct 2022 10:46:13 -0700 Subject: OpenStack Zed is officially released! In-Reply-To: <53d79ec4-8a16-08eb-ce32-f0ec773706ee@est.tech> References: <53d79ec4-8a16-08eb-ce32-f0ec773706ee@est.tech> Message-ID: <183a9415121.cde7b3d3208615.1637975538880544213@ghanshyammann.com> Congratulation to all the contributors for their 6 months of hard work and thanks to the release team for awesome work to continue doing the on-time release. -gmann ---- On Wed, 05 Oct 2022 08:00:28 -0700 El?d Ill?s wrote --- > Hello OpenStack community, > > The official OpenStack Zed release announcement has been sent out: > > http://lists.openstack.org/pipermail/openstack-announce/2022-October/002061.html > > Thanks to all who were a part of the Zed development cycle! > > This marks the official opening of the openstack/releases repository for > 2023.1 Antelope releases, and freezes are now lifted. stable/zed is now > a fully normal stable branch, > and the normal stable policy applies from now on. > > Thanks, > > El?d Ill?s and the Release Management team > From amy at demarco.com Wed Oct 5 17:51:42 2022 From: amy at demarco.com (Amy) Date: Wed, 5 Oct 2022 12:51:42 -0500 Subject: OpenStack Zed is officially released! In-Reply-To: <183a9415121.cde7b3d3208615.1637975538880544213@ghanshyammann.com> References: <183a9415121.cde7b3d3208615.1637975538880544213@ghanshyammann.com> Message-ID: Congrats Everyone!! Great job! Amy > On Oct 5, 2022, at 12:50 PM, Ghanshyam Mann wrote: > > ?Congratulation to all the contributors for their 6 months of hard work and thanks to the release team for awesome > work to continue doing the on-time release. > > -gmann > > ---- On Wed, 05 Oct 2022 08:00:28 -0700 El?d Ill?s wrote --- >> Hello OpenStack community, >> >> The official OpenStack Zed release announcement has been sent out: >> >> http://lists.openstack.org/pipermail/openstack-announce/2022-October/002061.html >> >> Thanks to all who were a part of the Zed development cycle! >> >> This marks the official opening of the openstack/releases repository for >> 2023.1 Antelope releases, and freezes are now lifted. stable/zed is now >> a fully normal stable branch, >> and the normal stable policy applies from now on. >> >> Thanks, >> >> El?d Ill?s and the Release Management team >> > From rdhasman at redhat.com Wed Oct 5 19:02:05 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Thu, 6 Oct 2022 00:32:05 +0530 Subject: OpenStack Zed is officially released! 
In-Reply-To: References: <183a9415121.cde7b3d3208615.1637975538880544213@ghanshyammann.com> Message-ID: Congratulations everyone! On Wed, Oct 5, 2022 at 11:26 PM Amy wrote: > Congrats Everyone!! Great job! > > Amy > > > On Oct 5, 2022, at 12:50 PM, Ghanshyam Mann > wrote: > > > > ?Congratulation to all the contributors for their 6 months of hard work > and thanks to the release team for awesome > > work to continue doing the on-time release. > > > > -gmann > > > > ---- On Wed, 05 Oct 2022 08:00:28 -0700 El?d Ill?s wrote --- > >> Hello OpenStack community, > >> > >> The official OpenStack Zed release announcement has been sent out: > >> > >> > http://lists.openstack.org/pipermail/openstack-announce/2022-October/002061.html > >> > >> Thanks to all who were a part of the Zed development cycle! > >> > >> This marks the official opening of the openstack/releases repository for > >> 2023.1 Antelope releases, and freezes are now lifted. stable/zed is now > >> a fully normal stable branch, > >> and the normal stable policy applies from now on. > >> > >> Thanks, > >> > >> El?d Ill?s and the Release Management team > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allison at openinfra.dev Wed Oct 5 20:37:29 2022 From: allison at openinfra.dev (Allison Price) Date: Wed, 5 Oct 2022 15:37:29 -0500 Subject: [zed] OpenInfra Live - October 6 at 1400 UTC Message-ID: <11530FD5-78FF-492E-B787-8156381747C7@openinfra.dev> Hi everyone, This week?s OpenInfra Live episode is brought to you by the OpenStack community who just delivered its 26th on-time release today! Join us to learn about the latest from community leaders about what was delivered in Zed and what we can expect in Antelope, OpenStack's 27th release targeting early 2023. Episode: OpenStack Zed: The End of the Alphabet, The Beginning of a New Era Date and time: October 6 at 1400 UTC You can watch us live on: YouTube: https://www.youtube.com/watch?v=MSbB3L9_MeY LinkedIn: https://www.linkedin.com/video/event/urn:li:ugcPost:6982723169144950786/ Facebook: https://www.facebook.com/events/390328576642133 WeChat: recording will be posted on OpenStack WeChat after the live stream Speakers: Kendall Nelson, OpenInfra Foundation Carlos Silva, Manila Jay Faulkner, Ironic Sylvain Bauza, Nova Lajos Katona, Neutron Wu Wenxiang, Skyline Martin Kopec, Interop Working Group Liye Pang, Venus Have an idea for a future episode? Share it now at ideas.openinfra.live . Thanks, Allison -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuta.kazato.nw at hco.ntt.co.jp Thu Oct 6 01:27:28 2022 From: yuta.kazato.nw at hco.ntt.co.jp (Yuta Kazato) Date: Thu, 06 Oct 2022 10:27:28 +0900 Subject: [tacker] Critical bug report and backport the fix Message-ID: Yasufumi, Ueha, Tacker and Release management team Hi, thanks for your agreements and supports it this issue. We could backport fix patches to stable/zed. I'm glad to release Openstack Zed:) https://releases.openstack.org/zed/ See you next Antelope vPTG! Yuta > Hi Yuta and Yasufumi, > > +1 > > And we have a patch for another critical bug related with pm interface. > The bug report [1] and the fix patch [2] have been already posted. > This patch also requires a backport to stable/zed. 
> > [1]https://bugs.launchpad.net/tacker/+bug/1990828 > [2]https://review.opendev.org/c/openstack/tacker/+/859377 > > Best Regards, > Ueha > > -----Original Message----- > From: Yasufumi Ogawa > Sent: Tuesday, September 27, 2022 10:42 AM > To: openstack-discuss at lists.openstack.org > Subject: Re: [tacker] Critical bug report and backport the fix > > Hi Yuta, > > On 2022/09/26 16:57, Yuta Kazato wrote: > > Hi tacker team, > > > > As you know, new bug report #1990793 [1] related to K8s resource name > > and v2 API is submitted by Masaki. > > The bug will appear if K8s resource name contains `-`. > > > > I think this is a critical issue because users often set resource > > names that contain `-`. > Agree. > > > Fortunately, the fix patch [2] has already been submitted. > > > > I suggest that we should backport the fix patch to the stable/zed > > branch before Zed release. > > What do you think? > We should fix the issue in stable/zed, or this k8s support doesn't work for many usecases. > > Yasufumi > > > > [1] https://bugs.launchpad.net/tacker/+bug/1990793 > > [2] https://review.opendev.org/c/openstack/tacker/+/859206 > > > > Yuta > > -- Yuta Kazato (?? ??) NTT Network Innovation Center. tel: +81-422-59-6754 mail:yuta.kazato.nw at hco.ntt.co.jp -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Oct 6 01:41:26 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 05 Oct 2022 18:41:26 -0700 Subject: [Keystone][Swift] Using policy.json to prohibit specific API operations by policy? In-Reply-To: <6DED637A-A6C0-4DB6-B1CE-00095A8069D0@andrewboring.com> References: <6DED637A-A6C0-4DB6-B1CE-00095A8069D0@andrewboring.com> Message-ID: <183aaf46352.eeddc639217531.3862609893768540939@ghanshyammann.com> ---- On Tue, 04 Oct 2022 15:28:23 -0700 Andrew Boring wrote --- > Hi all, > > > I'm looking to support a situation where one class of Keystone users in a given domain can create Swift containers (either within a single, dedicated project or within their own projects) but *cannot* change ACLs on those containers, while a second class of users *can* alter ACLs on their own containers. > > For example, User A is in the first class (defined by role) and can perform all CRUD operations, EXCEPT update pre-defined ACLmetadata on those containers. User B is in the second class and CAN update ACLs on their respecitive containers, like any other standard user. > > Something like this AWS policy condition ("Granting permissions to multiple accounts with added conditions") is directionally what I'm trying to achieve: > https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html#example-bucket-policies-use-case-1 > > > Keystone docs imply that I can create policy.json files for all services: > > "You can define actions for OpenStack service roles in the /etc/PROJECT/policy.yaml files. For example, define actions for Compute service roles in the /etc/nova/policy.yaml file." > -https://docs.openstack.org/keystone/yoga/admin/cli-manage-projects-users-and-roles.html > > But I can't find any indication that Swift actually supports this. > > So, does Swift support the Oslo policy.json stuff, and if so, is it documented anywhere? Is it simply a "install oslo policy and add it to the pipeline in proxy-server.conf"? Swift does not use the oslo.policy or policy.json file mechanism to control the access on their APIs. 
I might be able to provide detail about their ACL mechanism but below doc explain some of it: - https://github.com/openstack/swift/blob/3ad39cd0b83a7f70d6c559c7b0e68a2e625be179/doc/source/overview_acl.rst -gmann > > If not, is there another/preferred way to achieve the desired restrictions on Swift API operations by policy for a given Keystone domain? > > Thanks. > > -- > Andrew Boring > andrew at andrewboring.com > > > > > > > From park0kyung0won at dgist.ac.kr Thu Oct 6 02:00:19 2022 From: park0kyung0won at dgist.ac.kr (=?UTF-8?B?67CV6rK97JuQ?=) Date: Thu, 6 Oct 2022 11:00:19 +0900 (KST) Subject: [metadata agent & keystone] Remote metadata server experienced an internal error? Message-ID: <2610420.159983.1665021619542.JavaMail.root@mailwas2> An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Oct 6 05:14:56 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 05 Oct 2022 22:14:56 -0700 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Oct 6 at 1500 UTC In-Reply-To: <1839f6e6053.e2f4eedf40797.2956987166661707844@ghanshyammann.com> References: <1839f6e6053.e2f4eedf40797.2956987166661707844@ghanshyammann.com> Message-ID: <183abb7d9b7.1182af98b219360.2974238188609745154@ghanshyammann.com> Hello Everyone, Below is the agenda for tomorrow's TC meeting scheduled at 1500 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * Roll call * Follow up on past action items * Gate health check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary ** Zuul config error *** https://etherpad.opendev.org/p/zuul-config-error-openstack * Zed cycle tracker checks ** https://etherpad.opendev.org/p/tc-zed-tracker * 2023.1 cycle PTG Planning ** TC + Leaders interaction sessions *** https://etherpad.opendev.org/p/tc-leaders-interaction-2023-1 ** TC PTG etherpad *** https://etherpad.opendev.org/p/tc-2023-1-ptg ** Schedule 'operator hours' *** https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030301.html * 2023.1 cycle Technical Election & Leaderless projects ** Leaderless projects *** https://etherpad.opendev.org/p/2023.1-leaderless * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open -gmann ---- On Mon, 03 Oct 2022 12:59:14 -0700 Ghanshyam Mann wrote --- > Hello Everyone, > > The technical Committee's next weekly meeting is scheduled for 2022 Oct 6, at 1500 UTC. > > If you would like to add topics for discussion, please add them to the below wiki page by > Wednesday, Oct 5 at 2100 UTC. > > https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting > > -gmann > > > From noonedeadpunk at gmail.com Thu Oct 6 05:24:55 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 6 Oct 2022 07:24:55 +0200 Subject: [nova][cinder][glance][manila][masakari][tacker][oslo] Configurable soft-delete In-Reply-To: References: Message-ID: Not having soft delete in the database is really quite bad for operators and it's not about tooling, but it's about audit purposes. If take nova as example, this also means that once server is deleted, event log will be also wiped with no way to see who and when has performed delete action. And that is really used feature, as we got requests like "why my VM has disappeared" at very least one a week. For other services having deleted_at at least points to the datetime where to search in the logs. At the same time I don't see any issue in having soft delete. 
It's just a matter of one systemd-timer, and too concerned about performance can set it to 1 day, thus almost no impact on db performance. So from operator perspective I can say this is very valuable feature and I personally do struggle regularly with neutron services where it's absent. And I would hate this to disappear at all, as it would be really a nightmare. ??, 5 ???. 2022 ?., 14:48 Stephen Finucane : > ? > > I'm planning on bringing this up in the nova rooms at the PTG in a few > weeks, > but I'm also raising it here since this potentially affects other service > projects and I can't attend all of those room :) > > Many projects use the concept of "soft delete" in their database models. A > soft > deletable model typically has two additional columns, 'deleted' and > 'deleted_at'. When deleting such a model, instead of actually deleting the > database row (i.e. 'DELETE FROM table WHERE condition'), we set 'deleted' > to > 'True' and populate the 'deleted_at' column. This is helpful for auditing > purposes (e.g. you can inspect all resources ever created, even after > they've > been "deleted") but bad for database performance (your tables can grow > without > bound). To work around the performance issues, most projects implement > some kind > of archive or purge command that will allow operators to periodically > clean up > these deleted resources. However, at least in nova, we've long since come > to the > conclusion that soft deleting isn't as useful as initially suspected and > the > need to run these commands is additional work for no benefit. We've moved > toward > not using it for all new models. > > With this said, it's going to be difficult to get away from soft-delete > quickly. > Not only are there database migrations involved, but operators will need to > rework their tooling to adapt to a new, no-soft-delete world. As such, I'd > like > to propose a half-way measure of making soft-delete configurable. To do > this, > I'd like to add a new flag in oslo.db, '[database] enable_soft_delete'. > When set > to 'False' anyone using the 'SoftDeleteMixin' from oslo.db would see these > models hard deleted rather than soft deleted when calling 'soft_delete'. > This > would avoid the need for operators to run the various project-specific > purge > tooling. The RFC patch for this is available for review [1]. I can also do > this > on a project-specific basis and have proposed a similar patch for nova [2], > however, doing it in oslo.db means every project that uses > 'SoftDeleteMixin' in > their models will get this for free. Projects that don't (glance, cinder) > can > switch to using this mixin and also get it for free. > > As noted above, I intend to discuss this in the nova room at the PTG, but > I'd be > interested in people's thoughts ahead of time. Do you think this is a good > idea? > Should we proceed with it? Perhaps there are there better ways to do this? > Let > me know! > > Cheers, > Stephen > > [1] https://review.opendev.org/c/openstack/oslo.db/+/860407 > [2] https://review.opendev.org/c/openstack/nova/+/860401 > > > -------------- next part -------------- An HTML attachment was scrubbed... 
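For readers skimming the thread, a minimal sketch of the soft-delete pattern Stephen describes, using oslo.db's SoftDeleteMixin; the Widget model and table name are made up for illustration:

    from oslo_db.sqlalchemy import models
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Widget(Base, models.ModelBase, models.SoftDeleteMixin):
        # SoftDeleteMixin contributes the 'deleted' and 'deleted_at' columns
        __tablename__ = 'widgets'
        id = Column(Integer, primary_key=True)
        name = Column(String(255))

    # "Deleting" only marks the row:
    #     widget.soft_delete(session)  # sets deleted=<row id> and deleted_at=utcnow()
    # so ordinary queries must filter (e.g. .filter_by(deleted=0)) and the rows
    # stay in the table until an archive/purge job removes them.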
URL: From artem.goncharov at gmail.com Thu Oct 6 05:48:19 2022 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Thu, 6 Oct 2022 07:48:19 +0200 Subject: [nova][cinder][glance][manila][masakari][tacker][oslo] Configurable soft-delete In-Reply-To: References: Message-ID: Hey, If this is there mostly for audit purposes then I guess a more efficient solution is to introduce an audit table which will have no performance impact on the "current state" of service. Audit records are also never updated means performance here is relatively straight forward. Audit may become a "feature" with enable switch. This should be a better solution rather then letting db permanently grow and forcing admins to constantly "fight" against it. This also gives a much cleaner audit experience. Actually this is also not a new approach and is being followed in many places (i.e. auditd) Regards, Artem ---- typed from mobile, auto-correct typos assumed ---- On Thu, Oct 6, 2022, 07:27 Dmitriy Rabotyagov wrote: > Not having soft delete in the database is really quite bad for operators > and it's not about tooling, but it's about audit purposes. > > If take nova as example, this also means that once server is deleted, > event log will be also wiped with no way to see who and when has performed > delete action. And that is really used feature, as we got requests like > "why my VM has disappeared" at very least one a week. > > For other services having deleted_at at least points to the datetime where > to search in the logs. > > At the same time I don't see any issue in having soft delete. It's just a > matter of one systemd-timer, and too concerned about performance can set it > to 1 day, thus almost no impact on db performance. > > So from operator perspective I can say this is very valuable feature and I > personally do struggle regularly with neutron services where it's absent. > And I would hate this to disappear at all, as it would be really a > nightmare. > > ??, 5 ???. 2022 ?., 14:48 Stephen Finucane : > >> ? >> >> I'm planning on bringing this up in the nova rooms at the PTG in a few >> weeks, >> but I'm also raising it here since this potentially affects other service >> projects and I can't attend all of those room :) >> >> Many projects use the concept of "soft delete" in their database models. >> A soft >> deletable model typically has two additional columns, 'deleted' and >> 'deleted_at'. When deleting such a model, instead of actually deleting the >> database row (i.e. 'DELETE FROM table WHERE condition'), we set 'deleted' >> to >> 'True' and populate the 'deleted_at' column. This is helpful for auditing >> purposes (e.g. you can inspect all resources ever created, even after >> they've >> been "deleted") but bad for database performance (your tables can grow >> without >> bound). To work around the performance issues, most projects implement >> some kind >> of archive or purge command that will allow operators to periodically >> clean up >> these deleted resources. However, at least in nova, we've long since come >> to the >> conclusion that soft deleting isn't as useful as initially suspected and >> the >> need to run these commands is additional work for no benefit. We've moved >> toward >> not using it for all new models. >> >> With this said, it's going to be difficult to get away from soft-delete >> quickly. >> Not only are there database migrations involved, but operators will need >> to >> rework their tooling to adapt to a new, no-soft-delete world. 
As such, >> I'd like >> to propose a half-way measure of making soft-delete configurable. To do >> this, >> I'd like to add a new flag in oslo.db, '[database] enable_soft_delete'. >> When set >> to 'False' anyone using the 'SoftDeleteMixin' from oslo.db would see these >> models hard deleted rather than soft deleted when calling 'soft_delete'. >> This >> would avoid the need for operators to run the various project-specific >> purge >> tooling. The RFC patch for this is available for review [1]. I can also >> do this >> on a project-specific basis and have proposed a similar patch for nova >> [2], >> however, doing it in oslo.db means every project that uses >> 'SoftDeleteMixin' in >> their models will get this for free. Projects that don't (glance, cinder) >> can >> switch to using this mixin and also get it for free. >> >> As noted above, I intend to discuss this in the nova room at the PTG, but >> I'd be >> interested in people's thoughts ahead of time. Do you think this is a >> good idea? >> Should we proceed with it? Perhaps there are there better ways to do >> this? Let >> me know! >> >> Cheers, >> Stephen >> >> [1] https://review.opendev.org/c/openstack/oslo.db/+/860407 >> [2] https://review.opendev.org/c/openstack/nova/+/860401 >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Oct 6 07:08:19 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 6 Oct 2022 09:08:19 +0200 Subject: [nova][cinder][glance][manila][masakari][tacker][oslo] Configurable soft-delete In-Reply-To: References: Message-ID: Oh, yes, this is a good alternative. Actually I was thinking about smth like "openstack event list" (as only nova does have that) for quite a while, but without having the resources to lead and help out in implementation across projects I didn't dare to raise this topic. But it's probably high time to start discussing it at the very least and make a proposal as a community goal based on the outcome of these discussions. ??, 6 ???. 2022 ?., 07:48 Artem Goncharov : > Hey, > > If this is there mostly for audit purposes then I guess a more efficient > solution is to introduce an audit table which will have no performance > impact on the "current state" of service. Audit records are also never > updated means performance here is relatively straight forward. Audit may > become a "feature" with enable switch. > > This should be a better solution rather then letting db permanently grow > and forcing admins to constantly "fight" against it. This also gives a much > cleaner audit experience. Actually this is also not a new approach and is > being followed in many places (i.e. auditd) > > Regards, > Artem > > ---- > typed from mobile, auto-correct typos assumed > ---- > > On Thu, Oct 6, 2022, 07:27 Dmitriy Rabotyagov > wrote: > >> Not having soft delete in the database is really quite bad for operators >> and it's not about tooling, but it's about audit purposes. >> >> If take nova as example, this also means that once server is deleted, >> event log will be also wiped with no way to see who and when has performed >> delete action. And that is really used feature, as we got requests like >> "why my VM has disappeared" at very least one a week. >> >> For other services having deleted_at at least points to the datetime >> where to search in the logs. >> >> At the same time I don't see any issue in having soft delete. 
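A rough sketch of the separate audit-table idea Artem describes above (all names here are hypothetical): rows are append-only, so the tables serving API requests stay small while the history survives hard deletes:

    from datetime import datetime, timezone
    from sqlalchemy import Column, DateTime, Integer, String, Text
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class ResourceAuditLog(Base):
        # one row per lifecycle event; inserted once, never updated
        __tablename__ = 'resource_audit_log'
        id = Column(Integer, primary_key=True)
        resource_uuid = Column(String(36), index=True)
        action = Column(String(32))        # e.g. 'create', 'delete'
        actor = Column(String(255))        # user/project that requested the action
        recorded_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
        detail = Column(Text, nullable=True)

An audit table like this can be rotated or archived on its own schedule without touching the live resource tables.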
It's just a >> matter of one systemd-timer, and too concerned about performance can set it >> to 1 day, thus almost no impact on db performance. >> >> So from operator perspective I can say this is very valuable feature and >> I personally do struggle regularly with neutron services where it's absent. >> And I would hate this to disappear at all, as it would be really a >> nightmare. >> >> ??, 5 ???. 2022 ?., 14:48 Stephen Finucane : >> >>> ? >>> >>> I'm planning on bringing this up in the nova rooms at the PTG in a few >>> weeks, >>> but I'm also raising it here since this potentially affects other service >>> projects and I can't attend all of those room :) >>> >>> Many projects use the concept of "soft delete" in their database models. >>> A soft >>> deletable model typically has two additional columns, 'deleted' and >>> 'deleted_at'. When deleting such a model, instead of actually deleting >>> the >>> database row (i.e. 'DELETE FROM table WHERE condition'), we set >>> 'deleted' to >>> 'True' and populate the 'deleted_at' column. This is helpful for auditing >>> purposes (e.g. you can inspect all resources ever created, even after >>> they've >>> been "deleted") but bad for database performance (your tables can grow >>> without >>> bound). To work around the performance issues, most projects implement >>> some kind >>> of archive or purge command that will allow operators to periodically >>> clean up >>> these deleted resources. However, at least in nova, we've long since >>> come to the >>> conclusion that soft deleting isn't as useful as initially suspected and >>> the >>> need to run these commands is additional work for no benefit. We've >>> moved toward >>> not using it for all new models. >>> >>> With this said, it's going to be difficult to get away from soft-delete >>> quickly. >>> Not only are there database migrations involved, but operators will need >>> to >>> rework their tooling to adapt to a new, no-soft-delete world. As such, >>> I'd like >>> to propose a half-way measure of making soft-delete configurable. To do >>> this, >>> I'd like to add a new flag in oslo.db, '[database] enable_soft_delete'. >>> When set >>> to 'False' anyone using the 'SoftDeleteMixin' from oslo.db would see >>> these >>> models hard deleted rather than soft deleted when calling 'soft_delete'. >>> This >>> would avoid the need for operators to run the various project-specific >>> purge >>> tooling. The RFC patch for this is available for review [1]. I can also >>> do this >>> on a project-specific basis and have proposed a similar patch for nova >>> [2], >>> however, doing it in oslo.db means every project that uses >>> 'SoftDeleteMixin' in >>> their models will get this for free. Projects that don't (glance, >>> cinder) can >>> switch to using this mixin and also get it for free. >>> >>> As noted above, I intend to discuss this in the nova room at the PTG, >>> but I'd be >>> interested in people's thoughts ahead of time. Do you think this is a >>> good idea? >>> Should we proceed with it? Perhaps there are there better ways to do >>> this? Let >>> me know! >>> >>> Cheers, >>> Stephen >>> >>> [1] https://review.opendev.org/c/openstack/oslo.db/+/860407 >>> [2] https://review.opendev.org/c/openstack/nova/+/860401 >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sbauza at redhat.com Thu Oct 6 07:38:04 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 6 Oct 2022 09:38:04 +0200 Subject: [nova][cinder][glance][manila][masakari][tacker][oslo] Configurable soft-delete In-Reply-To: References: Message-ID: Le jeu. 6 oct. 2022 ? 07:54, Artem Goncharov a ?crit : > Hey, > > If this is there mostly for audit purposes then I guess a more efficient > solution is to introduce an audit table which will have no performance > impact on the "current state" of service. Audit records are also never > updated means performance here is relatively straight forward. Audit may > become a "feature" with enable switch. > > This should be a better solution rather then letting db permanently grow > and forcing admins to constantly "fight" against it. This also gives a much > cleaner audit experience. Actually this is also not a new approach and is > being followed in many places (i.e. auditd) > > This isn't true. Any operator that sees the Nova DBs [1] growing can use two commands (or just cron them) : nova-manage db archive_deleted_rows nova-manage db purge The first command will archive the soft-deleted records to another table and the second one will purge them. Here, I don't see why we would change this by a configuration option, but we can discuss this at the PTG like Stephen said. -Sylvain Regards, > Artem > > ---- > typed from mobile, auto-correct typos assumed > ---- > > On Thu, Oct 6, 2022, 07:27 Dmitriy Rabotyagov > wrote: > >> Not having soft delete in the database is really quite bad for operators >> and it's not about tooling, but it's about audit purposes. >> >> If take nova as example, this also means that once server is deleted, >> event log will be also wiped with no way to see who and when has performed >> delete action. And that is really used feature, as we got requests like >> "why my VM has disappeared" at very least one a week. >> >> For other services having deleted_at at least points to the datetime >> where to search in the logs. >> >> At the same time I don't see any issue in having soft delete. It's just a >> matter of one systemd-timer, and too concerned about performance can set it >> to 1 day, thus almost no impact on db performance. >> >> So from operator perspective I can say this is very valuable feature and >> I personally do struggle regularly with neutron services where it's absent. >> And I would hate this to disappear at all, as it would be really a >> nightmare. >> >> ??, 5 ???. 2022 ?., 14:48 Stephen Finucane : >> >>> ? >>> >>> I'm planning on bringing this up in the nova rooms at the PTG in a few >>> weeks, >>> but I'm also raising it here since this potentially affects other service >>> projects and I can't attend all of those room :) >>> >>> Many projects use the concept of "soft delete" in their database models. >>> A soft >>> deletable model typically has two additional columns, 'deleted' and >>> 'deleted_at'. When deleting such a model, instead of actually deleting >>> the >>> database row (i.e. 'DELETE FROM table WHERE condition'), we set >>> 'deleted' to >>> 'True' and populate the 'deleted_at' column. This is helpful for auditing >>> purposes (e.g. you can inspect all resources ever created, even after >>> they've >>> been "deleted") but bad for database performance (your tables can grow >>> without >>> bound). To work around the performance issues, most projects implement >>> some kind >>> of archive or purge command that will allow operators to periodically >>> clean up >>> these deleted resources. 
However, at least in nova, we've long since >>> come to the >>> conclusion that soft deleting isn't as useful as initially suspected and >>> the >>> need to run these commands is additional work for no benefit. We've >>> moved toward >>> not using it for all new models. >>> >>> With this said, it's going to be difficult to get away from soft-delete >>> quickly. >>> Not only are there database migrations involved, but operators will need >>> to >>> rework their tooling to adapt to a new, no-soft-delete world. As such, >>> I'd like >>> to propose a half-way measure of making soft-delete configurable. To do >>> this, >>> I'd like to add a new flag in oslo.db, '[database] enable_soft_delete'. >>> When set >>> to 'False' anyone using the 'SoftDeleteMixin' from oslo.db would see >>> these >>> models hard deleted rather than soft deleted when calling 'soft_delete'. >>> This >>> would avoid the need for operators to run the various project-specific >>> purge >>> tooling. The RFC patch for this is available for review [1]. I can also >>> do this >>> on a project-specific basis and have proposed a similar patch for nova >>> [2], >>> however, doing it in oslo.db means every project that uses >>> 'SoftDeleteMixin' in >>> their models will get this for free. Projects that don't (glance, >>> cinder) can >>> switch to using this mixin and also get it for free. >>> >>> As noted above, I intend to discuss this in the nova room at the PTG, >>> but I'd be >>> interested in people's thoughts ahead of time. Do you think this is a >>> good idea? >>> Should we proceed with it? Perhaps there are there better ways to do >>> this? Let >>> me know! >>> >>> Cheers, >>> Stephen >>> >>> [1] https://review.opendev.org/c/openstack/oslo.db/+/860407 >>> [2] https://review.opendev.org/c/openstack/nova/+/860401 >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Thu Oct 6 07:48:49 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 6 Oct 2022 09:48:49 +0200 Subject: [nova][cinder][glance][manila][masakari][tacker][oslo] Configurable soft-delete In-Reply-To: References: Message-ID: Le jeu. 6 oct. 2022 ? 09:38, Sylvain Bauza a ?crit : > > > Le jeu. 6 oct. 2022 ? 07:54, Artem Goncharov > a ?crit : > >> Hey, >> >> If this is there mostly for audit purposes then I guess a more efficient >> solution is to introduce an audit table which will have no performance >> impact on the "current state" of service. Audit records are also never >> updated means performance here is relatively straight forward. Audit may >> become a "feature" with enable switch. >> >> This should be a better solution rather then letting db permanently grow >> and forcing admins to constantly "fight" against it. This also gives a much >> cleaner audit experience. Actually this is also not a new approach and is >> being followed in many places (i.e. auditd) >> >> > This isn't true. Any operator that sees the Nova DBs [1] growing can use > two commands (or just cron them) : > nova-manage db archive_deleted_rows > nova-manage db purge > > The first command will archive the soft-deleted records to another table > and the second one will purge them. > My apologies, forgot to add the link https://docs.openstack.org/nova/rocky/cli/nova-manage.html#nova-database Also, forgot the footnote [1] actually only the nova cell DBs are supporting soft-delete records, we don't have this for the API DB. 
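To make the "just cron them" / "one systemd-timer" approach concrete, a hedged example of a daily archive job; the flags below exist in recent nova-manage releases, but check the documentation for the release you run:

    # /etc/cron.d-style entry (or the ExecStart of a systemd oneshot service driven by a timer)
    0 3 * * * nova /usr/bin/nova-manage db archive_deleted_rows --until-complete --purge

With --purge the shadow tables are emptied in the same run, so a separate "nova-manage db purge" invocation is not needed.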
> Here, I don't see why we would change this by a configuration option, but > we can discuss this at the PTG like Stephen said. > -Sylvain > > > Regards, >> Artem >> >> ---- >> typed from mobile, auto-correct typos assumed >> ---- >> >> On Thu, Oct 6, 2022, 07:27 Dmitriy Rabotyagov >> wrote: >> >>> Not having soft delete in the database is really quite bad for operators >>> and it's not about tooling, but it's about audit purposes. >>> >>> If take nova as example, this also means that once server is deleted, >>> event log will be also wiped with no way to see who and when has performed >>> delete action. And that is really used feature, as we got requests like >>> "why my VM has disappeared" at very least one a week. >>> >>> For other services having deleted_at at least points to the datetime >>> where to search in the logs. >>> >>> At the same time I don't see any issue in having soft delete. It's just >>> a matter of one systemd-timer, and too concerned about performance can set >>> it to 1 day, thus almost no impact on db performance. >>> >>> So from operator perspective I can say this is very valuable feature and >>> I personally do struggle regularly with neutron services where it's absent. >>> And I would hate this to disappear at all, as it would be really a >>> nightmare. >>> >>> ??, 5 ???. 2022 ?., 14:48 Stephen Finucane : >>> >>>> ? >>>> >>>> I'm planning on bringing this up in the nova rooms at the PTG in a few >>>> weeks, >>>> but I'm also raising it here since this potentially affects other >>>> service >>>> projects and I can't attend all of those room :) >>>> >>>> Many projects use the concept of "soft delete" in their database >>>> models. A soft >>>> deletable model typically has two additional columns, 'deleted' and >>>> 'deleted_at'. When deleting such a model, instead of actually deleting >>>> the >>>> database row (i.e. 'DELETE FROM table WHERE condition'), we set >>>> 'deleted' to >>>> 'True' and populate the 'deleted_at' column. This is helpful for >>>> auditing >>>> purposes (e.g. you can inspect all resources ever created, even after >>>> they've >>>> been "deleted") but bad for database performance (your tables can grow >>>> without >>>> bound). To work around the performance issues, most projects implement >>>> some kind >>>> of archive or purge command that will allow operators to periodically >>>> clean up >>>> these deleted resources. However, at least in nova, we've long since >>>> come to the >>>> conclusion that soft deleting isn't as useful as initially suspected >>>> and the >>>> need to run these commands is additional work for no benefit. We've >>>> moved toward >>>> not using it for all new models. >>>> >>>> With this said, it's going to be difficult to get away from soft-delete >>>> quickly. >>>> Not only are there database migrations involved, but operators will >>>> need to >>>> rework their tooling to adapt to a new, no-soft-delete world. As such, >>>> I'd like >>>> to propose a half-way measure of making soft-delete configurable. To do >>>> this, >>>> I'd like to add a new flag in oslo.db, '[database] enable_soft_delete'. >>>> When set >>>> to 'False' anyone using the 'SoftDeleteMixin' from oslo.db would see >>>> these >>>> models hard deleted rather than soft deleted when calling >>>> 'soft_delete'. This >>>> would avoid the need for operators to run the various project-specific >>>> purge >>>> tooling. The RFC patch for this is available for review [1]. 
I can also >>>> do this >>>> on a project-specific basis and have proposed a similar patch for nova >>>> [2], >>>> however, doing it in oslo.db means every project that uses >>>> 'SoftDeleteMixin' in >>>> their models will get this for free. Projects that don't (glance, >>>> cinder) can >>>> switch to using this mixin and also get it for free. >>>> >>>> As noted above, I intend to discuss this in the nova room at the PTG, >>>> but I'd be >>>> interested in people's thoughts ahead of time. Do you think this is a >>>> good idea? >>>> Should we proceed with it? Perhaps there are there better ways to do >>>> this? Let >>>> me know! >>>> >>>> Cheers, >>>> Stephen >>>> >>>> [1] https://review.opendev.org/c/openstack/oslo.db/+/860407 >>>> [2] https://review.opendev.org/c/openstack/nova/+/860401 >>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Thu Oct 6 07:50:35 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 6 Oct 2022 09:50:35 +0200 Subject: [nova][cinder][glance][manila][masakari][tacker][oslo] Configurable soft-delete In-Reply-To: References: Message-ID: Le jeu. 6 oct. 2022 ? 09:48, Sylvain Bauza a ?crit : > > > Le jeu. 6 oct. 2022 ? 09:38, Sylvain Bauza a ?crit : > >> >> >> Le jeu. 6 oct. 2022 ? 07:54, Artem Goncharov >> a ?crit : >> >>> Hey, >>> >>> If this is there mostly for audit purposes then I guess a more efficient >>> solution is to introduce an audit table which will have no performance >>> impact on the "current state" of service. Audit records are also never >>> updated means performance here is relatively straight forward. Audit may >>> become a "feature" with enable switch. >>> >>> This should be a better solution rather then letting db permanently grow >>> and forcing admins to constantly "fight" against it. This also gives a much >>> cleaner audit experience. Actually this is also not a new approach and is >>> being followed in many places (i.e. auditd) >>> >>> >> This isn't true. Any operator that sees the Nova DBs [1] growing can use >> two commands (or just cron them) : >> nova-manage db archive_deleted_rows >> nova-manage db purge >> >> The first command will archive the soft-deleted records to another table >> and the second one will purge them. >> > > My apologies, forgot to add the link > https://docs.openstack.org/nova/rocky/cli/nova-manage.html#nova-database > > Morning not caffeinated yet, sorry. Wrong link : https://docs.openstack.org/nova/latest/cli/nova-manage.html#db-archive-deleted-rows Also, forgot the footnote > [1] actually only the nova cell DBs are supporting soft-delete records, we > don't have this for the API DB. > > >> Here, I don't see why we would change this by a configuration option, but >> we can discuss this at the PTG like Stephen said. >> -Sylvain >> >> >> Regards, >>> Artem >>> >>> ---- >>> typed from mobile, auto-correct typos assumed >>> ---- >>> >>> On Thu, Oct 6, 2022, 07:27 Dmitriy Rabotyagov >>> wrote: >>> >>>> Not having soft delete in the database is really quite bad for >>>> operators and it's not about tooling, but it's about audit purposes. >>>> >>>> If take nova as example, this also means that once server is deleted, >>>> event log will be also wiped with no way to see who and when has performed >>>> delete action. And that is really used feature, as we got requests like >>>> "why my VM has disappeared" at very least one a week. >>>> >>>> For other services having deleted_at at least points to the datetime >>>> where to search in the logs. 
>>>> >>>> At the same time I don't see any issue in having soft delete. It's just >>>> a matter of one systemd-timer, and too concerned about performance can set >>>> it to 1 day, thus almost no impact on db performance. >>>> >>>> So from operator perspective I can say this is very valuable feature >>>> and I personally do struggle regularly with neutron services where it's >>>> absent. And I would hate this to disappear at all, as it would be really a >>>> nightmare. >>>> >>>> ??, 5 ???. 2022 ?., 14:48 Stephen Finucane : >>>> >>>>> ? >>>>> >>>>> I'm planning on bringing this up in the nova rooms at the PTG in a few >>>>> weeks, >>>>> but I'm also raising it here since this potentially affects other >>>>> service >>>>> projects and I can't attend all of those room :) >>>>> >>>>> Many projects use the concept of "soft delete" in their database >>>>> models. A soft >>>>> deletable model typically has two additional columns, 'deleted' and >>>>> 'deleted_at'. When deleting such a model, instead of actually deleting >>>>> the >>>>> database row (i.e. 'DELETE FROM table WHERE condition'), we set >>>>> 'deleted' to >>>>> 'True' and populate the 'deleted_at' column. This is helpful for >>>>> auditing >>>>> purposes (e.g. you can inspect all resources ever created, even after >>>>> they've >>>>> been "deleted") but bad for database performance (your tables can grow >>>>> without >>>>> bound). To work around the performance issues, most projects implement >>>>> some kind >>>>> of archive or purge command that will allow operators to periodically >>>>> clean up >>>>> these deleted resources. However, at least in nova, we've long since >>>>> come to the >>>>> conclusion that soft deleting isn't as useful as initially suspected >>>>> and the >>>>> need to run these commands is additional work for no benefit. We've >>>>> moved toward >>>>> not using it for all new models. >>>>> >>>>> With this said, it's going to be difficult to get away from >>>>> soft-delete quickly. >>>>> Not only are there database migrations involved, but operators will >>>>> need to >>>>> rework their tooling to adapt to a new, no-soft-delete world. As such, >>>>> I'd like >>>>> to propose a half-way measure of making soft-delete configurable. To >>>>> do this, >>>>> I'd like to add a new flag in oslo.db, '[database] >>>>> enable_soft_delete'. When set >>>>> to 'False' anyone using the 'SoftDeleteMixin' from oslo.db would see >>>>> these >>>>> models hard deleted rather than soft deleted when calling >>>>> 'soft_delete'. This >>>>> would avoid the need for operators to run the various project-specific >>>>> purge >>>>> tooling. The RFC patch for this is available for review [1]. I can >>>>> also do this >>>>> on a project-specific basis and have proposed a similar patch for nova >>>>> [2], >>>>> however, doing it in oslo.db means every project that uses >>>>> 'SoftDeleteMixin' in >>>>> their models will get this for free. Projects that don't (glance, >>>>> cinder) can >>>>> switch to using this mixin and also get it for free. >>>>> >>>>> As noted above, I intend to discuss this in the nova room at the PTG, >>>>> but I'd be >>>>> interested in people's thoughts ahead of time. Do you think this is a >>>>> good idea? >>>>> Should we proceed with it? Perhaps there are there better ways to do >>>>> this? Let >>>>> me know! 
>>>>> >>>>> Cheers, >>>>> Stephen >>>>> >>>>> [1] https://review.opendev.org/c/openstack/oslo.db/+/860407 >>>>> [2] https://review.opendev.org/c/openstack/nova/+/860401 >>>>> >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Thu Oct 6 08:07:38 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 6 Oct 2022 10:07:38 +0200 Subject: Query about networking-onos for newer OpenStack releases In-Reply-To: References: Message-ID: Hello Aditya: If you don't have any specific requirement, I would choose one of the Neutron in-tree ML2 plugins: ML2/OVS or ML2/OVN (and ML2/SR-IOV, that can run with the other two). About which one you can choose, I won't point you to any of them. I would prefer you to review the different architectures: * OVS: https://docs.openstack.org/liberty/networking-guide/scenario-classic-ovs.html (this is an old but still valid document to see the different ML2/OVS deployments) * OVN: https://www.openstack.org/videos/summits/austin-2016/practical-ovn-architecture-deployment-and-scale-of-openstack-networking Regards. On Wed, Oct 5, 2022 at 10:37 PM Aditya Sathish wrote: > Hi Lajos and Rodolfo, > > First of all thank you for your previous replies. After discussing it with > my team over here, we have decided to look at other alternatives beyond > ONOS for implementing an SDN controller with OpenStack. > > Lajos, as you mentioned about OVN, we can perform SDN control on the VM > instances using this. One extension we would like to do is to use OVN to > control an openflow hardware switch. The idea is to allow different users > over a network to access the VM instances. Any idea if I can get this done > through OVN? > > I also tried to check out OpenDayLight but even this has not been updated > in some time. > > Any replies would be greatly appreciated! > > Regards, > Aditya > > On Thu, Sep 22, 2022 at 12:22 PM Lajos Katona > wrote: > >> Seems gmail lost some chars from Rodolfo's address, resending. >> >> Lajos Katona ezt ?rta (id?pont: 2022. szept. 22., >> Cs, 17:55): >> >>> Hi, >>> Thanks for considering this question. I do not add this topic now to the >>> agenda, of course it can be discussed any time :-) >>> In openstack OVN as an SDN controller is tested, and more and more >>> companies are using it, so for long term I would check it. >>> OVN is now in-tree in Neutron code base, meaning that you don't need any >>> extra code, you can just use Neutron. >>> OVN uses OVS as soft switch and the OVN code is written in C, and >>> originally started by the same team who develops OVS. >>> >>> If you need any advice, or would like to discuss any topics with the >>> team just ping us on #openstack-neutron channel. >>> >>> Best wishes >>> Lajos Katona (lajoskatona) >>> >>> >>> Aditya Sathish ezt ?rta (id?pont: 2022. szept. 22., >>> Cs, 17:00): >>> >>>> Hi Lajos, >>>> >>>> Thank you for the email. Unfortunately, I'm not sure if I can dedicate >>>> time to maintain this release-on-release. However, I forked the >>>> networking-onos repository and currently verified it with DevStack Zed >>>> along with tempest. (https://github.com/adityasathis/networking-onos). >>>> Considering the changes so far involved only replacing some code to account >>>> for changes in the ML2 callback interface, I think the support should not >>>> be too time consuming if we assume that the ML2 plugin interface remains >>>> the same. 
>>>> >>>> If we cannot find a way to support networking-onos for long-term >>>> support, do you know a better way to understand the industry implementation >>>> of using SDN controllers with OpenStack? >>>> >>>> Regards, >>>> Aditya. >>>> >>>> On Thu, Sep 22, 2022 at 10:46 AM Lajos Katona >>>> wrote: >>>> >>>>> Hi, >>>>> Do you think that you can maintain networking-onos, if you think yes, >>>>> we can discuss this topic on next drivers meeting (as Rodolfo wrote >>>>> previously). >>>>> Just ping me on IRC (#openstack-neutron lajoskatona) and I add this >>>>> topic for you to the agenda: >>>>> https://wiki.openstack.org/wiki/Meetings/NeutronDrivers >>>>> >>>>> Best Wishes >>>>> Lajos Katona (lajoskatona) >>>>> >>>>> Aditya Sathish ezt ?rta (id?pont: 2022. szept. 20., >>>>> K, 17:48): >>>>> >>>>>> Hello! >>>>>> >>>>>> I am trying to integrate an SDN controller with our lab's OpenStack >>>>>> network. Currently, we have already deployed a version of ONOS to serve our >>>>>> needs and I have been following the SONA project which uses the >>>>>> networking-onos ML2 plugin with OpenStack. However, it seems that the >>>>>> networking-onos project has been retired since the Train release. >>>>>> >>>>>> Is there any way I can get ONOS to work with OpenStack Yoga? If not, >>>>>> what is the go-to way to integrate an SDN controller with OpenFlow support >>>>>> with Neutron?? >>>>>> >>>>>> Any help will be much appreciated. >>>>>> >>>>>> Regards, >>>>>> Aditya. >>>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Thu Oct 6 08:44:40 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 6 Oct 2022 10:44:40 +0200 Subject: [nova][cinder][glance][manila][masakari][tacker][oslo] Configurable soft-delete In-Reply-To: References: Message-ID: <20221006084440.vvg6qdqdcmrdxusl@localhost> On 06/10, Dmitriy Rabotyagov wrote: > Oh, yes, this is a good alternative. > Actually I was thinking about smth like "openstack event list" > (as only nova does have that) for quite a while, but without having the > resources to lead and help out in implementation across projects I didn't > dare to raise this topic. But it's probably high time to start discussing > it at the very least and make a proposal as a community goal based on the > outcome of these discussions. > Hi, The Cinder team has also been exploring the idea of having transaction records to help operators see the series of operations on resources (to figure out what happened to a resource) as well as help see what operations are currently happening (great for planning upgrades or bouncing services to other nodes) [1]. If this is going to be a completely new feature in all projects we may want to agree on some commonalities such as naming and available functionality: - Transaction history, including deleted resources - Ongoing operations: - Detailed: All info for each of the transactions - Summary: - Global: i.e. 10 migrations, 3 attachments - By host: i.e. Host1: 5 migrations | Host2: 5 migrations, 3 attachments [1]: https://review.opendev.org/c/openstack/cinder-specs/+/845176/2/specs/zed/transaction-tracking.rst > ??, 6 ???. 2022 ?., 07:48 Artem Goncharov : > > > Hey, > > > > If this is there mostly for audit purposes then I guess a more efficient > > solution is to introduce an audit table which will have no performance > > impact on the "current state" of service. Audit records are also never > > updated means performance here is relatively straight forward. 
Audit may > > become a "feature" with enable switch. > > > > This should be a better solution rather then letting db permanently grow > > and forcing admins to constantly "fight" against it. This also gives a much > > cleaner audit experience. Actually this is also not a new approach and is > > being followed in many places (i.e. auditd) > > > > Regards, > > Artem > > > >>> With this said, it's going to be difficult to get away from > >>> soft-delete quickly. Not only are there database migrations > >>> involved, but operators will need to rework their tooling to adapt > >>> to a new, no-soft-delete world. As such, I'd like to propose a > >>> half-way measure of making soft-delete configurable. To do this, > >>> I'd like to add a new flag in oslo.db, '[database] > >>> enable_soft_delete'. When set to 'False' anyone using the > >>> 'SoftDeleteMixin' from oslo.db would see these models hard deleted > >>> rather than soft deleted when calling 'soft_delete'. This would I don't think it's a big deal, but from the Cinder perspective this would require additional work, because in our DB layer we only use the `soft_delete` method for 3 tables: Volume Types, Volume Type Access, and Group Type Access. All other tables use other mechanisms to do the soft deletes. Cheers, Gorka. > >>> avoid the need for operators to run the various project-specific > >>> purge tooling. The RFC patch for this is available for review [1]. > >>> I can also do this on a project-specific basis and have proposed a > >>> similar patch for nova [2], however, doing it in oslo.db means > >>> every project that uses 'SoftDeleteMixin' in their models will get > >>> this for free. Projects that don't (glance, cinder) can switch to > >>> using this mixin and also get it for free. > >>> > >>> As noted above, I intend to discuss this in the nova room at the PTG, > >>> but I'd be > >>> interested in people's thoughts ahead of time. Do you think this is a > >>> good idea? > >>> Should we proceed with it? Perhaps there are there better ways to do > >>> this? Let > >>> me know! > >>> > >>> Cheers, > >>> Stephen > >>> > >>> [1] https://review.opendev.org/c/openstack/oslo.db/+/860407 > >>> [2] https://review.opendev.org/c/openstack/nova/+/860401 > >>> > >>> > >>> From sbauza at redhat.com Thu Oct 6 12:20:54 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 6 Oct 2022 14:20:54 +0200 Subject: [nova][placement] add your PTG topics before Oct-06 please ! In-Reply-To: References: Message-ID: Le jeu. 29 sept. 2022 ? 10:14, Sylvain Bauza a ?crit : > Hi folks, > > as I said in the nova meeting, I'd like to create an agenda for our PTG > topics. Given the PTG is in 3 weeks, please provide your topics you'd like > to discuss at the PTG in [1] so I could look at them and try to provide an > agenda for them. > Eventually, I'd love to have most of the topics by Oct 6th as I said in > the title so the agenda would be on the Friday Oct 7th. > > Also, if you can't be in the Nova PTG sessions for all the PTG schedule > (between Tues and Fri for Nova), just add in the topic when you would like > to be around. > > As a reminder, please provide your topics today if you can, I'll provide an agenda by tomorrow with the existing topics, if we will have other topics, they would be discussed when we have time, then. Thanks, -S > Thanks, > -Sylvain > > [1] https://etherpad.opendev.org/p/nova-antelope-ptg > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jackdawblues at gmail.com Thu Oct 6 08:23:07 2022 From: jackdawblues at gmail.com (jackdaw blues) Date: Thu, 6 Oct 2022 11:23:07 +0300 Subject: [SECURITY] Openstack Security Assessments Message-ID: Hi all, I am currently leading a team of offensive security engineers and we are trying to create a checklist for each component of Openstack in the context of Security Assessment. At the end of the day what we want to end up with is common exploitable configuration weaknesses for each component. It will be against configuration or installation mistakes that result in unintended privileges or information disclosure, etc. Patch management isn't in scope. Not the exact output, but these links can give a good idea of the contents of the security assessment we are planning (these are for AWS): http://flaws.cloud/ http://flaws2.cloud/ Has anyone had any experience regarding the topic above? If so please feel free to connect. Regardless of the experience, if you want to contribute and at mark zero just like we are, you are still welcome and we can help each other create this assessment checklist. Cheers, Asil -------------- next part -------------- An HTML attachment was scrubbed... URL: From soukessou at gmail.com Thu Oct 6 13:32:20 2022 From: soukessou at gmail.com (samir oukessou) Date: Thu, 6 Oct 2022 14:32:20 +0100 Subject: Limit Access to a Group for a Project- Openstack 13 & 17 Message-ID: Dears, I have a question regarding Openstack, is it possible to limit access for a user to get read only on a specific project that he can only see the instances in that project and eliminate the actions edit,start,stop or delete instance ? i have tried some tests grant *member* role only to the group but all users in the group were able to do everything with the instances thank you in advance, Samir -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Thu Oct 6 14:55:20 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 6 Oct 2022 14:55:20 +0000 Subject: [security-sig] Openstack Security Assessments In-Reply-To: References: Message-ID: <20221006145520.wk3dhow2wrz3vyr5@yuggoth.org> [I'm keeping you in Cc since you don't appear to be subscribed to the mailing list, but please still respond to the list.] On 2022-10-06 11:23:07 +0300 (+0300), jackdaw blues wrote: > I am currently leading a team of offensive security engineers and > we are trying to create a checklist for each component of > Openstack in the context of Security Assessment. Welcome! As the current chair of the OpenStack Security SIG (Special Interest Group)[*], I'm happy to do what I can to help and encourage other community members to further enable your efforts. > At the end of the day what we want to end up with is common > exploitable configuration weaknesses for each component. It will > be against configuration or installation mistakes that result in > unintended privileges or information disclosure, etc. Patch > management isn't in scope. > > Not the exact output, but these links can give a good idea of the > contents of the security assessment we are planning (these are for > AWS): > http://flaws.cloud/ > http://flaws2.cloud/ > > Has anyone had any experience regarding the topic above? If so > please feel free to connect. Regardless of the experience, if you > want to contribute and at mark zero just like we are, you are > still welcome and we can help each other create this assessment > checklist. 
I'm not aware of any efforts along those lines yet, as far as a coordinated attempt at providing secure usage guidance to end users of OpenStack services, but it sounds like an interesting avenue for research. Most of our focus, to date, has been on solving vulnerabilities within the OpenStack services and tools, and providing guidance to people who deploy and run those services in order that they may better secure their installations. End user guidance has mostly been the realm of the organizations running the software, at least so far. [*] https://wiki.openstack.org/wiki/Security-SIG -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From dms at danplanet.com Thu Oct 6 15:06:06 2022 From: dms at danplanet.com (Dan Smith) Date: Thu, 06 Oct 2022 08:06:06 -0700 Subject: [nova][cinder][glance][manila][masakari][tacker][oslo] Configurable soft-delete In-Reply-To: (Sylvain Bauza's message of "Thu, 6 Oct 2022 09:38:04 +0200") References: Message-ID: Sylvain Bauza writes: > This isn't true. Any operator that sees the Nova DBs [1] growing can use two commands (or just cron them) : > nova-manage db archive_deleted_rows > nova-manage db purge It's actually easier. Adding --purge to the first command removes the need to run the second. --Dan From ralonsoh at redhat.com Thu Oct 6 15:08:18 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 6 Oct 2022 17:08:18 +0200 Subject: [neutron][tempest][all] Broken CI, any job inherited from "devstack" Message-ID: Hello all: I broke the OpenStack CI (good start as Neutron PTL). I pushed [1] testing only against the Neutron CI. After making the needed changes [2], I thought any other job would be safe. I've opened [3]. The default OVN version installed in the CI, using Ubuntu 20.04, is v20.03, that is a bit old. I've proposed to bump to v21.06, extensively tested in the Neutron CI. Any tempest job from any project is inherited from this one. Once we have migrated to Ubuntu 22.04 during this cycle, we'll remove this forced OVN installation from source. Regards. [1]https://review.opendev.org/c/openstack/neutron/+/859642 [2]https://review.opendev.org/c/openstack/neutron/+/860078/ [3]https://bugs.launchpad.net/devstack/+bug/1991952 [4]https://review.opendev.org/c/openstack/devstack/+/860577 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Thu Oct 6 15:30:26 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 6 Oct 2022 17:30:26 +0200 Subject: [neutron][tempest][all] Broken CI, any job inherited from "devstack" In-Reply-To: References: Message-ID: Hello: I've filled [1]. We'll revert the Neutron patch. Regards. [1]https://bugs.launchpad.net/neutron/+bug/1991962 On Thu, Oct 6, 2022 at 5:08 PM Rodolfo Alonso Hernandez wrote: > Hello all: > > I broke the OpenStack CI (good start as Neutron PTL). I pushed [1] testing > only against the Neutron CI. After making the needed changes [2], I thought > any other job would be safe. > > I've opened [3]. The default OVN version installed in the CI, using Ubuntu > 20.04, is v20.03, that is a bit old. I've proposed to bump to v21.06, > extensively tested in the Neutron CI. Any tempest job from any project is > inherited from this one. > > Once we have migrated to Ubuntu 22.04 during this cycle, we'll remove this > forced OVN installation from source. > > Regards. 
> > [1]https://review.opendev.org/c/openstack/neutron/+/859642 > [2]https://review.opendev.org/c/openstack/neutron/+/860078/ > [3]https://bugs.launchpad.net/devstack/+bug/1991952 > [4]https://review.opendev.org/c/openstack/devstack/+/860577 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Thu Oct 6 15:41:01 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 6 Oct 2022 17:41:01 +0200 Subject: [nova][cinder][glance][manila][masakari][tacker][oslo] Configurable soft-delete In-Reply-To: References: Message-ID: I would like to strongly second this: not having soft delete, or an equivalent for audit purposes (I am not attached to the actual implementation), would be a great loss. We actually have a long standing task to add soft delete to Blazar, which I am hoping will be merged in Antelope. As an operator, I also get annoyed by the lack of soft delete in Neutron, for example to answer the question: who was using this specific floating IP at this specific time? On Thu, 6 Oct 2022 at 07:28, Dmitriy Rabotyagov wrote: > Not having soft delete in the database is really quite bad for operators > and it's not about tooling, but it's about audit purposes. > > If take nova as example, this also means that once server is deleted, > event log will be also wiped with no way to see who and when has performed > delete action. And that is really used feature, as we got requests like > "why my VM has disappeared" at very least one a week. > > For other services having deleted_at at least points to the datetime where > to search in the logs. > > At the same time I don't see any issue in having soft delete. It's just a > matter of one systemd-timer, and too concerned about performance can set it > to 1 day, thus almost no impact on db performance. > > So from operator perspective I can say this is very valuable feature and I > personally do struggle regularly with neutron services where it's absent. > And I would hate this to disappear at all, as it would be really a > nightmare. > > ??, 5 ???. 2022 ?., 14:48 Stephen Finucane : > >> ? >> >> I'm planning on bringing this up in the nova rooms at the PTG in a few >> weeks, >> but I'm also raising it here since this potentially affects other service >> projects and I can't attend all of those room :) >> >> Many projects use the concept of "soft delete" in their database models. >> A soft >> deletable model typically has two additional columns, 'deleted' and >> 'deleted_at'. When deleting such a model, instead of actually deleting the >> database row (i.e. 'DELETE FROM table WHERE condition'), we set 'deleted' >> to >> 'True' and populate the 'deleted_at' column. This is helpful for auditing >> purposes (e.g. you can inspect all resources ever created, even after >> they've >> been "deleted") but bad for database performance (your tables can grow >> without >> bound). To work around the performance issues, most projects implement >> some kind >> of archive or purge command that will allow operators to periodically >> clean up >> these deleted resources. However, at least in nova, we've long since come >> to the >> conclusion that soft deleting isn't as useful as initially suspected and >> the >> need to run these commands is additional work for no benefit. We've moved >> toward >> not using it for all new models. >> >> With this said, it's going to be difficult to get away from soft-delete >> quickly. 
>> Not only are there database migrations involved, but operators will need >> to >> rework their tooling to adapt to a new, no-soft-delete world. As such, >> I'd like >> to propose a half-way measure of making soft-delete configurable. To do >> this, >> I'd like to add a new flag in oslo.db, '[database] enable_soft_delete'. >> When set >> to 'False' anyone using the 'SoftDeleteMixin' from oslo.db would see these >> models hard deleted rather than soft deleted when calling 'soft_delete'. >> This >> would avoid the need for operators to run the various project-specific >> purge >> tooling. The RFC patch for this is available for review [1]. I can also >> do this >> on a project-specific basis and have proposed a similar patch for nova >> [2], >> however, doing it in oslo.db means every project that uses >> 'SoftDeleteMixin' in >> their models will get this for free. Projects that don't (glance, cinder) >> can >> switch to using this mixin and also get it for free. >> >> As noted above, I intend to discuss this in the nova room at the PTG, but >> I'd be >> interested in people's thoughts ahead of time. Do you think this is a >> good idea? >> Should we proceed with it? Perhaps there are there better ways to do >> this? Let >> me know! >> >> Cheers, >> Stephen >> >> [1] https://review.opendev.org/c/openstack/oslo.db/+/860407 >> [2] https://review.opendev.org/c/openstack/nova/+/860401 >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From allison at openinfra.dev Thu Oct 6 15:20:18 2022 From: allison at openinfra.dev (Allison Price) Date: Thu, 6 Oct 2022 10:20:18 -0500 Subject: [zed] OpenInfra Live - October 6 at 1400 UTC In-Reply-To: <11530FD5-78FF-492E-B787-8156381747C7@openinfra.dev> References: <11530FD5-78FF-492E-B787-8156381747C7@openinfra.dev> Message-ID: <4708D1F9-B53C-4AF1-AAFE-21DF894E151A@openinfra.dev> Thank you to everyone who tuned into the OpenStack Zed episode of OpenInfra Live today! If you missed it, we have you covered! The Superuser recap [1] contains a link to the recording and I have attached the slides from the presentation to this thread for folks who want to explore the links that the contributors shared today. Congratulations again on completing another on-time release?now, onto Antelope! [1] https://superuser.openstack.org/articles/openstack-zed-the-end-of-the-alphabet-the-beginning-of-a-new-era-openinfra-live-recap/ > On Oct 5, 2022, at 3:37 PM, Allison Price wrote: > > Hi everyone, > > This week?s OpenInfra Live episode is brought to you by the OpenStack community who just delivered its 26th on-time release today! Join us to learn about the latest from community leaders about what was delivered in Zed and what we can expect in Antelope, OpenStack's 27th release targeting early 2023. > > Episode: OpenStack Zed: The End of the Alphabet, The Beginning of a New Era > > Date and time: October 6 at 1400 UTC > > You can watch us live on: > YouTube: https://www.youtube.com/watch?v=MSbB3L9_MeY > LinkedIn: https://www.linkedin.com/video/event/urn:li:ugcPost:6982723169144950786/ > Facebook: https://www.facebook.com/events/390328576642133 > WeChat: recording will be posted on OpenStack WeChat after the live stream > > Speakers: > Kendall Nelson, OpenInfra Foundation > Carlos Silva, Manila > Jay Faulkner, Ironic > Sylvain Bauza, Nova > Lajos Katona, Neutron > Wu Wenxiang, Skyline > Martin Kopec, Interop Working Group > Liye Pang, Venus > > Have an idea for a future episode? Share it now at ideas.openinfra.live . 
> > Thanks, > Allison > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenInfra Live Slides_ OpenStack Zed.pdf Type: application/pdf Size: 1662154 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Thu Oct 6 16:12:41 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 6 Oct 2022 16:12:41 +0000 Subject: [neutron][tempest][all] Broken CI, any job inherited from "devstack" In-Reply-To: References: Message-ID: <20221006161241.4xn6a252y3ah2vfj@yuggoth.org> On 2022-10-06 17:08:18 +0200 (+0200), Rodolfo Alonso Hernandez wrote: [...] > I've opened [3]. The default OVN version installed in the CI, using Ubuntu > 20.04, is v20.03, that is a bit old. I've proposed to bump to v21.06, > extensively tested in the Neutron CI. Any tempest job from any project is > inherited from this one. > > Once we have migrated to Ubuntu 22.04 during this cycle, we'll remove this > forced OVN installation from source. [...] What's the implication for upgrades? Historically, we've needed the software to be operable on the prior platform (upgrade OpenStack from Zed to 2023.1/Antelope, then upgrade Ubuntu from Focal to Jammy). Now with SLURP in the picture, we'll even need OpenStack 2023.2/B working on Focal before upgrading to Jammy, right? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ralonsoh at redhat.com Thu Oct 6 16:24:10 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 6 Oct 2022 18:24:10 +0200 Subject: [neutron][tempest][all] Broken CI, any job inherited from "devstack" In-Reply-To: <20221006161241.4xn6a252y3ah2vfj@yuggoth.org> References: <20221006161241.4xn6a252y3ah2vfj@yuggoth.org> Message-ID: I don't see in the documentation (probably I didn't check all of it) that an OpenStack version upgrade doesn't imply other library/modules/services upgrade. In this case, you will need to bump the OVN/OVS version, keeping the same OVS/NB/SB information (probably not the same database structures). In any case, I'll try to make the Neutron server compatible with both scenarios, but that will take some time to implement (if possible). On Thu, Oct 6, 2022 at 6:13 PM Jeremy Stanley wrote: > On 2022-10-06 17:08:18 +0200 (+0200), Rodolfo Alonso Hernandez wrote: > [...] > > I've opened [3]. The default OVN version installed in the CI, using > Ubuntu > > 20.04, is v20.03, that is a bit old. I've proposed to bump to v21.06, > > extensively tested in the Neutron CI. Any tempest job from any project is > > inherited from this one. > > > > Once we have migrated to Ubuntu 22.04 during this cycle, we'll remove > this > > forced OVN installation from source. > [...] > > What's the implication for upgrades? Historically, we've needed the > software to be operable on the prior platform (upgrade OpenStack > from Zed to 2023.1/Antelope, then upgrade Ubuntu from Focal to > Jammy). Now with SLURP in the picture, we'll even need OpenStack > 2023.2/B working on Focal before upgrading to Jammy, right? > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cboylan at sapwetik.org Thu Oct 6 16:33:25 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 06 Oct 2022 09:33:25 -0700 Subject: [neutron][tempest][all] Broken CI, any job inherited from "devstack" In-Reply-To: References: <20221006161241.4xn6a252y3ah2vfj@yuggoth.org> Message-ID: <96583601-14c2-4d4b-9f38-da6b74dbfdf3@app.fastmail.com> On Thu, Oct 6, 2022, at 9:24 AM, Rodolfo Alonso Hernandez wrote: > I don't see in the documentation (probably I didn't check all of it) > that an OpenStack version upgrade doesn't imply other > library/modules/services upgrade. In this case, you will need to bump > the OVN/OVS version, keeping the same OVS/NB/SB information (probably > not the same database structures). > I don't know about documentation (but that may have come along with the new SLURP stuff), but Grenade enforces this sort of thing in CI. When we run Grenade against a branch we install the old version of OpenStack on the platform that the old version of OpenStack was tested on and then upgrade to the new current version of OpenStack. This upgrade is done on a host without upgrading the host itself. > In any case, I'll try to make the Neutron server compatible with both > scenarios, but that will take some time to implement (if possible). > > On Thu, Oct 6, 2022 at 6:13 PM Jeremy Stanley wrote: >> On 2022-10-06 17:08:18 +0200 (+0200), Rodolfo Alonso Hernandez wrote: >> [...] >> > I've opened [3]. The default OVN version installed in the CI, using Ubuntu >> > 20.04, is v20.03, that is a bit old. I've proposed to bump to v21.06, >> > extensively tested in the Neutron CI. Any tempest job from any project is >> > inherited from this one. >> > >> > Once we have migrated to Ubuntu 22.04 during this cycle, we'll remove this >> > forced OVN installation from source. >> [...] >> >> What's the implication for upgrades? Historically, we've needed the >> software to be operable on the prior platform (upgrade OpenStack >> from Zed to 2023.1/Antelope, then upgrade Ubuntu from Focal to >> Jammy). Now with SLURP in the picture, we'll even need OpenStack >> 2023.2/B working on Focal before upgrading to Jammy, right? >> -- >> Jeremy Stanley From dms at danplanet.com Thu Oct 6 16:55:30 2022 From: dms at danplanet.com (Dan Smith) Date: Thu, 06 Oct 2022 09:55:30 -0700 Subject: [neutron][tempest][all] Broken CI, any job inherited from "devstack" In-Reply-To: <96583601-14c2-4d4b-9f38-da6b74dbfdf3@app.fastmail.com> (Clark Boylan's message of "Thu, 06 Oct 2022 09:33:25 -0700") References: <20221006161241.4xn6a252y3ah2vfj@yuggoth.org> <96583601-14c2-4d4b-9f38-da6b74dbfdf3@app.fastmail.com> Message-ID: > I don't know about documentation (but that may have come along with > the new SLURP stuff), but Grenade enforces this sort of thing in > CI. When we run Grenade against a branch we install the old version of > OpenStack on the platform that the old version of OpenStack was tested > on and then upgrade to the new current version of OpenStack. This > upgrade is done on a host without upgrading the host itself. Yeah, I think this is "expectation by code" in that grenade requires that to work, so (assuming you're running those jobs) you'll be forced into that support. Perhaps we need to expand on that a bit in words to make sure it's clear. 
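For anyone wondering what "running those jobs" looks like in practice, here is a minimal sketch of a project's .zuul.yaml (illustrative only; the exact job or template name your project consumes may differ):

```yaml
# Illustrative sketch: keeping the upstream grenade job in check and gate is
# what turns "the old release must still run on the old platform" from an
# expectation into something CI actually enforces for your project.
- project:
    check:
      jobs:
        - grenade
    gate:
      jobs:
        - grenade
```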
--Dan From fungi at yuggoth.org Thu Oct 6 17:06:55 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 6 Oct 2022 17:06:55 +0000 Subject: [neutron][tempest][all] Broken CI, any job inherited from "devstack" In-Reply-To: References: <20221006161241.4xn6a252y3ah2vfj@yuggoth.org> <96583601-14c2-4d4b-9f38-da6b74dbfdf3@app.fastmail.com> Message-ID: <20221006170655.gjpsitcoo2ur4p2h@yuggoth.org> On 2022-10-06 09:55:30 -0700 (-0700), Dan Smith wrote: [...] > Yeah, I think this is "expectation by code" in that grenade requires > that to work, so (assuming you're running those jobs) you'll be forced > into that support. Perhaps we need to expand on that a bit in words to > make sure it's clear. One fairly lightweight solution would just be to include this in the cycle-specific PTI doc. So for the 2023.1 tested runtime list both Ubuntu 20.04 and Ubuntu 22.04 (maybe with a boilerplate sentence that the former platform is only tested insofar as to support in-place upgrading of OpenStack software before upgrading to the latter platform in that release). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Thu Oct 6 17:23:04 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 6 Oct 2022 17:23:04 +0000 Subject: [neutron][tempest][all] Broken CI, any job inherited from "devstack" In-Reply-To: <20221006161241.4xn6a252y3ah2vfj@yuggoth.org> References: <20221006161241.4xn6a252y3ah2vfj@yuggoth.org> Message-ID: <20221006172304.2z33tdo2lfcobdcz@yuggoth.org> On 2022-10-06 16:12:41 +0000 (+0000), Jeremy Stanley wrote: [...] > Now with SLURP in the picture, we'll even need OpenStack 2023.2/B > working on Focal before upgrading to Jammy, right? Just to correct myself, Zed to 2023.2/B is not SLURP. If this transition were occurring between 2023.1/Antelope and 2023.2/B then we'd also need to solve it for upgrades from 2023.1/Antelope to 2024.1/C, but that's thankfully not the case this time. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From smooney at redhat.com Thu Oct 6 17:38:17 2022 From: smooney at redhat.com (Sean Mooney) Date: Thu, 06 Oct 2022 18:38:17 +0100 Subject: Limit Access to a Group for a Project- Openstack 13 & 17 In-Reply-To: References: Message-ID: On Thu, 2022-10-06 at 14:32 +0100, samir oukessou wrote: > Dears, > > I have a question regarding Openstack, is it possible to limit access for a > user to get read only on a specific project that he can only see the > instances in that project and eliminate the actions edit,start,stop or > delete instance ? > > i have tried some tests grant *member* role only to the group but all users > in the group were able to do everything with the instances what your looking for is the reader role. 
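For example, a rough sketch with the openstack CLI (the group and project names below are placeholders, not taken from your environment):

```shell
# Give every user in the group read-only access to a single project.
openstack role add --group support-team --project customer-project reader

# Confirm the assignment.
openstack role assignment list --group support-team --project customer-project --names
```

One caveat: reader is only honoured by services that have adopted the newer secure-RBAC policy defaults, so on older releases you may also need to enable those defaults (or adjust the policy files) before reader actually blocks start/stop/delete on instances.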
> > > thank you in advance, > > > Samir From gmann at ghanshyammann.com Thu Oct 6 18:35:32 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 06 Oct 2022 11:35:32 -0700 Subject: [neutron][tempest][all] Broken CI, any job inherited from "devstack" In-Reply-To: <20221006170655.gjpsitcoo2ur4p2h@yuggoth.org> References: <20221006161241.4xn6a252y3ah2vfj@yuggoth.org> <96583601-14c2-4d4b-9f38-da6b74dbfdf3@app.fastmail.com> <20221006170655.gjpsitcoo2ur4p2h@yuggoth.org> Message-ID: <183ae94d38c.dfdf2096295966.1384301667564210604@ghanshyammann.com> ---- On Thu, 06 Oct 2022 10:06:55 -0700 Jeremy Stanley wrote --- > On 2022-10-06 09:55:30 -0700 (-0700), Dan Smith wrote: > [...] > > Yeah, I think this is "expectation by code" in that grenade requires > > that to work, so (assuming you're running those jobs) you'll be forced > > into that support. Perhaps we need to expand on that a bit in words to > > make sure it's clear. > > One fairly lightweight solution would just be to include this in the > cycle-specific PTI doc. So for the 2023.1 tested runtime list both > Ubuntu 20.04 and Ubuntu 22.04 (maybe with a boilerplate sentence > that the former platform is only tested insofar as to support > in-place upgrading of OpenStack software before upgrading to the > latter platform in that release). yeah, we do test it but I agree to document it somewhere in PTI, I started the documentation, feel free to review/feedback if more information needs to be mentioned - https://review.opendev.org/c/openstack/governance/+/860599 -gmann > -- > Jeremy Stanley > From katonalala at gmail.com Thu Oct 6 19:19:04 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 6 Oct 2022 21:19:04 +0200 Subject: [neutron] Drivers meeting agenda -06.10.2022. Message-ID: Hi Neutron Drivers, The agenda for tomorrow's drivers meeting is at [1]. We have the following RFE to discuss: [RFE] Strict minimum bandwidth support for tunnelled networks (#link https://bugs.launchpad.net/neutron/+bug/1991965 ) [1] https://wiki.openstack.org/wiki/Meetings/NeutronDrivers#Agenda See you at the meeting tomorrow. Lajos Katona (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From manchandavishal143 at gmail.com Fri Oct 7 11:37:15 2022 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Fri, 7 Oct 2022 17:07:15 +0530 Subject: [horizon] Antelope PTG Schedule Message-ID: Hello Team, Please Find the Schedule for Horizon PTG in the eherpad [1]. Feel Free to add the topics you want to discuss in the PTG. Don't forget to register for PTG, if not done yet [2]. See you at the PTG! Thanks & Regards, Vishal Manchanda (irc: vishalmanchanda) [1] https://etherpad.opendev.org/p/horizon-antelope-ptg [2] https://openinfra-ptg.eventbrite.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Sat Oct 8 01:11:57 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 07 Oct 2022 18:11:57 -0700 Subject: [all][tc] What's happening in Technical Committee: summary 2022 Oct 7: Reading: 5 min Message-ID: <183b5261dfe.cc404432377283.4137496043425468416@ghanshyammann.com> Hello Everyone, Here is this week's summary of the Technical Committee activities. 1. TC Meetings: ============ * We had this week's meeting on Oct 6. Most of the meeting discussions are summarized in this email. 
Meeting recording is present @ https://www.youtube.com/watch?v=HhL67mf4uAY&t=464s and summary logs are available @ https://meetings.opendev.org/meetings/tc/2022/tc.2022-10-06-15.00.log.html * Next TC weekly meeting will be a video call on Oct 13 Thursday at 15:00 UTC, feel free to add the topic on the agenda[1] by Oct 12. 2. What we completed this week: ========================= * Dedicated Zed release to Ilya Etingof[2] * Selected Ghanshyam as chair [3] 3. Activities In progress: ================== TC Tracker for Zed cycle ------------------------------ * Zed tracker etherpad includes the TC working items[4], Five are completed and other items are in-progress. Open Reviews ----------------- * Six open reviews for ongoing activities[5]. New community-wide goal "Migration CI/CD to Ubuntu 22.04" -------------------------------------------------------------------------------------- Dmitriy proposed to select this goal for 2023.1 cycle [6]. 2023.1 cycle Leaderless projects & TC Chair ---------------------------------------------------- * Zun project PTL appointment is under review[7][8]. * Slaweq volunteer to serve as Vice chair[9]. 2023.1 cycle TC PTG planning ------------------------------------ * Etherpads to add the topics: ** https://etherpad.opendev.org/p/tc-2023-1-ptg ** https://etherpad.opendev.org/p/tc-leaders-interaction-2023-1 * I sent an email about the 'Operator Hours' slots in this PTG, please check and reserve the operator hour slot for your project[10] 2021 User Survey TC Question Analysis ----------------------------------------------- No update on this. The survey summary is up for review[11]. Feel free to check and provide feedback. Fixing Zuul config error ---------------------------- We request projects having zuul config error to fix them, Keep supported stable branches as priority and Extended maintenance stable branch as low priority[12][13]. Project updates ------------------- * None. 4. How to contact the TC: ==================== If you would like to discuss or give feedback to TC, you can reach out to us in multiple ways: 1. Email: you can send the email with tag [tc] on openstack-discuss ML[14]. 2. Weekly meeting: The Technical Committee conduct a weekly meeting every Thursday 15 UTC [15] 3. Ping us using 'tc-members' nickname on #openstack-tc IRC channel. 
[1] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting [2] https://review.opendev.org/c/openstack/governance/+/859464 [3] https://review.opendev.org/c/openstack/governance/+/858957 [4] https://etherpad.opendev.org/p/tc-zed-tracker [5] https://review.opendev.org/q/projects:openstack/governance+status:open [6] https://review.opendev.org/c/openstack/governance/+/860040 [7] https://review.opendev.org/c/openstack/governance/+/858980 [8] https://review.opendev.org/c/openstack/governance/+/860759 [9] https://review.opendev.org/c/openstack/governance/+/860352 [10] https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030301.html [11] https://review.opendev.org/c/openstack/governance/+/836888 [12] https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030505.html [13] https://etherpad.opendev.org/p/zuul-config-error-openstack [14] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss [15] http://eavesdrop.openstack.org/#Technical_Committee_Meeting -gmann From monish at xaasability.com Sat Oct 8 07:15:12 2022 From: monish at xaasability.com (Monish Selvaraj) Date: Sat, 8 Oct 2022 12:45:12 +0530 Subject: Failed to create user port error Message-ID: Hi, Iam recently enabled trove in our kolla-ansible openstack. I can't create an instance in trove. It seems *Failed to create User port for instance 54ab3ff2-ecec-4b22-a736-3c36972542e3: Subnet 98a65c6e-2473-48f1-a578-702a08649c73 is not associated with router.* Also added the following parameter in trove.conf and restarted the docker container. But it's not working. [network] enable_access_check = False [image: image.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 85699 bytes Desc: not available URL: From wodel.youchi at gmail.com Sun Oct 9 15:19:44 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Sun, 9 Oct 2022 16:19:44 +0100 Subject: [kolla-ansible][Xena] SSL certificate expired Message-ID: Hi, My SSL certificate has expired, and now I cannot authenticate into horizon and I have these errors : *WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting https://dashint.cloud.exemple.com:35357 . Attempting to parse version from URL.: keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to https:// dashint.cloud.exemple.com :35357: HTTPSConnectionPool(host=' dashint.cloud.exemple.com ', port=35357): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))* In my globals.yml I have this parameter : kolla_verify_tls_backend: "no" 1 - How do I disable SSL verification for now? 2 - How to install a new SSL certificate? Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nobody Mon Oct 10 06:08:13 2022 From: nobody Date: Mon, 10 Oct 2022 15:08:13 +0900 Subject: Kroxvx rsvxfzwql Message-ID: This message has been removed. From arxcruz at redhat.com Mon Oct 10 09:25:31 2022 From: arxcruz at redhat.com (Arx Cruz) Date: Mon, 10 Oct 2022 11:25:31 +0200 Subject: [tripleo] Gate blocker Message-ID: Hello, We have a gate blocker due https://bugs.launchpad.net/tripleo/+bug/1992305 please do not recheck jobs until https://review.opendev.org/c/openstack/tripleo-quickstart/+/860810 get merged. I will let you know when gates are green again. 
-- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Mon Oct 10 11:21:01 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Mon, 10 Oct 2022 12:21:01 +0100 Subject: [kolla-ansible][Xena] SSL certificate expired In-Reply-To: References: Message-ID: Hi, I tried to deploy a new certificate using :kolla-ansible reconfigure But I got : "module_stderr": "*Failed to discover available identity versions when contacting https://dashint.cloud.exemple.com:35357 *. Attemptin g to parse version from URL.\nTraceback (most recent call last):\n File \"/opt/ansible/lib/python3.6/site-packages/urllib3/connectio npool.py\", line 706, in urlopen\n chunked=chunked,\n File \"/opt/ansible/lib/python3.6/site-packages/urllib3/connectionpool.py\" , line 382, in _make_request\n self._validate_conn(conn)\n File \"/opt/ansible/lib/python3.6/site-packages/urllib3/connectionpool .py\", line 1010, in _validate_conn\n conn.connect()\n File \"/opt/ansible/lib/python3.6/site-packages/urllib3/connection.py\", l ine 421, in connect\n tls_in_tls=tls_in_tls,\n File \"/opt/ansible/lib/python3.6/site-packages/urllib3/util/ssl_.py\", line 450, in ssl_wrap_socket\n sock, context, tls_in_tls, server_hostname=server_hostname\n File \"/opt/ansible/lib/python3.6/site-packages /urllib3/util/ssl_.py\", line 493, in _ssl_wrap_socket_impl\n return ssl_context.wrap_socket(sock, server_hostname=server_hostname )\n File \"/usr/lib64/python3.6/ssl.py\", line 365, in wrap_socket\n _context=self, _session=session)\n File \"/usr/lib64/python 3.6/ssl.py\", line 776, in __init__\n self.do_handshake()\n File \"/usr/lib64/python3.6/ssl.py\", line 1036, in do_handshake\n self._sslobj.do_handshake()\n File \"/usr/lib64/python3.6/ssl.py\", line 648, in do_handshake\n self._sslobj.do_handshake()\nssl .*SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed* Some help please Regards. Le dim. 9 oct. 2022 ? 16:19, wodel youchi a ?crit : > Hi, > > My SSL certificate has expired, and now I cannot authenticate into horizon > and I have these errors : > *WARNING keystoneauth.identity.generic.base [-] Failed to discover > available identity versions when contacting > https://dashint.cloud.exemple.com:35357 > . Attempting to parse version from > URL.: keystoneauth1.exceptions.connection.SSLError: SSL exception > connecting to https:// dashint.cloud.exemple.com > :35357: HTTPSConnectionPool(host=' > dashint.cloud.exemple.com ', port=35357): > Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: > CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))* > > In my globals.yml I have this parameter : > kolla_verify_tls_backend: "no" > > 1 - How do I disable SSL verification for now? > 2 - How to install a new SSL certificate? > > > > Regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Mon Oct 10 11:45:37 2022 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 10 Oct 2022 13:45:37 +0200 Subject: Failed to create user port error Message-ID: (Resending it without the attached picture) Hi, It seems more a Neutron exception, and my 1st thought is to check your subnet and add it to your router, you can do it manually with openstack CLI: openstack router add subnet 98a65c6e-2473-48f1-a578-702a08649c73 Not sure if it is possible if you use trove as I never used it. 
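To spell the command out fully, something like this should work (the router name is a placeholder; the subnet ID is the one from your error message):

```shell
# Find the router that serves the tenant network, then attach the subnet to it.
openstack router list
openstack router add subnet <your-router> 98a65c6e-2473-48f1-a578-702a08649c73

# Verify: the subnet should now appear on one of the router's ports.
openstack port list --router <your-router>
```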
Best wishes Lajos Katona (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.kavanagh at canonical.com Mon Oct 10 12:06:14 2022 From: alex.kavanagh at canonical.com (Alex Kavanagh) Date: Mon, 10 Oct 2022 13:06:14 +0100 Subject: [charms] Team Delegation proposal In-Reply-To: References: Message-ID: Hi Peter On Tue, 4 Oct 2022 at 16:14, Peter Matulis wrote: > What is the status of this proposal? > So the charm-ceph-core to the ACL groups to provide a more focussed ACL for contributors to the ceph charms. At the moment, the other's haven't been added yet as it's not fully clear how they may be helpful. However, always willing to add them if they are. Cheers Alex. > On Wed, Aug 31, 2022 at 3:53 PM Peter Matulis > wrote: > >> >> >> On Mon, Aug 8, 2022 at 4:25 PM Alex Kavanagh >> wrote: >> >>> Hi Chris >>> >>> On Thu, 28 Jul 2022 at 21:46, Chris MacNaughton < >>> chris.macnaughton at canonical.com> wrote: >>> >>>> Hello All, >>>> >>>> >>>> I would like to propose some new ACLs in Gerrit for the >>>> openstack-charms project: >>>> >>>> - openstack-core-charms >>>> - ceph-charms >>>> - network-charms >>>> - stable-maintenance >>>> >>>> >>> >>> I think the names need to be tweaked slightly: >>> >>> - charms-openstack >>> - charms-ceph >>> - charms-ovn >>> - charms-maintenance >>> >> >> We would also need an ACL for the documentation: >> >> - charms-docs >> > -- Alex Kavanagh - Software Engineer OpenStack Engineering - Data Centre Development - Canonical Ltd -------------- next part -------------- An HTML attachment was scrubbed... URL: From lanchengxu0807 at gmail.com Mon Oct 10 15:04:26 2022 From: lanchengxu0807 at gmail.com (olive tree) Date: Mon, 10 Oct 2022 23:04:26 +0800 Subject: [outreachy][cinder]questions about project"create API reference request/response samples" Message-ID: Hi Cinder Team: I'm an applicant for Outreachy internship program. I've set up a Gerrit account successfully and deployed Devstack in a virtual environment. However, there are a few questions I would like to ask: 1. I failed to find enriquetaso in #openstack-cinder OFTC IRC channel, the list is as follows, I suppose maybe the reason is some steps were wrong or she changed into another username. [image: image.png] 2.when I created a local.conf and statrted the install under instructions in https://docs.openstack.org/devstack/latest/, I got this error: devstack/stackrc:834 Could not determine host ip address. See local.conf for suggestions on setting HOST_IP. i searched it in Google but can't find effective solution. Thank you for reading this email! I hope my questions will not bother you too much. I really appreciate it if you could answer them. Best regards, Chelsy Lan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 181641 bytes Desc: not available URL: From fungi at yuggoth.org Mon Oct 10 16:10:00 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 10 Oct 2022 16:10:00 +0000 Subject: Kroxvx rsvxfzwql In-Reply-To: References: Message-ID: <20221010160959.6mcblrnzwcw6w4xs@yuggoth.org> The previous message in this thread was spoofed to appear to originate from one of the list subscribers in order to get around our moderation mechanism. 
Normally the attachments on these sorts of messages are too large to make it through without being held for moderator review, but this one was just small enough to make it onto the list. I've scrubbed the attachment from the archives for safety and set up the listserv to strip "zip" attachments from future posts (since it now seems to be a popular vector for distributing Windows malware by E-mail), but if anyone received the original I strongly recommend not opening it. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jean-francois.taltavull at elca.ch Mon Oct 10 15:35:38 2022 From: jean-francois.taltavull at elca.ch (=?utf-8?B?VGFsdGF2dWxsIEplYW4tRnJhbsOnb2lz?=) Date: Mon, 10 Oct 2022 15:35:38 +0000 Subject: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number In-Reply-To: <104fac9ebe84471eb338e80d995b97fd@elca.ch> References: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> <1b17c23f8982480db73cf50d04d51af7@elca.ch> <86f048d7931c4cc482f6785437c9b5ea@elca.ch> <671023b5ab3846dfb3a39ef313018eac@elca.ch> <33f69d386462450b9964b2ed78284d57@elca.ch> <1d1c1c3cc6184b529819bb8f3598813f@elca.ch> <3516ab2892694a17a76b56ccacc463f1@elca.ch> <104fac9ebe84471eb338e80d995b97fd@elca.ch> Message-ID: <0a7778f2097d4e2c8cb3861141e0a09a@elca.ch> Hi Rafa?l, I finally found the cause and it was on my side. I fixed the setup (ceilometer, radosgw pollsters and haproxy) and keystone auth now works fine. I use the Rados GW ?rgw_admin_entry? variable, in particular. Thanks a lot for helping and for the time you spent on this issue. JF From: Taltavull Jean-Fran?ois Sent: mardi, 4 octobre 2022 14:33 To: 'Rafael Weing?rtner' Cc: 'openstack-discuss' Subject: RE: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number Hello Rapha?l, I restored the RGW keystone authentication and did some more tests. The problem is that the S3 request signature provided by ceilometer and the one computed by keystone mismatch. OpenStack release is Wallaby. keystone/api/s3tokens.py: ```` class S3Resource(EC2_S3_Resource.ResourceBase): @staticmethod def _check_signature(creds_ref, credentials): string_to_sign = base64.urlsafe_b64decode(str(credentials['token'])) if string_to_sign[0:4] != b'AWS4': signature = _calculate_signature_v1(string_to_sign, creds_ref['secret']) else: signature = _calculate_signature_v4(string_to_sign, creds_ref['secret']) if not utils.auth_str_equal(credentials['signature'], signature): raise exception.Unauthorized( <<<------------------------------------------we fall there message=_('Credential signature mismatch')) ```` From: Taltavull Jean-Fran?ois Sent: vendredi, 30 septembre 2022 14:48 To: 'Rafael Weing?rtner' > Cc: openstack-discuss > Subject: RE: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number ``` $ sudo /usr/bin/radosgw --version ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable) ``` From: Rafael Weing?rtner > Sent: vendredi, 30 septembre 2022 12:37 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. 
No, I just showed you the code, so you can see how the authentication is being executed, and where/how the parameters are set in the headers. It is a bit odd, I have used this so many times, and it always works. What is your RGW instance version? On Fri, Sep 30, 2022 at 4:09 AM Taltavull Jean-Fran?ois > wrote: Do you mean the issue comes from how the `awsauth` module handles the signature ? From: Rafael Weing?rtner > Sent: jeudi, 29 septembre 2022 17:23 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. This is the signature used by the `awsauth` library: ``` def get_signature(self, r): canonical_string = self.get_canonical_string( r.url, r.headers, r.method) if py3k: key = self.secret_key.encode('utf-8') msg = canonical_string.encode('utf-8') else: key = self.secret_key msg = canonical_string h = hmac.new(key, msg, digestmod=sha) return encodestring(h.digest()).strip() ``` After that is generated, it is added in the headers: # Create date header if it is not created yet. if 'date' not in r.headers and 'x-amz-date' not in r.headers: r.headers['date'] = formatdate( timeval=None, localtime=False, usegmt=True) signature = self.get_signature(r) if py3k: signature = signature.decode('utf-8') r.headers['Authorization'] = 'AWS %s:%s' % (self.access_key, signature) On Thu, Sep 29, 2022 at 9:15 AM Taltavull Jean-Fran?ois > wrote: ``` $ python test_creds.py Executing test on: [FQDN/object-store/]. Rados GW admin context [/admin] and path [/usage?stats=True] used. Rados GW request URL [http://FQDN/object-store/admin/bucket?stats=True]. Rados GW host: FQDN Traceback (most recent call last): File "test_creds.py", line 45, in raise RGWAdminAPIFailed( __main__.RGWAdminAPIFailed: RGW AdminOps API returned 403 Forbidden ``` So the same as with ceilometer. Auth is done by RGW, not by keystone, and the ceph ?admin? user exists and owns the right privileges: ``` $ sudo radosgw-admin user info --uid admin [22/296]{ "user_id": "admin", "display_name": "admin user", "email": "", "suspended": 0, "max_buckets": 1000, "subusers": [], "keys": [ { "user": "admin", "access_key": ?admin_access_key", "secret_key": "admin_secret_key" } ], "swift_keys": [], "caps": [ { "type": "buckets", "perm": "*" }, { "type": "metadata", "perm": "*" }, { "type": "usage", "perm": "*" }, { "type": "users", "perm": "*" } ], ``` From: Rafael Weing?rtner > Sent: jeudi, 29 septembre 2022 12:32 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Can you test you credentials with the following code? ``` import json import requests import os import six.moves.urllib.parse as urlparse class RGWAdminAPIFailed(Exception): pass if __name__ == '__main__': rados_gw_base_url = "put your RGW URL here. E.g. http://server.com:port/something" print("Executing test on: [%s]." % rados_gw_base_url) rados_gw_admin_context = "/admin" rados_gw_path = "/usage?stats=True" print("Rados GW admin context [%s] and path [%s] used." % (rados_gw_admin_context, rados_gw_path)) rados_gw_request_url = urlparse.urljoin(rados_gw_base_url, '/admin') + '/bucket?stats=True' print("Rados GW request URL [%s]." 
% rados_gw_request_url) rados_gw_access_key_to_use = "put your access key here" rados_gw_secret_key_to_use = "put your secret key here" rados_gw_host_name = urlparse.urlparse(rados_gw_request_url).netloc print("Rados GW host: %s" % rados_gw_host_name) module_name = "awsauth" class_name = "S3Auth" arguments = [rados_gw_access_key_to_use, rados_gw_secret_key_to_use, rados_gw_host_name] module = __import__(module_name) class_ = getattr(module, class_name) instance = class_(*arguments) r = requests.get( rados_gw_request_url, auth=instance, timeout=30) #auth=awsauth.S3Auth(*arguments)) if r.status_code != 200: raise RGWAdminAPIFailed( ('RGW AdminOps API returned %(status)s %(reason)s') % {'status': r.status_code, 'reason': r.reason}) response_body = r.text parsed_json = json.loads(response_body) print("Response cookies: [%s]." % r.cookies) radosGw_output_file = "/home//Downloads/radosGw-usage.json" if os.path.exists(radosGw_output_file): os.remove(radosGw_output_file) with open(radosGw_output_file, "w") as file1: file1.writelines(json.dumps(parsed_json, indent=4, sort_keys=True)) file1.flush() exit(0) ``` On Thu, Sep 29, 2022 at 4:09 AM Taltavull Jean-Fran?ois > wrote: python Python 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import awsauth >>> awsauth >>> From: Rafael Weing?rtner > Sent: mercredi, 28 septembre 2022 18:40 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Can you also execute the following: ``` python import awsauth awsauth ``` That will output a path, and then you can `cat `, example: `cat /var/lib/kolla/venv/lib/python3.8/site-packages/awsauth.py` On Wed, Sep 28, 2022 at 1:21 PM Taltavull Jean-Fran?ois > wrote: I removed trailing ?/object-store/? from the last value of authentication_parameters I also: - disabled s3 keystone auth in RGW - created a RGW ?admin? user with the right privileges to allow admin API calls - put RGW in debug mode And here is what I get in RGW logs: get_usage string_to_sign=GET Wed, 28 Sep 2022 16:15:45 GMT /admin/usage get_usage server signature=BlaBlaBlaBla get_usage client signature=BloBloBlo get_usage compare=-75 get_usage rgw::auth::s3::LocalEngine denied with reason=-2027 get_usage rgw::auth::s3::AWSAuthStrategy denied with reason=-2027 get_usage rgw::auth::StrategyRegistry::s3_main_strategy_t: trying rgw::auth::s3::AWSAuthStrategy get_usage rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::LocalEngine From: Rafael Weing?rtner > Sent: mercredi, 28 septembre 2022 13:15 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. I think that the last parameter "/object-store/", should be only "". Can you test it? You are using EC2 credentials to authenticate in RGW. Did you enable the Keystone integration in RGW? Also, as far as I know, this admin endpoint needs a RGW admin. I am not sure if the Keystone and RGW integration would enable/make it possible for someone to authenticate as an admin in RGW. Can you check it? To see if you can call that endpoint with these credentials. 
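Side note: before digging further into Ceilometer, it may be worth ruling out the RGW side itself. A rough sketch, assuming the uid is "admin":

```shell
# Confirm the account has the AdminOps caps (usage/users/buckets/metadata);
# missing ones can be added with "radosgw-admin caps add".
radosgw-admin user info --uid=admin

# Usage data is only collected when rgw_enable_usage_log is true in the RGW config.
radosgw-admin usage show --show-log-entries=false
```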
On Wed, Sep 28, 2022 at 6:01 AM Taltavull Jean-Fran?ois > wrote: Pollster YML configuration : --- - name: "dynamic.radosgw.usage" sample_type: "gauge" unit: "B" value_attribute: "total.size" url_path: http:///object-store/admin/usage module: "awsauth" authentication_object: "S3Auth" authentication_parameters: ,,/object-store/ user_id_attribute: "user" project_id_attribute: "user" resource_id_attribute: "user" response_entries_key: "summary" ACCESS_KEY and SECRET_KEY have been created with ?openstack ec2 credentials create?. Ceilometer central is deployed with OSA and it uses awsauth.py module. From: Rafael Weing?rtner > Sent: mercredi, 28 septembre 2022 02:01 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Can you show your YML configuration? Also, did you install the AWS authentication module in the container/host where Ceilometer central is running? On Mon, Sep 26, 2022 at 12:58 PM Taltavull Jean-Fran?ois > wrote: Hello Rafael, Thanks for the information about ceilometer patches but for now I?m testing with the credentials in the dynamic pollster config file. I will use barbican when I push all this to production. The keystone authentication performed by the rados gw with the credentials provided by ceilometer still does not work. I wonder if this could be a S3 signature version issue on ceilometer side, that is on S3 client side. This kind of issue exists with the s3 client ?s3cmd? and you have to add ??signature-v2? so that ?s3cmd? works well. What do you think ? Do you know which version of S3 signature ceilometer uses while authenticating ? From: Rafael Weing?rtner > Sent: mercredi, 7 septembre 2022 19:23 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Jean, there are two problems with the Ceilometer. I just opened the patches to resolve it: - https://review.opendev.org/c/openstack/ceilometer/+/856305 - https://review.opendev.org/c/openstack/ceilometer/+/856304 Without these patches, you might have problems to use Ceilometer with Non-OpenStack dynamic pollsters and barbican credentials. On Wed, Aug 31, 2022 at 3:55 PM Rafael Weing?rtner > wrote: It is the RGW user that you have. This user must have the role that is needed to access the usage feature in RGW. If I am not mistaken, it required an admin user. On Wed, Aug 31, 2022 at 1:54 PM Taltavull Jean-Fran?ois > wrote: Thanks to your help, I am close to the goal. Dynamic pollster is loaded and triggered. But I get a ?Status[403] and reason [Forbidden]? in ceilometer logs while requesting admin/usage. I?m not sure to understand well the auth mechanism. Are we talking about keystone credentials, ec2 credentials, Rados GW user ?... For now, in testing phase, I use ?authentication_parameters?, not barbican. -JF From: Rafael Weing?rtner > Sent: mardi, 30 ao?t 2022 14:17 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Yes, you will need to enable the metric/pollster to be processed. That is done via "polling.yml" file. 
Also, do not forget that you will need to configure Ceilometer to push this new metric. If you use Gnocchi as the backend, you will need to change/update the gnocchi resource YML file. That file maps resources and metrics in the Gnocchi backend. The configuration resides in Ceilometer. You can create/define new resource types and map them to specific metrics. It depends on how you structure your solution. P.S. You do not need to use "authentication_parameters". You can use the barbican integration to avoid setting your credentials in a file. On Tue, Aug 30, 2022 at 9:11 AM Taltavull Jean-Fran?ois > wrote: Hello, I tried to define a Rados GW dynamic pollster and I can see, in Ceilometer logs, that it?s actually loaded. But it looks like it was not triggered, I see no trace of ceilometer connection in Rados GW logs. My definition: - name: "dynamic.radosgw.usage" sample_type: "gauge" unit: "B" value_attribute: "total.size" url_path: http:///object-store/swift/v1/admin/usage module: "awsauth" authentication_object: "S3Auth" authentication_parameters: xxxxxxxxxxxxx,yyyyyyyyyyyyy, user_id_attribute: "admin" project_id_attribute: "admin" resource_id_attribute: "admin" response_entries_key: "summary" Do I have to set an option in ceilometer.conf, or elsewhere, to get my Rados GW dynamic pollster triggered ? -JF From: Taltavull Jean-Fran?ois Sent: lundi, 29 ao?t 2022 18:41 To: 'Rafael Weing?rtner' > Cc: openstack-discuss > Subject: RE: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number Thanks a lot for your quick answer, Rafael ! I will explore this approach. Jean-Francois From: Rafael Weing?rtner > Sent: lundi, 29 ao?t 2022 17:54 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. You could use a different approach. You can use Dynamic pollster [1], and create your own mechanism to collect data, without needing to change Ceilometer code. Basically all hard-coded pollsters can be converted to a dynamic pollster that is defined in YML. [1] https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html#the-dynamic-pollsters-system-configuration-for-non-openstack-apis On Mon, Aug 29, 2022 at 12:51 PM Taltavull Jean-Fran?ois > wrote: Hi All, In our OpenStack deployment, API endpoints are defined by using URLs instead of port numbers and HAProxy forwards requests to the right bakend after having ACLed the URL. In the case of our object-store service, based on RadosGW, the internal API endpoint is "https:///object-store/swift/v1/AUTH_" When Ceilometer RadosGW pollster tries to connect to the RadosGW admin API with the object-store internal endpoint, the URL becomes https:///admin, as shown by HAProxy logs. This URL does not match any API endpoint from HAProxy point of view. The line of code that rewrites the URL is this one: https://opendev.org/openstack/ceilometer/src/branch/stable/wallaby/ceilometer/objectstore/rgw.py#L81 What would you think of adding a mechanism based on new Ceilometer configuration option(s) to control the URL rewriting ? 
Our deployment characteristics: - OpenStack release: Wallaby - Ceph and RadosGW version: 15.2.16 - deployment tool: OSA 23.2.1 and ceph-ansible Best regards, Jean-Francois -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Mon Oct 10 15:59:39 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Mon, 10 Oct 2022 12:59:39 -0300 Subject: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number In-Reply-To: <0a7778f2097d4e2c8cb3861141e0a09a@elca.ch> References: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> <1b17c23f8982480db73cf50d04d51af7@elca.ch> <86f048d7931c4cc482f6785437c9b5ea@elca.ch> <671023b5ab3846dfb3a39ef313018eac@elca.ch> <33f69d386462450b9964b2ed78284d57@elca.ch> <1d1c1c3cc6184b529819bb8f3598813f@elca.ch> <3516ab2892694a17a76b56ccacc463f1@elca.ch> <104fac9ebe84471eb338e80d995b97fd@elca.ch> <0a7778f2097d4e2c8cb3861141e0a09a@elca.ch> Message-ID: Glad to hear it! If you need something else, just let me know. On Mon, Oct 10, 2022 at 12:35 PM Taltavull Jean-Fran?ois < jean-francois.taltavull at elca.ch> wrote: > Hi Rafa?l, > > I finally found the cause and it was on my side. I fixed the setup > (ceilometer, radosgw pollsters and haproxy) and keystone auth now works > fine. > > > > I use the Rados GW ?rgw_admin_entry? variable, in particular. > > > > Thanks a lot for helping and for the time you spent on this issue. > > > > JF > > > > *From:* Taltavull Jean-Fran?ois > *Sent:* mardi, 4 octobre 2022 14:33 > *To:* 'Rafael Weing?rtner' > *Cc:* 'openstack-discuss' > *Subject:* RE: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > Hello Rapha?l, > > I restored the RGW keystone authentication and did some more tests. The > problem is that the S3 request signature provided by ceilometer and the one > computed by keystone mismatch. > > > > OpenStack release is Wallaby. 
> > > > keystone/api/s3tokens.py: > > ```` > > class S3Resource(EC2_S3_Resource.ResourceBase): > > @staticmethod > > def _check_signature(creds_ref, credentials): > > string_to_sign = > base64.urlsafe_b64decode(str(credentials['token'])) > > > > if string_to_sign[0:4] != b'AWS4': > > signature = _calculate_signature_v1(string_to_sign, > > creds_ref['secret']) > > else: > > signature = _calculate_signature_v4(string_to_sign, > > creds_ref['secret']) > > if not utils.auth_str_equal(credentials['signature'], signature): > > raise exception.Unauthorized( > <<<------------------------------------------we fall > there > > > message=_('Credential signature mismatch')) > ```` > > > > *From:* Taltavull Jean-Fran?ois > *Sent:* vendredi, 30 septembre 2022 14:48 > *To:* 'Rafael Weing?rtner' > *Cc:* openstack-discuss > *Subject:* RE: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > ``` > > $ sudo /usr/bin/radosgw --version > > ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus > (stable) > > ``` > > > > *From:* Rafael Weing?rtner > *Sent:* vendredi, 30 septembre 2022 12:37 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > No, I just showed you the code, so you can see how the authentication is > being executed, and where/how the parameters are set in the headers. It is > a bit odd, I have used this so many times, and it always works. What is > your RGW instance version? > > > > On Fri, Sep 30, 2022 at 4:09 AM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > Do you mean the issue comes from how the `awsauth` module handles the > signature ? > > > > *From:* Rafael Weing?rtner > *Sent:* jeudi, 29 septembre 2022 17:23 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > This is the signature used by the `awsauth` library: > ``` > > def get_signature(self, r): > canonical_string = self.get_canonical_string( > r.url, r.headers, r.method) > if py3k: > key = self.secret_key.encode('utf-8') > msg = canonical_string.encode('utf-8') > else: > key = self.secret_key > msg = canonical_string > h = hmac.new(key, msg, digestmod=sha) > return encodestring(h.digest()).strip() > > > > ``` > > > > After that is generated, it is added in the headers: > > # Create date header if it is not created yet. > if 'date' not in r.headers and 'x-amz-date' not in r.headers: > r.headers['date'] = formatdate( > timeval=None, > localtime=False, > usegmt=True) > signature = self.get_signature(r) > if py3k: > signature = signature.decode('utf-8') > r.headers['Authorization'] = 'AWS %s:%s' % (self.access_key, signature) > > > > On Thu, Sep 29, 2022 at 9:15 AM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > ``` > > $ python test_creds.py > > Executing test on: [FQDN/object-store/]. > > Rados GW admin context [/admin] and path [/usage?stats=True] used. > > Rados GW request URL [http://FQDN/object-store/admin/bucket?stats=True]. 
> > Rados GW host: FQDN > > Traceback (most recent call last): > > File "test_creds.py", line 45, in > > raise RGWAdminAPIFailed( > > __main__.RGWAdminAPIFailed: RGW AdminOps API returned 403 Forbidden > > ``` > > > > So the same as with ceilometer. Auth is done by RGW, not by keystone, and > the ceph ?admin? user exists and owns the right privileges: > > ``` > > $ sudo radosgw-admin user info --uid > admin > [22/296]{ > > "user_id": "admin", > > "display_name": "admin user", > > "email": "", > > "suspended": 0, > > "max_buckets": 1000, > > "subusers": [], > > "keys": [ > > { > > "user": "admin", > > "access_key": ?admin_access_key", > > "secret_key": "admin_secret_key" > > } > > ], > > "swift_keys": [], > > "caps": [ > > { > > "type": "buckets", > > "perm": "*" > > }, > > { > > "type": "metadata", > > "perm": "*" > > }, > > > { > "type": > "usage", > "perm": > "*" > }, > { > > "type": "users", > "perm": > "*" > } > ], > > > > > ``` > > > > > > *From:* Rafael Weing?rtner > *Sent:* jeudi, 29 septembre 2022 12:32 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > Can you test you credentials with the following code? > > ``` > > import json > import requests > import os > > import six.moves.urllib.parse as urlparse > > > class RGWAdminAPIFailed(Exception): > pass > > > if __name__ == '__main__': > > rados_gw_base_url = "put your RGW URL here. E.g. > http://server.com:port/something" > print("Executing test on: [%s]." % rados_gw_base_url) > > rados_gw_admin_context = "/admin" > > rados_gw_path = "/usage?stats=True" > > print("Rados GW admin context [%s] and path [%s] used." % > (rados_gw_admin_context, rados_gw_path)) > > rados_gw_request_url = urlparse.urljoin(rados_gw_base_url, '/admin') + > '/bucket?stats=True' > print("Rados GW request URL [%s]." % rados_gw_request_url) > > rados_gw_access_key_to_use = "put your access key here" > rados_gw_secret_key_to_use = "put your secret key here" > > rados_gw_host_name = urlparse.urlparse(rados_gw_request_url).netloc > print("Rados GW host: %s" % rados_gw_host_name) > module_name = "awsauth" > class_name = "S3Auth" > arguments = [rados_gw_access_key_to_use, rados_gw_secret_key_to_use, > rados_gw_host_name] > module = __import__(module_name) > class_ = getattr(module, class_name) > instance = class_(*arguments) > > r = requests.get( > rados_gw_request_url, > auth=instance, timeout=30) > #auth=awsauth.S3Auth(*arguments)) > > > if r.status_code != 200: > raise RGWAdminAPIFailed( > ('RGW AdminOps API returned %(status)s %(reason)s') % > {'status': r.status_code, 'reason': r.reason}) > > response_body = r.text > parsed_json = json.loads(response_body) > > print("Response cookies: [%s]." % r.cookies) > > radosGw_output_file = "/home//Downloads/radosGw-usage.json" > > if os.path.exists(radosGw_output_file): > os.remove(radosGw_output_file) > > with open(radosGw_output_file, "w") as file1: > file1.writelines(json.dumps(parsed_json, indent=4, sort_keys=True)) > file1.flush() > > exit(0) > > ``` > > > > On Thu, Sep 29, 2022 at 4:09 AM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > python > > Python 3.8.10 (default, Sep 28 2021, 16:10:42) > > [GCC 9.3.0] on linux > > Type "help", "copyright", "credits" or "license" for more information. 
> > >>> import awsauth > > >>> awsauth > > '/openstack/venvs/ceilometer-23.2.0/lib/python3.8/site-packages/awsauth.py'> > > >>> > > > > *From:* Rafael Weing?rtner > *Sent:* mercredi, 28 septembre 2022 18:40 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > Can you also execute the following: > > ``` > > python > > > > import awsauth > > > > awsauth > > ``` > > That will output a path, and then you can `cat `, example: `cat > /var/lib/kolla/venv/lib/python3.8/site-packages/awsauth.py` > > > > On Wed, Sep 28, 2022 at 1:21 PM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > I removed trailing ?/object-store/? from the last value of > authentication_parameters > > > > I also: > > - disabled s3 keystone auth in RGW > > - created a RGW ?admin? user with the right privileges to allow admin API > calls > > - put RGW in debug mode > > > > And here is what I get in RGW logs: > > > > get_usage > string_to_sign=GET > Wed, > 28 Sep 2022 16:15:45 > GMT > /admin/usage > > get_usage server signature=BlaBlaBlaBla > > get_usage client signature=BloBloBlo > > get_usage compare=-75 > > get_usage rgw::auth::s3::LocalEngine denied with reason=-2027 > > get_usage rgw::auth::s3::AWSAuthStrategy denied with reason=-2027 > > get_usage rgw::auth::StrategyRegistry::s3_main_strategy_t: trying > rgw::auth::s3::AWSAuthStrategy > > get_usage rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::LocalEngine > > > > *From:* Rafael Weing?rtner > *Sent:* mercredi, 28 septembre 2022 13:15 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > I think that the last parameter "/object-store/", should be only " > ". Can you test it? > > > > > > You are using EC2 credentials to authenticate in RGW. Did you enable the > Keystone integration in RGW? > > Also, as far as I know, this admin endpoint needs a RGW admin. I am not > sure if the Keystone and RGW integration would enable/make it possible for > someone to authenticate as an admin in RGW. Can you check it? To see if you > can call that endpoint with these credentials. > > > > On Wed, Sep 28, 2022 at 6:01 AM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > Pollster YML configuration : > > > > --- > > - name: "dynamic.radosgw.usage" > > sample_type: "gauge" > > unit: "B" > > value_attribute: "total.size" > > url_path: http:///object-store/admin/usage > > module: "awsauth" > > authentication_object: "S3Auth" > > authentication_parameters: ,,/object-store/ > > user_id_attribute: "user" > > project_id_attribute: "user" > > resource_id_attribute: "user" > > response_entries_key: "summary" > > > > ACCESS_KEY and SECRET_KEY have been created with ?openstack ec2 > credentials create?. > > > > Ceilometer central is deployed with OSA and it uses awsauth.py module. > > > > > > *From:* Rafael Weing?rtner > *Sent:* mercredi, 28 septembre 2022 02:01 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. 
> > Can you show your YML configuration? Also, did you install the AWS > authentication module in the container/host where Ceilometer central is > running? > > > > On Mon, Sep 26, 2022 at 12:58 PM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > Hello Rafael, > > > > Thanks for the information about ceilometer patches but for now I?m > testing with the credentials in the dynamic pollster config file. I will > use barbican when I push all this to production. > > > > The keystone authentication performed by the rados gw with the credentials > provided by ceilometer still does not work. I wonder if this could be a S3 > signature version issue on ceilometer side, that is on S3 client side. This > kind of issue exists with the s3 client ?s3cmd? and you have to add > ??signature-v2? so that ?s3cmd? works well. > > > > What do you think ? Do you know which version of S3 signature ceilometer > uses while authenticating ? > > > > *From:* Rafael Weing?rtner > *Sent:* mercredi, 7 septembre 2022 19:23 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > Jean, there are two problems with the Ceilometer. I just opened the > patches to resolve it: > - https://review.opendev.org/c/openstack/ceilometer/+/856305 > > - https://review.opendev.org/c/openstack/ceilometer/+/856304 > > > > Without these patches, you might have problems to use Ceilometer with > Non-OpenStack dynamic pollsters and barbican credentials. > > > > On Wed, Aug 31, 2022 at 3:55 PM Rafael Weing?rtner < > rafaelweingartner at gmail.com> wrote: > > It is the RGW user that you have. This user must have the role that is > needed to access the usage feature in RGW. If I am not mistaken, it > required an admin user. > > > > On Wed, Aug 31, 2022 at 1:54 PM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > Thanks to your help, I am close to the goal. Dynamic pollster is loaded > and triggered. > > > > But I get a ?Status[403] and reason [Forbidden]? in ceilometer logs while > requesting admin/usage. > > > > I?m not sure to understand well the auth mechanism. Are we talking about > keystone credentials, ec2 credentials, Rados GW user ?... > > > > For now, in testing phase, I use ?authentication_parameters?, not barbican. > > > > -JF > > > > *From:* Rafael Weing?rtner > *Sent:* mardi, 30 ao?t 2022 14:17 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > Yes, you will need to enable the metric/pollster to be processed. That is > done via "polling.yml" file. Also, do not forget that you will need to > configure Ceilometer to push this new metric. If you use Gnocchi as the > backend, you will need to change/update the gnocchi resource YML file. That > file maps resources and metrics in the Gnocchi backend. The configuration > resides in Ceilometer. You can create/define new resource types and map > them to specific metrics. It depends on how you structure your solution. > > P.S. You do not need to use "authentication_parameters". You can use the > barbican integration to avoid setting your credentials in a file. 
> > > > On Tue, Aug 30, 2022 at 9:11 AM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > Hello, > > > > I tried to define a Rados GW dynamic pollster and I can see, in Ceilometer > logs, that it?s actually loaded. But it looks like it was not triggered, I > see no trace of ceilometer connection in Rados GW logs. > > > > My definition: > > > > - name: "dynamic.radosgw.usage" > > sample_type: "gauge" > > unit: "B" > > value_attribute: "total.size" > > url_path: http:///object-store/swift/v1/admin/usage > > module: "awsauth" > > authentication_object: "S3Auth" > > authentication_parameters: xxxxxxxxxxxxx,yyyyyyyyyyyyy, > > user_id_attribute: "admin" > > project_id_attribute: "admin" > > resource_id_attribute: "admin" > > response_entries_key: "summary" > > > > Do I have to set an option in ceilometer.conf, or elsewhere, to get my > Rados GW dynamic pollster triggered ? > > > > -JF > > > > *From:* Taltavull Jean-Fran?ois > *Sent:* lundi, 29 ao?t 2022 18:41 > *To:* 'Rafael Weing?rtner' > *Cc:* openstack-discuss > *Subject:* RE: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > Thanks a lot for your quick answer, Rafael ! > > I will explore this approach. > > > > Jean-Francois > > > > *From:* Rafael Weing?rtner > *Sent:* lundi, 29 ao?t 2022 17:54 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > You could use a different approach. You can use Dynamic pollster [1], and > create your own mechanism to collect data, without needing to change > Ceilometer code. Basically all hard-coded pollsters can be converted to a > dynamic pollster that is defined in YML. > > > > [1] > https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html#the-dynamic-pollsters-system-configuration-for-non-openstack-apis > > > > > > On Mon, Aug 29, 2022 at 12:51 PM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > Hi All, > > In our OpenStack deployment, API endpoints are defined by using URLs > instead of port numbers and HAProxy forwards requests to the right bakend > after having ACLed the URL. > > In the case of our object-store service, based on RadosGW, the internal > API endpoint is "https:///object-store/swift/v1/AUTH_" > > When Ceilometer RadosGW pollster tries to connect to the RadosGW admin API > with the object-store internal endpoint, the URL becomes > https:///admin, as shown by HAProxy logs. This URL does not match > any API endpoint from HAProxy point of view. The line of code that rewrites > the URL is this one: > https://opendev.org/openstack/ceilometer/src/branch/stable/wallaby/ceilometer/objectstore/rgw.py#L81 > > What would you think of adding a mechanism based on new Ceilometer > configuration option(s) to control the URL rewriting ? 
> > Our deployment characteristics: > - OpenStack release: Wallaby > - Ceph and RadosGW version: 15.2.16 > - deployment tool: OSA 23.2.1 and ceph-ansible > > > Best regards, > Jean-Francois > > > > -- > > Rafael Weing?rtner > > > > -- > > Rafael Weing?rtner > > > > -- > > Rafael Weing?rtner > > > > -- > > Rafael Weing?rtner > > > > -- > > Rafael Weing?rtner > > > > -- > > Rafael Weing?rtner > > > > -- > > Rafael Weing?rtner > > > > -- > > Rafael Weing?rtner > > > > -- > > Rafael Weing?rtner > > > > -- > > Rafael Weing?rtner > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniella.kalimumbalopgs at stu.cu.edu.ng Mon Oct 10 16:34:25 2022 From: daniella.kalimumbalopgs at stu.cu.edu.ng (Daniella kalimumbalo) Date: Mon, 10 Oct 2022 17:34:25 +0100 Subject: Search for openstack cloud billing dataset Message-ID: Openstack can monitor and bill the cloud users according to their resources consumption. I'm looking for a dataset for resources usage in an openstack cloud infrastructure, that contains the resources consumption metrics and bill of each end-user. That dataset will allow me to work on price prediction of cloud ressources using machine/ Deep learning.Please is it possible to get it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Mon Oct 10 17:19:05 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 10 Oct 2022 17:19:05 +0000 Subject: [cloud-research-sig][ops] Search for openstack cloud billing dataset In-Reply-To: References: Message-ID: <20221010171905.ap3jesjmkk5ogyxm@yuggoth.org> [I'm keeping you in Cc since you don't appear to have subscribed to openstack-discuss, but please still reply to the list address.] On 2022-10-10 17:34:25 +0100 (+0100), Daniella kalimumbalo wrote: > Openstack can monitor and bill the cloud users according to their > resources consumption. I'm looking for a dataset for resources > usage in an openstack cloud infrastructure, that contains the > resources consumption metrics and bill of each end-user. That > dataset will allow me to work on price prediction of cloud > ressources using machine/ Deep learning.Please is it possible to > get it? You might consider reaching out to the Cloud Research SIG chair[*], since I expect this is in their SIG's area of interest. The awesome folks with the MOC Alliance[**] may also have data available to researchers (or be affiliated with organizations who do), based on some of the prior discussions I've been in, so it could be worthwhile to get in touch with them as well. [*] https://governance.openstack.org/sigs/ [**] https://massopen.cloud/ -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gmann at ghanshyammann.com Mon Oct 10 20:20:14 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 10 Oct 2022 13:20:14 -0700 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Oct 13 at 1500 UTC Message-ID: <183c38e1c91.112e685a2494298.7607084177217213450@ghanshyammann.com> Hello Everyone, The technical Committee's next weekly meeting is scheduled for 2022 Oct 13, at 1500 UTC. If you would like to add topics for discussion, please add them to the below wiki page by Wednesday, Oct 12 at 2100 UTC. 
https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting -gmann From dwilde at redhat.com Mon Oct 10 20:27:00 2022 From: dwilde at redhat.com (Dave Wilde) Date: Mon, 10 Oct 2022 15:27:00 -0500 Subject: [keystone][PTG] Virtual PTG Planning Message-ID: <7b60cd45-900d-cdaa-172c-af211ffa4ef6@redhat.com> Hello all, Our PTG etherpad is live [1], please feel free add any topics you'd like to discuss.? I have reserved two 1 hour time slots on both Monday and Wednesday as well as an operator hour on Friday: 17-Oct 13:00-15:00 UTC (Mitaka) 19-Oct 13:00-15:00 UTC (Newton) 21-Oct 13:00-14:00 UTC (Mitaka) [1]: https://etherpad.opendev.org/p/antelope-ptg-keystone Hope to see you there! Thanks, /Dave Wilde (d34dh0r53) From juliaashleykreger at gmail.com Mon Oct 10 20:43:52 2022 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 10 Oct 2022 13:43:52 -0700 Subject: [PTG][PTL][Ops] PTG Etherpads and information discovery Message-ID: Greetings folks! We're a little under a week out from the PTG and I came to the realization on a call earlier today that it is a bit difficult to discover relationships between "operator hour" sessions and related projects. Further compounded by the auto-created etherpad links on the PTG website[0]. That being said, it seems we need to make things a little easier to discover by cross-linking in some places, as well as ensuring the correct etherpads are being referenced. If your a PTL: * Please review the links on the PTG etherpad list[0]. If the link is not correct, please use the ptgbot to update it. Be careful to check the prior etherpad for content! * If you have an operator hour scheduled as well, please be mindful that we've already saved the etherpad links to [1] and some of the etherpads already have content. If you intend to use another etherpad, please link and make the information/content discoverable on the primary operator etherpad[1]. Operators and those who have questions for Operators: * The central etherpad of operator hour etherpads is an additional etherpad[1]. * Please add topics to specific operator hour sessions that you feel are appropriate or pertinent. Examples would include: "What versions do you run?" "Downstream patches that made your life easier?" "This $issue issue causes us lots of pain, we would love to see it fixed." Thanks everyone! -Julia [0]: https://ptg.opendev.org/etherpads.html [1]: https://etherpad.opendev.org/p/oct2022-ptg-openstack-ops From gmann at ghanshyammann.com Mon Oct 10 23:59:51 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 10 Oct 2022 16:59:51 -0700 Subject: [ptl][tc][ops][ptg] Operator + Developers interaction (operator-hours) slots in 2023.1 Antelope PTG In-Reply-To: <182ff0b3957.11971b8a0597684.4259447734459743811@ghanshyammann.com> References: <182ff0b3957.11971b8a0597684.4259447734459743811@ghanshyammann.com> Message-ID: <183c4572f13.b4d37d16497728.5063772169693971213@ghanshyammann.com> ---- On Fri, 02 Sep 2022 09:31:41 -0700 Ghanshyam Mann wrote --- > Hello Everyone/PTL, > > ... > We request every projects to book at least one 'operator hours' slot for operators to join your PTG slot. > Ping me in #openstack-tc or #openinfra-events IRC channel for any query. 10 projects have reserved the 'operator hours' which is a good number but still projects have not booked yet, please do it ASAP. use the placeholder to avoid the conflict with other projects operator hours. Also, request you all to spread the operator hours to community and operators via ML or twitter. 
-gmann > > [1] https://ptg.opendev.org/ptg.html > > -gmann > > From arxcruz at redhat.com Tue Oct 11 07:32:33 2022 From: arxcruz at redhat.com (Arx Cruz) Date: Tue, 11 Oct 2022 09:32:33 +0200 Subject: [tripleo] Gate blocker In-Reply-To: References: Message-ID: Hello, The gates are unblocked! Kind regards, Arx Cruz On Mon, Oct 10, 2022 at 11:25 AM Arx Cruz wrote: > Hello, > > We have a gate blocker due https://bugs.launchpad.net/tripleo/+bug/1992305 > please do not recheck jobs until > https://review.opendev.org/c/openstack/tripleo-quickstart/+/860810 get > merged. > I will let you know when gates are green again. > > -- > > Arx Cruz > > Software Engineer > > Red Hat EMEA > > arxcruz at redhat.com > @RedHat Red Hat > Red Hat > > > -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Tue Oct 11 07:50:49 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Tue, 11 Oct 2022 09:50:49 +0200 Subject: [neutron] Antelope PTG, agenda and schedule Message-ID: Hello Neutrinos: Please check the agenda and schedule for the Antelope PTG: https://etherpad.opendev.org/p/neutron-antelope-ptg Some tips: * Remember that on Monday we have the TC sessions. * The first Neutron meeting is on Tuesday. * The Nova-Neutron cross-project sessions are on Thursday (13 - 15 UTC). * The Neutron operator hour is on Friday. See you next week! Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Tue Oct 11 10:54:38 2022 From: mkopec at redhat.com (Martin Kopec) Date: Tue, 11 Oct 2022 12:54:38 +0200 Subject: [qa][ptg] Virtual PTG Planning In-Reply-To: References: Message-ID: Hi there, we have booked the following slots: * Monday (Oct 17th) 14-15 UTC (1 hour) @ icehouse * Tuesday (Oct 18th) 7-8 UTC (1 hour) @ icehouse In case we'll see a need for more, we can always book something else later. See you next week, On Fri, 30 Sept 2022 at 16:48, Martin Kopec wrote: > So far we have the following 4 topics: > * Retrospective > * S-RBAC > * Clean up deprecated lib/neutron code > * Decide which job variant should become the new tempest default > > Feel free to add any other topic, you would like to discuss, to our ptg > etherpad [1]. > We haven't booked any time slots yet, before that, I wanted to make sure > that we have all topics gathered so that we can plan enough time to cover > them all. > We'll book the slots next week, so hurry up :) > > See you soon, > > On Thu, 22 Sept 2022 at 00:46, Martin Kopec wrote: > >> Hello everyone, >> >> here is [1] our etherpad for Antelope PTG. Please, add your topics there >> if there is anything you would like to discuss / propose ... >> You can also vote for time slots of our sessions so that they fit your >> schedule at [2]. >> We will go with 3 maybe 4 one hour slots, depending on the number of >> topics. >> >> [1] https://etherpad.opendev.org/p/qa-antelope-ptg >> [2] https://framadate.org/dC2AEBTq8b5rAkvv >> >> Thanks, >> -- >> Martin Kopec >> Senior Software Quality Engineer >> Red Hat EMEA >> IM: kopecmartin >> >> >> >> > > -- > Martin > -- Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Tue Oct 11 11:08:10 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Tue, 11 Oct 2022 12:08:10 +0100 Subject: [kolla-ansible][Xena] SSL certificate expired In-Reply-To: References: Message-ID: Anyone??? Le lun. 
10 oct. 2022 ? 12:21, wodel youchi a ?crit : > Hi, > > I tried to deploy a new certificate using :kolla-ansible reconfigure > But I got : > > "module_stderr": "*Failed to discover available identity versions when > contacting https://dashint.cloud.exemple.com:35357 > *. Attemptin > g to parse version from URL.\nTraceback (most recent call last):\n File > \"/opt/ansible/lib/python3.6/site-packages/urllib3/connectio > npool.py\", line 706, in urlopen\n chunked=chunked,\n File > \"/opt/ansible/lib/python3.6/site-packages/urllib3/connectionpool.py\" > , line 382, in _make_request\n self._validate_conn(conn)\n File > \"/opt/ansible/lib/python3.6/site-packages/urllib3/connectionpool > .py\", line 1010, in _validate_conn\n conn.connect()\n File > \"/opt/ansible/lib/python3.6/site-packages/urllib3/connection.py\", l > ine 421, in connect\n tls_in_tls=tls_in_tls,\n File > \"/opt/ansible/lib/python3.6/site-packages/urllib3/util/ssl_.py\", line 450, > in ssl_wrap_socket\n sock, context, tls_in_tls, > server_hostname=server_hostname\n File > \"/opt/ansible/lib/python3.6/site-packages > /urllib3/util/ssl_.py\", line 493, in _ssl_wrap_socket_impl\n return > ssl_context.wrap_socket(sock, server_hostname=server_hostname > )\n File \"/usr/lib64/python3.6/ssl.py\", line 365, in wrap_socket\n > _context=self, _session=session)\n File \"/usr/lib64/python > 3.6/ssl.py\", line 776, in __init__\n self.do_handshake()\n File > \"/usr/lib64/python3.6/ssl.py\", line 1036, in do_handshake\n > self._sslobj.do_handshake()\n File \"/usr/lib64/python3.6/ssl.py\", line > 648, in do_handshake\n self._sslobj.do_handshake()\nssl > .*SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed* > > Some help please > > > > Regards. > > Le dim. 9 oct. 2022 ? 16:19, wodel youchi a > ?crit : > >> Hi, >> >> My SSL certificate has expired, and now I cannot authenticate into >> horizon and I have these errors : >> *WARNING keystoneauth.identity.generic.base [-] Failed to discover >> available identity versions when contacting >> https://dashint.cloud.exemple.com:35357 >> . Attempting to parse version from >> URL.: keystoneauth1.exceptions.connection.SSLError: SSL exception >> connecting to https:// dashint.cloud.exemple.com >> :35357: HTTPSConnectionPool(host=' >> dashint.cloud.exemple.com ', port=35357): >> Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: >> CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))* >> >> In my globals.yml I have this parameter : >> kolla_verify_tls_backend: "no" >> >> 1 - How do I disable SSL verification for now? >> 2 - How to install a new SSL certificate? >> >> >> >> Regards. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkchn.in at gmail.com Tue Oct 11 12:08:43 2022 From: kkchn.in at gmail.com (KK CHN) Date: Tue, 11 Oct 2022 17:38:43 +0530 Subject: DC DR Setup Queries Message-ID: List, We are having a client DC running on HP( HP simplivity) HCI servers, With VMware ( Vsphere 7.0) only few VMs running on it. (11 VMs maximum all Linux VMs). The DR site also having the same HCI setup another location. ( The VMs are replicated to DR site with HP simplivity). We are planning to use Openstack for both DC and DR solutions with Wallaby or Xena version with KVM as hypervisor to replace the proprietary S/W and H/W vendor locking. The requirement is to Setup a Stable DC- DR solution. Totally confused How to setup a best Dc- DR solution for this purpose. 
The DR setup can be possible / advisable with Zero down time ?( or manual DR site uping with downtime of hours ) ? What are the available/suggested DC-DR replication mechanisms for high degree of application data protection and service availability? Kindly advise.. Thanks in advance, Krish -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Tue Oct 11 12:33:35 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Tue, 11 Oct 2022 13:33:35 +0100 Subject: [kolla-ansible][Xena] SSL certificate expired In-Reply-To: References: Message-ID: Hi, I disabled TLS in globals.yml then tried to deploy openstack, but it does not work, the deployment still uses https. How can I make a workaround? Le mar. 11 oct. 2022 ? 12:08, wodel youchi a ?crit : > Anyone??? > > Le lun. 10 oct. 2022 ? 12:21, wodel youchi a > ?crit : > >> Hi, >> >> I tried to deploy a new certificate using :kolla-ansible reconfigure >> But I got : >> >> "module_stderr": "*Failed to discover available identity versions when >> contacting https://dashint.cloud.exemple.com:35357 >> *. Attemptin >> g to parse version from URL.\nTraceback (most recent call last):\n File >> \"/opt/ansible/lib/python3.6/site-packages/urllib3/connectio >> npool.py\", line 706, in urlopen\n chunked=chunked,\n File >> \"/opt/ansible/lib/python3.6/site-packages/urllib3/connectionpool.py\" >> , line 382, in _make_request\n self._validate_conn(conn)\n File >> \"/opt/ansible/lib/python3.6/site-packages/urllib3/connectionpool >> .py\", line 1010, in _validate_conn\n conn.connect()\n File >> \"/opt/ansible/lib/python3.6/site-packages/urllib3/connection.py\", l >> ine 421, in connect\n tls_in_tls=tls_in_tls,\n File >> \"/opt/ansible/lib/python3.6/site-packages/urllib3/util/ssl_.py\", line 450, >> in ssl_wrap_socket\n sock, context, tls_in_tls, >> server_hostname=server_hostname\n File >> \"/opt/ansible/lib/python3.6/site-packages >> /urllib3/util/ssl_.py\", line 493, in _ssl_wrap_socket_impl\n return >> ssl_context.wrap_socket(sock, server_hostname=server_hostname >> )\n File \"/usr/lib64/python3.6/ssl.py\", line 365, in wrap_socket\n >> _context=self, _session=session)\n File \"/usr/lib64/python >> 3.6/ssl.py\", line 776, in __init__\n self.do_handshake()\n File >> \"/usr/lib64/python3.6/ssl.py\", line 1036, in do_handshake\n >> self._sslobj.do_handshake()\n File \"/usr/lib64/python3.6/ssl.py\", >> line 648, in do_handshake\n self._sslobj.do_handshake()\nssl >> .*SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed* >> >> Some help please >> >> >> >> Regards. >> >> Le dim. 9 oct. 2022 ? 16:19, wodel youchi a >> ?crit : >> >>> Hi, >>> >>> My SSL certificate has expired, and now I cannot authenticate into >>> horizon and I have these errors : >>> *WARNING keystoneauth.identity.generic.base [-] Failed to discover >>> available identity versions when contacting >>> https://dashint.cloud.exemple.com:35357 >>> . Attempting to parse version from >>> URL.: keystoneauth1.exceptions.connection.SSLError: SSL exception >>> connecting to https:// dashint.cloud.exemple.com >>> :35357: HTTPSConnectionPool(host=' >>> dashint.cloud.exemple.com ', port=35357): >>> Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: >>> CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))* >>> >>> In my globals.yml I have this parameter : >>> kolla_verify_tls_backend: "no" >>> >>> 1 - How do I disable SSL verification for now? >>> 2 - How to install a new SSL certificate? 
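A couple of checks that usually help in this situation: first confirm what certificate the VIP is actually serving and when it expired, then make sure the replacement PEM is in place before re-running kolla-ansible, since the reconfigure run itself talks to the Keystone endpoint over HTTPS and can fail with the CERTIFICATE_VERIFY_FAILED error shown above. A rough sketch, using the FQDN/port from that error message; the /etc/kolla/certificates paths and the openstack_cacert option are assumptions based on a default Xena layout, so verify them against your globals.yml:

```
# What is the VIP serving right now, and when did it expire?
openssl s_client -connect dashint.cloud.exemple.com:35357 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates

# Assumed default locations for the combined cert+key PEM that haproxy serves:
#   /etc/kolla/certificates/haproxy.pem            (external VIP)
#   /etc/kolla/certificates/haproxy-internal.pem   (internal VIP, if TLS is enabled there)
# After replacing the PEM (and pointing openstack_cacert at a CA bundle that trusts it):
kolla-ansible -i <inventory> reconfigure
```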
>>> >>> >>> >>> Regards. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwilde at redhat.com Tue Oct 11 12:45:55 2022 From: dwilde at redhat.com (Dave Wilde) Date: Tue, 11 Oct 2022 07:45:55 -0500 Subject: [keystone][PTG] Virtual PTG Planning In-Reply-To: <7b60cd45-900d-cdaa-172c-af211ffa4ef6@redhat.com> References: <7b60cd45-900d-cdaa-172c-af211ffa4ef6@redhat.com> Message-ID: One correction to make, I have moved the operator hours from 13:00-14:00 on Friday 21-Oct to 15:00-16:00 to avoid conflicts with the neutron operator hours. Thanks, /Dave Wilde (d34dh0r53) On 10/10/22 15:27, Dave Wilde wrote: > Hello all, > > Our PTG etherpad is live [1], please feel free add any topics you'd > like to discuss.? I have reserved two 1 hour time slots on both Monday > and Wednesday as well as an operator hour on Friday: > > 17-Oct 13:00-15:00 UTC (Mitaka) > 19-Oct 13:00-15:00 UTC (Newton) > 21-Oct 13:00-14:00 UTC (Mitaka) > > [1]: https://etherpad.opendev.org/p/antelope-ptg-keystone > > Hope to see you there! > > Thanks, > > /Dave Wilde (d34dh0r53) From senrique at redhat.com Tue Oct 11 01:41:59 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Mon, 10 Oct 2022 22:41:59 -0300 Subject: [outreachy][cinder]questions about project"create API reference request/response samples" In-Reply-To: References: Message-ID: Hi Chelsy Lan, how are you? I'm on PTO, that's why I'm not on IRC. I'll be back tomorrow. Anyway, feel free to write your questions on the Cinder channel and hopefully, a cinder member will reply to you. Regarding the IP error: Are you using VirtualBox? Please try to run `./clean`, then `/.unstack` and then `./stack` again. Regards, Sofia On Mon, Oct 10, 2022 at 12:16 PM olive tree wrote: > Hi Cinder Team: > I'm an applicant for Outreachy internship program. I've set up a Gerrit > account successfully and deployed Devstack in a virtual environment. > However, there are a few questions I would like to ask: > > 1. I failed to find enriquetaso in #openstack-cinder OFTC IRC channel, > the list is as follows, I suppose maybe the reason is some steps were wrong > or she changed into another username. > [image: image.png] > 2.when I created a local.conf and statrted the install under instructions > in https://docs.openstack.org/devstack/latest/, I got this error: devstack/stackrc:834 > Could not determine host ip address. See local.conf for suggestions on > setting HOST_IP. i searched it in Google but can't find effective > solution. > > Thank you for reading this email! I hope my questions will not bother you > too much. I really appreciate it if you could answer them. > > Best regards, > Chelsy Lan > -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 181641 bytes Desc: not available URL: From rdhasman at redhat.com Tue Oct 11 13:16:59 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Tue, 11 Oct 2022 18:46:59 +0530 Subject: [outreachy][cinder]questions about project"create API reference request/response samples" In-Reply-To: References: Message-ID: Hi Olive, On Mon, Oct 10, 2022 at 8:51 PM olive tree wrote: > Hi Cinder Team: > I'm an applicant for Outreachy internship program. I've set up a Gerrit > account successfully and deployed Devstack in a virtual environment. 
> However, there are a few questions I would like to ask: > > 1. I failed to find enriquetaso in #openstack-cinder OFTC IRC channel, > the list is as follows, I suppose maybe the reason is some steps were wrong > or she changed into another username. > > 2.when I created a local.conf and statrted the install under instructions > in https://docs.openstack.org/devstack/latest/, I got this error: devstack/stackrc:834 > Could not determine host ip address. See local.conf for suggestions on > setting HOST_IP. i searched it in Google but can't find effective > solution. > > Try adding this line in your local.conf and run stack.sh again. The error should be resolved. *HOST_IP=127.0.0.1* For any further query, or if Sofia is not around, you can find me on #openstack-cinder IRC channel with the nick *whoami-rajat*. > Thank you for reading this email! I hope my questions will not bother you > too much. I really appreciate it if you could answer them. > > Best regards, > Chelsy Lan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Tue Oct 11 13:15:05 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Tue, 11 Oct 2022 18:45:05 +0530 Subject: [outreachy][cinder]questions about project"create API reference request/response samples" In-Reply-To: References: Message-ID: Hi Olive, On Mon, Oct 10, 2022 at 8:51 PM olive tree wrote: > Hi Cinder Team: > I'm an applicant for Outreachy internship program. I've set up a Gerrit > account successfully and deployed Devstack in a virtual environment. > However, there are a few questions I would like to ask: > > 1. I failed to find enriquetaso in #openstack-cinder OFTC IRC channel, > the list is as follows, I suppose maybe the reason is some steps were wrong > or she changed into another username. > [image: image.png] > 2.when I created a local.conf and statrted the install under instructions > in https://docs.openstack.org/devstack/latest/, I got this error: devstack/stackrc:834 > Could not determine host ip address. See local.conf for suggestions on > setting HOST_IP. i searched it in Google but can't find effective > solution. > > Try adding this line in your local.conf and run stack.sh again. The error should be resolved. *HOST_IP=127.0.0.1* For any further query, or if Sofia is not around, you can find me on #openstack-cinder IRC channel with the nick *whoami-rajat*. > Thank you for reading this email! I hope my questions will not bother you > too much. I really appreciate it if you could answer them. > > Best regards, > Chelsy Lan > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 181641 bytes Desc: not available URL: From eblock at nde.ag Tue Oct 11 14:16:30 2022 From: eblock at nde.ag (Eugen Block) Date: Tue, 11 Oct 2022 14:16:30 +0000 Subject: [metadata agent & keystone] Remote metadata server experienced an internal error? In-Reply-To: <2610420.159983.1665021619542.JavaMail.root@mailwas2> Message-ID: <20221011141630.Horde.Kba5CrMRH1OhO5TGrGCTMaq@webmail.nde.ag> Does the compute node where that VM is running have correct neutron config (e.g. metadata secret etc.)? Are all VMs affected or do some of them work? Is the neutron-metadata-agent active? Does it log anything useful? Zitat von ??? : > Hello > > I've? installed openstack cluster with ovn and neutron ovn metadata agent. 
> > Problem is that my VM instances cannot get metadata from metadata agent > > When I try "curl http://169.254.169.254/latest" from VM instance, I get: > > -------------------------------------------------------------------------------------------------- > > > > ? > > ? ? 500 Internal Server Error > > ? > > ? > > ? ?

> > 500 Internal Server Error
> >
> > Remote metadata server experienced an internal server error.
> > ? > > > > -------------------------------------------------------------------------------------------------- > > Only relevant log I could find was in > /var/log/keystone/keystone-wsgi-public.log file. > > -------------------------------------------------------------------------------------------------- > > 2022-10-06 01:49:03.895 3250739 WARNING keystone.server.flask.application > [req-93c1717a-2868-4553-878e-5afd71738195 - - - - -] Authorization failed. > The request you have made requires authentication. from 10.0.10.21: > keystone.exception.Unauthorized: The request you have made requires > authentication. > > -------------------------------------------------------------------------------------------------- > > Everytime I try "curl http://169.254.169.254/latest" inside VM instance, > that "keystone.exception.Unauthorized: The request you have made requires > authentication." log popped up in keystone-wsgi-public.log file. > > It seems like keystone auth related problem, but I can log in to horizon, > create/delete instances, and do other things just fine. > > Only having problem with metadata agent now > > Thank you From tobias.rydberg at cleura.com Tue Oct 11 14:21:01 2022 From: tobias.rydberg at cleura.com (Tobias Rydberg) Date: Tue, 11 Oct 2022 16:21:01 +0200 Subject: [publiccloud-sig] Bi-weekly meeting reminder Message-ID: Hi everyone, Tomorrow it's time again for our bi-weekly meeting, 0800 UTC in #openstack-operators. Notes from previous meeting can be found here [0]. At the same time I would like to push a bit for the operator-focused sessions at the PTG next week. Kendall put together a blogpost [1] where she highlighted them to make it easy for us to find them. Hope to chat with you tomorrow! [0] https://etherpad.opendev.org/p/publiccloud-sig-meeting [1] https://www.openstack.org/blog/calling-all-openstack-operators-the-ptg-starts-monday-and-the-community-needs-your-input/ BR, Tobias Rydberg -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3626 bytes Desc: S/MIME Cryptographic Signature URL: From mrunge at matthias-runge.de Tue Oct 11 15:07:55 2022 From: mrunge at matthias-runge.de (Matthias Runge) Date: Tue, 11 Oct 2022 17:07:55 +0200 Subject: [telemetry][ptg] Virtual PTG Planning Message-ID: <35d752bc-e89b-10d2-2e9f-a73315e7bc17@matthias-runge.de> Hi all, We have two slots for the upcoming PTG and etherpad populated[1]. Please feel encouraged to bring up what you'd think needs discussing. Tuesday, Oct 18 14 UTC Tuesday, Oct 18 15 UTC The full schedule can be found at [2]. [1] https://etherpad.opendev.org/p/oct2022-ptg-telemetry [2] https://ptg.opendev.org/ptg.html#sTuesday Best, Matthias From pierre at stackhpc.com Tue Oct 11 16:02:35 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Tue, 11 Oct 2022 18:02:35 +0200 Subject: [blazar][ptg] Virtual PTG Planning Message-ID: Hello, We have a meeting slot for Blazar in the upcoming PTG: Thursday, October 20, 2022 from 14:00 to 16:00 UTC, in the Diablo room. Please consult the Etherpad for this session [1] and add any topics you would like to discuss. [1] https://etherpad.opendev.org/p/oct2022-ptg-blazar Cheers, Pierre Riteau (priteau) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jay at gr-oss.io Tue Oct 11 16:53:31 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 11 Oct 2022 09:53:31 -0700 Subject: [ironic][stable] Proposing EOL of ironic project branches older than Wallaby In-Reply-To: References: Message-ID: We discussed stable branches in the most recent ironic meeting ( https://meetings.opendev.org/meetings/ironic/2022/ironic.2022-10-10-15.01.log.txt). The decision was made to do the following: EOL these branches: - stable/queens - stable/rocky - stable/stein Reduce testing considerably on these branches, and only backport critical bugfixes or security bugfixes: - stable/train - stable/ussuri - stable/victoria Our remaining branches will continue to get most eligible patches backported to them. This email, plus earlier communications including a tweet, will serve as notice that these branches are being EOL'd. Thanks, Jay Faulkner On Tue, Oct 4, 2022 at 11:18 AM Jay Faulkner wrote: > Hi all, > > Ironic has a large amount of stable branches still in EM. We need to take > action to ensure those branches are either retired or have CI repaired to > the point of being usable. > > Specifically, I'm looking at these branches across all Ironic projects: > - stable/queens > - stable/rocky > - stable/stein > - stable/train > - stable/ussuri > - stable/victoria > > In lieu of any volunteers to maintain the CI, my recommendation for all > the branches listed above is that they be marked EOL. If someone wants to > volunteer to maintain CI for those branches, they can propose one of the > below paths be taken instead: > > 1 - Someone volunteers to maintain these branches, and also report the > status of CI of these older branches periodically on the Ironic whiteboard > and in Ironic meetings. If you feel strongly that one of these branches > needs to continue to be in service; volunteering in this way is how to save > them. > > 2 - We seriously reduce CI. Basically removing all tempest tests to ensure > that CI remains reliable and able to merge emergency or security fixes when > needed. In some cases; this still requires CI fixes as some older inspector > branches are failing *installing packages* in unit tests. I would still > like, in this case, that someone volunteers to ensure the minimalist CI > remains happy. > > My intention is to let this message serve as notice and a waiting period; > and if I've not heard any response here or in Monday's Ironic meeting (in 6 > days), I will begin taking action on retiring these branches. > > This is simply a start; other branches (including bugfix branches) are > also in bad shape in CI, but getting these retired will significantly > reduce the surface area of projects and branches to evaluate. > > I know it's painful to drop support for these branches; but we've provided > good EM support for these branches for a long time and by pruning them > away, we'll be able to save time to dedicate to other items. > > Thanks, > Jay Faulkner > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkoust21 at student.aau.dk Tue Oct 11 17:35:24 2022 From: tkoust21 at student.aau.dk (Thor Christian Koustrup) Date: Tue, 11 Oct 2022 17:35:24 +0000 Subject: can't launch instance Message-ID: <088d92e055e14679976d7f0815b4b398@student.aau.dk> Hello Strato I am facing a problem, I can't launch Instance. I hope you can help me solve this problem. 
The error massage: "Error: Failed to perform requested operation on instance "moveboks", the instance has an error status: Please try again later [Error: Build of instance 94ebaf03-0aed-4def-bd85-4a7c4ca2e67c aborted: Volume f1a64df6-384e-4f92-b6f9-0092193a5817 did not finish being created even after we waited 0 seconds or 1 attempts. And its status is error.]." Best regards Thor koustrup Student AAU-Copenhagen -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Tue Oct 11 19:26:44 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 11 Oct 2022 22:26:44 +0300 Subject: can't launch instance In-Reply-To: <088d92e055e14679976d7f0815b4b398@student.aau.dk> References: <088d92e055e14679976d7f0815b4b398@student.aau.dk> Message-ID: Hi there, It worth checking cinder-volume logs, I believe real issue why create ends up with error. Otherwise it's impossible to tell what's the actual reason. Also worth checking that "openstack volume service list" show all services as UP ??, 11 ???. 2022 ?., 20:45 Thor Christian Koustrup : > Hello Strato > > I am facing a problem, I can't launch Instance. I hope you can help me > solve this problem. > > The error massage: > "*Error: *Failed to perform requested operation on instance "moveboks", > the instance has an error status: Please try again later [Error: Build of > instance 94ebaf03-0aed-4def-bd85-4a7c4ca2e67c aborted: Volume > f1a64df6-384e-4f92-b6f9-0092193a5817 did not finish being created even > after we waited 0 seconds or 1 attempts. And its status is error.]." > > Best regards > Thor koustrup > Student AAU-Copenhagen > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Tue Oct 11 22:41:24 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 11 Oct 2022 15:41:24 -0700 Subject: [ironic] No meeting Monday 10/7; pre-emptied by PTG Message-ID: Our weekly meeting, scheduled for Monday October 17th, is cancelled to permit contributors to attend PTG sessions without worrying about our meeting. Thanks, Jay Faulkner -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Tue Oct 11 23:04:00 2022 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Tue, 11 Oct 2022 20:04:00 -0300 Subject: [PTG][manila] Virtual PTG planning Message-ID: Hello, Zorillas and interested stackers! As mentioned in the previous weekly meeting, the agenda for the next week's PTG is being worked on. I already have a draft with the topic proposals and some time slots [1], please take a look and if you would like some topics to be moved around, please let me know. The agenda on Thursday is likely mutable, considering that this is the day that we might have some cross-project discussions. [1] https://etherpad.opendev.org/p/antelope-ptg-manila Thanks, carloss -------------- next part -------------- An HTML attachment was scrubbed... URL: From folacarine at gmail.com Wed Oct 12 05:09:53 2022 From: folacarine at gmail.com (fola fomduwir carine) Date: Wed, 12 Oct 2022 06:09:53 +0100 Subject: Outreachy Message-ID: An HTML attachment was scrubbed... 
URL: From thierry at openstack.org Wed Oct 12 06:40:55 2022 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 12 Oct 2022 08:40:55 +0200 Subject: [largescale-sig] Next meeting: today October 12th, 15utc Message-ID: <5bb33766-2427-3f18-be3a-0793e757d07b@openstack.org> Hi everyone, Sorry for the late reminder: the Large Scale SIG will be meeting today (Wednesday) in #openstack-operators on OFTC IRC, at 15UTC. You can check how that time translates locally at: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20221012T15 Feel free to add topics to the agenda: https://etherpad.openstack.org/p/large-scale-sig-meeting Regards, -- Thierry Carrez From arxcruz at redhat.com Wed Oct 12 07:38:45 2022 From: arxcruz at redhat.com (Arx Cruz) Date: Wed, 12 Oct 2022 09:38:45 +0200 Subject: [tripleo] Gate blocker Message-ID: Hello, We are facing another gate blocker related to package dependencies https://bugs.launchpad.net/tripleo/+bug/1992560, please do not recheck your patches. We are working to fix the issue. Kind regards, -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Oct 12 08:09:17 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 12 Oct 2022 10:09:17 +0200 Subject: [nova][placement] Nova meetings CANCELLED for Oct-18 and Nov-1 Message-ID: Given next week the PTG will be, we will cancel the next nova meeting (Oct 18) Also, on Nov 1st, most of the contributors are on holiday, so we will also cancel this other meeting. Thanks, -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From lucasagomes at gmail.com Wed Oct 12 09:00:14 2022 From: lucasagomes at gmail.com (Lucas Alvares Gomes) Date: Wed, 12 Oct 2022 10:00:14 +0100 Subject: [neutron] Bug Deputy Report October 03 - 10 Message-ID: Hi, This is the Neutron bug report from October 3rd to 10th. Needs further triage: * https://bugs.launchpad.net/neutron/+bug/1992161 - "Unknown quota resource security_group_rule in neutron-rpc-server" - Unassigned Medium: * https://bugs.launchpad.net/neutron/+bug/1991817 - "OVN metadata agent liveness system generate OVN SBDB usage peak" - Assigned to: Krzysztof Tomaszewski * https://bugs.launchpad.net/neutron/+bug/1992109 - "Possible race condition when port unplugged from ovs" - Assigned to: Arnaud Morin * https://bugs.launchpad.net/neutron/+bug/1992352 - "[OVN] POST requests stucks when rabbitmq is not available" - Unassigned Wishlist: * https://bugs.launchpad.net/neutron/+bug/1991965 - " [RFE] Strict minimum bandwidth support for tunnelled networks" - Assigned to: Rodolfo Alonso Cheers, Lucas -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkchn.in at gmail.com Wed Oct 12 09:37:44 2022 From: kkchn.in at gmail.com (KK CHN) Date: Wed, 12 Oct 2022 15:07:44 +0530 Subject: DC DR Setup Queries In-Reply-To: References: Message-ID: few more points to clarify... 1. The VMs in a DataCentre must auto migrated to the DR setup. ( Also the updates / writes to VMs in production in DC should be reflected to the DR copies of VMs like incremental backups. How to achieve this / or what needs to be employed to achieve this requirement. ( What all are the S/W and H/W requirements to achieve the above setup if both DC and DR is planned use OpenStack latest version/s Wallaby/ Xena/ yoga ? ) 2. 
In the above setups once the DC down or stopped for maintenance, How the IP addresses of each VMs is managed automatically to make all the (application/database server) VMs up and running in DR in case DC down . Is this can be automated? how? Eg: When a VM say X is running in DC it may have an IP 10.10.0. X when it is replicated in DR then it will be with the same IP address right (10.0.0.2) ? But DR Network may be different and cannot have the same IP address as DC right ? Do we need to manually set an IP (Say 10.20.0.X )for each VM which is going to run from the DR site ? Then what about the firewall rules in DR do we need to manipulate for each VM for making the DR up ? Is there a way to automate this ? OR what the automatic mechanism to handle this IP setting up issue ? How normally folks manage this scenario ? Also once your DC site recovered, then We need to Fail back to the DC site from DR with all changes happened to the VMs in DR must be reflected back to the DC site and Fail back.. .How to achieve this ? Kindly shed some light on this with your experience and expertise. What to do ? Where to start ? Which approach to follow to set up a Best failover DC to DR and Failback solution. Thank you, Krish On Tue, Oct 11, 2022 at 5:38 PM KK CHN wrote: > List, > > We are having a client DC running on HP( HP simplivity) HCI servers, > With VMware ( Vsphere 7.0) only few VMs running on it. (11 VMs maximum all > Linux VMs). > > The DR site also having the same HCI setup another location. ( The VMs are > replicated to DR site with HP simplivity). > > We are planning to use Openstack for both DC and DR solutions with Wallaby > or Xena version with KVM as hypervisor to replace the proprietary S/W and > H/W vendor locking. > > The requirement is to Setup a Stable DC- DR solution. > > Totally confused How to setup a best Dc- DR solution for this purpose. > > The DR setup can be possible / advisable with Zero down time ?( or manual > DR site uping with downtime of hours ) ? > > What are the available/suggested DC-DR replication mechanisms for high > degree of application data protection and service availability? > > Kindly advise.. > > Thanks in advance, > Krish > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Wed Oct 12 12:54:55 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Wed, 12 Oct 2022 14:54:55 +0200 Subject: [release] Release Management 2023.1 Antelope PTG session Message-ID: <822382f8-1e22-7a45-5683-4d887485dd54@est.tech> Hi, Release Management team Virtual PTG session will take place at Thursday 14:00 UTC - 15:00 UTC in Folsom room. Feel free to visit the session if you have any topic to discuss with us, and please add it to our etherpad in advance. The etherpad can be found at https://etherpad.opendev.org/p/oct2022-ptg-rel-mgt Thanks, El?d Ill?s From lanchengxu0807 at gmail.com Wed Oct 12 13:30:41 2022 From: lanchengxu0807 at gmail.com (chelsy lan) Date: Wed, 12 Oct 2022 21:30:41 +0800 Subject: [outreachy][cinder]questions about project"create API reference request/response samples" In-Reply-To: References: Message-ID: It's working now. Thanks a lot? Rajat Dhasmana ?2022?10?11? ????9:17??? > Hi Olive, > > On Mon, Oct 10, 2022 at 8:51 PM olive tree > wrote: > >> Hi Cinder Team: >> I'm an applicant for Outreachy internship program. I've set up a Gerrit >> account successfully and deployed Devstack in a virtual environment. >> However, there are a few questions I would like to ask: >> >> 1. 
I failed to find enriquetaso in #openstack-cinder OFTC IRC channel, >> the list is as follows, I suppose maybe the reason is some steps were wrong >> or she changed into another username. >> >> 2.when I created a local.conf and statrted the install under instructions >> in https://docs.openstack.org/devstack/latest/, I got this error: devstack/stackrc:834 >> Could not determine host ip address. See local.conf for suggestions on >> setting HOST_IP. i searched it in Google but can't find effective >> solution. >> >> > Try adding this line in your local.conf and run stack.sh again. The error > should be resolved. > > *HOST_IP=127.0.0.1* > > For any further query, or if Sofia is not around, you can find me on > #openstack-cinder IRC channel with the nick *whoami-rajat*. > > >> Thank you for reading this email! I hope my questions will not bother you >> too much. I really appreciate it if you could answer them. >> >> Best regards, >> Chelsy Lan >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arxcruz at redhat.com Wed Oct 12 14:07:02 2022 From: arxcruz at redhat.com (Arx Cruz) Date: Wed, 12 Oct 2022 16:07:02 +0200 Subject: [tripleo] Gate blocker In-Reply-To: References: Message-ID: Hello, Gate is still blocked, but https://review.opendev.org/c/openstack/tripleo-ci/+/861047 is close to merge, once it merges the games will be restored again. I will update once the patch merges. Kind regards, On Wed, Oct 12, 2022 at 9:38 AM Arx Cruz wrote: > Hello, > > We are facing another gate blocker related to package dependencies > https://bugs.launchpad.net/tripleo/+bug/1992560, please do not recheck > your patches. > We are working to fix the issue. > > Kind regards, > > -- > > Arx Cruz > > Software Engineer > > Red Hat EMEA > > arxcruz at redhat.com > @RedHat Red Hat > Red Hat > > > -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Wed Oct 12 14:35:39 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 12 Oct 2022 11:35:39 -0300 Subject: [cinder] Bug Report 10-12-2022 Message-ID: This is a bug report from 10-05-2022 to 10-12-2022. Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting ----------------------------------------------------------------------------------------- Medium - https://bugs.launchpad.net/cinder/+bug/1992493 "Cinder fails to backup/snapshot/clone/extend volumes when the pool is full." Unassigned. - https://bugs.launchpad.net/os-brick/+bug/1992289 "[Netapp Ontap] iSCSI multipath flush fail with stderr: map in use." Unassigned. - https://bugs.launchpad.net/os-brick/+bug/1992296 "Multipath extend volume not working on running VM randomly." Unassigned. Low - https://bugs.launchpad.net/cinder/+bug/1992160 "[SVf] : lsportip needs to fetch IPs with `host` flag ." Assigned to Kumar Kanishka. - https://bugs.launchpad.net/cinder/+bug/1992292 "cinder backup S3 driver failure: signed integer is greater than maximum." Unassigned. Invalid - https://bugs.launchpad.net/cinder/+bug/1992293 "Cinder backup incremental when volume change size." Unassigned. Cheers, -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From folacarine at gmail.com Wed Oct 12 15:05:40 2022 From: folacarine at gmail.com (fola carine fomduwir) Date: Wed, 12 Oct 2022 16:05:40 +0100 Subject: [cinder] Bug Report 10-12-2022 In-Reply-To: References: Message-ID: Hi , please i need help to start working on a bug Le mer. 12 oct. 2022 ? 15:39, Sofia Enriquez a ?crit : > This is a bug report from 10-05-2022 to 10-12-2022. > Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting > > ----------------------------------------------------------------------------------------- > Medium > > - https://bugs.launchpad.net/cinder/+bug/1992493 "Cinder fails to > backup/snapshot/clone/extend volumes when the pool is full." Unassigned. > - https://bugs.launchpad.net/os-brick/+bug/1992289 "[Netapp Ontap] > iSCSI multipath flush fail with stderr: map in use." Unassigned. > - https://bugs.launchpad.net/os-brick/+bug/1992296 "Multipath extend > volume not working on running VM randomly." Unassigned. > > Low > > - https://bugs.launchpad.net/cinder/+bug/1992160 "[SVf] : lsportip > needs to fetch IPs with `host` flag ." Assigned to Kumar Kanishka. > - https://bugs.launchpad.net/cinder/+bug/1992292 "cinder backup S3 > driver failure: signed integer is greater than maximum." Unassigned. > > Invalid > > - https://bugs.launchpad.net/cinder/+bug/1992293 "Cinder backup > incremental when volume change size." Unassigned. > > > Cheers, > > -- > > Sof?a Enriquez > > she/her > > Software Engineer > > Red Hat PnT > > IRC: @enriquetaso > @RedHat Red Hat > Red Hat > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-francois.taltavull at elca.ch Wed Oct 12 15:17:43 2022 From: jean-francois.taltavull at elca.ch (=?iso-8859-1?Q?Taltavull_Jean-Fran=E7ois?=) Date: Wed, 12 Oct 2022 15:17:43 +0000 Subject: [Ceilometer] Dynamic pollsters : dot in JSON keyname Message-ID: Hello Rafael, To get the the size, in GB, occupied by buckets I need to manipulate a JSON key which contains a dot in its name: ```` - name: "radosgw.containers.objects.size" sample_type: "gauge" unit: "B" value_attribute: "rgw.main.size" <------------------------------------"rgw.main" is a JSON key, with a dot in its name, which belongs to "bucket.usage" JSON container url_path: "http://FQDN/admin/bucket?stats=True" module: "awsauth" authentication_object: "S3Auth" authentication_parameters: my_access_key,my_secret_key,FQDN user_id_attribute: "owner" project_id_attribute: "tenant" resource_id_attribute: "id" response_entries_key: "usage" ```` But with this dynamic pollster definition, I get the python error "KeyError: 'rgw' ". In this case, is there a specific syntax to define "value_attribute" or am I doing something the wrong way ? Jean-Francois From alex.kavanagh at canonical.com Wed Oct 12 15:39:23 2022 From: alex.kavanagh at canonical.com (Alex Kavanagh) Date: Wed, 12 Oct 2022 16:39:23 +0100 Subject: [charms][ptg] PTG Topics and Meetings Message-ID: Hi All The openstack-charms track meetings are: - Tuesday 18th at 14UTC in icehouse - Thursday 20th at 14UTC in havana The etherpad for the meetings is at: https://etherpad.opendev.org/p/oct2022-ptg-openstack-charms Please feel free to add any topics that you would be interested in chatting about. Most of the core team will be in attendance to discuss any issues or topics that you are interested in. We'll also be going over what's coming up in the OpenStack charms project(s), and doing a demo of some of the new sunbeam kubernetes-based charms. Look forward to seeing you there Cheers Alex. 
-- Alex Kavanagh - Software Engineer OpenStack Engineering - Canonical Ltd -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Oct 12 16:34:29 2022 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 12 Oct 2022 18:34:29 +0200 Subject: [largescale-sig] Next meeting: today October 12th, 15utc In-Reply-To: <5bb33766-2427-3f18-be3a-0793e757d07b@openstack.org> References: <5bb33766-2427-3f18-be3a-0793e757d07b@openstack.org> Message-ID: Hi everyone, Here is the summary of our SIG meeting today. You can read the detailed meeting logs at: https://meetings.opendev.org/meetings/large_scale_sig/2022/large_scale_sig.2022-10-12-15.01.html We will skip the meeting in two weeks. Our next regular IRC meeting will be November 9, at 1500utc on #openstack-operators on OFTC. Regards, -- Thierry Carrez From arxcruz at redhat.com Wed Oct 12 17:48:15 2022 From: arxcruz at redhat.com (Arx Cruz) Date: Wed, 12 Oct 2022 19:48:15 +0200 Subject: [tripleo] Gate blocker In-Reply-To: References: Message-ID: Hello, Patch merged, everything is now back to normal. Kind regards On Wed, Oct 12, 2022 at 4:07 PM Arx Cruz wrote: > Hello, > > Gate is still blocked, but > https://review.opendev.org/c/openstack/tripleo-ci/+/861047 is close to > merge, once it merges the games will be restored again. I will update once > the patch merges. > > Kind regards, > > On Wed, Oct 12, 2022 at 9:38 AM Arx Cruz wrote: > >> Hello, >> >> We are facing another gate blocker related to package dependencies >> https://bugs.launchpad.net/tripleo/+bug/1992560, please do not recheck >> your patches. >> We are working to fix the issue. >> >> Kind regards, >> >> -- >> >> Arx Cruz >> >> Software Engineer >> >> Red Hat EMEA >> >> arxcruz at redhat.com >> @RedHat Red Hat >> Red Hat >> >> >> > > > -- > > Arx Cruz > > Software Engineer > > Red Hat EMEA > > arxcruz at redhat.com > @RedHat Red Hat > Red Hat > > > -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.bryant at canonical.com Wed Oct 12 19:41:48 2022 From: corey.bryant at canonical.com (Corey Bryant) Date: Wed, 12 Oct 2022 15:41:48 -0400 Subject: OpenStack Zed for Ubuntu 22.04 LTS Message-ID: The Ubuntu OpenStack team at Canonical is pleased to announce the general availability of OpenStack Zed on Ubuntu 22.04 LTS (Jammy Jellyfish). Details of the Zed release can be found at: https://www.openstack.org/software/zed To get access to the Ubuntu Zed packages: == Ubuntu 20.04 LTS == The Ubuntu Cloud Archive for OpenStack Zed can be enabled on Ubuntu 22.04 by running the following command: sudo add-apt-repository cloud-archive:zed The Ubuntu Cloud Archive for Zed includes updates for: aodh, barbican, ceilometer, cinder, designate, designate-dashboard, glance, gnocchi, heat, heat-dashboard, horizon, ironic, ironic-ui, keystone, magnum, magnum-ui, manila, manila-ui, masakari, mistral, murano, murano-dashboard, networking-arista, networking-bagpipe, networking-baremetal, networking-bgpvpn, networking-hyperv, networking-l2gw, networking-mlnx, networking-odl, networking-sfc, neutron, neutron-dynamic-routing, neutron-fwaas, neutron-taas, neutron-vpnaas, nova, octavia, octavia-dashboard, openstack-trove, ovn-octavia-provider, placement, sahara, sahara-dashboard, senlin, swift, trove-dashboard, vitrage, watcher, watcher-dashboard, zaqar, and zaqar-ui. 
For a full list of packages and versions, please refer to: https://openstack-ci-reports.ubuntu.com/reports/cloud-archive/zed_versions.html == Reporting bugs == If you have any issues please report bugs using the ?ubuntu-bug? tool to ensure that bugs get logged in the right place in Launchpad: sudo ubuntu-bug nova-conductor Thank you to everyone who contributed to OpenStack Zed! Corey (on behalf of the Ubuntu OpenStack Engineering team) -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Oct 12 20:23:18 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 12 Oct 2022 20:23:18 +0000 Subject: [dev][infra][tact-sig] Default Zuul nodeset changing to ubuntu-jammy Message-ID: <20221012202317.5jl6uqgdv3ok2enq@yuggoth.org> Just wanted to call everyone's attention to an OpenDev Collaboratory announcement[*] this week. The tl;dr is that the default job nodeset will be changing from ubuntu-focal to ubuntu-jammy on 2022-10-25. This should be fairly low-impact for OpenStack projects, since we're now into the very early part of the 2023.1/Antelope release cycle and development branch unit test job templates have already been updated to run things on the newer Python version it supplies. Just be aware that any jobs which don't already specify a particular nodeset (or inherit one from a parent) will be following the default when it changes. An easy temporary workaround is to set affected jobs back to ubuntu-focal while you test solutions allowing removal of the override. Also, remember that switching development from Focal to Jammy was accepted[**] as a cross-project goal, so it needs to get done at some point soon for OpenStack projects anyway. [*] https://lists.opendev.org/pipermail/service-announce/2022-October/000047.html [**] https://governance.openstack.org/tc/goals/selected/migrate-ci-jobs-to-ubuntu-jammy.html -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gmann at ghanshyammann.com Wed Oct 12 21:28:53 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 12 Oct 2022 14:28:53 -0700 Subject: [dev][infra][tact-sig] Default Zuul nodeset changing to ubuntu-jammy In-Reply-To: <20221012202317.5jl6uqgdv3ok2enq@yuggoth.org> References: <20221012202317.5jl6uqgdv3ok2enq@yuggoth.org> Message-ID: <183ce19ae66.121ee11bb64504.568008654450510930@ghanshyammann.com> ---- On Wed, 12 Oct 2022 13:23:18 -0700 Jeremy Stanley wrote --- > Just wanted to call everyone's attention to an OpenDev Collaboratory > announcement[*] this week. The tl;dr is that the default job nodeset > will be changing from ubuntu-focal to ubuntu-jammy on 2022-10-25. > > This should be fairly low-impact for OpenStack projects, since we're > now into the very early part of the 2023.1/Antelope release cycle > and development branch unit test job templates have already been > updated to run things on the newer Python version it supplies. Just > be aware that any jobs which don't already specify a particular > nodeset (or inherit one from a parent) will be following the default > when it changes. An easy temporary workaround is to set affected > jobs back to ubuntu-focal while you test solutions allowing removal > of the override. 
Right, I think it will not impact OpenStack jobs as we have pinned the nodeset to focal for old existing jobs like py39 or pep8 or tox base job (py310 job already running on Jammy) - Example: https://github.com/openstack/openstack-zuul-jobs/blob/7e7045ab92b0b9db28f24fe9a38f914f74174938/zuul.d/jobs.yaml#L262 As part of the community-wide goal for 2023.1 cycle, we are going to start the testing soon and based on testing all the projects/repo first and then we can move the base OpenStack tox, devstack base, tempest base, and projects jobs to Jammy. -gmann > > Also, remember that switching development from Focal to Jammy was > accepted[**] as a cross-project goal, so it needs to get done at > some point soon for OpenStack projects anyway. > > [*] https://lists.opendev.org/pipermail/service-announce/2022-October/000047.html > [**] https://governance.openstack.org/tc/goals/selected/migrate-ci-jobs-to-ubuntu-jammy.html > -- > Jeremy Stanley > From rlandy at redhat.com Wed Oct 12 21:54:54 2022 From: rlandy at redhat.com (Ronelle Landy) Date: Wed, 12 Oct 2022 17:54:54 -0400 Subject: [tripleo] Gate blocker In-Reply-To: References: Message-ID: Hello, It appears that the fix to the previous gate blocker causes another one: Unable to freeze job graph: Job tripleo-ci-centos-9-undercloud-upgrade depends on tripleo-ci-centos-9-content-provider-zed which was not run. Details of this bug and the proposed fix are in: https://bugs.launchpad.net/tripleo/+bug/1992699 We will update this thread as the patch progresses. Thanks! On Wed, Oct 12, 2022 at 1:54 PM Arx Cruz wrote: > Hello, > > Patch merged, everything is now back to normal. > > Kind regards > > On Wed, Oct 12, 2022 at 4:07 PM Arx Cruz wrote: > >> Hello, >> >> Gate is still blocked, but >> https://review.opendev.org/c/openstack/tripleo-ci/+/861047 is close to >> merge, once it merges the games will be restored again. I will update once >> the patch merges. >> >> Kind regards, >> >> On Wed, Oct 12, 2022 at 9:38 AM Arx Cruz wrote: >> >>> Hello, >>> >>> We are facing another gate blocker related to package dependencies >>> https://bugs.launchpad.net/tripleo/+bug/1992560, please do not recheck >>> your patches. >>> We are working to fix the issue. >>> >>> Kind regards, >>> >>> -- >>> >>> Arx Cruz >>> >>> Software Engineer >>> >>> Red Hat EMEA >>> >>> arxcruz at redhat.com >>> @RedHat Red Hat >>> Red Hat >>> >>> >>> >> >> >> -- >> >> Arx Cruz >> >> Software Engineer >> >> Red Hat EMEA >> >> arxcruz at redhat.com >> @RedHat Red Hat >> Red Hat >> >> >> > > > -- > > Arx Cruz > > Software Engineer > > Red Hat EMEA > > arxcruz at redhat.com > @RedHat Red Hat > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasmulhall410 at yahoo.com Wed Oct 12 23:40:15 2022 From: thomasmulhall410 at yahoo.com (Paladox) Date: Wed, 12 Oct 2022 23:40:15 +0000 (UTC) Subject: How do I disable replication in swift In-Reply-To: <20220930164121.7caa41cb@niphredil.zaitcev.lan> References: <1112921123.2169438.1663847346908.ref@mail.yahoo.com> <1112921123.2169438.1663847346908@mail.yahoo.com> <1295063360.641866.1663941856275@mail.yahoo.com> <527985954.3334066.1663961171011@mail.yahoo.com> <2052306468.3384214.1663968003125@mail.yahoo.com> <20220930164121.7caa41cb@niphredil.zaitcev.lan> Message-ID: <272477624.652893.1665618015695@mail.yahoo.com> Thanks! So to be clear, the replication value of 1 means it won't replicate and that it's not 1 + n (n being one in this case). 
On Friday, 30 September 2022, 22:45:12 BST, Pete Zaitcev wrote: On Fri, 23 Sep 2022 21:20:03 +0000 (UTC) Paladox wrote: >? We have around 3.8tb but we're wanting to move from gluster to swift (which we're currently using?2.6tb). We're a small not-for-profit in the UK. We don't have the funds to support replicating data right now. You are setting yourself for a data loss and then you'll inevitably blame Swift, even though we told you not to do that. -- Pete -------------- next part -------------- An HTML attachment was scrubbed... URL: From ildiko.vancsa at gmail.com Thu Oct 13 04:21:03 2022 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Wed, 12 Oct 2022 21:21:03 -0700 Subject: Edge Computing Group sessions at the PTG Message-ID: Hi All, I?m reaching out with a quick update regarding the OpenInfra Edge Computing Group sessions at the upcoming PTG next week. The group settled on booking a Monday time slot that overlaps with our usual meeting hour: __Monday (October 17) at 1300 UTC - 1600 UTC__. Our main discussion topics will include: * Edge use cases in production * Day 0 - prepare your edge deployment * Roadmap for the working group Beth Cohen from Verizon will also give a presentation during our scheduled time. For further information please see our etherpad: https://etherpad.opendev.org/p/ecg-ptg-october-2022 Looking forward to seeing you at the event! Best Regards, Ildik? ??? Ildik? V?ncsa Senior Manager, Community & Ecosystem Open Infrastructure Foundation From gmann at ghanshyammann.com Thu Oct 13 06:22:44 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 12 Oct 2022 23:22:44 -0700 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Oct 13 at 1500 UTC In-Reply-To: <183c38e1c91.112e685a2494298.7607084177217213450@ghanshyammann.com> References: <183c38e1c91.112e685a2494298.7607084177217213450@ghanshyammann.com> Message-ID: <183d0027286.f6a73c6371346.6083621439966226551@ghanshyammann.com> Hello Everyone, Below is the agenda for Tomorrow's TC meeting scheduled at 1500 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * Roll call * Follow up on past action items * Gate health check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary ** Zuul config error *** https://etherpad.opendev.org/p/zuul-config-error-openstack * 2023.1 cycle PTG Planning ** TC + Leaders interaction sessions *** https://etherpad.opendev.org/p/tc-leaders-interaction-2023-1 ** TC PTG etherpad *** https://etherpad.opendev.org/p/tc-2023-1-ptg ** Schedule 'operator hours' *** https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030301.html * 2023.1 cycle Technical Election & Leaderless projects ** Leaderless projects *** https://etherpad.opendev.org/p/2023.1-leaderless * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open -gmann ---- On Mon, 10 Oct 2022 13:20:14 -0700 Ghanshyam Mann wrote --- > Hello Everyone, > > The technical Committee's next weekly meeting is scheduled for 2022 Oct 13, at 1500 UTC. > > If you would like to add topics for discussion, please add them to the below wiki page by > Wednesday, Oct 12 at 2100 UTC. > > https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting > > -gmann > > From arxcruz at redhat.com Thu Oct 13 07:05:04 2022 From: arxcruz at redhat.com (Arx Cruz) Date: Thu, 13 Oct 2022 09:05:04 +0200 Subject: [tripleo] Gate blocker In-Reply-To: References: Message-ID: Hello, The patch has been merged. 
Kind regards, Arx Cruz On Wed, Oct 12, 2022 at 11:55 PM Ronelle Landy wrote: > Hello, > > It appears that the fix to the previous gate blocker causes another one: > > Unable to freeze job graph: Job tripleo-ci-centos-9-undercloud-upgrade > depends on tripleo-ci-centos-9-content-provider-zed which was not run. > > Details of this bug and the proposed fix are in: > > https://bugs.launchpad.net/tripleo/+bug/1992699 > > We will update this thread as the patch progresses. > > Thanks! > > On Wed, Oct 12, 2022 at 1:54 PM Arx Cruz wrote: > >> Hello, >> >> Patch merged, everything is now back to normal. >> >> Kind regards >> >> On Wed, Oct 12, 2022 at 4:07 PM Arx Cruz wrote: >> >>> Hello, >>> >>> Gate is still blocked, but >>> https://review.opendev.org/c/openstack/tripleo-ci/+/861047 is close to >>> merge, once it merges the games will be restored again. I will update once >>> the patch merges. >>> >>> Kind regards, >>> >>> On Wed, Oct 12, 2022 at 9:38 AM Arx Cruz wrote: >>> >>>> Hello, >>>> >>>> We are facing another gate blocker related to package dependencies >>>> https://bugs.launchpad.net/tripleo/+bug/1992560, please do not recheck >>>> your patches. >>>> We are working to fix the issue. >>>> >>>> Kind regards, >>>> >>>> -- >>>> >>>> Arx Cruz >>>> >>>> Software Engineer >>>> >>>> Red Hat EMEA >>>> >>>> arxcruz at redhat.com >>>> @RedHat Red Hat >>>> Red Hat >>>> >>>> >>>> >>> >>> >>> -- >>> >>> Arx Cruz >>> >>> Software Engineer >>> >>> Red Hat EMEA >>> >>> arxcruz at redhat.com >>> @RedHat Red Hat >>> Red Hat >>> >>> >>> >> >> >> -- >> >> Arx Cruz >> >> Software Engineer >> >> Red Hat EMEA >> >> arxcruz at redhat.com >> @RedHat Red Hat >> Red Hat >> >> >> > -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Thu Oct 13 07:32:40 2022 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 13 Oct 2022 09:32:40 +0200 Subject: [ussuri] controller has crond PAM errors Message-ID: Hi all, First of all, I understand it is old release, but, maybe it is present in newer also? Running centos8Linux and centos repos based OSP installation (not delorean). I am scrolling through logs and found that cron is having some curious log line related to pam. curious if that is normal? 
/var/log/cron:Oct 12 00:01:01 c506-ctrl-0 crond[770496]: (cinder) PAM ERROR (Cannot make/remove an entry for the specified session) /var/log/cron:Oct 12 00:01:01 c506-ctrl-0 crond[770496]: (cinder) FAILED to open PAM security session (Cannot make/remove an entry for the specified session) /var/log/cron:Oct 12 00:01:01 c506-ctrl-0 crond[770426]: (heat) PAM ERROR (Cannot make/remove an entry for the specified session) /var/log/cron:Oct 12 00:01:01 c506-ctrl-0 crond[770881]: (nova) PAM ERROR (Cannot make/remove an entry for the specified session) /var/log/cron:Oct 12 00:01:01 c506-ctrl-0 crond[770426]: (heat) FAILED to open PAM security session (Cannot make/remove an entry for the specified session) /var/log/cron:Oct 12 00:01:01 c506-ctrl-0 crond[770881]: (nova) FAILED to open PAM security session (Cannot make/remove an entry for the specified session) /var/log/cron:Oct 12 05:00:01 c506-ctrl-0 crond[773036]: (nova) PAM ERROR (Cannot make/remove an entry for the specified session) /var/log/cron:Oct 12 05:00:01 c506-ctrl-0 crond[773036]: (nova) FAILED to open PAM security session (Cannot make/remove an entry for the specified session) /var/log/secure:Oct 12 00:01:01 c506-ctrl-0 crond[770496]: pam_loginuid(crond:session): Error writing /proc/self/loginuid: Operation not permitted /var/log/secure:Oct 12 00:01:01 c506-ctrl-0 crond[770496]: pam_loginuid(crond:session): set_loginuid failed /var/log/secure:Oct 12 00:01:01 c506-ctrl-0 crond[770881]: pam_loginuid(crond:session): Error writing /proc/self/loginuid: Operation not permitted /var/log/secure:Oct 12 00:01:01 c506-ctrl-0 crond[770426]: pam_loginuid(crond:session): Error writing /proc/self/loginuid: Operation not permitted /var/log/secure:Oct 12 00:01:01 c506-ctrl-0 crond[770881]: pam_loginuid(crond:session): set_loginuid failed /var/log/secure:Oct 12 00:01:01 c506-ctrl-0 crond[770426]: pam_loginuid(crond:session): set_loginuid failed /var/log/secure:Oct 12 05:00:01 c506-ctrl-0 crond[773036]: pam_loginuid(crond:session): Error writing /proc/self/loginuid: Operation not permitted /var/log/secure:Oct 12 05:00:01 c506-ctrl-0 crond[773036]: pam_loginuid(crond:session): set_loginuid failed -- Ruslanas G?ibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Thu Oct 13 08:51:57 2022 From: mkopec at redhat.com (Martin Kopec) Date: Thu, 13 Oct 2022 10:51:57 +0200 Subject: [interop][ptg] Virtual PTG Planning Message-ID: Hello everyone, here is [1] our etherpad for Antelope PTG with the topics we're planning to discuss. If there is anything you would like to bring up with the interop team, feel free to let us know. We've booked 3 one hour slots: * Monday (October 17th) 13 - 14 UTC @ kilo * Tuesday (October 18th) 15 - 16 UTC @ kilo * Thursday (October 20th) 15 - 16 UTC @ liberty In case we'll see a need for more, we can always book something else later. [1] https://etherpad.opendev.org/p/antelope-ptg-interop Thanks, -- Martin Kopec Senior Software Quality Engineer Red Hat EMEA IM: kopecmartin -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Thu Oct 13 11:41:02 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Thu, 13 Oct 2022 08:41:02 -0300 Subject: [Ceilometer] Dynamic pollsters : dot in JSON keyname In-Reply-To: References: Message-ID: In such cases, you need to use the "Dynamic pollsters operations" to process the sample, and retrieve the key as "rgw.main.size". 
By default, this value ("rgw.main.size") is interpreted as a nested dictionary. I mean, an object with a key "rgw", and then, another one that has a key "main", where there is a dict, with a key "size". To handle such cases, you would need something similar to: `value_attribute: "usage || value['rgw.main'] | value['size']"`. However, that might not address all use cases. You will also need to handle situations when there is no key "rgw.main" in the response samples. On Wed, Oct 12, 2022 at 12:17 PM Taltavull Jean-Fran?ois < jean-francois.taltavull at elca.ch> wrote: > Hello Rafael, > > To get the the size, in GB, occupied by buckets I need to manipulate a > JSON key which contains a dot in its name: > > ```` > - name: "radosgw.containers.objects.size" > sample_type: "gauge" > unit: "B" > value_attribute: "rgw.main.size" > <------------------------------------"rgw.main" is a JSON key, with a dot > in its name, which belongs to "bucket.usage" JSON container > url_path: "http://FQDN/admin/bucket?stats=True" > module: "awsauth" > authentication_object: "S3Auth" > authentication_parameters: my_access_key,my_secret_key,FQDN > user_id_attribute: "owner" > project_id_attribute: "tenant" > resource_id_attribute: "id" > response_entries_key: "usage" > ```` > > But with this dynamic pollster definition, I get the python error > "KeyError: 'rgw' ". > > In this case, is there a specific syntax to define "value_attribute" or am > I doing something the wrong way ? > > > Jean-Francois > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From pdeore at redhat.com Thu Oct 13 11:50:31 2022 From: pdeore at redhat.com (Pranali Deore) Date: Thu, 13 Oct 2022 17:20:31 +0530 Subject: [Glance][PTG] Antelope PTG Schedule Message-ID: Hello All, Antelope PTG is going to start next week and we have created our PTG etherpad [1] and also added day wise topics along with timings we are going to discuss. Kindly let me know if you have any concerns with allotted time slots. Friday is reserved for any unplanned discussions. So please feel free to add your topics if you haven't added yet. As a reminder, these are the time slots for our discussion. Tuesday 18 OCT 2022 1400 UTC to 1700 UTC Wednesday 19 OCT 2022 1400 UTC to 1700 UTC Thursday 20 OCT 2022 1400 UTC to 1700 UTC Friday 21 OCT 2022 1400 UTC to 1700 UTC NOTE: We have booked glance operator hours on Thursday at 1600 UTC(we can extend it if required), let us know your availability for the same. At the moment we don't have any sessions scheduled on Friday, if there are any last moment request(s)/topic(s) we will discuss that on Friday else we will conclude our PTG on Thursday 20th OCT. We will be using bluejeans for our discussion, kindly try to use it once before the actual discussion. The meeting URL is mentioned in etherpad [1] and will be the same throughout the PTG. [1] https://etherpad.opendev.org/p/antelope-glance-ptg Hope to see you there!! Thanks & Regards, Pranali Deore -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at mknet.nl Thu Oct 13 06:57:15 2022 From: openstack at mknet.nl (Marcel) Date: Thu, 13 Oct 2022 08:57:15 +0200 Subject: [swift] upgrade newton to train in containers Message-ID: <683435eb7b0602a347a2c38c7a73f373@mknet.nl> I'm planning an upgrade for our newton based swift cluster (OOO vm based) to a (kolla image based containers) train cluster So far I have tested the upgrade for the proxies, account_servers and container servers and it looks promising. I have in a test environment: - switched to the new proxies with the old ring files and it looks like everything works normally - Added the new (train) account and container servers to the rings and it looks like all is fine - Removed the old account and container servers, still fine - tested fall back, also fine My question actually is: Did the account and container database format change between newton and train in such a way that I might run into troubles trying the upgrade as tested above in ways that I did not yet foresee? Thanks Marcel From ma-ooyama at kddi.com Thu Oct 13 07:42:03 2022 From: ma-ooyama at kddi.com (ma-ooyama at kddi.com) Date: Thu, 13 Oct 2022 07:42:03 +0000 Subject: [tacker][ptg] Announcement about operator hours Message-ID: Hello all: The "operator-hour-tacker" has been reserved on 19-Oct Wednesday from 04UTC to 05UTC. This is the oppotunity to share operator's opinion with developers. Please feel free for adding any topics you'd like to discuss on the etherpad [1]. [1]https://etherpad.opendev.org/p/oct2022-ptg-operator-hour-tacker Thanks, Masaki From akanevsk at redhat.com Thu Oct 13 13:53:50 2022 From: akanevsk at redhat.com (Arkady Kanevsky) Date: Thu, 13 Oct 2022 08:53:50 -0500 Subject: [interop][ptg] Virtual PTG Planning In-Reply-To: References: Message-ID: Martin, I will join Monday and Th. Can not do Tu. Thanks, Arkady On Thu, Oct 13, 2022 at 3:52 AM Martin Kopec wrote: > Hello everyone, > > here is [1] our etherpad for Antelope PTG with the topics we're planning > to discuss. If there is anything you would like to bring up with the > interop team, feel free to let us know. > We've booked 3 one hour slots: > * Monday (October 17th) 13 - 14 UTC @ kilo > * Tuesday (October 18th) 15 - 16 UTC @ kilo > * Thursday (October 20th) 15 - 16 UTC @ liberty > > In case we'll see a need for more, we can always book something else later. > > [1] https://etherpad.opendev.org/p/antelope-ptg-interop > > Thanks, > -- > Martin Kopec > Senior Software Quality Engineer > Red Hat EMEA > IM: kopecmartin > > > > -- Arkady Kanevsky, Ph.D. Phone: 972 707-6456 Corporate Phone: 919 729-5744 ext. 8176456 -------------- next part -------------- An HTML attachment was scrubbed... URL: From artem.goncharov at gmail.com Thu Oct 13 14:03:29 2022 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Thu, 13 Oct 2022 16:03:29 +0200 Subject: [sdk][ptg] Operator hours - SDK/Cli Message-ID: <379339AB-5E3D-4AD7-826F-EE18F4A03C9C@gmail.com> Hi all, This is no an official operator hour for sdk/cli, but all operators interested to hear/ask/share are welcome to join us during PTG on Friday 21.10 starting from 14:00 UTC (open end) Feel free to add topics of interest under Operator Hour item on https://etherpad.opendev.org/p/oct2022-ptg-sdk-cli Would be glad to get you there and talk about issues you face Artem -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lucioseki at gmail.com Thu Oct 13 16:30:20 2022 From: lucioseki at gmail.com (Lucio Seki) Date: Thu, 13 Oct 2022 13:30:20 -0300 Subject: [glance] Slow image download when using glanceclient Message-ID: Hi glance experts, I'm using the following code to download a glance image: ``` from glanceapi import client ... glance = client.Client(GLANCE_API_VERSION, session=sess) ... with open(path, 'wb') as image_file: data = glance.images.data(image_id) for chunk in tqdm(data, unit='B', unit_scale=True, unit_divisor=1024): image_file.write(chunk) ``` And I get a speed around 3kB/s. It would take months to download an image. I'm using python3-glanceclient==3.6.0. I even tried: ``` for chunk in tqdm(data, unit='B', unit_scale=True, unit_divisor=1024): pass ``` to see if the bottleneck was the disk I/O, but didn't get any faster. In the same environment, when I use the glance CLI instead: ``` glance image-download --file $path $image_id ``` I get hundreds of MB/s download speed, and it finishes in a few minutes. Is there anything I can do to improve the glanceclient performance? I'm considering using subprocess.Popen(['glance', 'image-download', ...]) if nothing helps... Regards, Lucio -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Oct 13 17:18:17 2022 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Oct 2022 18:18:17 +0100 Subject: [glance] Slow image download when using glanceclient In-Reply-To: References: Message-ID: <0bf73382755a1b3cf776932bbcc658ea250c23fa.camel@redhat.com> On Thu, 2022-10-13 at 13:30 -0300, Lucio Seki wrote: > Hi glance experts, > > I'm using the following code to download a glance image: > > ``` > from glanceapi import client > ... > glance = client.Client(GLANCE_API_VERSION, session=sess) > ... > with open(path, 'wb') as image_file: > data = glance.images.data(image_id) > for chunk in tqdm(data, unit='B', unit_scale=True, unit_divisor=1024): > image_file.write(chunk) > ``` > > And I get a speed around 3kB/s. It would take months to download an image. > I'm using python3-glanceclient==3.6.0. > I even tried: > ``` > for chunk in tqdm(data, unit='B', unit_scale=True, unit_divisor=1024): > pass > ``` > to see if the bottleneck was the disk I/O, but didn't get any faster. > > In the same environment, when I use the glance CLI instead: > > ``` > glance image-download --file $path $image_id > ``` > I get hundreds of MB/s download speed, and it finishes in a few minutes. > > Is there anything I can do to improve the glanceclient performance? > I'm considering using subprocess.Popen(['glance', 'image-download', ...]) > if nothing helps... have you considered using the openstacksdk instead the glanceclint is really only intendeted for other openstack service to use like nova or ironic. its not really ment to be used to write your onw code anymore. in the past it provided a programatic interface for interacting with glance but now you shoudl prefer the openstack sdk instead. https://github.com/openstack/openstacksdk > > Regards, > Lucio From allison at openinfra.dev Thu Oct 13 17:37:32 2022 From: allison at openinfra.dev (Allison Price) Date: Thu, 13 Oct 2022 12:37:32 -0500 Subject: [ptls][tc] 2022 OpenStack User Survey Project Question Responses Message-ID: <236718A0-249C-42D6-ABB8-AE65C5BFA6A6@openinfra.dev> Hi everyone, Please find attached the responses to the project questions from the 2022 OpenStack User Survey. 
Based on feedback last year, I included additional, non-identifiable information that will hopefully help provident deployment context for the responses to your questions. If you need a reminder of your project question, you can review the OpenStack User Survey [1]. During the PTG, I would encourage you and your teams to review the responses and decide if you would like to make any changes to your question for the 2023 OpenStack User Survey. It is live now, but we can make changes ahead of significant promotion. Please reach out to me directly with any changes. If you have any questions on how to read the results, please let me know. Have a great week at the PTG! Cheers, Allison [1] https://www.openstack.org/usersurvey -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenStackUserSurvey22.csv Type: text/csv Size: 323827 bytes Desc: not available URL: From lucioseki at gmail.com Thu Oct 13 19:21:07 2022 From: lucioseki at gmail.com (Lucio Seki) Date: Thu, 13 Oct 2022 16:21:07 -0300 Subject: [glance] Slow image download when using glanceclient In-Reply-To: <0bf73382755a1b3cf776932bbcc658ea250c23fa.camel@redhat.com> References: <0bf73382755a1b3cf776932bbcc658ea250c23fa.camel@redhat.com> Message-ID: Thanks Sean, that makes much easier to code! ``` ... conn = openstack.connect(cloud_name) with open(path, 'wb') as image_file: response = conn.image.download_image(image_name) for chunk in tqdm(response.iter_content(), **tqdm_params): image_file.write(chunk) ``` And it gave me some performance improvement (3kB/s -> 120kB/s). ... though it would still take several days to download an image. Is there some tuning that I could apply? On Thu, Oct 13, 2022, 14:18 Sean Mooney wrote: > On Thu, 2022-10-13 at 13:30 -0300, Lucio Seki wrote: > > Hi glance experts, > > > > I'm using the following code to download a glance image: > > > > ``` > > from glanceapi import client > > ... > > glance = client.Client(GLANCE_API_VERSION, session=sess) > > ... > > with open(path, 'wb') as image_file: > > data = glance.images.data(image_id) > > for chunk in tqdm(data, unit='B', unit_scale=True, > unit_divisor=1024): > > image_file.write(chunk) > > ``` > > > > And I get a speed around 3kB/s. It would take months to download an > image. > > I'm using python3-glanceclient==3.6.0. > > I even tried: > > ``` > > for chunk in tqdm(data, unit='B', unit_scale=True, > unit_divisor=1024): > > pass > > ``` > > to see if the bottleneck was the disk I/O, but didn't get any faster. > > > > In the same environment, when I use the glance CLI instead: > > > > ``` > > glance image-download --file $path $image_id > > ``` > > I get hundreds of MB/s download speed, and it finishes in a few minutes. > > > > Is there anything I can do to improve the glanceclient performance? > > I'm considering using subprocess.Popen(['glance', 'image-download', ...]) > > if nothing helps... > have you considered using the openstacksdk instead > > the glanceclint is really only intendeted for other openstack service to > use like > nova or ironic. > its not really ment to be used to write your onw code anymore. > in the past it provided a programatic interface for interacting with glance > but now you shoudl prefer the openstack sdk instead. > https://github.com/openstack/openstacksdk > > > > > Regards, > > Lucio > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From noonedeadpunk at gmail.com Thu Oct 13 19:52:03 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 13 Oct 2022 21:52:03 +0200 Subject: [all][openstack-dev][ptls] Migrating devstack jobs to Jammy (Ubuntu LTS 22.04) Message-ID: Hi everyone, According to a 2023.1 community-wide goal [1], base-jobs including but not limited to devstack-minimal, devstack, devstack-ipv6, devstack-multinode, tox, will be switched from Ubuntu Focal (20.04) to Jammy (22.04). That will also bring python 3.10 as a default python interpreter for these jobs. Migration will affect most of the projects, so to make it smooth, we will split the process into the following steps: 1. Patches for switching base jobs to Jammy will be proposed and marked as WIP (Work In Progress). It will allow projects to check if their code is compatible without breaking gates. These patches are ready and you can check them out here [2] 2. All projects should ensure that the code is compatible and merge changes if required. To properly track and test these changes you should: * Create a DNM patch that will contain in its commit message: "Depends-On: https://review.opendev.org/c/openstack/tempest/+/861110" to consume new nodesets. You can check the sample patch for Nova below [3] * In case of any job failure due to Ubuntu 22.04 or Python 3.10, land requires changes to support this new distribution * If you have overridden nodeset for your project jobs, ensure that you also have jobs against Ubuntu 22.04 and these jobs set to voting. * Set topic to all related changes to "migrate-to-jammy" * Please use etherpad [4] for tracking patches, bug reports, current status, or any other activity related to this topic 3. On R-18, which is the first 2023.1 milestone that will happen on the 18th of November 2022, base-jobs patches mentioned in step 1 will be merged. Please ensure you have verified compatibility for your projects and landed the required changes if any were needed before this date otherwise, they might fail. Please, do not hesitate to raise any questions or concerns. [1] https://governance.openstack.org/tc/goals/selected/migrate-ci-jobs-to-ubuntu-jammy.html [2] https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/861116 https://review.opendev.org/c/openstack/tempest/+/861110 https://review.opendev.org/c/openstack/devstack/+/860795 [3] https://review.opendev.org/c/openstack/nova/+/861111 [4] https://etherpad.opendev.org/p/migrate-to-jammy From smooney at redhat.com Thu Oct 13 19:53:47 2022 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Oct 2022 20:53:47 +0100 Subject: [glance] Slow image download when using glanceclient In-Reply-To: References: <0bf73382755a1b3cf776932bbcc658ea250c23fa.camel@redhat.com> Message-ID: <56b7476c090c1223dc91c89eb6acdbd7ff1e307a.camel@redhat.com> On Thu, 2022-10-13 at 16:21 -0300, Lucio Seki wrote: > Thanks Sean, that makes much easier to code! > > ``` > ... > conn = openstack.connect(cloud_name) > > with open(path, 'wb') as image_file: > response = conn.image.download_image(image_name) > for chunk in tqdm(response.iter_content(), **tqdm_params): > image_file.write(chunk) > ``` > > And it gave me some performance improvement (3kB/s -> 120kB/s). > ... though it would still take several days to download an image. > > Is there some tuning that I could apply? 
this is what nova does https://github.com/openstack/nova/blob/master/nova/image/glance.py#L344 we get the image chunks by calling the data method on the glance client https://github.com/openstack/nova/blob/03d2715ed492350fa11908aea0fdd0265993e284/nova/image/glance.py#L373-L377 then bwe basiclly just loop over the chunks and write them to a file like you are https://github.com/openstack/nova/blob/03d2715ed492350fa11908aea0fdd0265993e284/nova/image/glance.py#L413-L437 we have some extra code for doing image verification but its basically the same as what you are doing we use eventlets to monkeypatch python io which can imporve performce but i woudl not expect it to be that dramatic and i dont think the glance clinet or opesntack client use eventlet so its sound liek something else is limiting the transfer speed. this is the glance client method we are invokeing https://github.com/openstack/python-glanceclient/blob/56186d6d5aa1a0c8fde99eeb535a650b0495925d/glanceclient/v2/images.py#L201-L271 im not sure what tqdm is by the way is it meusrign the transfer speed of something linke that? does the speed increase if you remvoe that? i.ie can you test this via a simple time script and see how much downloads say in up to 60 seconds by lookign at the file size? assuming its https://github.com/tqdm/tqdm perhaps the addtional io that woudl be doing to standard out is slowign it down? > > On Thu, Oct 13, 2022, 14:18 Sean Mooney wrote: > > > On Thu, 2022-10-13 at 13:30 -0300, Lucio Seki wrote: > > > Hi glance experts, > > > > > > I'm using the following code to download a glance image: > > > > > > ``` > > > from glanceapi import client > > > ... > > > glance = client.Client(GLANCE_API_VERSION, session=sess) > > > ... > > > with open(path, 'wb') as image_file: > > > data = glance.images.data(image_id) > > > for chunk in tqdm(data, unit='B', unit_scale=True, > > unit_divisor=1024): > > > image_file.write(chunk) > > > ``` > > > > > > And I get a speed around 3kB/s. It would take months to download an > > image. > > > I'm using python3-glanceclient==3.6.0. > > > I even tried: > > > ``` > > > for chunk in tqdm(data, unit='B', unit_scale=True, > > unit_divisor=1024): > > > pass > > > ``` > > > to see if the bottleneck was the disk I/O, but didn't get any faster. > > > > > > In the same environment, when I use the glance CLI instead: > > > > > > ``` > > > glance image-download --file $path $image_id > > > ``` > > > I get hundreds of MB/s download speed, and it finishes in a few minutes. > > > > > > Is there anything I can do to improve the glanceclient performance? > > > I'm considering using subprocess.Popen(['glance', 'image-download', ...]) > > > if nothing helps... > > have you considered using the openstacksdk instead > > > > the glanceclint is really only intendeted for other openstack service to > > use like > > nova or ironic. > > its not really ment to be used to write your onw code anymore. > > in the past it provided a programatic interface for interacting with glance > > but now you shoudl prefer the openstack sdk instead. 
> > https://github.com/openstack/openstacksdk > > > > > > > > Regards, > > > Lucio > > > > From lucioseki at gmail.com Thu Oct 13 21:24:01 2022 From: lucioseki at gmail.com (Lucio Seki) Date: Thu, 13 Oct 2022 18:24:01 -0300 Subject: [glance] Slow image download when using glanceclient In-Reply-To: <56b7476c090c1223dc91c89eb6acdbd7ff1e307a.camel@redhat.com> References: <0bf73382755a1b3cf776932bbcc658ea250c23fa.camel@redhat.com> <56b7476c090c1223dc91c89eb6acdbd7ff1e307a.camel@redhat.com> Message-ID: Yes, I'm using tqdm to monitor the progress and speed. I removed it, and it improved slightly (120kB/s -> 131kB/s) but not significantly :-/ On Thu, Oct 13, 2022, 16:54 Sean Mooney wrote: > On Thu, 2022-10-13 at 16:21 -0300, Lucio Seki wrote: > > Thanks Sean, that makes much easier to code! > > > > ``` > > ... > > conn = openstack.connect(cloud_name) > > > > with open(path, 'wb') as image_file: > > response = conn.image.download_image(image_name) > > for chunk in tqdm(response.iter_content(), **tqdm_params): > > image_file.write(chunk) > > ``` > > > > And it gave me some performance improvement (3kB/s -> 120kB/s). > > ... though it would still take several days to download an image. > > > > Is there some tuning that I could apply? > this is what nova does > https://github.com/openstack/nova/blob/master/nova/image/glance.py#L344 > > we get the image chunks by calling the data method on the glance client > > https://github.com/openstack/nova/blob/03d2715ed492350fa11908aea0fdd0265993e284/nova/image/glance.py#L373-L377 > then bwe basiclly just loop over the chunks and write them to a file like > you are > > https://github.com/openstack/nova/blob/03d2715ed492350fa11908aea0fdd0265993e284/nova/image/glance.py#L413-L437 > we have some extra code for doing image verification but its basically the > same as what you are doing > we use eventlets to monkeypatch python io which can imporve performce but > i woudl not expect it to be that dramatic > and i dont think the glance clinet or opesntack client use eventlet so its > sound liek something else is limiting the transfer speed. > > this is the glance client method we are invokeing > > https://github.com/openstack/python-glanceclient/blob/56186d6d5aa1a0c8fde99eeb535a650b0495925d/glanceclient/v2/images.py#L201-L271 > > > im not sure what tqdm is by the way is it meusrign the transfer speed of > something linke that? > does the speed increase if you remvoe that? > i.ie can you test this via a simple time script and see how much > downloads say in up to 60 seconds by lookign at the file size? > > assuming its https://github.com/tqdm/tqdm perhaps the addtional io that > woudl be doing to standard out is slowign it down? > > > > > > > > On Thu, Oct 13, 2022, 14:18 Sean Mooney wrote: > > > > > On Thu, 2022-10-13 at 13:30 -0300, Lucio Seki wrote: > > > > Hi glance experts, > > > > > > > > I'm using the following code to download a glance image: > > > > > > > > ``` > > > > from glanceapi import client > > > > ... > > > > glance = client.Client(GLANCE_API_VERSION, session=sess) > > > > ... > > > > with open(path, 'wb') as image_file: > > > > data = glance.images.data(image_id) > > > > for chunk in tqdm(data, unit='B', unit_scale=True, > > > unit_divisor=1024): > > > > image_file.write(chunk) > > > > ``` > > > > > > > > And I get a speed around 3kB/s. It would take months to download an > > > image. > > > > I'm using python3-glanceclient==3.6.0. 
> > > > I even tried: > > > > ``` > > > > for chunk in tqdm(data, unit='B', unit_scale=True, > > > unit_divisor=1024): > > > > pass > > > > ``` > > > > to see if the bottleneck was the disk I/O, but didn't get any faster. > > > > > > > > In the same environment, when I use the glance CLI instead: > > > > > > > > ``` > > > > glance image-download --file $path $image_id > > > > ``` > > > > I get hundreds of MB/s download speed, and it finishes in a few > minutes. > > > > > > > > Is there anything I can do to improve the glanceclient performance? > > > > I'm considering using subprocess.Popen(['glance', 'image-download', > ...]) > > > > if nothing helps... > > > have you considered using the openstacksdk instead > > > > > > the glanceclint is really only intendeted for other openstack service > to > > > use like > > > nova or ironic. > > > its not really ment to be used to write your onw code anymore. > > > in the past it provided a programatic interface for interacting with > glance > > > but now you shoudl prefer the openstack sdk instead. > > > https://github.com/openstack/openstacksdk > > > > > > > > > > > Regards, > > > > Lucio > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Oct 13 21:58:42 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 13 Oct 2022 14:58:42 -0700 Subject: [ironic][ptg] Antelope PTG Schedule Message-ID: Hey all, Just a reminder, Ironic will be meeting at the PTG during the following times: Monday 1400-1500 UTC, in bexar room (shared session with nova about Ironic driver) Tuesday 1400-1700 UTC in grizzly room Wednesday 1400-1600 UTC in grizzly room Wednesday 2200-2300 UTC in grizzly room Also, baremetal SIG will be holding an operator hour for Ironic and other baremetal users on Wednesday, 1300-1400 UTC in the grizzly room. As always, https://ptg.opendev.org/ptg.html is more accurate and up to date than this email which only serves as a guide. Please also review https://etherpad.opendev.org/p/ironic-antelope-ptg -- I've documented a schedule guideline mapping out our PTG topics across these times. If you have any conflicts, please let me know ASAP and we'll try to accommodate. Thanks, Jay Faulkner -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Oct 13 22:00:42 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 13 Oct 2022 15:00:42 -0700 Subject: [baremetal-sig][ptg] Baremetal SIG Operator Hour Message-ID: Hey all, Just a friendly reminder that the Baremetal SIG will be holding an Operator hour Wednesday 1300-1400 UTC in the grizzly room. Please see https://etherpad.opendev.org/p/oct2022-ptg-operator-hour-baremetal-sig to review questions and add more if there's information you'd like to know from our baremetal operators. As always, https://ptg.opendev.org/ptg.html is more up to date than this email, which just serves as a guide. I'll see you there next week! - Jay Faulkner -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Fri Oct 14 08:34:11 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 14 Oct 2022 10:34:11 +0200 Subject: [neutron] Drivers meeting Message-ID: Hello Neutrinos: Due to the lack of agenda, today's meeting is cancelled. Next meeting will be in two weeks, October 28th. Next week, as you know, is the PTG and we will cancel all regular scheduled meetings. See you next week! 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-francois.taltavull at elca.ch Fri Oct 14 10:03:41 2022 From: jean-francois.taltavull at elca.ch (=?utf-8?B?VGFsdGF2dWxsIEplYW4tRnJhbsOnb2lz?=) Date: Fri, 14 Oct 2022 10:03:41 +0000 Subject: [Ceilometer] Dynamic pollsters : dot in JSON keyname In-Reply-To: References: Message-ID: This expression does the trick: ``` value_attribute: ". | value['usage'] | value.get('rgw.main', {'size':0}) | value['size']" ``` Thanks ! JF From: Rafael Weing?rtner Sent: jeudi, 13 octobre 2022 13:41 To: Taltavull Jean-Fran?ois Cc: openstack-discuss Subject: Re: [Ceilometer] Dynamic pollsters : dot in JSON keyname EXTERNAL MESSAGE - This email comes from outside ELCA companies. In such cases, you need to use the "Dynamic pollsters operations" to process the sample, and retrieve the key as "rgw.main.size". By default, this value ("rgw.main.size") is interpreted as a nested dictionary. I mean, an object with a key "rgw", and then, another one that has a key "main", where there is a dict, with a key "size". To handle such cases, you would need something similar to: `value_attribute: "usage || value['rgw.main'] | value['size']"`. However, that might not address all use cases. You will also need to handle situations when there is no key "rgw.main" in the response samples. On Wed, Oct 12, 2022 at 12:17 PM Taltavull Jean-Fran?ois > wrote: Hello Rafael, To get the the size, in GB, occupied by buckets I need to manipulate a JSON key which contains a dot in its name: ```` - name: "radosgw.containers.objects.size" sample_type: "gauge" unit: "B" value_attribute: "rgw.main.size" <------------------------------------"rgw.main" is a JSON key, with a dot in its name, which belongs to "bucket.usage" JSON container url_path: "http://FQDN/admin/bucket?stats=True" module: "awsauth" authentication_object: "S3Auth" authentication_parameters: my_access_key,my_secret_key,FQDN user_id_attribute: "owner" project_id_attribute: "tenant" resource_id_attribute: "id" response_entries_key: "usage" ```` But with this dynamic pollster definition, I get the python error "KeyError: 'rgw' ". In this case, is there a specific syntax to define "value_attribute" or am I doing something the wrong way ? Jean-Francois -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From artem.goncharov at gmail.com Fri Oct 14 10:23:12 2022 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Fri, 14 Oct 2022 12:23:12 +0200 Subject: [nova][keystone] What happens to key pairs after user is deleted Message-ID: Hi all, From the API perspective it is possible to delete user without deleting its key pairs. Practice showed, however, that keypairs of deleted user still exist and can be queried by API knowing id of the deleted user (at least in devstack and 1 other public cloud). I know it may be tricky if there is still VM provisioned with the key, but deleting user logically means nobody has access to the private key anyway. And since key pairs belong to users and not to projects it is not possible to clean them up in the project cleanup either. Actually from the API pov there is no reasonable way to ever find those (without knowing ID of the deleted user which is logically not known anymore). If there is no cleanup this can in the mid term cause trashing the database (records are small, but still), especially when using ?dynamic? users to perform some actions. 
So far I haven?t tried to grep through code basis of Nova to check what is happening, neither tried to check behavior over time, and decided first to ask here whether somebody knows what should be generally happening here, is it a bug or feature? Thanks, Artem From smooney at redhat.com Fri Oct 14 11:54:11 2022 From: smooney at redhat.com (Sean Mooney) Date: Fri, 14 Oct 2022 12:54:11 +0100 Subject: [nova][keystone] What happens to key pairs after user is deleted In-Reply-To: References: Message-ID: <3de78ba2af783aeb1f320d100110437097a91cea.camel@redhat.com> On Fri, 2022-10-14 at 12:23 +0200, Artem Goncharov wrote: > Hi all, > > From the API perspective it is possible to delete user without deleting its key pairs. > from a nova api perspective you have violated the precondition if you do not remove all resouces owned by the user in nova before you delete the user in keystone. so you are conflaiting two things. it is possibel to do in keystone apit but its not valid to do that as you have not met the precondtion of cleaing up the resocue in other project. so no we donot support deleteign the keyparis after the fact like that. > Practice showed, however, that keypairs of deleted user still exist and can be queried by API knowing id of the deleted user (at least in devstack and 1 other public cloud).? > I know it may be tricky if there is still VM provisioned with the key, but deleting user logically means nobody has access to the private key anyway. > correct noone shoudl have access to the key but again you are not allowed to delete the user before you remoave any resouces used by it you can delete the keypari wihotut deleteing it form the vms that were created with it. deleting the keypair has never implied removing the keypair form ths authrised keys in the vm so no assumitions shoudl be made that removing the keypair alther who can log into the vm. that is just not part of what the nova api. > And since key pairs belong to users and not to projects it is not possible to clean them up in the project cleanup either.? > > Actually from the API pov there is no reasonable way to ever find those (without knowing ID of the deleted user which is logically not known anymore). > > If there is no cleanup this can in the mid term cause trashing the database (records are small, but still), especially when using ?dynamic? users to perform some actions. ya so if we wanted to suprot automatic clean up of user resouce likek keyparis we woudl need a new user-deleted exeternal event in the nova api and nova could delete the key pair and any other user (not project) owned api resoruse but i think the key pari is the only example we have today. teh vms are owned by the project not the user. > > So far I haven?t tried to grep through code basis of Nova to check what is happening, neither tried to check behavior over time, and decided first to ask here whether somebody knows what should be generally happening here, is it a bug or feature? > this is not a bug its user error. nova and all other openstack servces to my knoladge require that you clean up the reosuce used by users or proejct are cleaned up before you remove a user/project form keystone. so by violatign that requirement you can nolonger interact with apis that depedn in the delete entitiy and that is expect. it woudl be a large cross proejct effort to chagne that. alternitivly keystoen could prevent the user/project form being deleteed if there are resuouce used by that user/project in other service btu tthat woudl also be a cross project effort. 
for this specific issue we could add a new Admin only api to allow the deletion fo user keypairs btu that woudl be a new feature. > Thanks, > Artem > From artem.goncharov at gmail.com Fri Oct 14 13:06:38 2022 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Fri, 14 Oct 2022 15:06:38 +0200 Subject: [nova][keystone] What happens to key pairs after user is deleted In-Reply-To: <3de78ba2af783aeb1f320d100110437097a91cea.camel@redhat.com> References: <3de78ba2af783aeb1f320d100110437097a91cea.camel@redhat.com> Message-ID: <9576F4A6-3379-4C48-B3CC-5D32077BA50E@gmail.com> Thanks for answer > On 14. Oct 2022, at 13:54, Sean Mooney wrote: > > On Fri, 2022-10-14 at 12:23 +0200, Artem Goncharov wrote: >> Hi all, >> >> From the API perspective it is possible to delete user without deleting its key pairs. >> > from a nova api perspective you have violated the precondition if you do not remove all resouces owned by the user in nova before you delete the user > in keystone. so you are conflaiting two things. it is possibel to do in keystone apit but its not valid to do that as you have not met the precondtion > of cleaing up the resocue in other project. > so no we donot support deleteign the keyparis after the fact like that. >> Practice showed, however, that keypairs of deleted user still exist and can be queried by API knowing id of the deleted user (at least in devstack and 1 other public cloud). >> I know it may be tricky if there is still VM provisioned with the key, but deleting user logically means nobody has access to the private key anyway. >> > correct noone shoudl have access to the key but again you are not allowed to delete the user before you remoave any resouces used by it > you can delete the keypari wihotut deleteing it form the vms that were created with it. deleting the keypair has never implied > removing the keypair form ths authrised keys in the vm so no assumitions shoudl be made that removing the keypair alther who can log into the vm. > that is just not part of what the nova api. >> And since key pairs belong to users and not to projects it is not possible to clean them up in the project cleanup either. >> >> Actually from the API pov there is no reasonable way to ever find those (without knowing ID of the deleted user which is logically not known anymore). >> >> If there is no cleanup this can in the mid term cause trashing the database (records are small, but still), especially when using ?dynamic? users to perform some actions. > ya so if we wanted to suprot automatic clean up of user resouce likek keyparis we woudl need a new user-deleted exeternal event in the nova api and > nova could delete the key pair and any other user (not project) owned api resoruse but i think the key pari is the only example we have today. > teh vms are owned by the project not the user. Right, this is only valid for key pairs. That is precisely the reason why project cleanup is not dealing with that today. >> >> So far I haven?t tried to grep through code basis of Nova to check what is happening, neither tried to check behavior over time, and decided first to ask here whether somebody knows what should be generally happening here, is it a bug or feature? >> > this is not a bug its user error. I disagree here. There is nothing that blocks you from doing so. User error is i.e. to try to delete something what is still used. Here the user was never said to delete all of the resources, cause same statement is valid to deletion or not deletion of the VM created by the user. 
You should not be able to delete project if there are resources remaining, and you should not be able to drop user if key pairs exist? And the biggest issue is that once user recognised he did an ?error? - he is not able to fix it. My personal opinion is that it is a bug to let user to make error. What is the user supposed to do after he recognised he deleted project without deleting resources first? As admin you have chance to catch resources not belonging to any existing project, but not with key pairs. And how big are the chances you can still do this on a big cloud? > nova and all other openstack servces to my knoladge require that you clean up the reosuce used by users or proejct are cleaned up before you remove a > user/project form keystone. so by violatign that requirement you can nolonger interact with apis that depedn in the delete entitiy and that is expect. > it woudl be a large cross proejct effort to chagne that. > > alternitivly keystoen could prevent the user/project form being deleteed if there are resuouce used by that user/project in other service btu tthat > woudl also be a cross project effort. This feels logical, but most likely not easy to achieve, because otherwise Keystone need to query every service asking whether deletion of this user should be blocked or not. Keystone sending announcement to the services that certain user/project//domain was deleted so that service makes decision what to do with that is easier to achieve, but blocking is really the only way to make user experience correct and avoid creating a mess in a first place. > > for this specific issue we could add a new Admin only api to allow the deletion fo user keypairs btu that woudl be a new feature. From the user perspective I would prefer extending list key pairs api with something like ?all users?. Having info like that customer based cleanup can determine all KPs owned by deleted users and drop them. For that, however, some form of tracking to which domain user was belonging (maybe instead of "all_users" add param "user_domain_id") need to be also done. I would like to prevent that only super-duper admin of the cloud can do such cleanup, otherwise in a big public cloud it will become a mess. > >> Thanks, >> Artem >> > From artem.goncharov at gmail.com Fri Oct 14 15:07:29 2022 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Fri, 14 Oct 2022 17:07:29 +0200 Subject: [glance] Slow image download when using glanceclient In-Reply-To: References: <0bf73382755a1b3cf776932bbcc658ea250c23fa.camel@redhat.com> <56b7476c090c1223dc91c89eb6acdbd7ff1e307a.camel@redhat.com> Message-ID: <141F3517-5364-4B69-889D-2EB1077DD9D6@gmail.com> ``` import openstack conn = openstack.connect() conn.image.download_image(image_name, stream=True, output="data.iso?) ``` This gives me max performance of the network. Actually using stream=True may be slower (around 40%), but may be crucially necessary when dealing with huge images. Additionally you can specify chunk_size as param to download_image function, what aligns performance of stream vs non stream (for me stream=True and chunk_size=8192 resulted 2.3G image to be downloaded in 14 sec) > On 13. Oct 2022, at 23:24, Lucio Seki wrote: > > Yes, I'm using tqdm to monitor the progress and speed. > I removed it, and it improved slightly (120kB/s -> 131kB/s) but not significantly :-/ > > On Thu, Oct 13, 2022, 16:54 Sean Mooney > wrote: > On Thu, 2022-10-13 at 16:21 -0300, Lucio Seki wrote: > > Thanks Sean, that makes much easier to code! > > > > ``` > > ... 
> > conn = openstack.connect(cloud_name) > > > > with open(path, 'wb') as image_file: > > response = conn.image.download_image(image_name) > > for chunk in tqdm(response.iter_content(), **tqdm_params): > > image_file.write(chunk) > > ``` > > > > And it gave me some performance improvement (3kB/s -> 120kB/s). > > ... though it would still take several days to download an image. > > > > Is there some tuning that I could apply? > this is what nova does > https://github.com/openstack/nova/blob/master/nova/image/glance.py#L344 > > we get the image chunks by calling the data method on the glance client > https://github.com/openstack/nova/blob/03d2715ed492350fa11908aea0fdd0265993e284/nova/image/glance.py#L373-L377 > then bwe basiclly just loop over the chunks and write them to a file like you are > https://github.com/openstack/nova/blob/03d2715ed492350fa11908aea0fdd0265993e284/nova/image/glance.py#L413-L437 > we have some extra code for doing image verification but its basically the same as what you are doing > we use eventlets to monkeypatch python io which can imporve performce but i woudl not expect it to be that dramatic > and i dont think the glance clinet or opesntack client use eventlet so its sound liek something else is limiting the transfer speed. > > this is the glance client method we are invokeing > https://github.com/openstack/python-glanceclient/blob/56186d6d5aa1a0c8fde99eeb535a650b0495925d/glanceclient/v2/images.py#L201-L271 > > > im not sure what tqdm is by the way is it meusrign the transfer speed of something linke that? > does the speed increase if you remvoe that? > i.ie can you test this via a simple time script and see how much downloads say in up to 60 seconds by lookign at the file size? > > assuming its https://github.com/tqdm/tqdm perhaps the addtional io that woudl be doing to standard out is slowign it down? > > > > > > > > On Thu, Oct 13, 2022, 14:18 Sean Mooney > wrote: > > > > > On Thu, 2022-10-13 at 13:30 -0300, Lucio Seki wrote: > > > > Hi glance experts, > > > > > > > > I'm using the following code to download a glance image: > > > > > > > > ``` > > > > from glanceapi import client > > > > ... > > > > glance = client.Client(GLANCE_API_VERSION, session=sess) > > > > ... > > > > with open(path, 'wb') as image_file: > > > > data = glance.images.data(image_id) > > > > for chunk in tqdm(data, unit='B', unit_scale=True, > > > unit_divisor=1024): > > > > image_file.write(chunk) > > > > ``` > > > > > > > > And I get a speed around 3kB/s. It would take months to download an > > > image. > > > > I'm using python3-glanceclient==3.6.0. > > > > I even tried: > > > > ``` > > > > for chunk in tqdm(data, unit='B', unit_scale=True, > > > unit_divisor=1024): > > > > pass > > > > ``` > > > > to see if the bottleneck was the disk I/O, but didn't get any faster. > > > > > > > > In the same environment, when I use the glance CLI instead: > > > > > > > > ``` > > > > glance image-download --file $path $image_id > > > > ``` > > > > I get hundreds of MB/s download speed, and it finishes in a few minutes. > > > > > > > > Is there anything I can do to improve the glanceclient performance? > > > > I'm considering using subprocess.Popen(['glance', 'image-download', ...]) > > > > if nothing helps... > > > have you considered using the openstacksdk instead > > > > > > the glanceclint is really only intendeted for other openstack service to > > > use like > > > nova or ironic. > > > its not really ment to be used to write your onw code anymore. 
> > > in the past it provided a programatic interface for interacting with glance > > > but now you shoudl prefer the openstack sdk instead. > > > https://github.com/openstack/openstacksdk > > > > > > > > > > > Regards, > > > > Lucio > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Oct 14 15:10:43 2022 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 14 Oct 2022 17:10:43 +0200 Subject: [kolla][ptg] Antelope PTG schedule Message-ID: Hi Koalas, Antelope PTG is going to start next week, following are the slots that are booked: 17-20 October 2022: ? Monday - 13.00 - 17.00 UTC (mostly general and Kolla) ? Tuesday - 13.00 - 16.00 UTC (mostly Kolla Ansible) and 16.00 - 17.00 UTC Operator hour ? Thursday - 13.00 - 15.00 UTC (Kayobe) NOTE: I have booked Kolla operator hour on Tuesday, 16.00 - 17.00 UTC (Etherpad link [2]) [1] https://etherpad.opendev.org/p/kolla-antelope-ptg [2] https://etherpad.opendev.org/p/oct2022-ptg-operator-hour-kolla See you there! Best regards, Michal -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Oct 14 19:09:44 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 14 Oct 2022 12:09:44 -0700 Subject: [ptls][tc] 2022 OpenStack User Survey Project Question Responses In-Reply-To: <236718A0-249C-42D6-ABB8-AE65C5BFA6A6@openinfra.dev> References: <236718A0-249C-42D6-ABB8-AE65C5BFA6A6@openinfra.dev> Message-ID: <183d7e70321.12082da9c218306.2847086945299618919@ghanshyammann.com> Thanks, Allison for sharing the 2022 survey response which is good timing to discuss in PTG. I noticed that there are a few retired projects still present in the survey: - Congress - Karbor - Qinling - searchlight - tricircle - Panko (this retired last month) But I see some responses for these projects as used in 'production' and 'interested'. In the 2023 Survey, we have two options for these projects: 1. Remove them from Survey 2. We can add some info like "this project is retired and if you are using it or interested to use then you can re-apply these projects to be official OpenStack project and maintain it". As they were included in Survey, that will be a good notification to users and maybe in the 2024 year survey, we can remove them. -gmann ---- On Thu, 13 Oct 2022 10:37:32 -0700 Allison Price wrote --- > Hi everyone, > > Please find attached the responses to the project questions from the 2022 OpenStack User Survey. Based on feedback last year, I included additional, non-identifiable information that will hopefully help provident deployment context for the responses to your questions. If you need a reminder of your project question, you can review the OpenStack User Survey [1]. During the PTG, I would encourage you and your teams to review the responses and decide if you would like to make any changes to your question for the 2023 OpenStack User Survey. It is live now, but we can make changes ahead of significant promotion. Please reach out to me directly with any changes. > > If you have any questions on how to read the results, please let me know. > > Have a great week at the PTG! 
> > Cheers, > Allison > > [1] https://www.openstack.org/usersurvey > > > From allison at openinfra.dev Fri Oct 14 19:18:27 2022 From: allison at openinfra.dev (Allison Price) Date: Fri, 14 Oct 2022 14:18:27 -0500 Subject: [ptls][tc] 2022 OpenStack User Survey Project Question Responses In-Reply-To: <183d7e70321.12082da9c218306.2847086945299618919@ghanshyammann.com> References: <236718A0-249C-42D6-ABB8-AE65C5BFA6A6@openinfra.dev> <183d7e70321.12082da9c218306.2847086945299618919@ghanshyammann.com> Message-ID: <96475304-6D80-4F47-83E4-F76DE4BA9343@openinfra.dev> > On Oct 14, 2022, at 2:09 PM, Ghanshyam Mann wrote: > > Thanks, Allison for sharing the 2022 survey response which is good timing to discuss in PTG. > > I noticed that there are a few retired projects still present in the survey: > > - Congress > - Karbor > - Qinling > - searchlight > - tricircle > - Panko (this retired last month) > > But I see some responses for these projects as used in 'production' and 'interested'. In the 2023 Survey, we have two options for these projects: > 1. Remove them from Survey > 2. We can add some info like "this project is retired and if you are using it or interested to use then you can re-apply these projects to be official OpenStack project and maintain it". As they were included in Survey, that will be a good notification to users and maybe in the 2024 year survey, we can remove them. I like your second suggestion about putting a disclaimer. We can also reach out to folks who are running these services to let them know. To have the most accurate picture of OpenStack deployments, I do not recommending removing these projects from the User Survey at any point. Most users are running old releases which means that they may still be running these services, which I think is valuable for us to know. > > > -gmann > > ---- On Thu, 13 Oct 2022 10:37:32 -0700 Allison Price wrote --- >> Hi everyone, >> >> Please find attached the responses to the project questions from the 2022 OpenStack User Survey. Based on feedback last year, I included additional, non-identifiable information that will hopefully help provident deployment context for the responses to your questions. If you need a reminder of your project question, you can review the OpenStack User Survey [1]. During the PTG, I would encourage you and your teams to review the responses and decide if you would like to make any changes to your question for the 2023 OpenStack User Survey. It is live now, but we can make changes ahead of significant promotion. Please reach out to me directly with any changes. >> >> If you have any questions on how to read the results, please let me know. >> >> Have a great week at the PTG! 
>> >> Cheers, >> Allison >> >> [1] https://www.openstack.org/usersurvey >> >> >> From gmann at ghanshyammann.com Fri Oct 14 19:51:05 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 14 Oct 2022 12:51:05 -0700 Subject: [ptls][tc] 2022 OpenStack User Survey Project Question Responses In-Reply-To: <96475304-6D80-4F47-83E4-F76DE4BA9343@openinfra.dev> References: <236718A0-249C-42D6-ABB8-AE65C5BFA6A6@openinfra.dev> <183d7e70321.12082da9c218306.2847086945299618919@ghanshyammann.com> <96475304-6D80-4F47-83E4-F76DE4BA9343@openinfra.dev> Message-ID: <183d80cdd6e.cc5d2659219035.116924690769971438@ghanshyammann.com> ---- On Fri, 14 Oct 2022 12:18:27 -0700 Allison Price wrote --- > > > > On Oct 14, 2022, at 2:09 PM, Ghanshyam Mann gmann at ghanshyammann.com> wrote: > > > > Thanks, Allison for sharing the 2022 survey response which is good timing to discuss in PTG. > > > > I noticed that there are a few retired projects still present in the survey: > > > > - Congress > > - Karbor > > - Qinling > > - searchlight > > - tricircle > > - Panko (this retired last month) > > > > But I see some responses for these projects as used in 'production' and 'interested'. In the 2023 Survey, we have two options for these projects: > > 1. Remove them from Survey > > 2. We can add some info like "this project is retired and if you are using it or interested to use then you can re-apply these projects to be official OpenStack project and maintain it". As they were included in Survey, that will be a good notification to users and maybe in the 2024 year survey, we can remove them. > > I like your second suggestion about putting a disclaimer. We can also reach out to folks who are running these services to let them know. To have the most accurate picture of OpenStack deployments, I do not recommending removing these projects from the User Survey at any point. Most users are running old releases which means that they may still be running these services, which I think is valuable for us to know. ++, sounds like a good plan. Reaching out to folks who are running these services is a really good idea. -gmann > > > > > > > -gmann > > > > ---- On Thu, 13 Oct 2022 10:37:32 -0700 Allison Price wrote --- > >> Hi everyone, > >> > >> Please find attached the responses to the project questions from the 2022 OpenStack User Survey. Based on feedback last year, I included additional, non-identifiable information that will hopefully help provident deployment context for the responses to your questions. If you need a reminder of your project question, you can review the OpenStack User Survey [1]. During the PTG, I would encourage you and your teams to review the responses and decide if you would like to make any changes to your question for the 2023 OpenStack User Survey. It is live now, but we can make changes ahead of significant promotion. Please reach out to me directly with any changes. > >> > >> If you have any questions on how to read the results, please let me know. > >> > >> Have a great week at the PTG! > >> > >> Cheers, > >> Allison > >> > >> [1] https://www.openstack.org/usersurvey > >> > >> > >> > > > From gmann at ghanshyammann.com Fri Oct 14 20:46:43 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 14 Oct 2022 13:46:43 -0700 Subject: [all][tc] Canceling next week TC meetings Message-ID: <183d83fccbb.b3b7becd219914.4007981351422134568@ghanshyammann.com> Hello Everyone, As we all will be in PTG next week, we are cancelling next week's TC meeting. 
-gmann From gmann at ghanshyammann.com Fri Oct 14 21:10:20 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 14 Oct 2022 14:10:20 -0700 Subject: [all][tc] What's happening in Technical Committee: summary 2022 Oct 14: Reading: 5 min Message-ID: <183d8556d54.c42ff436220187.8182964179249177028@ghanshyammann.com> Hello Everyone, Here is this week's summary of the Technical Committee activities. 1. TC Meetings: ============ * We had this week's meeting on Oct 13. Most of the meeting discussions are summarized in this email. Meeting logs are available @ https://meetings.opendev.org/meetings/tc/2022/tc.2022-10-13-15.00.log.html * Next TC weekly meeting will be on Oct 27 Thursday at 15:00 UTC, feel free to add the topic to the agenda[1] by Oct 26. 2. What we completed this week: ========================= * Selected slaweq nomination as TC vice-chair [2] * Selected Ubuntu 22.04 for CI/CD as a current goal [3] * Added project for managing zuul jobs for charms [4] * Clarify Extended Maintenance branch testing and support policy in stable policy document [5] 3. Activities In progress: ================== TC Tracker for Zed cycle ------------------------------ * Zed tracker etherpad includes the TC working items[6], Five are completed and other items are in-progress. Open Reviews ----------------- * Four open reviews for ongoing activities[7]. 2023.1 cycle Leaderless projects --------------------------------------- * Zun project PTL appointment is under review[8][9]. 2023.1 cycle TC PTG planning ------------------------------------ * Etherpads to add the topics: ** https://etherpad.opendev.org/p/tc-2023-1-ptg ** https://etherpad.opendev.org/p/tc-leaders-interaction-2023-1 * Kubernetes Steering Committee members will be meeting TC in PTG on Friday 21 Oct at 16:00 UTC. * I sent a reminder email about the 'Operator Hours' slots in this PTG, please check and reserve the operator hour slot for your project[10] 2021 User Survey TC Question Analysis ----------------------------------------------- No update on this. The survey summary is up for review[11]. Feel free to check and provide feedback. Fixing Zuul config error ---------------------------- We request projects having zuul config error to fix them, Keep supported stable branches as a priority and Extended maintenance stable branch as low priority[12][13]. Project updates ------------------- * None. 4. How to contact the TC: ==================== If you would like to discuss or give feedback to TC, you can reach out to us in multiple ways: 1. Email: you can send the email with tag [tc] on openstack-discuss ML[14]. 2. Weekly meeting: The Technical Committee conduct a weekly meeting every Thursday 15 UTC [15] 3. Ping us using 'tc-members' nickname on #openstack-tc IRC channel. See you all next week in PTG! 
[1] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting [2] https://review.opendev.org/c/openstack/governance/+/860352 [3] https://review.opendev.org/c/openstack/governance/+/860040 [4] https://review.opendev.org/c/openstack/governance/+/861044 [5] https://review.opendev.org/c/openstack/project-team-guide/+/861141 [6] https://etherpad.opendev.org/p/tc-zed-tracker [7] https://review.opendev.org/q/projects:openstack/governance+status:open [8] https://review.opendev.org/c/openstack/governance/+/858980 [9] https://review.opendev.org/c/openstack/governance/+/860759 [10] https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030790.html [11] https://review.opendev.org/c/openstack/governance/+/836888 [12] https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030505.html [13] https://etherpad.opendev.org/p/zuul-config-error-openstack [14] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss [15] http://eavesdrop.openstack.org/#Technical_Committee_Meeting -gmann From amy at demarco.com Fri Oct 14 22:10:11 2022 From: amy at demarco.com (Amy Marrich) Date: Fri, 14 Oct 2022 17:10:11 -0500 Subject: [Diversity][PTG] Diversity and Inclusion WG session at PTG and meeting changes Message-ID: The Diversity and Inclusion WG will be holding its session on Monday at 14:00UTC in the Folsom room. On the agenda[0] are holding a Diversity Survey for the Foundation in 2023 which includes all projects and Community Leadership Mentorship. We have also moved our regular monthly meetings to the Second Tuesday of the month at 14:00UTC and we have also moved back from video to IRC with the meetings taking place in #openinfra-diversity on OFTC. Amy (spotz) 0 - https://etherpad.opendev.org/p/oct2022-ptg-diversity -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Sat Oct 15 22:12:15 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 15 Oct 2022 15:12:15 -0700 Subject: [all][tc][goals] : "Consistent and Secure Default RBAC" goal: Zed Timeline updates Message-ID: <183ddb4787f.dab78ed6239203.926140978620112699@ghanshyammann.com> Hello Everyone, As you know, "Consistent and Secure Default RBAC" is one of the currently selected community-wide goals and we have divided it into multiple milestones (cycle)[1]. OpenStack Zed is released, I am going to summarize the progress of Zed timeline targets. Please note that progress is as per the new direction/targets decided in the Zed cycle which is to implement the project personas and drop the system scope. Gerrit Topic: https://review.opendev.org/q/topic:%2522secure-rbac%2522+(status:open+OR+status:merged) Tracking: https://etherpad.opendev.org/p/rbac-goal-tracking Completed: ========= * Nova * Neutron * Glance * Manila * Ironic (no change needed for the direction change in Zed cycle) * Manager Support ** Completed *** Ironic 2023.1 Release Timeline[2] =================== 1. All services must implement Phase 1 (Implement project personas) 2. Services start implementing Phase 2 (Move service-to-service APIs to service role) 3. Services start implementing Phase 3 (Implement 'Manager' role where applicable) The 1st one is important and all services must implement the project personas (drop system scope if already implemented). Please plan this work for next week PTG and join TC discussion on this goal on THURSDAY: 17-19 UTC for any query from the project side. 
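For teams starting on Phase 1, the shape of these defaults is roughly the following. This is only an illustrative oslo.policy sketch: the rule names are invented and every service defines its own policies, but the check strings show what the project personas, the service role and the project manager role look like in practice:

```
from oslo_policy import policy

# Illustrative defaults only -- real rule names and defaults live in each service.
rules = [
    # Phase 1: project personas (reader/member scoped to the project).
    policy.DocumentedRuleDefault(
        name="example:widget:get",
        check_str="role:reader and project_id:%(project_id)s",
        description="Show a widget.",
        operations=[{"path": "/widgets/{id}", "method": "GET"}],
    ),
    # Phase 2: service-to-service APIs restricted to the service role.
    policy.DocumentedRuleDefault(
        name="example:widget:internal_sync",
        check_str="role:service",
        description="Internal service-to-service call.",
        operations=[{"path": "/widgets/{id}/sync", "method": "POST"}],
    ),
    # Phase 3: project manager for more privileged, still project-scoped actions.
    policy.DocumentedRuleDefault(
        name="example:widget:force_delete",
        check_str="role:manager and project_id:%(project_id)s",
        description="Force-delete a widget.",
        operations=[{"path": "/widgets/{id}", "method": "DELETE"}],
    ),
]
```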
[1] https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html#completion-date-criteria [2] https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html#release-timeline -gmann From gmann at ghanshyammann.com Sat Oct 15 23:07:43 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 15 Oct 2022 16:07:43 -0700 Subject: [all][tc][goal][policy] RBAC goal discussion in 2023.1 PTG Message-ID: <183dde7401d.be0ce1d3239655.622022913516457362@ghanshyammann.com> Hello Everyone, I hope you have seen my other email about RBAC goal progress and what are the targets for the 2023.1 cycle[1]. Please plan and discuss the RBAC work on your project and bring your queries to discuss in the TC PTG slot on Thursday 17 UTC. You can add the query in the below etherpad under RBAC goal topic - https://etherpad.opendev.org/p/tc-2023-1-ptg#L71 [1] https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030863.html -gmann From rdhasman at redhat.com Sun Oct 16 18:15:53 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Sun, 16 Oct 2022 23:45:53 +0530 Subject: [ptg][cinder] Finalized Schedule Message-ID: Hello Everyone, I've finalized the schedule for cinder PTG[1] based on the topics, number of hours of PTG and as per the date and time suited to the authors. Please go through it and let me know if any changes are needed. I've also included a *Courtesy Ping* section with each topic so the people interested in a particular topic will get a ping when it is being discussed. Please include your IRC nick in front of it if you would like to be notified. The schedule includes a Day wise topic list where, except for monday, we've PTG from 1300-1700 UTC (where the last hour acts as a buffer time for extended topic discussions or other planned activities). Following are some of the highlight events of each day, Monday: TC + PTL session: 1500-1700 UTC Tuesday: First day of cinder PTG Operator hour from 1500-1600 UTC Wednesday: Team photo at 1400 UTC (timing could be changed based on the topic duration before it) Thursday: Drivers Day! Friday: Festival of XS reviews (If we've enough time left) [1] https://etherpad.opendev.org/p/zed-ptg-cinder Thanks and regards Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From saditya at vt.edu Sun Oct 16 01:13:05 2022 From: saditya at vt.edu (Aditya Sathish) Date: Sat, 15 Oct 2022 21:13:05 -0400 Subject: Installing out-of-tree ML2 neutron plugins with kolla-ansible Message-ID: Hello, I am trying to use an out-of-tree ML2 neutron plugin with OpenStack with Kolla-Ansible but I am having a hard time figuring out how to go about it. For example, this is my repository for the plugin: https://github.com/adityasathis/networking-onos. I have made changes to the deployment YML files to allow configuration for this new neutron plugin from the global.yml file. However, I am not able to figure out how to copy over my plugin files and install them on the controller node. I came across the commit: https://opendev.org/openstack/kolla-ansible/commit/418cb52767270d85e28a6f3027c561f47b805d9d which, I think, does what I'm looking to do and so I kept the networking-onos directory (with the setup.py in it) in the /etc/kolla/config/neutron/plugins directory. The deploy script is able to detect the file in the "Checking for ML2 plugins" step however, I'm not able to copy it anywhere in the "Copying ML2 plugin" step. Am I missing something? Regards, Aditya. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From saditya at vt.edu Sun Oct 16 01:30:05 2022 From: saditya at vt.edu (Aditya Sathish) Date: Sat, 15 Oct 2022 21:30:05 -0400 Subject: [kolla] Installing out-of-tree ML2 neutron plugins with kolla-ansible In-Reply-To: References: Message-ID: Hello, I am trying to use an out-of-tree ML2 neutron plugin with OpenStack with Kolla-Ansible but I am having a hard time figuring out how to go about it. For example, this is my repository for the plugin: https://github.com/adityasathis/networking-onos. I have made changes to the deployment YML files to allow configuration for this new neutron plugin from the global.yml file. However, I am not able to figure out how to copy over my plugin files and install them on the controller node. I came across the commit: https://opendev.org/openstack/kolla-ansible/commit/418cb52767270d85e28a6f3027c561f47b805d9d which, I think, does what I'm looking to do and so I kept the networking-onos directory (with the setup.py in it) in the /etc/kolla/config/neutron/plugins directory. The deploy script is able to detect the file in the "Checking for ML2 plugins" step however, I'm not able to copy it anywhere in the "Copying ML2 plugin" step. Am I missing something? Regards, Aditya. -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Sun Oct 16 19:10:11 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Sun, 16 Oct 2022 21:10:11 +0200 Subject: Installing out-of-tree ML2 neutron plugins with kolla-ansible In-Reply-To: References: Message-ID: I believe you want to first install the plugin in the container image. The section on plugins might be of interest to you. [1] [1] https://docs.openstack.org/kolla/yoga/admin/image-building.html#plugin-functionality Kind regards, Radek -yoctozepto On Sun, 16 Oct 2022 at 20:31, Aditya Sathish wrote: > > Hello, > > I am trying to use an out-of-tree ML2 neutron plugin with OpenStack with Kolla-Ansible but I am having a hard time figuring out how to go about it. > > For example, this is my repository for the plugin: https://github.com/adityasathis/networking-onos. > > I have made changes to the deployment YML files to allow configuration for this new neutron plugin from the global.yml file. However, I am not able to figure out how to copy over my plugin files and install them on the controller node. > > I came across the commit: https://opendev.org/openstack/kolla-ansible/commit/418cb52767270d85e28a6f3027c561f47b805d9d which, I think, does what I'm looking to do and so I kept the networking-onos directory (with the setup.py in it) in the /etc/kolla/config/neutron/plugins directory. The deploy script is able to detect the file in the "Checking for ML2 plugins" step however, I'm not able to copy it anywhere in the "Copying ML2 plugin" step. Am I missing something? > > Regards, > Aditya. From songwenping at inspur.com Mon Oct 17 02:24:22 2022 From: songwenping at inspur.com (=?gb2312?B?QWxleCBTb25nICjLzs7Exr0p?=) Date: Mon, 17 Oct 2022 02:24:22 +0000 Subject: [cyborg] Antelope PTG, agenda and schedule Message-ID: <010943eb4fb848d18d0d057864242557@inspur.com> Hello Cyborg team: Please check the agenda and schedule for the Antelope PTG: https://etherpad. opendev.org/p/antelope-cyborg-ptg See you! Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 3774 bytes Desc: not available URL: From matt at oliver.net.au Mon Oct 17 03:34:20 2022 From: matt at oliver.net.au (Matthew Oliver) Date: Mon, 17 Oct 2022 14:34:20 +1100 Subject: [swift][ptg] Ops feedback session - Oct 19 at 13:00 UTC Message-ID: As we've done in PTGs past, we're getting devs and ops together to talk about Swift: what's working, what isn't, and what would be most helpful to improve. We're meeting in Diablo (https://www.openinfra.dev/ptg/rooms/diablo) at 13:00UTC -- if you run a Swift cluster, we hope to see you there! Even if you can't make it, We'd appreciate it if you can offer some feedback on the feedback etherpad ( https://etherpad.opendev.org/p/swift-antelope-ops-feedback). This has always been a highlight at every PTG for us swift devs. Have your say and help make Swift even better! Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Oct 17 04:45:10 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 17 Oct 2022 10:15:10 +0530 Subject: Outreachy In-Reply-To: References: Message-ID: Hi Fola, Welcome to the community! Happy to see you excited to be working on OpenStack. If you've any doubts related to Cinder, you can ask them in the *#openstack-cinder* IRC channel. Thanks and regards Rajat Dhasmana On Wed, Oct 12, 2022 at 10:46 AM fola fomduwir carine wrote: > Good day @enriquetaso, community mentors and everyone > > My name is Fola Fomduwir Carine an applicant for the December 2022 to > March 2023 Outreachy internships round. I am so excited to be joining this > amazing community, OpenStack. I will be making my contributions on > project #1 . > > I will find one of the issues tagged as low-hanging-fruit and try it out > as a first contribution so I can feel more comfortable with the codebase. > It looks a little overwhelming right now but I guess I will figure it out > with time. > > Once again, thank you @enriquetaso for volunteering to mentor and guide > us. The required skills match my present skill sets. I am so excited to > have taken up this challenge to be part of this great community. It?s a > pleasure for me to have you as a mentor as I undergo this journey. I am > ready and open to learn under your mentorship. > > Thank you > > Fola F. Carine > > > > > > Sent from Mail for > Windows > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From folacarine at gmail.com Mon Oct 17 04:50:36 2022 From: folacarine at gmail.com (fola carine fomduwir) Date: Mon, 17 Oct 2022 05:50:36 +0100 Subject: Outreachy In-Reply-To: References: Message-ID: Thank you so much Rajat On Mon, Oct 17, 2022, 5:45 AM Rajat Dhasmana wrote: > Hi Fola, > > Welcome to the community! > Happy to see you excited to be working on OpenStack. If you've any doubts > related to Cinder, you can ask them in the *#openstack-cinder* IRC > channel. > > Thanks and regards > Rajat Dhasmana > > On Wed, Oct 12, 2022 at 10:46 AM fola fomduwir carine < > folacarine at gmail.com> wrote: > >> Good day @enriquetaso, community mentors and everyone >> >> My name is Fola Fomduwir Carine an applicant for the December 2022 to >> March 2023 Outreachy internships round. I am so excited to be joining this >> amazing community, OpenStack. I will be making my contributions on >> project #1 . >> >> I will find one of the issues tagged as low-hanging-fruit and try it out >> as a first contribution so I can feel more comfortable with the codebase. 
>> It looks a little overwhelming right now but I guess I will figure it out >> with time. >> >> Once again, thank you @enriquetaso for volunteering to mentor and guide >> us. The required skills match my present skill sets. I am so excited to >> have taken up this challenge to be part of this great community. It?s a >> pleasure for me to have you as a mentor as I undergo this journey. I am >> ready and open to learn under your mentorship. >> >> Thank you >> >> Fola F. Carine >> >> >> >> >> >> Sent from Mail for >> Windows >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Oct 17 07:07:55 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 17 Oct 2022 09:07:55 +0200 Subject: [neutron] next CI meeting cancelled Message-ID: <2331235.svhpfoarUZ@p1> Hi, Due to the PTG sessions this week, lets cancel Neutron CI meeting. See You on the PTG and on the next CI meeting which will be on Oct 25th. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From radoslaw.piliszek at gmail.com Mon Oct 17 07:26:53 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 17 Oct 2022 09:26:53 +0200 Subject: Installing out-of-tree ML2 neutron plugins with kolla-ansible In-Reply-To: References: Message-ID: Hi Aditya, First of all, please remember to always CC the mailing list so that other users can benefit from the answers. Regarding the question at hand - the same documentation (starting from the beginning of the page) describes the kolla-build tool, including the usual way to install it. Remember to install the version for your OpenStack release - the right source of versions is [1]. The kolla-build tool is the tool to build the images. I assume so far you have used the prebuilt ones. This time you will have to build your own because you want to add software inside of them. You might also be interested in deploying your own local registry. The simplest way to do so is via [2] but understand it is not production-ready unless in a strictly isolated network (for other use cases I generally recommend Harbor [3]). [1] https://docs.openstack.org/releasenotes/kolla/ [2] https://docs.openstack.org/kolla-ansible/yoga/user/multinode.html#deploy-a-registry [3] https://goharbor.io/ Kind regards, Radek -yoctozepto On Sun, 16 Oct 2022 at 21:43, Aditya Sathish wrote: > > Hi Radek, > > This was incredibly helpful. However, I couldn't find where the kolla-build.conf file is and should I create my own, where I have to add this. > > Regards, > Aditya. > > On Sun, Oct 16, 2022 at 3:10 PM Rados?aw Piliszek wrote: >> >> I believe you want to first install the plugin in the container image. >> The section on plugins might be of interest to you. [1] >> >> [1] https://docs.openstack.org/kolla/yoga/admin/image-building.html#plugin-functionality >> >> Kind regards, >> Radek >> -yoctozepto >> >> On Sun, 16 Oct 2022 at 20:31, Aditya Sathish wrote: >> > >> > Hello, >> > >> > I am trying to use an out-of-tree ML2 neutron plugin with OpenStack with Kolla-Ansible but I am having a hard time figuring out how to go about it. >> > >> > For example, this is my repository for the plugin: https://github.com/adityasathis/networking-onos. 
>> > >> > I have made changes to the deployment YML files to allow configuration for this new neutron plugin from the global.yml file. However, I am not able to figure out how to copy over my plugin files and install them on the controller node. >> > >> > I came across the commit: https://opendev.org/openstack/kolla-ansible/commit/418cb52767270d85e28a6f3027c561f47b805d9d which, I think, does what I'm looking to do and so I kept the networking-onos directory (with the setup.py in it) in the /etc/kolla/config/neutron/plugins directory. The deploy script is able to detect the file in the "Checking for ML2 plugins" step however, I'm not able to copy it anywhere in the "Copying ML2 plugin" step. Am I missing something? >> > >> > Regards, >> > Aditya. From ralonsoh at redhat.com Mon Oct 17 08:47:41 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 17 Oct 2022 10:47:41 +0200 Subject: [neutron] Neutron meetings cancelled this week Message-ID: Hello: As you may know, this week is the PTG. Any regular Neutron meeting (team meeting, drivers meeting, CI meeting) will be cancelled and resumed next week. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Mon Oct 17 09:02:02 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Mon, 17 Oct 2022 11:02:02 +0200 Subject: [openstack-ansible][ptg] Antelope PTG schedule Message-ID: Hi everyone! As you might know, Project Teams Gathering is happening this week as a virtual event and OpenStack-Ansible has its own slots as well. On Tuesday, 18th of October at 14:00 - 17:00 UTC all contributors and interested parties are welcome to join us to discuss future development priorities in Essex room [1]. You can find topics that will be discussed in etherpad [2]. It's also not too late to add something extra there! Also, this year is the first time when Operator Hours were introduced. So if you're operator and have questions or proposals to the team - please join us on Wednesday, 19th of October at 14:00 - 15:00 UTC in Folsom room [3]. We also do have some questions for operators, to understand better how OpenStack-Ansible is being used and where we should focus our efforts. For that pease, use the following Etherpad [4] Hope seeing everyone soon! [1] https://www.openinfra.dev/ptg/rooms/essex [2] https://etherpad.opendev.org/p/osa-antelope-ptg [3] https://www.openinfra.dev/ptg/rooms/folsom [4] https://etherpad.opendev.org/p/osa-antelope-operator-hours From stephenfin at redhat.com Mon Oct 17 09:23:18 2022 From: stephenfin at redhat.com (Stephen Finucane) Date: Mon, 17 Oct 2022 10:23:18 +0100 Subject: [nova][keystone] What happens to key pairs after user is deleted In-Reply-To: <9576F4A6-3379-4C48-B3CC-5D32077BA50E@gmail.com> References: <3de78ba2af783aeb1f320d100110437097a91cea.camel@redhat.com> <9576F4A6-3379-4C48-B3CC-5D32077BA50E@gmail.com> Message-ID: <31db424535d9eb2d5c5694a614eea541e58a1226.camel@redhat.com> On Fri, 2022-10-14 at 15:06 +0200, Artem Goncharov wrote: > Thanks for answer > > > On 14. Oct 2022, at 13:54, Sean Mooney wrote: > > > > On Fri, 2022-10-14 at 12:23 +0200, Artem Goncharov wrote: > > > Hi all, > > > > > > From the API perspective it is possible to delete user without deleting its key pairs. > > > > > from a nova api perspective you have violated the precondition if you do not remove all resouces owned by the user in nova before you delete the user > > in keystone. so you are conflaiting two things. 
it is possibel to do in keystone apit but its not valid to do that as you have not met the precondtion > > of cleaing up the resocue in other project. > > so no we donot support deleteign the keyparis after the fact like that. > > > Practice showed, however, that keypairs of deleted user still exist and can be queried by API knowing id of the deleted user (at least in devstack and 1 other public cloud). > > > I know it may be tricky if there is still VM provisioned with the key, but deleting user logically means nobody has access to the private key anyway. > > > > > correct noone shoudl have access to the key but again you are not allowed to delete the user before you remoave any resouces used by it > > you can delete the keypari wihotut deleteing it form the vms that were created with it. deleting the keypair has never implied > > removing the keypair form ths authrised keys in the vm so no assumitions shoudl be made that removing the keypair alther who can log into the vm. > > that is just not part of what the nova api. > > > And since key pairs belong to users and not to projects it is not possible to clean them up in the project cleanup either. > > > > > > Actually from the API pov there is no reasonable way to ever find those (without knowing ID of the deleted user which is logically not known anymore). > > > > > > If there is no cleanup this can in the mid term cause trashing the database (records are small, but still), especially when using ?dynamic? users to perform some actions. > > ya so if we wanted to suprot automatic clean up of user resouce likek keyparis we woudl need a new user-deleted exeternal event in the nova api and > > nova could delete the key pair and any other user (not project) owned api resoruse but i think the key pari is the only example we have today. > > teh vms are owned by the project not the user. > > Right, this is only valid for key pairs. That is precisely the reason why project cleanup is not dealing with that today. > > > > > > > So far I haven?t tried to grep through code basis of Nova to check what is happening, neither tried to check behavior over time, and decided first to ask here whether somebody knows what should be generally happening here, is it a bug or feature? > > > > > this is not a bug its user error. > > I disagree here. There is nothing that blocks you from doing so. User error is i.e. to try to delete something what is still used. Here the user was never said to delete all of the resources, cause same statement is valid to deletion or not deletion of the VM created by the user. You should not be able to delete project if there are resources remaining, and you should not be able to drop user if key pairs exist? And the biggest issue is that once user recognised he did an ?error? - he is not able to fix it. > > My personal opinion is that it is a bug to let user to make error. What is the user supposed to do after he recognised he deleted project without deleting resources first? As admin you have chance to catch resources not belonging to any existing project, but not with key pairs. And how big are the chances you can still do this on a big cloud? As a general observation, one of the many downsides of service-specific quotas (as opposed to keystone-managed "limits") is that if/when a project is deleted in keystone, quotas related to that porject can be left hanging around in various services. 
It's up to the admin to go and clean these up manually, either ahead of time (using the clients) or after the fact (probably via DB, since OSC at least will fail to find the project when looking it up). It sounds like key pairs suffer from a similar issue. Yay, microservices. Stephen > > > nova and all other openstack servces to my knoladge require that you clean up the reosuce used by users or proejct are cleaned up before you remove a > > user/project form keystone. so by violatign that requirement you can nolonger interact with apis that depedn in the delete entitiy and that is expect. > > it woudl be a large cross proejct effort to chagne that. > > > > alternitivly keystoen could prevent the user/project form being deleteed if there are resuouce used by that user/project in other service btu tthat > > woudl also be a cross project effort. > > This feels logical, but most likely not easy to achieve, because otherwise Keystone need to query every service asking whether deletion of this user should be blocked or not. Keystone sending announcement to the services that certain user/project//domain was deleted so that service makes decision what to do with that is easier to achieve, but blocking is really the only way to make user experience correct and avoid creating a mess in a first place. > > > > > for this specific issue we could add a new Admin only api to allow the deletion fo user keypairs btu that woudl be a new feature. > > From the user perspective I would prefer extending list key pairs api with something like ?all users?. Having info like that customer based cleanup can determine all KPs owned by deleted users and drop them. For that, however, some form of tracking to which domain user was belonging (maybe instead of "all_users" add param "user_domain_id") need to be also done. I would like to prevent that only super-duper admin of the cloud can do such cleanup, otherwise in a big public cloud it will become a mess. > > > > > > > Thanks, > > > Artem > > > > > > > From noonedeadpunk at gmail.com Mon Oct 17 09:23:00 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Mon, 17 Oct 2022 11:23:00 +0200 Subject: [openstack-ansible] Meeting on 18th of October 2022 is being cancelled Message-ID: Hello everyone, Due to the PTG week, our regular meeting on Tuesdays will be cancelled on 18.10.2022. At the same time we will have a virtual discussion in Essex room [1]. Hope to see you there instead! [1] https://www.openinfra.dev/ptg/rooms/essex From pdeore at redhat.com Mon Oct 17 10:04:39 2022 From: pdeore at redhat.com (Pranali Deore) Date: Mon, 17 Oct 2022 15:34:39 +0530 Subject: [Glance]Weekly Meeting cancelled for Next 2 Weeks Message-ID: Hello, There will be no weekly meeting *this week* due to PTG sessions and *next week of PTG *as well, as most of the team members will not be around during that week. Next meeting will be directly on Nov 3rd. See you in the PTG !! Thanks, Pranali Deore -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramishra at redhat.com Mon Oct 17 10:12:12 2022 From: ramishra at redhat.com (Rabi Mishra) Date: Mon, 17 Oct 2022 15:42:12 +0530 Subject: [TripleO][PTG]TripleO Antelope PTG schedule and other relevant Information Message-ID: Hi All, As you all would already know, 2023.1(Antelope) PTG will start in a few hours from now. TripleO PTG schedule has been published @ https://etherpad.opendev.org/p/oct2022-ptg-tripleo for your reference. We would have sessions spanning across 2 days i.e. 
17th Oct(Monday) and 18th Oct(Tuesday) from 13:00-17:00 UTC @https://www.openinfra.dev/ptg/rooms/ocata . Hope to see you there. Some other relevant information: - Please register for the event, if you've not already done so @ https://openinfra-ptg.eventbrite.com - PTG Page https://ptg.opendev.org/ptg.html - PTG Bot: https://opendev.org/openstack/ptgbot -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Oct 17 11:31:00 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 17 Oct 2022 17:01:00 +0530 Subject: [ptg][cinder] Finalized Schedule In-Reply-To: References: Message-ID: Sorry about posting the wrong link. Here is the correct link to the PTG etherpad: https://etherpad.opendev.org/p/antelope-ptg-cinder On Sun, Oct 16, 2022 at 11:45 PM Rajat Dhasmana wrote: > Hello Everyone, > > I've finalized the schedule for cinder PTG[1] based on the topics, number > of hours of PTG and as per the date and time suited to the authors. Please > go through it and let me know if any changes are needed. > > I've also included a *Courtesy Ping* section with each topic so the > people interested in a particular topic will get a ping when it is being > discussed. Please include your IRC nick in front of it if you would like to > be notified. > > The schedule includes a Day wise topic list where, except for monday, > we've PTG from 1300-1700 UTC (where the last hour acts as a buffer time for > extended topic discussions or other planned activities). > > Following are some of the highlight events of each day, > > Monday: > TC + PTL session: 1500-1700 UTC > > Tuesday: > First day of cinder PTG > Operator hour from 1500-1600 UTC > > Wednesday: > Team photo at 1400 UTC (timing could be changed based on the topic > duration before it) > > Thursday: > Drivers Day! > > Friday: > Festival of XS reviews (If we've enough time left) > > [1] https://etherpad.opendev.org/p/zed-ptg-cinder > > Thanks and regards > Rajat Dhasmana > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luca.Czesla at mail.schwarz Mon Oct 17 09:12:05 2022 From: Luca.Czesla at mail.schwarz (Luca Czesla) Date: Mon, 17 Oct 2022 09:12:05 +0000 Subject: [ovn][neutron] RE: OVN BGP Agent query In-Reply-To: References: Message-ID: Hey Luis, Thanks for your mail. We have now prepared a first draft. In addition to what Ihtisham already wrote, we need the following options: - to run multiple agents per host (container) because we need more than one BGP session, alternatively there would be the option to do it via frr - we need to be able to filter which networks we announce where, for this we used the address scope from Openstack in the past To make this possible we have built it in so that it runs through the address scope. To make the address scope usable in ovn-bgp-agent we also patched Neutron so that the address scope is part of router_gateway and router_interface patch-ports. We also announce the networks behind the routers instead of the VM IPs directly. We added a new driver for this because we don't really need anything from the previous implementation at the moment and we were missing the similarities between the two. Possibly someone has an idea how we could merge this? It would be nice if you could have a first look at it. There is still some work to do like adding tests and making Zuul happy but I think it is useful to discuss this with you as early as possible. 
You can find the merge request here: https://review.opendev.org/c/x/ovn-bgp-agent/+/861581 Best regards, Luca Czesla From: Luis Tomas Bolivar Sent: Monday, October 3, 2022 7:45 AM To: Ihtisham ul Haq Cc: Daniel Alvarez ; Luca Czesla ; Max Andr? Lamprecht ; openstack-discuss Subject: Re: [ovn][neutron] RE: OVN BGP Agent query On Fri, Sep 30, 2022 at 6:20 PM Ihtisham ul Haq > wrote: Hi Luis and Daniel, Please see inline response. > From: Daniel Alvarez Sanchez > > Sent: 29 September 2022 11:37 > Subject: Re: OVN BGP Agent query > > Hi Ihtisham and Luis, > > On Thu, Sep 29, 2022 at 7:42 AM Luis Tomas Bolivar > wrote: > > Some comments and questions inline > > > > On Tue, Sep 27, 2022 at 1:39 PM Ihtisham ul haq > wrote: > > > Hi Luis, > > > > > > Thanks for your work on the OVN BGP Agent. We are planning > > > to use it in our OVN deployment, but have a question regarding it. > > > > Great to hear! Can you share a bit more info about this environment? like > > openstack version, target workload, etc. We plan to use this with Yoga version. Our workload consist of enterprise users with VMs running on Openstack and connected to their enterprise network via transfer network(to which the customer neutron router is attached to). And we also have public workload but with the ovn-bgp we only want to we want to advertise the former. > > > > > The way our current setup with ML2/OVS works is that our customer VM IP routes > > > are announced via the router IP(of the that customer) to the leaf switch instead of > > > the IP of the host where the neutron BGP agent runs. And then even if the > > > router fails over, the IP of the router stays the same and thus the BGP route > > > doesn't need to be updated. > > > > Is this with Neutron Dynamic Routing? When you say Router IP, do you mean the virtual neutron router and its IP associated with the provider network? What type of IPs are you announcing with BGP? IPs on provider network or on tenant networks (or both)? Yes, that's with Neutron DR agent, and I meant virtual neutron router with IP from the provider network. We announce IPs of our tenant network via the virtual routers external address. > > If the router fails over, the route needs to be updated, doesn't it? Same IP, but exposed in the new location of the router? Correct. > The route to the tenant network doesn't change, ie. > 192.168.0.0 via 172.24.4.100 (this route remains the same regardless of where 172.24.4.100 is). > If there's L2 in the 172.24.4.0 network, the new location of 172.24.4.100 will be learnt via GARP announcement. In our case, this won't happen as we don't have L2 so we expose directly connected routes to overcome this "limitation". Right, in our case we have a stretched L2 transfer network(mentioned above) to which our gateway nodes and customer routers are connected to, so we can advertise the IPs from the tenant network via the virtual router external IP and thus the location of the router isn't relevant in case of failover as its address will be relearned. > In the case of Neutron Dynamic Routing, there's no assumption that everything is L3 so GARPs are needed to learn the new location. > > > > We see that the routes are announced by the ovn-bgp-agent via the host IP(GTW) in our > > > switch peers. If that's the case then how do you make sure that during failover > > > of a router, the BGP routes gets updated with the new host IP(where the router > > > failed over to)? > > > > The local FRR running at each node is in charge of exposing the IPs. 
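As a rough illustration of that per-node FRR piece: the agent relies on a very small BGP configuration like the fragment below (ASN and peer address are placeholders) and then adds or removes the relevant IPs locally so that "redistribute connected" advertises or withdraws them:

```
! Illustrative frr.conf fragment only; ASN and neighbor address are placeholders.
router bgp 64999
 neighbor 192.168.100.1 remote-as 64999
 address-family ipv4 unicast
  redistribute connected
  neighbor 192.168.100.1 activate
 exit-address-family
```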
For the IPs on the provider network, the traffic is directly exposed where the VMs are, without having to go through the virtual router, so a router failover won't change the route. > > In the case of VMs on tenant networks, the traffic is exposed on the node where the virtual router gateway port is associated (I suppose this is what you refer to with router IP). In the case of a failover the agent is in charge of making FRR to withdraw the exposed routes on the old node, and re-advertise them on the new router IP location > > > > > Can we accomplish the same route advertisement as our ML2/OVS setup, using the ovn-bgp-agent? > > I think this is technically possible, and perhaps you want to contribute that functionality or even help integrating the agent as a driver of Neutron Dynamic Routing? Sounds good, our plan currently is to add this to the ovn-bgp-agent, so we can announce our tenant routes via virtual routers external address on a stretched L2 network, to make it work with our use case. Great to hear!! Just to make it clear, the ovn-bgp-agent current solution is to expose the tenant VM IPs through the host that has the OVN router gateway port, so for example, if the VM IP (10.0.0.5) is connected to the neutron virtual router, which in turns is connected to your provider network (your transfer network) with IP 172.24.4.10, and hosted in a physical server with IP 192.168.100.100, the route will be exposed as: - 10.0.0.5 nexthop 192.168.100.100 - 172.24.4.10 nexthop 192.168.100.100 As we are using FRR config "redistributed connected". As the traffic to the tenant networks needs to be injected into the OVN overlay through the gateway node hosting that ovn virtual router gateway port (cr-lrp), would it be ok if, besides those route we also advertise? - 10.0.0.5 nexthop 172.24.4.10 Cheers, Luis -- Ihtisham ul Haq Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier>. -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. -------------- next part -------------- An HTML attachment was scrubbed... URL: From saditya at vt.edu Mon Oct 17 10:35:02 2022 From: saditya at vt.edu (Aditya Sathish) Date: Mon, 17 Oct 2022 06:35:02 -0400 Subject: Installing out-of-tree ML2 neutron plugins with kolla-ansible In-Reply-To: References: Message-ID: Hi, My apologies for the unicast email as I wasn't aware of the protocol. I'll be sure to keep it in mind moving forward. Regarding my query, thank you so much, I was able to get my plugin working by following the kolla-build instructions and things seem to have gone my way. Thank you very much! Regards, Aditya. On Mon, Oct 17, 2022, 03:27 Rados?aw Piliszek wrote: > Hi Aditya, > > First of all, please remember to always CC the mailing list so that > other users can benefit from the answers. > > Regarding the question at hand - the same documentation (starting from > the beginning of the page) describes the kolla-build tool, including > the usual way to install it. 
Remember to install the version for your > OpenStack release - the right source of versions is [1]. > The kolla-build tool is the tool to build the images. I assume so far > you have used the prebuilt ones. This time you will have to build your > own because you want to add software inside of them. > You might also be interested in deploying your own local registry. The > simplest way to do so is via [2] but understand it is not > production-ready unless in a strictly isolated network (for other use > cases I generally recommend Harbor [3]). > > [1] https://docs.openstack.org/releasenotes/kolla/ > [2] > https://docs.openstack.org/kolla-ansible/yoga/user/multinode.html#deploy-a-registry > [3] https://goharbor.io/ > > Kind regards, > Radek > -yoctozepto > > On Sun, 16 Oct 2022 at 21:43, Aditya Sathish wrote: > > > > Hi Radek, > > > > This was incredibly helpful. However, I couldn't find where the > kolla-build.conf file is and should I create my own, where I have to add > this. > > > > Regards, > > Aditya. > > > > On Sun, Oct 16, 2022 at 3:10 PM Rados?aw Piliszek < > radoslaw.piliszek at gmail.com> wrote: > >> > >> I believe you want to first install the plugin in the container image. > >> The section on plugins might be of interest to you. [1] > >> > >> [1] > https://docs.openstack.org/kolla/yoga/admin/image-building.html#plugin-functionality > >> > >> Kind regards, > >> Radek > >> -yoctozepto > >> > >> On Sun, 16 Oct 2022 at 20:31, Aditya Sathish wrote: > >> > > >> > Hello, > >> > > >> > I am trying to use an out-of-tree ML2 neutron plugin with OpenStack > with Kolla-Ansible but I am having a hard time figuring out how to go about > it. > >> > > >> > For example, this is my repository for the plugin: > https://github.com/adityasathis/networking-onos. > >> > > >> > I have made changes to the deployment YML files to allow > configuration for this new neutron plugin from the global.yml file. > However, I am not able to figure out how to copy over my plugin files and > install them on the controller node. > >> > > >> > I came across the commit: > https://opendev.org/openstack/kolla-ansible/commit/418cb52767270d85e28a6f3027c561f47b805d9d > which, I think, does what I'm looking to do and so I kept the > networking-onos directory (with the setup.py in it) in the > /etc/kolla/config/neutron/plugins directory. The deploy script is able to > detect the file in the "Checking for ML2 plugins" step however, I'm not > able to copy it anywhere in the "Copying ML2 plugin" step. Am I missing > something? > >> > > >> > Regards, > >> > Aditya. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Oct 17 12:46:30 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 17 Oct 2022 13:46:30 +0100 Subject: Installing out-of-tree ML2 neutron plugins with kolla-ansible In-Reply-To: References: Message-ID: On Mon, 2022-10-17 at 09:26 +0200, Rados?aw Piliszek wrote: > Hi Aditya, > > First of all, please remember to always CC the mailing list so that > other users can benefit from the answers. > > Regarding the question at hand - the same documentation (starting from > the beginning of the page) describes the kolla-build tool, including > the usual way to install it. Remember to install the version for your > OpenStack release - the right source of versions is [1]. > The kolla-build tool is the tool to build the images. I assume so far > you have used the prebuilt ones. 
This time you will have to build your > own because you want to add software inside of them. This is quite outdated as i have not been active in kolla for some time but i wote a template-override for ovs-dpdk isntallation from soruce as a replacment for standard ovs. https://github.com/openstack/kolla/blob/master/contrib/template-override/ovs-dpdk.j2 that shows how to use some of the macros ectra that are avaiable https://github.com/openstack/kolla/blob/master/doc/source/admin/template-override/ovs-dpdk.rst show how to hten use that template-override as part of the image build. as radek noted https://docs.openstack.org/kolla/yoga/admin/image-building.html#plugin-functionality porvides a more eplicit exmaple of how to do this for ml2 plugins. if the kolla core tema was open to it you proably coudl add a template to the contib dir in kolla once created to share with others. there used to be odl examples in the past. > You might also be interested in deploying your own local registry. The > simplest way to do so is via [2] but understand it is not > production-ready unless in a strictly isolated network (for other use > cases I generally recommend Harbor [3]). > > [1] https://docs.openstack.org/releasenotes/kolla/ > [2] https://docs.openstack.org/kolla-ansible/yoga/user/multinode.html#deploy-a-registry > [3] https://goharbor.io/ > > Kind regards, > Radek > -yoctozepto > > On Sun, 16 Oct 2022 at 21:43, Aditya Sathish wrote: > > > > Hi Radek, > > > > This was incredibly helpful. However, I couldn't find where the kolla-build.conf file is and should I create my own, where I have to add this. > > > > Regards, > > Aditya. > > > > On Sun, Oct 16, 2022 at 3:10 PM Rados?aw Piliszek wrote: > > > > > > I believe you want to first install the plugin in the container image. > > > The section on plugins might be of interest to you. [1] > > > > > > [1] https://docs.openstack.org/kolla/yoga/admin/image-building.html#plugin-functionality > > > > > > Kind regards, > > > Radek > > > -yoctozepto > > > > > > On Sun, 16 Oct 2022 at 20:31, Aditya Sathish wrote: > > > > > > > > Hello, > > > > > > > > I am trying to use an out-of-tree ML2 neutron plugin with OpenStack with Kolla-Ansible but I am having a hard time figuring out how to go about it. > > > > > > > > For example, this is my repository for the plugin: https://github.com/adityasathis/networking-onos. > > > > > > > > I have made changes to the deployment YML files to allow configuration for this new neutron plugin from the global.yml file. However, I am not able to figure out how to copy over my plugin files and install them on the controller node. > > > > > > > > I came across the commit: https://opendev.org/openstack/kolla-ansible/commit/418cb52767270d85e28a6f3027c561f47b805d9d which, I think, does what I'm looking to do and so I kept the networking-onos directory (with the setup.py in it) in the /etc/kolla/config/neutron/plugins directory. The deploy script is able to detect the file in the "Checking for ML2 plugins" step however, I'm not able to copy it anywhere in the "Copying ML2 plugin" step. Am I missing something? > > > > > > > > Regards, > > > > Aditya. 
> From sbauza at redhat.com Mon Oct 17 12:53:02 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 17 Oct 2022 14:53:02 +0200 Subject: [ptg][nova][placement] Nova/Placement at the PTG, howto Message-ID: Hey stackers, Got some questions about how/when joining the Nova community at the PTG during this week, so I'll provide you a howto ;) *What is the main document for the Nova PTG ?* This : https://etherpad.opendev.org/p/nova-antelope-ptg Please, please, don't try to translate this etherpad directly as you'll see. *I'm an operator and I want to tell you how terrible you folks are* There : https://etherpad.opendev.org/p/oct2022-ptg-operator-hour-nova Again, don't try to translate this document or you'll create problems for people who don't know your language. *When do you folks discuss ?* Monday : 14:00-15:00 UTC Ironic/Nova cross-project session Tuesday : 13:00 - 15:00 UTC Operator hours. If you are an operator, please join us ! :-) 15:00 - 17:00 UTC Nova sessions for contributors 15:10 UTC (possibly) : Nova team photo Wednesday : 13:00 - 16:00 UTC Nova sessions for contributors 16:00 - 17:00 Operator hour. If you are an operator, please join us ! :-) Thursday : 13:00 - 15:00 UTC Neutron/Nova cross-project session 15:00 - 17:00 UTC Nova sessions for contributors Friday : 13:00 - 17:00 UTC Nova sessions for contributors *How to join ?* Simple. You can look at https://ptg.opendev.org/ptg.html if you want so you'll see what topic we are currently discussing. If you want to arrive, just click either on "operator-hour" nova tag or "nova-placement" tag and it will automatically open the Google Meet client for joining the Nova meeting. If you also want, you can directly join us by using this link https://www.openstack.org/ptg/rooms/bexar (we'll use the same virtual room for all our sessions) *What if you can't join the whole days ?* No worries ! Just make sure you use an IRC client and just go to the #openstack-nova channel. Once you're there, add your IRC in the etherpad for every topic you'd like to be around and then I'll ping you when we start discussing for those topics you added your nick. *What if I want to add some topic I'd like to discuss ?* Easy peasy. Two possibilities : - you are a contributor and you want to add a new feature in nova. Just add your topic in the main etherpad (link above). Make sure your name is marked in the topic title (like every other topic) and I'll ping you when we start discussing your topic. - you are an operator and you want to discuss wishlists or bugs or whatever else. Good news, we have specific timeslots for you ! See the agenda above, we have *operator hours* that are intended for operators to join and engage on specific deployment and operational topics. Just then look at https://etherpad.opendev.org/p/oct2022-ptg-operator-hour-nova and add your points, we'll discuss them during the operator hours (given above). Hope you'll find your way and you'll appreciate the PTG as much productive as we do. In any case, my IRC nick is bauzas. Don't hesitate to bug me over IRC (#openstack-dev, #openstack-nova or #openinfra-events) or reach me by email (sbauza at redhat.com), I'll try to answer you as much as I can. HTH and thanks, -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mnasiadka at gmail.com Mon Oct 17 12:59:40 2022 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Mon, 17 Oct 2022 14:59:40 +0200 Subject: [kolla][ptg] Using Meetpad instead of Zoom Message-ID: Hello, If you?re planning on joining the Kolla PTG - please use Meetpad - https://meetpad.opendev.org/kolla-antelope-ptg Sorry for last minute changes Best regards, Michal From jlibosva at redhat.com Mon Oct 17 13:26:26 2022 From: jlibosva at redhat.com (Jakub Libosvar) Date: Mon, 17 Oct 2022 09:26:26 -0400 Subject: [Neutron] Bug Deputy Report Oct 10 - 17 Message-ID: <2b0c1a40-e168-9fa7-5d1b-5fe30cb2ade5@redhat.com> Hi all, I was the bug deputy last week, here is the report. Nothing that requires immediate attention. Medium ------ - [ovn-octavia-provider] Detach OVN-LB LS from the LR breaks OVN-LB connectivity https://bugs.launchpad.net/neutron/+bug/1992363 assigned to Luis Fix proposed: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/860781 - [scale] Setting a gateway on router is killing database https://bugs.launchpad.net/neutron/+bug/1992950 Assigned to Arnaud Morin Fix proposed: https://review.opendev.org/c/openstack/neutron/+/861322 - Disable in-band flow management in the ovs bridges https://bugs.launchpad.net/neutron/+bug/1992953 Assigned to Slawek Fix proposed: https://review.opendev.org/c/openstack/neutron/+/861351 Wishlist ------- - fip will loss when it migrate between dvr-sant agent and dvr_no_external in Rocky https://bugs.launchpad.net/neutron/+bug/1992542 The reported bug uses unsupported configuration dvr_snat agent mode on the compute nodes. Cheers Kuba From openstack at mknet.nl Mon Oct 17 13:44:12 2022 From: openstack at mknet.nl (Marcel) Date: Mon, 17 Oct 2022 15:44:12 +0200 Subject: [swift] upgrade newton to train in containers In-Reply-To: <5ea3efd915cadeecbdfc122990aa20c3@mknet.nl> References: <683435eb7b0602a347a2c38c7a73f373@mknet.nl> <5ea3efd915cadeecbdfc122990aa20c3@mknet.nl> Message-ID: I'm planning an upgrade for our newton based swift cluster (OOO vm based) to a (kolla image based containers) train cluster So far I have tested the upgrade for the proxies, account_servers and container servers and it looks promising. I have in a test environment: - switched to the new proxies with the old ring files and it looks like everything works normally - Added the new (train) account and container servers to the rings and it looks like all is fine - Removed the old account and container servers, still fine - tested fall back, also fine My question actually is: Did the account and container database format change between newton and train in such a way that I might run into troubles trying the upgrade as tested above in ways that I did not yet foresee? Thanks Marcel From rosmaita.fossdev at gmail.com Mon Oct 17 14:41:39 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Mon, 17 Oct 2022 10:41:39 -0400 Subject: [0SSN-0090] Best practices when configuring Glance with COW backends Message-ID: <004d2b5c-04c1-9ca0-53b7-0cc358ea461c@gmail.com> Best practices when configuring Glance with COW backends --- ### Summary ### When deploying Glance in a popular configuration where Glance shares a common storage backend with Nova and/or Cinder, it is possible to open some known attack vectors by which malicious data modification can occur. This note reviews the known issues and suggests a Glance deployment configuration that can mitigate such attacks. 
### Affected Services / Software ###

Glance, all supported releases (Queens through Zed)

### Discussion ###

This note applies to you if you are operating an end-user-facing glance-api service with the 'show_multiple_locations' option set to True (the default value is False) or if your end-user-facing glance-api has the 'show_image_direct_url' option set to True (default value is False).

Our recommendation is that the image "locations" and "direct_url" fields [0] *never* be displayed to end users in a cloud. This can be accomplished by running two glance-api services:

- A "user-facing" glance-api that is accessible to end users and which appears in users' service catalogs.

- An "internal-only-facing" glance-api that is accessible only to those services that require access to the 'direct_url' or image location fields, and which is protected by firewalls from access by end users. (Nova, Cinder, and Ironic all have configuration options to specify the Glance API endpoint each service uses [1].)

This dual glance-api deployment was suggested in "Known Issues" in Glance release notes in the Rocky [4] through Ussuri releases, but it seems that the idea has not received sufficient attention. Hence this security note.

The attack vector that becomes available when image locations are exposed to end users was originally outlined in OSSN-0065 [2], though that note was not clear about the attack surface or mitigation, and contained some forward-looking statements that were not fulfilled. The attack vector is: [A] malicious user could create an image in Glance, set an additional location on that image pointing to an altered image, then delete the original location, so that consumers of the original image would unwittingly be using the malicious image. Note, however, that this attack vector cannot change the original image's checksum, and it is limited to images that are owned by the attacker.

OSSN-0065 suggests that this is only an issue when users do not checksum their image data. It neglects the fact that in some popular deployment configurations in which Nova creates a root disk snapshot, data is never uploaded to Glance, but instead a snapshot is created directly in the backend and Nova creates a Glance image record with "size" 0 and an empty "os_hash_value" [3], making it impossible to compare the hash of downloaded image data to the value maintained by Glance.

Further, when Nova is configured to use the same storage for ephemeral disks that is used as a Glance image store, Nova efficiently creates a server root disk directly in the backend without checksumming the image data. This is an intentional design choice to optimize storage space and host resources, but an implication is that even if the image record has a recorded hash, it is not being checked at the point of image consumption. Similarly, when using a shared backend, or a cinder glance_store, Cinder will efficiently clone a volume created from an image directly in the backend without checksumming the image data. Again, this is done intentionally in order to optimize resources, but it is important to be aware of the security tradeoff being made by this configuration. In other words, if the image data is not going to be checked at the point of image consumption, then extra care needs to be taken to ensure the integrity of the data path.

OSSN-0065 suggested that the attack vector of substituting image data by modifying the image locations could be addressed by using policies, but that has turned out not to be the case.
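To make the dual-deployment recommendation above concrete, the consuming services simply need to be pointed at the internal-only endpoint. A minimal sketch (the endpoint URL is an assumption; reference [1] below gives the authoritative option names):

  # nova.conf (and ironic.conf), per [1]
  [glance]
  endpoint_override = http://glance-internal.example.org:9292

  # cinder.conf, per [1]
  [DEFAULT]
  glance_api_servers = http://glance-internal.example.org:9292

  # glance-api.conf for the internal-only-facing service
  [DEFAULT]
  show_multiple_locations = True

  # glance-api.conf for the end-user-facing service (the default)
  [DEFAULT]
  show_multiple_locations = False

In addition, the internal endpoint must be firewalled so that only the OpenStack services themselves, and not end users, can reach it.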
The only way currently to mitigate this vector is to deploy glance-api in a dual configuration as described above, namely with an internal-only-facing glance-api used by Nova and Cinder (that has show_multiple_locations enabled), and an end-user-facing glance-api (that has show_multiple_locations disabled).

So far the focus has been on 'show_multiple_locations'. When that setting is disabled in Glance, it is not possible to manipulate the locations via the OpenStack Images API. Keep in mind, however, that in any Glance/Nova/Cinder configuration where Nova and/or Cinder do copy-on-write directly in the image store, image data transfer takes place outside Glance's image data download path, and hence the os_hash_value is *not* checked. Thus, if the backend store is itself compromised and image data is replaced directly in the backend, the substitution will *not* be detected.

This brings us to the 'show_image_direct_url' option, which includes a "direct_url" field in the image-show response that can be used by various OpenStack services to consume images directly from the storage backend. Exposing the 'direct_url' to end users leaks information about the storage backend. What exactly that information consists of depends upon the backend in use and how it is configured, but in general, it is not a good idea to provide hints that could be useful to malicious actors in their attempts to compromise the backend storage by some type of independent exploit. The 'direct_url', being read-only, may appear innocuous, but its use by services is usually to perform some kind of optimized image data access that most likely does not include computing a hash of the image data. We therefore recommend that OpenStack services that require exposure of the 'direct_url' image property be similarly configured to use an internal-only-facing glance-api.

It is worth noting that end users who wish to download an image do not require access to the 'direct_url' image property because they can simply use the image-data-download API call [5].

### Recommended Actions ###

A glance-api service with 'show_multiple_locations' enabled should *never* be exposed directly to end users. This setting should only be enabled on an internal-only-facing glance-api that is used by OpenStack services that require access to image locations. This could be done, for example, by running two glance-api services with different configuration files and using the appropriate configuration options for each service to specify the Image API endpoint to access, and making sure the special internal endpoint is firewalled in such a way that only the appropriate OpenStack services can contact it.

Similarly, enabling 'show_image_direct_url' exposes information about the storage backend that could be of use to malicious actors in as yet unknown exploits, so it should likewise only be enabled on an internal-only-facing glance-api.

### Notes / References ###

[0] https://docs.openstack.org/api-ref/image/v2/index.html#show-image-schema
[1] Nova and Ironic use 'endpoint_override' in the '[glance]' section of the configuration file; Cinder uses 'glance_api_servers' in the '[DEFAULT]' section.
[2] OSSN-0065: https://wiki.openstack.org/wiki/OSSN/OSSN-0065
[3] The Glance "multihash" metadata pair of 'os_hash_algo' and 'os_hash_value' were introduced in Rocky to replace the legacy md5 'checksum' field. The python-glanceclient has used multihash checksumming for download verification since version 2.13.0.
[4] https://docs.openstack.org/releasenotes/glance/rocky.html#known-issues [5] https://docs.openstack.org/api-ref/image/v2/index.html?#download-binary-image-data ### Contacts / References ### Author: Brian Rosmaita, Red Hat This OSSN : https://wiki.openstack.org/wiki/OSSN/OSSN-0090 Original LaunchPad Bug : https://bugs.launchpad.net/ossn/+bug/1990157 Mailing List : [Security] tag on openstack-discuss at lists.openstack.org OpenStack Security Project : https://launchpad.net/~openstack-ossg CVE: none -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_0xE834C62762D8856C.asc Type: application/pgp-keys Size: 677 bytes Desc: OpenPGP public key URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_signature Type: application/pgp-signature Size: 236 bytes Desc: OpenPGP digital signature URL: From gagehugo at gmail.com Mon Oct 17 14:49:40 2022 From: gagehugo at gmail.com (Gage Hugo) Date: Mon, 17 Oct 2022 09:49:40 -0500 Subject: [openstack-helm] No Weekly IRC Meeting - PTG Message-ID: Hey team, This week is the PTG so the weekly IRC meeting is cancelled. We will be meeting for an hour at the same time slot for our PTG session: https://ptg.opendev.org/ptg.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Mon Oct 17 15:11:00 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Mon, 17 Oct 2022 17:11:00 +0200 Subject: [release] Release countdown for week R-22, Oct 17 - 21 Message-ID: <6c2a927e-e60d-0996-6874-a50d59835ddf@est.tech> Hi, Welcome back to the release countdown emails! These will be sent at major points in the 2023.1 Antelope development cycle, which should conclude with a final release on March 22nd, 2023. Development Focus ----------------- At this stage in the release cycle, focus should be on planning the 2023.1 Antelope development cycle, assessing 2023.1 Antelope community goals and approving 2023.1 Antelope specs. General Information ------------------- 2023.1 Antelope is a 24 weeks long development cycle. In case you haven't seen it yet, please take a look over the schedule for this release: https://releases.openstack.org/antelope/schedule.html By default, the team PTL is responsible for handling the release cycle and approving release requests. This task can (and probably should) be delegated to release liaisons. Now is a good time to review release liaison information for your team and make sure it is up to date: https://opendev.org/openstack/releases/src/branch/master/data/release_liaisons.yaml By default, all your team deliverables from the Zed release are continued in 2023.1 Antelope with a similar release model. Upcoming Deadlines & Dates -------------------------- Virtual PTG: October 17-21, 2022 Antelope-1 milestone: November 17th, 2022 El?d Ill?s irc: elodilles @ #openstack-release From ces.eduardo98 at gmail.com Mon Oct 17 15:41:14 2022 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Mon, 17 Oct 2022 12:41:14 -0300 Subject: [manila] Cancelling two weekly meetings Message-ID: Hello, Zorillas! As agreed in the previous week, we will be cancelling the next two weekly meetings: from this week (Oct 20th) and next week (Oct 27th). The next weekly meeting for Manila will be on Nov 3rd. Thank you, carloss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mnasiadka at gmail.com Mon Oct 17 16:18:18 2022 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Mon, 17 Oct 2022 18:18:18 +0200 Subject: [kolla][ptg] Meetings on Tuesday 18th Oct Message-ID: <578AAFB0-2F63-48A3-AB81-AC65F57A8028@gmail.com> Hello Koalas, Due to exhaustion of topics - we?re meeting tomorrow only on the Kolla Operator Hour at 16 UTC. Kolla Operator Hour Etherpad: https://etherpad.opendev.org/p/oct2022-ptg-operator-hour-kolla Best regards, Michal Nasiadka From p.aminian.server at gmail.com Mon Oct 17 17:15:10 2022 From: p.aminian.server at gmail.com (Parsa Aminian) Date: Mon, 17 Oct 2022 20:45:10 +0330 Subject: single Network interface Message-ID: Hello I use kolla ansible wallaby version . my compute node has only one port . is it possible to use this server ? as I know openstack compute need 2 port one for management and other for external user network . Im using provider_networks and it seems neutron_external_interface could not be the same as network_interface because openvswitch need to create br-ex bridge on separate port is there any solution that i can config my compute with 1 port ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From allison at openinfra.dev Mon Oct 17 19:04:37 2022 From: allison at openinfra.dev (Allison Price) Date: Mon, 17 Oct 2022 14:04:37 -0500 Subject: [ptls][tc] 2022 OpenStack User Survey Project Question Responses In-Reply-To: <236718A0-249C-42D6-ABB8-AE65C5BFA6A6@openinfra.dev> References: <236718A0-249C-42D6-ABB8-AE65C5BFA6A6@openinfra.dev> Message-ID: <97B264A9-24D3-4E7E-BAB1-34E7EA458B87@openinfra.dev> Hi everyone, As a reminder, please use your PTG sessions this week to discuss any updates your team would like to make to your OpenStack User Survey [1] questions. Once you have your updates final, please respond and cc Josh Lohse. He is the Web Manager at the OpenInfra Foundation and can answer any questions you have around the functionality options for the survey. He will work with me to make the updates requested to the 2023 survey. Thanks, Allison [1] https://www.openstack.org/usersurvey > On Oct 13, 2022, at 12:37 PM, Allison Price wrote: > > Hi everyone, > > Please find attached the responses to the project questions from the 2022 OpenStack User Survey. Based on feedback last year, I included additional, non-identifiable information that will hopefully help provident deployment context for the responses to your questions. If you need a reminder of your project question, you can review the OpenStack User Survey [1]. During the PTG, I would encourage you and your teams to review the responses and decide if you would like to make any changes to your question for the 2023 OpenStack User Survey. It is live now, but we can make changes ahead of significant promotion. Please reach out to me directly with any changes. > > If you have any questions on how to read the results, please let me know. > > Have a great week at the PTG! > > Cheers, > Allison > > [1] https://www.openstack.org/usersurvey > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dwilde at redhat.com Mon Oct 17 23:59:52 2022 From: dwilde at redhat.com (Dave Wilde) Date: Mon, 17 Oct 2022 18:59:52 -0500 Subject: [keystone] Cancelling weekly meeting for 10/18/22 Message-ID: <77bb3cdd-8777-d13c-44d6-bfe99e634e19@redhat.com> Cancelling due to the PTG From yasufum.o at gmail.com Tue Oct 18 02:42:24 2022 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Tue, 18 Oct 2022 11:42:24 +0900 Subject: [tacker] Cancelling IRC meeting on Oct 18 Message-ID: <47f9223b-f0ab-5b60-7500-68567044574a@gmail.com> Hi team, Since we're going to have the vPTG session, I'd like to cancel IRC meeting today. Thanks, Yasufumi From mnasiadka at gmail.com Tue Oct 18 06:03:39 2022 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Tue, 18 Oct 2022 08:03:39 +0200 Subject: [kolla] Weekly meeting 20th Oct 2022 Message-ID: <38AB80BD-ED46-44C3-AD07-DDFA1F91DC43@gmail.com> Hello, Since this is PTG week - this week meeting is cancelled. Best regards, Michal From rdhasman at redhat.com Tue Oct 18 07:39:12 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Tue, 18 Oct 2022 13:09:12 +0530 Subject: [cinder] cancelling this week's meeting (19 Oct) Message-ID: Hello Argonauts, Since we've PTG this week, we will not be conducting the weekly meeting on 19th Oct 2022 but everyone is recommended to attend the PTG instead[1]. See you all at the PTG! [1] https://etherpad.opendev.org/p/antelope-ptg-cinder Thanks and regards Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From ltomasbo at redhat.com Tue Oct 18 07:53:58 2022 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Tue, 18 Oct 2022 09:53:58 +0200 Subject: [ovn][neutron] RE: OVN BGP Agent query In-Reply-To: References: Message-ID: Hello Luca! Awesome patch! See some comments below, though I'll also add some more details on the gerrit side On Mon, Oct 17, 2022 at 11:12 AM Luca Czesla wrote: > Hey Luis, > > > > Thanks for your mail. > > > > We have now prepared a first draft. In addition to what Ihtisham already > wrote, we need the following options: > > - to run multiple agents per host (container) because we need more than > one BGP session, alternatively there would be the option to do it via frr > Nice, this could actually be splitted from the patch into a different one, as it is kind of independent of the new driver, and can be used for other use cases (and merge it almost right away) > - we need to be able to filter which networks we announce where, for this > we used the address scope from Openstack in the past > > > To make this possible we have built it in so that it runs through the > address scope. To make the address scope usable in ovn-bgp-agent we also > patched Neutron so that the address scope is part of router_gateway and > router_interface patch-ports. We also announce the networks behind the > routers instead of the VM IPs directly. > We actually play with something similar to be able to have a simple/initial API here: https://review.opendev.org/c/openstack/neutron/+/797087 (idea was to use port tags instead) Do you have a link to the modifications to enable this in the neutron side? > > > We added a new driver for this because we don't really need anything from > the previous implementation at the moment and we were missing the > similarities between the two. Possibly someone has an idea how we could > merge this? > Is the new expose_router_port/withdraw_router_port similar to the previous expose_subnet/withdraw_subnet? 
Perhaps we just need a different implementation of those? I was actually playing (after initial discussion with Ihtisha) for options to not only expose VM_IP via HOST_IP, but doing it VM_IP though OVN_GATEWAY_PORT_IP, and then GATEWAY_PORT_IP via HOST_IP. I was playing with "redistribute static" instead of "redistributed kernel" + IP routes (I think I like your approach better). My idea was to have the existing driver with an option to state if the env is pure L3 (the current support) or L2 (where you work is needed). That is another option to explore, but open to having completely different drivers, as it also makes a lot of sense. > > > It would be nice if you could have a first look at it. > > > > There is still some work to do like adding tests and making Zuul happy but > I think it is useful to discuss this with you as early as possible. > Yeah, no problem, we'll make zull happy after the initial design/idea is discussed. Don't need to waste time on that at this moment > > > You can find the merge request here: > https://review.opendev.org/c/x/ovn-bgp-agent/+/861581 > Thanks! I'll leave some more detailed review in there There is one more general question on that patch. Isn't that exposing the routes in all the nodes where the agent is running instead of where the actual ovn router gateway port is attached? Don't you need to check is the node where you can actually inject the traffic to OVN overlay? Or you are assuming everything is L2 at that point, and it does not matter everyone expose the route as at the end of the day, only the node with the ovn gateway router port will reply to ARPs > > Best regards, > > Luca Czesla > > > > *From:* Luis Tomas Bolivar > *Sent:* Monday, October 3, 2022 7:45 AM > *To:* Ihtisham ul Haq > *Cc:* Daniel Alvarez ; Luca Czesla > ; Max Andr? Lamprecht > ; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [ovn][neutron] RE: OVN BGP Agent query > > > > > > > > On Fri, Sep 30, 2022 at 6:20 PM Ihtisham ul Haq < > Ihtisham.ul_Haq at mail.schwarz> wrote: > > Hi Luis and Daniel, > > Please see inline response. > > > From: Daniel Alvarez Sanchez > > Sent: 29 September 2022 11:37 > > Subject: Re: OVN BGP Agent query > > > > Hi Ihtisham and Luis, > > > > On Thu, Sep 29, 2022 at 7:42 AM Luis Tomas Bolivar > wrote: > > > Some comments and questions inline > > > > > > On Tue, Sep 27, 2022 at 1:39 PM Ihtisham ul haq < > ihtisham.uh at hotmail.com> wrote: > > > > Hi Luis, > > > > > > > > Thanks for your work on the OVN BGP Agent. We are planning > > > > to use it in our OVN deployment, but have a question regarding it. > > > > > > Great to hear! Can you share a bit more info about this environment? > like > > > openstack version, target workload, etc. > > We plan to use this with Yoga version. Our workload consist of enterprise > users > with VMs running on Openstack and connected to their enterprise network via > transfer network(to which the customer neutron router is attached to). > And we also have public workload but with the ovn-bgp we only want to > we want to advertise the former. > > > > > > > > The way our current setup with ML2/OVS works is that our customer VM > IP routes > > > > are announced via the router IP(of the that customer) to the leaf > switch instead of > > > > the IP of the host where the neutron BGP agent runs. And then even > if the > > > > router fails over, the IP of the router stays the same and thus the > BGP route > > > > doesn't need to be updated. > > > > > > Is this with Neutron Dynamic Routing? 
When you say Router IP, do you > mean the virtual neutron router and its IP associated with the provider > network? What type of IPs are you announcing with BGP? IPs on provider > network or on tenant networks (or both)? > > Yes, that's with Neutron DR agent, and I meant virtual neutron router with > IP from the provider network. We announce IPs of our tenant network via the > virtual routers external address. > > > > If the router fails over, the route needs to be updated, doesn't it? > Same IP, but exposed in the new location of the router? > > Correct. > > > The route to the tenant network doesn't change, ie. > > 192.168.0.0 via 172.24.4.100 (this route remains the same regardless of > where 172.24.4.100 is). > > If there's L2 in the 172.24.4.0 network, the new location of > 172.24.4.100 will be learnt via GARP announcement. In our case, this won't > happen as we don't have L2 so we expose directly connected routes to > overcome this "limitation". > > Right, in our case we have a stretched L2 transfer network(mentioned above) > to which our gateway nodes and customer routers are connected to, so we can > advertise the IPs from the tenant network via the virtual router external > IP > and thus the location of the router isn't relevant in case of failover as > its > address will be relearned. > > > In the case of Neutron Dynamic Routing, there's no assumption that > everything is L3 so GARPs are needed to learn the new location. > > > > > > We see that the routes are announced by the ovn-bgp-agent via the > host IP(GTW) in our > > > > switch peers. If that's the case then how do you make sure that > during failover > > > > of a router, the BGP routes gets updated with the new host IP(where > the router > > > > failed over to)? > > > > > > The local FRR running at each node is in charge of exposing the IPs. > For the IPs on the provider network, the traffic is directly exposed where > the VMs are, without having to go through the virtual router, so a router > failover won't change the route. > > > In the case of VMs on tenant networks, the traffic is exposed on the > node where the virtual router gateway port is associated (I suppose this is > what you refer to with router IP). In the case of a failover the agent is > in charge of making FRR to withdraw the exposed routes on the old node, and > re-advertise them on the new router IP location > > > > > > > Can we accomplish the same route advertisement as our ML2/OVS setup, > using the ovn-bgp-agent? > > > > I think this is technically possible, and perhaps you want to contribute > that functionality or even help integrating the agent as a driver of > Neutron Dynamic Routing? > > Sounds good, our plan currently is to add this to the ovn-bgp-agent, > so we can announce our tenant routes via virtual routers external address > on > a stretched L2 network, to make it work with our use case. > > > > Great to hear!! > > > > Just to make it clear, the ovn-bgp-agent current solution is to expose the > tenant VM IPs through the host that has the OVN router gateway port, so for > example, if the VM IP (10.0.0.5) is connected to the neutron virtual > router, which in turns is connected to your provider network (your transfer > network) with IP 172.24.4.10, and hosted in a physical server with IP > 192.168.100.100, the route will be exposed as: > > - 10.0.0.5 nexthop 192.168.100.100 > > - 172.24.4.10 nexthop 192.168.100.100 > > > > As we are using FRR config "redistributed connected". 
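(For context, the "redistribute connected" setting mentioned here corresponds to an FRR BGP stanza along these lines; a minimal sketch with an assumed private ASN:

  router bgp 64999
   address-family ipv4 unicast
    redistribute connected
   exit-address-family

so whatever addresses the agent configures locally on the node are picked up by the local FRR instance and advertised to its BGP peers.)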
As the traffic to > the tenant networks needs to be injected into the OVN overlay through the > gateway node hosting that ovn virtual router gateway port (cr-lrp), would > it be ok if, besides those route we also advertise? > > - 10.0.0.5 nexthop 172.24.4.10 > > > > Cheers, > > Luis > > > > > > > > > > > > > > > > -- > Ihtisham ul Haq > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r > die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht > der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich > in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie > hier > >. > > > > -- > > LUIS TOM?S BOL?VAR > Principal Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r > die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht > der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich > in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie > hier . > -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Tue Oct 18 10:38:32 2022 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Tue, 18 Oct 2022 12:38:32 +0200 Subject: [kolla] Monasca removal, Kafka and Zookeeper deprecation/removal Message-ID: Hello, As Kolla/Kolla-Ansible projects are implementing switch to OpenSearch - it became obvious to remove Elasticsearch (which is EOL in Kolla-provided version) and orchestrate a direct upgrade into OpenSearch. Part of this work is removal of elasticsearch and log stash Kolla container images and elasticsearch role in Kolla-Ansible. Since Monasca is relying on Elasticsearch and Logstash - and there are no contributors willing to make it work with OpenSearch inside Kolla/Kolla-Ansible - we decided to drop Monasca support in Zed release (especially that the CI jobs have been long failing). One of the Monasca dependencies is Kafka (and ZooKeeper) - is there anybody using it outside of Monasca deployment in Kolla/Kolla-Ansible? If in 4 weeks there will be no answers to this question - we will also remove Kafka and ZooKeeper container images in Kolla (and their respective roles in Kolla-Ansible). Thanks Michal From nguyenhuukhoinw at gmail.com Tue Oct 18 00:19:36 2022 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Tue, 18 Oct 2022 07:19:36 +0700 Subject: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down Message-ID: Description =========== I set up 3 controllers and 3 compute nodes. My system cannot work well when 1 rabbit node in cluster rabbitmq is down, cannot launch instances. It stucked at scheduling. Steps to reproduce =========== Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// * Reboot 1 of 3 rabbitmq node. * Create instances then it stucked at scheduling. Workaround =========== Point to rabbitmq VIP address. But We cannot share the load with this solution. Please give me some suggestions. Thank you very much. I did google and enabled system log's debug but I still cannot understand why. Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eblock at nde.ag Tue Oct 18 13:35:49 2022 From: eblock at nde.ag (Eugen Block) Date: Tue, 18 Oct 2022 13:35:49 +0000 Subject: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down In-Reply-To: Message-ID: <20221018133549.Horde.J0szHtAQhBKfnfurK84DswO@webmail.nde.ag> Are the remaining two nodes still member of a cluster? Can you share 'rabbitmqctl cluster_status' from both nodes while the third is down? How did you deploy openstack? Zitat von Nguy?n H?u Kh?i : > Description > =========== > I set up 3 controllers and 3 compute nodes. My system cannot work well when > 1 rabbit node in cluster rabbitmq is down, cannot launch instances. It > stucked at scheduling. > > Steps to reproduce > =========== > Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// > * Reboot 1 of 3 rabbitmq node. > * Create instances then it stucked at scheduling. > > Workaround > =========== > Point to rabbitmq VIP address. But We cannot share the load with this > solution. Please give me some suggestions. Thank you very much. > I did google and enabled system log's debug but I still cannot understand > why. > > Nguyen Huu Khoi From noonedeadpunk at gmail.com Tue Oct 18 15:22:29 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 18 Oct 2022 17:22:29 +0200 Subject: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down In-Reply-To: <20221018133549.Horde.J0szHtAQhBKfnfurK84DswO@webmail.nde.ag> References: <20221018133549.Horde.J0szHtAQhBKfnfurK84DswO@webmail.nde.ag> Message-ID: I have faced that with quite old rabbitmq versions, like 3.7 As a workaround ha_queues worked nicely for me: https://www.rabbitmq.com/ha.html#mirroring-arguments ??, 18 ???. 2022 ?., 15:45 Eugen Block : > Are the remaining two nodes still member of a cluster? Can you share > 'rabbitmqctl cluster_status' from both nodes while the third is down? > How did you deploy openstack? > > Zitat von Nguy?n H?u Kh?i : > > > Description > > =========== > > I set up 3 controllers and 3 compute nodes. My system cannot work well > when > > 1 rabbit node in cluster rabbitmq is down, cannot launch instances. It > > stucked at scheduling. > > > > Steps to reproduce > > =========== > > Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// > > * Reboot 1 of 3 rabbitmq node. > > * Create instances then it stucked at scheduling. > > > > Workaround > > =========== > > Point to rabbitmq VIP address. But We cannot share the load with this > > solution. Please give me some suggestions. Thank you very much. > > I did google and enabled system log's debug but I still cannot understand > > why. > > > > Nguyen Huu Khoi > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luca.Czesla at mail.schwarz Tue Oct 18 15:06:06 2022 From: Luca.Czesla at mail.schwarz (Luca Czesla) Date: Tue, 18 Oct 2022 15:06:06 +0000 Subject: [ovn][neutron] RE: OVN BGP Agent query In-Reply-To: References: Message-ID: Hey Luis, thanks a lot for the great feedback. See inline reply please. Best regards, Luca Czesla From: Luis Tomas Bolivar Sent: Tuesday, October 18, 2022 9:54 AM To: Luca Czesla Cc: Daniel Alvarez ; Max Andr? Lamprecht ; openstack-discuss ; Ihtisham ul Haq Subject: Re: [ovn][neutron] RE: OVN BGP Agent query Hello Luca! Awesome patch! See some comments below, though I'll also add some more details on the gerrit side On Mon, Oct 17, 2022 at 11:12 AM Luca Czesla > wrote: Hey Luis, Thanks for your mail. We have now prepared a first draft. 
In addition to what Ihtisham already wrote, we need the following options: - to run multiple agents per host (container) because we need more than one BGP session, alternatively there would be the option to do it via frr Nice, this could actually be splitted from the patch into a different one, as it is kind of independent of the new driver, and can be used for other use cases (and merge it almost right away) Good idea. I opened a separate merge request here: https://review.opendev.org/c/x/ovn-bgp-agent/+/861749 - we need to be able to filter which networks we announce where, for this we used the address scope from Openstack in the past To make this possible we have built it in so that it runs through the address scope. To make the address scope usable in ovn-bgp-agent we also patched Neutron so that the address scope is part of router_gateway and router_interface patch-ports. We also announce the networks behind the routers instead of the VM IPs directly. We actually play with something similar to be able to have a simple/initial API here: https://review.opendev.org/c/openstack/neutron/+/797087 (idea was to use port tags instead) Do you have a link to the modifications to enable this in the neutron side? I just opened the MR at Neutron. It's pretty much what you do with the port_tags as well. You can find it here: https://review.opendev.org/c/openstack/neutron/+/861719 Is there an advantage of using "port_tags" at this point, or is that simply a catch-all term for metadata? I am fine with both solutions and have no preferences there. We added a new driver for this because we don't really need anything from the previous implementation at the moment and we were missing the similarities between the two. Possibly someone has an idea how we could merge this? Is the new expose_router_port/withdraw_router_port similar to the previous expose_subnet/withdraw_subnet? Perhaps we just need a different implementation of those? I am not sure if it is the right way. I had problems for example with the withdraw_subnet that I could not get the corresponding router port from the lrp port because it was already deleted, so I was missing the reference to the address scope. So if you have a better way to get to the target, I will gladly accept it. The function is similar only that I didn't want to use the lrp ports for the reason above and we need the update event since initial row.mac is unknown and will only be set to router with the next revision number. I am not sure how we could merge this but I am of course open for ideas. Another idea would be to have another separate watcher that implements only these two events. Maybe that would be the cleanest way? I was actually playing (after initial discussion with Ihtisha) for options to not only expose VM_IP via HOST_IP, but doing it VM_IP though OVN_GATEWAY_PORT_IP, and then GATEWAY_PORT_IP via HOST_IP. I was playing with "redistribute static" instead of "redistributed kernel" + IP routes (I think I like your approach better). My idea was to have the existing driver with an option to state if the env is pure L3 (the current support) or L2 (where you work is needed). That is another option to explore, but open to having completely different drivers, as it also makes a lot of sense. It would be nice if you could have a first look at it. There is still some work to do like adding tests and making Zuul happy but I think it is useful to discuss this with you as early as possible. Yeah, no problem, we'll make zull happy after the initial design/idea is discussed. 
Don't need to waste time on that at this moment You can find the merge request here: https://review.opendev.org/c/x/ovn-bgp-agent/+/861581 Thanks! I'll leave some more detailed review in there Thank you very much. I will try to work on it tomorrow. There is one more general question on that patch. Isn't that exposing the routes in all the nodes where the agent is running instead of where the actual ovn router gateway port is attached? Don't you need to check is the node where you can actually inject the traffic to OVN overlay? Or you are assuming everything is L2 at that point, and it does not matter everyone expose the route as at the end of the day, only the node with the ovn gateway router port will reply to ARPs Yes exactly, we assume that the respective router IPs/Networks are properly filtered via the address scope and that the BGP peers are in the same L2 network as the provider network of the routers. The patch is really only about announcing the tenant networks via the correct router IP. We are not interested in where this IP is located, this task can be done by ARP in the respective L2 network. So everything can be routed locally in the L2 via ARP lookups and every BGP agent that has a foot in the network can announce the networks identically. So in the end we have N times the same route which only differ from whom it was received. Announcing the complete networks makes sense for us as we have less BGP updates and the routers can drop the traffic. The advantage of this is that L3 failover is instant, since no new BGP route needs to be announced. The switching of the router IP to another gateway/chassis is then done by OVN using GARP. Best regards, Luca Czesla From: Luis Tomas Bolivar > Sent: Monday, October 3, 2022 7:45 AM To: Ihtisham ul Haq > Cc: Daniel Alvarez >; Luca Czesla >; Max Andr? Lamprecht >; openstack-discuss > Subject: Re: [ovn][neutron] RE: OVN BGP Agent query On Fri, Sep 30, 2022 at 6:20 PM Ihtisham ul Haq > wrote: Hi Luis and Daniel, Please see inline response. > From: Daniel Alvarez Sanchez > > Sent: 29 September 2022 11:37 > Subject: Re: OVN BGP Agent query > > Hi Ihtisham and Luis, > > On Thu, Sep 29, 2022 at 7:42 AM Luis Tomas Bolivar > wrote: > > Some comments and questions inline > > > > On Tue, Sep 27, 2022 at 1:39 PM Ihtisham ul haq > wrote: > > > Hi Luis, > > > > > > Thanks for your work on the OVN BGP Agent. We are planning > > > to use it in our OVN deployment, but have a question regarding it. > > > > Great to hear! Can you share a bit more info about this environment? like > > openstack version, target workload, etc. We plan to use this with Yoga version. Our workload consist of enterprise users with VMs running on Openstack and connected to their enterprise network via transfer network(to which the customer neutron router is attached to). And we also have public workload but with the ovn-bgp we only want to we want to advertise the former. > > > > > The way our current setup with ML2/OVS works is that our customer VM IP routes > > > are announced via the router IP(of the that customer) to the leaf switch instead of > > > the IP of the host where the neutron BGP agent runs. And then even if the > > > router fails over, the IP of the router stays the same and thus the BGP route > > > doesn't need to be updated. > > > > Is this with Neutron Dynamic Routing? When you say Router IP, do you mean the virtual neutron router and its IP associated with the provider network? What type of IPs are you announcing with BGP? 
IPs on provider network or on tenant networks (or both)? Yes, that's with Neutron DR agent, and I meant virtual neutron router with IP from the provider network. We announce IPs of our tenant network via the virtual routers external address. > > If the router fails over, the route needs to be updated, doesn't it? Same IP, but exposed in the new location of the router? Correct. > The route to the tenant network doesn't change, ie. > 192.168.0.0 via 172.24.4.100 (this route remains the same regardless of where 172.24.4.100 is). > If there's L2 in the 172.24.4.0 network, the new location of 172.24.4.100 will be learnt via GARP announcement. In our case, this won't happen as we don't have L2 so we expose directly connected routes to overcome this "limitation". Right, in our case we have a stretched L2 transfer network(mentioned above) to which our gateway nodes and customer routers are connected to, so we can advertise the IPs from the tenant network via the virtual router external IP and thus the location of the router isn't relevant in case of failover as its address will be relearned. > In the case of Neutron Dynamic Routing, there's no assumption that everything is L3 so GARPs are needed to learn the new location. > > > > We see that the routes are announced by the ovn-bgp-agent via the host IP(GTW) in our > > > switch peers. If that's the case then how do you make sure that during failover > > > of a router, the BGP routes gets updated with the new host IP(where the router > > > failed over to)? > > > > The local FRR running at each node is in charge of exposing the IPs. For the IPs on the provider network, the traffic is directly exposed where the VMs are, without having to go through the virtual router, so a router failover won't change the route. > > In the case of VMs on tenant networks, the traffic is exposed on the node where the virtual router gateway port is associated (I suppose this is what you refer to with router IP). In the case of a failover the agent is in charge of making FRR to withdraw the exposed routes on the old node, and re-advertise them on the new router IP location > > > > > Can we accomplish the same route advertisement as our ML2/OVS setup, using the ovn-bgp-agent? > > I think this is technically possible, and perhaps you want to contribute that functionality or even help integrating the agent as a driver of Neutron Dynamic Routing? Sounds good, our plan currently is to add this to the ovn-bgp-agent, so we can announce our tenant routes via virtual routers external address on a stretched L2 network, to make it work with our use case. Great to hear!! Just to make it clear, the ovn-bgp-agent current solution is to expose the tenant VM IPs through the host that has the OVN router gateway port, so for example, if the VM IP (10.0.0.5) is connected to the neutron virtual router, which in turns is connected to your provider network (your transfer network) with IP 172.24.4.10, and hosted in a physical server with IP 192.168.100.100, the route will be exposed as: - 10.0.0.5 nexthop 192.168.100.100 - 172.24.4.10 nexthop 192.168.100.100 As we are using FRR config "redistributed connected". As the traffic to the tenant networks needs to be injected into the OVN overlay through the gateway node hosting that ovn virtual router gateway port (cr-lrp), would it be ok if, besides those route we also advertise? 
- 10.0.0.5 nexthop 172.24.4.10 Cheers, Luis -- Ihtisham ul Haq Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier>. -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ltomasbo at redhat.com Tue Oct 18 17:18:57 2022 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Tue, 18 Oct 2022 19:18:57 +0200 Subject: [ovn][neutron] RE: OVN BGP Agent query In-Reply-To: References: Message-ID: On Tue, Oct 18, 2022 at 5:06 PM Luca Czesla wrote: > Hey Luis, > > > > thanks a lot for the great feedback. > > > > See inline reply please. > > > > Best regards, > > Luca Czesla > > > > > > *From:* Luis Tomas Bolivar > *Sent:* Tuesday, October 18, 2022 9:54 AM > *To:* Luca Czesla > *Cc:* Daniel Alvarez ; Max Andr? Lamprecht > ; openstack-discuss < > openstack-discuss at lists.openstack.org>; Ihtisham ul Haq > > *Subject:* Re: [ovn][neutron] RE: OVN BGP Agent query > > > > Hello Luca! > > > > Awesome patch! See some comments below, though I'll also add some more > details on the gerrit side > > > > On Mon, Oct 17, 2022 at 11:12 AM Luca Czesla > wrote: > > Hey Luis, > > > > Thanks for your mail. > > > > We have now prepared a first draft. In addition to what Ihtisham already > wrote, we need the following options: > > - to run multiple agents per host (container) because we need more than > one BGP session, alternatively there would be the option to do it via frr > > > > Nice, this could actually be splitted from the patch into a different one, > as it is kind of independent of the new driver, and can be used for other > use cases (and merge it almost right away) > > > > > > Good idea. I opened a separate merge request here: > https://review.opendev.org/c/x/ovn-bgp-agent/+/861749 > Great! Thanks! > > > - we need to be able to filter which networks we announce where, for this > we used the address scope from Openstack in the past > > > > To make this possible we have built it in so that it runs through the > address scope. To make the address scope usable in ovn-bgp-agent we also > patched Neutron so that the address scope is part of router_gateway and > router_interface patch-ports. We also announce the networks behind the > routers instead of the VM IPs directly. > > > > We actually play with something similar to be able to have a > simple/initial API here: > https://review.opendev.org/c/openstack/neutron/+/797087 > > (idea was to use port tags instead) > > > > Do you have a link to the modifications to enable this in the neutron side? 
> > > > > > I just opened the MR at Neutron. It's pretty much what you do with the > port_tags as well. You can find it here: > https://review.opendev.org/c/openstack/neutron/+/861719 > > Is there an advantage of using "port_tags" at this point, or is that > simply a catch-all term for metadata? I am fine with both solutions and > have no preferences there. > > > No, there is no advantage of using port_tags, in fact after discussing it with the neutron community it was stated that that was not included on purpose as it is a field for layers that work on top of neutron, not below (OVN). So, I think your approach makes more sense. > > We added a new driver for this because we don't really need anything from > the previous implementation at the moment and we were missing the > similarities between the two. Possibly someone has an idea how we could > merge this? > > > > Is the new expose_router_port/withdraw_router_port similar to the previous > expose_subnet/withdraw_subnet? Perhaps we just need a different > implementation of those? > > > > > > I am not sure if it is the right way. I had problems for example with the > withdraw_subnet that I could not get the corresponding router port from the > lrp port because it was already deleted, so I was missing the reference to > the address scope. So if you have a better way to get to the target, I will > gladly accept it. > Right, I think that was the reason why I used an in-memory dict for that (ovn_local_cr_lrps and ovn_local_lrps) > > > The function is similar only that I didn't want to use the lrp ports for > the reason above and we need the update event since initial row.mac is > unknown and will only be set to router with the next revision number. I am > not sure how we could merge this but I am of course open for ideas. > > Another idea would be to have another separate watcher that implements > only these two events. Maybe that would be the cleanest way? > Umm, ok, probably I'm missing something but I did not have problems with row.mac for the SubnetRouterAttachedEvent, but maybe this is related to the withdraw issue. > > > > > I was actually playing (after initial discussion with Ihtisha) for options > to not only expose VM_IP via HOST_IP, but doing it VM_IP though > OVN_GATEWAY_PORT_IP, and then GATEWAY_PORT_IP via HOST_IP. I was playing > with "redistribute static" instead of "redistributed kernel" + IP routes (I > think I like your approach better). > > > > My idea was to have the existing driver with an option to state if the env > is pure L3 (the current support) or L2 (where you work is needed). That is > another option to explore, but open to having completely different drivers, > as it also makes a lot of sense. > > > > It would be nice if you could have a first look at it. > > > > There is still some work to do like adding tests and making Zuul happy but > I think it is useful to discuss this with you as early as possible. > > > > Yeah, no problem, we'll make zull happy after the initial design/idea is > discussed. Don't need to waste time on that at this moment > > > > You can find the merge request here: > https://review.opendev.org/c/x/ovn-bgp-agent/+/861581 > > > > > Thanks! I'll leave some more detailed review in there > > > > > > Thank you very much. I will try to work on it tomorrow. > > > > > > There is one more general question on that patch. Isn't that exposing the > routes in all the nodes where the agent is running instead of where the > actual ovn router gateway port is attached? 
Don't you need to check is the > node where you can actually inject the traffic to OVN overlay? Or you are > assuming everything is L2 at that point, and it does not matter everyone > expose the route as at the end of the day, only the node with the ovn > gateway router port will reply to ARPs > > > > > > Yes exactly, we assume that the respective router IPs/Networks are > properly filtered via the address scope and that the BGP peers are in the > same L2 network as the provider network of the routers. The patch is really > only about announcing the tenant networks via the correct router IP. We are > not interested in where this IP is located, this task can be done by ARP in > the respective L2 network. So everything can be routed locally in the L2 > via ARP lookups and every BGP agent that has a foot in the network can > announce the networks identically. So in the end we have N times the same > route which only differ from whom it was received. Announcing the complete > networks makes sense for us as we have less BGP updates and the routers can > drop the traffic. The advantage of this is that L3 failover is instant, > since no new BGP route needs to be announced. The switching of the router > IP to another gateway/chassis is then done by OVN using GARP. > Agree! It won't work for pure L3 domains, but works with L2/ARP as you mention. Thanks for confirming! Cheers, Luis > > > > > > Best regards, > > Luca Czesla > > > > *From:* Luis Tomas Bolivar > *Sent:* Monday, October 3, 2022 7:45 AM > *To:* Ihtisham ul Haq > *Cc:* Daniel Alvarez ; Luca Czesla < > Luca.Czesla at mail.schwarz>; Max Andr? Lamprecht ; > openstack-discuss > *Subject:* Re: [ovn][neutron] RE: OVN BGP Agent query > > > > > > > > On Fri, Sep 30, 2022 at 6:20 PM Ihtisham ul Haq < > Ihtisham.ul_Haq at mail.schwarz> wrote: > > Hi Luis and Daniel, > > Please see inline response. > > > From: Daniel Alvarez Sanchez > > Sent: 29 September 2022 11:37 > > Subject: Re: OVN BGP Agent query > > > > Hi Ihtisham and Luis, > > > > On Thu, Sep 29, 2022 at 7:42 AM Luis Tomas Bolivar > wrote: > > > Some comments and questions inline > > > > > > On Tue, Sep 27, 2022 at 1:39 PM Ihtisham ul haq < > ihtisham.uh at hotmail.com> wrote: > > > > Hi Luis, > > > > > > > > Thanks for your work on the OVN BGP Agent. We are planning > > > > to use it in our OVN deployment, but have a question regarding it. > > > > > > Great to hear! Can you share a bit more info about this environment? > like > > > openstack version, target workload, etc. > > We plan to use this with Yoga version. Our workload consist of enterprise > users > with VMs running on Openstack and connected to their enterprise network via > transfer network(to which the customer neutron router is attached to). > And we also have public workload but with the ovn-bgp we only want to > we want to advertise the former. > > > > > > > > The way our current setup with ML2/OVS works is that our customer VM > IP routes > > > > are announced via the router IP(of the that customer) to the leaf > switch instead of > > > > the IP of the host where the neutron BGP agent runs. And then even > if the > > > > router fails over, the IP of the router stays the same and thus the > BGP route > > > > doesn't need to be updated. > > > > > > Is this with Neutron Dynamic Routing? When you say Router IP, do you > mean the virtual neutron router and its IP associated with the provider > network? What type of IPs are you announcing with BGP? IPs on provider > network or on tenant networks (or both)? 
> > Yes, that's with Neutron DR agent, and I meant virtual neutron router with > IP from the provider network. We announce IPs of our tenant network via the > virtual routers external address. > > > > If the router fails over, the route needs to be updated, doesn't it? > Same IP, but exposed in the new location of the router? > > Correct. > > > The route to the tenant network doesn't change, ie. > > 192.168.0.0 via 172.24.4.100 (this route remains the same regardless of > where 172.24.4.100 is). > > If there's L2 in the 172.24.4.0 network, the new location of > 172.24.4.100 will be learnt via GARP announcement. In our case, this won't > happen as we don't have L2 so we expose directly connected routes to > overcome this "limitation". > > Right, in our case we have a stretched L2 transfer network(mentioned above) > to which our gateway nodes and customer routers are connected to, so we can > advertise the IPs from the tenant network via the virtual router external > IP > and thus the location of the router isn't relevant in case of failover as > its > address will be relearned. > > > In the case of Neutron Dynamic Routing, there's no assumption that > everything is L3 so GARPs are needed to learn the new location. > > > > > > We see that the routes are announced by the ovn-bgp-agent via the > host IP(GTW) in our > > > > switch peers. If that's the case then how do you make sure that > during failover > > > > of a router, the BGP routes gets updated with the new host IP(where > the router > > > > failed over to)? > > > > > > The local FRR running at each node is in charge of exposing the IPs. > For the IPs on the provider network, the traffic is directly exposed where > the VMs are, without having to go through the virtual router, so a router > failover won't change the route. > > > In the case of VMs on tenant networks, the traffic is exposed on the > node where the virtual router gateway port is associated (I suppose this is > what you refer to with router IP). In the case of a failover the agent is > in charge of making FRR to withdraw the exposed routes on the old node, and > re-advertise them on the new router IP location > > > > > > > Can we accomplish the same route advertisement as our ML2/OVS setup, > using the ovn-bgp-agent? > > > > I think this is technically possible, and perhaps you want to contribute > that functionality or even help integrating the agent as a driver of > Neutron Dynamic Routing? > > Sounds good, our plan currently is to add this to the ovn-bgp-agent, > so we can announce our tenant routes via virtual routers external address > on > a stretched L2 network, to make it work with our use case. > > > > Great to hear!! > > > > Just to make it clear, the ovn-bgp-agent current solution is to expose the > tenant VM IPs through the host that has the OVN router gateway port, so for > example, if the VM IP (10.0.0.5) is connected to the neutron virtual > router, which in turns is connected to your provider network (your transfer > network) with IP 172.24.4.10, and hosted in a physical server with IP > 192.168.100.100, the route will be exposed as: > > - 10.0.0.5 nexthop 192.168.100.100 > > - 172.24.4.10 nexthop 192.168.100.100 > > > > As we are using FRR config "redistributed connected". As the traffic to > the tenant networks needs to be injected into the OVN overlay through the > gateway node hosting that ovn virtual router gateway port (cr-lrp), would > it be ok if, besides those route we also advertise? 
> > - 10.0.0.5 nexthop 172.24.4.10 > > > > Cheers, > > Luis > > > > > > > > > > > > > > > > -- > Ihtisham ul Haq > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r > die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht > der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich > in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie > hier > >. > > > > -- > > LUIS TOM?S BOL?VAR > Principal Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r > die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht > der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich > in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie > hier > > . > > > > -- > > LUIS TOM?S BOL?VAR > Principal Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r > die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht > der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich > in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie > hier . > -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Tue Oct 18 18:01:04 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Tue, 18 Oct 2022 15:01:04 -0300 Subject: [cloudkitty] Another core team cleanup In-Reply-To: References: Message-ID: Clean up done. I also found these two others. What do you guys think? Should we clean those as well? openstack-ansible-os_cloudkitty-core = michael at michaelrice.org cloudkitty-release = sheeprine at nullplace.com On Mon, Oct 3, 2022 at 3:02 PM Rafael Weing?rtner < rafaelweingartner at gmail.com> wrote: > I guess it is fine as they are not participating in the project anymore, > and this has been a constant for the past two years or so. > > On Mon, Oct 3, 2022 at 1:26 PM Pierre Riteau wrote: > >> Hello, >> >> Almost exactly two years since the last core team cleanup [1], it's >> probably time to have another one. I don't think we have heard from these >> contributors in the last couple of years: >> >> Justin Ferrieu jferrieu at objectif-libre.com >> Luis Ramirez luis.ramirez at opencloud.es >> Luka Peschke mail at lukapeschke.com >> Maxime Cottret maxime.cottret at gmail.com >> St?phane Albert sheeprine at nullplace.com >> Jeremy Liu liuj285 at chinaunicom.cn >> >> Is everyone okay with removing cloudkitty-core membership for these users? >> >> Cheers, >> Pierre Riteau (priteau) >> >> [1] >> https://lists.openstack.org/pipermail/openstack-discuss/2020-October/017751.html >> > > > -- > Rafael Weing?rtner > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Tue Oct 18 19:43:55 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Tue, 18 Oct 2022 21:43:55 +0200 Subject: [cloudkitty] Another core team cleanup In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 at 20:01, Rafael Weing?rtner < rafaelweingartner at gmail.com> wrote: > Clean up done. > > I also found these two others. What do you guys think? Should we clean > those as well? 
> > openstack-ansible-os_cloudkitty-core = michael at michaelrice.org > This one is for openstack/openstack-ansible-os_cloudkitty. That's not for us to manage. > cloudkitty-release = sheeprine at nullplace.com > I don't think this one is in use, since cloudkitty ships releases through openstack/releases. Maybe the release team can advise. > On Mon, Oct 3, 2022 at 3:02 PM Rafael Weing?rtner < > rafaelweingartner at gmail.com> wrote: > >> I guess it is fine as they are not participating in the project anymore, >> and this has been a constant for the past two years or so. >> >> On Mon, Oct 3, 2022 at 1:26 PM Pierre Riteau wrote: >> >>> Hello, >>> >>> Almost exactly two years since the last core team cleanup [1], it's >>> probably time to have another one. I don't think we have heard from these >>> contributors in the last couple of years: >>> >>> Justin Ferrieu jferrieu at objectif-libre.com >>> Luis Ramirez luis.ramirez at opencloud.es >>> Luka Peschke mail at lukapeschke.com >>> Maxime Cottret maxime.cottret at gmail.com >>> St?phane Albert sheeprine at nullplace.com >>> Jeremy Liu liuj285 at chinaunicom.cn >>> >>> Is everyone okay with removing cloudkitty-core membership for these >>> users? >>> >>> Cheers, >>> Pierre Riteau (priteau) >>> >>> [1] >>> https://lists.openstack.org/pipermail/openstack-discuss/2020-October/017751.html >>> >> >> >> -- >> Rafael Weing?rtner >> > > > -- > Rafael Weing?rtner > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Wed Oct 19 09:44:13 2022 From: eblock at nde.ag (Eugen Block) Date: Wed, 19 Oct 2022 09:44:13 +0000 Subject: DC DR Setup Queries In-Reply-To: References: Message-ID: <20221019094413.Horde.sqgw7Q4bC5MdItC-Id9dSqU@webmail.nde.ag> Hi, your questions are quite complex, I don't know if anyone can give you defintive answers since there are so many requirements involved and careful planning is key. Maybe you could start with [1] to get an overview how something like that can be done. One scenario is basically to have two ceph clusters and replicate the RBDs from primary A to secondary site B. In case site A goes down you can promote the rbd images in B to be primary and your openstack should continue to work. The video also covers multiple openstack clusters which have the same data (and also an rbd mirror between the two ceph clusters), check it out, maybe a couple of your questions are covered. Regards, Eugen [1] https://www.openstack.org/videos/summits/austin-2016/protecting-the-galaxy-multi-region-disaster-recovery-with-openstack-and-ceph Zitat von KK CHN : > few more points to clarify... > 1. The VMs in a DataCentre must auto migrated to the DR setup. ( Also the > updates / writes to VMs in production in DC should be reflected to the > DR copies of VMs like incremental backups. How to achieve this / or what > needs to be employed to achieve this requirement. > > ( What all are the S/W and H/W requirements to achieve the above setup > if both DC and DR is planned use OpenStack latest version/s > Wallaby/ Xena/ yoga ? ) > > 2. In the above setups once the DC down or stopped for maintenance, How > the IP addresses of each VMs is managed automatically to make all the > (application/database server) VMs up and running in DR in case DC down . > Is this can be automated? how? > > Eg: When a VM say X is running in DC it may have an IP 10.10.0. X > when it is replicated in DR then it will be with the same IP > address right (10.0.0.2) ? 
But DR Network may be different and cannot > have the same IP address as DC right ? Do we need to manually set an > IP (Say 10.20.0.X )for each VM which is going to run from the DR site ? > Then what about the firewall rules in DR do we need to manipulate for each > VM for making the DR up ? Is there a way to automate this ? > > OR what the automatic mechanism to handle this IP setting up issue ? How > normally folks manage this scenario ? > > Also once your DC site recovered, then We need to Fail back to the DC site > from DR with all changes happened to the VMs in DR must be reflected back > to the DC site and Fail back.. .How to achieve this ? > > Kindly shed some light on this with your experience and expertise. > > What to do ? Where to start ? Which approach to follow to set up a Best > failover DC to DR and Failback solution. > > Thank you, > Krish > > On Tue, Oct 11, 2022 at 5:38 PM KK CHN wrote: > >> List, >> >> We are having a client DC running on HP( HP simplivity) HCI servers, >> With VMware ( Vsphere 7.0) only few VMs running on it. (11 VMs maximum all >> Linux VMs). >> >> The DR site also having the same HCI setup another location. ( The VMs are >> replicated to DR site with HP simplivity). >> >> We are planning to use Openstack for both DC and DR solutions with Wallaby >> or Xena version with KVM as hypervisor to replace the proprietary S/W and >> H/W vendor locking. >> >> The requirement is to Setup a Stable DC- DR solution. >> >> Totally confused How to setup a best Dc- DR solution for this purpose. >> >> The DR setup can be possible / advisable with Zero down time ?( or manual >> DR site uping with downtime of hours ) ? >> >> What are the available/suggested DC-DR replication mechanisms for high >> degree of application data protection and service availability? >> >> Kindly advise.. >> >> Thanks in advance, >> Krish >> From manchandavishal143 at gmail.com Wed Oct 19 11:44:40 2022 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Wed, 19 Oct 2022 17:14:40 +0530 Subject: [horizon] Cancelling Two Weekly Meeting Message-ID: Hello Team, As discussed in Yesterday's PTG meeting, there will be no horizon weekly meeting for today and next week. The next horizon weekly meeting will be on 2nd November. Thanks & Regards, Vishal Manchanda -------------- next part -------------- An HTML attachment was scrubbed... URL: From tony at bakeyournoodle.com Wed Oct 19 15:56:25 2022 From: tony at bakeyournoodle.com (Tony Breeds) Date: Wed, 19 Oct 2022 10:56:25 -0500 Subject: [dev][infra][tact-sig] Updating Zuul's default-ansible-version to 6 In-Reply-To: <20221005160640.buu6aevydtkgs4ly@yuggoth.org> References: <20221005160640.buu6aevydtkgs4ly@yuggoth.org> Message-ID: Thanks for the heads up, and thanks for the ever so gentle prod to join service-announce. /me didn't know about it On Wed, 5 Oct 2022 at 11:12, Jeremy Stanley wrote: > > Just a heads up for folks not following the OpenDev Collaboratory's > service-announce mailing list... now that Zed is officially > released, we'll be increasing the default Ansible version for Zuul > jobs from 5 to 6 in preparation for Zuul to drop Ansible 5 support > in coming weeks. See the full announcement here: > > https://lists.opendev.org/pipermail/service-announce/2022-October/000046.html > > -- > Jeremy Stanley -- Yours Tony. 
From michal.arbet at ultimum.io Wed Oct 19 16:28:04 2022 From: michal.arbet at ultimum.io (Michal Arbet) Date: Wed, 19 Oct 2022 18:28:04 +0200 Subject: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down In-Reply-To: References: <20221018133549.Horde.J0szHtAQhBKfnfurK84DswO@webmail.nde.ag> Message-ID: Hi, What is your parition_handling_strategy ? Michal Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Po???? 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet at ultimum.io *https://ultimum.io * LinkedIn | Twitter | Facebook ?t 18. 10. 2022 v 17:34 odes?latel Dmitriy Rabotyagov < noonedeadpunk at gmail.com> napsal: > I have faced that with quite old rabbitmq versions, like 3.7 > > As a workaround ha_queues worked nicely for me: > https://www.rabbitmq.com/ha.html#mirroring-arguments > > > ??, 18 ???. 2022 ?., 15:45 Eugen Block : > >> Are the remaining two nodes still member of a cluster? Can you share >> 'rabbitmqctl cluster_status' from both nodes while the third is down? >> How did you deploy openstack? >> >> Zitat von Nguy?n H?u Kh?i : >> >> > Description >> > =========== >> > I set up 3 controllers and 3 compute nodes. My system cannot work well >> when >> > 1 rabbit node in cluster rabbitmq is down, cannot launch instances. It >> > stucked at scheduling. >> > >> > Steps to reproduce >> > =========== >> > Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// >> > * Reboot 1 of 3 rabbitmq node. >> > * Create instances then it stucked at scheduling. >> > >> > Workaround >> > =========== >> > Point to rabbitmq VIP address. But We cannot share the load with this >> > solution. Please give me some suggestions. Thank you very much. >> > I did google and enabled system log's debug but I still cannot >> understand >> > why. >> > >> > Nguyen Huu Khoi >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From michal.arbet at ultimum.io Wed Oct 19 16:40:21 2022 From: michal.arbet at ultimum.io (Michal Arbet) Date: Wed, 19 Oct 2022 18:40:21 +0200 Subject: [kolla] single Network interface In-Reply-To: References: Message-ID: Hi, If I am correct this is not possible currently, but I remember I was working on a solution, but unfortunately I stopped at some point because kolla upstream didn't want to maintain. In attachment you can find patches for kolla and kolla-ansible and our idea. We added python script to kolla container and provide netplan style configuration by kolla-ansible ..so openvswitch starts and configured networking as it was set in configuration (if i remember ...it is quite long time....and of course it was not final version ...but if i remember it somehow worked). So, you can check it and maybe we can discuss this feature again :) Thanks, Kevko Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Po???? 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet at ultimum.io *https://ultimum.io * LinkedIn | Twitter | Facebook po 17. 10. 2022 v 19:24 odes?latel Parsa Aminian napsal: > Hello > I use kolla ansible wallaby version . > my compute node has only one port . is it possible to use this server ? as > I know openstack compute need 2 port one for management and other for > external user network . Im using provider_networks and it > seems neutron_external_interface could not be the same as network_interface > because openvswitch need to create br-ex bridge on separate port > is there any solution that i can config my compute with 1 port ? 
> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ovs_kolla Type: application/octet-stream Size: 5617 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ovs_kolla_ansible Type: application/octet-stream Size: 4882 bytes Desc: not available URL: From elod.illes at est.tech Wed Oct 19 17:10:01 2022 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Wed, 19 Oct 2022 17:10:01 +0000 Subject: [PTL][TC] library *feature* freeze at Milestone-2 Message-ID: Hi, During 'TC + Community leaders interaction' [1] a case was discussed, where a late library release caused last minute fire fighting in Zed cycle, and people discussed the possibility to introduce a (non-client) library *feature* freeze at Milestone-2 to avoid similar issues in the future. I've started to propose the possible schedule change [2] (note: it's not ready yet as it does not emphasize that at Milestone-2 we mean *feature* freeze for libraries, not "final library release"). The patch already got some reviews from library maintainers so I'm calling the attention to this change here on the ML. Thanks everyone for the responses in advance, El?d [1] https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030718.html [2] https://review.opendev.org/c/openstack/releases/+/861900 -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Wed Oct 19 21:57:18 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Wed, 19 Oct 2022 14:57:18 -0700 Subject: [kolla] single Network interface In-Reply-To: References: Message-ID: <49f4b31f-d0c9-4b0a-be3c-70480f45f39e@app.fastmail.com> On Wed, Oct 19, 2022, at 9:40 AM, Michal Arbet wrote: > Hi, > > If I am correct this is not possible currently, but I remember I was > working on a solution, but unfortunately I stopped at some point > because kolla upstream didn't want to maintain. > > In attachment you can find patches for kolla and kolla-ansible and our idea. > > We added python script to kolla container and provide netplan style > configuration by kolla-ansible ..so openvswitch starts and configured > networking as it was set in configuration (if i remember ...it is quite > long time....and of course it was not final version ...but if i > remember it somehow worked). > > So, you can check it and maybe we can discuss this feature again :) > > Thanks, > Kevko > > > Michal Arbet > Openstack Engineer > > Ultimum Technologies a.s. > Na Po???? 1047/26, 11000 Praha 1 > Czech Republic > > +420 604 228 897 > michal.arbet at ultimum.io > _https://ultimum.io_ > > LinkedIn | > Twitter | Facebook > > > > po 17. 10. 2022 v 19:24 odes?latel Parsa Aminian > napsal: >> Hello >> I use kolla ansible wallaby version . >> my compute node has only one port . is it possible to use this server ? as I know openstack compute need 2 port one for management and other for external user network . Im using provider_networks and it seems neutron_external_interface could not be the same as network_interface because openvswitch need to create br-ex bridge on separate port >> is there any solution that i can config my compute with 1 port ? A very long time ago the OpenStack Infra Team ran the "Infracloud". This OpenStack installation ran on donated hardware and the instances there only had a single network port as well. 
To workaround this we ended up using vlan specific subinterfaces on the node so that logically we were presenting more than one interface to the OpenStack installation. I don't remember all the details but the now retired opendev/puppet-infracloud repo may have some clues: https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e > Attachments: > * ovs_kolla > * ovs_kolla_ansible From michal.arbet at ultimum.io Wed Oct 19 23:44:07 2022 From: michal.arbet at ultimum.io (Michal Arbet) Date: Thu, 20 Oct 2022 01:44:07 +0200 Subject: [kolla] single Network interface In-Reply-To: <49f4b31f-d0c9-4b0a-be3c-70480f45f39e@app.fastmail.com> References: <49f4b31f-d0c9-4b0a-be3c-70480f45f39e@app.fastmail.com> Message-ID: Hmm, But I think there is a problem with vlan - you need to setup it in OVS, don't you ? Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Po???? 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet at ultimum.io *https://ultimum.io * LinkedIn | Twitter | Facebook st 19. 10. 2022 v 23:57 odes?latel Clark Boylan napsal: > On Wed, Oct 19, 2022, at 9:40 AM, Michal Arbet wrote: > > Hi, > > > > If I am correct this is not possible currently, but I remember I was > > working on a solution, but unfortunately I stopped at some point > > because kolla upstream didn't want to maintain. > > > > In attachment you can find patches for kolla and kolla-ansible and our > idea. > > > > We added python script to kolla container and provide netplan style > > configuration by kolla-ansible ..so openvswitch starts and configured > > networking as it was set in configuration (if i remember ...it is quite > > long time....and of course it was not final version ...but if i > > remember it somehow worked). > > > > So, you can check it and maybe we can discuss this feature again :) > > > > Thanks, > > Kevko > > > > > > Michal Arbet > > Openstack Engineer > > > > Ultimum Technologies a.s. > > Na Po???? 1047/26, 11000 Praha 1 > > Czech Republic > > > > +420 604 228 897 > > michal.arbet at ultimum.io > > _https://ultimum.io_ > > > > LinkedIn | > > Twitter | Facebook > > > > > > > > po 17. 10. 2022 v 19:24 odes?latel Parsa Aminian > > napsal: > >> Hello > >> I use kolla ansible wallaby version . > >> my compute node has only one port . is it possible to use this server ? > as I know openstack compute need 2 port one for management and other for > external user network . Im using provider_networks and it seems > neutron_external_interface could not be the same as network_interface > because openvswitch need to create br-ex bridge on separate port > >> is there any solution that i can config my compute with 1 port ? > > A very long time ago the OpenStack Infra Team ran the "Infracloud". This > OpenStack installation ran on donated hardware and the instances there only > had a single network port as well. To workaround this we ended up using > vlan specific subinterfaces on the node so that logically we were > presenting more than one interface to the OpenStack installation. > > I don't remember all the details but the now retired > opendev/puppet-infracloud repo may have some clues: > https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e > > > Attachments: > > * ovs_kolla > > * ovs_kolla_ansible > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cboylan at sapwetik.org Wed Oct 19 23:50:58 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Wed, 19 Oct 2022 16:50:58 -0700 Subject: [kolla] single Network interface In-Reply-To: References: <49f4b31f-d0c9-4b0a-be3c-70480f45f39e@app.fastmail.com> Message-ID: On Wed, Oct 19, 2022, at 4:44 PM, Michal Arbet wrote: > Hmm, > > But I think there is a problem with vlan - you need to setup it in OVS, > don't you ? There was also a bridge and a veth pair involved: https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e/manifests/veth.pp Possibly to deal with this? Like I said its been a long time and I don't remember the details. I just know it was possible to solve at least at the time. Linux gives you a whole suite of virtual network components that you can throw together to workaround physical limitations like this. > > Michal Arbet > Openstack Engineer > > Ultimum Technologies a.s. > Na Po???? 1047/26, 11000 Praha 1 > Czech Republic > > +420 604 228 897 > michal.arbet at ultimum.io > _https://ultimum.io_ > > LinkedIn | > Twitter | Facebook > > > > st 19. 10. 2022 v 23:57 odes?latel Clark Boylan napsal: >> On Wed, Oct 19, 2022, at 9:40 AM, Michal Arbet wrote: >> > Hi, >> > >> > If I am correct this is not possible currently, but I remember I was >> > working on a solution, but unfortunately I stopped at some point >> > because kolla upstream didn't want to maintain. >> > >> > In attachment you can find patches for kolla and kolla-ansible and our idea. >> > >> > We added python script to kolla container and provide netplan style >> > configuration by kolla-ansible ..so openvswitch starts and configured >> > networking as it was set in configuration (if i remember ...it is quite >> > long time....and of course it was not final version ...but if i >> > remember it somehow worked). >> > >> > So, you can check it and maybe we can discuss this feature again :) >> > >> > Thanks, >> > Kevko >> > >> > >> > Michal Arbet >> > Openstack Engineer >> > >> > Ultimum Technologies a.s. >> > Na Po???? 1047/26, 11000 Praha 1 >> > Czech Republic >> > >> > +420 604 228 897 >> > michal.arbet at ultimum.io >> > _https://ultimum.io_ >> > >> > LinkedIn | >> > Twitter | Facebook >> > >> > >> > >> > po 17. 10. 2022 v 19:24 odes?latel Parsa Aminian >> > napsal: >> >> Hello >> >> I use kolla ansible wallaby version . >> >> my compute node has only one port . is it possible to use this server ? as I know openstack compute need 2 port one for management and other for external user network . Im using provider_networks and it seems neutron_external_interface could not be the same as network_interface because openvswitch need to create br-ex bridge on separate port >> >> is there any solution that i can config my compute with 1 port ? >> >> A very long time ago the OpenStack Infra Team ran the "Infracloud". This OpenStack installation ran on donated hardware and the instances there only had a single network port as well. To workaround this we ended up using vlan specific subinterfaces on the node so that logically we were presenting more than one interface to the OpenStack installation. 
>> >> I don't remember all the details but the now retired opendev/puppet-infracloud repo may have some clues: https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e >> >> > Attachments: >> > * ovs_kolla >> > * ovs_kolla_ansible From berndbausch at gmail.com Thu Oct 20 02:06:39 2022 From: berndbausch at gmail.com (Bernd Bausch) Date: Thu, 20 Oct 2022 11:06:39 +0900 Subject: [kolla] single Network interface In-Reply-To: References: <49f4b31f-d0c9-4b0a-be3c-70480f45f39e@app.fastmail.com> Message-ID: SInce you can easily have five to ten different networks in a cloud installation, e.g. networks dedicated to object storage, provider networks for Octavia, a network just for iSCSI etc, VLANs are (or used to be?) a common solution. See for example the (sadly, defunct) SUSE OpenStack cloud https://documentation.suse.com/soc/9/html/suse-openstack-cloud-crowbar-all/cha-deploy-poc.html#sec-depl-poc-vlans. On 2022/10/20 8:50 AM, Clark Boylan wrote: > On Wed, Oct 19, 2022, at 4:44 PM, Michal Arbet wrote: >> Hmm, >> >> But I think there is a problem with vlan - you need to setup it in OVS, >> don't you ? > There was also a bridge and a veth pair involved: https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e/manifests/veth.pp > > Possibly to deal with this? Like I said its been a long time and I don't remember the details. I just know it was possible to solve at least at the time. Linux gives you a whole suite of virtual network components that you can throw together to workaround physical limitations like this. > >> Michal Arbet >> Openstack Engineer >> >> Ultimum Technologies a.s. >> Na Po???? 1047/26, 11000 Praha 1 >> Czech Republic >> >> +420 604 228 897 >> michal.arbet at ultimum.io >> _https://ultimum.io_ >> >> LinkedIn | >> Twitter | Facebook >> >> >> >> st 19. 10. 2022 v 23:57 odes?latel Clark Boylan napsal: >>> On Wed, Oct 19, 2022, at 9:40 AM, Michal Arbet wrote: >>>> Hi, >>>> >>>> If I am correct this is not possible currently, but I remember I was >>>> working on a solution, but unfortunately I stopped at some point >>>> because kolla upstream didn't want to maintain. >>>> >>>> In attachment you can find patches for kolla and kolla-ansible and our idea. >>>> >>>> We added python script to kolla container and provide netplan style >>>> configuration by kolla-ansible ..so openvswitch starts and configured >>>> networking as it was set in configuration (if i remember ...it is quite >>>> long time....and of course it was not final version ...but if i >>>> remember it somehow worked). >>>> >>>> So, you can check it and maybe we can discuss this feature again :) >>>> >>>> Thanks, >>>> Kevko >>>> >>>> >>>> Michal Arbet >>>> Openstack Engineer >>>> >>>> Ultimum Technologies a.s. >>>> Na Po???? 1047/26, 11000 Praha 1 >>>> Czech Republic >>>> >>>> +420 604 228 897 >>>> michal.arbet at ultimum.io >>>> _https://ultimum.io_ >>>> >>>> LinkedIn | >>>> Twitter | Facebook >>>> >>>> >>>> >>>> po 17. 10. 2022 v 19:24 odes?latel Parsa Aminian >>>> napsal: >>>>> Hello >>>>> I use kolla ansible wallaby version . >>>>> my compute node has only one port . is it possible to use this server ? as I know openstack compute need 2 port one for management and other for external user network . Im using provider_networks and it seems neutron_external_interface could not be the same as network_interface because openvswitch need to create br-ex bridge on separate port >>>>> is there any solution that i can config my compute with 1 port ? 
>>> A very long time ago the OpenStack Infra Team ran the "Infracloud". This OpenStack installation ran on donated hardware and the instances there only had a single network port as well. To workaround this we ended up using vlan specific subinterfaces on the node so that logically we were presenting more than one interface to the OpenStack installation. >>> >>> I don't remember all the details but the now retired opendev/puppet-infracloud repo may have some clues: https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e >>> >>>> Attachments: >>>> * ovs_kolla >>>> * ovs_kolla_ansible From eblock at nde.ag Thu Oct 20 08:23:53 2022 From: eblock at nde.ag (Eugen Block) Date: Thu, 20 Oct 2022 08:23:53 +0000 Subject: [kolla] single Network interface In-Reply-To: References: <49f4b31f-d0c9-4b0a-be3c-70480f45f39e@app.fastmail.com> Message-ID: <20221020082353.Horde.6MWHgx32zrDcj67t4O2PAF6@webmail.nde.ag> Hi, we don't use kolla but our cloud runs with only one interface (actually two, it's a bond) on control and compute nodes. We use two different VLANs for openvswitch and the "management" network on that bond, and it works perfectly fine. I just wouldn't know how to handle that in kolla. > See for example the (sadly, defunct) SUSE OpenStack cloud > https://documentation.suse.com/soc/9/html/suse-openstack-cloud-crowbar-all/cha-deploy-poc.html#sec-depl-poc-vlans. Yeah, that made us sad, too. Zitat von Bernd Bausch : > SInce you can easily have five to ten different networks in a cloud > installation, e.g. networks dedicated to object storage, provider > networks for Octavia, a network just for iSCSI etc, VLANs are (or > used to be?) a common solution. See for example the (sadly, defunct) > SUSE OpenStack cloud > https://documentation.suse.com/soc/9/html/suse-openstack-cloud-crowbar-all/cha-deploy-poc.html#sec-depl-poc-vlans. > > On 2022/10/20 8:50 AM, Clark Boylan wrote: >> On Wed, Oct 19, 2022, at 4:44 PM, Michal Arbet wrote: >>> Hmm, >>> >>> But I think there is a problem with vlan - you need to setup it in OVS, >>> don't you ? >> There was also a bridge and a veth pair involved: >> https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e/manifests/veth.pp >> >> Possibly to deal with this? Like I said its been a long time and I >> don't remember the details. I just know it was possible to solve at >> least at the time. Linux gives you a whole suite of virtual network >> components that you can throw together to workaround physical >> limitations like this. >> >>> Michal Arbet >>> Openstack Engineer >>> >>> Ultimum Technologies a.s. >>> Na Po???? 1047/26, 11000 Praha 1 >>> Czech Republic >>> >>> +420 604 228 897 >>> michal.arbet at ultimum.io >>> _https://ultimum.io_ >>> >>> LinkedIn | >>> Twitter | Facebook >>> >>> >>> >>> st 19. 10. 2022 v 23:57 odes?latel Clark Boylan >>> napsal: >>>> On Wed, Oct 19, 2022, at 9:40 AM, Michal Arbet wrote: >>>>> Hi, >>>>> >>>>> If I am correct this is not possible currently, but I remember I was >>>>> working on a solution, but unfortunately I stopped at some point >>>>> because kolla upstream didn't want to maintain. >>>>> >>>>> In attachment you can find patches for kolla and kolla-ansible >>>>> and our idea. 
>>>>>
>>>>> We added python script to kolla container and provide netplan style
>>>>> configuration by kolla-ansible ..so openvswitch starts and configured
>>>>> networking as it was set in configuration (if i remember ...it is quite
>>>>> long time....and of course it was not final version ...but if i
>>>>> remember it somehow worked).
>>>>>
>>>>> So, you can check it and maybe we can discuss this feature again :)
>>>>>
>>>>> Thanks,
>>>>> Kevko
>>>>>
>>>>>
>>>>> Michal Arbet
>>>>> Openstack Engineer
>>>>>
>>>>> Ultimum Technologies a.s.
>>>>> Na Po???? 1047/26, 11000 Praha 1
>>>>> Czech Republic
>>>>>
>>>>> +420 604 228 897
>>>>> michal.arbet at ultimum.io
>>>>> _https://ultimum.io_
>>>>>
>>>>> LinkedIn |
>>>>> Twitter | Facebook
>>>>>
>>>>>
>>>>>
>>>>> po 17. 10. 2022 v 19:24 odes?latel Parsa Aminian
>>>>> napsal:
>>>>>> Hello
>>>>>> I use kolla ansible wallaby version .
>>>>>> my compute node has only one port . is it possible to use this
>>>>>> server ? as I know openstack compute need 2 port one for
>>>>>> management and other for external user network . Im using
>>>>>> provider_networks and it seems neutron_external_interface could
>>>>>> not be the same as network_interface because openvswitch need
>>>>>> to create br-ex bridge on separate port
>>>>>> is there any solution that i can config my compute with 1 port ?
>>>> A very long time ago the OpenStack Infra Team ran the
>>>> "Infracloud". This OpenStack installation ran on donated hardware
>>>> and the instances there only had a single network port as well.
>>>> To workaround this we ended up using vlan specific subinterfaces
>>>> on the node so that logically we were presenting more than one
>>>> interface to the OpenStack installation.
>>>>
>>>> I don't remember all the details but the now retired
>>>> opendev/puppet-infracloud repo may have some clues:
>>>> https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e
>>>>
>>>>> Attachments:
>>>>> * ovs_kolla
>>>>> * ovs_kolla_ansible

From smooney at redhat.com Thu Oct 20 08:35:38 2022
From: smooney at redhat.com (Sean Mooney)
Date: Thu, 20 Oct 2022 09:35:38 +0100
Subject: [kolla] single Network interface
In-Reply-To:
References: <49f4b31f-d0c9-4b0a-be3c-70480f45f39e@app.fastmail.com>
Message-ID:

I have not been following this too closely, and sorry to top post, but it is
possible to deploy multi-node openstack using a single interface. i often do
that with devstack and it should be possible to do with kolla.

first, if you do not need vlan/flat tenant networks and geneve/vxlan with
ml2/ovs or ml2/ovn is sufficient, then the tunnel endpoint ip can just be the
management interface.

when i'm deploying with devstack i just create a dummy interface and use that
for neutron, so you should be able to do the same for kolla-ansible: just have
a playbook that creates a dummy interface on all hosts and set that as the
neutron_external_interface. in kolla all other interfaces are shared by
default, so it is only the neutron_external_interface for the br-ex that needs
to be managed.

this approach requires you to assign the gateway ip for the external network
to one of the controllers and configure that host in your router.
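A minimal sketch of the dummy-interface approach described above, for
illustration only (the interface name and the globals.yml values are
assumptions, not a tested configuration; it presumes tunnelled geneve/vxlan
tenant networks as noted above):

    # on every node, before the openvswitch/neutron containers start
    # (e.g. from a small ansible task), create a dummy NIC for br-ex:
    ip link add ext0 type dummy
    ip link set ext0 up

    # in /etc/kolla/globals.yml keep the real NIC for everything else and
    # point the external interface at the dummy device (illustrative values):
    #   network_interface: "eth0"
    #   neutron_external_interface: "ext0"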
the better approach, which allows provider networks to work and avoids the
need to assign the gateway ip in a hacky way, is to use macvlan interfaces.
i don't think i have an example of this from my home cloud any more since i
have redeployed it, but i previously used to create macvlan sub-interfaces to
do this by hand. you would do something like this:

sudo ip link add api link eth0 type macvlan mode bridge
sudo ip link add ovs link eth0 type macvlan mode bridge
sudo ip link add storage link eth0 type macvlan mode bridge
sudo ifconfig api up
sudo ifconfig ovs up
sudo ifconfig storage up

you can wrap that up into a systemd service file and have it run before the
docker service. if you're on ubuntu, netplan does not support macvlans
currently, but you can do it the traditional way or with systemd-networkd.

macvlan allows a single physical interface to have multiple mac and ip
addresses. you can also do the same with a linux bridge, but that is less than
ideal in terms of performance.

if your nic supports sriov, another good way to partition the nic is to use a
VF. in that case you just put in a trivial udev rule to allocate them, or use
netplan: https://netplan.io/examples (it's the final example). macvlan works if
you don't have hardware support for sriov, and sriov is a good option
otherwise.

On Thu, 2022-10-20 at 11:06 +0900, Bernd Bausch wrote:
> SInce you can easily have five to ten different networks in a cloud
> installation, e.g. networks dedicated to object storage, provider
> networks for Octavia, a network just for iSCSI etc, VLANs are (or used
> to be?) a common solution. See for example the (sadly, defunct) SUSE
> OpenStack cloud
> https://documentation.suse.com/soc/9/html/suse-openstack-cloud-crowbar-all/cha-deploy-poc.html#sec-depl-poc-vlans.
>
> On 2022/10/20 8:50 AM, Clark Boylan wrote:
> > On Wed, Oct 19, 2022, at 4:44 PM, Michal Arbet wrote:
> > > Hmm,
> > >
> > > But I think there is a problem with vlan - you need to setup it in OVS,
> > > don't you ?
> > There was also a bridge and a veth pair involved: https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e/manifests/veth.pp
> >
> > Possibly to deal with this? Like I said its been a long time and I don't remember the details. I just know it was possible to solve at least at the time. Linux gives you a whole suite of virtual network components that you can throw together to workaround physical limitations like this.
> >
> > > Michal Arbet
> > > Openstack Engineer
> > >
> > > Ultimum Technologies a.s.
> > > Na Po???? 1047/26, 11000 Praha 1
> > > Czech Republic
> > >
> > > +420 604 228 897
> > > michal.arbet at ultimum.io
> > > _https://ultimum.io_
> > >
> > > LinkedIn |
> > > Twitter | Facebook
> > >
> > >
> > >
> > > st 19. 10. 2022 v 23:57 odes?latel Clark Boylan napsal:
> > > > On Wed, Oct 19, 2022, at 9:40 AM, Michal Arbet wrote:
> > > > > Hi,
> > > > >
> > > > > If I am correct this is not possible currently, but I remember I was
> > > > > working on a solution, but unfortunately I stopped at some point
> > > > > because kolla upstream didn't want to maintain.
> > > > >
> > > > > In attachment you can find patches for kolla and kolla-ansible and our idea.
> > > > > > > > > > We added python script to kolla container and provide netplan style > > > > > configuration by kolla-ansible ..so openvswitch starts and configured > > > > > networking as it was set in configuration (if i remember ...it is quite > > > > > long time....and of course it was not final version ...but if i > > > > > remember it somehow worked). > > > > > > > > > > So, you can check it and maybe we can discuss this feature again :) > > > > > > > > > > Thanks, > > > > > Kevko > > > > > > > > > > > > > > > Michal Arbet > > > > > Openstack Engineer > > > > > > > > > > Ultimum Technologies a.s. > > > > > Na Po???? 1047/26, 11000 Praha 1 > > > > > Czech Republic > > > > > > > > > > +420 604 228 897 > > > > > michal.arbet at ultimum.io > > > > > _https://ultimum.io_ > > > > > > > > > > LinkedIn | > > > > > Twitter | Facebook > > > > > > > > > > > > > > > > > > > > po 17. 10. 2022 v 19:24 odes?latel Parsa Aminian > > > > > napsal: > > > > > > Hello > > > > > > I use kolla ansible wallaby version . > > > > > > my compute node has only one port . is it possible to use this server ? as I know openstack compute need 2 port one for management and other for external user network . Im using provider_networks and it seems neutron_external_interface could not be the same as network_interface because openvswitch need to create br-ex bridge on separate port > > > > > > is there any solution that i can config my compute with 1 port ? > > > > A very long time ago the OpenStack Infra Team ran the "Infracloud". This OpenStack installation ran on donated hardware and the instances there only had a single network port as well. To workaround this we ended up using vlan specific subinterfaces on the node so that logically we were presenting more than one interface to the OpenStack installation. > > > > > > > > I don't remember all the details but the now retired opendev/puppet-infracloud repo may have some clues: https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e > > > > > > > > > Attachments: > > > > > * ovs_kolla > > > > > * ovs_kolla_ansible > From ralonsoh at redhat.com Thu Oct 20 08:41:44 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 20 Oct 2022 10:41:44 +0200 Subject: [PTG][nova][neutron] Nova-Neutron cross-team meetings Message-ID: Hello all: The Nova-Neutron cross-team sessions today, starting at 13UTC, will be on Nova channel "bexar". In any case, I'll send an update in our IRC channel and I'll keep "mitaka" session open to redirect people to "bexar". Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhitman at groupw.com Thu Oct 20 14:09:46 2022 From: swhitman at groupw.com (Stuart Whitman) Date: Thu, 20 Oct 2022 14:09:46 +0000 Subject: [kolla-ansible] [yoga] [magnum] [k8s] cannot attach persistent volume to pod Message-ID: Hello, When I try to attach a persistent cinder volume to a pod, I get FailedMount and FailedAttachVolume timeout events. 
I also get these errors in the log of the csi-cinder-controllerplugin-0 pod: E1020 13:38:41.747511 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.VolumeAttachment: the server could not find the requested resource E1020 13:38:41.748187 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSINode: the server could not find the requested resource ?I fixed a CrashLoopBackoff error with the csi-snapshotter container in the csi-cinder-controllerplugin-0 pod by providing the label "csi_snapshotter_tag=v4.0.0" when I created the cluster template. I found that suggestion in an issue on the GitHub cloud-provider-openstack project. I'm not finding any help with this error on Google. Thanks, -Stu _____________________________________ The information contained in this e-mail and any attachments from Group W may contain confidential and/or proprietary information and is intended only for the named recipient to whom it was originally addressed. If you are not the intended recipient, be aware that any disclosure, distribution, or copying of this e-mail or its attachments is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately of that fact by return e-mail and permanently delete the e-mail and any attachments to it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Oct 20 14:13:13 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 20 Oct 2022 07:13:13 -0700 Subject: [all][tc] 2023.1 Antelope TC-PTG Planning In-Reply-To: References: <182d57d2a78.1127e41be140296.2872907929814651386@ghanshyammann.com> <183680c4a58.dc375de7355972.8647386454272610958@ghanshyammann.com> Message-ID: <183f5bdb227.d9102c14270372.5253932786583769927@ghanshyammann.com> Hello Everyone, In case anyone has not noticed, I would like to highlight one thing for TC slots of today and tomorrow. TC will be meeting from 15:00 to 19:00 UTC whereas 17:00 to 19:00 UTC are out of PTG scheduled hours (not shown on ptg bot page also). 17:00 UTC does not mean we are done, you can join us for another 2 hours. * Thursday 15:00 - 19:00 UTC (4 hours) * Friday 15:00 - 19:00 UTC (4 hours) -gmann ---- On Thu, 29 Sep 2022 10:51:20 -0700 Jay Faulkner wrote --- > It was indicated to me that some GMail users were having trouble importing this ICS file. Please use the following procedure for best results: > Open Google Calendar, click the gear and choose Settings, then select Import & Export from the left side, and import the ICS file attached on the previous email on this thread. > When properly imported; you should see all three meetings imported from that single ICS file. Please reach out to me over email or at JayF in #openstack-tc if there are any further issues. > -Jay Faulkner > > > On Thu, Sep 29, 2022 at 9:47 AM Jay Faulkner jay at gr-oss.io> wrote: > Hey all, > In order to make this easier on folks (and to let the computers do timezone calculations!) I've created an ICS file that you can import into your calendaring app of choice to get these TC sessions added to your calendar. > See you there,Jay Faulkner > > On Thu, Sep 22, 2022 at 7:08 PM Ghanshyam Mann gmann at ghanshyammann.com> wrote: > Updates: > > TC decided to meet at the below slots: > > * Monday 15:00 - 17:00 UTC (2 hours) for TC+leaders interaction discussion. 
> * Thursday 15:00 - 19:00 UTC (4 hours) > * Friday 15:00 - 19:00 UTC (4 hours) > > PLEASE NOTE: To minimize the conflict with the project sessions, the last 2 hours on Thursday and Friday are booked out of the PTG schedule. > > Details are there in the below etherpad, please start adding the topic you would like to discuss: > > - https://etherpad.opendev.org/p/tc-2023-1-ptg > > > -gmann > > ?---- On Thu, 25 Aug 2022 07:52:06 -0700? Ghanshyam Mann? wrote --- > ?> Hello Everyone, > ?> > ?> As you already know that the 2023.1 cycle virtual PTG will be held between Oct 17th - 21[1]. > ?> > ?> I have started the preparation for the Technical Committee PTG sessions. Please do the following: > ?> > ?> 1. Fill the below poll as per your availability. > ?> > ?> - https://framadate.org/yi8LNQaph5wrirks > ?> > ?> 2. Add the topics you would like to discuss to the below etherpad. > ?> > ?> - https://etherpad.opendev.org/p/tc-2023-1-ptg > ?> > ?> NOTE: this is not limited to TC members only; I would like all community members to > ?> fill the doodle poll and, add the topics you would like or want TC members to discuss in PTG. > ?> > ?> [1] https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030041.html > ?> > ?> -gmann > ?> > ?> > > From lucioseki at gmail.com Thu Oct 20 14:21:25 2022 From: lucioseki at gmail.com (Lucio Seki) Date: Thu, 20 Oct 2022 11:21:25 -0300 Subject: [glance] Slow image download when using glanceclient In-Reply-To: <141F3517-5364-4B69-889D-2EB1077DD9D6@gmail.com> References: <0bf73382755a1b3cf776932bbcc658ea250c23fa.camel@redhat.com> <56b7476c090c1223dc91c89eb6acdbd7ff1e307a.camel@redhat.com> <141F3517-5364-4B69-889D-2EB1077DD9D6@gmail.com> Message-ID: Thanks Artem, Indeed, using the `output` parameter increased the download speed from 120KB/s to >120MB/s (the max network performance I have). That's great! I'll look into the method definition and see what's the secret. Regards, Lucio On Fri, Oct 14, 2022, 12:07 Artem Goncharov wrote: > ``` > import openstack > > conn = openstack.connect() > > conn.image.download_image(image_name, stream=True, output="data.iso?) > ``` > > This gives me max performance of the network. Actually using stream=True > may be slower (around 40%), but may be crucially necessary when dealing > with huge images. Additionally you can specify chunk_size as param to > download_image function, what aligns performance of stream vs non stream > (for me stream=True and chunk_size=8192 resulted 2.3G image to be > downloaded in 14 sec) > > > On 13. Oct 2022, at 23:24, Lucio Seki wrote: > > Yes, I'm using tqdm to monitor the progress and speed. > I removed it, and it improved slightly (120kB/s -> 131kB/s) but not > significantly :-/ > > On Thu, Oct 13, 2022, 16:54 Sean Mooney wrote: > >> On Thu, 2022-10-13 at 16:21 -0300, Lucio Seki wrote: >> > Thanks Sean, that makes much easier to code! >> > >> > ``` >> > ... >> > conn = openstack.connect(cloud_name) >> > >> > with open(path, 'wb') as image_file: >> > response = conn.image.download_image(image_name) >> > for chunk in tqdm(response.iter_content(), **tqdm_params): >> > image_file.write(chunk) >> > ``` >> > >> > And it gave me some performance improvement (3kB/s -> 120kB/s). >> > ... though it would still take several days to download an image. >> > >> > Is there some tuning that I could apply? 
>> this is what nova does >> https://github.com/openstack/nova/blob/master/nova/image/glance.py#L344 >> >> we get the image chunks by calling the data method on the glance client >> >> https://github.com/openstack/nova/blob/03d2715ed492350fa11908aea0fdd0265993e284/nova/image/glance.py#L373-L377 >> then bwe basiclly just loop over the chunks and write them to a file like >> you are >> >> https://github.com/openstack/nova/blob/03d2715ed492350fa11908aea0fdd0265993e284/nova/image/glance.py#L413-L437 >> we have some extra code for doing image verification but its basically >> the same as what you are doing >> we use eventlets to monkeypatch python io which can imporve performce but >> i woudl not expect it to be that dramatic >> and i dont think the glance clinet or opesntack client use eventlet so >> its sound liek something else is limiting the transfer speed. >> >> this is the glance client method we are invokeing >> >> https://github.com/openstack/python-glanceclient/blob/56186d6d5aa1a0c8fde99eeb535a650b0495925d/glanceclient/v2/images.py#L201-L271 >> >> >> im not sure what tqdm is by the way is it meusrign the transfer speed of >> something linke that? >> does the speed increase if you remvoe that? >> i.ie can you test this via a simple time script and see how much >> downloads say in up to 60 seconds by lookign at the file size? >> >> assuming its https://github.com/tqdm/tqdm perhaps the addtional io that >> woudl be doing to standard out is slowign it down? >> >> >> >> >> > >> > On Thu, Oct 13, 2022, 14:18 Sean Mooney wrote: >> > >> > > On Thu, 2022-10-13 at 13:30 -0300, Lucio Seki wrote: >> > > > Hi glance experts, >> > > > >> > > > I'm using the following code to download a glance image: >> > > > >> > > > ``` >> > > > from glanceapi import client >> > > > ... >> > > > glance = client.Client(GLANCE_API_VERSION, session=sess) >> > > > ... >> > > > with open(path, 'wb') as image_file: >> > > > data = glance.images.data(image_id) >> > > > for chunk in tqdm(data, unit='B', unit_scale=True, >> > > unit_divisor=1024): >> > > > image_file.write(chunk) >> > > > ``` >> > > > >> > > > And I get a speed around 3kB/s. It would take months to download an >> > > image. >> > > > I'm using python3-glanceclient==3.6.0. >> > > > I even tried: >> > > > ``` >> > > > for chunk in tqdm(data, unit='B', unit_scale=True, >> > > unit_divisor=1024): >> > > > pass >> > > > ``` >> > > > to see if the bottleneck was the disk I/O, but didn't get any >> faster. >> > > > >> > > > In the same environment, when I use the glance CLI instead: >> > > > >> > > > ``` >> > > > glance image-download --file $path $image_id >> > > > ``` >> > > > I get hundreds of MB/s download speed, and it finishes in a few >> minutes. >> > > > >> > > > Is there anything I can do to improve the glanceclient performance? >> > > > I'm considering using subprocess.Popen(['glance', 'image-download', >> ...]) >> > > > if nothing helps... >> > > have you considered using the openstacksdk instead >> > > >> > > the glanceclint is really only intendeted for other openstack service >> to >> > > use like >> > > nova or ironic. >> > > its not really ment to be used to write your onw code anymore. >> > > in the past it provided a programatic interface for interacting with >> glance >> > > but now you shoudl prefer the openstack sdk instead. >> > > https://github.com/openstack/openstacksdk >> > > >> > > > >> > > > Regards, >> > > > Lucio >> > > >> > > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From Albert.Shih at obspm.fr Thu Oct 20 20:51:54 2022
From: Albert.Shih at obspm.fr (Albert Shih)
Date: Thu, 20 Oct 2022 22:51:54 +0200
Subject: [victoria] Loose network on all instance.
Message-ID:

Hi,

I've a small OpenStack installation running Victoria on Ubuntu 20 LTS.

After an update (not an upgrade) of Ubuntu, I lose all network on all my
instances.

On an instance console I got (the console keeps redrawing the same "start job"
line with an increasing timer):

[  OK  ] Started ifup for ens3.
         Starting Raise network interfaces...
[***   ] A start job is running for Raise network interfaces (1min 4s / 6min 2s)
         [... the same line redrawn with an increasing timer, up to 1min 14s ...]
......
......
[***   ] A start job is running for Raise network interfaces (2min 59s / 6min 2s)
         [... redrawn until about 3min 3s ...]
[  OK  ] Finished Raise network interfaces.
[  OK  ] Reached target Network.
         Starting Initial cloud-init job (metadata service crawler)...
[  186.709234] cloud-init[514]: Cloud-init v. 20.4.1 running 'init' at Thu, 20 Oct 2022 20:12:41 +0000. Up 186.69 seconds.
[  186.727021] cloud-init[514]: ci-info: ++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++
[  186.728928] cloud-init[514]: ci-info: +--------+------+------------------------------+-----------+-------+-------------------+
[  186.730709] cloud-init[514]: ci-info: | Device |  Up  |           Address            |    Mask   | Scope |     Hw-Address    |
[  186.732694] cloud-init[514]: ci-info: +--------+------+------------------------------+-----------+-------+-------------------+
[  186.734765] cloud-init[514]: ci-info: |  ens3  | True | fe80::f816:3eff:fec3:2e6c/64 |     .     |  link | fa:16:3e:c3:2e:6c |
[  186.736681] cloud-init[514]: ci-info: |   lo   | True |          127.0.0.1           | 255.0.0.0 |  host |         .         |
[  186.738473] cloud-init[514]: ci-info: |   lo   | True |           ::1/128            |     .     |  host |         .         |
[  186.740246] cloud-init[514]: ci-info: +--------+------+------------------------------+-----------+-------+-------------------+
[  186.741723] cloud-init[514]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[  186.742306] cloud-init[514]: ci-info: +-------+-------------+---------+-----------+-------+
[  186.742881] cloud-init[514]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[  186.743460] cloud-init[514]: ci-info: +-------+-------------+---------+-----------+-------+
[  186.744075] cloud-init[514]: ci-info: |   1   |  fe80::/64  |    ::   |    ens3   |   U   |
[  186.744681] cloud-init[514]: ci-info: |   3   |    local    |    ::   |    ens3   |   U   |
[  186.745255] cloud-init[514]: ci-info: |   4   |  multicast  |    ::   |    ens3   |   U   |
[  186.745833] cloud-init[514]: ci-info: +-------+-------------+---------+-----------+-------+

In the log of neutron-linuxbridge-agent.log it seems everything should work:

2022-10-20 20:17:54.967 59611 INFO neutron.plugins.ml2.drivers.agent._common_agent [req-9e6b23d3-1e0e-4c48-8f0e-30d761233188 - - - - -] Port tap4f68bcf1-4d updated.
Details: {'device': 'tap4f68bcf1-4d', 'network_id': '9223f9ff-7ab0-4268-9b7f-3b5966625c65', 'port_id': '4f68bcf1-4d32-4ad3-a0df-0415f7170c5d', 'mac_address': 'fa:16:3e:c3:2e:6c', 'admin_state_up': True, 'network_type': 'flat', 'segmentation_id': None, 'physical_network': 'provider', 'mtu': 1500, 'fixed_ips': [{'subnet_id': '12cdcc06-bbf0-48f2-b303-bdfab460c696', 'ip_address': '145.238.137.52'}], 'device_owner': 'compute:nova', 'allowed_address_pairs': [], 'port_security_enabled': True, 'qos_policy_id': None, 'network_qos_policy_id': None, 'profile': {}, 'propagate_uplink_status': False} But I'm unable to ping any instance. Any idea ? Regards -- Albert SHIH ? Observatoire de Paris France Heure locale/Local time: jeu. 20 oct. 2022 22:47:15 CEST From rosmaita.fossdev at gmail.com Thu Oct 20 21:32:49 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 20 Oct 2022 17:32:49 -0400 Subject: [i18n] questions for the user survey Message-ID: Hello people interested in internationalization, At the PTG this week, the i18n SIG drafted some questions for the 2023 OpenStack User Survey. Please look them over and leave any comments on the etherpad: https://etherpad.opendev.org/p/oct2022-ptg-openstack-i18n-user-survey-questions We need to get these to the Foundation quickly, so please leave your comments before 1200 UTC on Monday 24 October. From johnsomor at gmail.com Thu Oct 20 21:46:50 2022 From: johnsomor at gmail.com (Michael Johnson) Date: Thu, 20 Oct 2022 14:46:50 -0700 Subject: [designate] [release] Proposing to EOL Designate stable branches Queens, Rocky, Stein, and Train Message-ID: Today at the Designate PTG session we discussed the number of stable branches (10) we are carrying for Designate and the lack of interest in maintaining them. For example, there has not been a patch merged on the Queens branch for over two years. Based on the discussion, and lack of patches merged to the older branches, I am proposing we move the Queens, Rocky, Stein, and Train stable branches from extended maintenance to end-of-life status. The definition of this process is part of the project team guide[1]. I plan to propose patches for the required changes next week. If you would like to maintain one of these branches, please respond to this email and post a patch to the branch to resolve any existing issues on the older branches. Michael [1] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life From berndbausch at gmail.com Fri Oct 21 01:19:48 2022 From: berndbausch at gmail.com (Bernd Bausch) Date: Fri, 21 Oct 2022 10:19:48 +0900 Subject: [victoria] Loose network on all instance. In-Reply-To: References: Message-ID: <6028846f-d781-e9dd-77d0-888d1825db52@gmail.com> Is the cloud running on a single server? How did you deploy it? Your description indicates that instances are unable to reach their DHCP servers. They don't get an IP address and consequently can't be ping'ed. Start with openstack network agent list to see if there is an obvious problem with the DHCP agent(s). Also check if /dnsmasq/ processes (which normally implement the DHCP service) are running; there should be one per Neutron network. Also check the relevant log files and the /dnsmasq /configuration and lease files. If there is no problem at that level, I would guess network connectivity between instances and DHCP servers is broken. That is harder to troubleshoot, as there are many moving parts and many implementation options. 
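For illustration, the first checks mentioned above could look something like this. This is only a rough sketch for a DHCP-agent based setup; the network ID and MAC address are taken from the neutron log earlier in this thread, and the paths assume the default agent state directory:

# is the DHCP agent alive and up?
openstack network agent list --agent-type dhcp

# there should be one dnsmasq process per network that has DHCP enabled
pgrep -af dnsmasq

# look inside the DHCP namespace and at the lease file for the affected network
sudo ip netns exec qdhcp-9223f9ff-7ab0-4268-9b7f-3b5966625c65 ip addr
sudo cat /var/lib/neutron/dhcp/9223f9ff-7ab0-4268-9b7f-3b5966625c65/leases

If the agent is down, dnsmasq is not running, or the lease file never gets an entry for fa:16:3e:c3:2e:6c, the problem is on the DHCP side rather than inside the instance.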
On 2022/10/21 5:51 AM, Albert Shih wrote:
> Hi,
>
> I've a small OpenStack installation running Victoria on Ubuntu 20 LTS.
>
> After an update (not an upgrade) of Ubuntu, I lose all network on all my instances.
>
> On an instance console I got:
>
> [ OK ] Started ifup for ens3.
>        Starting Raise network interfaces...
> [ *** ] A start job is running for Raise network interfaces (1min 4s / 6min 2s)
> ...
> [ *** ] A start job is running for Raise network interfaces (3min 3s / 6min 2s)
> [ OK ] Finished Raise network interfaces.
> [ OK ] Reached target Network.
>        Starting Initial cloud-init (metadata service crawler)...
> [ 186.709234] cloud-init[514]: Cloud-init v. 20.4.1 running 'init' at Thu, 20 Oct 2022 20:12:41 +0000. Up 186.69 seconds.
> ...
>
> In the log of neutron-linuxbridge-agent.log it seems everything should work:
>
> 2022-10-20 20:17:54.967 59611 INFO neutron.plugins.ml2.drivers.agent._common_agent [req-9e6b23d3-1e0e-4c48-8f0e-30d761233188 - - - - -] Port tap4f68bcf1-4d updated.
Details: {'device': 'tap4f68bcf1-4d', 'network_id': '9223f9ff-7ab0-4268-9b7f-3b5966625c65', 'port_id': '4f68bcf1-4d32-4ad3-a0df-0415f7170c5d', 'mac_address': 'fa:16:3e:c3:2e:6c', 'admin_state_up': True, 'network_type': 'flat', 'segmentation_id': None, 'physical_network': 'provider', 'mtu': 1500, 'fixed_ips': [{'subnet_id': '12cdcc06-bbf0-48f2-b303-bdfab460c696', 'ip_address': '145.238.137.52'}], 'device_owner': 'compute:nova', 'allowed_address_pairs': [], 'port_security_enabled': True, 'qos_policy_id': None, 'network_qos_policy_id': None, 'profile': {}, 'propagate_uplink_status': False} > > But I'm unable to ping any instance. > > Any idea ? > > Regards > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Fri Oct 21 08:11:11 2022 From: zigo at debian.org (Thomas Goirand) Date: Fri, 21 Oct 2022 10:11:11 +0200 Subject: [telemetry][cloudkitty][ceilometer] Billing windows instances Message-ID: <07d23aac-fee9-cd11-bf25-5030dba2cf6c@debian.org> Hi there! We're using telemetry+cloudkitty for our rating. We're happy of it, though we're having the issue that we would like to bill instances running Windows. The example given in the Cloudkitty doc shows how to bill more when an instance runs Windows. That works in theory, though in practice, Microsoft has a license model based on how many vCPU the instannce runs. So the model that the Cloudkitty documentation shows simply doesn't work with the SPLA thingy. We've looked at options. One way would of course writing a new custom pollster, but we don't like the idea: this would mean polling for all of the thousands of instances that are running in our deployments. So we would very much prefer having ceilometer running on compute node (with polling_namespaces=compute) to do the work, as this scales a way better. However, Ceilometer polls libvirt, which only has the information about the image ID, not the metadata associated with the image (like the property os_type=windows, for example). So, is there a better way than a dynamic pollster? Can this be done with ceilometer on the compute nodes? Cheers, Thomas Goirand (zigo) From satish.txt at gmail.com Fri Oct 21 09:49:31 2022 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 21 Oct 2022 05:49:31 -0400 Subject: [kolla] single Network interface In-Reply-To: References: <49f4b31f-d0c9-4b0a-be3c-70480f45f39e@app.fastmail.com> Message-ID: Here is the doc to deploy kolla using a single network interface port. https://www.keepcalmandrouteon.com/post/kolla-os-part-1/ On Thu, Oct 20, 2022 at 4:46 AM Sean Mooney wrote: > I have not been following this too cloesly and sorry to top post but its > possibel to deploy multi node openstack using a singel interface. > i often do that with devstack and it shoudl be possibel to do with kolla. > > first if you do not need vlan/flat tenant networks and and geneve/vxlan > with ml2/ovs or ml2/ovn is sufficent then the tunell endpoint ip can just be > the manamgnet interface. when im deploying wiht devstack i just create a > dumy interfaces and use that for neutorn > so you shoudl be able to do that for kolla-ansible too just have a > playbook that will create a dumy interface on all host and set that as the > neutron_interface. > > in kolla all other interface are shared by defautl so its only the > nuetorn_interface for the br-ex that need to be managed. > this approch reqired yuo to asign the gateway ip for the external network > to one of the contolers and configre that host in your router. 
> > the better approch whihc allows provider networks to work and avoids the > need to asisng the gateway ip in a hacky way is use macvlan interfaces > i dont thinki have an example of this form my home cloud any more since i > have redpeloyed it but i previoulsy used to create macvlan sub interfaces > > to do this by hand you would do somehting like this > > sudo ip link add api link eth0 type macvlan mode bridge > sudo ip link add ovs link eth0 type macvlan mode bridge > sudo ip link add storage link eth0 type macvlan mode bridge > sudo ifconfig api up > sudo ifconfig ovs up > sudo ifconfig storage up > > > you can wrap that up into a systemd service file and have it run before > the docker service. > if your on ubuntu netplan does not support macvlans currently but you can > do it the tradtional way or wiht systemd networkd > > Macvlan allows a single physical interface to have multiple mac and ip > addresses. > you can also do the same with a linux bridge but that is less then ideal > in terms of performance. > if your nic support sriov another good way to partion then nice is to use > a VF > > in this case you just put a trivial udev rule to allocate them or use > netplan > https://netplan.io/examples its the final example. > > > macvlan works if you dont have hardware supprot for sriov and sriov is a > good option otherwise > > On Thu, 2022-10-20 at 11:06 +0900, Bernd Bausch wrote: > > SInce you can easily have five to ten different networks in a cloud > > installation, e.g. networks dedicated to object storage, provider > > networks for Octavia, a network just for iSCSI etc, VLANs are (or used > > to be?) a common solution. See for example the (sadly, defunct) SUSE > > OpenStack cloud > > > https://documentation.suse.com/soc/9/html/suse-openstack-cloud-crowbar-all/cha-deploy-poc.html#sec-depl-poc-vlans > . > > > > On 2022/10/20 8:50 AM, Clark Boylan wrote: > > > On Wed, Oct 19, 2022, at 4:44 PM, Michal Arbet wrote: > > > > Hmm, > > > > > > > > But I think there is a problem with vlan - you need to setup it in > OVS, > > > > don't you ? > > > There was also a bridge and a veth pair involved: > https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e/manifests/veth.pp > > > > > > Possibly to deal with this? Like I said its been a long time and I > don't remember the details. I just know it was possible to solve at least > at the time. Linux gives you a whole suite of virtual network components > that you can throw together to workaround physical limitations like this. > > > > > > > Michal Arbet > > > > Openstack Engineer > > > > > > > > Ultimum Technologies a.s. > > > > Na Po???? 1047/26, 11000 Praha 1 > > > > Czech Republic > > > > > > > > +420 604 228 897 > > > > michal.arbet at ultimum.io > > > > _https://ultimum.io_ > > > > > > > > LinkedIn | > > > > Twitter | Facebook > > > > > > > > > > > > > > > > st 19. 10. 2022 v 23:57 odes?latel Clark Boylan < > cboylan at sapwetik.org> napsal: > > > > > On Wed, Oct 19, 2022, at 9:40 AM, Michal Arbet wrote: > > > > > > Hi, > > > > > > > > > > > > If I am correct this is not possible currently, but I remember I > was > > > > > > working on a solution, but unfortunately I stopped at some point > > > > > > because kolla upstream didn't want to maintain. > > > > > > > > > > > > In attachment you can find patches for kolla and kolla-ansible > and our idea. 
> > > > > > > > > > > > We added python script to kolla container and provide netplan > style > > > > > > configuration by kolla-ansible ..so openvswitch starts and > configured > > > > > > networking as it was set in configuration (if i remember ...it > is quite > > > > > > long time....and of course it was not final version ...but if i > > > > > > remember it somehow worked). > > > > > > > > > > > > So, you can check it and maybe we can discuss this feature again > :) > > > > > > > > > > > > Thanks, > > > > > > Kevko > > > > > > > > > > > > > > > > > > Michal Arbet > > > > > > Openstack Engineer > > > > > > > > > > > > Ultimum Technologies a.s. > > > > > > Na Po???? 1047/26, 11000 Praha 1 > > > > > > Czech Republic > > > > > > > > > > > > +420 604 228 897 > > > > > > michal.arbet at ultimum.io > > > > > > _https://ultimum.io_ > > > > > > > > > > > > LinkedIn > | > > > > > > Twitter | Facebook > > > > > > > > > > > > > > > > > > > > > > > > po 17. 10. 2022 v 19:24 odes?latel Parsa Aminian > > > > > > napsal: > > > > > > > Hello > > > > > > > I use kolla ansible wallaby version . > > > > > > > my compute node has only one port . is it possible to use this > server ? as I know openstack compute need 2 port one for management and > other for external user network . Im using provider_networks and it seems > neutron_external_interface could not be the same as network_interface > because openvswitch need to create br-ex bridge on separate port > > > > > > > is there any solution that i can config my compute with 1 port > ? > > > > > A very long time ago the OpenStack Infra Team ran the > "Infracloud". This OpenStack installation ran on donated hardware and the > instances there only had a single network port as well. To workaround this > we ended up using vlan specific subinterfaces on the node so that logically > we were presenting more than one interface to the OpenStack installation. > > > > > > > > > > I don't remember all the details but the now retired > opendev/puppet-infracloud repo may have some clues: > https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e > > > > > > > > > > > Attachments: > > > > > > * ovs_kolla > > > > > > * ovs_kolla_ansible > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephenfin at redhat.com Fri Oct 21 10:42:20 2022 From: stephenfin at redhat.com (Stephen Finucane) Date: Fri, 21 Oct 2022 11:42:20 +0100 Subject: [PTL][TC] library *feature* freeze at Milestone-2 In-Reply-To: References: Message-ID: <553fd064379209d0ceae91eb747c8d0656996321.camel@redhat.com> On Wed, 2022-10-19 at 17:10 +0000, El?d Ill?s wrote: > Hi, > > During 'TC + Community leaders interaction' [1] a case was discussed, where a > late library release caused last minute fire fighting in Zed cycle, and people > discussed the possibility to introduce a (non-client) library *feature* freeze > at Milestone-2 to avoid similar issues in the future. > > I've started to propose the possible schedule change [2] (note: it's not ready > yet as it does not emphasize that at Milestone-2 we mean *feature* freeze for > libraries, not "final library release"). The patch already got some reviews > from library maintainers so I'm calling the attention to this change here on > the ML. Repeating what I said on the reviews, I'd really rather not do this. There are a couple of reasons for this. Firstly, regarding the proposal itself, this is going to make my life as an oslo maintainer harder than it already is. 
This is a crucial point. I'm not aware of anyone whose official job responsibilities extend to oslo and it's very much a case of doing it because no one else is doing it. We're a tiny team and pretty overwhelmed with multiple other non-oslo $things and for me at least this means I tend to do oslo work (including reviews) in spurts. Introducing a rather large window (6 weeks per cycle, which is approximately 1/4 of the total available time in a cycle) during which we can't merge the larger, harder to review feature patches is simply too long: whatever context I would have built up before the freeze would be long-since gone after a month and a half. Secondly, regarding the issue that led to this proposal, I don't think this proposal would have actually helped. The patch that this proposal stems from was actually merged back on July 20th [1]. This was technically after Zed M2 but barely (5 days [2]). However, reports of issues didn't appear until September, when this was released as oslo.db 12.1.0 [3][4]. If we had released 12.1.0 in late July or early August, the issue would have been spotted far earlier, but as noted above the oslo team is tiny and overwhelmed, and I would guess the release team is in a similar boat (and can't be expected to know about all these things). I also feel compelled to note that this didn't arrive out of the blue. I have been shouting about SQLAlchemy 2.0 for over a year now [5] and I have also been quite vocal about other oslo.db-related changes on their way [6][7]. For the SQLAlchemy 2.0 case specifically, clearly not enough people have been listening. I sympathise (again, tiny, overwhelmed teams are not an oslo-specific phenomenon) but the pain was going to arrive eventually and it's just unfortunate that it landed with an oslo.db release that was cut so close to the deadline (see above). I manged to get nova, cinder and placement prepared well ahead of time but it isn't sustainable for one person to do this for all projects. Project teams need to prioritise this stuff ahead of time rather than waiting until things are on fire. Finally, it's worth remembering that this isn't a regular occurence. Yes, there was some pain, but we handled the issue pretty well (IMO) and affected projects are now hopefully aware of the ticking tech debt bomb ? sitting in their codebase. However, as far as I can tell, there's no trend of the oslo team (or any other library project) introducing breaking changes like this so close to release deadlines, so it does feel a bit like putting the cart before the horse. To repeat myself from the top, I'd really rather not do this. If we wanted to start cutting oslo releases faster, by all means let's figure out how to do that. If we wanted to branch earlier and keep master moving, I'm onboard. Preventing us from merging features for a combined ~3 months of the year is a non-starter IMO though. 
Cheers, Stephen [1] https://review.opendev.org/c/openstack/oslo.db/+/804775 [2] https://releases.openstack.org/zed/schedule.html [3] https://review.opendev.org/c/openstack/releases/+/853975/ [4] https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030317.html [5] https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html [6] https://lists.openstack.org/pipermail/openstack-discuss/2022-April/028197.html [7] https://lists.openstack.org/pipermail/openstack-discuss/2022-April/028198.html > > Thanks everyone for the responses in advance, > > El?d > > [1] > https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030718.html > [2] https://review.opendev.org/c/openstack/releases/+/861900 From stephenfin at redhat.com Fri Oct 21 10:48:41 2022 From: stephenfin at redhat.com (Stephen Finucane) Date: Fri, 21 Oct 2022 11:48:41 +0100 Subject: [PTL][TC] library *feature* freeze at Milestone-2 In-Reply-To: <553fd064379209d0ceae91eb747c8d0656996321.camel@redhat.com> References: <553fd064379209d0ceae91eb747c8d0656996321.camel@redhat.com> Message-ID: On Fri, 2022-10-21 at 11:42 +0100, Stephen Finucane wrote: > On Wed, 2022-10-19 at 17:10 +0000, El?d Ill?s wrote: > > Hi, > > > > During 'TC + Community leaders interaction' [1] a case was discussed, where a > > late library release caused last minute fire fighting in Zed cycle, and people > > discussed the possibility to introduce a (non-client) library *feature* freeze > > at Milestone-2 to avoid similar issues in the future. > > > > I've started to propose the possible schedule change [2] (note: it's not ready > > yet as it does not emphasize that at Milestone-2 we mean *feature* freeze for > > libraries, not "final library release"). The patch already got some reviews > > from library maintainers so I'm calling the attention to this change here on > > the ML. > > Repeating what I said on the reviews, I'd really rather not do this. There are a > couple of reasons for this. Firstly, regarding the proposal itself, this is > going to make my life as an oslo maintainer harder than it already is. This is a > crucial point. I'm not aware of anyone whose official job responsibilities > extend to oslo and it's very much a case of doing it because no one else is > doing it. We're a tiny team and pretty overwhelmed with multiple other non-oslo > $things and for me at least this means I tend to do oslo work (including > reviews) in spurts. Introducing a rather large window (6 weeks per cycle, which > is approximately 1/4 of the total available time in a cycle) during which we > can't merge the larger, harder to review feature patches is simply too long: > whatever context I would have built up before the freeze would be long-since > gone after a month and a half. > > Secondly, regarding the issue that led to this proposal, I don't think this > proposal would have actually helped. The patch that this proposal stems from was > actually merged back on July 20th [1]. This was technically after Zed M2 but > barely (5 days [2]). However, reports of issues didn't appear until September, > when this was released as oslo.db 12.1.0 [3][4]. If we had released 12.1.0 in > late July or early August, the issue would have been spotted far earlier, but as > noted above the oslo team is tiny and overwhelmed, and I would guess the release > team is in a similar boat (and can't be expected to know about all these > things). > > I also feel compelled to note that this didn't arrive out of the blue. 
I have > been shouting about SQLAlchemy 2.0 for over a year now [5] and I have also been > quite vocal about other oslo.db-related changes on their way [6][7]. For the > SQLAlchemy 2.0 case specifically, clearly not enough people have been listening. > I sympathise (again, tiny, overwhelmed teams are not an oslo-specific > phenomenon) but the pain was going to arrive eventually and it's just > unfortunate that it landed with an oslo.db release that was cut so close to the > deadline (see above). I manged to get nova, cinder and placement prepared well > ahead of time but it isn't sustainable for one person to do this for all > projects. Project teams need to prioritise this stuff ahead of time rather than > waiting until things are on fire. > > Finally, it's worth remembering that this isn't a regular occurence. Yes, there > was some pain, but we handled the issue pretty well (IMO) and affected projects > are now hopefully aware of the ticking tech debt bomb ? sitting in their > codebase. However, as far as I can tell, there's no trend of the oslo team (or > any other library project) introducing breaking changes like this so close to > release deadlines, so it does feel a bit like putting the cart before the horse. Oh, and one final point here: I didn't actually _know_ this was going to cause as many issue as it did. Perhaps there's value in an oslo-tips job that tests service projects against the HEAD of the various oslo libraries. However, that's a whole load of extra CI resources that we'd have to find resources for. Testing in oslo.db itself didn't and wouldn't catch this because all the affected projects were all projects that were not deployed by default in 'tempest-full- py3' job. Stephen > > To repeat myself from the top, I'd really rather not do this. If we wanted to > start cutting oslo releases faster, by all means let's figure out how to do > that. If we wanted to branch earlier and keep master moving, I'm onboard. > Preventing us from merging features for a combined ~3 months of the year is a > non-starter IMO though. 
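To make the "oslo-tips" idea concrete: such a job would essentially just have to install the oslo libraries from their master branches before running a project's tests. A rough, purely illustrative sketch (the library list and the test runner here are assumptions, not an existing job definition):

# install the project's normal requirements first
pip install -r requirements.txt -r test-requirements.txt
# then override the oslo libraries with their current master branches
pip install --upgrade git+https://opendev.org/openstack/oslo.db git+https://opendev.org/openstack/oslo.messaging
# run the project's unit tests against the unreleased libraries
stestr run

Anything that breaks here would surface before a release is cut, rather than after.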
> > Cheers, > Stephen > > > [1] https://review.opendev.org/c/openstack/oslo.db/+/804775 > [2] https://releases.openstack.org/zed/schedule.html > [3] https://review.opendev.org/c/openstack/releases/+/853975/ > [4] https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030317.html > [5] https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html > [6] https://lists.openstack.org/pipermail/openstack-discuss/2022-April/028197.html > [7] https://lists.openstack.org/pipermail/openstack-discuss/2022-April/028198.html > > > > > Thanks everyone for the responses in advance, > > > > El?d > > > > [1] > > https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030718.html > > [2] https://review.opendev.org/c/openstack/releases/+/861900 > From thierry at openstack.org Fri Oct 21 10:57:32 2022 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 21 Oct 2022 12:57:32 +0200 Subject: [PTL][TC] library *feature* freeze at Milestone-2 In-Reply-To: <553fd064379209d0ceae91eb747c8d0656996321.camel@redhat.com> References: <553fd064379209d0ceae91eb747c8d0656996321.camel@redhat.com> Message-ID: <956a91a1-b77e-60bc-2ab4-877b0139d03b@openstack.org> Stephen Finucane wrote: > On Wed, 2022-10-19 at 17:10 +0000, El?d Ill?s wrote: >> During 'TC + Community leaders interaction' [1] a case was discussed, where a >> late library release caused last minute fire fighting in Zed cycle, and people >> discussed the possibility to introduce a (non-client) library *feature* freeze >> at Milestone-2 to avoid similar issues in the future. >> [...] > > Repeating what I said on the reviews, I'd really rather not do this. [...] I tend to agree with Stephen on this... Cutting a significant window of time to handle a relatively rare occurrence might not be a great tradeoff. From a release management perspective, if that ensured that we'd always avoid last-minute release issues, I would support it. But reality is, this covers just a part of our blind spot. There are a lot of reasons why CI for a particular project ends up no longer working, Oslo breaking changes is just one of them. Periodically checking that CI works for all projects (ideally through normalized periodic testing, if not through regularly posting a bunch of noop test changes in inactive repositories) would detect issues earlier and cover all of our blind spot. We should also make sure we only ship actively-maintained projects, so that we know who to turn to to fix it when it's broken. Freezing Oslo a lot earlier? Not convinced. -- Thierry Carrez (ttx) From marios at redhat.com Fri Oct 21 13:05:26 2022 From: marios at redhat.com (Marios Andreou) Date: Fri, 21 Oct 2022 16:05:26 +0300 Subject: [tripleo] gate blocker centos 9 content provider - trending fixed Message-ID: Hello FYI (few folks asking in irc apologies should have sent sooner) gate blocker on the content provider https://bugs.launchpad.net/tripleo/+bug/1984237/comments/5 thanks chkumar we are trying to get a fix through gate at https://review.opendev.org/c/openstack/tripleo-quickstart/+/856582 please hold your recheck until that merges or use depends-on thank you! From arnaud.morin at gmail.com Fri Oct 21 14:55:37 2022 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Fri, 21 Oct 2022 14:55:37 +0000 Subject: [large-scale][oslo.messaging] RPC workers and db connection Message-ID: Hey all, TLDR: How can I fine tune the number of DB connection on OpenStack services? 
Long story, with some inline questions:

I am trying to figure out the maximum number of DB connections we should allow on our DB cluster. For this short write-up I will use neutron RPC as the example service, but I think nova behaves similarly.

To do so, I identified a few parameters that I can tweak:

rpc_workers [1]
max_pool_size [2]
max_overflow [3]
executor_thread_pool_size [4]

rpc_workers default is half the CPU threads available (result of nproc)
max_pool_size default is 5
max_overflow default is 50
executor_thread_pool_size default is 64

Now imagine I have a server with 40 cores, so rpc_workers will be 20. Each worker will have a DB pool with 5+50 connections available. Each worker will use up to 64 "green" threads.

The theoretical max connections that I should set on my database is then:

rpc_workers * (max_pool_size + max_overflow) = 20 * (5 + 50) = 1100

Q1: am I right here? I have the feeling that this is huge.

Now, let's assume each thread consumes 1 connection from the DB pool. Under heavy load, I am afraid that the 64 threads could exceed max_pool_size+max_overflow. Also, I noticed that some green threads were consuming more than 1 connection from the pool, so I can reach the max even sooner!

Another thing, I notice that I have 21 RPC workers, not 20. Is it normal?

[1] https://docs.openstack.org/neutron/latest/configuration/neutron.html#DEFAULT.rpc_workers
[2] https://docs.openstack.org/neutron/latest/configuration/neutron.html#database.max_pool_size
[3] https://docs.openstack.org/neutron/latest/configuration/neutron.html#database.max_overflow
[4] https://docs.openstack.org/neutron/latest/configuration/neutron.html#DEFAULT.executor_thread_pool_size

Cheers,
Arnaud.

From rafaelweingartner at gmail.com Fri Oct 21 15:40:28 2022
From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=)
Date: Fri, 21 Oct 2022 12:40:28 -0300
Subject: [telemetry][cloudkitty][ceilometer] Billing windows instances
In-Reply-To: <07d23aac-fee9-cd11-bf25-5030dba2cf6c@debian.org>
References: <07d23aac-fee9-cd11-bf25-5030dba2cf6c@debian.org>
Message-ID: 

Hello Zigo!

You might want to take a look at the new implementations we made in Ceilometer and CloudKitty:
- https://review.opendev.org/c/openstack/cloudkitty/+/861806
- https://review.opendev.org/c/openstack/ceilometer/+/856178
- https://review.opendev.org/c/openstack/ceilometer/+/852021
- https://review.opendev.org/c/openstack/ceilometer/+/850253
- https://review.opendev.org/c/openstack/ceilometer/+/855953

Not directly related to this use case, but they might also interest you:
- https://review.opendev.org/c/openstack/cloudkitty/+/861786
- https://review.opendev.org/c/openstack/cloudkitty/+/861807
- https://review.opendev.org/c/openstack/cloudkitty/+/861908
- https://review.opendev.org/c/openstack/ceilometer/+/856972
- https://review.opendev.org/c/openstack/ceilometer/+/861109
- https://review.opendev.org/c/openstack/ceilometer/+/856304
- https://review.opendev.org/c/openstack/ceilometer/+/856305

In short, we can now create Ceilometer compute dynamic pollsters, which can execute scripts on the host and check the actual operating system installed in the VM. This data can then be pushed back to the storage backend via Ceilometer as an attribute, which is then processed in CloudKitty. Furthermore, we extended CloudKitty to generate different ratings for the same metric, so we do not need multiple metrics to have different CloudKitty ratings appearing for users.
This allows us, for instance, to have one rating for the VM usage itself, and others for each license, and so on. On Fri, Oct 21, 2022 at 5:19 AM Thomas Goirand wrote: > Hi there! > > We're using telemetry+cloudkitty for our rating. We're happy of it, > though we're having the issue that we would like to bill instances > running Windows. > > The example given in the Cloudkitty doc shows how to bill more when an > instance runs Windows. That works in theory, though in practice, > Microsoft has a license model based on how many vCPU the instannce runs. > So the model that the Cloudkitty documentation shows simply doesn't work > with the SPLA thingy. > > We've looked at options. One way would of course writing a new custom > pollster, but we don't like the idea: this would mean polling for all of > the thousands of instances that are running in our deployments. So we > would very much prefer having ceilometer running on compute node (with > polling_namespaces=compute) to do the work, as this scales a way better. > > However, Ceilometer polls libvirt, which only has the information about > the image ID, not the metadata associated with the image (like the > property os_type=windows, for example). > > So, is there a better way than a dynamic pollster? Can this be done with > ceilometer on the compute nodes? > > Cheers, > > Thomas Goirand (zigo) > > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Fri Oct 21 17:30:55 2022 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Fri, 21 Oct 2022 17:30:55 +0000 Subject: [ptl][release][stable][EM] Extended Maintenance - Wallaby Message-ID: Hi, In less than a two weeks Wallaby is planned to transition to Extended Maintenance phase [1] (planned date: November 2nd, 2022). The release patches for the transition have been generated [2], these patches mark the latest release in stable/wallaby with wallaby-em. PTLs and release liaisons are encouraged to approve them as soon as possible, by latest before the transition date. After the transition stable/wallaby will be still open for bug fixes, but there won't be any official releases. Note: if a team wants to do a final release before the transition, then it can be done, but be careful to not release fixes that could break things to avoid broken releases as *final* stable Wallaby release. Thanks, El?d irc: elodilles @ #openstack-releases [1] https://releases.openstack.org/ [2] https://review.opendev.org/q/topic:wallaby-em -------------- next part -------------- An HTML attachment was scrubbed... URL: From helena at openstack.org Fri Oct 21 17:49:33 2022 From: helena at openstack.org (Helena Spease) Date: Fri, 21 Oct 2022 10:49:33 -0700 Subject: Track Chair Nominations Close in One Week! Message-ID: <8DEC8B1D-9D52-4622-B0C1-24626E791345@openstack.org> Hey everyone! Track Chair nominations for the 2023 OpenInfra Summit in Vancouver (June 13-15, 2023) are closing soon! Please submit your nominations before they close on October 28, 2022 Track Chairs for each Track will help build the Summit schedule, and are made up of individuals working in open infrastructure. 
Responsibilities include: - Help the Summit team put together the best possible content based on your subject matter expertise - Promote the individual Tracks within your networks - Review the submissions and Community voting results in your particular Track - Determine if there are any major content gaps in your Track, and if so, potentially solicit additional speakers directly to submit - Ensure diversity of speakers and companies represented in your Track - Avoid vendor sales pitches, focusing more on real-world user stories and technical, in-the-trenches experiences 2023 Summit Tracks: - 5G/NFV & Edge - AI/Machine Learning/HPC - CI/CD - Container Infrastructure - Getting Started - Hardware Enablement - Open Development - Private & Hybrid Cloud - Public Cloud - Security - Hands On Workshops Full track descriptions are available here . If you?re interested in nominating yourself or someone else to be a member of the Summit Track Chairs for a specific Track, please fill out the nomination form . Nominations will close on October 28th, 2022. Track Chairs selections will occur before we close the Call for Presentations (CFP) so that the Chairs can host office hours to consult on submissions, and help promote the event. CFP will be opening in November, registration and sponsorship information are already available. Please email speakersupport at openinfra.dev with any questions or feedback. Cheers, Helena Spease -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsmigiel at redhat.com Fri Oct 21 18:18:26 2022 From: dsmigiel at redhat.com (dsmigiel at redhat.com) Date: Fri, 21 Oct 2022 11:18:26 -0700 Subject: [tripleo] gate blocker centos 9 content provider - trending fixed In-Reply-To: References: Message-ID: <149019762ef4da63f2aae26abc179119e9f1c147.camel@redhat.com> On Fri, 2022-10-21 at 16:05 +0300, Marios Andreou wrote: > Hello > > FYI (few folks asking in irc apologies should have sent sooner) > > gate blocker on the content provider > https://bugs.launchpad.net/tripleo/+bug/1984237/comments/5 > > thanks chkumar we are trying to get a fix through gate at > https://review.opendev.org/c/openstack/tripleo-quickstart/+/856582 > > please hold your recheck until that merges or use depends-on > The change has been merged. All seems to be working again. Thanks, Dariusz From rajiv.mucheli at gmail.com Fri Oct 21 09:25:26 2022 From: rajiv.mucheli at gmail.com (rajiv mucheli) Date: Fri, 21 Oct 2022 14:55:26 +0530 Subject: Follow up on https://storyboard.openstack.org/#!/story/2010377 Message-ID: Hi, Unlike Openstack Yoga, where the setuptools version was set in upper-contstraints.txt. This is not set in Openstack Zed, which uses jammy with python 3.10. I get the below while performing few tests : - [[ -f /failure ]] - echo Wheel failed to build - cat /failure Wheel failed to build thrift===0.16.0 python-nss===1.0.1 We will need to stay with setuptools <58 to fix the 2to3 compiler issue, else anyjson packages adds up to the above list, but thrift is removed when setuptools >63. Regards, Rajiv -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaiswalaj716 at gmail.com Fri Oct 21 09:55:34 2022 From: jaiswalaj716 at gmail.com (Ayush Jaiswal) Date: Fri, 21 Oct 2022 15:25:34 +0530 Subject: Seeking help for Keystone Error while assign_role_to_user for a Project in Openstack SDK Message-ID: Hi Team, I am using the Openstack SDK in my django rest api project where I have to create a new project and assign various users with different roles to that project in openstack. However, while adding a user to a project I am getting the following error. [image: image.png] Complete details of the steps I followed that resulted in this error are in the text file attached with this email. Kindly help me resolve this issue. Your support is highly appreciated. -- Thanks and Regards, Ayush Jaiswal -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 47351 bytes Desc: not available URL: -------------- next part -------------- # clouds.yml file content START. Replace the credentials accordingly. # This is a clouds.yaml file, which can be used by OpenStack tools as a source # of configuration on how to connect to a cloud. If this is your only cloud, # just put this file in ~/.config/openstack/clouds.yaml and tools like # python-openstackclient will just work with no further config. (You will need # to add your password to the auth section) # If you have more than one cloud account, add the cloud entry to the clouds # section of your existing file and you can refer to them by name with # OS_CLOUD=openstack or --os-cloud=openstack clouds: openstack: auth: auth_url: http://192.168.36.129:5000 username: "admin" password: "admin" project_id: d525a3ec045045d3b430420458e9a990 project_name: "admin" user_domain_name: "Default" region_name: "RegionOne" interface: "public" identity_api_version: 3 # clouds.yml file content END. # Following are python shell commands to assign a user to the project # For importing connection object of openstack sdk to access openstack >>> from cloud_resources.connect import conn # To check if object "conn" is working properly >>> servers = conn.compute.servers() # If following shell command prints a list of server then connection is established successfully and object "conn" is working properly. # I performed the same check and it was working fine. >>> print(servers) # To fetch an existing Project with Project ID = 37ac4f1be68e4c95b666f0750f5efc8d. Replace with your Project ID. >>> project = conn.identity.get_project("37ac4f1be68e4c95b666f0750f5efc8d") # Following shell command prints the project's detail # I performed the same check and it was working fine. >>> print(project) # To fetch an existing User with User ID = e5b8d04148ea41c48028728cdc484497. Replace with your User ID. >>> user = conn.identity.get_user("e5b8d04148ea41c48028728cdc484497") # Following shell command prints the user's detail # I performed the same check and it was working fine. >>> print(user) # To fetch an existing Role (with which the project will be accessed by the user) with Role ID = 3e875c2689de4b9aa4bc38739578e7f5. Replace with your Role ID. # This is ID is of role Admin >>> role = conn.identity.get_role("3e875c2689de4b9aa4bc38739578e7f5") # Following shell command prints the role's detail # I performed the same check and it was working fine. >>> print(role) # Calling the openstack sdk built-in function to assign user to the project with the Admin role. 
>>> project.assign_role_to_user(conn.session, user, role) # Output of the above shell command Traceback (most recent call last): File "", line 1, in File "/opt/original/OpenYmir/venv/lib/python3.10/site-packages/openstack/identity/v3/project.py", line 69, in assign_role_to_user resp = session.put(url,) File "/opt/original/OpenYmir/venv/lib/python3.10/site-packages/keystoneauth1/session.py", line 1157, in put return self.request(url, 'PUT', **kwargs) File "/opt/original/OpenYmir/venv/lib/python3.10/site-packages/keystoneauth1/session.py", line 815, in request raise exceptions.EndpointNotFound() keystoneauth1.exceptions.catalog.EndpointNotFound: Could not find requested endpoint in Service Catalog. # Observation: With Google search it has been observed that this was an actual bug in the openstack sdk for keystoneauth1 (version < 3.2), but it was resolved afterwards. However, in our scenario this issue is still there even when keystoneauth1 is updated to version 5.0.0. # Pip Freeze output of current venv appdirs==1.4.4 asgiref==3.5.2 certifi==2022.9.24 cffi==1.15.1 charset-normalizer==2.1.1 cryptography==38.0.1 decorator==5.1.1 Django==4.1.2 django-cors-headers==3.13.0 djangorestframework==3.14.0 dogpile.cache==1.1.8 idna==3.4 iso8601==1.1.0 jmespath==1.0.1 jsonpatch==1.32 jsonpointer==2.3 keystoneauth1==5.0.0 munch==2.5.0 netifaces==0.11.0 openstacksdk==0.102.0 os-service-types==1.7.0 pbr==5.10.0 pycparser==2.21 pytz==2022.4 PyYAML==6.0 requests==2.28.1 requestsexceptions==1.4.0 six==1.16.0 sqlparse==0.4.3 stevedore==4.0.1 urllib3==1.26.12 From fungi at yuggoth.org Fri Oct 21 19:04:59 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 21 Oct 2022 19:04:59 +0000 Subject: Follow up on https://storyboard.openstack.org/#!/story/2010377 In-Reply-To: References: Message-ID: <20221021190459.r4migw3zlqehlwa5@yuggoth.org> [Keeping you in Cc because you don't seem to be subscribed, but please reply to the list address.] On 2022-10-21 14:55:26 +0530 (+0530), rajiv mucheli wrote: > Unlike Openstack Yoga, where the setuptools version was set in > upper-contstraints.txt. This is not set in Openstack Zed, which uses jammy > with python 3.10. [...] Zed was tested on Ubuntu Focal (20.04 LTS) not Jammy (22.04 LTS): https://governance.openstack.org/tc/reference/runtimes/zed.html -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From elod.illes at est.tech Fri Oct 21 20:05:26 2022 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Fri, 21 Oct 2022 20:05:26 +0000 Subject: [PTL][TC] library *feature* freeze at Milestone-2 In-Reply-To: References: <553fd064379209d0ceae91eb747c8d0656996321.camel@redhat.com> Message-ID: Thanks Stephen for the detailed summary and explanation of the situation, in this respect I agree with you that the library feature freeze date should not be on such early date in the cycle as Milestone-2. If anyone else has any opinion then please let us know. 
Thanks, El?d irc: elodilles @ #openstack-release ________________________________ From: Stephen Finucane Sent: Friday, October 21, 2022 12:48 PM To: El?d Ill?s ; openstack-discuss at lists.openstack.org Subject: Re: [PTL][TC] library *feature* freeze at Milestone-2 On Fri, 2022-10-21 at 11:42 +0100, Stephen Finucane wrote: > On Wed, 2022-10-19 at 17:10 +0000, El?d Ill?s wrote: > > Hi, > > > > During 'TC + Community leaders interaction' [1] a case was discussed, where a > > late library release caused last minute fire fighting in Zed cycle, and people > > discussed the possibility to introduce a (non-client) library *feature* freeze > > at Milestone-2 to avoid similar issues in the future. > > > > I've started to propose the possible schedule change [2] (note: it's not ready > > yet as it does not emphasize that at Milestone-2 we mean *feature* freeze for > > libraries, not "final library release"). The patch already got some reviews > > from library maintainers so I'm calling the attention to this change here on > > the ML. > > Repeating what I said on the reviews, I'd really rather not do this. There are a > couple of reasons for this. Firstly, regarding the proposal itself, this is > going to make my life as an oslo maintainer harder than it already is. This is a > crucial point. I'm not aware of anyone whose official job responsibilities > extend to oslo and it's very much a case of doing it because no one else is > doing it. We're a tiny team and pretty overwhelmed with multiple other non-oslo > $things and for me at least this means I tend to do oslo work (including > reviews) in spurts. Introducing a rather large window (6 weeks per cycle, which > is approximately 1/4 of the total available time in a cycle) during which we > can't merge the larger, harder to review feature patches is simply too long: > whatever context I would have built up before the freeze would be long-since > gone after a month and a half. > > Secondly, regarding the issue that led to this proposal, I don't think this > proposal would have actually helped. The patch that this proposal stems from was > actually merged back on July 20th [1]. This was technically after Zed M2 but > barely (5 days [2]). However, reports of issues didn't appear until September, > when this was released as oslo.db 12.1.0 [3][4]. If we had released 12.1.0 in > late July or early August, the issue would have been spotted far earlier, but as > noted above the oslo team is tiny and overwhelmed, and I would guess the release > team is in a similar boat (and can't be expected to know about all these > things). > > I also feel compelled to note that this didn't arrive out of the blue. I have > been shouting about SQLAlchemy 2.0 for over a year now [5] and I have also been > quite vocal about other oslo.db-related changes on their way [6][7]. For the > SQLAlchemy 2.0 case specifically, clearly not enough people have been listening. > I sympathise (again, tiny, overwhelmed teams are not an oslo-specific > phenomenon) but the pain was going to arrive eventually and it's just > unfortunate that it landed with an oslo.db release that was cut so close to the > deadline (see above). I manged to get nova, cinder and placement prepared well > ahead of time but it isn't sustainable for one person to do this for all > projects. Project teams need to prioritise this stuff ahead of time rather than > waiting until things are on fire. > > Finally, it's worth remembering that this isn't a regular occurence. 
Yes, there > was some pain, but we handled the issue pretty well (IMO) and affected projects > are now hopefully aware of the ticking tech debt bomb ? sitting in their > codebase. However, as far as I can tell, there's no trend of the oslo team (or > any other library project) introducing breaking changes like this so close to > release deadlines, so it does feel a bit like putting the cart before the horse. Oh, and one final point here: I didn't actually _know_ this was going to cause as many issue as it did. Perhaps there's value in an oslo-tips job that tests service projects against the HEAD of the various oslo libraries. However, that's a whole load of extra CI resources that we'd have to find resources for. Testing in oslo.db itself didn't and wouldn't catch this because all the affected projects were all projects that were not deployed by default in 'tempest-full- py3' job. Stephen > > To repeat myself from the top, I'd really rather not do this. If we wanted to > start cutting oslo releases faster, by all means let's figure out how to do > that. If we wanted to branch earlier and keep master moving, I'm onboard. > Preventing us from merging features for a combined ~3 months of the year is a > non-starter IMO though. > > Cheers, > Stephen > > > [1] https://review.opendev.org/c/openstack/oslo.db/+/804775 > [2] https://releases.openstack.org/zed/schedule.html > [3] https://review.opendev.org/c/openstack/releases/+/853975/ > [4] https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030317.html > [5] https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html > [6] https://lists.openstack.org/pipermail/openstack-discuss/2022-April/028197.html > [7] https://lists.openstack.org/pipermail/openstack-discuss/2022-April/028198.html > > > > > Thanks everyone for the responses in advance, > > > > El?d > > > > [1] > > https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030718.html > > [2] https://review.opendev.org/c/openstack/releases/+/861900 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhitman at groupw.com Fri Oct 21 20:11:56 2022 From: swhitman at groupw.com (Stuart Whitman) Date: Fri, 21 Oct 2022 20:11:56 +0000 Subject: [kolla-ansible] [yoga] [magnum] [k8s] cannot attach persistent volume to pod In-Reply-To: References: Message-ID: I fixed this by replacing the csi-cinder-controllerplugin and the csi-nodeplugin using the manifests found in this project: https://github.com/kubernetes/cloud-provider-openstack. I used kubectl to make the changes. Does anyone know if end-users can configure these kinds of changes when kolla-ansible installs magnum? Or when creating the cluster template? Thanks, -Stu ________________________________ From: Stuart Whitman Sent: Thursday, October 20, 2022 10:09 AM To: openstack-discuss at lists.openstack.org Subject: [kolla-ansible] [yoga] [magnum] [k8s] cannot attach persistent volume to pod Hello, When I try to attach a persistent cinder volume to a pod, I get FailedMount and FailedAttachVolume timeout events. 
I also get these errors in the log of the csi-cinder-controllerplugin-0 pod: E1020 13:38:41.747511 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.VolumeAttachment: the server could not find the requested resource E1020 13:38:41.748187 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSINode: the server could not find the requested resource ?I fixed a CrashLoopBackoff error with the csi-snapshotter container in the csi-cinder-controllerplugin-0 pod by providing the label "csi_snapshotter_tag=v4.0.0" when I created the cluster template. I found that suggestion in an issue on the GitHub cloud-provider-openstack project. I'm not finding any help with this error on Google. Thanks, -Stu _____________________________________ The information contained in this e-mail and any attachments from Group W may contain confidential and/or proprietary information and is intended only for the named recipient to whom it was originally addressed. If you are not the intended recipient, be aware that any disclosure, distribution, or copying of this e-mail or its attachments is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately of that fact by return e-mail and permanently delete the e-mail and any attachments to it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsbiz at yahoo.com Fri Oct 21 22:49:56 2022 From: fsbiz at yahoo.com (Farhad Sunavala) Date: Fri, 21 Oct 2022 22:49:56 +0000 (UTC) Subject: [neutron]: DVR with OVN References: <2109452471.1083312.1666392596041.ref@mail.yahoo.com> Message-ID: <2109452471.1083312.1666392596041@mail.yahoo.com> Hi, Just want to get a feel with how users are using DVR with OVN especially for a medium sized installation (500 nodes, 3000-4000 VMs)Are you just using it in the default configuration with all the complexities and just take the hit when troubleshooting why a particular path doesn't work as expected? Are you taking precautions such as providing VMs with a separate direct path to the public network in case the main interface connected to DVR fails?See OpenInfra 17th minute of Live Ep. 26 Large Scale Neutron Best Practices @?(340) OpenInfra Live Ep. 26: Large Scale OpenStack: Neutron Scaling Best Practices - YouTube | | | | | | | | | | | OpenInfra Live Ep. 26: Large Scale OpenStack: Neutron Scaling Best Pract... | | | Is DVR with OVN reasonably solid for production environments that makes you comfortable enough to use it in installations with around 5000 VMs? thanks,Fred. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Sat Oct 22 00:37:34 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 21 Oct 2022 17:37:34 -0700 Subject: [PTL][TC] library *feature* freeze at Milestone-2 In-Reply-To: References: <553fd064379209d0ceae91eb747c8d0656996321.camel@redhat.com> Message-ID: <183fd1faa4b.11531c77092877.4153806712360929468@ghanshyammann.com> ---- On Fri, 21 Oct 2022 13:05:26 -0700 El?d Ill?s wrote --- > div.zm_-846580390376284124_parse_7905949415713004498 P { margin-top: 0; margin-bottom: 0 }Thanks Stephen for the detailed summary and explanation of the situation,in this respect I agree with you that the library feature freeze date shouldnot be on such early date in the cycle as Milestone-2. > If anyone else has any opinion then please let us know. 
You are right, I think Stephen, Herve points from the Oslo maintainer's perspective are all valid and I agree now not to change the feature freeze timeline instead I will work on testing side to make sure we have enough testing with the master code in Oslo as well as on project gate and some cross-project testing. -gmann > Thanks, > El?dirc: elodilles @ #openstack-release > > From: Stephen Finucane stephenfin at redhat.com> > Sent: Friday, October 21, 2022 12:48 PM > To: El?d Ill?s elod.illes at est.tech>; openstack-discuss at lists.openstack.org openstack-discuss at lists.openstack.org> > Subject: Re: [PTL][TC] library *feature* freeze at Milestone-2?On Fri, 2022-10-21 at 11:42 +0100, Stephen Finucane wrote: > > On Wed, 2022-10-19 at 17:10 +0000, El?d Ill?s wrote: > > > Hi, > > > > > > During 'TC + Community leaders interaction' [1] a case was discussed, where a > > > late library release caused last minute fire fighting in Zed cycle, and people > > > discussed the possibility to introduce a (non-client) library *feature* freeze > > > at Milestone-2 to avoid similar issues in the future. > > > > > > I've started to propose the possible schedule change [2] (note: it's not ready > > > yet as it does not emphasize that at Milestone-2 we mean *feature* freeze for > > > libraries, not "final library release"). The patch already got some reviews > > > from library maintainers so I'm calling the attention to this change here on > > > the ML. > > > > Repeating what I said on the reviews, I'd really rather not do this. There are a > > couple of reasons for this. Firstly, regarding the proposal itself, this is > > going to make my life as an oslo maintainer harder than it already is. This is a > > crucial point. I'm not aware of anyone whose official job responsibilities > > extend to oslo and it's very much a case of doing it because no one else is > > doing it. We're a tiny team and pretty overwhelmed with multiple other non-oslo > > $things and for me at least this means I tend to do oslo work (including > > reviews) in spurts. Introducing a rather large window (6 weeks per cycle, which > > is approximately 1/4 of the total available time in a cycle) during which we > > can't merge the larger, harder to review feature patches is simply too long: > > whatever context I would have built up before the freeze would be long-since > > gone after a month and a half. > > > > Secondly, regarding the issue that led to this proposal, I don't think this > > proposal would have actually helped. The patch that this proposal stems from was > > actually merged back on July 20th [1]. This was technically after Zed M2 but > > barely (5 days [2]). However, reports of issues didn't appear until September, > > when this was released as oslo.db 12.1.0 [3][4]. If we had released 12.1.0 in > > late July or early August, the issue would have been spotted far earlier, but as > > noted above the oslo team is tiny and overwhelmed, and I would guess the release > > team is in a similar boat (and can't be expected to know about all these > > things). > > > > I also feel compelled to note that this didn't arrive out of the blue. I have > > been shouting about SQLAlchemy 2.0 for over a year now [5] and I have also been > > quite vocal about other oslo.db-related changes on their way [6][7]. For the > > SQLAlchemy 2.0 case specifically, clearly not enough people have been listening. 
> > I sympathise (again, tiny, overwhelmed teams are not an oslo-specific > > phenomenon) but the pain was going to arrive eventually and it's just > > unfortunate that it landed with an oslo.db release that was cut so close to the > > deadline (see above). I manged to get nova, cinder and placement prepared well > > ahead of time but it isn't sustainable for one person to do this for all > > projects. Project teams need to prioritise this stuff ahead of time rather than > > waiting until things are on fire. > > > > Finally, it's worth remembering that this isn't a regular occurence. Yes, there > > was some pain, but we handled the issue pretty well (IMO) and affected projects > > are now hopefully aware of the ticking tech debt bomb ? sitting in their > > codebase. However, as far as I can tell, there's no trend of the oslo team (or > > any other library project) introducing breaking changes like this so close to > > release deadlines, so it does feel a bit like putting the cart before the horse. > > Oh, and one final point here: I didn't actually _know_ this was going to cause > as many issue as it did. Perhaps there's value in an oslo-tips job that tests > service projects against the HEAD of the various oslo libraries. However, that's > a whole load of extra CI resources that we'd have to find resources for. Testing > in oslo.db itself didn't and wouldn't catch this because all the affected > projects were all projects that were not deployed by default in 'tempest-full- > py3' job. > > Stephen > > > > > To repeat myself from the top, I'd really rather not do this. If we wanted to > > start cutting oslo releases faster, by all means let's figure out how to do > > that. If we wanted to branch earlier and keep master moving, I'm onboard. > > Preventing us from merging features for a combined ~3 months of the year is a > > non-starter IMO though. > > > > Cheers, > > Stephen > > > > > > [1] https://review.opendev.org/c/openstack/oslo.db/+/804775 > > [2] https://releases.openstack.org/zed/schedule.html > > [3] https://review.opendev.org/c/openstack/releases/+/853975/ > > [4] https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030317.html > > [5] https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html > > [6] https://lists.openstack.org/pipermail/openstack-discuss/2022-April/028197.html > > [7] https://lists.openstack.org/pipermail/openstack-discuss/2022-April/028198.html > > > > > > > > Thanks everyone for the responses in advance, > > > > > > El?d > > > > > > [1] > > > https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030718.html > > > [2] https://review.opendev.org/c/openstack/releases/+/861900 > > > > From rajiv.mucheli at gmail.com Sat Oct 22 01:08:34 2022 From: rajiv.mucheli at gmail.com (rajiv mucheli) Date: Sat, 22 Oct 2022 06:38:34 +0530 Subject: Follow up on https://storyboard.openstack.org/#!/story/2010377 In-Reply-To: <20221021190459.r4migw3zlqehlwa5@yuggoth.org> References: <20221021190459.r4migw3zlqehlwa5@yuggoth.org> Message-ID: Thanks for the quick reply, when I tried with focal I got this error message: E: The repository 'http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/zed Release' does not have a Release file. I found openstack zed distro only in Jammy release file : http://ubuntu-cloud.archive.canonical.com/ubuntu/dists/jammy-updates/ Also openstack kolla repo readme suggested Jammy. 
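For reference, the Ubuntu Cloud Archive pocket has to match the host's LTS release, which is why the focal-updates/zed line fails: Canonical only built the Zed pocket for Jammy. A minimal sketch of enabling it on a Jammy host (these are the usual UCA package and alias names; if the cloud-archive:zed alias is not yet known to software-properties on your host, the manual sources.list entry does the same thing):

sudo add-apt-repository cloud-archive:zed
# or, by hand:
sudo apt install ubuntu-cloud-keyring
echo "deb http://ubuntu-cloud.archive.canonical.com/ubuntu jammy-updates/zed main" \
  | sudo tee /etc/apt/sources.list.d/cloudarchive-zed.list
sudo apt update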
On Sat, 22 Oct 2022 at 12:35 AM, Jeremy Stanley wrote: > [Keeping you in Cc because you don't seem to be subscribed, but > please reply to the list address.] > > On 2022-10-21 14:55:26 +0530 (+0530), rajiv mucheli wrote: > > Unlike Openstack Yoga, where the setuptools version was set in > > upper-contstraints.txt. This is not set in Openstack Zed, which uses > jammy > > with python 3.10. > [...] > > Zed was tested on Ubuntu Focal (20.04 LTS) not Jammy (22.04 LTS): > > https://governance.openstack.org/tc/reference/runtimes/zed.html > > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Sat Oct 22 12:49:29 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sat, 22 Oct 2022 12:49:29 +0000 Subject: [kolla][requirements][packaging-sig] Follow up on https://storyboard.openstack.org/#!/story/2010377 In-Reply-To: References: <20221021190459.r4migw3zlqehlwa5@yuggoth.org> Message-ID: <20221022124927.7uwjdcrixqolvqgk@yuggoth.org> On 2022-10-22 06:38:34 +0530 (+0530), rajiv mucheli wrote: > Thanks for the quick reply, when I tried with focal I got this error > message: > > E: The repository 'http://ubuntu-cloud.archive.canonical.com/ubuntu > focal-updates/zed Release' does not have a Release file. > > I found openstack zed distro only in Jammy release file : > http://ubuntu-cloud.archive.canonical.com/ubuntu/dists/jammy-updates/ I expect those are for the Ubuntu OpenStack distribution, which is not maintained by the upstream OpenStack community, but you can find its documentation here: https://ubuntu.com/openstack Ubuntu OpenStack doesn't necessarily always deploy on the same versions of their distribution as we use to test the software (in this case, Jammy and did not exist yet when we started making Zed), so I've tagged the Packaging SIG in the subject line since some of its members may be involved in that effort so could have relevant recommendations for you. > Also openstack kolla repo readme suggested Jammy. I've tagged the Kolla team in the subject line since it sounds like you may be trying to use it, and their installation recommendations may not align with our upstream testing standards in the rest of the OpenStack community. I also tagged the Requirements team in the subject line since the bug report you referenced is about the constraints file we use for upstream testing of OpenStack software. For a more direct answer though, the openstack/requirements repository is a tool we use in testing OpenStack software in order to confirm that changes to it work with the specific distributions and Python versions in the tested runtimes list I linked from my earlier reply. It may not be a useful tool for other situations like installing on newer distributions or with newer Python interpreters, as you've observed. The upstream OpenStack community is currently working on its 2023.1 release (Antelope), which is targeting the versions you seem to be interested in: https://governance.openstack.org/tc/reference/runtimes/2023.1.html Hope that helps! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From rajiv.mucheli at gmail.com Sat Oct 22 13:20:53 2022 From: rajiv.mucheli at gmail.com (rajiv mucheli) Date: Sat, 22 Oct 2022 18:50:53 +0530 Subject: [kolla][requirements][packaging-sig] Follow up on https://storyboard.openstack.org/#!/story/2010377 In-Reply-To: <20221022124927.7uwjdcrixqolvqgk@yuggoth.org> References: <20221021190459.r4migw3zlqehlwa5@yuggoth.org> <20221022124927.7uwjdcrixqolvqgk@yuggoth.org> Message-ID: Hi Jeremy, Thanks for all the information and excellent advice. Last request, could you share the exact link or command to download openstack zed ? the below links direct me to canonical https://ubuntu.com/openstack/install https://wiki.ubuntu.com/OpenStack/CloudArchive Regards, Rajiv On Sat, Oct 22, 2022 at 6:19 PM Jeremy Stanley wrote: > On 2022-10-22 06:38:34 +0530 (+0530), rajiv mucheli wrote: > > Thanks for the quick reply, when I tried with focal I got this error > > message: > > > > E: The repository 'http://ubuntu-cloud.archive.canonical.com/ubuntu > > focal-updates/zed Release' does not have a Release file. > > > > I found openstack zed distro only in Jammy release file : > > http://ubuntu-cloud.archive.canonical.com/ubuntu/dists/jammy-updates/ > > I expect those are for the Ubuntu OpenStack distribution, which is > not maintained by the upstream OpenStack community, but you can find > its documentation here: https://ubuntu.com/openstack > > Ubuntu OpenStack doesn't necessarily always deploy on the same > versions of their distribution as we use to test the software (in > this case, Jammy and did not exist yet when we started making Zed), > so I've tagged the Packaging SIG in the subject line since some of > its members may be involved in that effort so could have relevant > recommendations for you. > > > Also openstack kolla repo readme suggested Jammy. > > I've tagged the Kolla team in the subject line since it sounds like > you may be trying to use it, and their installation recommendations > may not align with our upstream testing standards in the rest of the > OpenStack community. I also tagged the Requirements team in the > subject line since the bug report you referenced is about the > constraints file we use for upstream testing of OpenStack software. > > For a more direct answer though, the openstack/requirements > repository is a tool we use in testing OpenStack software in order > to confirm that changes to it work with the specific distributions > and Python versions in the tested runtimes list I linked from my > earlier reply. It may not be a useful tool for other situations like > installing on newer distributions or with newer Python interpreters, > as you've observed. The upstream OpenStack community is currently > working on its 2023.1 release (Antelope), which is targeting the > versions you seem to be interested in: > https://governance.openstack.org/tc/reference/runtimes/2023.1.html > > Hope that helps! > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... 
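If the goal is just to pull down and run the Zed code for evaluation rather than production, one commonly used path is DevStack from its stable/zed branch; a rough sketch only, which assumes a throwaway VM and a minimal local.conf that you still have to write yourself:

git clone https://opendev.org/openstack/devstack -b stable/zed
cd devstack
./stack.sh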
URL: From fungi at yuggoth.org Sat Oct 22 14:53:22 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sat, 22 Oct 2022 14:53:22 +0000 Subject: [kolla][requirements][packaging-sig] Follow up on https://storyboard.openstack.org/#!/story/2010377 In-Reply-To: References: <20221021190459.r4migw3zlqehlwa5@yuggoth.org> <20221022124927.7uwjdcrixqolvqgk@yuggoth.org> Message-ID: <20221022145321.ru33ojzb7otcr3yc@yuggoth.org> On 2022-10-22 18:50:53 +0530 (+0530), rajiv mucheli wrote: [...] > could you share the exact link or command to download openstack > zed ? [...] OpenStack isn't quite a singular thing, it's a modular suite of components developed collaboratively by a community of contributors, released in coordination with each other twice yearly. There are a variety of ways to get and install OpenStack services. For example, the community produces multiple deployment and lifecycle management solutions which are listed here: https://www.openstack.org/software/project-navigator/deployment-tools Alternatively, there are pre-built distributions (both free and commercial), many of which are collected in our Marketplace: https://www.openstack.org/marketplace/distros/ All of those options provide OpenStack in one way or another, but some of them are better for some situations, others for other use cases. It will really depend on what you're looking to do as far as which one is right for you or your organization. You might also take a look at the upstream OpenStack Zed Installation Guides: https://docs.openstack.org/zed/install/ And if you're looking to quickly install a non-production deployment of services from source in order to develop and test patches to the software, DevStack is the primary wrapper we use for that: https://docs.openstack.org/devstack/ Though if that's your goal, it's probably better to start from the OpenStack Contributor Guide since there's a lot of related topics you'll need some familiarity with first: https://docs.openstack.org/contributors/ -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From moreira.belmiro.email.lists at gmail.com Sat Oct 22 16:37:15 2022 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Sat, 22 Oct 2022 18:37:15 +0200 Subject: [large-scale][oslo.messaging] RPC workers and db connection In-Reply-To: References: Message-ID: Hi, having the DB "max connections" ~ 1000 is not unreasonable and I have been doing it since long ago. This is also related to the number of nodes running the services. For example in Nova, related to the number of nodes running APIs, conductors, schedulers... cheers, Belmiro On Fri, Oct 21, 2022 at 5:07 PM Arnaud Morin wrote: > Hey all, > > TLDR: How can I fine tune the number of DB connection on OpenStack > services? > > > Long story, with some inline questions: > > I am trying to figure out the maximum number of db connection we should > allow on our db cluster. > > For this short speech, I will use neutron RPC as example service, but I > think nova is acting similar. > > So, to do so, I identified few parameters that I can tweak: > rpc_workers [1] > max_pool_size [2] > max_overflow [3] > executor_thread_pool_size [4] > > > rpc_worker default is half CPU threads available (result of nproc) > max_pool_size default is 5 > max_overflow default is 50 > executor_thread_pool_size is 64 > > Now imagine I have a server with 40 cores, > So rpc_worker will be 20. 
> Each worker will have a DB pool with 5+50 connections available. > Each worker will use up to 64 "green" thread. > > The theorical max connection that I should set on my database is then: > rpc_workers*(max_pool_size+max_overflow) = 20*(5+50) = 1100 > > Q1: am I right here? > I have the feeling that this is huge. > > Now, let's assume each thread is consuming 1 connection from the DB pool. > Under heavy load, I am affraid that the 64 threads could exceed the > number of max_pool_size+max_overflow. > > Also, I noticed that some green threads were consuming more than 1 > connection from the pool, so I can reach the max even sooner! > > Another thing, I notice that I have 21 RPC workers, not 20. Is it > normal? > > > [1] > https://docs.openstack.org/neutron/latest/configuration/neutron.html#DEFAULT.rpc_workers > [2] > https://docs.openstack.org/neutron/latest/configuration/neutron.html#database.max_pool_size > [3] > https://docs.openstack.org/neutron/latest/configuration/neutron.html#database.max_overflow > [4] > https://docs.openstack.org/neutron/latest/configuration/neutron.html#DEFAULT.executor_thread_pool_size > > Cheers, > > Arnaud. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Sat Oct 22 22:55:07 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Sat, 22 Oct 2022 23:55:07 +0100 Subject: [kolla-ansible][Yoga] Deployment stuck Message-ID: Hi, I am trying to deploy a new platform using kolla-ansible Yoga and I am trying to upgrade another platform from Xena to yoga. On both platforms the prechecks went well, but when I start the process of deployment for the first and upgrade for the second, the process gets stuck. I tried to tail -f /var/log/kolla/*/*.log but I can't get hold of the cause. In the first platform, some services get deployed, and at some point the script gets stuck, several times in the modprobe phase. In the second platform, the upgrade gets stuck on : Escalation succeeded [204/1859] <20.3.0.28> (0, b'\n{"path": "/etc/kolla/cron", "changed": false, "diff": {"before": {"path": "/etc/kolla/cro n"}, "after": {"path": "/etc/kolla/cron"}}, "uid": 0, "gid": 0, "owner": "root", "group": "root", "mode": "07 70", "state": "directory", "secontext": "unconfined_u:object_r:etc_t:s0", "size": 70, "invocation": {"module_ args": {"path": "/etc/kolla/cron", "owner": "root", "group": "root", "mode": "0770", "recurse": false, "force ": false, "follow": true, "modification_time_format": "%Y%m%d%H%M.%S", "access_time_format": "%Y%m%d%H%M.%S", "unsafe_writes": false, "state": "directory", "_original_basename": null, "_diff_peek": null, "src": null, " modification_time": null, "access_time": null, "seuser": null, "serole": null, "selevel": null, "setype": nul l, "attributes": null}}}\n', b'') ok: [20.3.0.28] => (item={'key': 'cron', 'value': {'container_name': 'cron', 'group': 'cron', 'enabled': True , 'image': '20.3.0.34:4000/openstack.kolla/centos-source-cron:yoga', 'environment': {'DUMMY_ENVIRONMENT': 'ko lla_useless_env', 'KOLLA_LOGROTATE_SCHEDULE': 'daily'}, 'volumes': ['/etc/kolla/cron/:/var/lib/kolla/config_f iles/:ro', '/etc/localtime:/etc/localtime:ro', '', 'kolla_logs:/var/log/kolla/'], 'dimensions': {}}}) => { "ansible_loop_var": "item", "changed": false, "diff": { "after": { "path": "/etc/kolla/cron" }, "before": { "path": "/etc/kolla/cron" } }, "gid": 0, "group": "root", How to start debugging the situation. Regards. -------------- next part -------------- An HTML attachment was scrubbed... 
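When a kolla-ansible run hangs without an obvious error, the usual first step is to re-run it with Ansible's own verbosity and debug output turned up and capture the log; a sketch only (the inventory path is an example, and the exact flags accepted by the kolla-ansible wrapper vary a bit between releases):

ANSIBLE_VERBOSITY=3 ANSIBLE_DEBUG=1 \
  kolla-ansible -i /etc/kolla/multinode deploy 2>&1 | tee /tmp/kolla-deploy.log

# on the host where the task appears stuck, look for the hung module process
ps -ef --forest | grep -E 'ansible|modprobe'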
URL: From gmann at ghanshyammann.com Sun Oct 23 00:50:12 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 22 Oct 2022 17:50:12 -0700 Subject: [tc][all][ptg] "Technical Committee + Leaders interaction" 2023.1 cycle vPTG sessions summary Message-ID: <18402519864.c7eafeeb113938.2211712903410420339@ghanshyammann.com> Hello Everyone, I am writing the Technical Committee + Leader interaction session discussion summary that happened in the 2023.1 Antelope cycle PTG this week. We continue the interaction session in this PTG also. The main idea here is to interact with community leaders and ask for their feedback on TC. I am happy to see ~30 attendance and had a good amount of discussions. Below are the topics we discussed in this feedback session. Updates from TC: ============= * Conveying current selected Community-wide goals. * Avoid Bare recheck * TC PTG slots: 15-19 UTC on Thursday and Friday * Status on previous PTG interaction sessions' Action items/feedback: ** PTLs/Leaders to start spreading the word and monitor ignorant recheck in their weekly meeting Status: Good progress in this, many projects started monitoring it and slaweq continues contacting the PTL for this. But we still encourage PTLs/Leaders to publish the blog/video of their stories/project updates etc at least once in a cycle. You can check the places where to publish such blogs/videos [1]. ** Recognize the contribution: Status: Not done, we are continuing this AI and will work on it this cycle. ** Bare Recheck: Status: Done. TC is monitoring the bare recheck in the weekly meetings and slaweq reaching out to projects to do the same in their weekly meeting. Feedback on TC activities or anything Community Leaders would like TC to spend more time on: ========================================================================= * Continue the TC weekly summary email which is helpful * TC to build a review process that would happen periodically to check the project status. Basically to ensure the project status as per "Emerging Technology and Inactive projects"[2]. JayF volunteered to work on this to write some documentation or resolution. ** Action Item: 1. JayF to write a resolution on criteria/flag to identify/action on the inactive projects. Renovate translation SIG i18: ====================== Ian and Seongsoo joined the discussion to provide the current state and help needed in i18 SIG. We collected some information on translation usage, language, and content(GUI, doc, log etc). Motoki updated information from the Japan group but still, needs more information from other groups/translation users and agrees to add some queries in the 2023 user survey. Also, Brian will investigate the possibility of migration to weblate tool. * Action Item: 1. rosmaita: draft questions for the survey, get feedback from ian and seongsoo 2. rosmaita: follow up with weblate service to determine whether OpenStack qualifies for gratis service 3. Ian to reach out to contributors/users to know the translation requirements for applicable language. Possibilities to avoid RC time issues with Oslo release: ======================================== After discussing the issue that occurred for Sqlalchemy 2.0 related merge in the zed cycle, we discuss one option to propose the non-client feature freezes to m-2 and get more feedback from Oslo or other non-client lib maintainers. 
Also, we encourage adding the lib forward testing which tests the integration gate with the master version of lib which can help to find the issues while the code merges itself. As discussed Elod proposed the review to get more feedback[3] and also email it to ML[4]. After getting feedback from Oslo maintainer it seems it is not a good idea to preponed the lib feature freeze. Discussion is going on in ML[4]. Clean up Zuul config errors: ===================== At the end of the session, we discussed the current zuul config error in OpenStack projects. TC is actively working on this and knikolla and other members trying to fix them as much as possible. Please check your projects and fix them, priority is to fix the master and supported branches, not the EM state branches. [1] https://docs.openstack.org/project-team-guide/spread-the-word.html [2] https://governance.openstack.org/tc/reference/emerging-technology-and-inactive-projects.html [3] https://review.opendev.org/c/openstack/releases/+/861900 [4] https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030914.html -gmann From gmann at ghanshyammann.com Sun Oct 23 00:58:40 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 22 Oct 2022 17:58:40 -0700 Subject: [tc][all][ Technical Committee 2023.1 Antelope cycle Virtual PTG discussions summary Message-ID: <18402595811.bd2fe2f0113995.2547015532286384322@ghanshyammann.com> Hello Everyone, I am summarizing the Technical Committee discussion that happened in the 2023.1 Antelope cycle PTG last week. Also, in case anyone wanted to search for all the projects etherpad later after ptgbot cleans them, I have added those to this wiki[1] Attendance: ========= * ~30 on Monday, ~25 on Thursday, ~20 on Friday Improvement in project governance: =========================== We continued the discussion from Monday and discussed a few high-level criteria which can be the potential criteria to monitor inactive projects. Along with contributors, event, and meeting occurrence is one of them but not all. JayF will be working on this to list the exact criteria/things TC can monitor to find and work on help inactive projects. User Survey: ========== * 2021 user Survey result analysis: We started with the pending review of the 2021 survey analysis[2]. There wasn't 2020 analysis so, the comparison was done using 2019. Overall, the trends in the survey were positive. We discussed adding the question about UI participants especially if users need new features to be implemented in UI then what prevents them to help and contribute? * 2022 User Survey result: knikolla and I will be working on the analysis of this survey result. jungleboyj will help on documenting the process and tips to analyze the survey xls which is not an easy task. Thanks to jungleboyj for doing the survey analysis and helping with the 2022 survey too. * 2023 User Survey question review: While doing the 2021 survey analysis, we discussed the below changes in 2023 survey. ** Remove "Other ways users participate:" ** Add: 1. how users are consuming OS. From pypi packages? From RHOSP? etc. 2. How are users interacting with OpenStack? Through Horizon? CLI? Skyline? An internally developed tool? etc. 3. Adding a sub-question about UI development help in "To which projects does your organization contribute maintenance resources such as patches for bugs and reviews on master or stable branches?" 4. Asking about what all OSS/software/services are used in your OpenStack provisioned cloud/resource. 
* Action: As the next step, we will work on the exact wording for the above questions and send it to Allison before the end of Oct month. Next step for OSC work ================== We started the discussion on sdk/osc should continue supporting the old interfaces even if we are not able to test them properly. Anyone running a very old cloud should be able to use the sdk/osc latest. We agree and Artem mentioned being more careful about removing the things. Next, we discussed where we stand on OSC work and direction from the project side and good progress from nova, neutron, and manila (might be a few more) but at the same time, many other projects need work on fixing the feature gaps. Glance is also in agreement to work towards OSC. There was a question on when and how we should work towards deprecating/removing the project's python client. The best way is to start implementing the new features in OSC and try their best to fill the old feature gap. Based on the current state, we agree to start this work as one of the community-wide and Artem volunteers to help here. * Action: ** Propose this as a community-wide goal "focus on fixing the feature gaps in osc not to remove the pythonclient which can be another goal" ** Look at devstack's use of OSC to see if we can enable token caching and perhaps use a smaller venv for performance Consistent and Secure Default RBAC ============================ As a good amount of the services are ready with the project personas, We will be testing them in the integrated gate (tempest and tempest plugins jobs), and based on the results we will enable the scope to enforce and new defaults enforce by default in this cycle. Also, services can start implementing phase 2 of the goal which is the service-to-service role. On Horizon question of moving to new defaults. because of the explicit policy files (with defaults) on the horizon side, the operator will not be able to just switch the flag until they change the policy file. For a better migration path and getting one more cycle to test the new defaults by default on the service side, it is better for Horizon to move to new defaults in the next cycle. FIPs goal ======= Ade lee provided the current status[3] of the goal which is good progress. There is work going on having the ubuntu based FIPs enabled job[4] and that will be to achieve the goal of having of voting job in the check/gate pipeline. Migrate CI/CD jobs to Ubuntu 22.04 (Jammy Jellyfish) ========================================= Projects are testing the jobs on Jammy but not all[5], we encourage the projects to start testing your project jobs before the deadline which is m-1 (Nov 18), and fix the issues if there are any. One known issue is Ceph job failing on Jammy as there aren't jammy upstream packages. There are packages in UCA and Ubuntu proper that could be used instead. The community-wide goal process in a world with few contributors =================================================== This topic is on how we can improve the community-wide goal progress and productivity but at the same time reduce the overhead on projects. Also, to highlight that the Champion of a goal is not required to do all the work and it is up to the champion. knikolla will work on the goal documentation to try goal overhead minimum. We will be more careful with goal selection to make sure if any work can be done without a community-wide goal then we should proceed that way. * Action: knikolla: see opportunities for rewording the community goal document. 
mention: striving to keep goals to a minimum to reduce overhead on teams and try to build consensus first. Pop-Up Team checks ================ We have two active popup teams, 1. Image Encryption 2. Policy(RBAC). Both have pending work to finish so will continue in this cycle also. Support policy for 32-bit platforms support ================================= Zigo is testing the OpenStack on 32-bit and filed bugs[6]. As we do not test it in the upstream CI/CD, we will not be committing to its support. But it is completely ok to fix the bug or add a skip in tests. Thanks to zigo for testing and reporting the bugs. Fixing docs build issues with recent Sphinx ================================ Sphinx is capped at 4.5.0 in upper constraints for a long time and whether we should move our doc to 6.0 or not is the question. Moving to 6.0 needs more work in openstackdocstheme. Until we have someone fixing things, it is ok to continue on Sphinx 4.5.0. Election retrospective: ================= We had a good amount of discussion on elections and many ideas to improve them in the future. Also by discussing the k8s election process with k8s steering committee members. We also discussed TC Chair election process to record the nomination in a better way. This is still not concluded and we continue the discussion in TC. Overall it was a productive discussion and we came up with below action items: *Action: 1. Extend the nomination and voting period to two weeks and also communicate the election well in advance. 2. Along with existing projects/SIG repos, add a few more related repos (governance, etc) in election tooling to count the electorate. 3. Add in the election process: call for AC on openstack-discuss or any other ML before the deadline. 4. Appoint a TC liaison as a point of contact to track the election dates more carefully and ICAL for election tasks will be a good idea to send/add. 5. Update the TC charter to mention the election period, and deadline explicitly. Cross-community sessions with k8s steering committee team: =============================================== We invited Kubernetes Steering Committee members to TC sessions. Tim and Christoph joined the sessions. This is a great way to collaborate between two communities. Following the introduction from both sides, we discussed various topics and share the process, challenges, and feedback from both sides. The election process, contributor recruiting, and how to engage them as long-term contributors. Also having part-time or non-corporate contributors is still a challenge for both communities. We also discussed the operator engagement challenges OpenStack is facing. Kubernetes operator thing is a bit complicated and they have app dev, cluster, app, and infra operator. Similar to OpenInfra foundation to the OpenStack community, CNCF foundation plays an important role for them to connect the operator with the community as much as possible. Discuss and clarify the supported upgrade-path testing in PTI: =============================================== To provide a better upgrade path, we decided to test the old distro version also whenever we will bump the distro version in our CI/CD. 
Below are details of the upgrade testing we will be following: Agree: * Supporting two distro versions when we bump the new distro version in any release: (this is only for the release bump of the distro and after that, we can go back to a single distro version testing) ** Run single tempest job in project gate on old distro version, not all jobs ** Add the previous python version unit test job in the new release testing template. * For non-SLURP releases, we will try not to change the testing runtime unless it is very much required due to the EOL of versions, we are using in testing. * In case the project has to add some feature that needs a new version of deps which is not supported in the old distro version then they need to be explicit about it and communicate in a better way. * Action: I will document the above in PTI. Guidelines for using the OpenStack release version/name and project version in 1. releasenotes 2. Documentation ===================================================================================== In Zed cycle, TC passed a resolution[7] and also prepared the guidelines[8] on using the release number as a primary identifier. During nova PTG, there was a question on what is the recommendation for using OpenStack version and package version (say Nova 27.0.0) in project documentation, releasenotes etc. After discussing the multiple options listed in etherpad we agree to go for the below: * OpenStack ( ) Example: OpenStack 2023.1 (Nova 27.0.0) I will add this to the release identifier page[8] so that all projects can use it consistently. Discussion on projects (like neutron, ceilometer ) in Upper Constraints: ===================================================== Having projects in u-c makes it difficult for users to have a consistent deployment of those services. But we do not recommend using the u-c in production and it can be explicitly mentioned in the requirement and project-team-guide document. * Action: tonyb to document the upper constraints usage expectation (especially for production usage) in the requirement document as well as in project-team-guide[9]. Zed Retrospective: ============== In the end, we discussed the Zed cycle retrospective. * What went well? ** Good amount of work in the zed cycle[10] ** New TC members ** Good participation in meetings especially video call ** TC & Community Engagement (leaders interaction) improving *** i18 SIG team having there helped to proceed with i18 SIG work One thing to improve next time is to explicitly call out the team/members with a courtesy ping for future PTG if any related discussion. We do send the agenda on ML in advance but no harm in ping also. Meeting time check: =============== TC weekly Video calls are more productive compared to text meetings and we will do two video calls in a month. We will also start a poll to select the meeting time. 2023.1 cycle TC Tracker ================== I prepared the TC tracker for 2023.1 and listed all the actionable working items that came up during the PTG discussion. This is helpful for tracking the working items. - https://etherpad.opendev.org/p/tc-2023.1-tracker Thank you for reading the summary or I will say detailed summary :), have a nice weekend everyone. 
[1] https://wiki.openstack.org/wiki/PTG/2023.1/Etherpads [2] https://review.opendev.org/c/openstack/governance/+/836888 [3] https://etherpad.opendev.org/p/fips_goal_status [4] https://review.opendev.org/c/openstack/project-config/+/861457/ [5] https://etherpad.opendev.org/p/migrate-to-jammy [6] https://bugs.launchpad.net/glance-store/+bug/1991406 [7] https://governance.openstack.org/tc/resolutions/20220524-release-identification-process.html [8] https://governance.openstack.org/tc/reference/release-naming.html [9] https://docs.openstack.org/project-team-guide/dependency-management.html [10] https://etherpad.opendev.org/p/tc-zed-tracker -gmann From gmann at ghanshyammann.com Sun Oct 23 02:18:42 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 22 Oct 2022 19:18:42 -0700 Subject: [tc][all][policy][rbac] RBAC 2023.1 Antelope cycle Virtual PTG discussions summary Message-ID: <18402a29dc9.11b45df1e114395.6556261416973730381@ghanshyammann.com> Hello Everyone, Last week, I tried to attend all the RBAC-related sessions on various projects. We mainly discussed the next steps on RBAC and implementing the project personas if not yet done. I captured the summary and outcome of the discussions of those projects in the below etherpad: Look for the section: "RBAC/Policy 2023.1 Antelope PTG summary". If I have missed any project discussion, please add it to the etherpad. - https://etherpad.opendev.org/p/rbac-goal-tracking One of the big outcomes, we agreed on all the projects (implemented phase-1) to enable the scope checks and new defaults by default. But before doing that, we need to test the new defaults and scope in Tempest (tempest plugin) integrated gate. I have started working on the job enabling the scope and new defaults[1]. [1] https://review.opendev.org/c/openstack/tempest/+/614484 -gmann From p.aminian.server at gmail.com Sun Oct 23 10:25:59 2022 From: p.aminian.server at gmail.com (Parsa Aminian) Date: Sun, 23 Oct 2022 13:55:59 +0330 Subject: [kolla] single Network interface In-Reply-To: References: <49f4b31f-d0c9-4b0a-be3c-70480f45f39e@app.fastmail.com> Message-ID: thanks it works for me On Fri, Oct 21, 2022 at 1:24 PM Satish Patel wrote: > Here is the doc to deploy kolla using a single network interface port. > https://www.keepcalmandrouteon.com/post/kolla-os-part-1/ > > On Thu, Oct 20, 2022 at 4:46 AM Sean Mooney wrote: > >> I have not been following this too cloesly and sorry to top post but its >> possibel to deploy multi node openstack using a singel interface. >> i often do that with devstack and it shoudl be possibel to do with kolla. >> >> first if you do not need vlan/flat tenant networks and and geneve/vxlan >> with ml2/ovs or ml2/ovn is sufficent then the tunell endpoint ip can just be >> the manamgnet interface. when im deploying wiht devstack i just create a >> dumy interfaces and use that for neutorn >> so you shoudl be able to do that for kolla-ansible too just have a >> playbook that will create a dumy interface on all host and set that as the >> neutron_interface. >> >> in kolla all other interface are shared by defautl so its only the >> nuetorn_interface for the br-ex that need to be managed. >> this approch reqired yuo to asign the gateway ip for the external network >> to one of the contolers and configre that host in your router. 
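A minimal sketch of the dummy-interface idea described above (the interface name and the globals.yml variable are just the usual kolla-ansible ones; this has to be done on every host before deploy, e.g. from a small playbook or a systemd unit so it survives reboots):

ip link add dummy0 type dummy
ip link set dummy0 up

# then in /etc/kolla/globals.yml:
#   neutron_external_interface: "dummy0"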
>> >> the better approch whihc allows provider networks to work and avoids the >> need to asisng the gateway ip in a hacky way is use macvlan interfaces >> i dont thinki have an example of this form my home cloud any more since >> i have redpeloyed it but i previoulsy used to create macvlan sub interfaces >> >> to do this by hand you would do somehting like this >> >> sudo ip link add api link eth0 type macvlan mode bridge >> sudo ip link add ovs link eth0 type macvlan mode bridge >> sudo ip link add storage link eth0 type macvlan mode bridge >> sudo ifconfig api up >> sudo ifconfig ovs up >> sudo ifconfig storage up >> >> >> you can wrap that up into a systemd service file and have it run before >> the docker service. >> if your on ubuntu netplan does not support macvlans currently but you can >> do it the tradtional way or wiht systemd networkd >> >> Macvlan allows a single physical interface to have multiple mac and ip >> addresses. >> you can also do the same with a linux bridge but that is less then ideal >> in terms of performance. >> if your nic support sriov another good way to partion then nice is to use >> a VF >> >> in this case you just put a trivial udev rule to allocate them or use >> netplan >> https://netplan.io/examples its the final example. >> >> >> macvlan works if you dont have hardware supprot for sriov and sriov is a >> good option otherwise >> >> On Thu, 2022-10-20 at 11:06 +0900, Bernd Bausch wrote: >> > SInce you can easily have five to ten different networks in a cloud >> > installation, e.g. networks dedicated to object storage, provider >> > networks for Octavia, a network just for iSCSI etc, VLANs are (or used >> > to be?) a common solution. See for example the (sadly, defunct) SUSE >> > OpenStack cloud >> > >> https://documentation.suse.com/soc/9/html/suse-openstack-cloud-crowbar-all/cha-deploy-poc.html#sec-depl-poc-vlans >> . >> > >> > On 2022/10/20 8:50 AM, Clark Boylan wrote: >> > > On Wed, Oct 19, 2022, at 4:44 PM, Michal Arbet wrote: >> > > > Hmm, >> > > > >> > > > But I think there is a problem with vlan - you need to setup it in >> OVS, >> > > > don't you ? >> > > There was also a bridge and a veth pair involved: >> https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e/manifests/veth.pp >> > > >> > > Possibly to deal with this? Like I said its been a long time and I >> don't remember the details. I just know it was possible to solve at least >> at the time. Linux gives you a whole suite of virtual network components >> that you can throw together to workaround physical limitations like this. >> > > >> > > > Michal Arbet >> > > > Openstack Engineer >> > > > >> > > > Ultimum Technologies a.s. >> > > > Na Po???? 1047/26, 11000 Praha 1 >> > > > Czech Republic >> > > > >> > > > +420 604 228 897 >> > > > michal.arbet at ultimum.io >> > > > _https://ultimum.io_ >> > > > >> > > > LinkedIn | >> > > > Twitter | Facebook >> > > > >> > > > >> > > > >> > > > st 19. 10. 2022 v 23:57 odes?latel Clark Boylan < >> cboylan at sapwetik.org> napsal: >> > > > > On Wed, Oct 19, 2022, at 9:40 AM, Michal Arbet wrote: >> > > > > > Hi, >> > > > > > >> > > > > > If I am correct this is not possible currently, but I remember >> I was >> > > > > > working on a solution, but unfortunately I stopped at some point >> > > > > > because kolla upstream didn't want to maintain. >> > > > > > >> > > > > > In attachment you can find patches for kolla and kolla-ansible >> and our idea. 
>> > > > > > >> > > > > > We added python script to kolla container and provide netplan >> style >> > > > > > configuration by kolla-ansible ..so openvswitch starts and >> configured >> > > > > > networking as it was set in configuration (if i remember ...it >> is quite >> > > > > > long time....and of course it was not final version ...but if i >> > > > > > remember it somehow worked). >> > > > > > >> > > > > > So, you can check it and maybe we can discuss this feature >> again :) >> > > > > > >> > > > > > Thanks, >> > > > > > Kevko >> > > > > > >> > > > > > >> > > > > > Michal Arbet >> > > > > > Openstack Engineer >> > > > > > >> > > > > > Ultimum Technologies a.s. >> > > > > > Na Po???? 1047/26, 11000 Praha 1 >> > > > > > Czech Republic >> > > > > > >> > > > > > +420 604 228 897 >> > > > > > michal.arbet at ultimum.io >> > > > > > _https://ultimum.io_ >> > > > > > >> > > > > > LinkedIn >> | >> > > > > > Twitter | Facebook >> > > > > > >> > > > > > >> > > > > > >> > > > > > po 17. 10. 2022 v 19:24 odes?latel Parsa Aminian >> > > > > > napsal: >> > > > > > > Hello >> > > > > > > I use kolla ansible wallaby version . >> > > > > > > my compute node has only one port . is it possible to use >> this server ? as I know openstack compute need 2 port one for management >> and other for external user network . Im using provider_networks and it >> seems neutron_external_interface could not be the same as network_interface >> because openvswitch need to create br-ex bridge on separate port >> > > > > > > is there any solution that i can config my compute with 1 >> port ? >> > > > > A very long time ago the OpenStack Infra Team ran the >> "Infracloud". This OpenStack installation ran on donated hardware and the >> instances there only had a single network port as well. To workaround this >> we ended up using vlan specific subinterfaces on the node so that logically >> we were presenting more than one interface to the OpenStack installation. >> > > > > >> > > > > I don't remember all the details but the now retired >> opendev/puppet-infracloud repo may have some clues: >> https://opendev.org/opendev/puppet-infracloud/src/commit/121afc07bdd277d8ba3ba70f1433d5e6a4a4b14e >> > > > > >> > > > > > Attachments: >> > > > > > * ovs_kolla >> > > > > > * ovs_kolla_ansible >> > >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Sun Oct 23 11:36:54 2022 From: thierry at openstack.org (Thierry Carrez) Date: Sun, 23 Oct 2022 13:36:54 +0200 Subject: [largescale-sig] No meeting this week Message-ID: <575e6521-8ef8-ec9b-981b-46295479d6ac@openstack.org> Hi everyone, With the PTG just over, the Large Scale SIG will not be meeting this week. Our next regular IRC meeting will be November 9, at 1500utc on #openstack-operators on OFTC. Feel free to add topics to the agenda: https://etherpad.openstack.org/p/large-scale-sig-meeting Regards, -- Thierry Carrez From yu-kishimoto at kddi.com Mon Oct 24 02:46:26 2022 From: yu-kishimoto at kddi.com (yu-kishimoto at kddi.com) Date: Mon, 24 Oct 2022 02:46:26 +0000 Subject: [Cinder][Nova]Zombie process prevention for Cinder and Nova APIs Message-ID: Hi all, I'm trying to fix issues of spawning zombie processes for Cinder and Nova APIs during instantiation in an IaC (Ansible which is not openstack-ansible). Does someone can kindly help me to solve the issue or give me some clues like parameter changes that can be expected to be effective? - Issues The cinder api service process(port: 8776) existed in a zombie state. 
The neutron api service process(port: 9696) required for nova existed in a zombie state. - What the IaC does Use an existing tenant to create a network and subnet, then use 8volumes to instantiate 8VMs. - Workarounds and what I've done so far Identify and kill zombie processes. - Environment OS: CentOS Stream 8 Kernel: 4.18.0-408.el8.x86_64 OpenStack: Yoga(Deployed by PackStack: https://www.rdoproject.org/install/packstack/) Nova: 25.0.1 Neutron: 20.2.0 Cinder: 20.0.1 KeyStone: 21.0.0 -- Yukihiro Kishimoto Technologist KDDI Co., Ltd. Tokyo Japan From nguyenhuukhoinw at gmail.com Mon Oct 24 03:57:17 2022 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 24 Oct 2022 10:57:17 +0700 Subject: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down In-Reply-To: References: Message-ID: Hello. 2 remain nodes still running, here is my output: Basics Cluster name: rabbit at controller01 Disk Nodes rabbit at controller01 rabbit at controller02 rabbit at controller03 Running Nodes rabbit at controller01 rabbit at controller03 Versions rabbit at controller01: RabbitMQ 3.8.35 on Erlang 23.3.4.18 rabbit at controller03: RabbitMQ 3.8.35 on Erlang 23.3.4.18 Maintenance status Node: rabbit at controller01, status: not under maintenance Node: rabbit at controller03, status: not under maintenance Alarms (none) Network Partitions (none) Listeners Node: rabbit at controller01, interface: [::], port: 15672, protocol: http, purpose: HTTP API Node: rabbit at controller01, interface: 183.81.13.227, port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication Node: rabbit at controller01, interface: 183.81.13.227, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 Node: rabbit at controller03, interface: [::], port: 15672, protocol: http, purpose: HTTP API Node: rabbit at controller03, interface: 183.81.13.229, port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication Node: rabbit at controller03, interface: 183.81.13.229, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 Feature flags Flag: drop_unroutable_metric, state: enabled Flag: empty_basic_get_metric, state: enabled Flag: implicit_default_bindings, state: enabled Flag: maintenance_mode_status, state: enabled Flag: quorum_queue, state: enabled Flag: user_limits, state: enabled Flag: virtual_host_metadata, state: enabled I used ha_queues mode all But it is not better. Nguyen Huu Khoi On Tue, Oct 18, 2022 at 7:19 AM Nguy?n H?u Kh?i wrote: > Description > =========== > I set up 3 controllers and 3 compute nodes. My system cannot work well > when 1 rabbit node in cluster rabbitmq is down, cannot launch instances. It > stucked at scheduling. > > Steps to reproduce > =========== > Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// > * Reboot 1 of 3 rabbitmq node. > * Create instances then it stucked at scheduling. > > Workaround > =========== > Point to rabbitmq VIP address. But We cannot share the load with this > solution. Please give me some suggestions. Thank you very much. > I did google and enabled system log's debug but I still cannot understand > why. > > Nguyen Huu Khoi > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From park0kyung0won at dgist.ac.kr Mon Oct 24 05:21:00 2022 From: park0kyung0won at dgist.ac.kr (=?UTF-8?B?67CV6rK97JuQ?=) Date: Mon, 24 Oct 2022 14:21:00 +0900 (KST) Subject: [yoga][cinder] Cinder NFS backend: Compute service cannot access volume file (UID/GID problem) Message-ID: <1076006407.404315.1666588860827.JavaMail.root@mailwas2> An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Mon Oct 24 06:53:42 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Mon, 24 Oct 2022 07:53:42 +0100 Subject: [kolla-ansible][Yoga] Deployment stuck In-Reply-To: References: Message-ID: Hi, My setup is simple, it's an hci deployment composed of 3 controllers nodes and 6 compute and storage nodes. I am using ceph-ansible for deploying the storage part and the deployment goes well. My base OS is Rocky Linux 8 fully updated. My network is composed of a 1Gb management network for OS, application deployment and server management. And a 40Gb with LACP (80Gb) data network. I am using vlans to segregate openstack networks. I updated both Xena and Yoga kolla-ansible package I updated several times the container images (I am using a local registry). No matter how many times I tried to deploy it's the same behavior. The setup gets stuck somewhere. I tried to deploy the core modules without SSL, I tried to use an older kernel, I tried to use the 40Gb network to deploy, nothing works. The problem is the lack of error if there was one it would have been a starting point but I have nothing. Regards. On Sun, Oct 23, 2022, 00:42 wodel youchi wrote: > Hi, > > Here you can find the kolla-ansible *deploy *log with ANSIBLE_DEBUG=1 > > Regards. > > Le sam. 22 oct. 2022 ? 23:55, wodel youchi a > ?crit : > >> Hi, >> >> I am trying to deploy a new platform using kolla-ansible Yoga and I am >> trying to upgrade another platform from Xena to yoga. >> >> On both platforms the prechecks went well, but when I start the process >> of deployment for the first and upgrade for the second, the process gets >> stuck. >> >> I tried to tail -f /var/log/kolla/*/*.log but I can't get hold of the >> cause. >> >> In the first platform, some services get deployed, and at some point the >> script gets stuck, several times in the modprobe phase. 
>> >> In the second platform, the upgrade gets stuck on : >> >> Escalation succeeded >> [204/1859] >> <20.3.0.28> (0, b'\n{"path": "/etc/kolla/cron", "changed": false, "diff": >> {"before": {"path": "/etc/kolla/cro >> n"}, "after": {"path": "/etc/kolla/cron"}}, "uid": 0, "gid": 0, "owner": >> "root", "group": "root", "mode": "07 >> 70", "state": "directory", "secontext": "unconfined_u:object_r:etc_t:s0", >> "size": 70, "invocation": {"module_ >> args": {"path": "/etc/kolla/cron", "owner": "root", "group": "root", >> "mode": "0770", "recurse": false, "force >> ": false, "follow": true, "modification_time_format": "%Y%m%d%H%M.%S", >> "access_time_format": "%Y%m%d%H%M.%S", >> "unsafe_writes": false, "state": "directory", "_original_basename": >> null, "_diff_peek": null, "src": null, " >> modification_time": null, "access_time": null, "seuser": null, "serole": >> null, "selevel": null, "setype": nul >> l, "attributes": null}}}\n', b'') >> ok: [20.3.0.28] => (item={'key': 'cron', 'value': {'container_name': >> 'cron', 'group': 'cron', 'enabled': True >> , 'image': '20.3.0.34:4000/openstack.kolla/centos-source-cron:yoga', >> 'environment': {'DUMMY_ENVIRONMENT': 'ko >> lla_useless_env', 'KOLLA_LOGROTATE_SCHEDULE': 'daily'}, 'volumes': >> ['/etc/kolla/cron/:/var/lib/kolla/config_f >> iles/:ro', '/etc/localtime:/etc/localtime:ro', '', >> 'kolla_logs:/var/log/kolla/'], 'dimensions': {}}}) => { >> "ansible_loop_var": "item", >> "changed": false, >> "diff": { >> "after": { >> "path": "/etc/kolla/cron" >> }, >> "before": { >> "path": "/etc/kolla/cron" >> } >> }, >> "gid": 0, >> "group": "root", >> >> How to start debugging the situation. >> >> Regards. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix.huettner at mail.schwarz Mon Oct 24 07:12:48 2022 From: felix.huettner at mail.schwarz (=?utf-8?B?RmVsaXggSMO8dHRuZXI=?=) Date: Mon, 24 Oct 2022 07:12:48 +0000 Subject: [yoga][cinder] Cinder NFS backend: Compute service cannot access volume file (UID/GID problem) In-Reply-To: <1076006407.404315.1666588860827.JavaMail.root@mailwas2> References: <1076006407.404315.1666588860827.JavaMail.root@mailwas2> Message-ID: Hi, we are solving this issue for us by creating a ?cinder? group on all hypervisors with the same gid (64061 in your case). Then we add the nova user to the cinder group and we are fine afterwards. You might need set ?dynamic_ownership = 0" In your libvirt qemu.conf -- Felix Huettner From: ??? Sent: Monday, October 24, 2022 7:21 AM To: openstack-discuss at lists.openstack.org Subject: [yoga][cinder] Cinder NFS backend: Compute service cannot access volume file (UID/GID problem) Hi I'm trying to setup cinder-volume service with NFS backend When I create a new VM instance with a volume from web UI, cinder-volume service on storage node creates volume file just fine But I get the following error on compute node and instance fails to spawn. 
2022-10-24 02:14:25.347 402789 ERROR nova.compute.manager [req-47ec9fb1-9daa-4c24-8673-538797a217cc 8769cfaf608349bd9fbb36f92b188fe3 e1e8e8397cde49899b00d09dec76b29e - default default] [instance: 5acb1dc3-0685-4980-977b-b6dfff6dfb45] Instance failed to spawn: libvirt.libvirtError: internal error: process exited while connecting to monitor: 2022-10-24T02:14:24.819644Z qemu-system-x86_64: -blockdev {"driver":"file","filename":"/var/lib/nova/mnt/99c4f7e8b15983b65e20cb7d37db899f/volume-8f478992-dde3-4c20-9005-61cd34eacf30","aio":"native","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: Could not open '/var/lib/nova/mnt/99c4f7e8b15983b65e20cb7d37db899f/volume-8f478992-dde3-4c20-9005-61cd34eacf30': Permission denied I've added appropriate configs to apparmor profile. (Using Ubuntu 22.04) Apparmor isn't blocking this access. While the instance is spawning, I've checked ownership of the volume file on compute node: root at compute-node:/var/lib/nova/mnt$ ls -al total 17 drwxr-xr-x 3 nova nova 4096 Oct 24 04:19 . drwxr-xr-x 12 nova nova 4096 Oct 24 02:14 .. drwxr-x--- 2 64061 64061 11 Oct 24 04:19 99c4f7e8b15983b65e20cb7d37db899f It seems like cinder user on storage node creates volume file with UID/GID of 64061 (cinder user's UID/GID) But nova user on compute node has UID/GID of 64060, therefore cannot open volume file(/var/lib/nova/mnt/99c4f7e8b15983b65e20cb7d37db899f/volume-8f478992-dde3-4c20-9005-61cd34eacf30) Should I manually set the UID/GID of nova user on compute node to 64061, so both nova user on compute node and cinder user on storage node would have the same UID/GID? Feels like this duct taping isn't a proper solution. Did I miss something? Thank you Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. -------------- next part -------------- An HTML attachment was scrubbed... URL: From park0kyung0won at dgist.ac.kr Mon Oct 24 07:24:18 2022 From: park0kyung0won at dgist.ac.kr (=?UTF-8?B?67CV6rK97JuQ?=) Date: Mon, 24 Oct 2022 16:24:18 +0900 (KST) Subject: [yoga][cinder] Cinder NFS backend: Compute service cannot access volume file (UID/GID problem) Message-ID: <1729803334.406064.1666596258287.JavaMail.root@mailwas2> An HTML attachment was scrubbed... URL: From felix.huettner at mail.schwarz Mon Oct 24 07:33:06 2022 From: felix.huettner at mail.schwarz (=?utf-8?B?RmVsaXggSMO8dHRuZXI=?=) Date: Mon, 24 Oct 2022 07:33:06 +0000 Subject: [yoga][cinder] Cinder NFS backend: Compute service cannot access volume file (UID/GID problem) In-Reply-To: <1972308493.406063.1666596258257.JavaMail.root@mailwas2> References: <1972308493.406063.1666596258257.JavaMail.root@mailwas2> Message-ID: Sorry, no idea about that, for us the group also has write permissions -- Felix Huettner From: ??? Sent: Monday, October 24, 2022 9:24 AM To: Felix H?ttner ; openstack-discuss at lists.openstack.org Subject: RE: RE: [yoga][cinder] Cinder NFS backend: Compute service cannot access volume file (UID/GID problem) Hello Felix Thank you very much for kind reply Do I also need to change permission setting of volume file in /var/lib/nova/mnt/... ? By default its: drwxr-x--- 2 64061 64061 11 Oct 24 04:19 99c4f7e8b15983b65e20cb7d37db899f group has only read and execute permission, no write permission ---------- ?? ?? 
---------- ????: "Felix H?ttner" > ????: "park0kyung0won at dgist.ac.kr" >, "openstack-discuss at lists.openstack.org" > ??: 2022-10-24 (?) 16:12:48 ??: RE: [yoga][cinder] Cinder NFS backend: Compute service cannot access volume file (UID/GID problem) Hi, we are solving this issue for us by creating a ?cinder? group on all hypervisors with the same gid (64061 in your case). Then we add the nova user to the cinder group and we are fine afterwards. You might need set ?dynamic_ownership = 0" In your libvirt qemu.conf -- Felix Huettner From: ??? > Sent: Monday, October 24, 2022 7:21 AM To: openstack-discuss at lists.openstack.org Subject: [yoga][cinder] Cinder NFS backend: Compute service cannot access volume file (UID/GID problem) Hi I'm trying to setup cinder-volume service with NFS backend When I create a new VM instance with a volume from web UI, cinder-volume service on storage node creates volume file just fine But I get the following error on compute node and instance fails to spawn. 2022-10-24 02:14:25.347 402789 ERROR nova.compute.manager [req-47ec9fb1-9daa-4c24-8673-538797a217cc 8769cfaf608349bd9fbb36f92b188fe3 e1e8e8397cde49899b00d09dec76b29e - default default] [instance: 5acb1dc3-0685-4980-977b-b6dfff6dfb45] Instance failed to spawn: libvirt.libvirtError: internal error: process exited while connecting to monitor: 2022-10-24T02:14:24.819644Z qemu-system-x86_64: -blockdev {"driver":"file","filename":"/var/lib/nova/mnt/99c4f7e8b15983b65e20cb7d37db899f/volume-8f478992-dde3-4c20-9005-61cd34eacf30","aio":"native","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: Could not open '/var/lib/nova/mnt/99c4f7e8b15983b65e20cb7d37db899f/volume-8f478992-dde3-4c20-9005-61cd34eacf30': Permission denied I've added appropriate configs to apparmor profile. (Using Ubuntu 22.04) Apparmor isn't blocking this access. While the instance is spawning, I've checked ownership of the volume file on compute node: root at compute-node:/var/lib/nova/mnt$ ls -al total 17 drwxr-xr-x 3 nova nova 4096 Oct 24 04:19 . drwxr-xr-x 12 nova nova 4096 Oct 24 02:14 .. drwxr-x--- 2 64061 64061 11 Oct 24 04:19 99c4f7e8b15983b65e20cb7d37db899f It seems like cinder user on storage node creates volume file with UID/GID of 64061 (cinder user's UID/GID) But nova user on compute node has UID/GID of 64060, therefore cannot open volume file(/var/lib/nova/mnt/99c4f7e8b15983b65e20cb7d37db899f/volume-8f478992-dde3-4c20-9005-61cd34eacf30) Should I manually set the UID/GID of nova user on compute node to 64061, so both nova user on compute node and cinder user on storage node would have the same UID/GID? Feels like this duct taping isn't a proper solution. Did I miss something? Thank you Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. -------------- next part -------------- An HTML attachment was scrubbed... 
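The group-alignment workaround Felix describes above can be sketched roughly as follows. This is only an illustration: the gid 64061 is taken from the ls -al output in this thread, the service and group names (cinder, nova, libvirtd, nova-compute) assume an Ubuntu-style install, and whether dynamic_ownership really needs to be disabled should be verified against the local deployment.

    # On every compute host, as root: create a group matching the gid that
    # owns the volume files on the NFS share, and let the nova user join it.
    groupadd --gid 64061 cinder
    usermod -aG cinder nova

    # In /etc/libvirt/qemu.conf: keep libvirt from re-chowning volume files.
    dynamic_ownership = 0

    # Restart so the new group membership and config are picked up.
    systemctl restart libvirtd nova-compute

If the created volume files additionally need group write access (the drwxr-x--- question above), cinder's NFS driver options nas_secure_file_operations and nas_secure_file_permissions in the backend section of cinder.conf are worth checking, since they control how restrictive the permissions of newly created volume files are.
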
URL: From arnaud.morin at gmail.com Mon Oct 24 08:57:54 2022 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Mon, 24 Oct 2022 08:57:54 +0000 Subject: [large-scale][oslo.messaging] RPC workers and db connection In-Reply-To: References: Message-ID: Hey Belimiro, Thanks for your answer. Having ~1000 connection for each service is a lot to me. With the example below, I only talked about neutron RPC service on one node. On our biggest region, we are using multiple nodes (like 8 neutron controllers), which are running both neutron API and neutron RPC. So, we can end-up with something like 16k connections only for neutron :( Of courses, we limited that by lowering the default values, but we are still struggling figuring out the correct values for this. Cheers, Arnaud. On 22.10.22 - 18:37, Belmiro Moreira wrote: > Hi, > having the DB "max connections" ~ 1000 is not unreasonable and I have been > doing it since long ago. > This is also related to the number of nodes running the services. For > example in Nova, related to the number of nodes running APIs, conductors, > schedulers... > > cheers, > Belmiro > > On Fri, Oct 21, 2022 at 5:07 PM Arnaud Morin wrote: > > > Hey all, > > > > TLDR: How can I fine tune the number of DB connection on OpenStack > > services? > > > > > > Long story, with some inline questions: > > > > I am trying to figure out the maximum number of db connection we should > > allow on our db cluster. > > > > For this short speech, I will use neutron RPC as example service, but I > > think nova is acting similar. > > > > So, to do so, I identified few parameters that I can tweak: > > rpc_workers [1] > > max_pool_size [2] > > max_overflow [3] > > executor_thread_pool_size [4] > > > > > > rpc_worker default is half CPU threads available (result of nproc) > > max_pool_size default is 5 > > max_overflow default is 50 > > executor_thread_pool_size is 64 > > > > Now imagine I have a server with 40 cores, > > So rpc_worker will be 20. > > Each worker will have a DB pool with 5+50 connections available. > > Each worker will use up to 64 "green" thread. > > > > The theorical max connection that I should set on my database is then: > > rpc_workers*(max_pool_size+max_overflow) = 20*(5+50) = 1100 > > > > Q1: am I right here? > > I have the feeling that this is huge. > > > > Now, let's assume each thread is consuming 1 connection from the DB pool. > > Under heavy load, I am affraid that the 64 threads could exceed the > > number of max_pool_size+max_overflow. > > > > Also, I noticed that some green threads were consuming more than 1 > > connection from the pool, so I can reach the max even sooner! > > > > Another thing, I notice that I have 21 RPC workers, not 20. Is it > > normal? > > > > > > [1] > > https://docs.openstack.org/neutron/latest/configuration/neutron.html#DEFAULT.rpc_workers > > [2] > > https://docs.openstack.org/neutron/latest/configuration/neutron.html#database.max_pool_size > > [3] > > https://docs.openstack.org/neutron/latest/configuration/neutron.html#database.max_overflow > > [4] > > https://docs.openstack.org/neutron/latest/configuration/neutron.html#DEFAULT.executor_thread_pool_size > > > > Cheers, > > > > Arnaud. 
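For readers following the sizing discussion in this thread, a sketch of where the four knobs live and how the worst case multiplies out, using the example figures quoted in this thread; the values are illustrative, not recommendations:

    # neutron.conf (the example service used in this thread)
    [DEFAULT]
    rpc_workers = 20                 # default is half the available CPU threads
    executor_thread_pool_size = 64   # green threads per worker; each can hold a DB connection

    [database]
    max_pool_size = 5                # persistent connections per worker
    max_overflow = 50                # extra connections allowed under load

    # Theoretical ceiling per host:
    #   rpc_workers * (max_pool_size + max_overflow) = 20 * (5 + 50) = 1100
    # With several controllers running both API and RPC services the ceilings
    # add up (roughly where the ~16k figure for 8 neutron controllers comes
    # from), so the database's max_connections has to stay above the summed
    # ceilings plus some headroom, or connections are refused under load.

Lowering max_overflow and max_pool_size, and pinning rpc_workers explicitly, is the usual way to trade peak concurrency for a smaller, predictable connection budget.
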
> > > > From felix.huettner at mail.schwarz Mon Oct 24 09:25:59 2022 From: felix.huettner at mail.schwarz (=?iso-8859-1?Q?Felix_H=FCttner?=) Date: Mon, 24 Oct 2022 09:25:59 +0000 Subject: [large-scale][oslo.messaging] RPC workers and db connection In-Reply-To: References: Message-ID: Hi everyone, we have also struggled to find reasonable values for our settings. These are currently based on some experience but not on actual data. Does anyone by chance know of a metric that can show the usage of the database connection pools? Otherwise that might be something to add to oslo.metrics maybe? -- Felix Huettner > -----Original Message----- > From: Arnaud Morin > Sent: Monday, October 24, 2022 10:58 AM > To: Belmiro Moreira > Cc: discuss openstack > Subject: Re: [large-scale][oslo.messaging] RPC workers and db connection > > Hey Belimiro, > > Thanks for your answer. > > Having ~1000 connection for each service is a lot to me. > With the example below, I only talked about neutron RPC service on one > node. > On our biggest region, we are using multiple nodes (like 8 neutron > controllers), which are running both neutron API and neutron RPC. > > So, we can end-up with something like 16k connections only for neutron > :( > > Of courses, we limited that by lowering the default values, but we are > still struggling figuring out the correct values for this. > > Cheers, > > Arnaud. > > > On 22.10.22 - 18:37, Belmiro Moreira wrote: > > Hi, > > having the DB "max connections" ~ 1000 is not unreasonable and I have been > > doing it since long ago. > > This is also related to the number of nodes running the services. For > > example in Nova, related to the number of nodes running APIs, conductors, > > schedulers... > > > > cheers, > > Belmiro > > > > On Fri, Oct 21, 2022 at 5:07 PM Arnaud Morin wrote: > > > > > Hey all, > > > > > > TLDR: How can I fine tune the number of DB connection on OpenStack > > > services? > > > > > > > > > Long story, with some inline questions: > > > > > > I am trying to figure out the maximum number of db connection we should > > > allow on our db cluster. > > > > > > For this short speech, I will use neutron RPC as example service, but I > > > think nova is acting similar. > > > > > > So, to do so, I identified few parameters that I can tweak: > > > rpc_workers [1] > > > max_pool_size [2] > > > max_overflow [3] > > > executor_thread_pool_size [4] > > > > > > > > > rpc_worker default is half CPU threads available (result of nproc) > > > max_pool_size default is 5 > > > max_overflow default is 50 > > > executor_thread_pool_size is 64 > > > > > > Now imagine I have a server with 40 cores, > > > So rpc_worker will be 20. > > > Each worker will have a DB pool with 5+50 connections available. > > > Each worker will use up to 64 "green" thread. > > > > > > The theorical max connection that I should set on my database is then: > > > rpc_workers*(max_pool_size+max_overflow) = 20*(5+50) = 1100 > > > > > > Q1: am I right here? > > > I have the feeling that this is huge. > > > > > > Now, let's assume each thread is consuming 1 connection from the DB pool. > > > Under heavy load, I am affraid that the 64 threads could exceed the > > > number of max_pool_size+max_overflow. > > > > > > Also, I noticed that some green threads were consuming more than 1 > > > connection from the pool, so I can reach the max even sooner! > > > > > > Another thing, I notice that I have 21 RPC workers, not 20. Is it > > > normal? 
> > > > > > > > > [1] > > > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.openstack.org%2Fneutron%2Flatest%2Fconfiguration%2 > Fneutron.html%23DEFAULT.rpc_workers&data=05%7C01%7C%7C2fcb72675ac148f6c8ee08dab59efc49%7Cd04f47175a6e4b98b3f > 96918e0385f4c%7C0%7C0%7C638021991684328050%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI > 6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=8ZbaSFRcEUc0wZnaikxR4sUZjbQiZe%2Fj0Q1kUsvrTGk%3D&rese > rved=0 > > > [2] > > > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.openstack.org%2Fneutron%2Flatest%2Fconfiguration%2 > Fneutron.html%23database.max_pool_size&data=05%7C01%7C%7C2fcb72675ac148f6c8ee08dab59efc49%7Cd04f47175a6e4b98b > 3f96918e0385f4c%7C0%7C0%7C638021991684328050%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJB > TiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=IYhAbvxlASqJ34SMjPQ581JmoICHP7UPPGK%2BVPYRfeY%3D& > reserved=0 > > > [3] > > > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.openstack.org%2Fneutron%2Flatest%2Fconfiguration%2 > Fneutron.html%23database.max_overflow&data=05%7C01%7C%7C2fcb72675ac148f6c8ee08dab59efc49%7Cd04f47175a6e4b98b > 3f96918e0385f4c%7C0%7C0%7C638021991684484231%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJB > TiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=On94ArtrzPcn9S3R4QAWDvnJiOj%2FeBQ8AP4%2FsVDx9V8%3D&a > mp;reserved=0 > > > [4] > > > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.openstack.org%2Fneutron%2Flatest%2Fconfiguration%2 > Fneutron.html%23DEFAULT.executor_thread_pool_size&data=05%7C01%7C%7C2fcb72675ac148f6c8ee08dab59efc49%7Cd04f47 > 175a6e4b98b3f96918e0385f4c%7C0%7C0%7C638021991684484231%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoi > V2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=YP%2FqDxDi%2F2SksnsHKw2N9bQTdk0OU3n2m5sy3 > NhTOgQ%3D&reserved=0 > > > > > > Cheers, > > > > > > Arnaud. > > > > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. From arnaud.morin at gmail.com Mon Oct 24 09:30:49 2022 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Mon, 24 Oct 2022 09:30:49 +0000 Subject: [large-scale][oslo.messaging] RPC workers and db connection In-Reply-To: References: Message-ID: Yup, I was exactly thinking about something like that, I did check the source code of oslo_metrics again, but this is only related to rabbit messaging metrics so far. That's a good point to include something in order to catch the db metrics as well. On our side, we use the "percona monitoring and management" [1] as sidecar, which help us a lot identifying which part of OpenStack is consuming db resources. [1] https://www.percona.com/software/pmm/quickstart Cheers, Arnaud. On 24.10.22 - 09:25, Felix H?ttner wrote: > Hi everyone, > > we have also struggled to find reasonable values for our settings. These are currently based on some experience but not on actual data. > Does anyone by chance know of a metric that can show the usage of the database connection pools? > > Otherwise that might be something to add to oslo.metrics maybe? 
> > -- > Felix Huettner > > > -----Original Message----- > > From: Arnaud Morin > > Sent: Monday, October 24, 2022 10:58 AM > > To: Belmiro Moreira > > Cc: discuss openstack > > Subject: Re: [large-scale][oslo.messaging] RPC workers and db connection > > > > Hey Belimiro, > > > > Thanks for your answer. > > > > Having ~1000 connection for each service is a lot to me. > > With the example below, I only talked about neutron RPC service on one > > node. > > On our biggest region, we are using multiple nodes (like 8 neutron > > controllers), which are running both neutron API and neutron RPC. > > > > So, we can end-up with something like 16k connections only for neutron > > :( > > > > Of courses, we limited that by lowering the default values, but we are > > still struggling figuring out the correct values for this. > > > > Cheers, > > > > Arnaud. > > > > > > On 22.10.22 - 18:37, Belmiro Moreira wrote: > > > Hi, > > > having the DB "max connections" ~ 1000 is not unreasonable and I have been > > > doing it since long ago. > > > This is also related to the number of nodes running the services. For > > > example in Nova, related to the number of nodes running APIs, conductors, > > > schedulers... > > > > > > cheers, > > > Belmiro > > > > > > On Fri, Oct 21, 2022 at 5:07 PM Arnaud Morin wrote: > > > > > > > Hey all, > > > > > > > > TLDR: How can I fine tune the number of DB connection on OpenStack > > > > services? > > > > > > > > > > > > Long story, with some inline questions: > > > > > > > > I am trying to figure out the maximum number of db connection we should > > > > allow on our db cluster. > > > > > > > > For this short speech, I will use neutron RPC as example service, but I > > > > think nova is acting similar. > > > > > > > > So, to do so, I identified few parameters that I can tweak: > > > > rpc_workers [1] > > > > max_pool_size [2] > > > > max_overflow [3] > > > > executor_thread_pool_size [4] > > > > > > > > > > > > rpc_worker default is half CPU threads available (result of nproc) > > > > max_pool_size default is 5 > > > > max_overflow default is 50 > > > > executor_thread_pool_size is 64 > > > > > > > > Now imagine I have a server with 40 cores, > > > > So rpc_worker will be 20. > > > > Each worker will have a DB pool with 5+50 connections available. > > > > Each worker will use up to 64 "green" thread. > > > > > > > > The theorical max connection that I should set on my database is then: > > > > rpc_workers*(max_pool_size+max_overflow) = 20*(5+50) = 1100 > > > > > > > > Q1: am I right here? > > > > I have the feeling that this is huge. > > > > > > > > Now, let's assume each thread is consuming 1 connection from the DB pool. > > > > Under heavy load, I am affraid that the 64 threads could exceed the > > > > number of max_pool_size+max_overflow. > > > > > > > > Also, I noticed that some green threads were consuming more than 1 > > > > connection from the pool, so I can reach the max even sooner! > > > > > > > > Another thing, I notice that I have 21 RPC workers, not 20. Is it > > > > normal? 
> > > > > > > > > > > > [1] > > > > > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.openstack.org%2Fneutron%2Flatest%2Fconfiguration%2 > > Fneutron.html%23DEFAULT.rpc_workers&data=05%7C01%7C%7C2fcb72675ac148f6c8ee08dab59efc49%7Cd04f47175a6e4b98b3f > > 96918e0385f4c%7C0%7C0%7C638021991684328050%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI > > 6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=8ZbaSFRcEUc0wZnaikxR4sUZjbQiZe%2Fj0Q1kUsvrTGk%3D&rese > > rved=0 > > > > [2] > > > > > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.openstack.org%2Fneutron%2Flatest%2Fconfiguration%2 > > Fneutron.html%23database.max_pool_size&data=05%7C01%7C%7C2fcb72675ac148f6c8ee08dab59efc49%7Cd04f47175a6e4b98b > > 3f96918e0385f4c%7C0%7C0%7C638021991684328050%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJB > > TiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=IYhAbvxlASqJ34SMjPQ581JmoICHP7UPPGK%2BVPYRfeY%3D& > > reserved=0 > > > > [3] > > > > > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.openstack.org%2Fneutron%2Flatest%2Fconfiguration%2 > > Fneutron.html%23database.max_overflow&data=05%7C01%7C%7C2fcb72675ac148f6c8ee08dab59efc49%7Cd04f47175a6e4b98b > > 3f96918e0385f4c%7C0%7C0%7C638021991684484231%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJB > > TiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=On94ArtrzPcn9S3R4QAWDvnJiOj%2FeBQ8AP4%2FsVDx9V8%3D&a > > mp;reserved=0 > > > > [4] > > > > > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.openstack.org%2Fneutron%2Flatest%2Fconfiguration%2 > > Fneutron.html%23DEFAULT.executor_thread_pool_size&data=05%7C01%7C%7C2fcb72675ac148f6c8ee08dab59efc49%7Cd04f47 > > 175a6e4b98b3f96918e0385f4c%7C0%7C0%7C638021991684484231%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoi > > V2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=YP%2FqDxDi%2F2SksnsHKw2N9bQTdk0OU3n2m5sy3 > > NhTOgQ%3D&reserved=0 > > > > > > > > Cheers, > > > > > > > > Arnaud. > > > > > > > > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. From nguyenhuukhoinw at gmail.com Mon Oct 24 09:49:56 2022 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 24 Oct 2022 16:49:56 +0700 Subject: Openstack Cinder and Nova Services cannot work when rabbitmq cluter node down Message-ID: Title: Openstack cluster cannot create when 1 of 3 rabbitmq cluster node down Bug description: Description =========== I set up 3 controllers and 3 compute nodes. My system cannot work when 1 rabbit node in cluster rabbitmq is down, cannot create volume or launch instance. It stucked at creating and scheduling respectively. Steps to reproduce =========== Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// * Reboot 1 of 3 rabbitmq node. * Create volume or launch instance then it stucked at creating and scheduling respectively. Workaround =========== I need reboot cinder and nova services to create volume and launch instance . More Info: I see in cinder_scheduler, it looks like cinder cannot change to another rabbitmq node. I hope we have ideas for that.. 
2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task Traceback (most recent call last): 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 441, in get 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return self._queues[msg_id].get(block=True, timeout=timeout) 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line 322, in get 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return waiter.wait() 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line 141, in wait 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return get_hub().switch() 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return self.greenlet.switch() 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task _queue.Empty 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task During handling of the above exception, another exception occurred: 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task Traceback (most recent call last): 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_service/periodic_task.py", line 216, in run_periodic_tasks 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task task(self, context) 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/compute/manager.py", line 9716, in _sync_power_states 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task db_instances = objects.InstanceList.get_by_host(context, self.host, 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_versionedobjects/base.py", line 175, in wrapper 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = cls.indirection_api.object_class_action_versions( 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/conductor/rpcapi.py", line 240, in object_class_action_versions 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return cctxt.call(context, 'object_class_action_versions', 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/rpc/client.py", line 189, in call 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = self.transport._send( 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/transport.py", line 123, in _send 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return self._driver.send(target, ctxt, message, 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 689, in send 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return self._send(target, ctxt, message, wait_for_reply, timeout, 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File 
"/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 678, in _send 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = self._waiter.wait(msg_id, timeout, 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 567, in wait 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task message = self.waiters.get(msg_id, timeout=timeout) 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 443, in get 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task raise oslo_messaging.MessagingTimeout( 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID c8a676a9709242908dcff97046d7976d Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Mon Oct 24 10:35:57 2022 From: eblock at nde.ag (Eugen Block) Date: Mon, 24 Oct 2022 10:35:57 +0000 Subject: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down In-Reply-To: References: Message-ID: <20221024103557.Horde.2TO2Li_lBDYO88_RsdHZz4I@webmail.nde.ag> You don't need to create a new thread with the same issue. Do the rabbitmq logs reveal anything? We create a cluster within rabbitmq and the output looks like this: ---snip--- control01:~ # rabbitmqctl cluster_status Cluster status of node rabbit at control01 ... Basics Cluster name: rabbit at rabbitmq-cluster Disk Nodes rabbit at control01 rabbit at control02 rabbit at control03 Running Nodes rabbit at control01 rabbit at control02 rabbit at control03 Versions rabbit at control01: RabbitMQ 3.8.3 on Erlang 22.2.7 rabbit at control02: RabbitMQ 3.8.3 on Erlang 22.2.7 rabbit at control03: RabbitMQ 3.8.3 on Erlang 22.2.7 ---snip--- During failover it's not unexpected that a message gets lost, but it should be resent, I believe. How is your openstack deployed? Zitat von Nguy?n H?u Kh?i : > Hello. 
> 2 remain nodes still running, here is my output: > Basics > > Cluster name: rabbit at controller01 > > Disk Nodes > > rabbit at controller01 > rabbit at controller02 > rabbit at controller03 > > Running Nodes > > rabbit at controller01 > rabbit at controller03 > > Versions > > rabbit at controller01: RabbitMQ 3.8.35 on Erlang 23.3.4.18 > rabbit at controller03: RabbitMQ 3.8.35 on Erlang 23.3.4.18 > > Maintenance status > > Node: rabbit at controller01, status: not under maintenance > Node: rabbit at controller03, status: not under maintenance > > Alarms > > (none) > > Network Partitions > > (none) > > Listeners > > Node: rabbit at controller01, interface: [::], port: 15672, protocol: http, > purpose: HTTP API > Node: rabbit at controller01, interface: 183.81.13.227, port: 25672, protocol: > clustering, purpose: inter-node and CLI tool communication > Node: rabbit at controller01, interface: 183.81.13.227, port: 5672, protocol: > amqp, purpose: AMQP 0-9-1 and AMQP 1.0 > Node: rabbit at controller03, interface: [::], port: 15672, protocol: http, > purpose: HTTP API > Node: rabbit at controller03, interface: 183.81.13.229, port: 25672, protocol: > clustering, purpose: inter-node and CLI tool communication > Node: rabbit at controller03, interface: 183.81.13.229, port: 5672, protocol: > amqp, purpose: AMQP 0-9-1 and AMQP 1.0 > > Feature flags > > Flag: drop_unroutable_metric, state: enabled > Flag: empty_basic_get_metric, state: enabled > Flag: implicit_default_bindings, state: enabled > Flag: maintenance_mode_status, state: enabled > Flag: quorum_queue, state: enabled > Flag: user_limits, state: enabled > Flag: virtual_host_metadata, state: enabled > > I used ha_queues mode all > But it is not better. > Nguyen Huu Khoi > > > On Tue, Oct 18, 2022 at 7:19 AM Nguy?n H?u Kh?i > wrote: > >> Description >> =========== >> I set up 3 controllers and 3 compute nodes. My system cannot work well >> when 1 rabbit node in cluster rabbitmq is down, cannot launch instances. It >> stucked at scheduling. >> >> Steps to reproduce >> =========== >> Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// >> * Reboot 1 of 3 rabbitmq node. >> * Create instances then it stucked at scheduling. >> >> Workaround >> =========== >> Point to rabbitmq VIP address. But We cannot share the load with this >> solution. Please give me some suggestions. Thank you very much. >> I did google and enabled system log's debug but I still cannot understand >> why. >> >> Nguyen Huu Khoi >> From nguyenhuukhoinw at gmail.com Mon Oct 24 11:23:35 2022 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 24 Oct 2022 18:23:35 +0700 Subject: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down In-Reply-To: <20221024103557.Horde.2TO2Li_lBDYO88_RsdHZz4I@webmail.nde.ag> References: <20221024103557.Horde.2TO2Li_lBDYO88_RsdHZz4I@webmail.nde.ag> Message-ID: Hello. Sorry for that. I just want to notice that both nova and cinder have this problem, When diving to logs on both service I see: ERROR oslo.messaging._drivers.impl_rabbit [-] [8634b511-7eee-4e50-8efd-b96d420e9914] AMQP server on [node was down]:5672 is unreachable: . 
Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: and 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task Traceback (most recent call last): 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 441, in get 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return self._queues[msg_id].get(block=True, timeout=timeout) 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line 322, in get 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return waiter.wait() 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line 141, in wait 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return get_hub().switch() 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return self.greenlet.switch() 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task _queue.Empty 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task During handling of the above exception, another exception occurred: 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task Traceback (most recent call last): 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_service/periodic_task.py", line 216, in run_periodic_tasks 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task task(self, context) 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/compute/manager.py", line 9716, in _sync_power_states 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task db_instances = objects.InstanceList.get_by_host(context, self.host, 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_versionedobjects/base.py", line 175, in wrapper 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = cls.indirection_api.object_class_action_versions( 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/conductor/rpcapi.py", line 240, in object_class_action_versions 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return cctxt.call(context, 'object_class_action_versions', 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/rpc/client.py", line 189, in call 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = self.transport._send( 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/transport.py", line 123, in _send 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return self._driver.send(target, ctxt, message, 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 689, in send 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return self._send(target, ctxt, message, wait_for_reply, timeout, 2022-10-24 
14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 678, in _send 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = self._waiter.wait(msg_id, timeout, 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 567, in wait 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task message = self.waiters.get(msg_id, timeout=timeout) 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 443, in get 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task raise oslo_messaging.MessagingTimeout( 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID c8a676a9709242908dcff97046d7976d *** I use cluster rabbitmq with ha-policy for exchange and queue. These logs are gone when I restart cinder and nova services. Nguyen Huu Khoi On Mon, Oct 24, 2022 at 5:42 PM Eugen Block wrote: > You don't need to create a new thread with the same issue. > Do the rabbitmq logs reveal anything? We create a cluster within > rabbitmq and the output looks like this: > > ---snip--- > control01:~ # rabbitmqctl cluster_status > Cluster status of node rabbit at control01 ... > Basics > > Cluster name: rabbit at rabbitmq-cluster > > Disk Nodes > > rabbit at control01 > rabbit at control02 > rabbit at control03 > > Running Nodes > > rabbit at control01 > rabbit at control02 > rabbit at control03 > > Versions > > rabbit at control01: RabbitMQ 3.8.3 on Erlang 22.2.7 > rabbit at control02: RabbitMQ 3.8.3 on Erlang 22.2.7 > rabbit at control03: RabbitMQ 3.8.3 on Erlang 22.2.7 > ---snip--- > > During failover it's not unexpected that a message gets lost, but it > should be resent, I believe. How is your openstack deployed? > > > Zitat von Nguy?n H?u Kh?i : > > > Hello. 
> > 2 remain nodes still running, here is my output: > > Basics > > > > Cluster name: rabbit at controller01 > > > > Disk Nodes > > > > rabbit at controller01 > > rabbit at controller02 > > rabbit at controller03 > > > > Running Nodes > > > > rabbit at controller01 > > rabbit at controller03 > > > > Versions > > > > rabbit at controller01: RabbitMQ 3.8.35 on Erlang 23.3.4.18 > > rabbit at controller03: RabbitMQ 3.8.35 on Erlang 23.3.4.18 > > > > Maintenance status > > > > Node: rabbit at controller01, status: not under maintenance > > Node: rabbit at controller03, status: not under maintenance > > > > Alarms > > > > (none) > > > > Network Partitions > > > > (none) > > > > Listeners > > > > Node: rabbit at controller01, interface: [::], port: 15672, protocol: http, > > purpose: HTTP API > > Node: rabbit at controller01, interface: 183.81.13.227, port: 25672, > protocol: > > clustering, purpose: inter-node and CLI tool communication > > Node: rabbit at controller01, interface: 183.81.13.227, port: 5672, > protocol: > > amqp, purpose: AMQP 0-9-1 and AMQP 1.0 > > Node: rabbit at controller03, interface: [::], port: 15672, protocol: http, > > purpose: HTTP API > > Node: rabbit at controller03, interface: 183.81.13.229, port: 25672, > protocol: > > clustering, purpose: inter-node and CLI tool communication > > Node: rabbit at controller03, interface: 183.81.13.229, port: 5672, > protocol: > > amqp, purpose: AMQP 0-9-1 and AMQP 1.0 > > > > Feature flags > > > > Flag: drop_unroutable_metric, state: enabled > > Flag: empty_basic_get_metric, state: enabled > > Flag: implicit_default_bindings, state: enabled > > Flag: maintenance_mode_status, state: enabled > > Flag: quorum_queue, state: enabled > > Flag: user_limits, state: enabled > > Flag: virtual_host_metadata, state: enabled > > > > I used ha_queues mode all > > But it is not better. > > Nguyen Huu Khoi > > > > > > On Tue, Oct 18, 2022 at 7:19 AM Nguy?n H?u Kh?i < > nguyenhuukhoinw at gmail.com> > > wrote: > > > >> Description > >> =========== > >> I set up 3 controllers and 3 compute nodes. My system cannot work well > >> when 1 rabbit node in cluster rabbitmq is down, cannot launch > instances. It > >> stucked at scheduling. > >> > >> Steps to reproduce > >> =========== > >> Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// > >> * Reboot 1 of 3 rabbitmq node. > >> * Create instances then it stucked at scheduling. > >> > >> Workaround > >> =========== > >> Point to rabbitmq VIP address. But We cannot share the load with this > >> solution. Please give me some suggestions. Thank you very much. > >> I did google and enabled system log's debug but I still cannot > understand > >> why. > >> > >> Nguyen Huu Khoi > >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From roberto.acosta at luizalabs.com Mon Oct 24 11:52:38 2022 From: roberto.acosta at luizalabs.com (ROBERTO BARTZEN ACOSTA) Date: Mon, 24 Oct 2022 08:52:38 -0300 Subject: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down In-Reply-To: References: <20221024103557.Horde.2TO2Li_lBDYO88_RsdHZz4I@webmail.nde.ag> Message-ID: Hey folks, I believe this problem is related to the maximum timeout in the pool loop, and was introduced in this thread [1] with this specific commit [2]. 
[1] https://bugs.launchpad.net/oslo.messaging/+bug/1935864 [2] https://opendev.org/openstack/oslo.messaging/commit/bdcf915e788bb368774e5462ccc15e6f5b7223d7 Corey Bryant proposed a workaround removing this commit [2] and building an alternate ubuntu pkg in this thread [3], but the root cause needs to be investigated because it was originally modified to solve the issue [1]. [3] https://bugs.launchpad.net/ubuntu/jammy/+source/python-oslo.messaging/+bug/1993149 Regards, Roberto Em seg., 24 de out. de 2022 ?s 08:30, Nguy?n H?u Kh?i < nguyenhuukhoinw at gmail.com> escreveu: > Hello. Sorry for that. > I just want to notice that both nova and cinder have this problem, > When diving to logs on both service I see: > ERROR oslo.messaging._drivers.impl_rabbit [-] > [8634b511-7eee-4e50-8efd-b96d420e9914] AMQP server on [node was down]:5672 > is unreachable: . Trying again > in 1 seconds.: amqp.exceptions.RecoverableConnectionError: > > > and > > > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task Traceback (most > recent call last): > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", > line 441, in get > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return > self._queues[msg_id].get(block=True, timeout=timeout) > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line > 322, in get > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return > waiter.wait() > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line > 141, in wait > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return > get_hub().switch() > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/hubs/hub.py", > line 313, in switch2022-10-24 14:23:01.945 7 ERROR > oslo_service.periodic_task return self.greenlet.switch() > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task _queue.Empty > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task During handling > of the above exception, another exception occurred: > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task Traceback (most > recent call last): > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_service/periodic_task.py", > line 216, in run_periodic_tasks > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task task(self, > context) > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/compute/manager.py", > line 9716, in _sync_power_states > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task > db_instances = objects.InstanceList.get_by_host(context, self.host, > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_versionedobjects/base.py", > line 175, in wrapper > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = > cls.indirection_api.object_class_action_versions( > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/conductor/rpcapi.py", > line 240, in 
object_class_action_versions > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return > cctxt.call(context, 'object_class_action_versions', > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/rpc/client.py", > line 189, in call > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = > self.transport._send( > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/transport.py", > line 123, in _send > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return > self._driver.send(target, ctxt, message, > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", > line 689, in send > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return > self._send(target, ctxt, message, wait_for_reply, timeout, > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", > line 678, in _send > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = > self._waiter.wait(msg_id, timeout, > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", > line 567, in wait > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task message = > self.waiters.get(msg_id, timeout=timeout) > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File > "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", > line 443, in get > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task raise > oslo_messaging.MessagingTimeout( > 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task > oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply > to message ID c8a676a9709242908dcff97046d7976d > > *** I use cluster rabbitmq with ha-policy for exchange and queue. These > logs are gone when I restart cinder and nova services. > > > > Nguyen Huu Khoi > > > On Mon, Oct 24, 2022 at 5:42 PM Eugen Block wrote: > >> You don't need to create a new thread with the same issue. >> Do the rabbitmq logs reveal anything? We create a cluster within >> rabbitmq and the output looks like this: >> >> ---snip--- >> control01:~ # rabbitmqctl cluster_status >> Cluster status of node rabbit at control01 ... >> Basics >> >> Cluster name: rabbit at rabbitmq-cluster >> >> Disk Nodes >> >> rabbit at control01 >> rabbit at control02 >> rabbit at control03 >> >> Running Nodes >> >> rabbit at control01 >> rabbit at control02 >> rabbit at control03 >> >> Versions >> >> rabbit at control01: RabbitMQ 3.8.3 on Erlang 22.2.7 >> rabbit at control02: RabbitMQ 3.8.3 on Erlang 22.2.7 >> rabbit at control03: RabbitMQ 3.8.3 on Erlang 22.2.7 >> ---snip--- >> >> During failover it's not unexpected that a message gets lost, but it >> should be resent, I believe. How is your openstack deployed? >> >> >> Zitat von Nguy?n H?u Kh?i : >> >> > Hello. 
>> > 2 remain nodes still running, here is my output: >> > Basics >> > >> > Cluster name: rabbit at controller01 >> > >> > Disk Nodes >> > >> > rabbit at controller01 >> > rabbit at controller02 >> > rabbit at controller03 >> > >> > Running Nodes >> > >> > rabbit at controller01 >> > rabbit at controller03 >> > >> > Versions >> > >> > rabbit at controller01: RabbitMQ 3.8.35 on Erlang 23.3.4.18 >> > rabbit at controller03: RabbitMQ 3.8.35 on Erlang 23.3.4.18 >> > >> > Maintenance status >> > >> > Node: rabbit at controller01, status: not under maintenance >> > Node: rabbit at controller03, status: not under maintenance >> > >> > Alarms >> > >> > (none) >> > >> > Network Partitions >> > >> > (none) >> > >> > Listeners >> > >> > Node: rabbit at controller01, interface: [::], port: 15672, protocol: >> http, >> > purpose: HTTP API >> > Node: rabbit at controller01, interface: 183.81.13.227, port: 25672, >> protocol: >> > clustering, purpose: inter-node and CLI tool communication >> > Node: rabbit at controller01, interface: 183.81.13.227, port: 5672, >> protocol: >> > amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> > Node: rabbit at controller03, interface: [::], port: 15672, protocol: >> http, >> > purpose: HTTP API >> > Node: rabbit at controller03, interface: 183.81.13.229, port: 25672, >> protocol: >> > clustering, purpose: inter-node and CLI tool communication >> > Node: rabbit at controller03, interface: 183.81.13.229, port: 5672, >> protocol: >> > amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> > >> > Feature flags >> > >> > Flag: drop_unroutable_metric, state: enabled >> > Flag: empty_basic_get_metric, state: enabled >> > Flag: implicit_default_bindings, state: enabled >> > Flag: maintenance_mode_status, state: enabled >> > Flag: quorum_queue, state: enabled >> > Flag: user_limits, state: enabled >> > Flag: virtual_host_metadata, state: enabled >> > >> > I used ha_queues mode all >> > But it is not better. >> > Nguyen Huu Khoi >> > >> > >> > On Tue, Oct 18, 2022 at 7:19 AM Nguy?n H?u Kh?i < >> nguyenhuukhoinw at gmail.com> >> > wrote: >> > >> >> Description >> >> =========== >> >> I set up 3 controllers and 3 compute nodes. My system cannot work well >> >> when 1 rabbit node in cluster rabbitmq is down, cannot launch >> instances. It >> >> stucked at scheduling. >> >> >> >> Steps to reproduce >> >> =========== >> >> Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// >> >> * Reboot 1 of 3 rabbitmq node. >> >> * Create instances then it stucked at scheduling. >> >> >> >> Workaround >> >> =========== >> >> Point to rabbitmq VIP address. But We cannot share the load with this >> >> solution. Please give me some suggestions. Thank you very much. >> >> I did google and enabled system log's debug but I still cannot >> understand >> >> why. >> >> >> >> Nguyen Huu Khoi >> >> >> >> >> >> >> -- _?Esta mensagem ? direcionada apenas para os endere?os constantes no cabe?alho inicial. Se voc? n?o est? listado nos endere?os constantes no cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o imediatamente anuladas e proibidas?._ *?**?Apesar do Magazine Luiza tomar todas as precau??es razo?veis para assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o poder? aceitar a responsabilidade por quaisquer perdas ou danos causados por esse e-mail ou por seus anexos?.* -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nguyenhuukhoinw at gmail.com Mon Oct 24 12:07:17 2022 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 24 Oct 2022 19:07:17 +0700 Subject: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down In-Reply-To: References: <20221024103557.Horde.2TO2Li_lBDYO88_RsdHZz4I@webmail.nde.ag> Message-ID: Thank you for your response. This is exactly what I am facing. But I don't know how I can workaround it. Because I deploy with Kolla-Ansible Xena.. My current workaround is point oslo.messaging to VIP. BTW, I am very glad when We know why it happened. Nguyen Huu Khoi On Mon, Oct 24, 2022 at 6:52 PM ROBERTO BARTZEN ACOSTA < roberto.acosta at luizalabs.com> wrote: > Hey folks, > > I believe this problem is related to the maximum timeout in the pool loop, > and was introduced in this thread [1] with this specific commit [2]. > > [1] https://bugs.launchpad.net/oslo.messaging/+bug/1935864 > [2] > https://opendev.org/openstack/oslo.messaging/commit/bdcf915e788bb368774e5462ccc15e6f5b7223d7 > > Corey Bryant proposed a workaround removing this commit [2] and building > an alternate ubuntu pkg in this thread [3], but the root cause needs to be > investigated because it was originally modified to solve the issue [1]. > > [3] > https://bugs.launchpad.net/ubuntu/jammy/+source/python-oslo.messaging/+bug/1993149 > > Regards, > Roberto > > > > Em seg., 24 de out. de 2022 ?s 08:30, Nguy?n H?u Kh?i < > nguyenhuukhoinw at gmail.com> escreveu: > >> Hello. Sorry for that. >> I just want to notice that both nova and cinder have this problem, >> When diving to logs on both service I see: >> ERROR oslo.messaging._drivers.impl_rabbit [-] >> [8634b511-7eee-4e50-8efd-b96d420e9914] AMQP server on [node was down]:5672 >> is unreachable: . 
Trying again >> in 1 seconds.: amqp.exceptions.RecoverableConnectionError: >> >> >> and >> >> >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task Traceback >> (most recent call last): >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", >> line 441, in get >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >> self._queues[msg_id].get(block=True, timeout=timeout) >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line >> 322, in get >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >> waiter.wait() >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line >> 141, in wait >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >> get_hub().switch() >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/hubs/hub.py", >> line 313, in switch2022-10-24 14:23:01.945 7 ERROR >> oslo_service.periodic_task return self.greenlet.switch() >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task _queue.Empty >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task During >> handling of the above exception, another exception occurred: >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task Traceback >> (most recent call last): >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_service/periodic_task.py", >> line 216, in run_periodic_tasks >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task task(self, >> context) >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/compute/manager.py", >> line 9716, in _sync_power_states >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task >> db_instances = objects.InstanceList.get_by_host(context, self.host, >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_versionedobjects/base.py", >> line 175, in wrapper >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = >> cls.indirection_api.object_class_action_versions( >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/conductor/rpcapi.py", >> line 240, in object_class_action_versions >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >> cctxt.call(context, 'object_class_action_versions', >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/rpc/client.py", >> line 189, in call >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = >> self.transport._send( >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/transport.py", >> line 123, in _send >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >> self._driver.send(target, ctxt, message, >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> 
"/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", >> line 689, in send >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >> self._send(target, ctxt, message, wait_for_reply, timeout, >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", >> line 678, in _send >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = >> self._waiter.wait(msg_id, timeout, >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", >> line 567, in wait >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task message = >> self.waiters.get(msg_id, timeout=timeout) >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", >> line 443, in get >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task raise >> oslo_messaging.MessagingTimeout( >> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task >> oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply >> to message ID c8a676a9709242908dcff97046d7976d >> >> *** I use cluster rabbitmq with ha-policy for exchange and queue. These >> logs are gone when I restart cinder and nova services. >> >> >> >> Nguyen Huu Khoi >> >> >> On Mon, Oct 24, 2022 at 5:42 PM Eugen Block wrote: >> >>> You don't need to create a new thread with the same issue. >>> Do the rabbitmq logs reveal anything? We create a cluster within >>> rabbitmq and the output looks like this: >>> >>> ---snip--- >>> control01:~ # rabbitmqctl cluster_status >>> Cluster status of node rabbit at control01 ... >>> Basics >>> >>> Cluster name: rabbit at rabbitmq-cluster >>> >>> Disk Nodes >>> >>> rabbit at control01 >>> rabbit at control02 >>> rabbit at control03 >>> >>> Running Nodes >>> >>> rabbit at control01 >>> rabbit at control02 >>> rabbit at control03 >>> >>> Versions >>> >>> rabbit at control01: RabbitMQ 3.8.3 on Erlang 22.2.7 >>> rabbit at control02: RabbitMQ 3.8.3 on Erlang 22.2.7 >>> rabbit at control03: RabbitMQ 3.8.3 on Erlang 22.2.7 >>> ---snip--- >>> >>> During failover it's not unexpected that a message gets lost, but it >>> should be resent, I believe. How is your openstack deployed? >>> >>> >>> Zitat von Nguy?n H?u Kh?i : >>> >>> > Hello. 
>>> > 2 remain nodes still running, here is my output: >>> > Basics >>> > >>> > Cluster name: rabbit at controller01 >>> > >>> > Disk Nodes >>> > >>> > rabbit at controller01 >>> > rabbit at controller02 >>> > rabbit at controller03 >>> > >>> > Running Nodes >>> > >>> > rabbit at controller01 >>> > rabbit at controller03 >>> > >>> > Versions >>> > >>> > rabbit at controller01: RabbitMQ 3.8.35 on Erlang 23.3.4.18 >>> > rabbit at controller03: RabbitMQ 3.8.35 on Erlang 23.3.4.18 >>> > >>> > Maintenance status >>> > >>> > Node: rabbit at controller01, status: not under maintenance >>> > Node: rabbit at controller03, status: not under maintenance >>> > >>> > Alarms >>> > >>> > (none) >>> > >>> > Network Partitions >>> > >>> > (none) >>> > >>> > Listeners >>> > >>> > Node: rabbit at controller01, interface: [::], port: 15672, protocol: >>> http, >>> > purpose: HTTP API >>> > Node: rabbit at controller01, interface: 183.81.13.227, port: 25672, >>> protocol: >>> > clustering, purpose: inter-node and CLI tool communication >>> > Node: rabbit at controller01, interface: 183.81.13.227, port: 5672, >>> protocol: >>> > amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >>> > Node: rabbit at controller03, interface: [::], port: 15672, protocol: >>> http, >>> > purpose: HTTP API >>> > Node: rabbit at controller03, interface: 183.81.13.229, port: 25672, >>> protocol: >>> > clustering, purpose: inter-node and CLI tool communication >>> > Node: rabbit at controller03, interface: 183.81.13.229, port: 5672, >>> protocol: >>> > amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >>> > >>> > Feature flags >>> > >>> > Flag: drop_unroutable_metric, state: enabled >>> > Flag: empty_basic_get_metric, state: enabled >>> > Flag: implicit_default_bindings, state: enabled >>> > Flag: maintenance_mode_status, state: enabled >>> > Flag: quorum_queue, state: enabled >>> > Flag: user_limits, state: enabled >>> > Flag: virtual_host_metadata, state: enabled >>> > >>> > I used ha_queues mode all >>> > But it is not better. >>> > Nguyen Huu Khoi >>> > >>> > >>> > On Tue, Oct 18, 2022 at 7:19 AM Nguy?n H?u Kh?i < >>> nguyenhuukhoinw at gmail.com> >>> > wrote: >>> > >>> >> Description >>> >> =========== >>> >> I set up 3 controllers and 3 compute nodes. My system cannot work well >>> >> when 1 rabbit node in cluster rabbitmq is down, cannot launch >>> instances. It >>> >> stucked at scheduling. >>> >> >>> >> Steps to reproduce >>> >> =========== >>> >> Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// >>> >> * Reboot 1 of 3 rabbitmq node. >>> >> * Create instances then it stucked at scheduling. >>> >> >>> >> Workaround >>> >> =========== >>> >> Point to rabbitmq VIP address. But We cannot share the load with this >>> >> solution. Please give me some suggestions. Thank you very much. >>> >> I did google and enabled system log's debug but I still cannot >>> understand >>> >> why. >>> >> >>> >> Nguyen Huu Khoi >>> >> >>> >>> >>> >>> >>> > > *?Esta mensagem ? direcionada apenas para os endere?os constantes no > cabe?alho inicial. Se voc? n?o est? listado nos endere?os constantes no > cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa > mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o > imediatamente anuladas e proibidas?.* > > *?Apesar do Magazine Luiza tomar todas as precau??es razo?veis para > assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o > poder? 
aceitar a responsabilidade por quaisquer perdas ou danos causados > por esse e-mail ou por seus anexos?.* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Mon Oct 24 13:00:34 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Mon, 24 Oct 2022 14:00:34 +0100 Subject: [kolla-ansible][Yoga] Deployment stuck In-Reply-To: References: Message-ID: Anyone???? Le lun. 24 oct. 2022 ? 07:53, wodel youchi a ?crit : > Hi, > > My setup is simple, it's an hci deployment composed of 3 controllers nodes > and 6 compute and storage nodes. > I am using ceph-ansible for deploying the storage part and the deployment > goes well. > > My base OS is Rocky Linux 8 fully updated. > > My network is composed of a 1Gb management network for OS, application > deployment and server management. And a 40Gb with LACP (80Gb) data network. > I am using vlans to segregate openstack networks. > > I updated both Xena and Yoga kolla-ansible package I updated several times > the container images (I am using a local registry). > > No matter how many times I tried to deploy it's the same behavior. The > setup gets stuck somewhere. > > I tried to deploy the core modules without SSL, I tried to use an older > kernel, I tried to use the 40Gb network to deploy, nothing works. The > problem is the lack of error if there was one it would have been a starting > point but I have nothing. > > Regards. > > On Sun, Oct 23, 2022, 00:42 wodel youchi wrote: > >> Hi, >> >> Here you can find the kolla-ansible *deploy *log with ANSIBLE_DEBUG=1 >> >> Regards. >> >> Le sam. 22 oct. 2022 ? 23:55, wodel youchi a >> ?crit : >> >>> Hi, >>> >>> I am trying to deploy a new platform using kolla-ansible Yoga and I am >>> trying to upgrade another platform from Xena to yoga. >>> >>> On both platforms the prechecks went well, but when I start the process >>> of deployment for the first and upgrade for the second, the process gets >>> stuck. >>> >>> I tried to tail -f /var/log/kolla/*/*.log but I can't get hold of the >>> cause. >>> >>> In the first platform, some services get deployed, and at some point the >>> script gets stuck, several times in the modprobe phase. 
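(A rough, untested checklist for a hang like this, run on the target node while
the play appears stuck -- the module and container names below are assumptions,
not taken from this thread:

  ps -eo pid,stat,wchan:32,etime,cmd | grep -E '[m]odprobe|[a]nsible'  # is a modprobe actually blocked in the kernel?
  dmesg -T | tail -n 50                                                # kernel messages around the module load
  lsmod | grep -E 'openvswitch|ip_vs|dm_multipath'                     # modules kolla-ansible typically loads
  docker ps -a --format '{{.Names}}\t{{.Status}}'                      # containers stuck in Created/Restarting?

Setting ANSIBLE_LOG_PATH before re-running the deploy also keeps a persistent
record of the last task Ansible dispatched before the hang.)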
>>> >>> In the second platform, the upgrade gets stuck on : >>> >>> Escalation succeeded >>> [204/1859] >>> <20.3.0.28> (0, b'\n{"path": "/etc/kolla/cron", "changed": false, >>> "diff": {"before": {"path": "/etc/kolla/cro >>> n"}, "after": {"path": "/etc/kolla/cron"}}, "uid": 0, "gid": 0, "owner": >>> "root", "group": "root", "mode": "07 >>> 70", "state": "directory", "secontext": >>> "unconfined_u:object_r:etc_t:s0", "size": 70, "invocation": {"module_ >>> args": {"path": "/etc/kolla/cron", "owner": "root", "group": "root", >>> "mode": "0770", "recurse": false, "force >>> ": false, "follow": true, "modification_time_format": "%Y%m%d%H%M.%S", >>> "access_time_format": "%Y%m%d%H%M.%S", >>> "unsafe_writes": false, "state": "directory", "_original_basename": >>> null, "_diff_peek": null, "src": null, " >>> modification_time": null, "access_time": null, "seuser": null, "serole": >>> null, "selevel": null, "setype": nul >>> l, "attributes": null}}}\n', b'') >>> ok: [20.3.0.28] => (item={'key': 'cron', 'value': {'container_name': >>> 'cron', 'group': 'cron', 'enabled': True >>> , 'image': '20.3.0.34:4000/openstack.kolla/centos-source-cron:yoga', >>> 'environment': {'DUMMY_ENVIRONMENT': 'ko >>> lla_useless_env', 'KOLLA_LOGROTATE_SCHEDULE': 'daily'}, 'volumes': >>> ['/etc/kolla/cron/:/var/lib/kolla/config_f >>> iles/:ro', '/etc/localtime:/etc/localtime:ro', '', >>> 'kolla_logs:/var/log/kolla/'], 'dimensions': {}}}) => { >>> "ansible_loop_var": "item", >>> "changed": false, >>> "diff": { >>> "after": { >>> "path": "/etc/kolla/cron" >>> }, >>> "before": { >>> "path": "/etc/kolla/cron" >>> } >>> }, >>> "gid": 0, >>> "group": "root", >>> >>> How to start debugging the situation. >>> >>> Regards. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Mon Oct 24 13:19:01 2022 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 24 Oct 2022 20:19:01 +0700 Subject: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down In-Reply-To: References: <20221024103557.Horde.2TO2Li_lBDYO88_RsdHZz4I@webmail.nde.ag> Message-ID: Hello. I have checked that my Openstack is Xena and oslo.messaging is 12.9.4. The problem still happens. I confirm that. Nguyen Huu Khoi On Mon, Oct 24, 2022 at 7:07 PM Nguy?n H?u Kh?i wrote: > Thank you for your response. This is exactly what I am facing. But I don't > know how I can workaround it. Because I deploy with Kolla-Ansible Xena.. My > current workaround is point oslo.messaging to VIP. BTW, I am very glad > when We know why it happened. > > Nguyen Huu Khoi > > > On Mon, Oct 24, 2022 at 6:52 PM ROBERTO BARTZEN ACOSTA < > roberto.acosta at luizalabs.com> wrote: > >> Hey folks, >> >> I believe this problem is related to the maximum timeout in the pool >> loop, and was introduced in this thread [1] with this specific commit [2]. >> >> [1] https://bugs.launchpad.net/oslo.messaging/+bug/1935864 >> [2] >> https://opendev.org/openstack/oslo.messaging/commit/bdcf915e788bb368774e5462ccc15e6f5b7223d7 >> >> Corey Bryant proposed a workaround removing this commit [2] and building >> an alternate ubuntu pkg in this thread [3], but the root cause needs to be >> investigated because it was originally modified to solve the issue [1]. >> >> [3] >> https://bugs.launchpad.net/ubuntu/jammy/+source/python-oslo.messaging/+bug/1993149 >> >> Regards, >> Roberto >> >> >> >> Em seg., 24 de out. 
de 2022 ?s 08:30, Nguy?n H?u Kh?i < >> nguyenhuukhoinw at gmail.com> escreveu: >> >>> Hello. Sorry for that. >>> I just want to notice that both nova and cinder have this problem, >>> When diving to logs on both service I see: >>> ERROR oslo.messaging._drivers.impl_rabbit [-] >>> [8634b511-7eee-4e50-8efd-b96d420e9914] AMQP server on [node was down]:5672 >>> is unreachable: . Trying again >>> in 1 seconds.: amqp.exceptions.RecoverableConnectionError: >>> >>> >>> and >>> >>> >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task Traceback >>> (most recent call last): >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>> line 441, in get >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >>> self._queues[msg_id].get(block=True, timeout=timeout) >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line >>> 322, in get >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >>> waiter.wait() >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line >>> 141, in wait >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >>> get_hub().switch() >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/hubs/hub.py", >>> line 313, in switch2022-10-24 14:23:01.945 7 ERROR >>> oslo_service.periodic_task return self.greenlet.switch() >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task _queue.Empty >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task During >>> handling of the above exception, another exception occurred: >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task Traceback >>> (most recent call last): >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_service/periodic_task.py", >>> line 216, in run_periodic_tasks >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task >>> task(self, context) >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/compute/manager.py", >>> line 9716, in _sync_power_states >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task >>> db_instances = objects.InstanceList.get_by_host(context, self.host, >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_versionedobjects/base.py", >>> line 175, in wrapper >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = >>> cls.indirection_api.object_class_action_versions( >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/conductor/rpcapi.py", >>> line 240, in object_class_action_versions >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >>> cctxt.call(context, 'object_class_action_versions', >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/rpc/client.py", >>> line 189, in call >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task 
result = >>> self.transport._send( >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/transport.py", >>> line 123, in _send >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >>> self._driver.send(target, ctxt, message, >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>> line 689, in send >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task return >>> self._send(target, ctxt, message, wait_for_reply, timeout, >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>> line 678, in _send >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task result = >>> self._waiter.wait(msg_id, timeout, >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>> line 567, in wait >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task message = >>> self.waiters.get(msg_id, timeout=timeout) >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task File >>> "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", >>> line 443, in get >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task raise >>> oslo_messaging.MessagingTimeout( >>> 2022-10-24 14:23:01.945 7 ERROR oslo_service.periodic_task >>> oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply >>> to message ID c8a676a9709242908dcff97046d7976d >>> >>> *** I use cluster rabbitmq with ha-policy for exchange and queue. These >>> logs are gone when I restart cinder and nova services. >>> >>> >>> >>> Nguyen Huu Khoi >>> >>> >>> On Mon, Oct 24, 2022 at 5:42 PM Eugen Block wrote: >>> >>>> You don't need to create a new thread with the same issue. >>>> Do the rabbitmq logs reveal anything? We create a cluster within >>>> rabbitmq and the output looks like this: >>>> >>>> ---snip--- >>>> control01:~ # rabbitmqctl cluster_status >>>> Cluster status of node rabbit at control01 ... >>>> Basics >>>> >>>> Cluster name: rabbit at rabbitmq-cluster >>>> >>>> Disk Nodes >>>> >>>> rabbit at control01 >>>> rabbit at control02 >>>> rabbit at control03 >>>> >>>> Running Nodes >>>> >>>> rabbit at control01 >>>> rabbit at control02 >>>> rabbit at control03 >>>> >>>> Versions >>>> >>>> rabbit at control01: RabbitMQ 3.8.3 on Erlang 22.2.7 >>>> rabbit at control02: RabbitMQ 3.8.3 on Erlang 22.2.7 >>>> rabbit at control03: RabbitMQ 3.8.3 on Erlang 22.2.7 >>>> ---snip--- >>>> >>>> During failover it's not unexpected that a message gets lost, but it >>>> should be resent, I believe. How is your openstack deployed? >>>> >>>> >>>> Zitat von Nguy?n H?u Kh?i : >>>> >>>> > Hello. 
>>>> > 2 remain nodes still running, here is my output: >>>> > Basics >>>> > >>>> > Cluster name: rabbit at controller01 >>>> > >>>> > Disk Nodes >>>> > >>>> > rabbit at controller01 >>>> > rabbit at controller02 >>>> > rabbit at controller03 >>>> > >>>> > Running Nodes >>>> > >>>> > rabbit at controller01 >>>> > rabbit at controller03 >>>> > >>>> > Versions >>>> > >>>> > rabbit at controller01: RabbitMQ 3.8.35 on Erlang 23.3.4.18 >>>> > rabbit at controller03: RabbitMQ 3.8.35 on Erlang 23.3.4.18 >>>> > >>>> > Maintenance status >>>> > >>>> > Node: rabbit at controller01, status: not under maintenance >>>> > Node: rabbit at controller03, status: not under maintenance >>>> > >>>> > Alarms >>>> > >>>> > (none) >>>> > >>>> > Network Partitions >>>> > >>>> > (none) >>>> > >>>> > Listeners >>>> > >>>> > Node: rabbit at controller01, interface: [::], port: 15672, protocol: >>>> http, >>>> > purpose: HTTP API >>>> > Node: rabbit at controller01, interface: 183.81.13.227, port: 25672, >>>> protocol: >>>> > clustering, purpose: inter-node and CLI tool communication >>>> > Node: rabbit at controller01, interface: 183.81.13.227, port: 5672, >>>> protocol: >>>> > amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >>>> > Node: rabbit at controller03, interface: [::], port: 15672, protocol: >>>> http, >>>> > purpose: HTTP API >>>> > Node: rabbit at controller03, interface: 183.81.13.229, port: 25672, >>>> protocol: >>>> > clustering, purpose: inter-node and CLI tool communication >>>> > Node: rabbit at controller03, interface: 183.81.13.229, port: 5672, >>>> protocol: >>>> > amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >>>> > >>>> > Feature flags >>>> > >>>> > Flag: drop_unroutable_metric, state: enabled >>>> > Flag: empty_basic_get_metric, state: enabled >>>> > Flag: implicit_default_bindings, state: enabled >>>> > Flag: maintenance_mode_status, state: enabled >>>> > Flag: quorum_queue, state: enabled >>>> > Flag: user_limits, state: enabled >>>> > Flag: virtual_host_metadata, state: enabled >>>> > >>>> > I used ha_queues mode all >>>> > But it is not better. >>>> > Nguyen Huu Khoi >>>> > >>>> > >>>> > On Tue, Oct 18, 2022 at 7:19 AM Nguy?n H?u Kh?i < >>>> nguyenhuukhoinw at gmail.com> >>>> > wrote: >>>> > >>>> >> Description >>>> >> =========== >>>> >> I set up 3 controllers and 3 compute nodes. My system cannot work >>>> well >>>> >> when 1 rabbit node in cluster rabbitmq is down, cannot launch >>>> instances. It >>>> >> stucked at scheduling. >>>> >> >>>> >> Steps to reproduce >>>> >> =========== >>>> >> Openstack nodes point rabbit://node1:5672,node2:5672,node3:5672// >>>> >> * Reboot 1 of 3 rabbitmq node. >>>> >> * Create instances then it stucked at scheduling. >>>> >> >>>> >> Workaround >>>> >> =========== >>>> >> Point to rabbitmq VIP address. But We cannot share the load with this >>>> >> solution. Please give me some suggestions. Thank you very much. >>>> >> I did google and enabled system log's debug but I still cannot >>>> understand >>>> >> why. >>>> >> >>>> >> Nguyen Huu Khoi >>>> >> >>>> >>>> >>>> >>>> >>>> >> >> *?Esta mensagem ? direcionada apenas para os endere?os constantes no >> cabe?alho inicial. Se voc? n?o est? listado nos endere?os constantes no >> cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa >> mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o >> imediatamente anuladas e proibidas?.* >> >> *?Apesar do Magazine Luiza tomar todas as precau??es razo?veis para >> assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o >> poder? 
aceitar a responsabilidade por quaisquer perdas ou danos causados >> por esse e-mail ou por seus anexos?.* >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Sat Oct 22 23:42:04 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Sun, 23 Oct 2022 00:42:04 +0100 Subject: [kolla-ansible][Yoga] Deployment stuck In-Reply-To: References: Message-ID: Hi, Here you can find the kolla-ansible *deploy *log with ANSIBLE_DEBUG=1 Regards. Le sam. 22 oct. 2022 ? 23:55, wodel youchi a ?crit : > Hi, > > I am trying to deploy a new platform using kolla-ansible Yoga and I am > trying to upgrade another platform from Xena to yoga. > > On both platforms the prechecks went well, but when I start the process of > deployment for the first and upgrade for the second, the process gets stuck. > > I tried to tail -f /var/log/kolla/*/*.log but I can't get hold of the > cause. > > In the first platform, some services get deployed, and at some point the > script gets stuck, several times in the modprobe phase. > > In the second platform, the upgrade gets stuck on : > > Escalation succeeded > [204/1859] > <20.3.0.28> (0, b'\n{"path": "/etc/kolla/cron", "changed": false, "diff": > {"before": {"path": "/etc/kolla/cro > n"}, "after": {"path": "/etc/kolla/cron"}}, "uid": 0, "gid": 0, "owner": > "root", "group": "root", "mode": "07 > 70", "state": "directory", "secontext": "unconfined_u:object_r:etc_t:s0", > "size": 70, "invocation": {"module_ > args": {"path": "/etc/kolla/cron", "owner": "root", "group": "root", > "mode": "0770", "recurse": false, "force > ": false, "follow": true, "modification_time_format": "%Y%m%d%H%M.%S", > "access_time_format": "%Y%m%d%H%M.%S", > "unsafe_writes": false, "state": "directory", "_original_basename": null, > "_diff_peek": null, "src": null, " > modification_time": null, "access_time": null, "seuser": null, "serole": > null, "selevel": null, "setype": nul > l, "attributes": null}}}\n', b'') > ok: [20.3.0.28] => (item={'key': 'cron', 'value': {'container_name': > 'cron', 'group': 'cron', 'enabled': True > , 'image': '20.3.0.34:4000/openstack.kolla/centos-source-cron:yoga', > 'environment': {'DUMMY_ENVIRONMENT': 'ko > lla_useless_env', 'KOLLA_LOGROTATE_SCHEDULE': 'daily'}, 'volumes': > ['/etc/kolla/cron/:/var/lib/kolla/config_f > iles/:ro', '/etc/localtime:/etc/localtime:ro', '', > 'kolla_logs:/var/log/kolla/'], 'dimensions': {}}}) => { > "ansible_loop_var": "item", > "changed": false, > "diff": { > "after": { > "path": "/etc/kolla/cron" > }, > "before": { > "path": "/etc/kolla/cron" > } > }, > "gid": 0, > "group": "root", > > How to start debugging the situation. > > Regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Mon Oct 24 14:17:29 2022 From: zigo at debian.org (Thomas Goirand) Date: Mon, 24 Oct 2022 16:17:29 +0200 Subject: [telemetry][cloudkitty][ceilometer] Billing windows instances In-Reply-To: References: <07d23aac-fee9-cd11-bf25-5030dba2cf6c@debian.org> Message-ID: <0ad61cfd-4d00-6fb3-dc4d-a8f3384ddc48@debian.org> On 10/21/22 17:40, Rafael Weing?rtner wrote: > Hello Zigo!, > You might want to take a look at the new implementations we made in > ceilometer, and CloudKitty. 
> - https://review.opendev.org/c/openstack/cloudkitty/+/861806 > > - https://review.opendev.org/c/openstack/ceilometer/+/856178 > > - https://review.opendev.org/c/openstack/ceilometer/+/852021 > > - https://review.opendev.org/c/openstack/ceilometer/+/850253 > > - https://review.opendev.org/c/openstack/ceilometer/+/855953 > > > Not directly relate to this use case, but also might interest you: > - https://review.opendev.org/c/openstack/cloudkitty/+/861786 > > - https://review.opendev.org/c/openstack/cloudkitty/+/861807 > > - https://review.opendev.org/c/openstack/cloudkitty/+/861908 > > - https://review.opendev.org/c/openstack/ceilometer/+/856972 > > - https://review.opendev.org/c/openstack/ceilometer/+/861109 > > - https://review.opendev.org/c/openstack/ceilometer/+/856304 > > - https://review.opendev.org/c/openstack/ceilometer/+/856305 > > > > In short, we can now create Ceilometer compute dynamic pollsters, which > can execute scripts in the host, and check the actual operating system > installed in the VM. Then, this data can be pushed back to the storage > backend via Ceilometer as an attribute, which is then processed in > CloudKitty. Furthermore, we extended cloudkitty to generate different > ratings for the same metric. Therefore, by doing this, we do not need > multiple metrics to have different CloudKitty ratings appearing for > users. This allows us, for instance, to have one rating for the VM usage > itself, and others for each license, and so on. Hi Raphael! Thanks a lot for all of the above (both the patches merged upstream themselves, and taking the time to give me reference to them). We've backported this to our production version (ie: Victoria) without too much pain (it took me like 4 hours to do so, cherry-picking missing patches on top of which these were applied). We then wrote a quick command to produce a JSON containing the image type (as reported by the os_type property of the image) that we put in cache in the compute, and then dump this image type, and the associated project ID. This looks promising: we only need to write the dynamic pollster now! :) So really, thanks a lot. I'll let you know when we have a full solution (that I will also publish as free software). Cheers, Thomas Goirand (zigo) From jean-francois.taltavull at elca.ch Mon Oct 24 14:26:06 2022 From: jean-francois.taltavull at elca.ch (=?iso-8859-1?Q?Taltavull_Jean-Fran=E7ois?=) Date: Mon, 24 Oct 2022 14:26:06 +0000 Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size metric Message-ID: <3bb519ce13094cec91c8409fe165f7b1@elca.ch> Hello, I'm trying to get the 'radosgw.objects.size' metric, that is the total bucket objects size per tenant. I expected to get one sample per tenant but I get one sample per bucket instead, as with the 'rados.containers.objects.size' metric. Here is my pollster definition: ''' - name: "radosgw.objects.size" sample_type: "gauge" unit: "B" value_attribute: ". | value['usage'] | value.get('rgw.main',{'size':0}) | value['size']" url_path: "FQDN/admin/bucket?stats=True" module: "awsauth" authentication_object: "S3Auth" authentication_parameters: my_access_key,my_secret_key,FQDN user_id_attribute: "owner | value.split('$') | value[0]" project_id_attribute: "tenant" resource_id_attribute: "id" ''' I tried with "resource_id_attribute: "tenant" but it does not work better. Any idea ? Is there something wrong in the pollster definition ? 
Regards, Jean-Francois From rafaelweingartner at gmail.com Mon Oct 24 14:39:31 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Mon, 24 Oct 2022 11:39:31 -0300 Subject: [telemetry][cloudkitty][ceilometer] Billing windows instances In-Reply-To: <0ad61cfd-4d00-6fb3-dc4d-a8f3384ddc48@debian.org> References: <07d23aac-fee9-cd11-bf25-5030dba2cf6c@debian.org> <0ad61cfd-4d00-6fb3-dc4d-a8f3384ddc48@debian.org> Message-ID: Awesome! i am glad to hear that you guys are managing to use it. If you need anything, please let me know. On Mon, Oct 24, 2022 at 11:21 AM Thomas Goirand wrote: > On 10/21/22 17:40, Rafael Weing?rtner wrote: > > Hello Zigo!, > > You might want to take a look at the new implementations we made in > > ceilometer, and CloudKitty. > > - https://review.opendev.org/c/openstack/cloudkitty/+/861806 > > > > - https://review.opendev.org/c/openstack/ceilometer/+/856178 > > > > - https://review.opendev.org/c/openstack/ceilometer/+/852021 > > > > - https://review.opendev.org/c/openstack/ceilometer/+/850253 > > > > - https://review.opendev.org/c/openstack/ceilometer/+/855953 > > > > > > Not directly relate to this use case, but also might interest you: > > - https://review.opendev.org/c/openstack/cloudkitty/+/861786 > > > > - https://review.opendev.org/c/openstack/cloudkitty/+/861807 > > > > - https://review.opendev.org/c/openstack/cloudkitty/+/861908 > > > > - https://review.opendev.org/c/openstack/ceilometer/+/856972 > > > > - https://review.opendev.org/c/openstack/ceilometer/+/861109 > > > > - https://review.opendev.org/c/openstack/ceilometer/+/856304 > > > > - https://review.opendev.org/c/openstack/ceilometer/+/856305 > > > > > > > > In short, we can now create Ceilometer compute dynamic pollsters, which > > can execute scripts in the host, and check the actual operating system > > installed in the VM. Then, this data can be pushed back to the storage > > backend via Ceilometer as an attribute, which is then processed in > > CloudKitty. Furthermore, we extended cloudkitty to generate different > > ratings for the same metric. Therefore, by doing this, we do not need > > multiple metrics to have different CloudKitty ratings appearing for > > users. This allows us, for instance, to have one rating for the VM usage > > itself, and others for each license, and so on. > > Hi Raphael! > > Thanks a lot for all of the above (both the patches merged upstream > themselves, and taking the time to give me reference to them). > > We've backported this to our production version (ie: Victoria) without > too much pain (it took me like 4 hours to do so, cherry-picking missing > patches on top of which these were applied). We then wrote a quick > command to produce a JSON containing the image type (as reported by the > os_type property of the image) that we put in cache in the compute, and > then dump this image type, and the associated project ID. This looks > promising: we only need to write the dynamic pollster now! :) > > So really, thanks a lot. I'll let you know when we have a full solution > (that I will also publish as free software). > > Cheers, > > Thomas Goirand (zigo) > > > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sbauza at redhat.com Mon Oct 24 16:49:46 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 24 Oct 2022 18:49:46 +0200 Subject: [nova][placement] 2023.1 Antelope PTG summary Message-ID: We're on a wetty Monday here and I guess this is time for me to take my pen again and write a PTG recap email for the current release, which is 2023.1 Antelope. Yet again, I *beg* you to *not* use any translation tool when reading any etherpad, including Google Chrome embedded translation or it would fully translate the whole etherpad content for *all* readers. In order to prevent any accidental misusage, here is a readonly copy of the Nova etherpad we had for the week https://etherpad.opendev.org/p/r.8ae4e0ef997aebfe626b2b272ff23f1b Again, I'm human and while I can write bugs (actually I write a lot of them), I can also write wrong things in this email and I can misinterpretate something precise. Apologies if so, and please don't refrain them yourself to correct me by replying to this email. Last but not the least, I'm more than open to any questions or remarks that would come from the reading of this long email. Good luck with this very long thread, you should maybe grab a coffee before starting to read it (and I promise to keep it as short as I can). ### Operator hours ### After the success of the productive Nova meet-and-greet session we held in Berlin (packed room w/ lots of feedback), we were eager to again have a discussion with operators, this time virtually which would have hopefully lowered the entry barrier by not requiring an in-person event in order to attend the session. Consequently, we allocated three timeslots of one hour, two back-to-back on Tuesday and one on Wednesday (on different times to arrange time differences between operators). Unfortunately, I have to write here that we only had a very small attendance on Tuesday with only three operators joining (but providing great feedback, thanks btw. !) and *none* on the 1-hour Wednesday slot. As a result, I won't provide statistics about release and project usage here as numbers are too low to be representative, but here is what we discussed : (again, a readonly etherpad is available here https://etherpad.opendev.org/p/r.aa8b12b385297b455138d35172698d48 for further context) # Upcoming features and general discussions - the possibility to mount Manila shares in Nova seems promising. - As nova doesn't support multiple physnets per neutron network today, this prevents the use of routed networks in some cases. - modifying our mod_wsgi usage to be allowed to pass some arguments is requested - operators seem interested in having PCI devices tracked in Placement # Pain points - getting inventories or used/availables resources in Placement becomes costly as it requires N calls, with N be the number of Resource Providers (instead of a single HTTP call). Problem has been acknowledged by the team and some design discussion occurred. - We should paginate on the flavors API and we need to fix the private > public flavor bug. - mediated devices disappearing at reboot is a problem. This has been discussed during the contributor session later, see below. - routing metrics are hard to manage when you have a long list of multiple ports attached. We eventually agreed on the fact the proposal fix could be an interesting feature for Neutron team to develop. That's it for the operators discussions. 
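To make the first pain point concrete, the pattern operators have to script
today looks roughly like the sketch below (assuming the osc-placement plugin
is installed; the loop itself is illustrative, not a proposed interface): one
call to list the resource providers, then one more call per provider for its
usage or inventory, i.e. N+1 HTTP requests for N providers.

  for rp in $(openstack resource provider list -f value -c uuid); do
      openstack resource provider usage show "$rp"        # one request per provider
      # openstack resource provider inventory list "$rp"  # and another per provider if inventories are needed
  done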
Let's now discuss about what we discussed in the contributors PTG agenda : ### Cross-project discussions Two projects this time were discussing with the Nova community : # Ironic-Nova The whole session was about the inconsistencies that were happening if a nova-compute was failing down with Ironic rebalancing the nodes. - we agreed on the fact this wasn't an easy problem to solve - a consensus was around the fact Ironic should support a sharding key so the Nova compute could use it instead of using hash rings. - JayF and johnthetubayguy agreed on codifying this idea into a spec - we'd hear feedback from operators about what they feel of the above migration (sharding their node cloud into sharding pieces and providing sharding details to nova) - in parallel of the above, documentation has to be amended in order to recommend for *existing deployments* to setup active/passive failover mode for Nova computes (instead of relying on hashring rebalances) # Neutron-Nova (I'll briefly cover the details here, as it will also be covered by the Neutron PTG recap) - while Nova is finishing up tracking of PCI devices in Placement this cycle, we agreed on defering the modeling of Neutron-based PCI requests until next cycle. This won't have any impact in terms of existing features that will continue to operate seamlessly. - the way we define switchdev capabilities as of now is incorrect. We agreed on modfying Nova to allow it to report such capabilities to Neutron. This may be a specless blueprint or just a bugfix (and then potential backportability), to be determined later. - MTUs are unfortunately immutable. If you change the MTU value by Neutron, you need to restart the instance( (or reattach the port). We consequently agreed on documenting this caveat, which is due to libvirt/qemu not able to change the MTU while the guest is running. - We eventually agreed on letting --hostname (parameter of an instance creation request) to represent a FQDN thru a nova microversion. This value won't be sanitized by nova and directly passed to Neutron as the hostname value. We also agreed on the fact Nova WON'T ever touch the port domain value in metadata, as this is not in the project scope to manage name domains. ### Procedural discussions (skip it if you're not interacting with the Nova contributors) ### # Zed retrospective - We had a productive Zed release and we made a good job on reducing bugs and bug reports. Kudos to the team again. - The microversions etherpad we had was nice but planning microversions usage is hard. WE agreed on rather providing a document to contributors explaining them how to write an API change that adds a new microversion and how to easily rebase this change if a merge conflict occurs (due to the micoversion being taken by another patch that merged) - We agreed on keeping an etherpad for tracking all API changes during milestone-3 - We agreed on filing blueprints for TC goals that impact Nova - bauzas will promote again the use of review-priority label in Gerrit during weekly meetings in order for cores to show their interest in reviewing a particular patch. # Promoting backlog features to new contributors and enhance mentoring - Starting this cycle, we'll put our list of small and easily actionable blueprints into Launchpad bug reports that have a "Wishlist" status and both a 'low-hanging-fruit' and a 'rfe' tag. New contributors or people wanting to mentor a new upstream member are more than welcome to consume that list of 'rfe' bugs and identify the ones they're willing to work on. 
A detailed process will be documented for helping newcomers to join. - We'll also draft our pile of work we defer into 'backlog specs' if they require further design (and are less actionable from a newcomer perspective) # Other procedural and release discussions - we'll stop using Storyboard for tracking Placement features and bugs and we'll pivot back to Launchpad for the 'placement' sub-project. - we agreed on a *feature* review day possibly 4 weeks before the feature freeze in order to catch up any late design issue we could have missed when reviewing the spec. - we will EOL stein and older nova branches with elodilles proposing the deletion patch. We agred on discussing EOL'ing train at next PTG. - gibi will work on providing a static requirements file for nova that would use minimum versions of our dependencies (non transitively) and modify unit and functional test jobs to rather use this capped requirements file for testing. - we discussed the User Survey and we agreed on discussing any question we may want to add in the next survey during the next weekly meetings. - "2023.1 Antelope" naming is a bit confusing, but we agreed on we should continue to either use "2023.1" or "2023.1 Antelope" for naming our docs. We also wait for guidelines in order to consistently name our next stable branch (2023.1 possibly). (That's the end of procedural and release discussions, please resume here if you skipped the above) ### Technical bits ### # VMware driver status - As currently no job runs happened since April, we agreed on communicating in the nova-compute logs at periodic times that the code isn't tested so operators running it would know its upstream status. - we'll update the supported matrix documentation to refiect this 'UNTESTED' state # TC goals this cycle - for the FIPS goal, we agreed on the fact the current FIPS job (running on centos 9 stream) shouldn't be running on gate and check pipelines. As the current job is busted (all periodic runs go into TIMEOUT state), we also want to modify the job timeout to allow 30 mins more time for running (due to a reboot in the job definition) - for the oslo.privsep goal, no effort is required from a nova perspective (all deliverables are already using privsep). sean-k-mooney can propose a 'rfe' bug (see the note above on new contributors) for modifying our privsep usage in nova (using different privsep context by callers) - for the Ubunutu 2022.4 goal, gmann already provided changes. We need to review them. - for the RBAC goal, let's discuss it in a proper section just below # Next steps on RBAC - we need to test new policy defaults in a full integrated tempest testsuite (meaning with other projects), ideally before the first milestone. - once we check everything works fine as expected, we can flip the default (enabling new policies) before milestone-2 - we'd like to start drafting the new service role usage by two new specs (one for nova, the other one for placement) # Power management in Nova A pile of work I'm proud we gonna start this cycle. This is about disabling/enabling cores on purpose, so power savings occur. - we agreed on supporting it directly in Nova (and to not design and support an external tool which would suppose to draft some heave interface for an easy quickwin). This would just be a config flag to turn on that would enable CPU cores on demand. - a potential follow-up *may* be to use a different CPU governor depending on flavors or images but this won't be scoped for this release. 
# Power monitoring in Nova I don't really like this name, as monitoring isn't part of the Nova mission statement so I'll clarify. Here, this is about providing an internal readonly interface for power monitoring tools running on guests that would be able to capture host consumption metrics. One example of such monitoring tools is Scaphandre, demonstrated during the last OpenInfraSummit at a keynote. - we agreed on reusing virtiofs support we're gonna introduce in this release for the Manila share attachment usecase - this share would be readonly and would be unique per instance (ie. we wouldn't be supporting multiple guest agents reading different directories) - this share would be enabled thru a configuration flag per compute, and would only be mounted per instance by specific flavor or image extraspec/metadata. # Database soft-deleted records and the archive/purge commands - We don't want to deprecate DB soft-deleted records as some APIs continue to rely on those records. - We rather prefer to add a new parameter to the purge command that will directly delete the 'soft-deleted' records from the main DBs and not only the shadow tables (skipping the need to archive, if the operator wants to use this parameter) # Nova-compute failing at reboot due to vGPU missing devices Talking of bug #1900800, the problem is that mediated devices aren't persistent so nova-compute isn't able to respawn the instances after a reboot if those are vGPU-flavored. - we agreed on the fact that, like any other device, Nova shouldn't try to create them and we should rather ask the operator to pre-create the mediated devices, exaclty like we do for SR-IOV VFs. - we'll accordingly deprecate the mdev creation in the libvirt driver (but we'll continue to support it) and we'll log a warning if Nova has to create one mdev. - we'll change the nova-compute init script to raise a better exception explaining which mdev is missing - we'll document a procedure for explaining how to get existing mdev information and persist them by udev rules (for upgrade cases) # Robustify our compute hostname changes To be clear, at first step, we will continue to *NOT* support nova-compute hostname changes but we'll better detect the hostname change to prevent later issues. - first step to persist the compute UUID on disk seems a good candidate so a spec is targeted this cycle. - next steps could be to robustify our relationships between instance, compute node and service object records but this design will be deferred for later in a backlog spec. # Move to OpenStackClient and SDK OSC is already fully supported in Zed but it continues to rely on novaclient python bindings for calling the Nova API. - we agreed on modifying OSC to rather use openstacksdk (instead of novaclient) for communicating to the Nova APIs. Contributors welcome on this move. - we agreed on stopping to use project client libraries in nova services (eg. cinderclient used by nova-compute) and rather use openstacksdk directly. A 'rfe' bug per project client will be issued, anyone willing to work on it is welcome. - we also agreed on continuing to support novaclient for a couple of releases, as operators or other consumers of this package could require substantial efforts to move to the sdk. - we agreed on changing the release model for novaclient to be independent so we can release anytime we need. 
# Evacuate to target state We understand the usecase (evacuate an instance shouldn't always turn the evacuated instance on) - that said, we don't want to amend the API for passing a parameter as this would carry a tech debt indefinitely) - we prefer to propose a new microversion that would stop the instance eventually instead of starting it - operators wanting to keep the original behaviour would need to negociate an older microversion to the Nova API as we don't intend to make spawning optionally chosen by API on evacuate. (aaaaaaaaaaaaaaaand that's it for the PTG recap) Kudos, you were brave, you reached that point. Hope your coffee was good and now you feel rejuvenated. Anyway, time for me now to rest my fingers and to enjoy a deserved time off. As said, I'm all up for any questions or remarks that would come from the reading of this enormous thread. -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Mon Oct 24 18:06:07 2022 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Mon, 24 Oct 2022 18:06:07 +0000 Subject: [nova] Proposing EOL of nova project's old stable branches (Stein, Rocky, Queens) Message-ID: Hi, At the 2023.1 Antelope PTG, Nova team discussed that currently open Queens, Rocky and Stein branches for Nova teams' repositories will be moved to End of Life. There are multiple reasons behind this decision: - gate of these branches are broken for a couple of months now - zuul job definitions for these branches use the pre-"zuul-v3" [1] style, which makes the maintenance harder of these jobs - CI on Rocky and Queens use Ubuntu Xenial, which is also beyond its public maintenance window date And last but not least bugfix backports to these branches don't really get enough reviews anymore. If anyone has any interest to keep any of the above branches open then this is the last opportunity to step up, otherwise, as Nova team decided on the PTG, these branches will transition to End of Life. Please let us know before November 7th, we will proceed with the transition in case no responses will come till that day. [1] https://docs.openstack.org/project-team-guide/zuulv3.html Thanks in advance, El?d Ill?s irc:elodilles @ #openstack-nova -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Oct 24 18:40:59 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 24 Oct 2022 19:40:59 +0100 Subject: [nova][placement] 2023.1 Antelope PTG summary In-Reply-To: References: Message-ID: On Mon, 2022-10-24 at 18:49 +0200, Sylvain Bauza wrote: > We're on a wetty Monday here and I guess this is time for me to take my pen > again and write a PTG recap email for the current release, which is 2023.1 > Antelope. > Yet again, I *beg* you to *not* use any translation tool when reading any > etherpad, including Google Chrome embedded translation or it would fully > translate the whole etherpad content for *all* readers. > In order to prevent any accidental misusage, here is a readonly copy of the > Nova etherpad we had for the week > https://etherpad.opendev.org/p/r.8ae4e0ef997aebfe626b2b272ff23f1b > > Again, I'm human and while I can write bugs (actually I write a lot of > them), I can also write wrong things in this email and I can > misinterpretate something precise. Apologies if so, and please don't > refrain them yourself to correct me by replying to this email. 
> > Last but not the least, I'm more than open to any questions or remarks that > would come from the reading of this long email. > > Good luck with this very long thread, you should maybe grab a coffee before > starting to read it (and I promise to keep it as short as I can). > > ### Operator hours ### > > After the success of the productive Nova meet-and-greet session we held in > Berlin (packed room w/ lots of feedback), we were eager to again have a > discussion with operators, this time virtually which would have hopefully > lowered the entry barrier by not requiring an in-person event in order to > attend the session. > Consequently, we allocated three timeslots of one hour, two back-to-back on > Tuesday and one on Wednesday (on different times to arrange time > differences between operators). > Unfortunately, I have to write here that we only had a very small > attendance on Tuesday with only three operators joining (but providing > great feedback, thanks btw. !) and *none* on the 1-hour Wednesday slot. > > As a result, I won't provide statistics about release and project usage > here as numbers are too low to be representative, but here is what we > discussed : > (again, a readonly etherpad is available here > https://etherpad.opendev.org/p/r.aa8b12b385297b455138d35172698d48 for > further context) > > # Upcoming features and general discussions > > - the possibility to mount Manila shares in Nova seems promising. > - As nova doesn't support multiple physnets per neutron network today, this > prevents the use of routed networks in some cases. to be precise nova has never supporte the?Multi Provider Network extention https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/multiprovidernet.py which is the extention that allows a neutron network to span multipel phsyical networks. multiple network segments on the same phsynet (e.g. using mulitple vlans all ont eh same phsyent or multi segment netowrks that ahve a single physnet + tunneled segments) can be supported today. https://github.com/openstack/nova/commit/b9d9d96a407db5a2adde3aed81e61cc9589c291a added support for multi segment netwroks with a single physnet in pike but there was never a nova sibling spec to add support for the multiproviernet extention. as such we only ever use the first phsynet for all segments and its a non trivial amount of work to change that in nova escpially in the hardware offload ovs or generic sriov case. it would require the pci manager to accept a list of possible physnets and it will require addtional work when using placment in the future. its not imposible to do but it would require careful impelemtation to get right and is a new feature that woudl require a spec. > - modifying our mod_wsgi usage to be allowed to pass some arguments is > requested this however is more of a bug or over sight. we expected that you would jsut pass --config-dir to the wsgi script as command line args which is possibel with uwsgi however its not possibel with mod_wsgi. i have filed a trackign bug for this here https://bugs.launchpad.net/nova/+bug/1994056 and assgined it to myself for now. > - operators seem interested in having PCI devices tracked in Placement > > # Pain points > > - getting inventories or used/availables resources in Placement becomes > costly as it requires N calls, with N be the number of Resource Providers > (instead of a single HTTP call). Problem has been acknowledged by the team > and some design discussion occurred. 
> - We should paginate on the flavors API and we need to fix the private > > public flavor bug. > - mediated devices disappearing at reboot is a problem. This has been > discussed during the contributor session later, see below. > - routing metrics are hard to manage when you have a long list of multiple > ports attached. We eventually agreed on the fact the proposal fix could be > an interesting feature for Neutron team to develop. > > That's it for the operators discussions. Let's now discuss about what we > discussed in the contributors PTG agenda : > > > ### Cross-project discussions > > Two projects this time were discussing with the Nova community : > > # Ironic-Nova > > The whole session was about the inconsistencies that were happening if a > nova-compute was failing down with Ironic rebalancing the nodes. > > - we agreed on the fact this wasn't an easy problem to solve > - a consensus was around the fact Ironic should support a sharding key so > the Nova compute could use it instead of using hash rings. > - JayF and johnthetubayguy agreed on codifying this idea into a spec > - we'd hear feedback from operators about what they feel of the above > migration (sharding their node cloud into sharding pieces and providing > sharding details to nova) > - in parallel of the above, documentation has to be amended in order to > recommend for *existing deployments* to setup active/passive failover mode > for Nova computes (instead of relying on hashring rebalances) > > # Neutron-Nova > > (I'll briefly cover the details here, as it will also be covered by the > Neutron PTG recap) > > - while Nova is finishing up tracking of PCI devices in Placement this > cycle, we agreed on defering the modeling of Neutron-based PCI requests > until next cycle. This won't have any impact in terms of existing features > that will continue to operate seamlessly. > - the way we define switchdev capabilities as of now is incorrect. We > agreed on modfying Nova to allow it to report such capabilities to Neutron. > This may be a specless blueprint or just a bugfix (and then potential > backportability), to be determined later. i will work on this this cycle. i can either use the old bug https://bugs.launchpad.net/neutron/+bug/1713590 which was incorrectly "fixed" in neutron and add the nova project or i can file a new one. i have a slight prefernce to use the old for the added context but its proably cleaner to file a new one so i dont really mind either way. > - MTUs are unfortunately immutable. If you change the MTU value by Neutron, > you need to restart the instance( (or reattach the port). We consequently > agreed on documenting this caveat, which is due to libvirt/qemu not able to > change the MTU while the guest is running. we also agreed to move forward with https://review.opendev.org/c/openstack/nova/+/855664 to not provide a mtu if the network is configured with dhcp enabled. this is because neutron will provide the mtu via dhcp in that case. i have abandoned https://review.opendev.org/c/openstack/nova/+/852367 > - We eventually agreed on letting --hostname (parameter of an instance > creation request) to represent a FQDN thru a nova microversion. This value > won't be sanitized by nova and directly passed to Neutron as the hostname > value. We also agreed on the fact Nova WON'T ever touch the port domain > value in metadata, as this is not in the project scope to manage name > domains. 
just to clarify the last point we will pass it once whne the port is created by nova as we do today usting the value of instance.hostname. it it will have the behvior that that has today where it is never change after first boot. i propossed that if desginate wants to automaticly update the hostname they can do so by extending the designate-sink componet which listens for notifcation to automaticly registry fixed/floating ips with dns record. if the desinate project feel its in socpe they coudl repond to the instnace update notification if the instace.hostname is updated and modify the relevent neutron port or dns record but we strongly advise them not to ever propagage such a change to floating ip. even for the fixed ip records without a new neutron extention on the port to opt into the dns_name update its likely that updating the dns records on instace hostname update would break exsiting usage so i advise caution if desinate choose to add this feature in the future. > > > ### Procedural discussions (skip it if you're not interacting with the Nova > contributors) ### > > # Zed retrospective > > - We had a productive Zed release and we made a good job on reducing bugs > and bug reports. Kudos to the team again. > - The microversions etherpad we had was nice but planning microversions > usage is hard. WE agreed on rather providing a document to contributors > explaining them how to write an API change that adds a new microversion and > how to easily rebase this change if a merge conflict occurs (due to the > micoversion being taken by another patch that merged) > - We agreed on keeping an etherpad for tracking all API changes during > milestone-3 > - We agreed on filing blueprints for TC goals that impact Nova > - bauzas will promote again the use of review-priority label in Gerrit > during weekly meetings in order for cores to show their interest in > reviewing a particular patch. > > # Promoting backlog features to new contributors and enhance mentoring > > - Starting this cycle, we'll put our list of small and easily actionable > blueprints into Launchpad bug reports that have a "Wishlist" status and > both a 'low-hanging-fruit' and a 'rfe' tag. New contributors or people > wanting to mentor a new upstream member are more than welcome to consume > that list of 'rfe' bugs and identify the ones they're willing to work on. A > detailed process will be documented for helping newcomers to join. > - We'll also draft our pile of work we defer into 'backlog specs' if they > require further design (and are less actionable from a newcomer perspective) > > # Other procedural and release discussions > > - we'll stop using Storyboard for tracking Placement features and bugs and > we'll pivot back to Launchpad for the 'placement' sub-project. > - we agreed on a *feature* review day possibly 4 weeks before the feature > freeze in order to catch up any late design issue we could have missed when > reviewing the spec. i would also suggest we maybe have one at the start of december before we hit PTO sesion to see if we can land any features that are ready early. we can figure that out on irc or in the team meeting. > - we will EOL stein and older nova branches with elodilles proposing the > deletion patch. We agred on discussing EOL'ing train at next PTG. > - gibi will work on providing a static requirements file for nova that > would use minimum versions of our dependencies (non transitively) and > modify unit and functional test jobs to rather use this capped requirements > file for testing. 
> - we discussed the User Survey and we agreed on discussing any question we > may want to add in the next survey during the next weekly meetings. > - "2023.1 Antelope" naming is a bit confusing, but we agreed on we should > continue to either use "2023.1" or "2023.1 Antelope" for naming our docs. did we because i tought we agreeed to use "27.0.0 (antelope)" in all docs or gmans suggestion of "2023.1 Antelope (nova 27.0.0)" so that we continuity in our docs and release notes. '"2023.1" or "2023.1 Antelope" or 2023.1 (27.0.0)' were what i propsed we should not use in our docs as both are confusting based on our past naming. > We also wait for guidelines in order to consistently name our next stable > branch (2023.1 possibly). for the branch name i do think stable/2023.1 makes sense and the docs url shoudl be https://docs.openstack.org/nova/2023.1/ to match the branch name. the other option is https://docs.openstack.org/nova/antelope/ > > > (That's the end of procedural and release discussions, please resume here > if you skipped the above) > > ### Technical bits ### > > > # VMware driver status > > - As currently no job runs happened since April, we agreed on communicating > in the nova-compute logs at periodic times that the code isn't tested so > operators running it would know its upstream status. > - we'll update the supported matrix documentation to refiect this > 'UNTESTED' state > > # TC goals this cycle > > - for the FIPS goal, we agreed on the fact the current FIPS job (running on > centos 9 stream) shouldn't be running on gate and check pipelines. As the > current job is busted (all periodic runs go into TIMEOUT state), we also > want to modify the job timeout to allow 30 mins more time for running (due > to a reboot in the job definition) > - for the oslo.privsep goal, no effort is required from a nova perspective > (all deliverables are already using privsep). sean-k-mooney can propose a > 'rfe' bug (see the note above on new contributors) for modifying our > privsep usage in nova (using different privsep context by callers) i can file an RFE bug although it might be better to also file a backlog spec for this or contibutors doc to describe the correct way to use prvisep going forward. 1 context per class of permsions (networking, file, ectra), do not import privespe functionl define them only in the modules where they are used. include privaldged in the name. have a narrow contract taking a fixed number of named parmaters never commands as string. > - for the Ubunutu 2022.4 goal, gmann already provided changes. We need to > review them. > - for the RBAC goal, let's discuss it in a proper section just below > > # Next steps on RBAC > > - we need to test new policy defaults in a full integrated tempest > testsuite (meaning with other projects), ideally before the first milestone. > - once we check everything works fine as expected, we can flip the default > (enabling new policies) before milestone-2 > - we'd like to start drafting the new service role usage by two new specs > (one for nova, the other one for placement) > > # Power management in Nova > > A pile of work I'm proud we gonna start this cycle. This is about > disabling/enabling cores on purpose, so power savings occur. > - we agreed on supporting it directly in Nova (and to not design and > support an external tool which would suppose to draft some heave interface > for an easy quickwin). This would just be a config flag to turn on that > would enable CPU cores on demand. 
https://review.opendev.org/c/openstack/nova/+/821228 introduced cpu_external_management as a proposed config name; based on the change of direction, I would suggest we add cpu_power_state_management instead.

> - a potential follow-up *may* be to use a different CPU governor depending
> on flavors or images but this won't be scoped for this release.

Yes, we agreed not to do this this cycle and to evaluate it in the future.

>
> # Power monitoring in Nova
>
> I don't really like this name, as monitoring isn't part of the Nova mission
> statement, so I'll clarify. Here, this is about providing an internal
> readonly interface for power monitoring tools running on guests that would
> be able to capture host consumption metrics. One example of such monitoring
> tools is Scaphandre, demonstrated during the last OpenInfra Summit at a
> keynote.
> - we agreed on reusing the virtiofs support we're going to introduce in this
> release for the Manila share attachment usecase
> - this share would be readonly and would be unique per instance (ie. we
> wouldn't be supporting multiple guest agents reading different directories)
> - this share would be enabled through a configuration flag per compute, and
> would only be mounted per instance by specific flavor or image
> extraspec/metadata.
>
> # Database soft-deleted records and the archive/purge commands
>
> - We don't want to deprecate DB soft-deleted records as some APIs continue
> to rely on those records.
> - We rather prefer to add a new parameter to the purge command that will
> directly delete the 'soft-deleted' records from the main DBs and not only
> the shadow tables (skipping the need to archive, if the operator wants to
> use this parameter)
>
> # Nova-compute failing at reboot due to vGPU missing devices
>
> Talking of bug #1900800, the problem is that mediated devices aren't
> persistent, so nova-compute isn't able to respawn the instances after a
> reboot if those are vGPU-flavored.
> - we agreed on the fact that, like any other device, Nova shouldn't try to
> create them and we should rather ask the operator to pre-create the
> mediated devices, exactly like we do for SR-IOV VFs.
> - we'll accordingly deprecate the mdev creation in the libvirt driver (but
> we'll continue to support it) and we'll log a warning if Nova has to create
> one mdev.
> - we'll change the nova-compute init script to raise a better exception
> explaining which mdev is missing
> - we'll document a procedure explaining how to get existing mdev
> information and persist it by udev rules (for upgrade cases)
>
> # Robustify our compute hostname changes
>
> To be clear, as a first step, we will continue to *NOT* support nova-compute
> hostname changes, but we'll better detect the hostname change to prevent
> later issues.
> - a first step to persist the compute UUID on disk seems a good candidate, so
> a spec is targeted this cycle.

Specifically, we need to persist the compute service UUID in the compute manager. We can discuss the details in the spec, as we may also want to persist the compute node UUID, but that would have different behaviour depending on the virt driver in use, e.g. vmware/ironic (1:n) vs libvirt (1:1).

> - next steps could be to robustify our relationships between instance,
> compute node and service object records but this design will be deferred
> for later in a backlog spec.
>
>
> # Move to OpenStackClient and SDK
>
> OSC is already fully supported in Zed but it continues to rely on
> novaclient python bindings for calling the Nova API.
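To make concrete what moving those calls over to the SDK looks like, the change per call site is roughly of this shape (illustrative only; `sess` stands for an existing keystoneauth1 session):

    # today: novaclient python bindings
    from novaclient import client as nova_client
    nova = nova_client.Client('2.1', session=sess)
    servers = nova.servers.list()

    # target: openstacksdk
    import openstack
    conn = openstack.connect(cloud='envvars')
    servers = list(conn.compute.servers())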
> - we agreed on modifying OSC to rather use openstacksdk (instead of
> novaclient) for communicating with the Nova APIs. Contributors are welcome on
> this move.
> - we agreed on stopping the use of project client libraries in nova services
> (eg. cinderclient used by nova-compute) and rather using openstacksdk
> directly. A 'rfe' bug per project client will be issued, anyone willing to
> work on it is welcome.
> - we also agreed on continuing to support novaclient for a couple of
> releases, as operators or other consumers of this package could require
> substantial efforts to move to the sdk.
> - we agreed on changing the release model for novaclient to be independent
> so we can release anytime we need.
>
> # Evacuate to target state
>
> We understand the usecase (evacuating an instance shouldn't always turn the
> evacuated instance on)
> - that said, we don't want to amend the API for passing a parameter as this
> would carry tech debt indefinitely
> - we prefer to propose a new microversion that would stop the instance
> eventually instead of starting it

We also agreed to file a bug for the existing behaviour of evacuating a stopped instance: we currently start it and then stop it, which is what will initially happen for the new microversion too, but long term we want to avoid starting the VM entirely. This is safer, as the admin does not know what workload is in the guest and it may not be safe (disk corruption) or even possible (encrypted volumes) to start the guest when we evacuate. We can and should catch the exception raised in the latter case to allow the VM to be evacuated and powered off in the short term. Medium to long term, we agreed not to start the VM with the new microversion and to also address this for resize verify, where we should not attempt to start the VM if it was stopped. This will correct the current odd behaviour in the instance action log where the VM is started and stopped.

> - operators wanting to keep the original behaviour would need to negotiate
> an older microversion to the Nova API as we don't intend to make spawning
> optionally chosen by API on evacuate.

Specifically, if the old microversion is used then the behaviour and limitations we have today will be supported indefinitely. With the new microversion the operator is opting in to evacuating to shutoff state.

>
> (aaaaaaaaaaaaaaaand that's it for the PTG recap)
>
> Kudos, you were brave, you reached that point. Hope your coffee was good
> and now you feel rejuvenated. Anyway, time for me now to rest my fingers
> and to enjoy a deserved time off.
>
> As said, I'm all up for any questions or remarks that would come from the
> reading of this enormous thread.
>
> -Sylvain

From jake.yip at ardc.edu.au Mon Oct 24 23:12:21 2022
From: jake.yip at ardc.edu.au (Jake Yip)
Date: Tue, 25 Oct 2022 10:12:21 +1100
Subject: [magnum] Antelope PTG summary
Message-ID: <4f870050-6342-ec2d-bf7c-7d7a1052e1dd@ardc.edu.au>

Dear all,

Magnum had a great meeting during the latest PTG. Thanks to the OpenInfra Foundation for holding it. Also, thanks to all who have found precious time to attend.

As the current Magnum team is quite small, we have decided on the following:

- Deprecate the Fedora Atomic driver - This is EOL.
- Deprecate the CoreOS driver - This is similarly EOL.
- Deprecate the Docker Swarm driver. Kubernetes has become the container orchestration engine of choice now. With our limited resources, it will be more efficient to support the most popular choice well, instead of multiple choices.
- Mark branches before Victoria as unmaintained.
These branches only support Kubernetes versions that has been EOL. - Most excitingly, move towards adding Cluster API support for Kubernetes. This allows the Magnum project to leverage off upstream effort in creating and managing a Kubernetes cluster. We understand that removing support is difficult for users, but we believe this is for the better. Our main aim is to keep the project moving fast enough to keep up with upstream Kubernetes. If anyone is willing to help maintain any of the items we plan on deprecating, please drop me a message. Regards, Magnum Team. -- Jake Yip DevOps Engineer, ARDC Nectar Research Cloud From gmann at ghanshyammann.com Tue Oct 25 01:00:07 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 24 Oct 2022 18:00:07 -0700 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Oct 27 at 1500 UTC Message-ID: <1840ca763ed.11a1d388f223718.5578574495996418279@ghanshyammann.com> Hello Everyone, The technical Committee's next weekly meeting is scheduled for 2022 Oct 27, at 1500 UTC. If you would like to add topics for discussion, please add them to the below wiki page by Wednesday, Oct 26 at 2100 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting -gmann From yu-kishimoto at kddi.com Tue Oct 25 01:07:29 2022 From: yu-kishimoto at kddi.com (yu-kishimoto at kddi.com) Date: Tue, 25 Oct 2022 01:07:29 +0000 Subject: [Cinder][Nova]Zombie process prevention for Cinder and Nova APIs In-Reply-To: References: Message-ID: Hi all, I'm still working on this issue, and here are the logs of this. Sending a request from Keystone to the Nova API, such as 'openstack server list', will result in the following error regardless of the timeout occurring: cat /var/log/httpd/nova_api_wsgi_error.log [Thu Oct 20 15:13:04.358981 2022] [wsgi:warn] [pid 2611850] mod_wsgi (pid=2611850): Callback registration for signal 12 ignored. 
[Thu Oct 20 15:13:04.359279 2022] [wsgi:warn] [pid 2611850] File "/var/www/cgi-bin/nova/nova-api", line 52, in [Thu Oct 20 15:13:04.359291 2022] [wsgi:warn] [pid 2611850] application = init_application() [Thu Oct 20 15:13:04.359297 2022] [wsgi:warn] [pid 2611850] File "/usr/lib/python3.6/site-packages/nova/api/openstack/compute/wsgi.py", line 20, in init_application [Thu Oct 20 15:13:04.359299 2022] [wsgi:warn] [pid 2611850] return wsgi_app.init_application(NAME) [Thu Oct 20 15:13:04.359302 2022] [wsgi:warn] [pid 2611850] File "/usr/lib/python3.6/site-packages/nova/api/openstack/wsgi_app.py", line 125, in init_application [Thu Oct 20 15:13:04.359304 2022] [wsgi:warn] [pid 2611850] init_global_data(conf_files, name) [Thu Oct 20 15:13:04.359313 2022] [wsgi:warn] [pid 2611850] File "/usr/lib/python3.6/site-packages/nova/utils.py", line 1123, in wrapper [Thu Oct 20 15:13:04.359314 2022] [wsgi:warn] [pid 2611850] return func(*args, **kwargs) [Thu Oct 20 15:13:04.359317 2022] [wsgi:warn] [pid 2611850] File "/usr/lib/python3.6/site-packages/nova/api/openstack/wsgi_app.py", line 103, in init_global_data [Thu Oct 20 15:13:04.359319 2022] [wsgi:warn] [pid 2611850] version, conf=CONF, service_name=service_name) [Thu Oct 20 15:13:04.359322 2022] [wsgi:warn] [pid 2611850] File "/usr/lib/python3.6/site-packages/oslo_reports/guru_meditation_report.py", line 153, in setup_autorun [Thu Oct 20 15:13:04.359323 2022] [wsgi:warn] [pid 2611850] version, service_name, log_dir) [Thu Oct 20 15:13:04.359326 2022] [wsgi:warn] [pid 2611850] File "/usr/lib/python3.6/site-packages/oslo_reports/guru_meditation_report.py", line 186, in _setup_signal [Thu Oct 20 15:13:04.359328 2022] [wsgi:warn] [pid 2611850] lambda sn, f: cls.handle_signal( Does someone can kindly help me to solve the issue or give me some clues like parameter changes that can be expected to be effective? >-----Original Message----- >From: ?? ?? >Sent: Monday, October 24, 2022 11:46 AM >To: openstack-discuss at lists.openstack.org >Cc: opetech-scg at kddi.com >Subject: [Cinder][Nova]Zombie process prevention for Cinder and Nova APIs > >Hi all, >I'm trying to fix issues of spawning zombie processes for Cinder and Nova APIs during instantiation in an IaC (Ansible >which is not openstack-ansible). >Does someone can kindly help me to solve the issue or give me some clues like parameter changes that can be expected >to be effective? > >- Issues >The cinder api service process(port: 8776) existed in a zombie state. >The neutron api service process(port: 9696) required for nova existed in a zombie state. > >- What the IaC does >Use an existing tenant to create a network and subnet, then use 8volumes to instantiate 8VMs. > >- Workarounds and what I've done so far >Identify and kill zombie processes. > >- Environment >OS: CentOS Stream 8 >Kernel: 4.18.0-408.el8.x86_64 >OpenStack: Yoga(Deployed by PackStack: https://www.rdoproject.org/install/packstack/) >Nova: 25.0.1 >Neutron: 20.2.0 >Cinder: 20.0.1 >KeyStone: 21.0.0 > >-- >Yukihiro Kishimoto >Technologist >KDDI Co., Ltd. 
>Tokyo Japan From obondarev at mirantis.com Tue Oct 25 06:03:56 2022 From: obondarev at mirantis.com (Oleg Bondarev) Date: Tue, 25 Oct 2022 10:03:56 +0400 Subject: [Neutron] Bug Deputy Report Oct 17 - 24 Message-ID: Hello Neutron Team, It was probably the most quiet week in the history of Bug Reports: Low ----- - https://bugs.launchpad.net/neutron/+bug/1993181 - [OVN] OVN metadata "MetadataProxyHandler" not working if workers=0 - Fix Released: https://review.opendev.org/c/openstack/neutron/+/861649 - Fixed by Rodolfo Alonso - https://bugs.launchpad.net/neutron/+bug/1993502 - failing unit tests when not running them all - Fix Released: https://review.opendev.org/c/openstack/neutron/+/861869 - Fixed Rodolfo Alonso - https://bugs.launchpad.net/neutron/+bug/1993498 - [CI] Create oslo master branch jobs (apart from py39)8 - In progress: https://review.opendev.org/c/openstack/neutron/+/861859 - assigned to Rodolfo Alonso RFEs: ----- - https://bugs.launchpad.net/neutron/+bug/1993288 - RFE: Adopt Keystone unified limits as quota driver for Neutron - New (PTG decision - wait till it become more mature) - Unassigned Thanks, Oleg -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Tue Oct 25 11:00:10 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Tue, 25 Oct 2022 13:00:10 +0200 Subject: [neutron] Antelope PTG summary Message-ID: Hello Neutrinos: This is a quick summary of what we have discussed during the PTG week. Zed summary and retrospective: - Here is the output of this session: https://etherpad.opendev.org/p/neutron-antelope-ptg#L83 Spec handling: - To have a core reviewer assigned per spec. This reviewer will engage the community to review the spec and will be a kind of "godfather". - To have a weekly status report of the active specs in the community, to visibilize them, boost their impact and "coerce" the community to review them. Neutron CLI deprecation: - To remove the CLI code. - Python bindings: - Investigate and report the effort needed to make this migration in neutron client consumers (Nova, Horizon, Heat, etc). - Request other projects using the python bindings to move to OpenStack SDK. - Stop merging new features. Quota classes: - Information: https://docs.openstack.org/project-team-guide/technical-guides/unified-limits.html - Neutron RFE: https://bugs.launchpad.net/neutron/+bug/1993288 - Still under development in the community, this is still not an accepted community goal, thus we won't start the development until some cons are discussed with the TC members (see the Neutron RFE description). Pyroute2 session: - IPmock, a new class to mock some pyroute2 objects: https://lab.pyroute2.org/ - Pluggable netlink engines. Still not compatible with NetNS (namespaces) - Transactional engine: - A daemon is always reading the netlink and inotify, and builds a DB with the system information. Data is read faster. This could be not compatible with the current sync calls using privsep. - Using NDB class. - Customize message parser, to reduce the amount of data retrieved from the kernel. - Dump/list operators will be generators (to check the compatibility with privsep). Remove the failed devstack migration and get back to what is called "neutron-legacy" (that was discussed before in a PTG but we didn't start the reversion). DNS subdomain support in ML2/OVN: - https://review.opendev.org/c/openstack/neutron-specs/+/832658 - On hold until there is a use case. 
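Going back to the pyroute2 session for a moment, a short snippet may help illustrate the NDB-based, transactional approach mentioned there (purely illustrative, not Neutron code):

    from pyroute2 import NDB

    # NDB keeps a local database in sync with netlink events, so reads are
    # served from that cache instead of issuing a fresh kernel dump per call
    with NDB() as ndb:
        lo = ndb.interfaces['lo']
        print(lo['ifname'], lo['state'], lo['mtu'])

The open questions noted above are how this always-on daemon model interacts with Neutron's current synchronous calls through privsep and with network namespaces.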
DNS resolver in ML2/OVN: - https://bugzilla.redhat.com/show_bug.cgi?id=2104568 - The main devoplment part falls in the OVN core team. Once finished, the Neutron part should be almost trivial. ML2/OVN flow metering. - Same as we currently have with ML2/OVS + iptables firewall (currently not supported with native firewall because we can't count OF flows). - Same as the previous feature, the main development is in the OVN core team. Neutron QA: - Implement a grenade job based on the SLURP cadence, testing from Y to A (done). - Have the "tempest-integrated-networking" job in our gate/check queues (patch under review). PCI Device Tracking In Placement (Nova-Neutron session): - The different approaches are still under development. - Spec to be proposed during this release (see etherpad to check the possible implementation options). Port binding switchdev capabilities (Nova-Neutron session): - We agreed that the current implementation is incorrect: Neutron should not modify the port binding dictionary. Nova is not reading it but overwriting it. - We won't change the current code but no new features will rely on it. Nova mutable MTU support (Nova-Neutron session): - For now, Neutron will document that a network MTU change requires a VM reboot (ot port detach/attach operations) to be applied on this port. S-RBAC: - Neutron has three implemented personas: system wide admin, project user and project reader. - Tempest will integrate all RBAC testing projects during this release. Once done, the new RBAC flag will be enabled by default. - Neutron will create the corresponding RBAC job/jobs. - Neutron needs to identify what is called "service role" calls (done from other projects). This is a new concept that will be included in next releases. nftables migration: - The current iptables legacy interface could be deprecated. During this release, Neutron will start an epic to move the current iptables API to the new nftables API; if possible, incrementally, mixing both APIs in the same agent. Please, if there is any missing topic, let me know. Regards and thank you for attending and participating. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Tue Oct 25 11:00:18 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Tue, 25 Oct 2022 08:00:18 -0300 Subject: [cloudkitty] Antelope PTG summary Message-ID: Dear all, The CloudKitty project had a great PTG meeting. Thanks to the OpenInfra Foundation for holding it. Also, thanks to all who have dedicated precious time to attend. We started the meeting by revisiting the latest release and discussing the review format we have adopted in the past two years. Looking at the result of our process, we decided to maintain it, as it seems to bring results and concentrate our efforts on getting CloudKitty better and evolving throughout time. While reviewing the past release, we discussed the situation with patch [1] (Allow rating rules that have 12 digits in the integer part of the number). This patch is a bug fix, and we discussed the idea of backporting it. However, as it is a patch that affects the database schema, it can be troublesome to backport it. Therefore, before doing any work, we will consult the mailing list to gather others opinions on this matter. Users could upgrade CloudKitty to address this issue, if they need to, instead of applying a patch. Pierre volunteered to execute this "inquiry" with the CloudKitty user base in the mailing list. 
While discussing the ElasticSearch (ES) improvements in CloudKitty to support ES 7, we raised the concern that Kolla-ansible is moving from ElasticSearch to OpenSearch. Therefore, we need to evaluate if this can have an impact in CloudKitty. Most certainly it will impact people using ES as the backend. The goal is to revisit this discussion in our next bi-weekly meeting, and then communicate for users and community (Topics for meeting of 31/10/2022 [2]). Furthermore, as the current CloudKitty team is a bit small, we have decided to focus on the following topics for the Antelope cycle. - Multiple rating types for the same metric -- One metric being used to generate different rates - Different time frame group by options -- provide group by options to group by day of the year, week of the year, month, and year. - Improve some timeouts handling when using Gnocchi backends. This one needs some checking with Zigo, who was the one that reported this situation. Zigo, I added you to this e-mail, can you reply back with more details regarding this situation? - A regression API that takes the resource ID Or project ID or scope ID, and executes a regression (prediction) based on the collected data so far for the given resources/elements/scope. - Add CloudKitty API reference docs ( https://docs.openstack.org/cloudkitty/latest/api-reference/index.html) to https://docs.openstack.org/api-ref/ We also briefly discussed the situation with Monasca, which has already been deprecated from Kolla-ansible, and might be losing some traction and support. Therefore, we (as a community), might need to start thinking about deprecating it, and removing some non-tested code from our code base. We do understand that deprecating and removing support is complicated for users, but nobody is maintaining and testing that integration. Therefore, it tends to get broken and unstable over time. However, if somebody wants to volunteer to maintain and improve it, you are welcome :) In summary that is the outcome of our small, but efficient PTG meeting. Let's keep improving and making CloudKitty better. If I missed something, please do not hesitate to reply back. Also, if somebody needs any help, you can always ping me :) See you guys. Have a nice week! [1] https://review.opendev.org/c/openstack/cloudkitty/+/837200 [2] https://etherpad.opendev.org/p/cloudkitty-meeting-topics -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Tue Oct 25 12:41:13 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Tue, 25 Oct 2022 13:41:13 +0100 Subject: [kolla-ansible][Yoga] How to configure rabbitmq heartbeat values Message-ID: Hi, I need to tweak these variables in my Rabbitmq : heartbeat_timeout_threshold heartbeat_rate heartbeat_interval When I tried to add heartbeat_timeout_threshold = 600 to rabbitmq.conf and reconfigure, the service failed to start with : 13:39:41.756 [error] You've tried to set heartbeat_timeout_threshold, but there is no setting with that name. 13:39:41.758 [error] Did you mean one of these? 13:39:41.838 [error] socket_writer.gc_threshold 13:39:41.838 [error] handshake_timeout 13:39:41.838 [error] auth_http.http_method 13:39:41.838 [error] You've tried to set heartbeat_rate, but there is no setting with that name. 13:39:41.838 [error] Did you mean one of these? 
13:39:41.888 [error] heartbeat 13:39:41.889 [error] cluster_name 13:39:41.889 [error] channel_max 13:39:41.889 [error] Error preparing configuration in phase transform_datatypes: 13:39:41.889 [error] - Conf file attempted to set unknown variable: heartbeat_rate 13:39:41.889 [error] - Conf file attempted to set unknown variable: heartbeat_timeout_threshold How can I modify those variables? Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From michal.arbet at ultimum.io Tue Oct 25 12:59:43 2022 From: michal.arbet at ultimum.io (Michal Arbet) Date: Tue, 25 Oct 2022 14:59:43 +0200 Subject: [kolla-ansible][Yoga] How to configure rabbitmq heartbeat values In-Reply-To: References: Message-ID: Hi, Configuration reference includes only the "heartbeat" config option ... https://www.rabbitmq.com/configure.html We also needed to change this value, I also proposed a patch for kolla-ansible. https://review.opendev.org/c/openstack/kolla-ansible/+/861727 Kevko Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Po???? 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet at ultimum.io *https://ultimum.io * LinkedIn | Twitter | Facebook ?t 25. 10. 2022 v 14:47 odes?latel wodel youchi napsal: > Hi, > I need to tweak these variables in my Rabbitmq : > > heartbeat_timeout_threshold > heartbeat_rate > heartbeat_interval > > When I tried to add > > heartbeat_timeout_threshold = 600 to rabbitmq.conf and reconfigure, the service failed to start with : > > 13:39:41.756 [error] You've tried to set heartbeat_timeout_threshold, but there is no setting with that name. > 13:39:41.758 [error] Did you mean one of these? > 13:39:41.838 [error] socket_writer.gc_threshold > 13:39:41.838 [error] handshake_timeout > 13:39:41.838 [error] auth_http.http_method > 13:39:41.838 [error] You've tried to set heartbeat_rate, but there is no setting with that name. > 13:39:41.838 [error] Did you mean one of these? > 13:39:41.888 [error] heartbeat > 13:39:41.889 [error] cluster_name > 13:39:41.889 [error] channel_max > 13:39:41.889 [error] Error preparing configuration in phase transform_datatypes: > 13:39:41.889 [error] - Conf file attempted to set unknown variable: heartbeat_rate > 13:39:41.889 [error] - Conf file attempted to set unknown variable: heartbeat_timeout_threshold > > > How can I modify those variables? > > Regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramishra at redhat.com Tue Oct 25 13:06:07 2022 From: ramishra at redhat.com (Rabi Mishra) Date: Tue, 25 Oct 2022 18:36:07 +0530 Subject: [TripleO] 2023.1 Antelope PTG summary Message-ID: Hi All, Hope y'all had a great PTG last week and a great weekend. I would like to thank everyone who participated/contributed to Tripleo PTG sessions and helped all of us "share and learn". Though personally I don't like long emails, this summary is going to be a bit longer, please bear with me. We had a total of ten sessions spread over two days with very good participation (most sessions with 35+ [max 44] in attendance) and we shared/ discussed a multitude of topics. # Welcome and Zed Retrospective The highlights of the Zed cycle was collaboration across teams to add some new features and fine-tune features implemented in previous cycles that includes; Standalone roles, nftables switch, OVN RAFT support, ephemeral heat for upgrade use cases, CI jobs for Deployed Ceph and Ceph Container Promotion in CI etc. 
There was a discussion on the suggestion to stop backporting feature removals from stable branches as we did for wallaby in the last few cycles. Though TripleO does not follow stable policy, backporting feature removals with dependent patches across repos more often than not leads to broken CI and user support cases. We would be well placed if we can identify and do those before zed release and avoid the firefighting later. [Action Items] - Update zed review priorities etherpad[1] with deprecation/removal patches [All] # Standalone Ansible Roles Status Update James provided an update on the current status of "Standalone Ansible Roles' POC and we discussed the possible future scope of it beyond the current "scale-out compute" use case. He also demoed molecule based automation of deploying/testing a compute node with the new meta-role. Chandan shared the current work on CI jobs/scenarios to deploy computes with standalone roles. [Action Items] - Prioritize review/merge of proposed patches[2] on this topic [tripleo-core] - Finish the CI jobs for testing standalone roles and make them voting [Chandan/tripleo-ci-core] # Standalone Ansible Roles and External Ceph John presented his work to integrate computes deployed with standalone roles with external ceph. We don't plan to extend this effort to include tripleo deploying ceph using standalone roles, unless there is a requirement for that in the future. [Action Items] - Prioritize review of ceph related standalone role patches.[tripleo-core] # Multi-rhel support for Compute Brendan gave an update on the ongoing effort for multi-rhel support, different options evaluated and current POC status with composable roles approach. There is still some work left on automating role split during upgrade, which is expected to be completed this cycle. [Action Items] - Finish composable roles POC and test upgrade procedure manually [upgrades team] - Add CI jobs to test the proposed procedure upstream as feasible [upgrades team] - Implement role-split automation during upgrade [Takashi/Rabi] # OpenStack SDK, Ansible Collections OpenStack and TripleO We discussed the issues around openstacksdk backward incompatibility and the current pinning for it in RDO/TripleO. We agreed on an action plan to pre-release ansible-openstack-collection (AOC) for RDO/TripleO (though it is still not fully compatible with latest openstacksdk) and remove the pin so that Zed RDO can be released with openstacksdk 0.101.0. [Action Items] - Bump openstacksdk to 0.101.0 for RDO master and zed [Jakob/Alfredo] - Update RDO ansible-collections-openstack to the 2.0.0 pre-release [Jakob/Alfredo] # TripleO and Distributed Project Leadership (DPL) In this session, we discussed the possibility of opting for DPL in TripleO and sharing the responsibilities across multiple leaders in the team. There were a number of volunteers for different liaison roles and we in general agreed that it would be a good thing to do. We would propose the required change to governance sometime this cycle, unless there is any concern. [Action Items] - Finalize all required liaison roles and propose patch to openstack governance repo [Rabi] # Migration from puppetlabs/apache As part of the broader effort to reduce the puppet footprint, Cedric presented the ongoing effort to replace puppetlabs/apache with the new ansible role "tripleo_httpd_vhost"[3]. A few services have already been migrated and remaining ones are expected to be done this cycle with broader participation from sub teams. 
[Action Items] - Review/merge the already proposed patches for migration [tripleo-core] - Plan and work on migrating remaining services [Cedric] # State of CI CI team presented on a vast range of topics that included, - Added coverage of upgrade jobs across upstream check/gate, periodic component pipeline and periodic integration pipeline in zed (Marios) - Upstream jobs for multi-rhel testing (Marios) - Current state of OVB Jobs (Chandan) - TripleO CI Jobs on IBM Cloud and the new feature to hold nodes for troubleshooting (Chandan) - Tempest Allow List (Arx/Pooja/Soniya) - Tempest Dashboard (Arx/Lukas) # Container Capabilities We revisited the topic of privileged containers for services and how we can possibly avoid those by allowing the required limited container capabilities. As all these services are mostly for the compute role, we agreed that it would probably be better to align this effort with "Standalone Roles" for better testing. # OS Migrate Status Update Jirka provided an update on the tenant migration tool 'os-migrate', an ansible collection that is not a TripleO umbrella project, but used by the upgrades team. It's possible that I missed some important points from these discussions. Please feel free to add them by replying to this thread. All session etherpad links are in the main schedule etherpad[4] for reference. I'll share the session recordings once they're made available to us. Thanks Again. [1] https://etherpad.opendev.org/p/tripleo-zed-review-priorities [2] https://review.opendev.org/q/topic:standalone-roles [3] https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_httpd_vhost [4] https://etherpad.opendev.org/p/oct2022-ptg-tripleo -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From michal.arbet at ultimum.io Tue Oct 25 13:17:19 2022 From: michal.arbet at ultimum.io (Michal Arbet) Date: Tue, 25 Oct 2022 15:17:19 +0200 Subject: [kolla-ansible][Yoga] How to configure rabbitmq heartbeat values In-Reply-To: References: Message-ID: Hi, maybe you meant oslo_messaging options ? https://docs.openstack.org/oslo.messaging/latest/configuration/opts.html Kevko Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Po???? 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet at ultimum.io *https://ultimum.io * LinkedIn | Twitter | Facebook ?t 25. 10. 2022 v 14:59 odes?latel Michal Arbet napsal: > Hi, > > Configuration reference includes only the "heartbeat" config option ... > https://www.rabbitmq.com/configure.html > We also needed to change this value, I also proposed a patch for > kolla-ansible. > > https://review.opendev.org/c/openstack/kolla-ansible/+/861727 > > Kevko > > > Michal Arbet > Openstack Engineer > > Ultimum Technologies a.s. > Na Po???? 1047/26, 11000 Praha 1 > Czech Republic > > +420 604 228 897 > michal.arbet at ultimum.io > *https://ultimum.io * > > LinkedIn | Twitter > | Facebook > > > > ?t 25. 10. 2022 v 14:47 odes?latel wodel youchi > napsal: > >> Hi, >> I need to tweak these variables in my Rabbitmq : >> >> heartbeat_timeout_threshold >> heartbeat_rate >> heartbeat_interval >> >> When I tried to add >> >> heartbeat_timeout_threshold = 600 to rabbitmq.conf and reconfigure, the service failed to start with : >> >> 13:39:41.756 [error] You've tried to set heartbeat_timeout_threshold, but there is no setting with that name. >> 13:39:41.758 [error] Did you mean one of these? 
>> 13:39:41.838 [error] socket_writer.gc_threshold >> 13:39:41.838 [error] handshake_timeout >> 13:39:41.838 [error] auth_http.http_method >> 13:39:41.838 [error] You've tried to set heartbeat_rate, but there is no setting with that name. >> 13:39:41.838 [error] Did you mean one of these? >> 13:39:41.888 [error] heartbeat >> 13:39:41.889 [error] cluster_name >> 13:39:41.889 [error] channel_max >> 13:39:41.889 [error] Error preparing configuration in phase transform_datatypes: >> 13:39:41.889 [error] - Conf file attempted to set unknown variable: heartbeat_rate >> 13:39:41.889 [error] - Conf file attempted to set unknown variable: heartbeat_timeout_threshold >> >> >> How can I modify those variables? >> >> Regards. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Tue Oct 25 13:42:42 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Tue, 25 Oct 2022 14:42:42 +0100 Subject: [kolla-ansible][xena][Masakari] masakarimonitors-instancemonitor won't start Message-ID: Hi, I have this in my computes nodes; masakarimonitors-instancemonitor won't start 2022-10-25 14:37:27.354 7 INFO masakarimonitors.service [-] Starting masakarimonitors-instancemonitor 2022-10-25 14:37:27.357 7 WARNING masakarimonitors.instancemonitor.instance [-] Error from libvirt : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found) *2022-10-25 14:37:27.357 7 ERROR oslo_service.service [-] Error starting thread.: libvirt.libvirtError: authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found) 2022-10-25 14:37:27.357 7 ERROR oslo_service.service Traceback (most recent call last): 2022-10-25 14:37:27.357 7 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_service/service.py"* , line 806, in run_service 2022-10-25 14:37:27.357 7 ERROR oslo_service.service service.start() 2022-10-25 14:37:27.357 7 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.6/site-packages/masakarimonitors/service .py", line 62, in start 2022-10-25 14:37:27.357 7 ERROR oslo_service.service self.manager.main() 2022-10-25 14:37:27.357 7 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.6/site-packages/masakarimonitors/instanc emonitor/instance.py", line 166, in main 2022-10-25 14:37:27.357 7 ERROR oslo_service.service self._virt_event(uri) 2022-10-25 14:37:27.357 7 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.6/site-packages/masakarimonitors/instanc emonitor/instance.py", line 128, in _virt_event 2022-10-25 14:37:27.357 7 ERROR oslo_service.service vc = libvirt.openReadOnly(uri) 2022-10-25 14:37:27.357 7 ERROR oslo_service.service File "/usr/lib64/python3.6/site-packages/libvirt.py", line 350, in openReadOnl y 2022-10-25 14:37:27.357 7 ERROR oslo_service.service raise libvirtError('virConnectOpenReadOnly() failed') 2022-10-25 14:37:27.357 7 ERROR oslo_service.service libvirt.libvirtError: authentication failed: Failed to start SASL negotiation: - 4 (SASL(-4): no mechanism available: No worthy mechs found) 2022-10-25 14:37:27.357 7 ERROR oslo_service.service 2022-10-25 14:37:27.359 7 INFO masakarimonitors.service [-] Stopping masakarimonitors-instancemonitor I have verified the auth.conf file on both containers the password is the same. 
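(One way to double-check those credentials by hand is to open an authenticated libvirt connection directly, e.g.:

    virsh -c qemu+tcp://<compute-host>/system list --all

which prompts for the SASL username/password configured in auth.conf; the host part above is just a placeholder.)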
In the other hand, in /var/log/kolla/libvirt/libvirtd.log I have this : 2022-10-25 13:29:16.911+0000: 1391031: error : virNetSocketReadWire:1792 : End of file while reading data: Input/output error 2022-10-25 13:30:18.226+0000: 1391031: error : virNetSocketReadWire:1792 : End of file while reading data: Input/output error 2022-10-25 13:31:19.516+0000: 1391031: error : virNetSocketReadWire:1792 : End of file while reading data: Input/output error 2022-10-25 13:32:20.817+0000: 1391031: error : virNetSocketReadWire:1792 : End of file while reading data: Input/output error 2022-10-25 13:33:22.066+0000: 1391031: error : virNetSocketReadWire:1792 : End of file while reading data: Input/output error 2022-10-25 13:34:23.359+0000: 1391031: error : virNetSocketReadWire:1792 : End of file while reading data: Input/output error 2022-10-25 13:35:24.740+0000: 1391031: error : virNetSocketReadWire:1792 : End of file while reading data: Input/output error Any ideas? Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Tue Oct 25 15:06:17 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Tue, 25 Oct 2022 16:06:17 +0100 Subject: [kolla-ansible][Yoga] Deployment stuck In-Reply-To: References: Message-ID: Hi, I think I found what causes the problem, but I don't understand why. I removed the verbosity, i.e I removed -vvv I only kept just one and I disabled ANSIBLE_DEBUG variable, and voila the deployment went till the end. First I suspected the tmux process, some kind of buffer overflow because of the quantity of the logs, but then I connected to the VM's console and it is the behavior. With one -v the process goes without problem, but if I put more -vvv it gets stuck somewhere. If someone can explain this to me!!!!??? Regards. Le lun. 24 oct. 2022 ? 14:00, wodel youchi a ?crit : > Anyone???? > > Le lun. 24 oct. 2022 ? 07:53, wodel youchi a > ?crit : > >> Hi, >> >> My setup is simple, it's an hci deployment composed of 3 controllers >> nodes and 6 compute and storage nodes. >> I am using ceph-ansible for deploying the storage part and the deployment >> goes well. >> >> My base OS is Rocky Linux 8 fully updated. >> >> My network is composed of a 1Gb management network for OS, application >> deployment and server management. And a 40Gb with LACP (80Gb) data network. >> I am using vlans to segregate openstack networks. >> >> I updated both Xena and Yoga kolla-ansible package I updated several >> times the container images (I am using a local registry). >> >> No matter how many times I tried to deploy it's the same behavior. The >> setup gets stuck somewhere. >> >> I tried to deploy the core modules without SSL, I tried to use an older >> kernel, I tried to use the 40Gb network to deploy, nothing works. The >> problem is the lack of error if there was one it would have been a starting >> point but I have nothing. >> >> Regards. >> >> On Sun, Oct 23, 2022, 00:42 wodel youchi wrote: >> >>> Hi, >>> >>> Here you can find the kolla-ansible *deploy *log with ANSIBLE_DEBUG=1 >>> >>> Regards. >>> >>> Le sam. 22 oct. 2022 ? 23:55, wodel youchi a >>> ?crit : >>> >>>> Hi, >>>> >>>> I am trying to deploy a new platform using kolla-ansible Yoga and I am >>>> trying to upgrade another platform from Xena to yoga. >>>> >>>> On both platforms the prechecks went well, but when I start the process >>>> of deployment for the first and upgrade for the second, the process gets >>>> stuck. 
>>>> >>>> I tried to tail -f /var/log/kolla/*/*.log but I can't get hold of the >>>> cause. >>>> >>>> In the first platform, some services get deployed, and at some point >>>> the script gets stuck, several times in the modprobe phase. >>>> >>>> In the second platform, the upgrade gets stuck on : >>>> >>>> Escalation succeeded >>>> [204/1859] >>>> <20.3.0.28> (0, b'\n{"path": "/etc/kolla/cron", "changed": false, >>>> "diff": {"before": {"path": "/etc/kolla/cro >>>> n"}, "after": {"path": "/etc/kolla/cron"}}, "uid": 0, "gid": 0, >>>> "owner": "root", "group": "root", "mode": "07 >>>> 70", "state": "directory", "secontext": >>>> "unconfined_u:object_r:etc_t:s0", "size": 70, "invocation": {"module_ >>>> args": {"path": "/etc/kolla/cron", "owner": "root", "group": "root", >>>> "mode": "0770", "recurse": false, "force >>>> ": false, "follow": true, "modification_time_format": "%Y%m%d%H%M.%S", >>>> "access_time_format": "%Y%m%d%H%M.%S", >>>> "unsafe_writes": false, "state": "directory", "_original_basename": >>>> null, "_diff_peek": null, "src": null, " >>>> modification_time": null, "access_time": null, "seuser": null, >>>> "serole": null, "selevel": null, "setype": nul >>>> l, "attributes": null}}}\n', b'') >>>> ok: [20.3.0.28] => (item={'key': 'cron', 'value': {'container_name': >>>> 'cron', 'group': 'cron', 'enabled': True >>>> , 'image': '20.3.0.34:4000/openstack.kolla/centos-source-cron:yoga', >>>> 'environment': {'DUMMY_ENVIRONMENT': 'ko >>>> lla_useless_env', 'KOLLA_LOGROTATE_SCHEDULE': 'daily'}, 'volumes': >>>> ['/etc/kolla/cron/:/var/lib/kolla/config_f >>>> iles/:ro', '/etc/localtime:/etc/localtime:ro', '', >>>> 'kolla_logs:/var/log/kolla/'], 'dimensions': {}}}) => { >>>> "ansible_loop_var": "item", >>>> "changed": false, >>>> "diff": { >>>> "after": { >>>> "path": "/etc/kolla/cron" >>>> }, >>>> "before": { >>>> "path": "/etc/kolla/cron" >>>> } >>>> }, >>>> "gid": 0, >>>> "group": "root", >>>> >>>> How to start debugging the situation. >>>> >>>> Regards. >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ozzzo at yahoo.com Tue Oct 25 16:42:17 2022 From: ozzzo at yahoo.com (Albert Braden) Date: Tue, 25 Oct 2022 16:42:17 +0000 (UTC) Subject: [kolla] [train] Endpoints fail when one controller is down References: <1815971281.3111787.1666716137784.ref@mail.yahoo.com> Message-ID: <1815971281.3111787.1666716137784@mail.yahoo.com> Some of our clusters are heavily used, and in those clusters we get complaints when we reboot a controller (or sometimes when we deploy and containers restart). Is that normal, or does it mean that we have something configured wrong? The symptoms are intermittent 504 from endpoints, and VM creation/deletion failing or partially completing, for example the VM is created but without DNS records. We are not following the "Removing existing controllers" procedure [1] before rebooting the controller; is that necessary to avoid these issues? 1. https://docs.openstack.org/kolla-ansible/latest/user/adding-and-removing-hosts.html From rdhasman at redhat.com Tue Oct 25 17:08:31 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Tue, 25 Oct 2022 22:38:31 +0530 Subject: [cinder] Cancelling this weeks meeting 26 Oct Message-ID: Hello Argonauts, Since we just had PTG last week, it seems reasonable to skip this week's meeting. Also I will be on leave tomorrow so cancelling this week's cinder meeting (26 October, 2022). - Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kennelson11 at gmail.com Tue Oct 25 18:32:46 2022 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 25 Oct 2022 13:32:46 -0500 Subject: Help us plan the next PTG! Message-ID: Hello Everyone! Congratulations on a great virtual PTG last week :) The OpenInfra Foundation is hosting the OpenInfra Summit in Vancouver [1], June 13 -15, 2023. We are trying to determine the level of interest among contributors to OpenInfra projects on attending a co-located PTG in Vancouver. *The exact format and dates are still being determined.* At this time, we are evaluating the level of interest from an attendee and employer perspective. Please complete the following poll so we can measure the level of interest and plan accordingly. Future updates will be distributed to the project mailing lists as well as previous PTG attendees. Poll: https://openinfrafoundation.formstack.com/forms/openinfra_ptg_2023 As a reminder, we are also gathering feedback for the virtual PTG here: https://etherpad.opendev.org/p/Oct2022_PTGFeedback -Kendall Nelson (diablo_rojo) [1] https://openinfra.dev/summit/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Tue Oct 25 18:52:16 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 25 Oct 2022 11:52:16 -0700 Subject: Help us plan the next PTG! In-Reply-To: References: Message-ID: <184107cfa1a.f47f66a8305665.9192260266772408602@ghanshyammann.com> ---- On Tue, 25 Oct 2022 11:32:46 -0700 Kendall Nelson wrote --- > Hello Everyone! > Congratulations on a great virtual PTG last week :) > > The OpenInfra Foundation is hosting the OpenInfra Summit in Vancouver [1], June 13 -15, 2023.? > We are trying to determine the level of interest among contributors to OpenInfra projects on attending a co-located PTG in Vancouver. The exact format and dates are still being determined. At this time, we are evaluating the level of interest from an attendee and employer perspective. Please complete the following poll so we can measure the level of interest and plan accordingly. Future updates will be distributed to the project mailing lists as well as previous PTG attendees.? > Thanks, Kendall for collecting the feedback and survey. One question to understand the future PTGs schedule. As PTGs are aligned with the OpenStack new development cycle timing, they were very helpful to plan the new cycle features/work well at the start of the cycle. But seeing the summit co-located PTG timing which is June, I am curious to know if there will be a PTG for the 2023.2 (B) cycle in April (with 2023.1 Antelope releasing at the March end) also? Or we are going to have only one in June which will be co-located in Vancouver Summit (once it is final based on survey results). Definitely, having a co-located PTGs in Summit is a very good idea, saving travel, and being much more productive also but it's just timing from OpenStack release perspective making it a little bit difficult. 
-gmann > Poll:?https://openinfrafoundation.formstack.com/forms/openinfra_ptg_2023 > > As a reminder, we are also gathering feedback for the virtual PTG here:?https://etherpad.opendev.org/p/Oct2022_PTGFeedback > > -Kendall Nelson (diablo_rojo) > [1]?https://openinfra.dev/summit/ From eblock at nde.ag Wed Oct 26 07:44:17 2022 From: eblock at nde.ag (Eugen Block) Date: Wed, 26 Oct 2022 07:44:17 +0000 Subject: [keystone][cache] How to tune role cache Message-ID: <20221026074417.Horde.2xZbFrscSR34uUqXAQy5PBQ@webmail.nde.ag> Hi *, one of our customers has two almost identical clouds (Victoria), the only difference is that one of them has three control nodes (HA via pacemaker) and the other one only one control node. They use terraform to deploy lots of different k8s clusters and other stuff. In the HA cloud they noticed keystone errors when they purged a project (cleanly) and started the redeployment immediately after that. We did some tests to find out which exact keystone cache it is and it seems to be the role cache (default 600 seconds) which leads to an error in terraform, it reports that the project was not found and refers to the previous ID of the project. The same deployment seems to work in the single-control environment without these errors, it just works although the cache is enabled as well. I already tried to reduce the cache_time to 30 seconds but that doesn't help (although it takes more than 30 seconds until terraform is ready after the prechecks). But the downside of disabling the role cache entirely leads to significantly longer response times when using the dashboard or querying the APIs. Is there any way to tune the role cache in a way so we could have both a reasonable performance as well as being able to redeploy projects without a "sleep 600"? Any comments or recommendations are appreciated! Regards, Eugen From ralonsoh at redhat.com Wed Oct 26 08:00:52 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 26 Oct 2022 10:00:52 +0200 Subject: [neutron] Neutron team meeting cancelled Nov 1st Message-ID: Hello Neutrinos: The next Neutron team meeting on November 1st is cancelled (All Saints' Day [1]). The next one will take place at the same time on November 8th. Regards. [1]https://en.wikipedia.org/wiki/All_Saints%27_Day -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Wed Oct 26 10:16:02 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Wed, 26 Oct 2022 11:16:02 +0100 Subject: [Kolla-ansible][Xena] How to update an Openstack deployment with new containers? Message-ID: Hi, The documentation of kolla-ansible Xena does not talk about updating an existing Xena deployment with new containers. Could you please help with this? I found some lines about that in Yoga version, saying that, to update an existing deployment you have to : 1 - Update kolla-ansible it self : $ source xenavenv (xenavenv) $ pip install --upgrade git+ https://opendev.org/openstack/kolla-ansible at stable/xena 2 - Update the container Images with Docker pull 3 - Update my local registry if I am using one, and in my case I am, so I deleted the registry images then I recreated them. 4 - Then finally deploy again (xenavenv) $ kolla-ansible -i multinode deploy Is this the right procedure? because I followed the same procedure and I think it didn't change anything. 
For example I am taking nova-libvirt container as and example, in my local registry I have this : [root at rcdndeployer2 ~]# docker images | grep nova-libvirt 192.168.2.34:4000/openstack.kolla/centos-source-nova-libvirt xena 5be83d680102 31 hours ago 2. 34GB quay.io/openstack.kolla/centos-source-nova-libvirt xena *5be83d680102* *31 hours ago * 2. 34GB [root at rcdndeployer2 ~]# docker inspect -f '{{ .Created }}' *5be83d680102 * *2022-10-25*T02:33:13.172550584Z But in my compute nodes I have this : root at computehci24 ~]# docker ps | grep nova-lib b56a12bfd482 192.168.2.34:4000/openstack.kolla/centos-source-nova-libvirt:xena "dumb-init --single-?" *5 months ag**o Up Up 5 months (healthy) nova_libvirt* Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Wed Oct 26 11:00:00 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 26 Oct 2022 08:00:00 -0300 Subject: [cinder] Bug report from 10-12-2022 to 10-26-2022. Message-ID: This is a bug report from 10-12-2022 to 10-26-2022. Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting ----------------------------------------------------------------------------------------- Undecided - https://bugs.launchpad.net/cinder/+bug/1994018 "The volume multiattach and in-use after retyping another backend, then can not detach it." - https://bugs.launchpad.net/cinder/+bug/1994021 "Cinder cannot work when 1 node of 3 rabbit node cluster down." Low - https://bugs.launchpad.net/cinder/+bug/1993282 "Fail to get pools by volume_type filter key." Assigned to zhaoleilc. No fix proposed to master yet. - https://bugs.launchpad.net/os-brick/+bug/1994083 "FileNotFoundError: [Errno 2] No such file or directory: is raised when dmidecode is not installed." Fix proposed to os-brick master. Wishlist - https://bugs.launchpad.net/cinder/+bug/1994150 "Image signature verification does not verify certificates." Unassigned. - https://bugs.launchpad.net/cinder/+bug/1992493 "Cinder fails to backup/snapshot/clone/extend volumes when the pool is full." Unassigned. - https://bugs.launchpad.net/cinder/+bug/1992685 "Automate generation of snapshot transfer api-ref samples for MV 3.55." Assigned. No fix proposed to master yet. Invalid - https://bugs.launchpad.net/cinder/+bug/1993577 " [JovianDSS] Unable to provide target prefix through iscsi_target_prefix ." Cheers, -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From pdeore at redhat.com Tue Oct 25 19:43:59 2022 From: pdeore at redhat.com (Pranali Deore) Date: Wed, 26 Oct 2022 01:13:59 +0530 Subject: [Glance] Antelope PTG Summary Message-ID: Hello Everyone, We had our sixth virtual PTG between 18th OCt to 21st OCT 2022. Thanks to everyone who joined the virtual PTG sessions. Using bluejeans app we had all the discussion around different topics for glance, glance + cinder, glance + ceph, fips and secure RBAC. I have created etherpad[1] with notes from the session and also included the recordings of each session. Here is the short summary of the discussions. Tuesday, OCT 18th 2022 # Zed Retrospective On the positive note, we have merged a number of useful features this cycle. 
We managed to implement glance-download internal plugin to download the image from remote glance, Implemented support for immediate caching of an image, Extended the functionality of stores-detail API to expose store details of other stores and we have removed the scope scheck from scope_types for all project resources and done with phase 1 as per the revised community goal. In addition to that we have successfully organized the review party before each milestone to perform group review to cover the review load till the final milestone. On the other side we decided to organize a midcycle general cross project meetup/drivers meetup towards the end of the 2nd milestone to increase our presence in cross-projects. Recording: https://bluejeans.com/s/II_ at CAqrZdd - Chapter 1 # Distributed responsibilities among cores/team This cycle also we have decided to follow the distributed leadership internally, we are going to distribute below responsibilities among the team, Release management:pranali Bug management: cyril Meetings: pranali Stable branch management: cyril and Erno Cross project communication: abhishekk Mailing lists: share responsibility, pranali PTG/summit preparation:pranali Vulnerability management: glance-coresec group Infra management: periodic-jobs - abhishekk migration of test jobs -abhishekk Recording: https://bluejeans.com/s/II_ at CAqrZdd - Chapter 2 # Default Glance to configure multiple stores Glance has deprecated single store configuration since Stein Cycle and this cycle we are going to start putting our efforts to deploy glance using multistore by default and then remove the single store support from glance. Recording: https://bluejeans.com/s/II_ at CAqrZdd - Chapter 3 # Add missing CLI support for some Glance API in Openstack Client The CLI support for all glance APIS is already there in GlanceClient and the similar CLI support we need to have in OpenstackClient, in this Cycle we are going to put efforts to have OSC support for all the missing glance APIs. Recording: https://bluejeans.com/s/II_ at CAqrZdd - Chapter 4 # Glance Cache improvements, restrict duplicate downloads This is abou how we can avoid multiple downloading of the same image in cache on first download Spec : https://review.opendev.org/c/openstack/glance-specs/+/734683 We had this spec in the last cycle & decided to break the image into chunks and when the first request gets to the backend store it will start caching that image and if any other request comes in between and if caching is still in process it will read from the chunks created by the first request. But currently we have made one caching state to check if the image is still in caching but that was relying on md checksum to check if the image iterator has read the image completely or not but in new images we might not have checksum, multihash or size of the image, bcz if that is not present with the image we won't be able to change the state of image and thus we will never be able to resolve the issue of checking the image is in caching or not. 
Decided to dig more on the size verification & need to revisit this topic during mid-cycle meeting and update the spec with the solution for handling the multiple request and multiple chunks Recording: https://bluejeans.com/s/II_ at CAqrZdd - Chapter 5 Wednesday OCT 19th , 2022 # Image uploads to the filesystem driver are not fully atomic No efficient way to reproduce this issue, so it's decided to mark it as 'Won't fix' Recording: https://bluejeans.com/s/j_UgZFw_jEV - Chapter 1 # DB migration constant change handling Till now we have all the migration scripts by the name of the cycle and since currently release has been change to 2023.1 which is going to break the migration test because when our DB sync tool runs it will check the initial version liberty and it finds the migration script from the liberty and traverse through all the directories till the current release and executes all the scripts available in that path. Decided to fix this by updating the data migration current release to '2023.1' and check with actual migration script to check whether there is any regression or not and check if it executes the scripts in serial manner. Recording: https://bluejeans.com/s/j_UgZFw_jEV - Chapter 2 # Configurable Soft Delete Stephen initiated this topic for nova and oslo.db but also sent out a mail for glance, cinder and for other projects, if we would be interested in the idea https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030729.html But Glance replies a lot on soft delete and doesn't allow hard delete as glance allows the user to specify the UUID of the image , so it's part of a security promise of an immutable images that you are not able to delete an image and then recreate image with same UUID right after, hence soft delete model can't be removed from Glance. Recording: https://bluejeans.com/s/j_UgZFw_jEV - Chapter 3 # Secure RBAC As per the revised community goal, till zed cycle we must have the project prosona implementation & drop the system scope which we have already done in glance. We have implemented project scope for image apis in wallaby & in Xena we have managed to move all policy checks to API layer and implemented project scope of metadef APIs. During zed cycle, we had discussed that like which apis should be exposed to system scope but after the operators feedback & as per the revised community goal it was decided to drop the system scope, we just had to remove the scope scheck from scope_types for all project resources, so we are done with Phase 1. In Antelope Cycle glance is going to switch the new defaults flag ON, once it is verified by tempest for all the services. Recording: https://bluejeans.com/s/j_UgZFw_jEV - Chapter 4 # Image Export with Metadata This is about exporting images with associated metadata for importing into another Glance deployment. This cycle we need some volunteers to work on this, the glance team will help in terms of reviews/finalizing the design etc otherwise we will revisit this in the next cycle. 
This session was not recorded as it was a small discussion.

# Fips Overview
Path forward:
- Investigate the timeouts of the existing glance fips job
- Try to make the centos9 jobs stable back to wallaby, so they could possibly move to voting
- Have ubuntu jobs working and running, and make them stable

Recording: https://bluejeans.com/s/j_UgZFw_jEV - Chapter 5

Thursday, OCT 20th, 2022

# Option needed to create image in erasure coded ceph pool
To use the erasure coded ceph pool feature, the pool where the data will be held needs to be specified during image creation, so a config option is needed. Decided to write a spec first describing this in detail and to modify the existing devstack job for ceph.

Recording: https://bluejeans.com/s/cNEJ@Yq_hv5 - Chapter 1

# Parallelization of RADOS image writes when creating an image
This is for parallelization of image writes into ceph, by increasing the amount of data written at once. Decided not to take this without testing to measure whether there is an actual gain.

Recording: https://bluejeans.com/s/cNEJ@Yq_hv5 - Chapter 2

# Add chunk download support for rbd backend
Rbd supports random reading, and the rbd driver is designed to support partial download, but the current version disables this feature and doesn't implement chunk support. There is potential future use for it, but there's nothing that users would gain from this; if it is not usable through the API then we will drop this.

Recording: https://bluejeans.com/s/cNEJ@Yq_hv5 - Chapter 3

# Operators hour
No operator joined the session

# Speedup upload images for Swift backend
Uploading images with the Swift backend is slow due to the synchronous way of uploading fragments, which causes uploads that can take several hours, especially with very large images. Decided to update the spec with two implementations, the traditional one and a multithreaded one, by adding a new configuration option ``swift_store_thread_pool_size`` to the Swift store backend.

Recording: https://bluejeans.com/s/cNEJ@Yq_hv5 - Chapter 4

Friday, OCT 21st, 2022

# Cross project meet with Cinder

# Glance Image Direct URL access Issue
Glance has OSSN-0090[2] describing the security risk when you are operating glance with 'show_multiple_locations' or if the end-user-facing glance-api has the 'show_image_direct_url' option set to true. When glance shares a common storage backend with nova and cinder, it is possible to open some known attack vectors by which malicious data modification can occur. Decided to fix this by going with the solution proposed by Rajat during the last cycle: remove the show_multiple_locations config option and add the 2 new location APIs[3] below, which will replace the image-update mechanism for consumers like cinder and nova in glance.
1. Location ADD API: Design this API in a way that the location will be added only once during image create, when the image is in the QUEUED state, and no one should be allowed to add a location after the image is active. This wouldn't require the 'service role'; a basic check of the image status on the glance side should suffice.
2. Location GET API: This will show all the locations associated with an existing image, and returns an empty list if an image contains no locations. This would still require the 'service role' since we don't want to expose locations to end users.
Glance has a dependency on keystone for the 'service role', which is going to be implemented in this cycle as per the phase 2 target mentioned in the SRBAC community goals[4].
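As a purely illustrative sketch of how a consumer such as cinder or nova might end up using the two proposed calls (the authoritative paths and payloads are whatever the approved spec [3] defines; the endpoint, token and payload below are assumptions made for the example only):

    import requests

    glance = "http://glance.example.com:9292"          # assumed glance endpoint
    image_id = "<image-uuid>"                          # hypothetical image ID
    headers = {"X-Auth-Token": "<keystone-token>",     # token carrying the needed role
               "Content-Type": "application/json"}

    # Location ADD: only allowed while the image is still in the QUEUED state;
    # the payload shape here is an assumption, not the spec's final format.
    requests.post("%s/v2/images/%s/locations" % (glance, image_id),
                  json={"url": "rbd://cluster-id/pool/image/snap"},
                  headers=headers)

    # Location GET: lists locations of an existing image (needs the service role).
    print(requests.get("%s/v2/images/%s/locations" % (glance, image_id),
                       headers=headers).json())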
Recording: https://bluejeans.com/s/B4Rlifuwx_l - Chapter 1

You will find the detailed information about the above in the PTG etherpad[1], along with the recordings of the sessions and the milestone-wise priorities at the bottom of the etherpad. Kindly let me know if you have any questions.

[1]: https://etherpad.opendev.org/p/antelope-glance-ptg
[2]: https://wiki.openstack.org/wiki/OSSN/OSSN-0090
[3]: https://specs.openstack.org/openstack/glance-specs/specs/zed/approved/glance/new-location-info-apis.html
[4]: https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html#phase-2

Thanks,
Pranali
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ces.eduardo98 at gmail.com Tue Oct 25 19:52:59 2022
From: ces.eduardo98 at gmail.com (Carlos Silva)
Date: Tue, 25 Oct 2022 16:52:59 -0300
Subject: [manila] Antelope PTG Summary
Message-ID: 

Hello, Zorillas and interested stackers.

Thank you for the productive PTG we had last week. Here goes the summary of the past week's discussions. In case you would like to see the expanded versions, please refer to [0] or check out the recordings in the Manila YouTube Channel [11].

Different approaches for the images we use in our jobs
- Recent changes in the Manila CI (related to the service image) made the jobs take longer to run and become more resource intensive, causing jobs to fail more often due to lack of resources in the dsvm
- We have been looking for approaches to tackle this issue
- In the short term, if a job keeps failing due to such issues, we will split it into two jobs
  - One to run scenario tests (these tests spawn VMs)
  - One to run API tests
- We agreed that this can get worse in the future with dsvm images that could be even more resource demanding, and that we will look for containerized approaches to try and solve this issue

Share backup
- (kpdev) The proposed approach for share backup has changed
- An existing specification [1] was restored
- The idea is to have new APIs for share backup and a new generic backup driver, in a similar way to what Cinder does
- This would allow backends to have their specific implementations for share backups
- Reviews for this specification are ongoing

Secure RBAC changes
- Goutham has shared with the community all the good progress we had in Manila, and also mentioned what we did in Zed after all the operators' feedback
- Now, we are entering a phase where we want to test a lot!
- We have CI jobs and some test cases covered, but we still want to have more coverage
- During Antelope, we will promote a hackathon in order to accelerate the functional test coverage

Outreachy Internships
- On this topic, ashrodri, fmount, gouthamr and carloss shared the two proposed Outreachy projects that are Manila related
- ashrodri and fmount are proposing a project to have an intern create a multi-node test environment with Devstack and Ceph [2], as these tests are becoming more resource demanding
- carloss and gouthamr are proposing a project for an intern to work on Manila UI [3]. The idea is to close the feature gap between Manila UI and the Manila API.

OpenStackSDK Manila Status
- This has been an effort for some time, and during the Zed cycle some code was merged for integrating share groups and share group snapshots
- This is a good candidate project for an internship, and the idea is that we will be able to get some interns working on this project (potentially as a college capstone project)

OSC Updates
- Zed was the release we targeted to reach feature parity with the native client, and we made it!
- We had an idea to add a deprecation warning to the native manila client, but that will need to wait
- There is still a missing bit for adding the deprecation warning: having version autonegotiation working, which is something we are targeting for Antelope
- OSC is now the primary focus of the project when doing implementations for CLIs.

Bandit Testing / VMT
- The Manila team was introduced to the VMT (vulnerability management team) and Bandit, and we had contributors sharing their ideas around having manila under the VMT and running bandit tests.
- VMT
  - The audience agreed with the conditions to be under the overseen repos.
  - Goutham has volunteered to be the security liaison for Manila
- Bandit
  - There were some errors pointed out in a preliminary test run [4].
  - We will add a bandit non-voting job and file bugs against third party drivers that have issues pointed out by the new job
  - New bugs will be filed and will be distributed across community members

All Things CephFS
- Ingress aware NFS Ganesha
  - Ceph Orchestrator is now capable of creating an ingress service to front-end a cluster of active-active NFS Ganesha instances via a single load-balanced IP address
  - Previously, we would only use a single instance of NFS Ganesha. This introduces a SPOF and is not suitable for production environments without HA being handled externally (like TripleO deployments do). With this change, Manila CephFS NFS users would be able to deploy multiple NFS Ganesha servers and leverage the inbuilt HA capabilities
  - Currently, though, manila client restrictions (access rules) will not work since NFS-Ganesha sees the ingress service's IP instead of the client's IP address.
  - ffilz proposed his design to support the PROXY protocol with NFS-Ganesha, so that Ceph Ingress can pass the client's IP address to NFS-Ganesha [5]
  - An alternative would be to have cephadm assign stable IP addresses to ganesha instances
  - No driver changes are anticipated in either approach
  - AI: Investigate deploying ceph-ingress with devstack
- Migration tool for Manila with CephFS NFS
  - After the new helper was introduced to the driver, we are able to interact with the cephadm-deployed CephNFS service (which comes with a lot of benefits)
  - Now, we need to figure out how to upgrade deployments with CephFS NFS using the current helper (which interacts with NFS Ganesha through DBUS)
  - There are two main issues:
    - Representing a change in exports when migrating across NFS servers (current use case) or when decommissioning NFS servers
    - Representing exports from multiple servers - a special case enabling migration where an old server will eventually be decommissioned
  - These issues were talked through; possible solutions were proposed and will be worked on in the following cycles. More details in [6]
- DHSS=True for CephFS driver
  - Currently the CephFS driver, whether you'd like to use it for native CephFS, NFS via DBUS, or NFS via ceph-mgr, only supports DHSS=False. The request for this feature was raised by several Manila users during the OpenInfra Summit in Berlin.
  - We discussed some alternatives for making it happen:
    - 1) The operator determines the Ceph cluster limits, creates isolated file systems and declares multiple manila backends, one per filesystem
    - 2) Subvolume groups - pinning a subvolume group to an mds can isolate/dedicate mds load
- CephFS CI issues
  - The lack of resources in the dsvms is affecting the CephFS native and NFS jobs, causing them to be unstable. The jobs often run into timeouts
  - The situation could get worse as Jammy Jellyfish packages are not available yet
  - We will ask the ceph-users if jammy release bits can be made available

Oversubscription enhancements
- Storage pools supporting thin provisioning are open to oversubscription. This caused some problems mentioned in [7].
- We have an open specification which we intend to merge in the Antelope cycle, as well as the changes to address this issue

FIPS
- We have shared our testing status with regards to FIPS. We have jobs merged on stable branches up to wallaby for both the manila and python-manilaclient repositories.
- The next steps would be a more in-depth code audit to identify non-compliant libraries and making our jobs voting
- We agreed to make our jobs voting when the Ubuntu images supporting FIPS are out
- For drivers using non-FIPS-compliant libraries, we will notify the maintainers

Metadata API update
- Metadata APIs for share snapshots were added during the Zed cycle
- The goal for this cycle is to have the functional testing and the CLI merged. Both patches are under review and expected to be merged early in the release.

Manila CSI
- The CSI plugin has been pretty stable in the last six months [8]
- There was a good talk presented at KubeCon by Robert Vasek [9]
- The next steps include getting a fix for an issue involving long snapshot names in the CephFS backend and supporting volume expansion in the OpenShift Manila CSI driver operator

Manila Configuration of VLAN Network information
- An issue was found where the Contrail Neutron plugin did not return the VLAN ID during port allocation [10]
- To tackle this issue, we have agreed to add metadata to the share network APIs so that administrators would be able to add the VLAN as metadata and drivers would be able to consume it.

*Better use of bug statuses*
- Our bugs were stuck in "New" instead of "Confirmed" or "Triaged" and this could be misleading.
- We agreed to tag the bugs as confirmed or triaged depending on the outcome of our triaging and not leave bugs as "new" - We will ask the bug assignees to update their open bugs in case one of them has the new status, so we can have a better visibility [0] https://etherpad.opendev.org/p/antelope-ptg-manila [1] https://review.opendev.org/c/openstack/manila-specs/+/330306 [2] https://www.outreachy.org/outreachy-december-2022-internship-round/communities/openstack/#create-multi-node-testing-job-with-ceph [3] https://www.outreachy.org/outreachy-december-2022-internship-round/communities/openstack/#implement-features-for-manila-user-interface [4] https://gist.github.com/gouthampacha/c0d96966670956761b2b620be730efe2 [5] https://docs.google.com/document/d/1orjNjtEeeyRrgvbQuFU5BdsCnZG9gF3MymDCZl-gOTs/edit?usp=sharing [6] https://etherpad.opendev.org/p/antelope-ptg-manila-cephfs [7] https://etherpad.opendev.org/p/manila_oversubscription_enhancements [8] https://github.com/kubernetes/cloud-provider-openstack/commits/master/pkg/csi/manila [9] https://www.youtube.com/watch?v=XfpP9pBTXfY&t=1145s [10] https://bugs.launchpad.net/charm-manila/+bug/1987315 [11] https://www.youtube.com/playlist?list=PLnpzT0InFrqBzbSP6lcYDStKpbEl3GNHK Thanks, carloss -------------- next part -------------- An HTML attachment was scrubbed... URL: From rishat.azizov at gmail.com Wed Oct 26 00:07:41 2022 From: rishat.azizov at gmail.com (=?UTF-8?B?0KDQuNGI0LDRgiDQkNC30LjQt9C+0LI=?=) Date: Wed, 26 Oct 2022 06:07:41 +0600 Subject: [cinder] Problem with restoring purestorage volume backup Message-ID: Hello! We have a problem with cinder-backup with cephbackup driver and volume on purestorage. When backing up purestorage volume with cinder-backup with cephbackup driver it creates in ceph pool and everything is ok. But when we try to restore the backup, it is not restored with an error "rbd.ImageNotFound" in the screenshot attached to this email. This happens because the original image is not in rbd, it is in purestorage. It is not clear why the cinder is trying to look for a disk in the ceph. Could you please help with this? Thanks. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: photo_2022-05-16_15-54-32.jpg Type: image/jpeg Size: 165793 bytes Desc: not available URL: From kennelson11 at gmail.com Wed Oct 26 16:13:30 2022 From: kennelson11 at gmail.com (Kendall Nelson) Date: Wed, 26 Oct 2022 11:13:30 -0500 Subject: Help us plan the next PTG! In-Reply-To: <184107cfa1a.f47f66a8305665.9192260266772408602@ghanshyammann.com> References: <184107cfa1a.f47f66a8305665.9192260266772408602@ghanshyammann.com> Message-ID: On Tue, Oct 25, 2022 at 1:52 PM Ghanshyam Mann wrote: > ---- On Tue, 25 Oct 2022 11:32:46 -0700 Kendall Nelson wrote --- > > Hello Everyone! > > Congratulations on a great virtual PTG last week :) > > > > The OpenInfra Foundation is hosting the OpenInfra Summit in Vancouver > [1], June 13 -15, 2023. > > We are trying to determine the level of interest among contributors to > OpenInfra projects on attending a co-located PTG in Vancouver. The exact > format and dates are still being determined. At this time, we are > evaluating the level of interest from an attendee and employer perspective. > Please complete the following poll so we can measure the level of interest > and plan accordingly. Future updates will be distributed to the project > mailing lists as well as previous PTG attendees. 
> > > > Thanks, Kendall for collecting the feedback and survey. One question to > understand the future PTGs schedule. > > As PTGs are aligned with the OpenStack new development cycle timing, they > were very helpful to plan the > new cycle features/work well at the start of the cycle. But seeing the > summit co-located PTG timing which > is June, I am curious to know if there will be a PTG for the 2023.2 (B) > cycle in April (with 2023.1 Antelope releasing > at the March end) also? Or we are going to have only one in June which > will be co-located in Vancouver Summit (once > it is final based on survey results). > We would still do the usual virtual PTG on the ''normal" timeline. This potential add on to Vancouver would be in addition to the virtual PTG. > Definitely, having a co-located PTGs in Summit is a very good idea, saving > travel, and being much more productive also > but it's just timing from OpenStack release perspective making it a little > bit difficult. > > -gmann > > > Poll: > https://openinfrafoundation.formstack.com/forms/openinfra_ptg_2023 > > > > As a reminder, we are also gathering feedback for the virtual PTG here: > https://etherpad.opendev.org/p/Oct2022_PTGFeedback > > > > -Kendall Nelson (diablo_rojo) > > [1] https://openinfra.dev/summit/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Oct 26 16:28:58 2022 From: smooney at redhat.com (Sean Mooney) Date: Wed, 26 Oct 2022 17:28:58 +0100 Subject: Help us plan the next PTG! In-Reply-To: References: <184107cfa1a.f47f66a8305665.9192260266772408602@ghanshyammann.com> Message-ID: <530554b30f54a1b841a343e5f495badb3f65a616.camel@redhat.com> On Wed, 2022-10-26 at 11:13 -0500, Kendall Nelson wrote: > On Tue, Oct 25, 2022 at 1:52 PM Ghanshyam Mann > wrote: > > > ---- On Tue, 25 Oct 2022 11:32:46 -0700 Kendall Nelson wrote --- > > > Hello Everyone! > > > Congratulations on a great virtual PTG last week :) > > > > > > The OpenInfra Foundation is hosting the OpenInfra Summit in Vancouver > > [1], June 13 -15, 2023. > > > We are trying to determine the level of interest among contributors to > > OpenInfra projects on attending a co-located PTG in Vancouver. The exact > > format and dates are still being determined. At this time, we are > > evaluating the level of interest from an attendee and employer perspective. > > Please complete the following poll so we can measure the level of interest > > and plan accordingly. Future updates will be distributed to the project > > mailing lists as well as previous PTG attendees. > > > > > > > Thanks, Kendall for collecting the feedback and survey. One question to > > understand the future PTGs schedule. > > > > As PTGs are aligned with the OpenStack new development cycle timing, they > > were very helpful to plan the > > new cycle features/work well at the start of the cycle. But seeing the > > summit co-located PTG timing which > > is June, I am curious to know if there will be a PTG for the 2023.2 (B) > > cycle in April (with 2023.1 Antelope releasing > > at the March end) also? Or we are going to have only one in June which > > will be co-located in Vancouver Summit (once > > it is final based on survey results). > > > > We would still do the usual virtual PTG on the ''normal" timeline. This > potential add on to Vancouver would be in addition to the virtual PTG. so replacing the fourm? 
> > > > Definitely, having a co-located PTGs in Summit is a very good idea, saving > > travel, and being much more productive also > > but it's just timing from OpenStack release perspective making it a little > > bit difficult. > > > > -gmann > > > > > Poll: > > https://openinfrafoundation.formstack.com/forms/openinfra_ptg_2023 > > > > > > As a reminder, we are also gathering feedback for the virtual PTG here: > > https://etherpad.opendev.org/p/Oct2022_PTGFeedback > > > > > > -Kendall Nelson (diablo_rojo) > > > [1] https://openinfra.dev/summit/ > > From senrique at redhat.com Wed Oct 26 17:12:19 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 26 Oct 2022 14:12:19 -0300 Subject: [cinder] Problem with restoring purestorage volume backup In-Reply-To: References: Message-ID: Hello, This is a major bug in Cinder that has been reported. Please check https://launchpad.net/bugs/1895035. The bug has a fix proposed to master to haven't merged yet [2]. I'll mention it again in next week's Cinder meeting. Regards, Sofia [2] https://review.opendev.org/c/openstack/cinder/+/750782 On Wed, Oct 26, 2022 at 9:45 AM ????? ?????? wrote: > Hello! > > We have a problem with cinder-backup with cephbackup driver and volume on > purestorage. When backing up purestorage volume with cinder-backup with > cephbackup driver it creates in ceph pool and everything is ok. But when > we try to restore the backup, it is not restored with an error > "rbd.ImageNotFound" in the screenshot attached to this email. This happens > because the original image is not in rbd, it is in purestorage. It is not > clear why the cinder is trying to look for a disk in the ceph. Could you > please help with this? > > Thanks. Regards. > -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Wed Oct 26 18:24:37 2022 From: kennelson11 at gmail.com (Kendall Nelson) Date: Wed, 26 Oct 2022 13:24:37 -0500 Subject: Help us plan the next PTG! In-Reply-To: <530554b30f54a1b841a343e5f495badb3f65a616.camel@redhat.com> References: <184107cfa1a.f47f66a8305665.9192260266772408602@ghanshyammann.com> <530554b30f54a1b841a343e5f495badb3f65a616.camel@redhat.com> Message-ID: No that wasn't the plan. I think the Forum will also still happen as per usual. -Kendall On Wed, Oct 26, 2022 at 11:29 AM Sean Mooney wrote: > On Wed, 2022-10-26 at 11:13 -0500, Kendall Nelson wrote: > > On Tue, Oct 25, 2022 at 1:52 PM Ghanshyam Mann > > wrote: > > > > > ---- On Tue, 25 Oct 2022 11:32:46 -0700 Kendall Nelson wrote --- > > > > Hello Everyone! > > > > Congratulations on a great virtual PTG last week :) > > > > > > > > The OpenInfra Foundation is hosting the OpenInfra Summit in > Vancouver > > > [1], June 13 -15, 2023. > > > > We are trying to determine the level of interest among contributors > to > > > OpenInfra projects on attending a co-located PTG in Vancouver. The > exact > > > format and dates are still being determined. At this time, we are > > > evaluating the level of interest from an attendee and employer > perspective. > > > Please complete the following poll so we can measure the level of > interest > > > and plan accordingly. Future updates will be distributed to the project > > > mailing lists as well as previous PTG attendees. > > > > > > > > > > Thanks, Kendall for collecting the feedback and survey. One question to > > > understand the future PTGs schedule. 
> > > > > > As PTGs are aligned with the OpenStack new development cycle timing, > they > > > were very helpful to plan the > > > new cycle features/work well at the start of the cycle. But seeing the > > > summit co-located PTG timing which > > > is June, I am curious to know if there will be a PTG for the 2023.2 (B) > > > cycle in April (with 2023.1 Antelope releasing > > > at the March end) also? Or we are going to have only one in June which > > > will be co-located in Vancouver Summit (once > > > it is final based on survey results). > > > > > > > We would still do the usual virtual PTG on the ''normal" timeline. This > > potential add on to Vancouver would be in addition to the virtual PTG. > so replacing the fourm? > > > > > > > > Definitely, having a co-located PTGs in Summit is a very good idea, > saving > > > travel, and being much more productive also > > > but it's just timing from OpenStack release perspective making it a > little > > > bit difficult. > > > > > > -gmann > > > > > > > Poll: > > > https://openinfrafoundation.formstack.com/forms/openinfra_ptg_2023 > > > > > > > > As a reminder, we are also gathering feedback for the virtual PTG > here: > > > https://etherpad.opendev.org/p/Oct2022_PTGFeedback > > > > > > > > -Kendall Nelson (diablo_rojo) > > > > [1] https://openinfra.dev/summit/ > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Wed Oct 26 19:01:43 2022 From: melwittt at gmail.com (melanie witt) Date: Wed, 26 Oct 2022 12:01:43 -0700 Subject: Help us plan the next PTG! In-Reply-To: References: <184107cfa1a.f47f66a8305665.9192260266772408602@ghanshyammann.com> <530554b30f54a1b841a343e5f495badb3f65a616.camel@redhat.com> Message-ID: <35d6d3d6-4fd2-8151-b4d9-644deb191bba@gmail.com> On Wed Oct 26 2022 11:24:37 GMT-0700 (Pacific Daylight Time), Kendall Nelson wrote: > No that wasn't the plan. I think the Forum will also still happen as per > usual. TBH, I wouldn't mind it (and I think I would actually prefer it) if we went back to the OG model of having the summit and the design summit at the same time. I was never into the split off into the PTG but wanted to be open minded. I think having the user-oriented energy of the summit combined with the ability to participate on the dev side at the same event was a good thing. I might be alone in this feeling though. That said, I think gmann highlighted the challenge with this would be how we really need to have two events per year to collaborate on each upcoming release. I wonder if we could do one co-located event with the summit and then a virtual PTG for the other cycle? I dunno if the virtual PTG is too resource intensive. I'm generally not a fan of virtual events but I have quite liked the virtual PTG. I think it has been running really smoothly and very productive. Just my 2c. -melwitt > On Wed, Oct 26, 2022 at 11:29 AM Sean Mooney > wrote: > > On Wed, 2022-10-26 at 11:13 -0500, Kendall Nelson wrote: > > On Tue, Oct 25, 2022 at 1:52 PM Ghanshyam Mann > > > > wrote: > > > > >? ---- On Tue, 25 Oct 2022 11:32:46 -0700? Kendall Nelson? wrote --- > > >? > Hello Everyone! > > >? > Congratulations on a great virtual PTG last week :) > > >? > > > >? > The OpenInfra Foundation is hosting the OpenInfra Summit in > Vancouver > > > [1], June 13 -15, 2023. > > >? > We are trying to determine the level of interest among > contributors to > > > OpenInfra projects on attending a co-located PTG in Vancouver. > The exact > > > format and dates are still being determined. 
At this time, we are > > > evaluating the level of interest from an attendee and employer > perspective. > > > Please complete the following poll so we can measure the level > of interest > > > and plan accordingly. Future updates will be distributed to the > project > > > mailing lists as well as previous PTG attendees. > > >? > > > > > > > Thanks, Kendall for collecting the feedback and survey. One > question to > > > understand the future PTGs schedule. > > > > > > As PTGs are aligned with the OpenStack new development cycle > timing, they > > > were very helpful to plan the > > > new cycle features/work well at the start of the cycle.? But > seeing the > > > summit co-located PTG timing which > > > is June, I am curious to know if there will be a PTG for the > 2023.2 (B) > > > cycle in April (with 2023.1 Antelope releasing > > > at the March end) also? Or we are going to have only one in > June which > > > will be co-located in Vancouver Summit (once > > > it is final based on survey results). > > > > > > > We would still do the usual virtual PTG on the ''normal" > timeline. This > > potential add on to Vancouver would be in addition to the virtual > PTG. > so replacing the fourm? > > > > > > > > Definitely, having a co-located PTGs in Summit is a very good > idea, saving > > > travel, and being much more productive also > > > but it's just timing from OpenStack release perspective making > it a little > > > bit difficult. > > > > > > -gmann > > > > > >? > Poll: > > > > https://openinfrafoundation.formstack.com/forms/openinfra_ptg_2023 > > > >? > > > >? > As a reminder, we are also gathering feedback for the > virtual PTG here: > > > https://etherpad.opendev.org/p/Oct2022_PTGFeedback > > > >? > > > >? > -Kendall Nelson (diablo_rojo) > > >? > [1] https://openinfra.dev/summit/ > > > > > From gmann at ghanshyammann.com Wed Oct 26 19:42:19 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 26 Oct 2022 12:42:19 -0700 Subject: Help us plan the next PTG! In-Reply-To: <35d6d3d6-4fd2-8151-b4d9-644deb191bba@gmail.com> References: <184107cfa1a.f47f66a8305665.9192260266772408602@ghanshyammann.com> <530554b30f54a1b841a343e5f495badb3f65a616.camel@redhat.com> <35d6d3d6-4fd2-8151-b4d9-644deb191bba@gmail.com> Message-ID: <18415d1284c.ffea1dfa21249.6917737375854831301@ghanshyammann.com> ---- On Wed, 26 Oct 2022 12:01:43 -0700 melanie witt wrote --- > On Wed Oct 26 2022 11:24:37 GMT-0700 (Pacific Daylight Time), Kendall > Nelson kennelson11 at gmail.com> wrote: > > No that wasn't the plan. I think the Forum will also still happen as per > > usual. > > TBH, I wouldn't mind it (and I think I would actually prefer it) if we > went back to the OG model of having the summit and the design summit at > the same time. I was never into the split off into the PTG but wanted to > be open minded. I think having the user-oriented energy of the summit > combined with the ability to participate on the dev side at the same > event was a good thing. I might be alone in this feeling though. I agree and it will definitely help to connect users/operators more in the dev community. Honestly saying, I always liked that model. -gmann > > That said, I think gmann highlighted the challenge with this would be > how we really need to have two events per year to collaborate on each > upcoming release. I wonder if we could do one co-located event with the > summit and then a virtual PTG for the other cycle? I dunno if the > virtual PTG is too resource intensive. 
I'm generally not a fan of > virtual events but I have quite liked the virtual PTG. I think it has > been running really smoothly and very productive. > > Just my 2c. > > -melwitt > > > On Wed, Oct 26, 2022 at 11:29 AM Sean Mooney smooney at redhat.com > > smooney at redhat.com>> wrote: > > > > On Wed, 2022-10-26 at 11:13 -0500, Kendall Nelson wrote: > > > On Tue, Oct 25, 2022 at 1:52 PM Ghanshyam Mann > > gmann at ghanshyammann.com gmann at ghanshyammann.com>> > > > wrote: > > > > > > >? ---- On Tue, 25 Oct 2022 11:32:46 -0700? Kendall Nelson? wrote --- > > > >? > Hello Everyone! > > > >? > Congratulations on a great virtual PTG last week :) > > > >? > > > > >? > The OpenInfra Foundation is hosting the OpenInfra Summit in > > Vancouver > > > > [1], June 13 -15, 2023. > > > >? > We are trying to determine the level of interest among > > contributors to > > > > OpenInfra projects on attending a co-located PTG in Vancouver. > > The exact > > > > format and dates are still being determined. At this time, we are > > > > evaluating the level of interest from an attendee and employer > > perspective. > > > > Please complete the following poll so we can measure the level > > of interest > > > > and plan accordingly. Future updates will be distributed to the > > project > > > > mailing lists as well as previous PTG attendees. > > > >? > > > > > > > > > Thanks, Kendall for collecting the feedback and survey. One > > question to > > > > understand the future PTGs schedule. > > > > > > > > As PTGs are aligned with the OpenStack new development cycle > > timing, they > > > > were very helpful to plan the > > > > new cycle features/work well at the start of the cycle.? But > > seeing the > > > > summit co-located PTG timing which > > > > is June, I am curious to know if there will be a PTG for the > > 2023.2 (B) > > > > cycle in April (with 2023.1 Antelope releasing > > > > at the March end) also? Or we are going to have only one in > > June which > > > > will be co-located in Vancouver Summit (once > > > > it is final based on survey results). > > > > > > > > > > We would still do the usual virtual PTG on the ''normal" > > timeline. This > > > potential add on to Vancouver would be in addition to the virtual > > PTG. > > so replacing the fourm? > > > > > > > > > > > > Definitely, having a co-located PTGs in Summit is a very good > > idea, saving > > > > travel, and being much more productive also > > > > but it's just timing from OpenStack release perspective making > > it a little > > > > bit difficult. > > > > > > > > -gmann > > > > > > > >? > Poll: > > > > > > https://openinfrafoundation.formstack.com/forms/openinfra_ptg_2023 > > https://openinfrafoundation.formstack.com/forms/openinfra_ptg_2023> > > > >? > > > > >? > As a reminder, we are also gathering feedback for the > > virtual PTG here: > > > > https://etherpad.opendev.org/p/Oct2022_PTGFeedback > > https://etherpad.opendev.org/p/Oct2022_PTGFeedback> > > > >? > > > > >? > -Kendall Nelson (diablo_rojo) > > > >? 
> [1] https://openinfra.dev/summit/ > > https://openinfra.dev/summit/> > > > > > > > > > From gmann at ghanshyammann.com Wed Oct 26 23:11:46 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 26 Oct 2022 16:11:46 -0700 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Oct 27 at 1500 UTC In-Reply-To: <1840ca763ed.11a1d388f223718.5578574495996418279@ghanshyammann.com> References: <1840ca763ed.11a1d388f223718.5578574495996418279@ghanshyammann.com> Message-ID: <1841690e876.afe6fc9b25421.7225549985167953354@ghanshyammann.com> Hello Everyone, Below is the agenda for Tomorrow's TC meeting scheduled at 1500 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * Roll call * Follow up on past action items * Gate health check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary ** Zuul config error *** https://etherpad.opendev.org/p/zuul-config-error-openstack * 2023.1 cycle PTG * TC questions for the 2023 user survey ** https://etherpad.opendev.org/p/tc-2023-user-survey-questions ** Deadline: Oct 30 ** TC chair election process * TC weekly meeting time ** https://framadate.org/xR6HoeDpdXXfiueb * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open -gmann ---- On Mon, 24 Oct 2022 18:00:07 -0700 Ghanshyam Mann wrote --- > Hello Everyone, > > The technical Committee's next weekly meeting is scheduled for 2022 Oct 27, at 1500 UTC. > > If you would like to add topics for discussion, please add them to the below wiki page by > Wednesday, Oct 26 at 2100 UTC. > > https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting > > -gmann > > From p.aminian.server at gmail.com Thu Oct 27 09:51:31 2022 From: p.aminian.server at gmail.com (Parsa Aminian) Date: Thu, 27 Oct 2022 13:21:31 +0330 Subject: network monitor Message-ID: hello Im using openstack kolla-ansible . Could you please tell me how I can monitor instances' network usage ? 1-download and upload speed 2-traffic usage for example send and receive per month for each instance -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Oct 27 11:10:08 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 27 Oct 2022 13:10:08 +0200 Subject: [openstack-ansible][ptg] Antelope PTG results Message-ID: Hi everyone! As you might know, the OpenStack-Ansible team had Project Team Gathering on October 18, 2022 and we had quite good discussions. This email is a relatively short sum-up of decisions that were taken during it: * Add Zookepeer cluster deployment as coordination service. Coordination is required if you want to have active/active cinder-volume setup and also actively used by other projects, like Octavia or Designate. Zookeeper will be deployed in a separate set of containers for LXC path * Move out Ubuntu Jammy from experimental state except Ceph integration. This is because community repo does not have ceph packages built for Ubuntu 22.04, so we will switch default to use Ubuntu's native repository to get ceph from. With that you won't be able to pick what ceph version to use, but it will work for most usecases. * Continue supporting Ubuntu Focal (20.04) until Antelope release as upgrade path from Yoga to Antelope should exist and on TC PTG it was decided that it should be tested by projects on Ubuntu 20.04. * Update docs regarding level of support for integrated ceph-ansible deployment with openstack-ansible to address expectations on upgrade path. 
* Address concerns regarding Glance configuration with COW backends, with regards to OSSN notice: https://wiki.openstack.org/wiki/OSSN/OSSN-0090 * Switch distro path to experimental due to lack of maintainers. We should document that and mention that help is needed to keep it alive. If not - the feature can be removed in the future. * Cover OVN with TLS encryption. Leverage PKI role for that. * We also agreed to switch the default network driver used by OpenStack-Ansible to OVN as default. However, at the moment there is not much hands-on experience on OVN from the core team, so we can not switch default right now as we won't be efficiently helping with OVN deployments. So following switching plan was defined: ** Document reference OVN architecture that we are going to provide ** Document implementation of extra networks that are required for octavia, trove, etc. ** Switch AIO to OVN as default ** Add LXB jobs to neutron role ** Switch default for non-AIO ** Ensure that users who were relying on previous default (and using LXB) have proper override during upgrade. * Improve documentation for the Ironic role with adding a couple of scenarios. * Convert our dynamic inventory to an inventory plugin which will be installed during ansible bootstrap. * Create a repository for the skyline role. We are allowed to take https://github.com/jrosser/openstack-ansible-os_skyline as base and move it under openstack-ansible umbrella * Simplify documentation regarding provider_networks in openstack_user_config and suggest using neutron_provider_networks instead. * Continue working on improving our CI coverage, including returning to molecule testing but rely on the ansible and constraints version from the integrated repo. At the same time finish cleanup of old functional tests, including run_tests.sh, tox, etc. * Deprecate rsyslog_server/client roles as they are hardly used as of today. You can also check etherpad [1] where notes were taken during PTG and that we aim to update with progress on these points implementation: [1] https://etherpad.opendev.org/p/osa-antelope-ptg From sbauza at redhat.com Thu Oct 27 11:22:35 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 27 Oct 2022 13:22:35 +0200 Subject: Help us plan the next PTG! In-Reply-To: <35d6d3d6-4fd2-8151-b4d9-644deb191bba@gmail.com> References: <184107cfa1a.f47f66a8305665.9192260266772408602@ghanshyammann.com> <530554b30f54a1b841a343e5f495badb3f65a616.camel@redhat.com> <35d6d3d6-4fd2-8151-b4d9-644deb191bba@gmail.com> Message-ID: Le mer. 26 oct. 2022 ? 21:08, melanie witt a ?crit : > On Wed Oct 26 2022 11:24:37 GMT-0700 (Pacific Daylight Time), Kendall > Nelson wrote: > > No that wasn't the plan. I think the Forum will also still happen as per > > usual. > > TBH, I wouldn't mind it (and I think I would actually prefer it) if we > went back to the OG model of having the summit and the design summit at > the same time. I was never into the split off into the PTG but wanted to > be open minded. I think having the user-oriented energy of the summit > combined with the ability to participate on the dev side at the same > event was a good thing. I might be alone in this feeling though. > > That said, I think gmann highlighted the challenge with this would be > how we really need to have two events per year to collaborate on each > upcoming release. I wonder if we could do one co-located event with the > summit and then a virtual PTG for the other cycle? I dunno if the > virtual PTG is too resource intensive. 
I'm generally not a fan of > virtual events but I have quite liked the virtual PTG. I think it has > been running really smoothly and very productive. > > I agree with Melanie here. My personal view is that we could take the opportunity to gather back operators and developers into one single event that would happen once a year, ideally be the OIF Summit which would incidentally be at the beginning of a release period (ideal time to showcase the recent new features and to hear from feeeback before starting to draft other features) Virtual PTGs would still be necessary at every odd release since there were no physical PTG but in this situation, we would rotate between physical and virtual every 6 months, which should give us benefits of both. Last note, as we started having tick-tock releases [1], this sounds fitting perfectly our new release model : physical PTGs could happen at the beginning of a tock release and virtual PTGs at the beginning of a tick. HTH, -Sylvain [1] https://governance.openstack.org/tc/resolutions/20220210-release-cadence-adjustment.html > Just my 2c. > > -melwitt > > > On Wed, Oct 26, 2022 at 11:29 AM Sean Mooney > > wrote: > > > > On Wed, 2022-10-26 at 11:13 -0500, Kendall Nelson wrote: > > > On Tue, Oct 25, 2022 at 1:52 PM Ghanshyam Mann > > > > > > wrote: > > > > > > > ---- On Tue, 25 Oct 2022 11:32:46 -0700 Kendall Nelson wrote > --- > > > > > Hello Everyone! > > > > > Congratulations on a great virtual PTG last week :) > > > > > > > > > > The OpenInfra Foundation is hosting the OpenInfra Summit in > > Vancouver > > > > [1], June 13 -15, 2023. > > > > > We are trying to determine the level of interest among > > contributors to > > > > OpenInfra projects on attending a co-located PTG in Vancouver. > > The exact > > > > format and dates are still being determined. At this time, we > are > > > > evaluating the level of interest from an attendee and employer > > perspective. > > > > Please complete the following poll so we can measure the level > > of interest > > > > and plan accordingly. Future updates will be distributed to the > > project > > > > mailing lists as well as previous PTG attendees. > > > > > > > > > > > > > Thanks, Kendall for collecting the feedback and survey. One > > question to > > > > understand the future PTGs schedule. > > > > > > > > As PTGs are aligned with the OpenStack new development cycle > > timing, they > > > > were very helpful to plan the > > > > new cycle features/work well at the start of the cycle. But > > seeing the > > > > summit co-located PTG timing which > > > > is June, I am curious to know if there will be a PTG for the > > 2023.2 (B) > > > > cycle in April (with 2023.1 Antelope releasing > > > > at the March end) also? Or we are going to have only one in > > June which > > > > will be co-located in Vancouver Summit (once > > > > it is final based on survey results). > > > > > > > > > > We would still do the usual virtual PTG on the ''normal" > > timeline. This > > > potential add on to Vancouver would be in addition to the virtual > > PTG. > > so replacing the fourm? > > > > > > > > > > > > Definitely, having a co-located PTGs in Summit is a very good > > idea, saving > > > > travel, and being much more productive also > > > > but it's just timing from OpenStack release perspective making > > it a little > > > > bit difficult. 
> > > > > > > > -gmann > > > > > > > > > Poll: > > > > > > https://openinfrafoundation.formstack.com/forms/openinfra_ptg_2023 > > > > > > > > > > > > As a reminder, we are also gathering feedback for the > > virtual PTG here: > > > > https://etherpad.opendev.org/p/Oct2022_PTGFeedback > > > > > > > > > > > > -Kendall Nelson (diablo_rojo) > > > > > [1] https://openinfra.dev/summit/ > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Thu Oct 27 15:58:48 2022 From: eblock at nde.ag (Eugen Block) Date: Thu, 27 Oct 2022 15:58:48 +0000 Subject: network monitor In-Reply-To: Message-ID: <20221027155848.Horde.tinsgMUVK5RM1nlQSeXkC1i@webmail.nde.ag> Hi, I assume ceilometer is what you?re looking for: https://docs.openstack.org/ceilometer/latest/ Zitat von Parsa Aminian : > hello > Im using openstack kolla-ansible . Could you please tell me how I can > monitor instances' network usage ? > 1-download and upload speed > 2-traffic usage for example send and receive per month for each instance From ralonsoh at redhat.com Thu Oct 27 16:30:17 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 27 Oct 2022 18:30:17 +0200 Subject: [neutron] Neutron drivers meeting Oct 28th Message-ID: Hello Neutrinos: The drivers meeting will take place tomorrow at 14UTC. The agenda has one single topic: * https://bugs.launchpad.net/neutron/+bug/1994137: [RFE] Specify the precedence of port routes if multiple ports attached to a VM Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.aminian.server at gmail.com Thu Oct 27 16:38:48 2022 From: p.aminian.server at gmail.com (Parsa Aminian) Date: Thu, 27 Oct 2022 20:08:48 +0330 Subject: network monitor In-Reply-To: <20221027155848.Horde.tinsgMUVK5RM1nlQSeXkC1i@webmail.nde.ag> References: <20221027155848.Horde.tinsgMUVK5RM1nlQSeXkC1i@webmail.nde.ag> Message-ID: Thanks but im looking for something with gui and graphical interface On Thu, 27 Oct 2022, 19:35 Eugen Block, wrote: > Hi, > > I assume ceilometer is what you?re looking for: > https://docs.openstack.org/ceilometer/latest/ > > Zitat von Parsa Aminian : > > > hello > > Im using openstack kolla-ansible . Could you please tell me how I can > > monitor instances' network usage ? > > 1-download and upload speed > > 2-traffic usage for example send and receive per month for each instance > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Oct 27 21:00:28 2022 From: smooney at redhat.com (Sean Mooney) Date: Thu, 27 Oct 2022 22:00:28 +0100 Subject: network monitor In-Reply-To: References: <20221027155848.Horde.tinsgMUVK5RM1nlQSeXkC1i@webmail.nde.ag> Message-ID: i think the service you want to enable is skydive https://github.com/openstack/kolla-ansible/tree/master/ansible/roles/skydive https://docs.openstack.org/kolla-ansible/train/reference/logging-and-monitoring/skydive-guide.html https://github.com/skydive-project/skydive/ On Thu, Oct 27, 2022 at 5:48 PM Parsa Aminian wrote: > > Thanks but im looking for something with gui and graphical interface > > On Thu, 27 Oct 2022, 19:35 Eugen Block, wrote: >> >> Hi, >> >> I assume ceilometer is what you?re looking for: >> https://docs.openstack.org/ceilometer/latest/ >> >> Zitat von Parsa Aminian : >> >> > hello >> > Im using openstack kolla-ansible . Could you please tell me how I can >> > monitor instances' network usage ? 
>> > 1-download and upload speed >> > 2-traffic usage for example send and receive per month for each instance >> >> >> >> From fkr at hazardous.org Thu Oct 27 21:10:52 2022 From: fkr at hazardous.org (Felix Kronlage-Dammers) Date: Thu, 27 Oct 2022 23:10:52 +0200 Subject: Help us plan the next PTG! In-Reply-To: References: <184107cfa1a.f47f66a8305665.9192260266772408602@ghanshyammann.com> <530554b30f54a1b841a343e5f495badb3f65a616.camel@redhat.com> <35d6d3d6-4fd2-8151-b4d9-644deb191bba@gmail.com> Message-ID: <537ADDD5-6703-482A-A1D7-8D05AF0FC742@hazardous.org> On 27 Oct 2022, at 13:22, Sylvain Bauza wrote: > I agree with Melanie here. My personal view is that we could take the > opportunity to gather back operators and developers into one single event > that would happen once a year, Since there were some sessions in Berlin where a good mix of operators and devs would?ve been good (the lbaas forum session for example), I?d think this would be good to have again. The operator sessions last week (octavia as an example) were also very good and show that enable more perator<>dev dialogue is worth pursuing. felix -- GPG: 824CE0F0 / 2082 651E 5104 F989 4D18 BB2E 0B26 6738 824C E0F0 fkr at hazardous.org - fkr at irc - @fkronlage:matrix.org - @felixkronlage From skaplons at redhat.com Fri Oct 28 07:09:27 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 28 Oct 2022 09:09:27 +0200 Subject: [neutron] CI meeting Nov 1st cancelled Message-ID: <9370655.sjJMvzL01K@p1> Hi, As we discussed on IRC, Nov 1st is public holiday for me and many other folks who usually attend Neutron CI meeting so meeting next week is cancelled. See You on the CI meeting on Nov 8th. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From jean-francois.taltavull at elca.ch Fri Oct 28 07:28:41 2022 From: jean-francois.taltavull at elca.ch (=?iso-8859-1?Q?Taltavull_Jean-Fran=E7ois?=) Date: Fri, 28 Oct 2022 07:28:41 +0000 Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size metric In-Reply-To: <3bb519ce13094cec91c8409fe165f7b1@elca.ch> References: <3bb519ce13094cec91c8409fe165f7b1@elca.ch> Message-ID: Hello, I can ask the question another way: what's the difference between 'radosgw.containers.objects.size' and 'radosgw.objects.size' metrics ? Thanks, JF > -----Original Message----- > From: Taltavull Jean-Fran?ois > Sent: lundi, 24 octobre 2022 16:26 > To: openstack-discuss > Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size > metric > > Hello, > > I'm trying to get the 'radosgw.objects.size' metric, that is the total bucket > objects size per tenant. I expected to get one sample per tenant but I get one > sample per bucket instead, as with the 'rados.containers.objects.size' metric. > > Here is my pollster definition: > ''' > - name: "radosgw.objects.size" > sample_type: "gauge" > unit: "B" > value_attribute: ". | value['usage'] | value.get('rgw.main',{'size':0}) | > value['size']" > url_path: "FQDN/admin/bucket?stats=True" > module: "awsauth" > authentication_object: "S3Auth" > authentication_parameters: my_access_key,my_secret_key,FQDN > user_id_attribute: "owner | value.split('$') | value[0]" > project_id_attribute: "tenant" > resource_id_attribute: "id" > ''' > > I tried with "resource_id_attribute: "tenant" but it does not work better. > > Any idea ? 
Is there something wrong in the pollster definition ? > > Regards, > Jean-Francois From ralonsoh at redhat.com Fri Oct 28 10:04:17 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 28 Oct 2022 12:04:17 +0200 Subject: [neutron][release] Proposing to EOL Queens, Rocky and Stein (all Neutron related projects) Message-ID: Hello: In the last PTG, the Neutron team has decided [1] to move the stable branches Queens, Rocky and Stein to EOL (end-of-life) status. According to the steps to achieve this [2], we need first to announce it. That will affect all Neutron related projects. The patch to mark these branches as EOL will be pushed in one week. If you have any inconvenience, please let me know in this mail chain or in IRC (ralonsoh, #openstack-neutron channel). You can also contact any Neutron core reviewer in the IRC channel. Regards. [1]https://etherpad.opendev.org/p/neutron-antelope-ptg#L131 [2] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Fri Oct 28 10:26:31 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Fri, 28 Oct 2022 07:26:31 -0300 Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size metric In-Reply-To: References: <3bb519ce13094cec91c8409fe165f7b1@elca.ch> Message-ID: Can you show us the json you are trying to process with Ceilometer? Then,we can move on from there. You can post here a minimalistic version of the json output. On Fri, Oct 28, 2022 at 4:32 AM Taltavull Jean-Fran?ois < jean-francois.taltavull at elca.ch> wrote: > Hello, > > I can ask the question another way: what's the difference between > 'radosgw.containers.objects.size' and 'radosgw.objects.size' metrics ? > > Thanks, > > JF > > > -----Original Message----- > > From: Taltavull Jean-Fran?ois > > Sent: lundi, 24 octobre 2022 16:26 > > To: openstack-discuss > > Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size > > metric > > > > Hello, > > > > I'm trying to get the 'radosgw.objects.size' metric, that is the total > bucket > > objects size per tenant. I expected to get one sample per tenant but I > get one > > sample per bucket instead, as with the 'rados.containers.objects.size' > metric. > > > > Here is my pollster definition: > > ''' > > - name: "radosgw.objects.size" > > sample_type: "gauge" > > unit: "B" > > value_attribute: ". | value['usage'] | > value.get('rgw.main',{'size':0}) | > > value['size']" > > url_path: "FQDN/admin/bucket?stats=True" > > module: "awsauth" > > authentication_object: "S3Auth" > > authentication_parameters: my_access_key,my_secret_key,FQDN > > user_id_attribute: "owner | value.split('$') | value[0]" > > project_id_attribute: "tenant" > > resource_id_attribute: "id" > > ''' > > > > I tried with "resource_id_attribute: "tenant" but it does not work > better. > > > > Any idea ? Is there something wrong in the pollster definition ? > > > > Regards, > > Jean-Francois > > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.ekwueme at gmail.com Fri Oct 28 11:08:29 2022 From: victor.ekwueme at gmail.com (Victor Ekwueme) Date: Fri, 28 Oct 2022 12:08:29 +0100 Subject: Outreachy Mentorship Programme Message-ID: Hello, I am a programmer who is visually impaired. I set up devstack on a Digital Ocean droplet on Ubuntu 20.04 Server. It installed successfully. 
I am trying to make a contribution on 1599140. So my questions are: 1. How do I set up a development environment for this or any other contribution? 2. Do I work directly on the droplet and push from to the repo from there? Any assistance will be of tremendous help. Regards, Victor O. Ekwueme -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Oct 28 12:45:47 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 28 Oct 2022 12:45:47 +0000 Subject: [cinder][first-contact-sig] Outreachy Mentorship Programme In-Reply-To: References: Message-ID: <20221028124546.3ltnbxplze5zubpu@yuggoth.org> [Keeping you in Cc since it seems like you're not subscribed to the mailing list, but please still reply to the list.] On 2022-10-28 12:08:29 +0100 (+0100), Victor Ekwueme wrote: [...] > I set up devstack on a Digital Ocean droplet on Ubuntu 20.04 > Server. It installed successfully. I am trying to make a > contribution on 1599140. [...] That bug report is about improving Cinder's unit testing coverage, so DevStack isn't really needed or involved at all, though it can still be helpful as an example deployment of Cinder along with other OpenStack services with which it interacts. For this particular task though, you just need to be able to locally run unit tests (preferably with the `tox` utility), and then create more of them. You can do that pretty much anywhere you have a POSIX (Linux/Unix) shell account and the ability to install some packages Cinder or its tests depend on. You've probably already seen the overall OpenStack Contributor Guide, but you may have missed that the Cinder team maintains guidance more specific to their subproject here which covers things like unit tests in greater detail: https://docs.openstack.org/cinder/latest/contributor/ You can also seek assistance from other Cinder contributors in the #openstack-cinder channel on the OFTC IRC network, which may be a quicker way to get answers to some of your questions. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jean-francois.taltavull at elca.ch Fri Oct 28 12:46:44 2022 From: jean-francois.taltavull at elca.ch (=?utf-8?B?VGFsdGF2dWxsIEplYW4tRnJhbsOnb2lz?=) Date: Fri, 28 Oct 2022 12:46:44 +0000 Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size metric In-Reply-To: References: <3bb519ce13094cec91c8409fe165f7b1@elca.ch> Message-ID: <62156413e8ac4b259a140048b1a340e3@elca.ch> See below. Hope this will help ! 
[[{'bucket': 'huge', 'num_shards': 11, 'tenant': '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': 'default-placement', 'explicit_placement': {'data_pool': '', 'data_extra_pool': '', 'index_pool': ''}, 'id': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.3', 'marker': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.3', 'index_type': 'Normal', 'owner': '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': '0#1,1#1,2#1,3#1,4#1,5#1,6#1,7#2,8#1,9#1,10#1', 'master_ver': '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': '2022-10-26T11:32:05.185527Z', 'creation_time': '2022-10-26T11:32:05.181022Z', 'max_marker': '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {'rgw.main': {'size': 8461984, 'size_actual': 8462336, 'size_utilized': 8461984, 'size_kb': 8264, 'size_kb_actual': 8264, 'size_kb_utilized': 8264, 'num_objects': 1}}, 'bucket_quota': {'enabled': False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, 'max_objects': -1}}, {'bucket': 'empty', 'num_shards': 11, 'tenant': '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': 'default-placement', 'explicit_placement': {'data_pool': '', 'data_extra_pool': '', 'index_pool': ''}, 'id': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27142035.4', 'marker': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27142035.4', 'index_type': 'Normal', 'owner': '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': '0#1,1#1,2#1,3#1,4#1,5#1,6#1,7#1,8#1,9#1,10#1', 'master_ver': '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': '2022-10-26T11:31:40.229337Z', 'creation_time': '2022-10-26T11:31:40.224401Z', 'max_marker': '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {}, 'bucket_quota': {'enabled': False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, 'max_objects': -1}}, {'bucket': 'photos', 'num_shards': 11, 'tenant': '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': 'default-placement', 'explicit_placement': {'data_pool': '', 'data_extra_pool': '', 'index_pool': ''}, 'id': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.2', 'marker': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.2', 'index_type': 'Normal', 'owner': '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': '0#2,1#1,2#1,3#3,4#1,5#1,6#1,7#1,8#1,9#1,10#1', 'master_ver': '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': '2022-10-24T11:54:18.320141Z', 'creation_time': '2022-10-24T11:54:18.315194Z', 'max_marker': '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {'rgw.main': {'size': 14, 'size_actual': 4096, 'size_utilized': 14, 'size_kb': 1, 'size_kb_actual': 4, 'size_kb_utilized': 1, 'num_objects': 1}}, 'bucket_quota': {'enabled': False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, 'max_objects': -1}}, {'bucket': 'big', 'num_shards': 11, 'tenant': '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': 'default-placement', 'explicit_placement': {'data_pool': '', 'data_extra_pool': '', 'index_pool': ''}, 'id': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27100595.1', 'marker': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27100595.1', 'index_type': 'Normal', 'owner': '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': '0#2,1#1,2#1,3#1,4#1,5#1,6#1,7#1,8#1,9#1,10#1', 'master_ver': '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': '2022-10-24T13:28:45.864925Z', 'creation_time': 
'2022-10-24T13:28:45.860346Z', 'max_marker': '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {'rgw.main': {'size': 249, 'size_actual': 4096, 'size_utilized': 249, 'size_kb': 1, 'size_kb_actual': 4, 'size_kb_utilized': 1, 'num_objects': 1}}, 'bucket_quota': {'enabled': False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, 'max_objects': -1}}] From: Rafael Weing?rtner Sent: vendredi, 28 octobre 2022 12:27 To: Taltavull Jean-Fran?ois Cc: openstack-discuss Subject: Re: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size metric EXTERNAL MESSAGE - This email comes from outside ELCA companies. Can you show us the json you are trying to process with Ceilometer? Then,we can move on from there. You can post here a minimalistic version of the json output. On Fri, Oct 28, 2022 at 4:32 AM Taltavull Jean-Fran?ois > wrote: Hello, I can ask the question another way: what's the difference between 'radosgw.containers.objects.size' and 'radosgw.objects.size' metrics ? Thanks, JF > -----Original Message----- > From: Taltavull Jean-Fran?ois > Sent: lundi, 24 octobre 2022 16:26 > To: openstack-discuss > > Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size > metric > > Hello, > > I'm trying to get the 'radosgw.objects.size' metric, that is the total bucket > objects size per tenant. I expected to get one sample per tenant but I get one > sample per bucket instead, as with the 'rados.containers.objects.size' metric. > > Here is my pollster definition: > ''' > - name: "radosgw.objects.size" > sample_type: "gauge" > unit: "B" > value_attribute: ". | value['usage'] | value.get('rgw.main',{'size':0}) | > value['size']" > url_path: "FQDN/admin/bucket?stats=True" > module: "awsauth" > authentication_object: "S3Auth" > authentication_parameters: my_access_key,my_secret_key,FQDN > user_id_attribute: "owner | value.split('$') | value[0]" > project_id_attribute: "tenant" > resource_id_attribute: "id" > ''' > > I tried with "resource_id_attribute: "tenant" but it does not work better. > > Any idea ? Is there something wrong in the pollster definition ? > > Regards, > Jean-Francois -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Fri Oct 28 12:59:25 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Fri, 28 Oct 2022 09:59:25 -0300 Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size metric In-Reply-To: <62156413e8ac4b259a140048b1a340e3@elca.ch> References: <3bb519ce13094cec91c8409fe165f7b1@elca.ch> <62156413e8ac4b259a140048b1a340e3@elca.ch> Message-ID: I am not understanding. Your expression to obtain the value is ". | value['usage'] | value.get('rgw.main',{'size':0}) | value['size']". That assumes a response with a "value" entry in the JSON; then, you get the 'rgw.main' attribute, and then, you get the size. 
So, you are working with samples that are similar to the following: ``` { "bucket":"huge", "num_shards":11, "tenant":"08bb8ee9c5bd41248025268ee1aea481", "zonegroup":"d28c435f-57a5-49ca-91e8-481a2ced1f18", "placement_rule":"default-placement", "explicit_placement":{ "data_pool":"", "data_extra_pool":"", "index_pool":"" }, "id":"ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.3", "marker":"ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.3", "index_type":"Normal", "owner":"08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481", "ver":"0#1,1#1,2#1,3#1,4#1,5#1,6#1,7#2,8#1,9#1,10#1", "master_ver":"0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0", "mtime":"2022-10-26T11:32:05.185527Z", "creation_time":"2022-10-26T11:32:05.181022Z", "max_marker":"0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#", "usage":{ "rgw.main":{ "size":8461984, "size_actual":8462336, "size_utilized":8461984, "size_kb":8264, "size_kb_actual":8264, "size_kb_utilized":8264, "num_objects":1 } }, "bucket_quota":{ "enabled":false, "check_on_raw":true, "max_size":-1, "max_size_kb":0, "max_objects":-1 } } ``` These samples are probably coming from an API with "usage" JSON attribute, and that is why it works with your expression. To answer your initial question, Ceilometer dynamic pollster will work with whatever you have in the response. If data comes in a bucket fashion, each sample is going to represent a bucket. If you want to group/aggregate that data in a project/tenant fashion you might need to do some working. Either using a different API, or doing some groupby in Gnocchi with the aggregates API. Furthermore, what about the "admin/usage" instead of the "admin/bucket?stats=True" . The admin API will bring data grouped in a user fashion. On Fri, Oct 28, 2022 at 9:46 AM Taltavull Jean-Fran?ois < jean-francois.taltavull at elca.ch> wrote: > See below. Hope this will help ! 
> > > > [[{'bucket': 'huge', 'num_shards': 11, 'tenant': > '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': > 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': > 'default-placement', 'explicit_placement': {'data_pool': '', > 'data_extra_pool': '', 'index_pool': ''}, 'id': > 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.3', 'marker': > 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.3', 'index_type': 'Normal', > 'owner': > '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': > '0#1,1#1,2#1,3#1,4#1,5#1,6#1,7#2,8#1,9#1,10#1', 'master_ver': > '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': > '2022-10-26T11:32:05.185527Z', 'creation_time': > '2022-10-26T11:32:05.181022Z', 'max_marker': > '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {'rgw.main': {'size': > 8461984, 'size_actual': 8462336, 'size_utilized': 8461984, 'size_kb': 8264, > 'size_kb_actual': 8264, 'size_kb_utilized': 8264, 'num_objects': 1}}, > 'bucket_quota': {'enabled': False, 'check_on_raw': True, 'max_size': -1, > 'max_size_kb': 0, 'max_objects': -1}}, {'bucket': 'empty', 'num_shards': > 11, 'tenant': '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': > 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': > 'default-placement', 'explicit_placement': {'data_pool': '', > 'data_extra_pool': '', 'index_pool': ''}, 'id': > 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27142035.4', 'marker': > 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27142035.4', 'index_type': 'Normal', > 'owner': > '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': > '0#1,1#1,2#1,3#1,4#1,5#1,6#1,7#1,8#1,9#1,10#1', 'master_ver': > '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': > '2022-10-26T11:31:40.229337Z', 'creation_time': > '2022-10-26T11:31:40.224401Z', 'max_marker': > '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {}, 'bucket_quota': > {'enabled': False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, > 'max_objects': -1}}, {'bucket': 'photos', 'num_shards': 11, 'tenant': > '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': > 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': > 'default-placement', 'explicit_placement': {'data_pool': '', > 'data_extra_pool': '', 'index_pool': ''}, 'id': > 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.2', 'marker': > 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.2', 'index_type': 'Normal', > 'owner': > '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': > '0#2,1#1,2#1,3#3,4#1,5#1,6#1,7#1,8#1,9#1,10#1', 'master_ver': > '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': > '2022-10-24T11:54:18.320141Z', 'creation_time': > '2022-10-24T11:54:18.315194Z', 'max_marker': > '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {'rgw.main': {'size': 14, > 'size_actual': 4096, 'size_utilized': 14, 'size_kb': 1, 'size_kb_actual': > 4, 'size_kb_utilized': 1, 'num_objects': 1}}, 'bucket_quota': {'enabled': > False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, > 'max_objects': -1}}, {'bucket': 'big', 'num_shards': 11, 'tenant': > '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': > 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': > 'default-placement', 'explicit_placement': {'data_pool': '', > 'data_extra_pool': '', 'index_pool': ''}, 'id': > 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27100595.1', 'marker': > 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27100595.1', 'index_type': 'Normal', > 'owner': > '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': > '0#2,1#1,2#1,3#1,4#1,5#1,6#1,7#1,8#1,9#1,10#1', 'master_ver': 
> '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': > '2022-10-24T13:28:45.864925Z', 'creation_time': > '2022-10-24T13:28:45.860346Z', 'max_marker': > '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {'rgw.main': {'size': 249, > 'size_actual': 4096, 'size_utilized': 249, 'size_kb': 1, 'size_kb_actual': > 4, 'size_kb_utilized': 1, 'num_objects': 1}}, 'bucket_quota': {'enabled': > False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, > 'max_objects': -1}}] > > > > > > *From:* Rafael Weing?rtner > *Sent:* vendredi, 28 octobre 2022 12:27 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] RADOS GW metrics : cannot get > radosgw.objects.size metric > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > Can you show us the json you are trying to process with Ceilometer? > Then,we can move on from there. You can post here a minimalistic version of > the json output. > > > > On Fri, Oct 28, 2022 at 4:32 AM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > Hello, > > I can ask the question another way: what's the difference between > 'radosgw.containers.objects.size' and 'radosgw.objects.size' metrics ? > > Thanks, > > JF > > > -----Original Message----- > > From: Taltavull Jean-Fran?ois > > Sent: lundi, 24 octobre 2022 16:26 > > To: openstack-discuss > > Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size > > metric > > > > Hello, > > > > I'm trying to get the 'radosgw.objects.size' metric, that is the total > bucket > > objects size per tenant. I expected to get one sample per tenant but I > get one > > sample per bucket instead, as with the 'rados.containers.objects.size' > metric. > > > > Here is my pollster definition: > > ''' > > - name: "radosgw.objects.size" > > sample_type: "gauge" > > unit: "B" > > value_attribute: ". | value['usage'] | > value.get('rgw.main',{'size':0}) | > > value['size']" > > url_path: "FQDN/admin/bucket?stats=True" > > module: "awsauth" > > authentication_object: "S3Auth" > > authentication_parameters: my_access_key,my_secret_key,FQDN > > user_id_attribute: "owner | value.split('$') | value[0]" > > project_id_attribute: "tenant" > > resource_id_attribute: "id" > > ''' > > > > I tried with "resource_id_attribute: "tenant" but it does not work > better. > > > > Any idea ? Is there something wrong in the pollster definition ? > > > > Regards, > > Jean-Francois > > > > -- > > Rafael Weing?rtner > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-francois.taltavull at elca.ch Fri Oct 28 13:23:11 2022 From: jean-francois.taltavull at elca.ch (=?utf-8?B?VGFsdGF2dWxsIEplYW4tRnJhbsOnb2lz?=) Date: Fri, 28 Oct 2022 13:23:11 +0000 Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size metric In-Reply-To: References: <3bb519ce13094cec91c8409fe165f7b1@elca.ch> <62156413e8ac4b259a140048b1a340e3@elca.ch> Message-ID: <87110d7e4e924a2aba534b94d047be99@elca.ch> I tried "admin/usage" API instead of the "admin/bucket?stats=True" but the returned JSON does not contain information about bucket objects size. So, I will keep on with ?admin/bucket? and try to do the aggregations I need at gnocchi level. Thanks again for your help ! JF From: Rafael Weing?rtner Sent: vendredi, 28 octobre 2022 14:59 To: Taltavull Jean-Fran?ois Cc: openstack-discuss Subject: Re: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size metric EXTERNAL MESSAGE - This email comes from outside ELCA companies. 
I am not understanding. Your expression to obtain the value is ". | value['usage'] | value.get('rgw.main',{'size':0}) | value['size']". That assumes a response with a "value" entry in the JSON; then, you get the 'rgw.main' attribute, and then, you get the size. So, you are working with samples that are similar to the following: ``` { "bucket":"huge", "num_shards":11, "tenant":"08bb8ee9c5bd41248025268ee1aea481", "zonegroup":"d28c435f-57a5-49ca-91e8-481a2ced1f18", "placement_rule":"default-placement", "explicit_placement":{ "data_pool":"", "data_extra_pool":"", "index_pool":"" }, "id":"ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.3", "marker":"ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.3", "index_type":"Normal", "owner":"08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481", "ver":"0#1,1#1,2#1,3#1,4#1,5#1,6#1,7#2,8#1,9#1,10#1", "master_ver":"0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0", "mtime":"2022-10-26T11:32:05.185527Z", "creation_time":"2022-10-26T11:32:05.181022Z", "max_marker":"0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#", "usage":{ "rgw.main":{ "size":8461984, "size_actual":8462336, "size_utilized":8461984, "size_kb":8264, "size_kb_actual":8264, "size_kb_utilized":8264, "num_objects":1 } }, "bucket_quota":{ "enabled":false, "check_on_raw":true, "max_size":-1, "max_size_kb":0, "max_objects":-1 } } ``` These samples are probably coming from an API with "usage" JSON attribute, and that is why it works with your expression. To answer your initial question, Ceilometer dynamic pollster will work with whatever you have in the response. If data comes in a bucket fashion, each sample is going to represent a bucket. If you want to group/aggregate that data in a project/tenant fashion you might need to do some working. Either using a different API, or doing some groupby in Gnocchi with the aggregates API. Furthermore, what about the "admin/usage" instead of the "admin/bucket?stats=True" . The admin API will bring data grouped in a user fashion. On Fri, Oct 28, 2022 at 9:46 AM Taltavull Jean-Fran?ois > wrote: See below. Hope this will help ! 
[[{'bucket': 'huge', 'num_shards': 11, 'tenant': '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': 'default-placement', 'explicit_placement': {'data_pool': '', 'data_extra_pool': '', 'index_pool': ''}, 'id': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.3', 'marker': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.3', 'index_type': 'Normal', 'owner': '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': '0#1,1#1,2#1,3#1,4#1,5#1,6#1,7#2,8#1,9#1,10#1', 'master_ver': '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': '2022-10-26T11:32:05.185527Z', 'creation_time': '2022-10-26T11:32:05.181022Z', 'max_marker': '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {'rgw.main': {'size': 8461984, 'size_actual': 8462336, 'size_utilized': 8461984, 'size_kb': 8264, 'size_kb_actual': 8264, 'size_kb_utilized': 8264, 'num_objects': 1}}, 'bucket_quota': {'enabled': False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, 'max_objects': -1}}, {'bucket': 'empty', 'num_shards': 11, 'tenant': '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': 'default-placement', 'explicit_placement': {'data_pool': '', 'data_extra_pool': '', 'index_pool': ''}, 'id': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27142035.4', 'marker': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27142035.4', 'index_type': 'Normal', 'owner': '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': '0#1,1#1,2#1,3#1,4#1,5#1,6#1,7#1,8#1,9#1,10#1', 'master_ver': '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': '2022-10-26T11:31:40.229337Z', 'creation_time': '2022-10-26T11:31:40.224401Z', 'max_marker': '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {}, 'bucket_quota': {'enabled': False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, 'max_objects': -1}}, {'bucket': 'photos', 'num_shards': 11, 'tenant': '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': 'default-placement', 'explicit_placement': {'data_pool': '', 'data_extra_pool': '', 'index_pool': ''}, 'id': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.2', 'marker': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27108481.2', 'index_type': 'Normal', 'owner': '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': '0#2,1#1,2#1,3#3,4#1,5#1,6#1,7#1,8#1,9#1,10#1', 'master_ver': '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': '2022-10-24T11:54:18.320141Z', 'creation_time': '2022-10-24T11:54:18.315194Z', 'max_marker': '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {'rgw.main': {'size': 14, 'size_actual': 4096, 'size_utilized': 14, 'size_kb': 1, 'size_kb_actual': 4, 'size_kb_utilized': 1, 'num_objects': 1}}, 'bucket_quota': {'enabled': False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, 'max_objects': -1}}, {'bucket': 'big', 'num_shards': 11, 'tenant': '08bb8ee9c5bd41248025268ee1aea481', 'zonegroup': 'd28c435f-57a5-49ca-91e8-481a2ced1f18', 'placement_rule': 'default-placement', 'explicit_placement': {'data_pool': '', 'data_extra_pool': '', 'index_pool': ''}, 'id': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27100595.1', 'marker': 'ba604862-46ad-4cf1-a554-7da4e7168ac3.27100595.1', 'index_type': 'Normal', 'owner': '08bb8ee9c5bd41248025268ee1aea481$08bb8ee9c5bd41248025268ee1aea481', 'ver': '0#2,1#1,2#1,3#1,4#1,5#1,6#1,7#1,8#1,9#1,10#1', 'master_ver': '0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0', 'mtime': '2022-10-24T13:28:45.864925Z', 'creation_time': 
'2022-10-24T13:28:45.860346Z', 'max_marker': '0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#', 'usage': {'rgw.main': {'size': 249, 'size_actual': 4096, 'size_utilized': 249, 'size_kb': 1, 'size_kb_actual': 4, 'size_kb_utilized': 1, 'num_objects': 1}}, 'bucket_quota': {'enabled': False, 'check_on_raw': True, 'max_size': -1, 'max_size_kb': 0, 'max_objects': -1}}] From: Rafael Weing?rtner > Sent: vendredi, 28 octobre 2022 12:27 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size metric EXTERNAL MESSAGE - This email comes from outside ELCA companies. Can you show us the json you are trying to process with Ceilometer? Then,we can move on from there. You can post here a minimalistic version of the json output. On Fri, Oct 28, 2022 at 4:32 AM Taltavull Jean-Fran?ois > wrote: Hello, I can ask the question another way: what's the difference between 'radosgw.containers.objects.size' and 'radosgw.objects.size' metrics ? Thanks, JF > -----Original Message----- > From: Taltavull Jean-Fran?ois > Sent: lundi, 24 octobre 2022 16:26 > To: openstack-discuss > > Subject: [Ceilometer] RADOS GW metrics : cannot get radosgw.objects.size > metric > > Hello, > > I'm trying to get the 'radosgw.objects.size' metric, that is the total bucket > objects size per tenant. I expected to get one sample per tenant but I get one > sample per bucket instead, as with the 'rados.containers.objects.size' metric. > > Here is my pollster definition: > ''' > - name: "radosgw.objects.size" > sample_type: "gauge" > unit: "B" > value_attribute: ". | value['usage'] | value.get('rgw.main',{'size':0}) | > value['size']" > url_path: "FQDN/admin/bucket?stats=True" > module: "awsauth" > authentication_object: "S3Auth" > authentication_parameters: my_access_key,my_secret_key,FQDN > user_id_attribute: "owner | value.split('$') | value[0]" > project_id_attribute: "tenant" > resource_id_attribute: "id" > ''' > > I tried with "resource_id_attribute: "tenant" but it does not work better. > > Any idea ? Is there something wrong in the pollster definition ? > > Regards, > Jean-Francois -- Rafael Weing?rtner -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Fri Oct 28 13:51:55 2022 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 28 Oct 2022 09:51:55 -0400 Subject: [horizon] missing reviews for cinder backup filtering Message-ID: Hi folks, We're nearing a very long time that this patch has been open, it's undergone several revisions from our team and sitting idle almost 2 months. https://review.opendev.org/c/openstack/horizon/+/791532/ Can we please help get this landed, we're trying our best to stick to our upstream only :) Thanks Mohammed -- Mohammed Naser VEXXHOST, Inc. From elod.illes at est.tech Fri Oct 28 18:20:12 2022 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Fri, 28 Oct 2022 18:20:12 +0000 Subject: [all][stable][ptl] Propose to EOL Queens series Message-ID: Hi, As more and more teams decide about moving their Queens branches to End of Life, it looks like the time has come to transition the complete Queens stable release for every project. 
The reasons behind this are the following things: - gates are mostly broken - minimal number of bugfix backports are pushed to these branches - gate job definitions are still using the old, legacy zuul syntax - gate jobs are based on Ubuntu Xenial, which is also beyond its public maintenance window date and hard to maintain - lack of reviews / reviewers on this branch Based on the above, if no objection comes from teams, then I'll start the process of EOL'ing Queens stable series. Please let the community know what you think, or indicate if any of the projects' stable/queens branch should be kept open in Extended Maintenance. Thanks, El?d Ill?s irc: elodilles @ #openstack-stable / #openstack-release -------------- next part -------------- An HTML attachment was scrubbed... URL: From yasufum.o at gmail.com Fri Oct 28 18:38:51 2022 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Sat, 29 Oct 2022 03:38:51 +0900 Subject: [tacker][ptg] Antelope PTG Summary Message-ID: <238e6d10-610d-7e6b-964f-47288ea03681@gmail.com> Hi all, We had great PTG sessions in Tacker team, and hopefully all. This is a summary of what we discussed during our sessions in three days. All of items and the results of discussion is here [1]. Day 1 (18 Oct 2022) 1. Reducing resources for FTs - We shared suggestions for fixing critical issues in functional tests in which we've had some unreasonable failures, especially at the end of releases. - Removing/deprecating unecessary legacy features and its tests. - Quick analysis for the failures [1]. - Wishlist for the item [2]. - Agree to focus on jobs has the large num of failures to revise tests. 2. Support tacker-db-manage for Multi DB backend - Support for PostgreSQL doesn't work, for insatnce, some issues in tacker-db-manage. - We agreed to not only fix the issues, but also take care other backends. - Revise tacker-db-manage for supporting multi DB backend. Start to support Postgres in antelope first, and other ones in the next releases then. - Proposed design is not so mature for now, so continue to discuss to revise. 3. Migrate OpenStack testing from Ubuntu Focal (20.04) to Ubuntu Jammy (22.04) - We agreed to migrate Focal to Jammy Ubuntu release as one of the community goal in this release [4]. - Candidate patches for the update have already proposed by manpreetk not only for tackder but also other releated projs. - https://review.opendev.org/c/openstack/tacker/+/861137 - https://review.opendev.org/c/openstack/tacker-horizon/+/861571 - https://review.opendev.org/c/openstack/python-tackerclient/+/861572 - https://review.opendev.org/c/openstack/tosca-parser/+/861136 - https://review.opendev.org/c/openstack/heat-translator/+/861158 https://etherpad.opendev.org/p/migrate-to-jammy 4. Bug Triage in Tacker - Add item of bug triage in IRC etherpad to discuss about bug triage. Day 2 (19 Oct 2022) 1. Operator Hour - Guys from KDDI and NTT Docomo joined and proposed their proposals as telcom operators [5]. 1.1. Support of redanduncy - For availability, essential to make Tacker redundant for commercial use, but less than we expected for the current impl. - What is the tacker specific consideration point is store VNF packages. 1.2. Error handling and recovering operation - More tests for changing error status, disaster recovery are required. 1.3. Detailed logs - Not enough log messages especially for management drivers. 
- Error message in tacker-server.log and tacker-conductor.log is usefull and enough for vnfm-admin, but like openstack vnflcm op show comannd and api's information is not enough for vnfm-user. 1.4. Roadmap - Having explicit roadmap is helpful for oprators to be interested. - Comply with ETSI NFV Rel4 is one of the good example for the direction. 2. Add sample coordinate VNF script using coordination API - We agreed to implement interfaces ETSI NFV SOL002 v3.5.1 defines the Coordination VNF interface for coordination with external management components. 3. Revise automatic operation state transition for N-Act configuration - v2API has a function to automatically transition opocc in an intermediate state to the final state when the tacker-conductor starts up, to recover inoperable opocc in the PROCESSING state by conductor down. - It causes a problem in N-Act setting. The opocc which are being handled by other working conductors are also automatically transitioned to the final state. We propose solutions for limiting opocc to be transitioned to those handled by the downed conductor. 4. Set different number of instances for same delta and aspectId - Fix overwriting the value for same aspect and delta id. - Multi VDUs Scale of V1 API isn?t supported by the issue. 5. Continue to update the remaining patch from Zed version - Move forward reviews on gerrit. Day 3 (20 Oct 2022) 1. Enhancement of Tacker API Resource Access Policy - Current Tacker policy control access to API resources by default role such as admin or any only. - We propose Fine-grained API resource access management based on user and VNF information according to operator usecases. 2. Secure RBAC: Implement support for project-reader persona in Tacker - As per the TC and community wide goal, next cycle 2023.1 is the must for all projects to implement the phase-1 (project personas). Plan to do it in Tacker in 2023.1 cycle will be helpful in community wide goal [6][7]. - Conclusions: Clarify conflicts can be caused while introducing new roles of S-RBAC to implement the fine grained APIs 3. Support Tacker auto-scale and auto-heal without NFVO (k_fukaya) - Zed release supported, Fault Management/ Performance Management(FM/PM) interfaces, and AutoHeal and AutoScale with external monitoring. However, Heal or Scale execution must be triggered by NFVO. This feature proposes implementing support receiving alerts from external monitoring tools, which can be VNFM driven AutoHeal and AutoScale without NFVO. 4. Discuss Tacker support versions and updates for K8s/Helm/Prometheus - The support versions in Antelope should be determined, as some versions may be inconsistent and support may expire before release. Considering the support period and development risk, it seems good to decide on the following. - k8s : 1.25 (current latest version) - helm : 3.10 (current latest version) - prometheus : 2.42 (next LTS version) - Update test patch for k8s 1.25.2 and helm 3.10.1 - https://review.opendev.org/c/openstack/tacker/+/860633 (Zuul +1) - Add version information under user guide. 5. AWS vim support - Add support for EC2 as VIM for the first step for AWS. - Whole discussion is here [8]. 6. Marking Deprecated and obsoleting of Legacy API - Should start to discussion about deprecation and obsoleting Legacy Tacker API (excluding VIM feature). - Deprecation process should follow as "Deprecation Guidelines" [9]. - The key point is - APIs should be marked deprecated before obsoleting and should be marked for at least 12 months. 
- An email thread will be started on openstack-discuss to determine how many people are using the deprecated API. - We will continue our discussion and move towards removing the Legacy API (excluding VIM feature). [1] https://etherpad.opendev.org/p/tacker-antelope-ptg [2] https://etherpad.opendev.org/p/tacker-antelope-failures-analysis [3] https://bugs.launchpad.net/tacker/+bug/1993187 [4] https://governance.openstack.org/tc/goals/selected/migrate-ci-jobs-to-ubuntu-jammy.html [5] https://etherpad.opendev.org/p/oct2022-ptg-operator-hour-tacker [6] https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html#release-timeline [7] https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030863.html [8] https://etherpad.opendev.org/p/tacker-antelope-aws-vim-support [9] https://docs.openstack.org/project-team-guide/deprecation.html#guidelines From gmann at ghanshyammann.com Fri Oct 28 21:30:48 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 28 Oct 2022 14:30:48 -0700 Subject: [all][tc] What's happening in Technical Committee: summary 2022 Oct 28: Reading: 5 min Message-ID: <18420812ffa.1232c7569177334.1475728295646581220@ghanshyammann.com> Hello Everyone, Here is this week's summary of the Technical Committee activities. 1. TC Meetings: ============ * We had this week's meeting on Oct 27. Most of the meeting discussions are summarized in this email. Meeting logs are available @ https://meetings.opendev.org/meetings/tc/2022/tc.2022-10-27-15.00.log.html * Next TC weekly meeting will be on Nov 3 Thursday at 15:00 UTC, feel free to add the topic to the agenda[1] by Nov 2. 2. What we completed this week: ========================= * 2021 User Survey TC Question Analysis[2] * Appointed hongbin as Zun PTL[3] 3. Activities In progress: ================== TC Tracker for 2023.1 cycle --------------------------------- * I have prepared the 2023.1 tracker etherpad which includes the TC working items for the 2023.1 cycle[4]. Open Reviews ----------------- * Four open reviews for ongoing activities[5]. 2023.1 cycle TC PTG Summary ------------------------------------- I sent the TC sessions and TC+Leader discussion summary on ML[7][8]. User Survey: --------------- TC worked on the modification of TC questions for the 2023 survey[9]. TC chair nomination & election process ----------------------------------------------- We are formalizing the process of TC chair nomination process. Two options are up for the review[10][11]. Fixing Zuul config error ---------------------------- We request projects having zuul config error to fix them, Keep supported stable branches as a priority and Extended maintenance stable branch as low priority[12]. Project updates ------------------- * None. 4. How to contact the TC: ==================== If you would like to discuss or give feedback to TC, you can reach out to us in multiple ways: 1. Email: you can send the email with tag [tc] on openstack-discuss ML[13]. 2. Weekly meeting: The Technical Committee conduct a weekly meeting every Thursday 15:00 UTC [14] 3. Ping us using 'tc-members' nickname on #openstack-tc IRC channel. See you all next week in PTG! 
[1] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
[2] https://review.opendev.org/c/openstack/governance/+/836888
[3] https://review.opendev.org/c/openstack/governance/+/860759
[4] https://etherpad.opendev.org/p/tc-zed-tracker
[5] https://review.opendev.org/q/projects:openstack/governance+status:open
[7] https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030954.html
[8] https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030953.html
[9] https://etherpad.opendev.org/p/tc-2023-user-survey-questions
[10] https://review.opendev.org/c/openstack/governance/+/862772
[11] https://review.opendev.org/c/openstack/governance/+/862774
[12] https://etherpad.opendev.org/p/zuul-config-error-openstack
[13] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss
[14] http://eavesdrop.openstack.org/#Technical_Committee_Meeting

-gmann

From christian.rohmann at inovex.de Mon Oct 31 11:29:11 2022
From: christian.rohmann at inovex.de (Christian Rohmann)
Date: Mon, 31 Oct 2022 12:29:11 +0100
Subject: [oslo][tooz][openstack-ansible] Discussion about coordination (tooz), too many backend options, their state and deployment implications
Message-ID: <431ef18d-b52f-92d8-5543-dd63e10b012b@inovex.de>

Hello openstack-discuss,

apologies for this being quite a long message - I tried my best to collect my thoughts on the matter.

1) The role of deployment tooling in fulfilling the requirement for a coordination backend

To be honest, I am writing this triggered by the openstack-ansible plans to add coordination via the Zookeeper backend (https://lists.openstack.org/pipermail/openstack-discuss/2022-October/031013.html).

On 27/10/2022 13:10, Dmitriy Rabotyagov wrote:

> * Add Zookepeer cluster deployment as coordination service.
> Coordination is required if you want to have active/active
> cinder-volume setup and also actively used by other projects, like
> Octavia or Designate. Zookeeper will be deployed in a separate set of
> containers for LXC path

First of all, I believe it's essential for any OpenStack deployment tooling to handle the deployment of a coordination backend, as many OS projects simply rely, in their design and code, on it being in place.
But I am also convinced that there are too many options, and that some stronger guidance should be given to people designing and then deploying OS for their platform.
This guidance certainly can be in the form of a comparison table - but when it comes to using deployment tooling like openstack-ansible, the provided "default" component or options for something might just be worth more than written text explaining all of the possible approaches.

This holds especially true to me, as you can get quite far with no coordination configured, which then results in frustration and invalid bugs being raised.
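Just to illustrate what a service expects to be able to do once coordination is configured, here is a minimal sketch of a typical tooz consumer - the backend URL, member id and lock name are made-up example values only:

```python
from tooz import coordination

# The backend URL is what the deployment tooling has to provide,
# e.g. via a service's [coordination]/backend_url option (example value only).
backend_url = 'zookeeper://192.0.2.10:2181'

coordinator = coordination.get_coordinator(backend_url, b'cinder-volume-host-1')
coordinator.start(start_heart=True)

# A distributed lock around a critical section, e.g. work that must not run
# concurrently on another active/active cinder-volume instance.
with coordinator.get_lock(b'example-resource-lock'):
    pass  # do the work that requires mutual exclusion

coordinator.stop()
```

Without a real backend to point that URL at, services can typically only fall back to local (per-host) locking or no locking at all - which is exactly where the frustration mentioned above tends to come from.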
I don't want to sound unfair, but most don't communicate which of the Tooz services they actually require. In any case, I might just like something to cover all possible requirements to "set it (up) and forget it, no matter what OS projects run on the platform. Apart from basic compatibility, there are qualities I would expect (in no particular order) from a coordination backend: ?* no "best-effort" coordination, but allowing for actual reliance on it (CP if talking CAP) ?* HA - this needs to be working just as reliably as my database as otherwise the cloud cannot function ?* efficient in getting the job done (e.g. support for events / watches to reduce latency) ?* lightweight (deployment), no additional components, readily packaged ?* very little maintenance operations ?* easy monitoring I started by reading into the tooz drivers (https://docs.openstack.org/tooz/latest/user/drivers.html), of which there are more than enough to require some research. Here are my rough thoughts: ? a) I ruled out the IPC, file or RDBMs (mysql, postgresql) backend options as they all have strong side-notes (issues when doing replication or no HA at all). Additionally they usually are not partition tolerant or support watches. ? b) Redis seems quite capable, but there are many side notes about HA and this also requires setting up and maintaining sentinel. ? c) Memcached supports all three services (locks, groups, leader-election) tooz provides and is usually already part of an OpenStack infrastructure. So looked appealing. But it's non-replicating architecture and lack of any strong consistency guarantees make it less of a good "standard". I was even wondering how tooz would try it's best to work with multiple memcached nodes (https://bugs.launchpad.net/python-tooz/+bug/1970659). ? d) Then there only is Zookeeper left, which also ticks all the (feature-)boxes (https://docs.openstack.org/tooz/latest/user/compatibility.html) and is quite a proven tool for coordination also outside of the OpenStack ecosystem. On the downside it's not really that well known and common (anymore) outside the "data processing" context (see https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy). Being a Java application it requires a JVM and its dependencies and is quite memory heavy to store just a few megabytes of config data. Looking at more and more people putting their OS control plane into something like Kubernetes it also seems even less suitable to be "moved around" a lot. Another issue might be the lack of a recent and non-EoL version packaged in Ubuntu - see https://bugs.launchpad.net/ubuntu/+source/zookeeper/+bug/1854331. Maybe (!) this could be an indication of how commonly it is used outside of e.g. Support from TLS was only added in 3.5.5 (https://zookeeper.apache.org/doc/r3.5.5/zookeeperAdmin.html#Quorum+TLS) ? e) Consul - While also well known and loved, it has, like Zookeeper, quite a big footprint and is way more than just a CP-focused database. It's more of an application with man use cases. ? f) A last "strong" candidate is etcd. It did not surprise me to see it on the list of possible drivers and certainly is a tool known to many from running e.g. Kubernetes. It's actually already part of openstack-ansible deployment code as a role (https://github.com/openstack/openstack-ansible/commit/2f240dd485b123763442aa94130c6ddd3109ce34) as it is required when using Calico as SDN. 
While etcd is also something one must know how to monitor and operate, I allow me to say it might just be more common to find this operational knowledge. Also etcd has a smaller footprint than Zookeeper and it beeing "just a Golang binary" comes with (no) less dependencies. But I noticed that it does not even support "grouping", according to the feature matrix. But apparently this is just a documentation delay, seehttps://bugs.launchpad.net/python-tooz/+bug/1995125. What's left to implement would be leader-election, but there seems to be no technical reason why this cannot be done. this by no means is a comparison with a clear winner. I just want to stress how confusing having lots of options with no real guidance are. The requirement to chose and deploy coordination might not be a focus when looking into designing an OS cloud. 3) Stronger guidance /? "common default", setup via OS deployment tooling and also used for DevStack and tested via CI To summarize, there are just too many options and implications in the compatibility list to quickly chose the "right" one for one's own deployment. While large-scale deployments might likely not mind for coordination to have a bigger footprint and requiring more attention in general. But for smaller and even mid-size deployments, it's just convenient to offload the configuration of coordination and the selection the backend driver to the deployment tooling. Making it way too easy for such installations to not use coordination and running into issues or every other installation using a different backend creates a very fragmented landscape. Add different operating system distributions and versions, different deployment tooling, different set and versions of OS projects used, there will be so many combinations. This will likely just cause OS projects to receive more and non-reproducible bugs. Also not having (a somewhat common) coordination (backend) used within CI and DevStack does not expose the relevant code paths to enough testing. I'd like to make the analogy to having "just" MySQL as the default database engine, while still allowing other engines to be used (https://governance.openstack.org/tc/resolutions/20170613-postgresql-status.html). Or labeling certain options as "experimental" as Neutron just did with "linuxbridge" (https://docs.openstack.org/neutron/latest//admin/config-experimental-framework.html) or cinder with naming drivers unsupported (https://docs.openstack.org/cinder/ussuri/drivers-all-about.html#unsupported-drivers). My point is that just having all those backends and no active guidance might make Tooz a very open and flexible component. I myself would wish for some less confusion around this topic and having a little less to think about this myself. Maybe the "selection" of Zookeeper by openstack-ansible is just that? I would love to hear your thoughts on coordination and why and how you ended up with using what. And certainly what your opinion on the matter of a stronger communicated "default" is. Thanks for your time and thoughts! Christian From nurmatov.mamatisa at huawei.com Mon Oct 31 12:49:59 2022 From: nurmatov.mamatisa at huawei.com (Nurmatov Mamatisa) Date: Mon, 31 Oct 2022 12:49:59 +0000 Subject: [neutron] Bug Deputy Report October 24-30 Message-ID: <5dc97f85a5ce4052a8e8df0ecfc2c5d1@huawei.com> Hi, Below is the week summary of bug deputy report for last week. One RFE was proposed and already discussed on drivers meeting, more information should be provided. 
Details: Critical -------- - https://bugs.launchpad.net/neutron/+bug/1995091 - [CI] "neutron-functional-with-oslo-master" failing with timeout - Confirmed - Unassigned - https://bugs.launchpad.net/neutron/+bug/1994491 - Functional tests job is failing on Ubuntu 22.04 - Incomplete - Assigned to Rodolfo Alonso Medium ------ - https://bugs.launchpad.net/neutron/+bug/1995031 - [CI][periodic] neutron-functional-with-uwsgi-fips job failing - Confirmed - Unassigned - https://bugs.launchpad.net/neutron/+bug/1994635 - [CI][tempest] Error in "test_multiple_create_with_reservation_return" - Confirmed - Assigned to Rodolfo Alonso Undecided --------- - https://bugs.launchpad.net/neutron/+bug/1995078 - OVN: HA chassis group priority is different than gateway chassis priority - New - Unassigned RFEs ---- - https://bugs.launchpad.net/neutron/+bug/1994137 - [RFE] Specify the precedence of port routes if multiple ports attached to a VM - Incomplete - Unassigned Best regards, Mamatisa Nurmatov Advanced Software Technology Lab / Cloud Technologies Research -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at binero.com Mon Oct 31 14:07:43 2022 From: tobias.urdin at binero.com (Tobias Urdin) Date: Mon, 31 Oct 2022 14:07:43 +0000 Subject: [oslo][tooz][openstack-ansible] Discussion about coordination (tooz), too many backend options, their state and deployment implications In-Reply-To: <431ef18d-b52f-92d8-5543-dd63e10b012b@inovex.de> References: <431ef18d-b52f-92d8-5543-dd63e10b012b@inovex.de> Message-ID: Hello, Interesting topic, we use Redis because frankly we see that as the most logical choice due to the complexity of others. You might have seen my thread about investigating replacing RabbitMQ with NATS; our plan is to then also investigate getting Tooz and oslo.cache using the Jetstream Key-Value feature. Best regards Tobias > On 31 Oct 2022, at 12:29, Christian Rohmann wrote: > > Hallo openstack-discuss, > > > apologies for this being quite a long message - I tried my best to collect my thoughts on the matter. > > > 1) The role of deployment tooling in fulfilling the requirement for a coordination backend > > I honestly write this, triggered by openstack-ansible plans to add coordination via the Zookeeper backend (https://lists.openstack.org/pipermail/openstack-discuss/2022-October/031013.html). > > On 27/10/2022 13:10, Dmitriy Rabotyagov wrote: > >> * Add Zookepeer cluster deployment as coordination service. >> Coordination is required if you want to have active/active >> cinder-volume setup and also actively used by other projects, like >> Octavia or Designate. Zookeeper will be deployed in a separate set of >> containers for LXC path > > First of all I believe it's essential for any OpenStack deployment tooling to handle the deployment of a coordination backend as many OS projects just rely in their design and code to have it in place. > But I am convinced though there too many options, that some stronger guidance should be given to people designing and then deploying OS for their platform. > This guidance certainly can be in the form of a comparison table - but when it comes to using deployment tooling like openstack-ansible, > the provided "default" component or options for something might just be worth more than written text explaining all of the possible approaches. > > This hold especially true to me as you can get quite far with no coordination configured which then results in frustration and invalid bugs being raised. 
> And it's not just openstack-ansible thinking about coordination deployment / configurations. To just point to a few: > > * Kolla-Ansible: https://lists.openstack.org/pipermail/openstack-discuss/2020-November/018838.html > * Charms: https://bugs.launchpad.net/charm-designate/+bug/1759597 > * Puppet: https://review.opendev.org/c/openstack/puppet-oslo/+/791628/ > * ... > > > > 2) Choosing the "right" backend driver > > I've recently been looking into the question what would be the "best" tooz driver to cover all coordination use cases > the various OS projects require. Yes, the dependencies and use of coordination within the OS projects (cinder, designate, gnocchi, ...) are very different. > I don't want to sound unfair, but most don't communicate which of the Tooz services they actually require. In any case, I might just like something to cover all possible > requirements to "set it (up) and forget it, no matter what OS projects run on the platform. > > Apart from basic compatibility, there are qualities I would expect (in no particular order) from a coordination backend: > > * no "best-effort" coordination, but allowing for actual reliance on it (CP if talking CAP) > * HA - this needs to be working just as reliably as my database as otherwise the cloud cannot function > * efficient in getting the job done (e.g. support for events / watches to reduce latency) > * lightweight (deployment), no additional components, readily packaged > * very little maintenance operations > * easy monitoring > > I started by reading into the tooz drivers (https://docs.openstack.org/tooz/latest/user/drivers.html), > of which there are more than enough to require some research. Here are my rough thoughts: > > a) I ruled out the IPC, file or RDBMs (mysql, postgresql) backend options as they all have strong side-notes (issues when doing replication or no HA at all). > Additionally they usually are not partition tolerant or support watches. > > b) Redis seems quite capable, but there are many side notes about HA and this also requires setting up and maintaining sentinel. > > c) Memcached supports all three services (locks, groups, leader-election) tooz provides and is usually already part of an OpenStack infrastructure. So looked appealing. > But it's non-replicating architecture and lack of any strong consistency guarantees make it less of a good "standard". I was even wondering how tooz would try it's best to work with multiple memcached nodes (https://bugs.launchpad.net/python-tooz/+bug/1970659). > > d) Then there only is Zookeeper left, which also ticks all the (feature-)boxes (https://docs.openstack.org/tooz/latest/user/compatibility.html) and is quite a proven tool for coordination also outside of the OpenStack ecosystem. > On the downside it's not really that well known and common (anymore) outside the "data processing" context (see https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy). > Being a Java application it requires a JVM and its dependencies and is quite memory heavy to store just a few megabytes of config data. Looking at more and more people putting their OS control plane into something like Kubernetes it also seems even less suitable to be "moved around" a lot. Another issue might be the lack of a recent and non-EoL version packaged in Ubuntu - see https://bugs.launchpad.net/ubuntu/+source/zookeeper/+bug/1854331. Maybe (!) this could be an indication of how commonly it is used outside of e.g. 
Support from TLS was only added in 3.5.5 (https://zookeeper.apache.org/doc/r3.5.5/zookeeperAdmin.html#Quorum+TLS) > > e) Consul - While also well known and loved, it has, like Zookeeper, quite a big footprint and is way more than just a CP-focused database. It's more of an application with man use cases. > > f) A last "strong" candidate is etcd. It did not surprise me to see it on the list of possible drivers and certainly is a tool known to many from running e.g. Kubernetes. It's actually already part of openstack-ansible deployment code as a role (https://github.com/openstack/openstack-ansible/commit/2f240dd485b123763442aa94130c6ddd3109ce34) as it is required when using Calico as SDN. While etcd is also something one must know how to monitor and operate, I allow me to say it might just be more common to find this operational knowledge. Also etcd has a smaller footprint than Zookeeper and it beeing "just a Golang binary" comes with (no) less dependencies. But I noticed that it does not even support "grouping", according to the feature matrix. But apparently this is just a documentation delay, seehttps://bugs.launchpad.net/python-tooz/+bug/1995125. What's left to implement would be leader-election, but there seems to be no technical reason why this cannot be done. > > > this by no means is a comparison with a clear winner. I just want to stress how confusing having lots of options with no > real guidance are. The requirement to chose and deploy coordination might not be a focus when looking into designing an OS cloud. > > > > 3) Stronger guidance / "common default", setup via OS deployment tooling and also used for DevStack and tested via CI > > To summarize, there are just too many options and implications in the compatibility list to quickly chose the "right" one for one's own deployment. > > While large-scale deployments might likely not mind for coordination to have a bigger footprint and requiring more attention in general. > But for smaller and even mid-size deployments, it's just convenient to offload the configuration of coordination and the selection the backend driver to the deployment tooling. > Making it way too easy for such installations to not use coordination and running into issues or every other installation using a different backend creates a very fragmented landscape. > Add different operating system distributions and versions, different deployment tooling, different set and versions of OS projects used, there will be so many combinations. > This will likely just cause OS projects to receive more and non-reproducible bugs. Also not having (a somewhat common) coordination (backend) used within CI and DevStack does not expose > the relevant code paths to enough testing. > > I'd like to make the analogy to having "just" MySQL as the default database engine, while still allowing other engines to be used (https://governance.openstack.org/tc/resolutions/20170613-postgresql-status.html). > Or labeling certain options as "experimental" as Neutron just did with "linuxbridge" (https://docs.openstack.org/neutron/latest//admin/config-experimental-framework.html) or cinder with naming drivers unsupported > (https://docs.openstack.org/cinder/ussuri/drivers-all-about.html#unsupported-drivers). > > My point is that just having all those backends and no active guidance might make Tooz a very open and flexible component. > I myself would wish for some less confusion around this topic and having a little less to think about this myself. 
> > Maybe the "selection" of Zookeeper by openstack-ansible is just that? > > > > I would love to hear your thoughts on coordination and why and how you ended up with using what. > And certainly what your opinion on the matter of a stronger communicated "default" is. > > > Thanks for your time and thoughts! > > Christian > From christian.rohmann at inovex.de Mon Oct 31 15:59:29 2022 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Mon, 31 Oct 2022 16:59:29 +0100 Subject: [oslo][tooz][openstack-ansible] Discussion about coordination (tooz), too many backend options, their state and deployment implications In-Reply-To: References: <431ef18d-b52f-92d8-5543-dd63e10b012b@inovex.de> Message-ID: <8e9e8005-6a13-fdfc-c021-a72342de15d3@inovex.de> On 31/10/2022 15:07, Tobias Urdin wrote: > Interesting topic, we use Redis because frankly we see that as the most logical choice > due to the complexity of others. Interesting now one's mileage varies :-) > You might have seen my thread about investigating replacing RabbitMQ with NATS; our plan is to > then also investigate getting Tooz and oslo.cache using the Jetstream Key-Value feature. That sounds really interesting, I shall follow that discussion then. If one tool, e.g. NATS in your case, could cover more than one communication use case, e.g. (async) messaging and distributed locking, this would reduce the number of different components required to assemble a cloud, thus reducing the complexity. Even if there was more than once instance of that software required. As I was also arguing that adding more and more implementations and "ways" to do things, does neither help the operators nor the developers. To me, software developers benefit from clear abstractions for such cross-cutting concerns as messaging or coordination. While e.g. tooz already aims to be such an abstraction, when deploying OpenStack or operating a cloud things can look so vastly different. No coordination at all, different drivers with different features and inherently different guarantees and behavior in case of problems. Discussing broadly, and then agreeing not only on a common library and its interface, but also on an implementation to me is not inflexible, but makes sense to keep the complexity manageable. It happened with MySQL/MariaDB as db engine and actually also with AMQP as messaging protocol (including it's paradigms). It's progress to simply revisit such decisions and conventions over time. Regards Christian From cboylan at sapwetik.org Mon Oct 31 15:59:26 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Mon, 31 Oct 2022 08:59:26 -0700 Subject: [oslo][tooz][openstack-ansible] Discussion about coordination (tooz), too many backend options, their state and deployment implications In-Reply-To: <431ef18d-b52f-92d8-5543-dd63e10b012b@inovex.de> References: <431ef18d-b52f-92d8-5543-dd63e10b012b@inovex.de> Message-ID: <9661409e-be25-4fcf-8f98-860079de1015@app.fastmail.com> On Mon, Oct 31, 2022, at 4:29 AM, Christian Rohmann wrote: > snip > ? d) Then there only is Zookeeper left, which also ticks all the > (feature-)boxes > (https://docs.openstack.org/tooz/latest/user/compatibility.html) and is > quite a proven tool for coordination also outside of the OpenStack > ecosystem. > On the downside it's not really that well known and common (anymore) > outside the "data processing" context (see > https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy). 
> Being a Java application it requires a JVM and its dependencies and is > quite memory heavy to store just a few megabytes of config data. Looking > at more and more people putting their OS control plane into something > like Kubernetes it also seems even less suitable to be "moved around" a > lot. Another issue might be the lack of a recent and non-EoL version > packaged in Ubuntu - see > https://bugs.launchpad.net/ubuntu/+source/zookeeper/+bug/1854331. Maybe > (!) this could be an indication of how commonly it is used outside of > e.g. Support from TLS was only added in 3.5.5 > (https://zookeeper.apache.org/doc/r3.5.5/zookeeperAdmin.html#Quorum+TLS) > Zuul relies on Zookeeper for its coordination and shared state (without tooz). This is nice because it means we can look at the OpenDev Zuul ZK cluster stats for more info. We currently run a three node cluster. Each node is a 4vcpu 4GB memory VM. The JVM itself seems to consume just under a gig of memory per node. Total system memory stats can be seen here [0]. According to `docker image list` the zookeeper container images we are running are 265MB large. If you scroll to the bottom of this grafana dashboard [1] you'll see operating stats for the cluster. All that to show that zookeeper isn't free, but it also isn't terribly expensive to run either. Particularly when it tends to fill an important role of preventing software from trampling over itself. As far as installing it goes, we've been happily using the official docker images [2]. They have worked well for us and have been kept up to date (including TLS support). If you don't want to use those images the tarballs upstream publishes [3] include init scripts that can be used to manage zookeeper as a proper service. You just download, verify, extract, and execute the script (assuming you have java installed) and the service runs. I'm not going to try and convince anyone that they should use Zookeeper or not. I just want to put concrete details on some of these concerns. [0] http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=70034&rra_id=all [1] https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&from=now-7d&to=now [2] https://hub.docker.com/_/zookeeper [3] https://zookeeper.apache.org/releases.html From adivya1.singh at gmail.com Mon Oct 31 18:26:01 2022 From: adivya1.singh at gmail.com (Adivya Singh) Date: Mon, 31 Oct 2022 23:56:01 +0530 Subject: (openstack-ansible) Container installation in openstack Message-ID: Hi Team, Any input on this, to install container service in openstack using ansible. standard global parametre Regards Adivya Singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Mon Oct 31 18:36:40 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Mon, 31 Oct 2022 19:36:40 +0100 Subject: (openstack-ansible) Container installation in openstack In-Reply-To: References: Message-ID: Hi Adivya, Can you please elaborate more about what container service you are thinking about? Is it Magnum or Zun or your question is more about how to install all openstack services in containers? ??, 31 ???. 2022 ?. ? 19:34, Adivya Singh : > > Hi Team, > > Any input on this, to install container service in openstack using ansible. 
> > standard global parametre > > Regards > Adivya Singh From jay at gr-oss.io Mon Oct 31 21:04:34 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Mon, 31 Oct 2022 14:04:34 -0700 Subject: [OSSN-0091] BMC emulators developed in OpenStack community do not preserve passwords on VMs Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 ## Summary ## When deploying VirtualBMC or Sushy-Tools in an unsupported, production-like configuration, it can remove secret data, including VNC passwords, from a libvirt domain permanently. Operators impacted by this vulnerability must reconfigure any secret data, including VNC passwords, for the libvirt domain. These virtual machine emulators are tools to help emulate a physical machine's Baseboard Management Controller (BMC) to aid in development and testing of software that would otherwise require physical machines to perform integration testing activities. They are not intended or supported for production or long-term use of any kind. ## Affected Services / Software ## * Sushy-Tools, <=0.21.0 * VirtualBMC, <=2.2.2 There is no impact to any OpenStack software or services intended for production use. ## Patches ## * VirtualBMC: https://review.opendev.org/c/openstack/virtualbmc/+/862620 * Sushy-Tools: https://review.opendev.org/c/openstack/sushy-tools/+/862625 ## Discussion ## To perform some advanced operations on Libvirt virtual machines, the underlying XML document describing the virtual machine's domain must be extracted, modified, and then updated. These specific actions are for aspects such as "setting a boot device" (VirtualBMC, Sushy-Tools), Setting a boot mode (Sushy-Tools), and setting a virtual media device (Sushy-Tools). This issue is triggered when a VM has any kind of "secure" information defined in the XML domain definition. If an operator deploys VirtualBMC or Sushy-Tools to manage one of these libvirt VMs, the first time any action is performed that requires rewriting of the XML domain definition, all secure information -- including a VNC console password, if set -- is lost and removed from the domain definition, leaving the libvirt VM's exposed to a malicious console user. ## Recommended Actions ## Operators who may have been impacted by this vulnerability should immediately remove use of VirtualBMC and/or Sushy-Tools from their production environment. Then, validate and if necessary, reconfigure passwords for VNC access or any other impacted secrets. ## Notes ## The OpenStack team will ensure documentation is updated to clearly state these software packages are intended for development/CI use only, and are not safe to run in production. 
## Credits ## Julia Kreger from Red Hat ## References ## Author: Jay Faulkner, G-Research Open Source Software This OSSN: https://wiki.openstack.org/wiki/OSSN/OSSN-0090 Original Storyboard bug: https://storyboard.openstack.org/#!/story/2010382 Mailing List : [Security] tag on openstack-discuss at lists.openstack.org OpenStack Security Project : https://launchpad.net/~openstack-ossg CVE: CVE-2022-44020 -----BEGIN PGP SIGNATURE----- Version: FlowCrypt Email Encryption 8.3.8 Comment: Seamlessly send and receive encrypted email wsFzBAEBCgAGBQJjYDhhACEJEGt12Tm0JMbUFiEEvF1YmsGLSYuWqE+ta3XZ ObQkxtSCUw/9FEeakvlf06BWrk5Lc3TGwUKV0WiLaE4M0xjljkzg/3/580/E nhOTl/raPszlzgkGdrQTaH3Sj4AwUTdPHqqxjyK/Xb1DIm+CfS3bdbP0aLHG Y3Su4Z74unMaKbnbyDYhM1qMIzPyBruLpqiyYJGhSuzU/fu1O/LCWfSicvKK YDmAHJ9TjXuTMdWrLrkMknvJaLe0aJrNW5iqDnIh6YrUC2Pioi5h+OFKwDpn Ea+YnlAxKR7OQGRGcY3AwP1Jq87pdHZagcVThc/wnCATKT/FtaIogDkUnoMn qI+6MNjV3R4kyQCbyo35KeIDWm+541XsK0GoR5hcvR1AkwciSPBAkt3VHxpa p0g9hVcNTv+tWwN8LrdLRPMDuqKA51eNUvQCV8W+H42wS0uoaMPXglbZIuwv AmEoK8UC8Gii8cPoIkiZGSSOo4i+tlE/q+L/Mgs1opyt1Klcxs/Lm1PNylET XqLw70qKrfqWabZpKUxMS3F9JwyCkgnD5+t2x/qsqg5Hq+kUZqP8be3Oc7K7 He/gIneWDMpH1+J9Tm5ofyxtJCA+V96+cXoXYk8SncVf/O5djgd48UkQo1iJ NZlKxJsaKH5+JyPuXkR6hyqDrIkmbJRh4aU9nJBQyFho0fXuQVlC2iUOGa0F BgUFs5J6oQtglAAuyUoNuhuBJBwdW09NxQ4= =6yXE -----END PGP SIGNATURE----- -------------- next part -------------- A non-text attachment was scrubbed... Name: 0x6B75D939B424C6D4.asc Type: application/pgp-keys Size: 3356 bytes Desc: not available URL: From jay at gr-oss.io Mon Oct 31 21:07:20 2022 From: jay at gr-oss.io (Jay Faulkner) Date: Mon, 31 Oct 2022 14:07:20 -0700 Subject: [OSSN-0091] BMC emulators developed in OpenStack community do not preserve passwords on VMs In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 > This OSSN: https://wiki.openstack.org/wiki/OSSN/OSSN-0090 The correct link to the OSSN is https://wiki.openstack.org/wiki/OSSN/OSSN-0091. My apologies for the error. - -- Jay Faulkner On 2022-10-31 at 21:04, jay at gr-oss.io wrote: > ## Summary ## > When deploying VirtualBMC or Sushy-Tools in an unsupported, production-like > configuration, it can remove secret data, including VNC passwords, from a > libvirt domain permanently. Operators impacted by this vulnerability must > reconfigure any secret data, including VNC passwords, for the libvirt > domain. > > These virtual machine emulators are tools to help emulate a physical > machine's Baseboard Management Controller (BMC) to aid in development and > testing of software that would otherwise require physical machines to > perform integration testing activities. They are not intended or supported > for production or long-term use of any kind. > > ## Affected Services / Software ## > * Sushy-Tools, <=0.21.0 > * VirtualBMC, <=2.2.2 > > There is no impact to any OpenStack software or services intended for > production use. > > ## Patches ## > * VirtualBMC: https://review.opendev.org/c/openstack/virtualbmc/+/862620 > * Sushy-Tools: https://review.opendev.org/c/openstack/sushy-tools/+/862625 > > ## Discussion ## > To perform some advanced operations on Libvirt virtual machines, the > underlying XML document describing the virtual machine's domain must be > extracted, modified, and then updated. These specific actions are for > aspects such as "setting a boot device" (VirtualBMC, Sushy-Tools), Setting > a boot mode (Sushy-Tools), and setting a virtual media device > (Sushy-Tools). 
> > This issue is triggered when a VM has any kind of "secure" information > defined in the XML domain definition. If an operator deploys VirtualBMC or > Sushy-Tools to manage one of these libvirt VMs, the first time any action > is performed that requires rewriting of the XML domain definition, all > secure information -- including a VNC console password, if set -- is lost > and removed from the domain definition, leaving the libvirt VM's exposed to > a malicious console user. > > ## Recommended Actions ## > Operators who may have been impacted by this vulnerability should > immediately remove use of VirtualBMC and/or Sushy-Tools from their > production environment. Then, validate and if necessary, reconfigure > passwords for VNC access or any other impacted secrets. > > ## Notes ## > The OpenStack team will ensure documentation is updated to clearly state > these software packages are intended for development/CI use only, and are > not safe to run in production. > > ## Credits ## > Julia Kreger from Red Hat > > ## References ## > Author: Jay Faulkner, G-Research Open Source Software > This OSSN: https://wiki.openstack.org/wiki/OSSN/OSSN-0090 > Original Storyboard bug: https://storyboard.openstack.org/#!/story/2010382 > Mailing List : [Security] tag on openstack-discuss at lists.openstack.org > OpenStack Security Project : https://launchpad.net/~openstack-ossg > CVE: CVE-2022-44020 -----BEGIN PGP SIGNATURE----- Version: FlowCrypt Email Encryption 8.3.8 Comment: Seamlessly send and receive encrypted email wsFzBAEBCgAGBQJjYDkHACEJEGt12Tm0JMbUFiEEvF1YmsGLSYuWqE+ta3XZ ObQkxtQIlQ/+OYBQY7DkwJkdZWKSXoaEAe2wyNwnnU9vbbJm/t13gg0h68/c 1zo7M9ZlvAO/lKPWB7GoWmV0wIFB+f70s8uZB4thDwheKV+99Sg7HHS6JzgU xU5+1/cq4F/6Ht8bmh1FV2/6TLLTQfC36YzkG3eS/q8Dehxmji5zjZdlVAnb ErLOS9/w8uWsXqHuY+jxM2evBt4wo8qmXgSzPpBRoYOC4Nx/jZQtN2sZmlfZ b6nE+40LIvjKbrmT5lpGfytVuboqi9gHuAF/CWckJUNNd2GbEKcguAH5aRL2 3TO5X1myX+N8RrOoo5wxEjosH36Th4TrKNRDWTQqe3zSGS8s30H5Ryu82XkH vZRncsg5p27VvL06Yrl2/uLUHzbLBJ7pJ07dhA2sjjTY46poix74xhwbde2I DVP8OaHhumHWlU8yBEqapuNMhhU20BiwpFLijUhQhKnGfb9hw/ZNlYgT5Jh9 vEubBfKcw4FmZIwvXFVJGs0GwQxoVYravUx8bgQbK5tb/e3omlDj+VOKrVeV uAp82/OLrgOvr6L0wCvFyJu+9uEMiPRuvJQJNKBNIv4ec4r9fpAEgcMlnFqo YAIzpg1jfPWbCn154dvhOxguqNIPtu2SiLTmD2Vvg8mwJu7gEkRkTsiz6KXv GdhY0ogG20TaqyfrKTDmddUgaleq+pD0VAk= =2622 -----END PGP SIGNATURE----- From mark at stackhpc.com Mon Oct 31 22:26:06 2022 From: mark at stackhpc.com (mark at stackhpc.com) Date: Tue, 1 Nov 2022 07:26:06 +0900 Subject: Delivery reports about your e-mail Message-ID: The original message was received at Tue, 1 Nov 2022 07:26:06 +0900 from stackhpc.com [42.37.153.10] ----- The following addresses had permanent fatal errors -----