From rdhasman at redhat.com Mon Aug 1 07:16:35 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 1 Aug 2022 12:46:35 +0530 Subject: [cinder] midcycle - 2 Planning Message-ID: Hello Argonauts, As discussed in our last cinder meeting[1], most of the cinder team (present during the meeting) don't have conflicts on 10th August, 1400-1600 UTC, so we are planning to conduct our Cinder midcycle-2 on the proposed date (R-8 week). This will be 3 weeks prior to Milestone-3/Feature Freeze and 2 weeks before the non-client library release (os-brick), so it will give us plenty of time to know we're on track and to work on it if we're not. Please add your topics in the etherpad along with your IRC nick (from L#33). Date: 10th August, 2022 Time: 1400-1600 UTC Etherpad: https://etherpad.opendev.org/p/cinder-zed-midcycles [1] https://etherpad.opendev.org/p/cinder-zed-meetings#L113 -------------- next part -------------- An HTML attachment was scrubbed... URL: From nurmatov.mamatisa at huawei.com Mon Aug 1 08:19:54 2022 From: nurmatov.mamatisa at huawei.com (Nurmatov Mamatisa) Date: Mon, 1 Aug 2022 08:19:54 +0000 Subject: [neutron] Bug deputy July 25 to August 1 Message-ID: <900b0cea97604f18bb296f509e7cd858@huawei.com> Hi, I was bug deputy last week. Below is the week's summary. Bug #1940425 occurred last week due to the os-vif 3.0.0 release. One RFE was proposed. The undecided bug needs further triage. Details: Critical -------- - https://bugs.launchpad.net/neutron/+bug/1940425 - test_live_migration_with_trunk tempest test fails due to port remains in down state - Confirmed - Assigned to Slawek Kaplonski - https://bugs.launchpad.net/neutron/+bug/1982818 - Periodic job openstack-tox-py39-with-oslo-master is broken - In progress: https://review.opendev.org/c/openstack/neutron/+/851433 - Assigned to Rodolfo Alonso - https://bugs.launchpad.net/neutron/+bug/1982720 - stable/train: neutron-grenade job consistently fails in requirements repo - In progress: https://review.opendev.org/c/openstack/neutron/+/851506 - Assigned to Slawek Kaplonski Medium ------ - https://bugs.launchpad.net/neutron/+bug/1982962 - Quota driver "DbQuotaNoLockDriver" should implement "get_detailed_project_quotas" - In progress: https://review.opendev.org/c/openstack/neutron/+/851357 - Assigned to Rodolfo Alonso Low --- - https://bugs.launchpad.net/neutron/+bug/1982951 - [OVN][QoS] Add minimum bandwidth rule support to ML2/OVN - In progress: https://review.opendev.org/c/openstack/neutron/+/842292 - Assigned to Rodolfo Alonso Undecided --------- - https://bugs.launchpad.net/neutron/+bug/1982882 - Failed to process compatible router - network is unreachable - New RFEs ---- - https://bugs.launchpad.net/neutron/+bug/1983053 - [RFE] Add possibility to define default security group rules - New - Assigned to Slawek Kaplonski Best regards, Isa Advanced Software Technology Lab / Cloud Technologies Research -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Mon Aug 1 08:21:18 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 1 Aug 2022 10:21:18 +0200 Subject: [DIB][diskimage-builder] Rocky Linux image build method Message-ID: Hello, I am curious about the choice of providing only a rocky-container element in DIB, which works differently to the centos element, which uses cloud images. It makes it hard to produce working images for VMs or bare metal, as various packages that would normally be installed are missing, such as cloud-utils-growpart or openssh-server.
See the kickstarts for reference [1] [2]. It seems to also occasionally cause complex failures such as the one that rendered Rocky Linux 8 images unbootable last week [3]. I am guessing this wouldn't have happened had the build been from a cloud image. Would the DIB community be open to also have a rocky element using GenericCloud images, like centos? Thanks, Pierre Riteau (priteau) [1] https://git.rockylinux.org/rocky/kickstarts/-/blob/r8/Rocky-8-Container-Base.ks [2] https://git.rockylinux.org/rocky/kickstarts/-/blob/r8/Rocky-8-GenericCloud.ks [3] https://review.opendev.org/c/openstack/diskimage-builder/+/851687 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Mon Aug 1 09:40:35 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 1 Aug 2022 11:40:35 +0200 Subject: [nova] Zed API microversion planning Message-ID: Hey, tl;dr: if you have an open API change asking for a microversion, add it into the proposal list BEFORE TOMORROW AUG 2ND https://etherpad.opendev.org/p/nova-zed-microversions-plan Those who write API changes probably hit merge conflicts every time we merge a new API microversion. FWIW, we recently merged 2.91 and 2.92, which means you now have to rebase to use at least a 2.93 version. That's why I'm proposing something new this cycle that may help you: IF YOU HAVE AN OPEN API CHANGE THAT JUST NEEDS TO BE REVIEWED, PLEASE ADD IT TO https://etherpad.opendev.org/p/nova-zed-microversions-plan BEFORE AUGUST 2ND 1600 UTC. During the next Nova meeting, we'll look at all of the provided API changes in the etherpad, and we'll try to organize them by saying "ok, this one will have this API microversion, this other one that other API microversion". After this meeting, you should know which API microversion you will have for Zed, so you can then rebase your changes to use it. That being said, please make sure you are able to rebase your change very quickly, because we don't want to wait for some API change before accepting another one that's important and which is ready to be merged. Thanks, -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhall at binghamton.edu Mon Aug 1 10:12:29 2022 From: kdhall at binghamton.edu (Dave Hall) Date: Mon, 1 Aug 2022 06:12:29 -0400 Subject: [openstack-ansible][glance][nfs] Requesting Example of Working Config Message-ID: Hello, Release - yoga. Host OS - Debian 11 I'm looking for an example of how to get NFS working as storage for glance and cinder - both the openstack_user_config.yml stanzas and the NFS server setup (/etc/exports, etc.). So far I've adapted the stanzas from the openstack_user_config.yml examples. The shares are mounted and writable (by root) from the glance/cinder containers, but glance keeps on throwing 410 errors. Thanks. -Dave -- Dave Hall Binghamton University kdhall at binghamton.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Mon Aug 1 11:39:35 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 1 Aug 2022 11:39:35 +0000 Subject: [DIB][diskimage-builder] Rocky Linux image build method In-Reply-To: References: Message-ID: <20220801113934.kctrhrf5prx6nvxf@yuggoth.org> On 2022-08-01 10:21:18 +0200 (+0200), Pierre Riteau wrote: > I am curious about the choice of providing only a rocky-container element > in DIB, which works differently to the centos element, which uses cloud > images.
> > It makes it hard to produce working images for VMs or bare metal, as > various packages that would normally be installed are missing, such as > cloud-utils-growpart or openssh-server. See the kickstarts for reference > [1] [2]. > > It seems to also occasionally cause complex failures such as the one that > rendered Rocky Linux 8 images unbootable last week [3]. I am guessing this > wouldn't have happened had the build been from a cloud image. > > Would the DIB community be open to also have a rocky element using > GenericCloud images, like centos? [...] At least for images we're booting in OpenDev, we've been gradually switching them over to the containerfile mechanism for the specific reason that it's increasingly hard to guarantee random distros' package bootstrapping tools can all be made to work outside a chroot on a single common/foreign distro, and we don't want "fat" images preinstalled with a lot of unnecessary packages (and certainly not with things like cloud-init). For CentOS image building, we're still relying on the centos-minimal element, because we're able to install a working release of yum outside the image on the builder, but there have been times where some distros' tools were simply not available for installation or required too new (or too old) system libraries to be able to run. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From swogatpradhan22 at gmail.com Mon Aug 1 06:59:30 2022 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Mon, 1 Aug 2022 12:29:30 +0530 Subject: Security vulnerabilities in Horizon dashboard | Openstack Wallaby | Tripleo | Openstack Horizon Message-ID: Hi, I am setting up an OpenStack Wallaby cloud for a client using TripleO. After setting everything up, the client ran a web scan and found some vulnerabilities (attached snapshot for reference). Can you please guide me on how to fix these vulnerabilities in the dashboard service? With regards, Swogat Pradhan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: web-vapt.PNG Type: image/png Size: 150618 bytes Desc: not available URL: From wghaojue at cn.ibm.com Mon Aug 1 09:41:10 2022 From: wghaojue at cn.ibm.com (Hao Jue PX Wang) Date: Mon, 1 Aug 2022 09:41:10 +0000 Subject: Seek help for glance error "Configuration for store failed. Adding images to this store is disabled" Message-ID: Hi folks, I am using a glance config like this to leverage an IBM GPFS backend to store images; however, when uploading an image with openstack image create --file rhel-8.5-official.qcow2 test1, I ran into the following error. Any idea or suggestion for fixing the error?
Thanks ============================== [glance_store] default_backend = filesystem enabled_backends = filesystem filesystem_store_datadir = /foundation_gpfs2/icic/images/ filesystem_store_file_perm = 0644 ============================== ========================================================================================== 2022-08-01 10:18:43.590 3663156 INFO glance_store.capabilities [req-2c007ca2-a0c4-4a31-a94d-43f0f25cb2cc 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9 b334d58126ee4193bc439ea0cb806aaa - default default] haojuedebug req_cap: [] 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data [req-2c007ca2-a0c4-4a31-a94d-43f0f25cb2cc 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9 b334d58126ee4193bc439ea0cb806aaa - default default] Error in store configuration. Adding images to store is disabled.: glance_store.exceptions.StoreAddDisabled: Configuration for store failed. Adding images to this store is disabled. 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data Traceback (most recent call last): 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/api/v2/image_data.py", line 182, in upload 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data image.set_data(data, size, backend=backend) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/domain/proxy.py", line 198, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data self.base.set_data(data, size, backend=backend, set_active=set_active) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/notifier.py", line 501, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data _send_notification(notify_error, 'image.upload', msg) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data self.force_reraise() 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data six.reraise(self.type_, self.value, self.tb) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data raise value 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/notifier.py", line 448, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data set_active=set_active) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/api/policy.py", line 204, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data return self.image.set_data(*args, **kwargs) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/quota/__init__.py", line 319, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data set_active=set_active) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/location.py", line 559, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data self._upload_to_store(data, verifier, 
backend, size) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/location.py", line 486, in _upload_to_store 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data verifier=verifier) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance_store/backend.py", line 491, in add_to_backend_with_multihash 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data image_id, data, size, hashing_algo, store, context, verifier) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance_store/backend.py", line 468, in store_add_to_backend_with_multihash 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data image_id, data, size, hashing_algo, context=context, verifier=verifier) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance_store/driver.py", line 279, in add_adapter 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data metadata_dict) = store_add_fun(*args, **kwargs) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance_store/capabilities.py", line 175, in op_checker 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data raise op_exec_map[op](**kwargs) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data glance_store.exceptions.StoreAddDisabled: Configuration for store failed. Adding images to this store is disabled. 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data ========================================================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil at shrug.pw Mon Aug 1 13:08:44 2022 From: neil at shrug.pw (Neil Hanlon) Date: Mon, 01 Aug 2022 09:08:44 -0400 Subject: [DIB][diskimage-builder] Rocky Linux image build method In-Reply-To: References: Message-ID: <1a38c8b62d54134454fe72eb7790c250b6aa0aca.camel@shrug.pw> On Mon, 2022-08-01 at 10:21 +0200, Pierre Riteau wrote: > Hello, > > I am curious about the choice of providing only a rocky-container element > in DIB, which works differently to the centos element, which uses cloud > images. As Jeremy mentions, it was a conscious choice to use the containerfile method to build Rocky images for the reasons he discusses. It is more in line with the Fedora images, and from a building perspective, it's often better to layer things on top, rather than try and remove them afterwards. > > It makes it hard to produce working images for VMs or bare metal, as > various packages that would normally be installed are missing, such as > cloud-utils-growpart or openssh-server. See the kickstarts for reference > [1] [2]. > > It seems to also occasionally cause complex failures such as the one that > rendered Rocky Linux 8 images unbootable last week [3]. I am guessing this > wouldn't have happened had the build been from a cloud image. > > Would the DIB community be open to also have a rocky element using > GenericCloud images, like centos? It's possible it may not have happened when building from a cloud image, but you similarly begin to rely on the upstream to produce images the same way forever. Building from a known minimal source and layering on the elements required, including writing any boot/kernel files needed, will be a more manageable process for the DIB community, in my opinion (as the one who makes the Upstream images for Rocky).
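As a rough illustration (an untested sketch, where the element name, release number and package list are only examples), the packages Pierre listed could be layered on explicitly at build time with a small local element instead of being inherited from a cloud image:

# declare the extra packages in a tiny local element
mkdir -p my-elements/extra-cloud-packages
# the package-installs element processes package-installs.yaml at build time
printf 'package-installs\n' > my-elements/extra-cloud-packages/element-deps
cat > my-elements/extra-cloud-packages/package-installs.yaml <<'EOF'
cloud-utils-growpart:
openssh-server:
EOF
# make the local element visible to diskimage-builder and build the image
export ELEMENTS_PATH=$PWD/my-elements
DIB_RELEASE=8 disk-image-create -o rocky-8-cloud rocky-container vm growroot extra-cloud-packages

With something like that, the resulting image contains exactly what the chosen elements declare, rather than whatever the upstream kickstart happens to install.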
Building from the ground up guarantees a much more repeatable process for DIB image building. In the short term, there's some uplift to make sure the images build correctly, but once they're working, working from Containerfile should lead to faster builds of more lean images with only what's needed on them-- and that's good for Security, too. Please do reach out via email or IRC/Chat (either of OFTC or Libera in the #rockylinux- channels) any time, by the way :) I'm one of the Infrastructure leads for Rocky Linux and spend time over here with the OSA folks! Nice to meet you, and thank you for using Rocky! > > Thanks, > Pierre Riteau (priteau) > > [1] > https://git.rockylinux.org/rocky/kickstarts/-/blob/r8/Rocky-8-Container-Base.ks > [2] > https://git.rockylinux.org/rocky/kickstarts/-/blob/r8/Rocky-8-GenericCloud.ks > [3] https://review.opendev.org/c/openstack/diskimage-builder/+/851687 From kdhall at binghamton.edu Mon Aug 1 13:57:22 2022 From: kdhall at binghamton.edu (Dave Hall) Date: Mon, 1 Aug 2022 09:57:22 -0400 Subject: Seek help for glance error "Configuration for store failed. Adding images to this store is disabled" In-Reply-To: References: Message-ID: Hao, I am seeing the same error right now, but using NFS for my Image store. Perhaps the issue is not with the storage backend, but with some other part of the setup? I have not seen any instructions indicating that it might be necessary to enable the store once it is created. -Dave -- Dave Hall Binghamton University kdhall at binghamton.edu On Mon, Aug 1, 2022 at 9:04 AM Hao Jue PX Wang wrote: > Hi folks, > > I am using a glance config like this to leverage IBM gpfs backend to store > image, however, when upload images > openstack image create --file rhel-8.5-official.qcow2 test1, I ran into > the following error. Any idea or suggestion for fixing the error? Thanks > > > ============================== > [glance_store] > default_backend = filesystem > enabled_backends = filesystem > filesystem_store_datadir = /foundation_gpfs2/icic/images/ > filesystem_store_file_perm = 0644 > ============================== > > > > ========================================================================================== > 2022-08-01 10:18:43.590 3663156 INFO glance_store.capabilities > [req-2c007ca2-a0c4-4a31-a94d-43f0f25cb2cc > 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9 > b334d58126ee4193bc439ea0cb806aaa - default default] haojuedebug req_cap: > [] > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > [req-2c007ca2-a0c4-4a31-a94d-43f0f25cb2cc > 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9 > b334d58126ee4193bc439ea0cb806aaa - default default] Error in store > configuration. Adding images to store is disabled.: > glance_store.exceptions.StoreAddDisabled: Configuration for store failed. > Adding images to this store is disabled. 
> 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data Traceback > (most recent call last): > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance/api/v2/image_data.py", line 182, > in upload > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > image.set_data(data, size, backend=backend) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance/domain/proxy.py", line 198, in > set_data > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > self.base.set_data(data, size, backend=backend, set_active=set_active) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance/notifier.py", line 501, in set_data > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > _send_notification(notify_error, 'image.upload', msg) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in > __exit__ > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > self.force_reraise() > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in > force_reraise > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > six.reraise(self.type_, self.value, self.tb) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data raise > value > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance/notifier.py", line 448, in set_data > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > set_active=set_active) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance/api/policy.py", line 204, in > set_data > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data return > self.image.set_data(*args, **kwargs) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance/quota/__init__.py", line 319, in > set_data > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > set_active=set_active) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance/location.py", line 559, in set_data > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > self._upload_to_store(data, verifier, backend, size) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance/location.py", line 486, in > _upload_to_store > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > verifier=verifier) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance_store/backend.py", line 491, in > add_to_backend_with_multihash > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > image_id, data, size, hashing_algo, store, context, verifier) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance_store/backend.py", line 468, in > store_add_to_backend_with_multihash > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > image_id, data, size, hashing_algo, context=context, 
verifier=verifier) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance_store/driver.py", line 279, in > add_adapter > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > metadata_dict) = store_add_fun(*args, **kwargs) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File > "/usr/lib/python3.6/site-packages/glance_store/capabilities.py", line 175, > in op_checker > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data raise > op_exec_map[op](**kwargs) > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > glance_store.exceptions.StoreAddDisabled: Configuration for store failed. > Adding images to this store is disabled. > 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data > > ========================================================================================== > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Mon Aug 1 14:17:31 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 1 Aug 2022 14:17:31 +0000 Subject: [horizon][security-sig][tripleo] Security vulnerabilities in Horizon dashboard | Openstack Wallaby In-Reply-To: References: Message-ID: <20220801141730.u3pvancnurroomrl@yuggoth.org> [I've moved some of the subject keywords to topic tags in hopes they'll match more people's mail filters.] On 2022-08-01 12:29:30 +0530 (+0530), Swogat Pradhan wrote: > I am setting up an openstack wallaby cloud for a client using tripleo. > After setting everything up the client ran a WEB scan and found some > vulnerabilities (attached snapshot for reference). > > Can you please guide me on how to fix these vulnerabilities in the > dashboard service? I'm one of the vulnerability coordinators for OpenStack, and while I don't have deep knowledge of Horizon or TripleO, I'll do my best to address some of these points until others are able to jump in with more specifics. No WAF Detected: This looks like your scanner wants you to put a "web application firewall" in front of Horizon. I'm going to guess TripleO doesn't incorporate one in its deployments, but you should theoretically be able to use whatever WAF you're using for other web-based services you're operating, or install one of your choice in your network. jQuery is Vulnerable: This is https://launchpad.net/bugs/1955556 and seems currently blocked by incompatibilities in jQuery-Migrate per https://launchpad.net/bugs/1914782 (as best I can tell). No Anti-CSRF tokens were found in a HTML submission form: It's hard to know whether this is a missed implementation for some interface or a misconfiguration. Is CSRF_COOKIE_SECURE turned on in your Horizon config? I see what looks like a HorizonSecureCookies option in tripleo-heat-templates and tripleo-ansible, which appears to default to false, so you might have to toggle that to true, though as I said I'm not all that familiar with TripleO's implementation, and it looks like it might normally get switched on if SSL/TLS is enabled, so maybe there's something else going on in your case. Brute force attack: The description there is vague. Is it talking about Keystone credential brute-forcing? If so, there are options you can turn on, for example PCI-DSS compliance related ones, to automatically lock out accounts after too many login failures. See https://docs.openstack.org/keystone/latest/admin/configuration.html#security-compliance-and-pci-dss for details on these features. 
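For illustration only (the values here are placeholders, and the full option list is in the guide linked above), the lockout behaviour is driven by keystone.conf settings along these lines:

[security_compliance]
# disable an account after this many consecutive failed logins
lockout_failure_attempts = 5
# keep the account locked for this many seconds before allowing retries
lockout_duration = 1800

With something like that in place, Keystone temporarily locks the account instead of letting password guessing continue indefinitely.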
Hopefully that helps for a start, but others should be able to provide more in-depth answers. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From wghaojue at cn.ibm.com Mon Aug 1 14:09:42 2022 From: wghaojue at cn.ibm.com (Hao Jue PX Wang) Date: Mon, 1 Aug 2022 14:09:42 +0000 Subject: Re: Seek help for glance error "Configuration for store failed. Adding images to this store is disabled" In-Reply-To: References: Message-ID: Hi Dave, Thanks for the reply! I think the path setup is fine; I was wondering how glance determines whether the store is disabled or not. I found that req_cap's value is [] in `if not store.is_capable(*req_cap):` of glance_store/capabilities.py. Here is the GPFS filesystem in use: foundation_gpfs2 100G 67G 34G 67% /foundation_gpfs2 and here is the path used to store images: drwxrwxrwx. 2 glance glance system_u:object_r:glance_var_lib_t:s0 4096 Jul 29 10:36 images ________________________________ From: Dave Hall Sent: 1 August 2022 21:57 To: Hao Jue PX Wang Cc: openstack-discuss at lists.openstack.org Subject: [EXTERNAL] Re: Seek help for glance error "Configuration for store failed. Adding images to this store is disabled" Hao, I am seeing the same error right now, but using NFS for my Image store. Perhaps the issue is not with the storage backend, but with some other part of the setup? I have not seen any instructions indicating that it might be necessary to enable the store once it is created. -Dave -- Dave Hall Binghamton University kdhall at binghamton.edu On Mon, Aug 1, 2022 at 9:04 AM Hao Jue PX Wang wrote: Hi folks, I am using a glance config like this to leverage IBM gpfs backend to store image, however, when upload images openstack image create --file rhel-8.5-official.qcow2 test1, I ran into the following error. Any idea or suggestion for fixing the error? Thanks ============================== [glance_store] default_backend = filesystem enabled_backends = filesystem filesystem_store_datadir = /foundation_gpfs2/icic/images/ filesystem_store_file_perm = 0644 ============================== ========================================================================================== 2022-08-01 10:18:43.590 3663156 INFO glance_store.capabilities [req-2c007ca2-a0c4-4a31-a94d-43f0f25cb2cc 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9 b334d58126ee4193bc439ea0cb806aaa - default default] haojuedebug req_cap: [] 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data [req-2c007ca2-a0c4-4a31-a94d-43f0f25cb2cc 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9 b334d58126ee4193bc439ea0cb806aaa - default default] Error in store configuration. Adding images to store is disabled.: glance_store.exceptions.StoreAddDisabled: Configuration for store failed. Adding images to this store is disabled.
2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data Traceback (most recent call last): 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/api/v2/image_data.py", line 182, in upload 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data image.set_data(data, size, backend=backend) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/domain/proxy.py", line 198, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data self.base.set_data(data, size, backend=backend, set_active=set_active) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/notifier.py", line 501, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data _send_notification(notify_error, 'image.upload', msg) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data self.force_reraise() 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data six.reraise(self.type_, self.value, self.tb) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data raise value 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/notifier.py", line 448, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data set_active=set_active) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/api/policy.py", line 204, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data return self.image.set_data(*args, **kwargs) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/quota/__init__.py", line 319, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data set_active=set_active) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/location.py", line 559, in set_data 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data self._upload_to_store(data, verifier, backend, size) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance/location.py", line 486, in _upload_to_store 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data verifier=verifier) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance_store/backend.py", line 491, in add_to_backend_with_multihash 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data image_id, data, size, hashing_algo, store, context, verifier) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance_store/backend.py", line 468, in store_add_to_backend_with_multihash 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data image_id, data, size, hashing_algo, context=context, verifier=verifier) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File 
"/usr/lib/python3.6/site-packages/glance_store/driver.py", line 279, in add_adapter 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data metadata_dict) = store_add_fun(*args, **kwargs) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data File "/usr/lib/python3.6/site-packages/glance_store/capabilities.py", line 175, in op_checker 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data raise op_exec_map[op](**kwargs) 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data glance_store.exceptions.StoreAddDisabled: Configuration for store failed. Adding images to this store is disabled. 2022-08-01 10:18:43.593 3663156 ERROR glance.api.v2.image_data ========================================================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Mon Aug 1 15:23:02 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Mon, 1 Aug 2022 17:23:02 +0200 Subject: [ironic][swift][vitrage][OpenStackSDK] missing releases in Zed Message-ID: <9d894f78-8065-aadf-82ca-1ae8b16caf17@est.tech> Hi PTLS/release liaisons of teams in $SUBJECT, Here we are at the time in Zed cycle where release team evaluates if every cycle-with-intermediary deliverables had their releases (like we had in yoga [1] for example). The list-deliverables script showed that we miss some of the deliverables [2]. Here we should create a patch that proposes the transition from 'cycle-with-intermediary' to 'cycle-with-rc' for every such deliverables. As these are usually the ones that missing at this time of the cycles and we end up usually abandoning the transition patches and proposing releases: could you please propose release patches for these deliverables [2]? (Let me know if any of the deliverables needs anyway a transition though, and I'll create the patch) [1] https://review.opendev.org/q/topic:not-yet-released-yoga [2] https://paste.opendev.org/show/bXsaCISJzOfOsRnECvE9/ Thanks in advance, El?d Ill?s irc: elodilles From iurygregory at gmail.com Mon Aug 1 17:04:47 2022 From: iurygregory at gmail.com (Iury Gregory) Date: Mon, 1 Aug 2022 14:04:47 -0300 Subject: [ironic] Proposing Jacob Anders to sushy-core Message-ID: Hello ironic-cores and sushy-cores, I would like to propose Jacob Anders (janders irc) for sushy-core. He made great contributions to improve sushy to cover corner cases from different HW in the last releases, you can find some of his contributions in [1], please vote with +1/-1. [1] https://review.opendev.org/q/owner:janders%2540redhat.com+project:openstack/sushy+status:merged -- *Att[]'s* *Iury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Ironic PTL * *Senior Software Engineer at Red Hat Brazil* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Mon Aug 1 18:31:34 2022 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 1 Aug 2022 11:31:34 -0700 Subject: [ironic] Proposing Jacob Anders to sushy-core In-Reply-To: References: Message-ID: Greetings! So, from the knowledge/contribution of code standpoint, I agree. However, I don't see much in the way of reviews in stackalytics[0]. Generally I would prefer to encourage this before granting core privileges. It is not a requirement to be a core to engage in code review. 
-Julia [0]: https://www.stackalytics.io/?user_id=janders%40redhat.com On Mon, Aug 1, 2022 at 10:13 AM Iury Gregory wrote: > > Hello ironic-cores and sushy-cores, > > I would like to propose Jacob Anders (janders irc) for sushy-core. > He made great contributions to improve sushy to cover corner cases from different HW in the last releases, you can find some of his contributions in [1], please vote with +1/-1. > > [1] https://review.opendev.org/q/owner:janders%2540redhat.com+project:openstack/sushy+status:merged > > -- > Att[]'s > Iury Gregory Melo Ferreira > MSc in Computer Science at UFCG > Ironic PTL > Senior Software Engineer at Red Hat Brazil > Social: https://www.linkedin.com/in/iurygregory > E-mail: iurygregory at gmail.com From kdhall at binghamton.edu Mon Aug 1 18:35:03 2022 From: kdhall at binghamton.edu (Dave Hall) Date: Mon, 1 Aug 2022 14:35:03 -0400 Subject: [openstack-ansible][glance][nfs] Requesting Example of Working Config In-Reply-To: References: Message-ID: Solved, sort of. I finally figured out the right Google search to lead me to https://docs.openstack.org/glance/yoga/configuration/configuring.html. According to this document, Configuring Glance Storage Backends: > > > There are a number of configuration options in Glance that control how > Glance stores disk images. These configuration options are specified in the > glance-api.conf configuration file in the section [glance_store]. > default_store=STORE > > Optional. Default: file > > Can only be specified in configuration files. > > Sets the storage backend to use by default when storing images in Glance. > Available options for this option are (file, swift, rbd, cinder or vsphere). > In order to select a default store it must also be listed in the stores > list described below. > stores=STORES > > Optional. Default: file, http > > A comma separated list of enabled glance stores. Some available options > for this option are (filesystem, http, rbd, swift, cinder, vmware) > I looked at the glance-api.conf in my containers and found neither of these lines present. When I added them [glance_store] > default_backend = file > > default_store = file > stores = file > > the image creation worked. Now my question will be how to get openstack-ansible to generate these lines. -Dave -- Dave Hall Binghamton University kdhall at binghamton.edu 607-760-2328 (Cell) 607-777-4641 (Office) On Mon, Aug 1, 2022 at 6:12 AM Dave Hall wrote: > Hello, > > Release - yoga. Host OS - Debian 11 > > I'm looking for an example of how to get NFS working as storage for glance > and cinder - both the openstack_user_config.yml stanzas and the NFS server > setup (/etc/exports, etc.). > > So far I've adapted the stanzas from the openstack_user_config.yml > examples. The shares are mounted and writable (by root) from the > glance/cinder containers, but glance keeps on throwing 410 errors. > > Thanks. > > -Dave > > -- > Dave Hall > Binghamton University > kdhall at binghamton.edu > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From james.denton at rackspace.com Mon Aug 1 19:01:29 2022 From: james.denton at rackspace.com (James Denton) Date: Mon, 1 Aug 2022 19:01:29 +0000 Subject: [openstack-ansible][glance][nfs] Requesting Example of Working Config In-Reply-To: References: Message-ID: Hi Dave, When I used NFS, here?s the config I set in user_variables.yml: -- glance_default_store: file glance_nfs_local_directory: "images" glance_nfs_client: - server: "10.22.0.4" remote_path: "/volume2/glance_images/images" local_path: "/var/lib/glance/images" type: "nfs" options: "_netdev,auto" config_overrides: "{}" glance_system_user_uid: 1029 glance_system_group_gid: 65537 Changing your server address and remote_path to match. I needed to set my local uid/gid to match my NFS server; not sure if you?d need to do the same. James Denton Rackspace Private Cloud From: Dave Hall Date: Monday, August 1, 2022 at 1:51 PM To: openstack-discuss , Hao Jue PX Wang Cc: Dmitriy Rabotyagov Subject: Re: [openstack-ansible][glance][nfs] Requesting Example of Working Config CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Solved, sort of. I finally figured out the right Google search to lead me to https://docs.openstack.org/glance/yoga/configuration/configuring.html, According to this document, Configuring Glance Storage Backends? There are a number of configuration options in Glance that control how Glance stores disk images. These configuration options are specified in the glance-api.conf configuration file in the section [glance_store]. default_store=STORE Optional. Default: file Can only be specified in configuration files. Sets the storage backend to use by default when storing images in Glance. Available options for this option are (file, swift, rbd, cinder or vsphere). In order to select a default store it must also be listed in the stores list described below. stores=STORES Optional. Default: file, http A comma separated list of enabled glance stores. Some available options for this option are (filesystem, http, rbd, swift, cinder, vmware) I looked at the glance-api.conf in my containers and found neither of these lines present. When I added them [glance_store] default_backend = file default_store = file stores = file the image creation worked. Now my question will be how to get openstack-ansible to generate these lines. -Dave -- Dave Hall Binghamton University kdhall at binghamton.edu 607-760-2328 (Cell) 607-777-4641 (Office) On Mon, Aug 1, 2022 at 6:12 AM Dave Hall > wrote: Hello, Release - yoga. Host OS - Debian 11 I'm looking for an example of how to get NFS working as storage for glance and cinder - both the openstack_user_config.yml stanzas and the NFS server setup (/etc/exports, etc.). So far I've adapted the stanzas from the openstack_user_config.yml examples. The shares are mounted and writable (by root) from the glance/cinder containers, but glance keeps on throwing 410 errors. Thanks. -Dave -- Dave Hall Binghamton University kdhall at binghamton.edu -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fungi at yuggoth.org Mon Aug 1 19:10:19 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 1 Aug 2022 19:10:19 +0000 Subject: [dev][requirements][tripleo] Return of the revenge of lockfile strikes back part II In-Reply-To: <20220728200418.mzghpxdbynlkmboz@yuggoth.org> References: <20220709132635.v5ljgnc7lsmu25xk@yuggoth.org> <20220716015210.7pzcrwfyzcho6opc@yuggoth.org> <20220728200418.mzghpxdbynlkmboz@yuggoth.org> Message-ID: <20220801191018.yvaalnfzp4kuboyg@yuggoth.org> On 2022-07-28 20:04:18 +0000 (+0000), Jeremy Stanley wrote: > On 2022-07-28 15:58:18 -0400 (-0400), James Slagle wrote: > [...] > > I don't have any objection to removing openstackci as a maintainer of > > lockfile on PypI. > > Thanks for confirming! I'll get the ball rolling on that and let the > original maintainer know. And openstackci is no longer a maintainer of lockfile: https://discuss.python.org/t/17219/29 Thanks again, everyone! Now if we can just help the lockfile maintainer to wean everyone else off of it too... -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From iurygregory at gmail.com Mon Aug 1 18:04:21 2022 From: iurygregory at gmail.com (Iury Gregory) Date: Mon, 1 Aug 2022 19:04:21 +0100 Subject: [baremetal-sig][ironic] No meeting on August Message-ID: Dear all, We would like to inform you that this month the Bare Metal SIG meeting will be skipped. Best regards, -- *Att[]'s* *Iury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Ironic PTL * *Senior Software Engineer at Red Hat Brazil* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From iwienand at redhat.com Tue Aug 2 02:17:58 2022 From: iwienand at redhat.com (Ian Wienand) Date: Tue, 2 Aug 2022 12:17:58 +1000 Subject: [DIB][diskimage-builder] Rocky Linux image build method In-Reply-To: References: Message-ID: On Mon, Aug 01, 2022 at 10:21:18AM +0200, Pierre Riteau wrote: > It seems to also occasionally cause complex failures such as the one that > rendered Rocky Linux 8 images unbootable last week [3]. I am guessing this > wouldn't have happened had the build been from a cloud image. > [3] https://review.opendev.org/c/openstack/diskimage-builder/+/851687 Others have responded and I don't really have anything to add; I would just say that the reason we moved from the cloud images many years ago was because various things just like [3] kept happening. We used to "simply" boot generic upstream images on clouds and run an array of scripts to pre-configure them, take a snapshot image and then boot off that for the day. It was constantly breaking :) The centos element has had it's fair share of "interesting" issues with disk-layouts, etc. > Would the DIB community be open to also have a rocky element using > GenericCloud images, like centos? In terms of having your dependencies available in final images, nothing really beats specifying them explicitly. Something like [1]. I think I'd encourage this rather than trying to add another platform to support in dib. -i [1] https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/infra-package-needs From kkchn.in at gmail.com Tue Aug 2 03:28:00 2022 From: kkchn.in at gmail.com (KK CHN) Date: Tue, 2 Aug 2022 08:58:00 +0530 Subject: Horizon dashboard query Message-ID: Hi everyone! 
I need a guide or advice or anything. I am administering a private cloud in openstack. As I am using the Horizon dashboard for the VM provisioning and administering activities. But the users are sending requests in mail and creating the VMs for them. can we make workflow automation for the request and approval and provisioning automatically in the Horizon dashboard on approval? Has anyone already done /have idea on these kinds of solutions for your openstack cloud ??. Is there some kind of documentation that could help me?. Any guidance much appreciated, where to start, what to refer and which tool/programming language best if I need to code from scratch. thanks in advance. Krish -------------- next part -------------- An HTML attachment was scrubbed... URL: From auniyal at redhat.com Tue Aug 2 07:35:51 2022 From: auniyal at redhat.com (Amit Uniyal) Date: Tue, 2 Aug 2022 13:05:51 +0530 Subject: Horizon dashboard query In-Reply-To: References: Message-ID: Hello, You want to automate and provide a web interface solution. The simplest way would be to create a new web application (which can have only 2-3 pages) with an input form asking for VM details and usage. 1. Get all info in json format, update it as per available image, flavor details. Now you have all details of request, you can add an approval system here manual/automated (as per usage and quota assigned). 2. Convert this to a heat template, upload to swift(for future reference), and call heat api. Why heat ? It will allow you to create n+ number of VM at once, for example can create a full lab, having different instance flavor on different networks. 3. Update VM deployment status and access info back in the web application. Tools: web app: Django or node js Docs: https://docs.openstack.org/api-ref/orchestration/v1/ https://docs.openstack.org/heat/rocky/template_guide/hot_guide.html https://docs.openstack.org/ocata/cli-reference/heat.html#heat-stack-create Regards On Tue, Aug 2, 2022 at 9:14 AM KK CHN wrote: > Hi everyone! > I need a guide or advice or anything. > > I am administering a private cloud in openstack. As I am using the Horizon > dashboard for the VM provisioning and administering activities. > But the users are sending requests in mail and creating the VMs for them. > can we make workflow automation for the request and approval and > provisioning automatically in the Horizon dashboard on approval? > > Has anyone already done /have idea on these kinds of solutions for your > openstack cloud ??. Is there some kind of documentation that could help > me?. > > Any guidance much appreciated, where to start, what to refer and which > tool/programming language best if I need to code from scratch. > > thanks in advance. > Krish > -------------- next part -------------- An HTML attachment was scrubbed... URL: From homelandmailbox at gmail.com Tue Aug 2 08:02:52 2022 From: homelandmailbox at gmail.com (f.loghmani) Date: Tue, 2 Aug 2022 12:32:52 +0430 Subject: resize or migrate problem Message-ID: hello I use OpenStack and I create some VMs in different compute in one region. when I resize the VM. if the compute doesn't have enough space, automatically the vm will be migrated to another compute. but my problem is here that when it migrated, the _base file doesn't migrate and remains in the first compute that it doesn't have enough space. because of this, the VM appears with an error status in OpenStack. if I move the _base from first compute to final compute and reboot the server this problem has been solved. 
so i need to not transfer _base manually and if it migrated to another compute the disk and _base transfer with each other. could you please help me to solve this problem? thanks in advance -------------- next part -------------- An HTML attachment was scrubbed... URL: From fereshtehloghmani at gmail.com Tue Aug 2 08:03:56 2022 From: fereshtehloghmani at gmail.com (fereshteh loghmani) Date: Tue, 2 Aug 2022 12:33:56 +0430 Subject: resize or migrate problem Message-ID: hello I use OpenStack and I create some VMs in different compute in one region. when I resize the VM. if the compute doesn't have enough space, automatically the vm will be migrated to another compute. but my problem is here that when it migrated, the _base file doesn't migrate and remains in the first compute that it doesn't have enough space. because of this, the VM appears with an error status in OpenStack. if I move the _base from first compute to final compute and reboot the server this problem has been solved. so i need to not transfer _base manually and if it migrated to another compute the disk and _base transfer with each other. could you please help me to solve this problem? thanks in advance -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Aug 2 08:25:23 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 02 Aug 2022 10:25:23 +0200 Subject: [neutron] CI meeting agenda for 2.08.2022 Message-ID: <1997484.mzMxnXppAc@p1> Hi, It's just a reminder that today at 1500 UTC we will have our CI meeting. It will be on video this time [1]. Agenda is on etherpad [2]. [1] https://meetpad.opendev.org/neutron-ci-meetings [2] https://etherpad.opendev.org/p/neutron-ci-meetings -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From eblock at nde.ag Tue Aug 2 08:58:07 2022 From: eblock at nde.ag (Eugen Block) Date: Tue, 02 Aug 2022 08:58:07 +0000 Subject: Horizon dashboard query In-Reply-To: References: Message-ID: <20220802085807.Horde.ImuuQV5l-gxRWtDua4lN2OX@webmail.nde.ag> I don't have any suggestions for the automation here but a question. Couldn't you just give each user a project so they can login and create their instances themselves? Zitat von Amit Uniyal : > Hello, > > You want to automate and provide a web interface solution. > > The simplest way would be to create a new web application (which can have > only 2-3 pages) with an input form asking for VM details and usage. > > 1. Get all info in json format, update it as per available image, flavor > details. > > Now you have all details of request, you can add an approval system here > manual/automated (as per usage and quota assigned). > > 2. Convert this to a heat template, upload to swift(for future reference), > and call heat api. > Why heat ? > It will allow you to create n+ number of VM at once, for example can > create a full lab, having different instance flavor on different networks. > > 3. Update VM deployment status and access info back in the web application. 
> > Tools: > web app: Django or node js > > Docs: > https://docs.openstack.org/api-ref/orchestration/v1/ > https://docs.openstack.org/heat/rocky/template_guide/hot_guide.html > https://docs.openstack.org/ocata/cli-reference/heat.html#heat-stack-create > > Regards > > On Tue, Aug 2, 2022 at 9:14 AM KK CHN wrote: > >> Hi everyone! >> I need a guide or advice or anything. >> >> I am administering a private cloud in openstack. As I am using the Horizon >> dashboard for the VM provisioning and administering activities. >> But the users are sending requests in mail and creating the VMs for them. >> can we make workflow automation for the request and approval and >> provisioning automatically in the Horizon dashboard on approval? >> >> Has anyone already done /have idea on these kinds of solutions for your >> openstack cloud ??. Is there some kind of documentation that could help >> me?. >> >> Any guidance much appreciated, where to start, what to refer and which >> tool/programming language best if I need to code from scratch. >> >> thanks in advance. >> Krish >> From ueha.ayumu at fujitsu.com Tue Aug 2 09:49:51 2022 From: ueha.ayumu at fujitsu.com (ueha.ayumu at fujitsu.com) Date: Tue, 2 Aug 2022 09:49:51 +0000 Subject: [ceilometer][gnocchi][tacker] Internal Server Error in devstack for Zuul gate job Message-ID: Hi telemetry team, I?m Ueha from Tacker team, The Zuul gate job of Tacker failed with the following error. Do you know the solution? The gate job has failed, so we would appreciate it if you could deal it with high priority. Thanks! for reference, the same error occurring in the ceilometer patch. (https://review.opendev.org/c/openstack/ceilometer/+/851338 ?s telemetry-dsvm-integration-centos-9s job) ------------------ ++ /opt/stack/ceilometer/devstack/plugin.sh:start_ceilometer:322 : /usr/local/bin/ceilometer-upgrade 2022-07-29 05:20:42.125 61523 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /opt/stack/ceilometer/ceilometer/cmd/storage.py:42 2022-07-29 05:20:42.228 61523 CRITICAL ceilometer [-] Unhandled error: gnocchiclient.exceptions.ClientException: Internal Server Error (HTTP 500) 2022-07-29 05:20:42.228 61523 ERROR ceilometer Traceback (most recent call last): 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/usr/local/bin/ceilometer-upgrade", line 10, in 2022-07-29 05:20:42.228 61523 ERROR ceilometer sys.exit(upgrade()) 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/opt/stack/ceilometer/ceilometer/cmd/storage.py", line 49, in upgrade 2022-07-29 05:20:42.228 61523 ERROR ceilometer tenacity.Retrying( ......... omit ......... 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/usr/local/lib/python3.8/dist-packages/gnocchiclient/client.py", line 52, in request 2022-07-29 05:20:42.228 61523 ERROR ceilometer raise exceptions.from_response(resp, method) 2022-07-29 05:20:42.228 61523 ERROR ceilometer gnocchiclient.exceptions.ClientException: Internal Server Error (HTTP 500) 2022-07-29 05:20:42.228 61523 ERROR ceilometer ------------------ Full log: https://zuul.opendev.org/t/openstack/build/71262e66ecf34827a8a3435657aa9b3f Best Regards, Ueha -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From radoslaw.piliszek at gmail.com Tue Aug 2 10:03:04 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 2 Aug 2022 12:03:04 +0200 Subject: [ceilometer][gnocchi][tacker] Internal Server Error in devstack for Zuul gate job In-Reply-To: References: Message-ID: Hi Ueha, It seems gnocchi is failing and requires a regeneration of the protobuf client: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_712/851478/2/check/tacker-functional-devstack-multinode-sol/71262e6/controller-tacker/logs/screen-gnocchi-api.txt --- Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 devstack at gnocchi-api.service[61040]: File "/usr/local/lib/python3.8/dist-packages/google/protobuf/descriptor.py", line 755, in __new__ Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 devstack at gnocchi-api.service[61040]: _message.Message._CheckCalledFromGeneratedFile() Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 devstack at gnocchi-api.service[61040]: TypeError: Descriptors cannot not be created directly. Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 devstack at gnocchi-api.service[61040]: If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. --- Kind regards, Radek -yoctozepto On Tue, 2 Aug 2022 at 11:53, ueha.ayumu at fujitsu.com wrote: > > Hi telemetry team, > > > > I?m Ueha from Tacker team, > > The Zuul gate job of Tacker failed with the following error. Do you know the solution? > > The gate job has failed, so we would appreciate it if you could deal it with high priority. > > Thanks! > > > > for reference, the same error occurring in the ceilometer patch. > > (https://review.opendev.org/c/openstack/ceilometer/+/851338 ?s telemetry-dsvm-integration-centos-9s job) > > > > ------------------ > > ++ /opt/stack/ceilometer/devstack/plugin.sh:start_ceilometer:322 : /usr/local/bin/ceilometer-upgrade > > 2022-07-29 05:20:42.125 61523 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /opt/stack/ceilometer/ceilometer/cmd/storage.py:42 > > 2022-07-29 05:20:42.228 61523 CRITICAL ceilometer [-] Unhandled error: gnocchiclient.exceptions.ClientException: Internal Server Error (HTTP 500) > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer Traceback (most recent call last): > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/usr/local/bin/ceilometer-upgrade", line 10, in > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer sys.exit(upgrade()) > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/opt/stack/ceilometer/ceilometer/cmd/storage.py", line 49, in upgrade > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer tenacity.Retrying( > > ......... omit ......... 
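If regenerating the gnocchi protobuf stubs with a newer protoc is not immediately possible, the usual interim workarounds for this protobuf descriptor error are either forcing the pure-Python protobuf implementation or pinning protobuf below the release that enforces the generated-code check. A rough sketch, not verified against this particular job and subject to upper-constraints:

# Option 1: fall back to the pure-Python protobuf implementation
# (would have to be set in the gnocchi-api service environment, not just an interactive shell)
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

# Option 2: pin protobuf below the release that enforces the check
pip install 'protobuf<3.20'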
> > 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/usr/local/lib/python3.8/dist-packages/gnocchiclient/client.py", line 52, in request
> > 2022-07-29 05:20:42.228 61523 ERROR ceilometer raise exceptions.from_response(resp, method)
> > 2022-07-29 05:20:42.228 61523 ERROR ceilometer gnocchiclient.exceptions.ClientException: Internal Server Error (HTTP 500)
> > 2022-07-29 05:20:42.228 61523 ERROR ceilometer
> > ------------------
> > Full log: https://zuul.opendev.org/t/openstack/build/71262e66ecf34827a8a3435657aa9b3f
> >
> > Best Regards,
> > Ueha
> >

From smooney at redhat.com Tue Aug 2 10:06:53 2022
From: smooney at redhat.com (Sean Mooney)
Date: Tue, 02 Aug 2022 11:06:53 +0100
Subject: resize or migrate problem
In-Reply-To: 
References: 
Message-ID: <0e10ad6bb29e5c7dc0845abcb6fc2d314b42cf54.camel@redhat.com>

On Tue, 2022-08-02 at 12:33 +0430, fereshteh loghmani wrote:
> hello
> I use OpenStack and I create some VMs in different compute in one region.
> when I resize the VM. if the compute doesn't have enough space,
> automatically the vm will be migrated to another compute. but my problem is
> here that when it migrated, the _base file doesn't migrate and remains in
> the first compute that it doesn't have enough space. because of this, the
> VM appears with an error status in OpenStack. if I move the _base from
> first compute to final compute and reboot the server this problem has been
> solved.
> so i need to not transfer _base manually and if it migrated to another
> compute the disk and _base transfer with each other.
> could you please help me to solve this problem?
The _base file should be automatically copied when required.
Has the image been deleted in glance?
What release of openstack are you using, and what storage backend are you using for nova and glance?
Can I assume nova is using the default qcow2 backend?
> thanks in advance

From kkchn.in at gmail.com Tue Aug 2 11:43:46 2022
From: kkchn.in at gmail.com (KK CHN)
Date: Tue, 2 Aug 2022 17:13:46 +0530
Subject: Horizon dashboard query
In-Reply-To: <20220802085807.Horde.ImuuQV5l-gxRWtDua4lN2OX@webmail.nde.ag>
References: <20220802085807.Horde.ImuuQV5l-gxRWtDua4lN2OX@webmail.nde.ag>
Message-ID: 

The use case is that user requests need to be controlled; we cannot provide a separate project for every user. Even if the infrastructure is large enough to support a project per user, for a large user base an automated approval/rejection workflow like this is a fair practice.

On Tue, Aug 2, 2022 at 2:33 PM Eugen Block wrote:

> I don't have any suggestions for the automation here but a question.
> Couldn't you just give each user a project so they can login and
> create their instances themselves?
>
>
> Zitat von Amit Uniyal :
>
> > Hello,
> >
> > You want to automate and provide a web interface solution.
> >
> > The simplest way would be to create a new web application (which can have
> > only 2-3 pages) with an input form asking for VM details and usage.
> >
> > 1. Get all info in json format, update it as per available image, flavor
> > details.
> >
> > Now you have all details of request, you can add an approval system here
> > manual/automated (as per usage and quota assigned).
> >
> > 2. Convert this to a heat template, upload to swift(for future
> reference),
> > and call heat api.
> > Why heat ?
> > It will allow you to create n+ number of VM at once, for example can
> > create a full lab, having different instance flavor on different
> networks.
> >
> > 3.
Update VM deployment status and access info back in the web > application. > > > > Tools: > > web app: Django or node js > > > > Docs: > > https://docs.openstack.org/api-ref/orchestration/v1/ > > https://docs.openstack.org/heat/rocky/template_guide/hot_guide.html > > > https://docs.openstack.org/ocata/cli-reference/heat.html#heat-stack-create > > > > Regards > > > > On Tue, Aug 2, 2022 at 9:14 AM KK CHN wrote: > > > >> Hi everyone! > >> I need a guide or advice or anything. > >> > >> I am administering a private cloud in openstack. As I am using the > Horizon > >> dashboard for the VM provisioning and administering activities. > >> But the users are sending requests in mail and creating the VMs for > them. > >> can we make workflow automation for the request and approval and > >> provisioning automatically in the Horizon dashboard on approval? > >> > >> Has anyone already done /have idea on these kinds of solutions for your > >> openstack cloud ??. Is there some kind of documentation that could help > >> me?. > >> > >> Any guidance much appreciated, where to start, what to refer and which > >> tool/programming language best if I need to code from scratch. > >> > >> thanks in advance. > >> Krish > >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lokendrarathour at gmail.com Tue Aug 2 03:54:53 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Tue, 2 Aug 2022 09:24:53 +0530 Subject: [Triple0 - Wallaby] Overcloud deployment getting failed with SSL In-Reply-To: References: Message-ID: Hi Team, Any lead about this issue raised? On Thu, Jul 28, 2022 at 10:02 AM Lokendra Rathour wrote: > Hi Brendan, > Thanks for the advice. > bug is reported: > https://bugs.launchpad.net/tripleo/+bug/1982996 > > On Thu, Jul 28, 2022 at 5:34 AM Brendan Shephard > wrote: > >> Hey, >> >> It?s probably best that you raise a bug here at this stage: >> https://bugs.launchpad.net/tripleo >> >> Can you attach all of the templates you?re using to that bug, the >> overcloud deploy command script that you?re running and also the log files >> that you have shared here? >> >> I wasn?t able to reproduce your issue, but if you raise a bug we can >> direct it to the right team who can help out with your keystone errors. >> >> Brendan Shephard >> Senior Software Engineer >> Brisbane, Australia >> >> >> >> On 28 Jul 2022, at 2:55 am, Lokendra Rathour >> wrote: >> >> Hi Team, >> I tried again with DNS enabled, but the error remains the same. >> >> tone_resources : Create identity public endpoint | undercloud | >> 0:24:59.456181 | 2.31s >> 2022-07-27 15:20:48.735838 | 5254006e-bbd1-cd20-647c-00000000736c | >> TASK | Create identity internal endpoint >> 2022-07-27 15:20:51.227000 | 5254006e-bbd1-cd20-647c-00000000736c | >> FATAL | Create identity internal endpoint | undercloud | error={"changed": >> false, "extra_data": {"data": null, "details": "The request you have made >> requires authentication.", "response": >> "{\"error\":{\"code\":401,\"message\":\"The request you have made requires >> authentication.\",\"title\":\"Unauthorized\"}}\n"}, "msg": "Failed to >> list services: Client Error for url: https://overcloud-publ >> ic.myhsc.com:13000/v3/services, The request you have made requires >> authentication."} >> >> Checking further in the keystone logs in container: >> >> >> 2022-07-27 19:35:37.447 33 WARNING keystone.server.flask.application >> [req-bb4621d8-73ad-4bad-831f-5c2370e92e71 - - - - -] Authorization failed. 
>> The request you have made requires authentication. from >> fd00:fd00:fd00:9900::29: keystone.exception.Unauthorized: The request you >> have made requires authentication. >> 2022-07-27 19:35:37.998 26 WARNING py.warnings >> [req-54d44e3a-5e34-4e40-b2dc-e8213353ea05 ab5e9670632544f8a8c7e1b3ac175bcd >> e4185872cadb442aa9a59980b3227941 - default default] >> /usr/lib/python3.6/site-packages/oslo_policy/policy.py:1065: UserWarning: >> Policy identity:list_projects failed scope check. The token used to make >> the request was project scoped but the policy requires ['system', 'domain'] >> scope. This behavior may change in the future where using the intended >> scope is required >> >> I am kind of blocked now, any lead would let me understand the problem >> more and maybe it can solve the issue. >> >> Best Regards, >> Lokendra >> >> On Mon, Jul 25, 2022 at 3:12 PM Lokendra Rathour < >> lokendrarathour at gmail.com> wrote: >> >>> Hi Brendan, >>> Apologies for this delay, i had to redo the setup to reach this point, >>> and also this time just to eliminate my Doubt i removed SSL for overcloud. >>> Now I am only using DNS Server. In this case also I am getting the same >>> error. >>> >>> | 0:13:20.198877 | 1.86s >>> 2022-07-25 14:37:29.657118 | 525400a7-0932-2ed1-d313-000000007193 | >>> TASK | Create identity internal endpoint >>> 2022-07-25 14:37:31.995131 | 525400a7-0932-2ed1-d313-000000007193 | >>> FATAL | Create identity internal endpoint | undercloud | error={"changed": >>> false, "extra_data": {"data": null, "details": "The request you have made >>> requires authentication.", "response": >>> "{\"error\":{\"code\":401,\"message\":\"The request you have made requires >>> authentication.\",\"title\":\"Unauthorized\"}}\n"}, "msg": "Failed to list >>> services: Client Error for url: >>> http://[fd00:fd00:fd00:9900::a0]:5000/v3/services, The request you have >>> made requires authentication."} >>> >>> >>> To answer your question please note: >>> >>> "OS_CLOUD=overcloud openstack endpoint list" >>> >>> [root at GGNLABPM4 ~]# ssh stack at 10.0.1.29 >>> stack at 10.0.1.29's password: >>> Activate the web console with: systemctl enable --now cockpit.socket >>> >>> Last login: Mon Jul 25 14:38:44 2022 from 10.0.1.4 >>> [stack at undercloud ~]$ OS_CLOUD=overcloud openstack endpoint list >>> >>> +----------------------------------+-----------+--------------+--------------+---------+-----------+---------------------------------------+ >>> | ID | Region | Service Name | Service >>> Type | Enabled | Interface | URL | >>> >>> +----------------------------------+-----------+--------------+--------------+---------+-----------+---------------------------------------+ >>> | 1ecd328b5ea1426bb411d157b8339dd2 | regionOne | keystone | identity >>> | True | public | http://[fd00:fd00:fd00:9900::a0]:5000 | >>> | 518cfa0f2ece43b684710006c9fa5b25 | regionOne | keystone | identity >>> | True | admin | http://30.30.30.181:35357 | >>> | 8cda413052c24718b073578bb497f483 | regionOne | keystone | identity >>> | True | internal | http://[fd00:fd00:fd00:2000::a0]:5000 | >>> >>> +----------------------------------+-----------+--------------+--------------+---------+-----------+---------------------------------------+ >>> [stack at undercloud ~]$ >>> >>> >>> it is giving us only keystone endpoints. >>> >>> Also note that I am trying to deploy the end to end setup with FQDN >>> only. and in this case as well I am facing the same issue as old. >>> >>> thanks once again for your inputs. 
>>> >>> -Lokendra >>> >>> >>> >>> On Wed, Jul 20, 2022 at 3:07 PM Brendan Shephard >>> wrote: >>> >>>> Hey, >>>> >>>> I think it's weird that you got a response at all when you run the >>>> openstack endpoint list, since you said haproxy isn't running. So there >>>> should be nothing serving that endpoint. >>>> >>>> I noticed you have the stackrc file sourced. Try it again without that >>>> file sourced, so: >>>> $ su - stack >>>> $ OS_CLOUD=overcloud openstack endpoint list >>>> >>>> I would suspect that nothing should be responding. It could be the >>>> stackrc file causing issues with some of the environment variables. If the >>>> above command doesn't return anything, then my suggestion would be to >>>> re-run the deployment like this: >>>> >>>> $ su - stack >>>> $ export OS_CLOUD=undercloud >>>> # Then run your deployment script again >>>> $ bash overcloud_deploy.sh >>>> >>>> The OS_CLOUD variable tells the openstackclient to lookup the details >>>> about that cloud from your clouds.yaml file. Which will be located in >>>> /home/stack/.config/openstack/clouds.yaml. >>>> >>>> This method is preferable to the sourcing of RC files. >>>> >>>> Reference: >>>> >>>> https://docs.openstack.org/openstacksdk/latest/user/guides/connect_from_config.html >>>> >>>> Regarding the HAProxy warnings. I don't think they should be fatal. >>>> afaik, HAProxy should still be starting. If it's not, there might be >>>> another error that you will need to look for in the log files under >>>> /var/log/containers/haproxy/ >>>> >>>> I wasn't able to reproduce that warning by following the documentation >>>> for enabling TLS though. So it seems like an odd error to be getting. >>>> >>>> Brendan Shephard >>>> Software Engineer >>>> >>>> Red Hat APAC >>>> 193 N Quay >>>> Brisbane City QLD 4000 >>>> @RedHat Red Hat >>>> Red Hat >>>> >>>> >>>> >>>> >>>> >>>> On Wed, Jul 20, 2022 at 7:02 PM Lokendra Rathour < >>>> lokendrarathour at gmail.com> wrote: >>>> >>>>> Hi Brendan / Team, >>>>> Any lead for the issue raised? >>>>> >>>>> -Lokendra >>>>> >>>>> >>>>> >>>>> On Tue, Jul 19, 2022 at 11:46 AM Lokendra Rathour < >>>>> lokendrarathour at gmail.com> wrote: >>>>> >>>>>> Hi Brendan,, >>>>>> Thanks for the inputs. >>>>>> when i run the command as you suggested I get this: >>>>>> >>>>>> (undercloud) [stack at undercloud ~]$ OS_CLOUD=overcloud openstack >>>>>> endpoint list >>>>>> >>>>>> +----------------------------------+-----------+--------------+--------------+---------+-----------+----------------------------------------+ >>>>>> | ID | Region | Service Name | >>>>>> Service Type | Enabled | Interface | URL >>>>>> | >>>>>> >>>>>> +----------------------------------+-----------+--------------+--------------+---------+-----------+----------------------------------------+ >>>>>> | 1bfe43c9cf174bd8a01a3a681538766a | regionOne | keystone | >>>>>> identity | True | internal | >>>>>> http://[fd00:fd00:fd00:2000::326]:5000 | >>>>>> | 707e92fc11df4a74bceb5e48f2561357 | regionOne | keystone | >>>>>> identity | True | admin | http://30.30.30.173:35357 >>>>>> | >>>>>> | fab4e66170c8402f899c5f43fd4c39fe | regionOne | keystone | >>>>>> identity | True | public | https://overcloud-hsc.com:13000 >>>>>> | >>>>>> >>>>>> +----------------------------------+-----------+--------------+--------------+---------+-----------+----------------------------------------+ >>>>>> (undercloud) [stack at undercloud ~]$ >>>>>> >>>>>> >>>>>> On the other note that i notices was as below: >>>>>> >>>>>> - HAproxy container is not running. 
>>>>>> - [root at overcloud-controller-2 stdouts]# podman ps -a | grep >>>>>> haproxy >>>>>> e91dbde042db >>>>>> undercloud.ctlplane.localdomain:8787/tripleowallaby/openstack-haproxy:current-tripleo >>>>>> 24 hours ago Exited (1) Less than a >>>>>> second ago container-puppet-haproxy\ >>>>>> - Checking logs: >>>>>> - 2022-07-19T08:47:00.496212294+05:30 stderr F + ARGS= >>>>>> 2022-07-19T08:47:00.496300242+05:30 stderr F + [[ ! -n '' ]] >>>>>> 2022-07-19T08:47:00.496323705+05:30 stderr F + . >>>>>> kolla_extend_start >>>>>> 2022-07-19T08:47:00.496578173+05:30 stderr F + echo 'Running >>>>>> command: '\''bash -c $* -- eval if [ -f /usr/sbin/haproxy-systemd-wrapper >>>>>> ]; then exec /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg; >>>>>> else exec /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -Ws; fi'\''' >>>>>> 2022-07-19T08:47:00.496605469+05:30 stdout F Running command: >>>>>> 'bash -c $* -- eval if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then exec >>>>>> /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg; else exec >>>>>> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -Ws; fi' >>>>>> 2022-07-19T08:47:00.496895618+05:30 stderr F + exec bash -c >>>>>> '$*' -- eval if '[' -f /usr/sbin/haproxy-systemd-wrapper '];' then exec >>>>>> /usr/sbin/haproxy-systemd-wrapper -f '/etc/haproxy/haproxy.cfg;' else exec >>>>>> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg '-Ws;' fi >>>>>> 2022-07-19T08:47:00.513182490+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:28] : 'bind >>>>>> fd00:fd00:fd00:9900::81:13776' : >>>>>> 2022-07-19T08:47:00.513182490+05:30 stderr F unable to load >>>>>> default 1024 bits DH parameter for certificate >>>>>> '/etc/pki/tls/private/overcloud_endpoint.pem'. >>>>>> 2022-07-19T08:47:00.513182490+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. >>>>>> automatically2022-07-19T08:47:00.513967576+05:30 stderr F >>>>>> [WARNING] 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:45] : 'bind >>>>>> fd00:fd00:fd00:9900::81:13292' : >>>>>> 2022-07-19T08:47:00.513967576+05:30 stderr F unable to load >>>>>> default 1024 bits DH parameter for certificate >>>>>> '/etc/pki/tls/private/overcloud_endpoint.pem'. >>>>>> 2022-07-19T08:47:00.513967576+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. >>>>>> 2022-07-19T08:47:00.514736662+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:69] : 'bind >>>>>> fd00:fd00:fd00:9900::81:13004' : >>>>>> 2022-07-19T08:47:00.514736662+05:30 stderr F unable to load >>>>>> default 1024 bits DH parameter for certificate >>>>>> '/etc/pki/tls/private/overcloud_endpoint.pem'. >>>>>> 2022-07-19T08:47:00.514736662+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. >>>>>> 2022-07-19T08:47:00.515461787+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:89] : 'bind >>>>>> fd00:fd00:fd00:9900::81:13005' : >>>>>> 2022-07-19T08:47:00.515461787+05:30 stderr F unable to load >>>>>> default 1024 bits DH parameter for certificate >>>>>> '/etc/pki/tls/private/overcloud_endpoint.pem'. >>>>>> 2022-07-19T08:47:00.515461787+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. 
>>>>>> 2022-07-19T08:47:00.516167406+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:108] : 'bind >>>>>> fd00:fd00:fd00:2000::326:443' : >>>>>> - 2022-07-19T08:47:00.517937930+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. >>>>>> 2022-07-19T08:47:00.518534123+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:172] : 'bind >>>>>> fd00:fd00:fd00:9900::81:13000' : >>>>>> 2022-07-19T08:47:00.518534123+05:30 stderr F unable to load >>>>>> default 1024 bits DH parameter for certificate >>>>>> '/etc/pki/tls/private/overcloud_endpoint.pem'. >>>>>> 2022-07-19T08:47:00.518534123+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. >>>>>> 2022-07-19T08:47:00.519127743+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:201] : 'bind >>>>>> fd00:fd00:fd00:9900::81:13696' : >>>>>> 2022-07-19T08:47:00.519127743+05:30 stderr F unable to load >>>>>> default 1024 bits DH parameter for certificate >>>>>> '/etc/pki/tls/private/overcloud_endpoint.pem'. >>>>>> 2022-07-19T08:47:00.519127743+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. >>>>>> 2022-07-19T08:47:00.519734281+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:233] : 'bind >>>>>> fd00:fd00:fd00:9900::81:13080' : >>>>>> 2022-07-19T08:47:00.519734281+05:30 stderr F unable to load >>>>>> default 1024 bits DH parameter for certificate >>>>>> '/etc/pki/tls/private/overcloud_endpoint.pem'. >>>>>> 2022-07-19T08:47:00.519734281+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. >>>>>> 2022-07-19T08:47:00.520285158+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:250] : 'bind >>>>>> fd00:fd00:fd00:9900::81:13774' : >>>>>> 2022-07-19T08:47:00.520285158+05:30 stderr F unable to load >>>>>> default 1024 bits DH parameter for certificate >>>>>> '/etc/pki/tls/private/overcloud_endpoint.pem'. >>>>>> 2022-07-19T08:47:00.520285158+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. >>>>>> 2022-07-19T08:47:00.520830405+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:266] : >>>>>> 'bind fd00:fd00:fd00:9900::81:13778' : >>>>>> 2022-07-19T08:47:00.520830405+05:30 stderr F unable to load >>>>>> default 1024 bits DH parameter for certificate >>>>>> '/etc/pki/tls/private/overcloud_endpoint.pem'. >>>>>> 2022-07-19T08:47:00.520830405+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. >>>>>> 2022-07-19T08:47:00.521517271+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : parsing [/etc/haproxy/haproxy.cfg:281] : 'bind >>>>>> fd00:fd00:fd00:9900::81:13808' : >>>>>> 2022-07-19T08:47:00.521517271+05:30 stderr F unable to load >>>>>> default 1024 bits DH parameter for certificate >>>>>> '/etc/pki/tls/private/overcloud_endpoint.pem'. >>>>>> 2022-07-19T08:47:00.521517271+05:30 stderr F , SSL library >>>>>> will use an automatically generated DH parameter. >>>>>> 2022-07-19T08:47:00.524065508+05:30 stderr F [WARNING] >>>>>> 199/084700 (7) : Setting tune.ssl.default-dh-param to 1024 by default, if >>>>>> your workload permits it you should set it to at least 2048. Please set a >>>>>> value >= 1024 to make this warning disappear. 
>>>>>> - pcs status also show that proxy is down for the controller >>>>>> with VIP: >>>>>> - Failed Resource Actions: >>>>>> * haproxy-bundle-podman-2_start_0 on overcloud-controller-2 >>>>>> 'error' (1): call=139, status='complete', exitreason='podman failed to >>>>>> launch container (rc: 1)', last-rc-change='Mon Jul 18 15:14:34 2022', >>>>>> queued=0ms, exec=1222ms >>>>>> * haproxy-bundle-podman-1_start_0 on overcloud-controller-1 >>>>>> 'error' (1): call=191, status='complete', exitreason='podman failed to >>>>>> launch container (rc: 1)', last-rc-change='Mon Jul 18 23:54:17 2022', >>>>>> queued=0ms, exec=1171ms >>>>>> * haproxy-bundle-podman-2_start_0 on overcloud-controller-1 >>>>>> 'error' (1): call=193, status='complete', exitreason='podman failed to >>>>>> launch container (rc: 1)', last-rc-change='Mon Jul 18 23:54:20 2022', >>>>>> queued=0ms, exec=1256ms >>>>>> >>>>>> do let me know in case we need anything more around it. >>>>>> thanks once again for the support. >>>>>> -Lokendra >>>>>> >>>>>> On Tue, Jul 19, 2022 at 11:07 AM Brendan Shephard < >>>>>> bshephar at redhat.com> wrote: >>>>>> >>>>>>> Hey, >>>>>>> >>>>>>> Doesn't look like there is anything wrong with the certificate >>>>>>> there. You would be getting a TLS error if that was the problem. >>>>>>> >>>>>>> What does your clouds.yaml file look like now? What happens if you >>>>>>> run this command from the Undercloud node: >>>>>>> $ OS_CLOUD=overcloud openstack endpoint list >>>>>>> >>>>>>> Do you get the same error? >>>>>>> >>>>>>> Brendan Shephard >>>>>>> Software Engineer >>>>>>> >>>>>>> Red Hat APAC >>>>>>> 193 N Quay >>>>>>> Brisbane City QLD 4000 >>>>>>> @RedHat Red Hat >>>>>>> Red Hat >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Jul 19, 2022 at 1:28 PM Lokendra Rathour < >>>>>>> lokendrarathour at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Swogat and Vikarna, >>>>>>>> We have tried adding the DNS entry for the overcloud domain. we are >>>>>>>> getting the same error: >>>>>>>> >>>>>>>> 022-07-19 00:09:41.491498 | 525400ae-089b-c832-8e34-00000000704f | >>>>>>>> TIMING | tripleo_keystone_resources : Create identity public endpoint | >>>>>>>> undercloud | 0:11:18.785769 | 2.16s >>>>>>>> 2022-07-19 00:09:41.507319 | 525400ae-089b-c832-8e34-000000007050 | >>>>>>>> TASK | Create identity internal endpoint >>>>>>>> 2022-07-19 00:09:43.778910 | 525400ae-089b-c832-8e34-000000007050 | >>>>>>>> FATAL | Create identity internal endpoint | undercloud | >>>>>>>> error={"changed": false, "extra_data": {"data": null, "details": "The >>>>>>>> request you have made requires authentication.", "response": >>>>>>>> "{\"error\":{\"code\":401,\"message\":\"The request you have made requires >>>>>>>> authentication.\",\"title\":\"Unauthorized\"}}\n"}, "msg": "Failed to list >>>>>>>> services: Client Error for url: >>>>>>>> https://overcloud-hsc.com:13000/v3/services, The request you have >>>>>>>> made requires authentication."} >>>>>>>> 2022-07-19 00:09:43.780306 | 525400ae-089b-c832-8e34-000000007050 | >>>>>>>> TIMING | tripleo_keystone_resources : Create identity internal endpoint >>>>>>>> | undercloud | 0:11:21.074605 | 2. 
>>>>>>>> >>>>>>>> >>>>>>>> Certificate configs: >>>>>>>> >>>>>>>> [stack at undercloud oc-domain-name]$ cat server.csr.cnf >>>>>>>> [req] >>>>>>>> default_bits = 2048 >>>>>>>> prompt = no >>>>>>>> default_md = sha256 >>>>>>>> distinguished_name = dn >>>>>>>> [dn] >>>>>>>> C=IN >>>>>>>> ST=UTTAR PRADESH >>>>>>>> L=NOIDA >>>>>>>> O=HSC >>>>>>>> OU=HSC >>>>>>>> emailAddress=demo at demo.com >>>>>>>> CN=overcloud-hsc.com >>>>>>>> [stack at undercloud oc-domain-name]$ cat v3.ext >>>>>>>> authorityKeyIdentifier=keyid,issuer >>>>>>>> basicConstraints=CA:FALSE >>>>>>>> keyUsage = digitalSignature, nonRepudiation, keyEncipherment, >>>>>>>> dataEncipherment >>>>>>>> subjectAltName = @alt_names >>>>>>>> [alt_names] >>>>>>>> DNS.1=overcloud-hsc.com >>>>>>>> [stack at undercloud oc-domain-name]$ >>>>>>>> >>>>>>>> the difference we see from others is that we are using self-signed >>>>>>>> certificates. >>>>>>>> >>>>>>>> please let me know in case we need to check something else. Somehow >>>>>>>> this issue remains stuck. >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jul 15, 2022 at 2:17 AM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>>>>>> I was facing a similar kind of issue. >>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2089442 >>>>>>>>> Here is the solution that helped me fix it. >>>>>>>>> Also make sure the cn that you will use is reachable from >>>>>>>>> undercloud (maybe) script should take care of it. >>>>>>>>> >>>>>>>>> Also please follow Mr. Tathe's mail to add the cn first. >>>>>>>>> >>>>>>>>> With regards >>>>>>>>> Swogat Pradhan >>>>>>>>> >>>>>>>>> On Thu, Jul 14, 2022 at 8:49 AM Vikarna Tathe < >>>>>>>>> vikarnatathe at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Lokendra, >>>>>>>>>> >>>>>>>>>> The CN field is missing. Can you add that and generate the >>>>>>>>>> certificate again. >>>>>>>>>> >>>>>>>>>> CN=ipaddress >>>>>>>>>> >>>>>>>>>> Also add dns.1=ipaddress under alt_names for precaution. >>>>>>>>>> >>>>>>>>>> Vikarna >>>>>>>>>> >>>>>>>>>> On Wed, 13 Jul, 2022, 23:02 Lokendra Rathour, < >>>>>>>>>> lokendrarathour at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> HI Vikarna, >>>>>>>>>>> Thanks for the inputs. >>>>>>>>>>> I am note able to access any tabs in GUI. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> to re-state, we are failing at the time of deployment at step4 : >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> PLAY [External deployment step 4] >>>>>>>>>>> ********************************************** >>>>>>>>>>> 2022-07-13 21:35:22.505148 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000000d7 | TASK | External deployment >>>>>>>>>>> step 4 >>>>>>>>>>> 2022-07-13 21:35:22.534899 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000000d7 | OK | External deployment >>>>>>>>>>> step 4 | undercloud -> localhost | result={ >>>>>>>>>>> "changed": false, >>>>>>>>>>> "msg": "Use --start-at-task 'External deployment step 4' to >>>>>>>>>>> resume from this task" >>>>>>>>>>> } >>>>>>>>>>> [WARNING]: ('undercloud -> localhost', >>>>>>>>>>> '525400ae-089b-870a-fab6-0000000000d7') >>>>>>>>>>> missing from stats >>>>>>>>>>> 2022-07-13 21:35:22.591268 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000000d8 | TIMING | include_tasks | >>>>>>>>>>> undercloud | 0:11:21.683453 | 0.04s >>>>>>>>>>> 2022-07-13 21:35:22.605901 | >>>>>>>>>>> f29c4b58-75a5-4993-97b8-3921a49d79d7 | INCLUDED | >>>>>>>>>>> /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step4.yaml >>>>>>>>>>> | undercloud >>>>>>>>>>> 2022-07-13 21:35:22.627112 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007239 | TASK | Clean up legacy Cinder >>>>>>>>>>> keystone catalog entries >>>>>>>>>>> 2022-07-13 21:35:25.110635 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007239 | OK | Clean up legacy Cinder >>>>>>>>>>> keystone catalog entries | undercloud | item={'service_name': 'cinderv2', >>>>>>>>>>> 'service_type': 'volumev2'} >>>>>>>>>>> 2022-07-13 21:35:25.112368 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007239 | TIMING | Clean up legacy Cinder >>>>>>>>>>> keystone catalog entries | undercloud | 0:11:24.204562 | 2.48s >>>>>>>>>>> 2022-07-13 21:35:27.029270 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007239 | OK | Clean up legacy Cinder >>>>>>>>>>> keystone catalog entries | undercloud | item={'service_name': 'cinderv3', >>>>>>>>>>> 'service_type': 'volume'} >>>>>>>>>>> 2022-07-13 21:35:27.030383 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007239 | TIMING | Clean up legacy Cinder >>>>>>>>>>> keystone catalog entries | undercloud | 0:11:26.122584 | 4.40s >>>>>>>>>>> 2022-07-13 21:35:27.032091 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007239 | TIMING | Clean up legacy Cinder >>>>>>>>>>> keystone catalog entries | undercloud | 0:11:26.124296 | 4.40s >>>>>>>>>>> 2022-07-13 21:35:27.047913 | >>>>>>>>>>> 525400ae-089b-870a-fab6-00000000723c | TASK | Manage Keystone >>>>>>>>>>> resources for OpenStack services >>>>>>>>>>> 2022-07-13 21:35:27.077672 | >>>>>>>>>>> 525400ae-089b-870a-fab6-00000000723c | TIMING | Manage Keystone >>>>>>>>>>> resources for OpenStack services | undercloud | 0:11:26.169842 | 0.03s >>>>>>>>>>> 2022-07-13 21:35:27.120270 | >>>>>>>>>>> 525400ae-089b-870a-fab6-00000000726b | TASK | Gather variables for >>>>>>>>>>> each operating system >>>>>>>>>>> 2022-07-13 21:35:27.161225 | >>>>>>>>>>> 525400ae-089b-870a-fab6-00000000726b | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Gather variables for each operating system | >>>>>>>>>>> undercloud | 0:11:26.253383 | 0.04s >>>>>>>>>>> 2022-07-13 21:35:27.177798 | >>>>>>>>>>> 525400ae-089b-870a-fab6-00000000726c | TASK | Create Keystone Admin >>>>>>>>>>> resources >>>>>>>>>>> 2022-07-13 21:35:27.207430 | >>>>>>>>>>> 525400ae-089b-870a-fab6-00000000726c | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Create Keystone Admin resources | 
undercloud | >>>>>>>>>>> 0:11:26.299608 | 0.03s >>>>>>>>>>> 2022-07-13 21:35:27.230985 | >>>>>>>>>>> 46e05e2d-2e9c-467b-ac4f-c5f0bc7286b3 | INCLUDED | >>>>>>>>>>> /usr/share/ansible/roles/tripleo_keystone_resources/tasks/admin.yml | >>>>>>>>>>> undercloud >>>>>>>>>>> 2022-07-13 21:35:27.256076 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072ad | TASK | Create default domain >>>>>>>>>>> 2022-07-13 21:35:29.343399 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072ad | OK | Create default domain | >>>>>>>>>>> undercloud >>>>>>>>>>> 2022-07-13 21:35:29.345172 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072ad | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Create default domain | undercloud | >>>>>>>>>>> 0:11:28.437360 | 2.09s >>>>>>>>>>> 2022-07-13 21:35:29.361643 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072ae | TASK | Create admin and >>>>>>>>>>> service projects >>>>>>>>>>> 2022-07-13 21:35:29.391295 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072ae | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Create admin and service projects | undercloud >>>>>>>>>>> | 0:11:28.483468 | 0.03s >>>>>>>>>>> 2022-07-13 21:35:29.402539 | >>>>>>>>>>> af7a4a76-4998-4679-ac6f-58acc0867554 | INCLUDED | >>>>>>>>>>> /usr/share/ansible/roles/tripleo_keystone_resources/tasks/projects.yml | >>>>>>>>>>> undercloud >>>>>>>>>>> 2022-07-13 21:35:29.428918 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007304 | TASK | Async creation of >>>>>>>>>>> Keystone project >>>>>>>>>>> 2022-07-13 21:35:30.144295 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007304 | CHANGED | Async creation of >>>>>>>>>>> Keystone project | undercloud | item=admin >>>>>>>>>>> 2022-07-13 21:35:30.145884 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007304 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Async creation of Keystone project | >>>>>>>>>>> undercloud | 0:11:29.238078 | 0.72s >>>>>>>>>>> 2022-07-13 21:35:30.493458 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007304 | CHANGED | Async creation of >>>>>>>>>>> Keystone project | undercloud | item=service >>>>>>>>>>> 2022-07-13 21:35:30.494386 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007304 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Async creation of Keystone project | >>>>>>>>>>> undercloud | 0:11:29.586587 | 1.06s >>>>>>>>>>> 2022-07-13 21:35:30.495729 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007304 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Async creation of Keystone project | >>>>>>>>>>> undercloud | 0:11:29.587916 | 1.07s >>>>>>>>>>> 2022-07-13 21:35:30.511748 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007306 | TASK | Check Keystone project >>>>>>>>>>> status >>>>>>>>>>> 2022-07-13 21:35:30.908189 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007306 | WAITING | Check Keystone project >>>>>>>>>>> status | undercloud | 30 retries left >>>>>>>>>>> 2022-07-13 21:35:36.166541 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007306 | OK | Check Keystone project >>>>>>>>>>> status | undercloud | item=admin >>>>>>>>>>> 2022-07-13 21:35:36.168506 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007306 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Check Keystone project status | undercloud | >>>>>>>>>>> 0:11:35.260666 | 5.66s >>>>>>>>>>> 2022-07-13 21:35:36.400914 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007306 | OK | Check Keystone project >>>>>>>>>>> status | undercloud | item=service >>>>>>>>>>> 2022-07-13 21:35:36.402534 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007306 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Check Keystone project status | undercloud | 
>>>>>>>>>>> 0:11:35.494729 | 5.89s >>>>>>>>>>> 2022-07-13 21:35:36.406576 | >>>>>>>>>>> 525400ae-089b-870a-fab6-000000007306 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Check Keystone project status | undercloud | >>>>>>>>>>> 0:11:35.498771 | 5.89s >>>>>>>>>>> 2022-07-13 21:35:36.427719 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072af | TASK | Create admin role >>>>>>>>>>> 2022-07-13 21:35:38.632266 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072af | OK | Create admin role | >>>>>>>>>>> undercloud >>>>>>>>>>> 2022-07-13 21:35:38.633754 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072af | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Create admin role | undercloud | >>>>>>>>>>> 0:11:37.725949 | 2.20s >>>>>>>>>>> 2022-07-13 21:35:38.649721 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b0 | TASK | Create _member_ role >>>>>>>>>>> 2022-07-13 21:35:38.689773 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b0 | SKIPPED | Create _member_ role | >>>>>>>>>>> undercloud >>>>>>>>>>> 2022-07-13 21:35:38.691172 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b0 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Create _member_ role | undercloud | >>>>>>>>>>> 0:11:37.783369 | 0.04s >>>>>>>>>>> 2022-07-13 21:35:38.706920 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b1 | TASK | Create admin user >>>>>>>>>>> 2022-07-13 21:35:42.051623 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b1 | CHANGED | Create admin user | >>>>>>>>>>> undercloud >>>>>>>>>>> 2022-07-13 21:35:42.053285 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b1 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Create admin user | undercloud | >>>>>>>>>>> 0:11:41.145472 | 3.34s >>>>>>>>>>> 2022-07-13 21:35:42.069370 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b2 | TASK | Assign admin role to >>>>>>>>>>> admin project for admin user >>>>>>>>>>> 2022-07-13 21:35:45.194891 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b2 | OK | Assign admin role to >>>>>>>>>>> admin project for admin user | undercloud >>>>>>>>>>> 2022-07-13 21:35:45.196669 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b2 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Assign admin role to admin project for admin >>>>>>>>>>> user | undercloud | 0:11:44.288848 | 3.13s >>>>>>>>>>> 2022-07-13 21:35:45.212674 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b3 | TASK | Assign _member_ role to >>>>>>>>>>> admin project for admin user >>>>>>>>>>> 2022-07-13 21:35:45.252884 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b3 | SKIPPED | Assign _member_ role to >>>>>>>>>>> admin project for admin user | undercloud >>>>>>>>>>> 2022-07-13 21:35:45.254283 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b3 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Assign _member_ role to admin project for >>>>>>>>>>> admin user | undercloud | 0:11:44.346479 | 0.04s >>>>>>>>>>> 2022-07-13 21:35:45.270310 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b4 | TASK | Create identity service >>>>>>>>>>> 2022-07-13 21:35:46.928715 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b4 | OK | Create identity service >>>>>>>>>>> | undercloud >>>>>>>>>>> 2022-07-13 21:35:46.930167 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b4 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Create identity service | undercloud | >>>>>>>>>>> 0:11:46.022362 | 1.66s >>>>>>>>>>> 2022-07-13 21:35:46.946797 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b5 | TASK | Create identity public >>>>>>>>>>> endpoint >>>>>>>>>>> 2022-07-13 21:35:49.139298 | >>>>>>>>>>> 
525400ae-089b-870a-fab6-0000000072b5 | OK | Create identity public >>>>>>>>>>> endpoint | undercloud >>>>>>>>>>> 2022-07-13 21:35:49.141158 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b5 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Create identity public endpoint | undercloud | >>>>>>>>>>> 0:11:48.233349 | 2.19s >>>>>>>>>>> 2022-07-13 21:35:49.157768 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b6 | TASK | Create identity >>>>>>>>>>> internal endpoint >>>>>>>>>>> 2022-07-13 21:35:51.566826 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b6 | FATAL | Create identity >>>>>>>>>>> internal endpoint | undercloud | error={"changed": false, "extra_data": >>>>>>>>>>> {"data": null, "details": "The request you have made requires >>>>>>>>>>> authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The >>>>>>>>>>> request you have made requires >>>>>>>>>>> authentication.\",\"title\":\"Unauthorized\"}}\n"}, "msg": "Failed to list >>>>>>>>>>> services: Client Error for url: >>>>>>>>>>> https://[fd00:fd00:fd00:9900::81]:13000/v3/services, The >>>>>>>>>>> request you have made requires authentication."} >>>>>>>>>>> 2022-07-13 21:35:51.568473 | >>>>>>>>>>> 525400ae-089b-870a-fab6-0000000072b6 | TIMING | >>>>>>>>>>> tripleo_keystone_resources : Create identity internal endpoint | undercloud >>>>>>>>>>> | 0:11:50.660654 | 2.41s >>>>>>>>>>> >>>>>>>>>>> PLAY RECAP >>>>>>>>>>> ********************************************************************* >>>>>>>>>>> localhost : ok=1 changed=0 unreachable=0 >>>>>>>>>>> failed=0 skipped=2 rescued=0 ignored=0 >>>>>>>>>>> overcloud-controller-0 : ok=437 changed=103 unreachable=0 >>>>>>>>>>> failed=0 skipped=214 rescued=0 ignored=0 >>>>>>>>>>> overcloud-controller-1 : ok=435 changed=101 unreachable=0 >>>>>>>>>>> failed=0 skipped=214 rescued=0 ignored=0 >>>>>>>>>>> overcloud-controller-2 : ok=432 changed=101 unreachable=0 >>>>>>>>>>> failed=0 skipped=214 rescued=0 ignored=0 >>>>>>>>>>> overcloud-novacompute-0 : ok=345 changed=82 unreachable=0 >>>>>>>>>>> failed=0 skipped=198 rescued=0 ignored=0 >>>>>>>>>>> undercloud : ok=39 changed=7 unreachable=0 >>>>>>>>>>> failed=1 skipped=6 rescued=0 ignored=0 >>>>>>>>>>> >>>>>>>>>>> Also : >>>>>>>>>>> (undercloud) [stack at undercloud oc-cert]$ cat server.csr.cnf >>>>>>>>>>> [req] >>>>>>>>>>> default_bits = 2048 >>>>>>>>>>> prompt = no >>>>>>>>>>> default_md = sha256 >>>>>>>>>>> distinguished_name = dn >>>>>>>>>>> [dn] >>>>>>>>>>> C=IN >>>>>>>>>>> ST=UTTAR PRADESH >>>>>>>>>>> L=NOIDA >>>>>>>>>>> O=HSC >>>>>>>>>>> OU=HSC >>>>>>>>>>> emailAddress=demo at demo.com >>>>>>>>>>> >>>>>>>>>>> v3.ext: >>>>>>>>>>> (undercloud) [stack at undercloud oc-cert]$ cat v3.ext >>>>>>>>>>> authorityKeyIdentifier=keyid,issuer >>>>>>>>>>> basicConstraints=CA:FALSE >>>>>>>>>>> keyUsage = digitalSignature, nonRepudiation, keyEncipherment, >>>>>>>>>>> dataEncipherment >>>>>>>>>>> subjectAltName = @alt_names >>>>>>>>>>> [alt_names] >>>>>>>>>>> IP.1=fd00:fd00:fd00:9900::81 >>>>>>>>>>> >>>>>>>>>>> Using these files we create other certificates. >>>>>>>>>>> Please check and let me know in case we need anything else. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jul 13, 2022 at 10:00 PM Vikarna Tathe < >>>>>>>>>>> vikarnatathe at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Lokendra, >>>>>>>>>>>> >>>>>>>>>>>> Are you able to access all the tabs in the OpenStack dashboard >>>>>>>>>>>> without any error? If not, please retry generating the certificate. Also, >>>>>>>>>>>> share the openssl.cnf or server.cnf. 
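For reference, a sketch of what the v3.ext SAN section could look like once it covers both the public FQDN and the VIP address that the endpoints actually use, combining the two values already quoted in this thread (overcloud-hsc.com and fd00:fd00:fd00:9900::81 are the thread's own examples, not recommendations):

authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
subjectAltName = @alt_names
[alt_names]
DNS.1 = overcloud-hsc.com
IP.1 = fd00:fd00:fd00:9900::81

With both entries present (and CN still set in server.csr.cnf), a certificate re-signed from this file should at least no longer trigger the hostname-mismatch CertificateError seen earlier purely because the VIP address is missing from the SANs, assuming the public endpoint really is addressed by that IP.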
>>>>>>>>>>>> >>>>>>>>>>>> On Wed, 13 Jul 2022 at 18:18, Lokendra Rathour < >>>>>>>>>>>> lokendrarathour at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Team, >>>>>>>>>>>>> Any input on this case raised. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Lokendra >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Jul 12, 2022 at 10:18 PM Lokendra Rathour < >>>>>>>>>>>>> lokendrarathour at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Shephard/Swogat, >>>>>>>>>>>>>> I tried changing the setting as suggested and it looks like >>>>>>>>>>>>>> it has failed at step 4 with error: >>>>>>>>>>>>>> >>>>>>>>>>>>>> :31:32.169420 | 525400ae-089b-fb79-67ac-0000000072ce | >>>>>>>>>>>>>> TIMING | tripleo_keystone_resources : Create identity public endpoint | >>>>>>>>>>>>>> undercloud | 0:24:47.736198 | 2.21s >>>>>>>>>>>>>> 2022-07-12 21:31:32.185594 | >>>>>>>>>>>>>> 525400ae-089b-fb79-67ac-0000000072cf | TASK | Create identity >>>>>>>>>>>>>> internal endpoint >>>>>>>>>>>>>> 2022-07-12 21:31:34.468996 | >>>>>>>>>>>>>> 525400ae-089b-fb79-67ac-0000000072cf | FATAL | Create identity >>>>>>>>>>>>>> internal endpoint | undercloud | error={"changed": false, "extra_data": >>>>>>>>>>>>>> {"data": null, "details": "The request you have made requires >>>>>>>>>>>>>> authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The >>>>>>>>>>>>>> request you have made requires >>>>>>>>>>>>>> authentication.\",\"title\":\"Unauthorized\"}}\n"}, "msg": "Failed to list >>>>>>>>>>>>>> services: Client Error for url: >>>>>>>>>>>>>> https://[fd00:fd00:fd00:9900::81]:13000/v3/services, The >>>>>>>>>>>>>> request you have made requires authentication."} >>>>>>>>>>>>>> 2022-07-12 21:31:34.470415 | 525400ae-089b-fb79-67ac-000000 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Checking further the endpoint list: >>>>>>>>>>>>>> I see only one endpoint for keystone is gettin created. >>>>>>>>>>>>>> >>>>>>>>>>>>>> DeprecationWarning >>>>>>>>>>>>>> >>>>>>>>>>>>>> +----------------------------------+-----------+--------------+--------------+---------+-----------+-----------------------------------------+ >>>>>>>>>>>>>> | ID | Region | Service Name >>>>>>>>>>>>>> | Service Type | Enabled | Interface | URL >>>>>>>>>>>>>> | >>>>>>>>>>>>>> >>>>>>>>>>>>>> +----------------------------------+-----------+--------------+--------------+---------+-----------+-----------------------------------------+ >>>>>>>>>>>>>> | 4378dc0a4d8847ee87771699fc7b995e | regionOne | keystone >>>>>>>>>>>>>> | identity | True | admin | >>>>>>>>>>>>>> http://30.30.30.173:35357 | >>>>>>>>>>>>>> | 67c829e126944431a06ed0c2b97a295f | regionOne | keystone >>>>>>>>>>>>>> | identity | True | internal | >>>>>>>>>>>>>> http://[fd00:fd00:fd00:2000::326]:5000 | >>>>>>>>>>>>>> | 8a9a3de4993c4ff7903caf95b8ae40fa | regionOne | keystone >>>>>>>>>>>>>> | identity | True | public | >>>>>>>>>>>>>> https://[fd00:fd00:fd00:9900::81]:13000 | >>>>>>>>>>>>>> >>>>>>>>>>>>>> +----------------------------------+-----------+--------------+--------------+---------+-----------+-----------------------------------------+ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> it looks like something related to the SSL, we have also >>>>>>>>>>>>>> verified that the GUI login screen shows that Certificates are applied. >>>>>>>>>>>>>> exploring more in logs, meanwhile any suggestions or know >>>>>>>>>>>>>> observation would be of great help. >>>>>>>>>>>>>> thanks again for the support. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>> Lokendra >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Jul 9, 2022 at 11:24 AM Swogat Pradhan < >>>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I had faced a similar kind of issue, for ip based setup you >>>>>>>>>>>>>>> need to specify the domain name as the ip that you are going to use, this >>>>>>>>>>>>>>> error is showing up because the ssl is ip based but the fqdns seems to be >>>>>>>>>>>>>>> undercloud.com or overcloud.example.com. >>>>>>>>>>>>>>> I think for undercloud you can change the undercloud.conf. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> And will it work if we specify clouddomain parameter to the >>>>>>>>>>>>>>> IP address for overcloud? because it seems he has not specified the >>>>>>>>>>>>>>> clouddomain parameter and overcloud.example.com is the >>>>>>>>>>>>>>> default domain for overcloud.example.com. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, 8 Jul 2022, 6:01 pm Swogat Pradhan, < >>>>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What is the domain name you have specified in the >>>>>>>>>>>>>>>> undercloud.conf file? >>>>>>>>>>>>>>>> And what is the fqdn name used for the generation of the >>>>>>>>>>>>>>>> SSL cert? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, 8 Jul 2022, 5:38 pm Lokendra Rathour, < >>>>>>>>>>>>>>>> lokendrarathour at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Team, >>>>>>>>>>>>>>>>> We were trying to install overcloud with SSL enabled for >>>>>>>>>>>>>>>>> which the UC is installed, but OC install is getting failed at step 4: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ERROR >>>>>>>>>>>>>>>>> :nectionPool(host='fd00:fd00:fd00:9900::2ef', >>>>>>>>>>>>>>>>> port=13000): Max retries exceeded with url: / (Caused by >>>>>>>>>>>>>>>>> SSLError(CertificateError(\"hostname 'fd00:fd00:fd00:9900::2ef' doesn't >>>>>>>>>>>>>>>>> match 'undercloud.com'\",),))\n", "module_stdout": "", >>>>>>>>>>>>>>>>> "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1} >>>>>>>>>>>>>>>>> 2022-07-08 17:03:23.606739 | >>>>>>>>>>>>>>>>> 5254009a-6a3c-adb1-f96f-0000000072ac | FATAL | Clean up legacy Cinder >>>>>>>>>>>>>>>>> keystone catalog entries | undercloud | item={'service_name': 'cinderv3', >>>>>>>>>>>>>>>>> 'service_type': 'volume'} | error={"ansible_index_var": >>>>>>>>>>>>>>>>> "cinder_api_service", "ansible_loop_var": "item", "changed": false, >>>>>>>>>>>>>>>>> "cinder_api_service": 1, "item": {"service_name": "cinderv3", >>>>>>>>>>>>>>>>> "service_type": "volume"}, "module_stderr": "Failed to discover available >>>>>>>>>>>>>>>>> identity versions when contacting >>>>>>>>>>>>>>>>> https://[fd00:fd00:fd00:9900::2ef]:13000. 
Attempting to >>>>>>>>>>>>>>>>> parse version from URL.\nTraceback (most recent call last):\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/urllib3/connectionpool.py\", line 600, >>>>>>>>>>>>>>>>> in urlopen\n chunked=chunked)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/urllib3/connectionpool.py\", line 343, >>>>>>>>>>>>>>>>> in _make_request\n self._validate_conn(conn)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/urllib3/connectionpool.py\", line 839, >>>>>>>>>>>>>>>>> in _validate_conn\n conn.connect()\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/urllib3/connection.py\", line 378, in >>>>>>>>>>>>>>>>> connect\n _match_hostname(cert, self.assert_hostname or >>>>>>>>>>>>>>>>> server_hostname)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/urllib3/connection.py\", line 388, in >>>>>>>>>>>>>>>>> _match_hostname\n match_hostname(cert, asserted_hostname)\n File >>>>>>>>>>>>>>>>> \"/usr/lib64/python3.6/ssl.py\", line 291, in match_hostname\n % >>>>>>>>>>>>>>>>> (hostname, dnsnames[0]))\nssl.CertificateError: hostname >>>>>>>>>>>>>>>>> 'fd00:fd00:fd00:9900::2ef' doesn't match 'undercloud.com'\n\nDuring >>>>>>>>>>>>>>>>> handling of the above exception, another exception occurred:\n\nTraceback >>>>>>>>>>>>>>>>> (most recent call last):\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/requests/adapters.py\", line 449, in >>>>>>>>>>>>>>>>> send\n timeout=timeout\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/urllib3/connectionpool.py\", line 638, >>>>>>>>>>>>>>>>> in urlopen\n _stacktrace=sys.exc_info()[2])\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/urllib3/util/retry.py\", line 399, in >>>>>>>>>>>>>>>>> increment\n raise MaxRetryError(_pool, url, error or >>>>>>>>>>>>>>>>> ResponseError(cause))\nurllib3.exceptions.MaxRetryError: >>>>>>>>>>>>>>>>> HTTPSConnectionPool(host='fd00:fd00:fd00:9900::2ef', port=13000): Max >>>>>>>>>>>>>>>>> retries exceeded with url: / (Caused by >>>>>>>>>>>>>>>>> SSLError(CertificateError(\"hostname 'fd00:fd00:fd00:9900::2ef' doesn't >>>>>>>>>>>>>>>>> match 'undercloud.com'\",),))\n\nDuring handling of the >>>>>>>>>>>>>>>>> above exception, another exception occurred:\n\nTraceback (most recent call >>>>>>>>>>>>>>>>> last):\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 1022, >>>>>>>>>>>>>>>>> in _send_request\n resp = self.session.request(method, url, **kwargs)\n >>>>>>>>>>>>>>>>> File \"/usr/lib/python3.6/site-packages/requests/sessions.py\", line 533, >>>>>>>>>>>>>>>>> in request\n resp = self.send(prep, **send_kwargs)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/requests/sessions.py\", line 646, in >>>>>>>>>>>>>>>>> send\n r = adapter.send(request, **kwargs)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/requests/adapters.py\", line 514, in >>>>>>>>>>>>>>>>> send\n raise SSLError(e, request=request)\nrequests.exceptions.SSLError: >>>>>>>>>>>>>>>>> HTTPSConnectionPool(host='fd00:fd00:fd00:9900::2ef', port=13000): Max >>>>>>>>>>>>>>>>> retries exceeded with url: / (Caused by >>>>>>>>>>>>>>>>> SSLError(CertificateError(\"hostname 'fd00:fd00:fd00:9900::2ef' doesn't >>>>>>>>>>>>>>>>> match 'undercloud.com'\",),))\n\nDuring handling of the >>>>>>>>>>>>>>>>> above exception, another exception occurred:\n\nTraceback (most recent call >>>>>>>>>>>>>>>>> last):\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py\", >>>>>>>>>>>>>>>>> line 138, in 
_do_create_plugin\n authenticated=False)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py\", line >>>>>>>>>>>>>>>>> 610, in get_discovery\n authenticated=authenticated)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/discover.py\", line 1452, >>>>>>>>>>>>>>>>> in get_discovery\n disc = Discover(session, url, >>>>>>>>>>>>>>>>> authenticated=authenticated)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/discover.py\", line 536, >>>>>>>>>>>>>>>>> in __init__\n authenticated=authenticated)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/discover.py\", line 102, >>>>>>>>>>>>>>>>> in get_version_data\n resp = session.get(url, headers=headers, >>>>>>>>>>>>>>>>> authenticated=authenticated)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 1141, >>>>>>>>>>>>>>>>> in get\n return self.request(url, 'GET', **kwargs)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 931, in >>>>>>>>>>>>>>>>> request\n resp = send(**kwargs)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 1026, >>>>>>>>>>>>>>>>> in _send_request\n raise >>>>>>>>>>>>>>>>> exceptions.SSLError(msg)\nkeystoneauth1.exceptions.connection.SSLError: SSL >>>>>>>>>>>>>>>>> exception connecting to >>>>>>>>>>>>>>>>> https://[fd00:fd00:fd00:9900::2ef]:13000: >>>>>>>>>>>>>>>>> HTTPSConnectionPool(host='fd00:fd00:fd00:9900::2ef', port=13000): Max >>>>>>>>>>>>>>>>> retries exceeded with url: / (Caused by >>>>>>>>>>>>>>>>> SSLError(CertificateError(\"hostname 'fd00:fd00:fd00:9900::2ef' doesn't >>>>>>>>>>>>>>>>> match 'undercloud.com'\",),))\n\nDuring handling of the >>>>>>>>>>>>>>>>> above exception, another exception occurred:\n\nTraceback (most recent call >>>>>>>>>>>>>>>>> last):\n File \"\", line 102, in \n File \"\", line >>>>>>>>>>>>>>>>> 94, in _ansiballz_main\n File \"\", line 40, in invoke_module\n >>>>>>>>>>>>>>>>> File \"/usr/lib64/python3.6/runpy.py\", line 205, in run_module\n >>>>>>>>>>>>>>>>> return _run_module_code(code, init_globals, run_name, mod_spec)\n File >>>>>>>>>>>>>>>>> \"/usr/lib64/python3.6/runpy.py\", line 96, in _run_module_code\n >>>>>>>>>>>>>>>>> mod_name, mod_spec, pkg_name, script_name)\n File >>>>>>>>>>>>>>>>> \"/usr/lib64/python3.6/runpy.py\", line 85, in _run_code\n exec(code, >>>>>>>>>>>>>>>>> run_globals)\n File >>>>>>>>>>>>>>>>> \"/tmp/ansible_openstack.cloud.catalog_service_payload_7ikyjf7t/ansible_openstack.cloud.catalog_service_payload.zip/ansible_collections/openstack/cloud/plugins/modules/catalog_service.py\", >>>>>>>>>>>>>>>>> line 185, in \n File >>>>>>>>>>>>>>>>> \"/tmp/ansible_openstack.cloud.catalog_service_payload_7ikyjf7t/ansible_openstack.cloud.catalog_service_payload.zip/ansible_collections/openstack/cloud/plugins/modules/catalog_service.py\", >>>>>>>>>>>>>>>>> line 181, in main\n File >>>>>>>>>>>>>>>>> \"/tmp/ansible_openstack.cloud.catalog_service_payload_7ikyjf7t/ansible_openstack.cloud.catalog_service_payload.zip/ansible_collections/openstack/cloud/plugins/module_utils/openstack.py\", >>>>>>>>>>>>>>>>> line 407, in __call__\n File >>>>>>>>>>>>>>>>> \"/tmp/ansible_openstack.cloud.catalog_service_payload_7ikyjf7t/ansible_openstack.cloud.catalog_service_payload.zip/ansible_collections/openstack/cloud/plugins/modules/catalog_service.py\", >>>>>>>>>>>>>>>>> line 141, in run\n File >>>>>>>>>>>>>>>>> 
\"/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py\", line >>>>>>>>>>>>>>>>> 517, in search_services\n services = self.list_services()\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py\", line >>>>>>>>>>>>>>>>> 492, in list_services\n if self._is_client_version('identity', 2):\n >>>>>>>>>>>>>>>>> File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/openstack/cloud/openstackcloud.py\", >>>>>>>>>>>>>>>>> line 460, in _is_client_version\n client = getattr(self, client_name)\n >>>>>>>>>>>>>>>>> File \"/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py\", >>>>>>>>>>>>>>>>> line 32, in _identity_client\n 'identity', min_version=2, >>>>>>>>>>>>>>>>> max_version='3.latest')\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/openstack/cloud/openstackcloud.py\", >>>>>>>>>>>>>>>>> line 407, in _get_versioned_client\n if adapter.get_endpoint():\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py\", line 291, in >>>>>>>>>>>>>>>>> get_endpoint\n return self.session.get_endpoint(auth or self.auth, >>>>>>>>>>>>>>>>> **kwargs)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 1243, >>>>>>>>>>>>>>>>> in get_endpoint\n return auth.get_endpoint(self, **kwargs)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py\", line >>>>>>>>>>>>>>>>> 380, in get_endpoint\n allow_version_hack=allow_version_hack, >>>>>>>>>>>>>>>>> **kwargs)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py\", line >>>>>>>>>>>>>>>>> 271, in get_endpoint_data\n service_catalog = >>>>>>>>>>>>>>>>> self.get_access(session).service_catalog\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py\", line >>>>>>>>>>>>>>>>> 134, in get_access\n self.auth_ref = self.get_auth_ref(session)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py\", >>>>>>>>>>>>>>>>> line 206, in get_auth_ref\n self._plugin = >>>>>>>>>>>>>>>>> self._do_create_plugin(session)\n File >>>>>>>>>>>>>>>>> \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py\", >>>>>>>>>>>>>>>>> line 161, in _do_create_plugin\n 'auth_url is correct. >>>>>>>>>>>>>>>>> %s' % e)\nkeystoneauth1.exceptions.discovery.DiscoveryFailure: Could not >>>>>>>>>>>>>>>>> find versioned identity endpoints when attempting to authenticate. Please >>>>>>>>>>>>>>>>> check that your auth_url is correct. 
SSL exception connecting to >>>>>>>>>>>>>>>>> https://[fd00:fd00:fd00:9900::2ef]:13000: >>>>>>>>>>>>>>>>> HTTPSConnectionPool(host='fd00:fd00:fd00:9900::2ef', port=13000): Max >>>>>>>>>>>>>>>>> retries exceeded with url: / (Caused by >>>>>>>>>>>>>>>>> SSLError(CertificateError(\"hostname 'fd00:fd00:fd00:9900::2ef' doesn't >>>>>>>>>>>>>>>>> match 'overcloud.example.com'\",),))\n", "module_stdout": >>>>>>>>>>>>>>>>> "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1} >>>>>>>>>>>>>>>>> 2022-07-08 17:03:23.609354 | >>>>>>>>>>>>>>>>> 5254009a-6a3c-adb1-f96f-0000000072ac | TIMING | Clean up legacy Cinder >>>>>>>>>>>>>>>>> keystone catalog entries | undercloud | 0:11:01.271914 | 2.47s >>>>>>>>>>>>>>>>> 2022-07-08 17:03:23.611094 | >>>>>>>>>>>>>>>>> 5254009a-6a3c-adb1-f96f-0000000072ac | TIMING | Clean up legacy Cinder >>>>>>>>>>>>>>>>> keystone catalog entries | undercloud | 0:11:01.273659 | 2.47s >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> PLAY RECAP >>>>>>>>>>>>>>>>> ********************************************************************* >>>>>>>>>>>>>>>>> localhost : ok=0 changed=0 >>>>>>>>>>>>>>>>> unreachable=0 failed=0 skipped=2 rescued=0 ignored=0 >>>>>>>>>>>>>>>>> overcloud-controller-0 : ok=437 changed=104 >>>>>>>>>>>>>>>>> unreachable=0 failed=0 skipped=214 rescued=0 ignored=0 >>>>>>>>>>>>>>>>> overcloud-controller-1 : ok=436 changed=101 >>>>>>>>>>>>>>>>> unreachable=0 failed=0 skipped=214 rescued=0 ignored=0 >>>>>>>>>>>>>>>>> overcloud-controller-2 : ok=431 changed=101 >>>>>>>>>>>>>>>>> unreachable=0 failed=0 skipped=214 rescued=0 ignored=0 >>>>>>>>>>>>>>>>> overcloud-novacompute-0 : ok=345 changed=83 >>>>>>>>>>>>>>>>> unreachable=0 failed=0 skipped=198 rescued=0 ignored=0 >>>>>>>>>>>>>>>>> undercloud : ok=28 changed=7 >>>>>>>>>>>>>>>>> unreachable=0 failed=1 skipped=3 rescued=0 ignored=0 >>>>>>>>>>>>>>>>> 2022-07-08 17:03:23.647270 | >>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information >>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>>>>>>>>>>>>>> 2022-07-08 17:03:23.647907 | >>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1373 >>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> in the deploy.sh: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> openstack overcloud deploy --templates \ >>>>>>>>>>>>>>>>> -r /home/stack/templates/roles_data.yaml \ >>>>>>>>>>>>>>>>> --networks-file >>>>>>>>>>>>>>>>> /home/stack/templates/custom_network_data.yaml \ >>>>>>>>>>>>>>>>> --vip-file /home/stack/templates/custom_vip_data.yaml >>>>>>>>>>>>>>>>> \ >>>>>>>>>>>>>>>>> --baremetal-deployment >>>>>>>>>>>>>>>>> /home/stack/templates/overcloud-baremetal-deploy.yaml \ >>>>>>>>>>>>>>>>> --network-config \ >>>>>>>>>>>>>>>>> -e /home/stack/templates/environment.yaml \ >>>>>>>>>>>>>>>>> -e >>>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml >>>>>>>>>>>>>>>>> \ >>>>>>>>>>>>>>>>> -e >>>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml >>>>>>>>>>>>>>>>> \ >>>>>>>>>>>>>>>>> -e >>>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml >>>>>>>>>>>>>>>>> \ >>>>>>>>>>>>>>>>> -e /home/stack/templates/ironic-config.yaml \ >>>>>>>>>>>>>>>>> -e >>>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml >>>>>>>>>>>>>>>>> \ >>>>>>>>>>>>>>>>> -e >>>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/services/ptp.yaml \ 
>>>>>>>>>>>>>>>>> -e >>>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml >>>>>>>>>>>>>>>>> \ >>>>>>>>>>>>>>>>> -e >>>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml >>>>>>>>>>>>>>>>> \ >>>>>>>>>>>>>>>>> -e >>>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor.yaml >>>>>>>>>>>>>>>>> \ >>>>>>>>>>>>>>>>> -e >>>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \ >>>>>>>>>>>>>>>>> -e >>>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \ >>>>>>>>>>>>>>>>> -e /home/stack/containers-prepare-parameter.yaml >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Addition lines as highlighted in yellow were passed with >>>>>>>>>>>>>>>>> modifications: >>>>>>>>>>>>>>>>> tls-endpoints-public-ip.yaml: >>>>>>>>>>>>>>>>> Passed as is in the defaults. >>>>>>>>>>>>>>>>> enable-tls.yaml: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # >>>>>>>>>>>>>>>>> ******************************************************************* >>>>>>>>>>>>>>>>> # This file was created automatically by the sample >>>>>>>>>>>>>>>>> environment >>>>>>>>>>>>>>>>> # generator. Developers should use `tox -e genconfig` to >>>>>>>>>>>>>>>>> update it. >>>>>>>>>>>>>>>>> # Users are recommended to make changes to a copy of the >>>>>>>>>>>>>>>>> file instead >>>>>>>>>>>>>>>>> # of the original, if any customizations are needed. >>>>>>>>>>>>>>>>> # >>>>>>>>>>>>>>>>> ******************************************************************* >>>>>>>>>>>>>>>>> # title: Enable SSL on OpenStack Public Endpoints >>>>>>>>>>>>>>>>> # description: | >>>>>>>>>>>>>>>>> # Use this environment to pass in certificates for SSL >>>>>>>>>>>>>>>>> deployments. >>>>>>>>>>>>>>>>> # For these values to take effect, one of the >>>>>>>>>>>>>>>>> tls-endpoints-*.yaml >>>>>>>>>>>>>>>>> # environments must also be used. >>>>>>>>>>>>>>>>> parameter_defaults: >>>>>>>>>>>>>>>>> # Set CSRF_COOKIE_SECURE / SESSION_COOKIE_SECURE in >>>>>>>>>>>>>>>>> Horizon >>>>>>>>>>>>>>>>> # Type: boolean >>>>>>>>>>>>>>>>> HorizonSecureCookies: True >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # Specifies the default CA cert to use if TLS is used >>>>>>>>>>>>>>>>> for services in the public network. >>>>>>>>>>>>>>>>> # Type: string >>>>>>>>>>>>>>>>> PublicTLSCAFile: >>>>>>>>>>>>>>>>> '/etc/pki/ca-trust/source/anchors/overcloud-cacert.pem' >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # The content of the SSL certificate (without Key) in >>>>>>>>>>>>>>>>> PEM format. >>>>>>>>>>>>>>>>> # Type: string >>>>>>>>>>>>>>>>> SSLRootCertificate: | >>>>>>>>>>>>>>>>> -----BEGIN CERTIFICATE----- >>>>>>>>>>>>>>>>> ----*** CERTICATELINES TRIMMED ** >>>>>>>>>>>>>>>>> -----END CERTIFICATE----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> SSLCertificate: | >>>>>>>>>>>>>>>>> -----BEGIN CERTIFICATE----- >>>>>>>>>>>>>>>>> ----*** CERTICATELINES TRIMMED ** >>>>>>>>>>>>>>>>> -----END CERTIFICATE----- >>>>>>>>>>>>>>>>> # The content of an SSL intermediate CA certificate in >>>>>>>>>>>>>>>>> PEM format. >>>>>>>>>>>>>>>>> # Type: string >>>>>>>>>>>>>>>>> SSLIntermediateCertificate: '' >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # The content of the SSL Key in PEM format. 
>>>>>>>>>>>>>>>>> # Type: string >>>>>>>>>>>>>>>>> SSLKey: | >>>>>>>>>>>>>>>>> -----BEGIN PRIVATE KEY----- >>>>>>>>>>>>>>>>> ----*** CERTICATELINES TRIMMED ** >>>>>>>>>>>>>>>>> -----END PRIVATE KEY----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # ****************************************************** >>>>>>>>>>>>>>>>> # Static parameters - these are values that must be >>>>>>>>>>>>>>>>> # included in the environment but should not be changed. >>>>>>>>>>>>>>>>> # ****************************************************** >>>>>>>>>>>>>>>>> # The filepath of the certificate as it will be stored >>>>>>>>>>>>>>>>> in the controller. >>>>>>>>>>>>>>>>> # Type: string >>>>>>>>>>>>>>>>> DeployedSSLCertificatePath: >>>>>>>>>>>>>>>>> /etc/pki/tls/private/overcloud_endpoint.pem >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # ********************* >>>>>>>>>>>>>>>>> # End static parameters >>>>>>>>>>>>>>>>> # ********************* >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> inject-trust-anchor.yaml >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # >>>>>>>>>>>>>>>>> ******************************************************************* >>>>>>>>>>>>>>>>> # This file was created automatically by the sample >>>>>>>>>>>>>>>>> environment >>>>>>>>>>>>>>>>> # generator. Developers should use `tox -e genconfig` to >>>>>>>>>>>>>>>>> update it. >>>>>>>>>>>>>>>>> # Users are recommended to make changes to a copy of the >>>>>>>>>>>>>>>>> file instead >>>>>>>>>>>>>>>>> # of the original, if any customizations are needed. >>>>>>>>>>>>>>>>> # >>>>>>>>>>>>>>>>> ******************************************************************* >>>>>>>>>>>>>>>>> # title: Inject SSL Trust Anchor on Overcloud Nodes >>>>>>>>>>>>>>>>> # description: | >>>>>>>>>>>>>>>>> # When using an SSL certificate signed by a CA that is >>>>>>>>>>>>>>>>> not in the default >>>>>>>>>>>>>>>>> # list of CAs, this environment allows adding a custom >>>>>>>>>>>>>>>>> CA certificate to >>>>>>>>>>>>>>>>> # the overcloud nodes. >>>>>>>>>>>>>>>>> parameter_defaults: >>>>>>>>>>>>>>>>> # The content of a CA's SSL certificate file in PEM >>>>>>>>>>>>>>>>> format. This is evaluated on the client side. >>>>>>>>>>>>>>>>> # Mandatory. This parameter must be set by the user. >>>>>>>>>>>>>>>>> # Type: string >>>>>>>>>>>>>>>>> SSLRootCertificate: | >>>>>>>>>>>>>>>>> -----BEGIN CERTIFICATE----- >>>>>>>>>>>>>>>>> ----*** CERTICATELINES TRIMMED ** >>>>>>>>>>>>>>>>> -----END CERTIFICATE----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> resource_registry: >>>>>>>>>>>>>>>>> OS::TripleO::NodeTLSCAData: >>>>>>>>>>>>>>>>> ../../puppet/extraconfig/tls/ca-inject.yaml >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The procedure to create such files was followed using: >>>>>>>>>>>>>>>>> Deploying with SSL ? TripleO 3.0.0 documentation >>>>>>>>>>>>>>>>> (openstack.org) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Idea is to deploy overcloud with SSL enabled i.e* Self-signed >>>>>>>>>>>>>>>>> IP-based certificate, without DNS. * >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Any idea around this error would be of great help. 
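For what it is worth, the traceback above boils down to a certificate name mismatch: the keystone endpoint is reached by its raw IPv6 VIP (fd00:fd00:fd00:9900::2ef), but the self-signed certificate only carries a DNS name, so validation fails. A rough way to confirm what the certificate actually covers, plus an illustrative (not the exact TripleO procedure) way to create a self-signed certificate that also carries the VIP as an IP subjectAltName, assuming an openssl new enough to support -addext:

  # show the names/IPs the current certificate is valid for
  openssl x509 -in overcloud-cacert.pem -noout -text | grep -A1 'Subject Alternative Name'

  # illustrative self-signed certificate carrying the public VIP as an IP SAN
  # (file names, key size and validity here are placeholders only)
  openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
    -keyout overcloud-key.pem -out overcloud-cacert.pem \
    -subj '/CN=overcloud.example.com' \
    -addext 'subjectAltName = DNS:overcloud.example.com, IP:fd00:fd00:fd00:9900::2ef'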
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> skype: lokendrarathour >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> ~ Lokendra >>>>>>>>>>> skype: lokendrarathour >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> ~ Lokendra >>>>>>>> skype: lokendrarathour >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> -- >>>>>> ~ Lokendra >>>>>> skype: lokendrarathour >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> ~ Lokendra >>>>> skype: lokendrarathour >>>>> >>>>> >>>>> >>> >>> -- >>> ~ Lokendra >>> skype: lokendrarathour >>> >>> >>> >> >> -- >> ~ Lokendra >> skype: lokendrarathour >> >> >> >> > > -- > ~ Lokendra > skype: lokendrarathour > > > -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkchn.in at gmail.com Tue Aug 2 11:46:50 2022 From: kkchn.in at gmail.com (KK CHN) Date: Tue, 2 Aug 2022 17:16:50 +0530 Subject: Horizon dashboard query In-Reply-To: References: Message-ID: Does this need to be a separate web application? or Customizing horizon dashboard ? Which is feasible ? On Tue, Aug 2, 2022 at 1:06 PM Amit Uniyal wrote: > Hello, > > You want to automate and provide a web interface solution. > > The simplest way would be to create a new web application (which can have > only 2-3 pages) with an input form asking for VM details and usage. > > 1. Get all info in json format, update it as per available image, flavor > details. > > Now you have all details of request, you can add an approval system here > manual/automated (as per usage and quota assigned). > > 2. Convert this to a heat template, upload to swift(for future reference), > and call heat api. > Why heat ? > It will allow you to create n+ number of VM at once, for example can > create a full lab, having different instance flavor on different networks. > > 3. Update VM deployment status and access info back in the web application. > > Tools: > web app: Django or node js > > Docs: > https://docs.openstack.org/api-ref/orchestration/v1/ > https://docs.openstack.org/heat/rocky/template_guide/hot_guide.html > https://docs.openstack.org/ocata/cli-reference/heat.html#heat-stack-create > > Regards > > On Tue, Aug 2, 2022 at 9:14 AM KK CHN wrote: > >> Hi everyone! >> I need a guide or advice or anything. >> >> I am administering a private cloud in openstack. As I am using the >> Horizon dashboard for the VM provisioning and administering activities. >> But the users are sending requests in mail and creating the VMs for them. >> can we make workflow automation for the request and approval and >> provisioning automatically in the Horizon dashboard on approval? >> >> Has anyone already done /have idea on these kinds of solutions for your >> openstack cloud ??. Is there some kind of documentation that could help >> me?. >> >> Any guidance much appreciated, where to start, what to refer and which >> tool/programming language best if I need to code from scratch. >> >> thanks in advance. >> Krish >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Aug 2 11:55:54 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 2 Aug 2022 11:55:54 +0000 Subject: [DIB][diskimage-builder] Rocky Linux image build method In-Reply-To: References: Message-ID: <20220802115553.wdjf2exi7ve7yq6t@yuggoth.org> On 2022-08-02 12:17:58 +1000 (+1000), Ian Wienand wrote: [...] 
> In terms of having your dependencies available in final images, > nothing really beats specifying them explicitly. Something like [1]. > I think I'd encourage this rather than trying to add another platform > to support in dib. [...] And, just to be clear, in OpenDev our goal is to have only the things necessary to start the VM and connect with Ansible. We do also cache some expensive-to-retrieve things like Git repositories and files nearly every job is otherwise going to try to fetch over the network into these images, in order to improve job startup times and stability, but job-specific or project-specific dependencies should be explicitly installed by the jobs themselves. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From apetrich at redhat.com Tue Aug 2 12:34:43 2022 From: apetrich at redhat.com (Adriano Petrich) Date: Tue, 2 Aug 2022 14:34:43 +0200 Subject: Horizon dashboard query In-Reply-To: References: Message-ID: Another option is what Cern does that uses a mistral workflow to do something somewhat similar to that. Users can request resources and a mistral workflow provisions them. Here is the repo for some of their workflows. You can probably find a more detailed infrastructure description on one of their talks. https://gitlab.cern.ch/cloud-infrastructure/mistral-workflows Cheers, Adriano On Tue, 2 Aug 2022 at 14:05, KK CHN wrote: > Does this need to be a separate web application? or Customizing horizon > dashboard ? Which is feasible ? > > On Tue, Aug 2, 2022 at 1:06 PM Amit Uniyal wrote: > >> Hello, >> >> You want to automate and provide a web interface solution. >> >> The simplest way would be to create a new web application (which can have >> only 2-3 pages) with an input form asking for VM details and usage. >> >> 1. Get all info in json format, update it as per available image, flavor >> details. >> >> Now you have all details of request, you can add an approval system here >> manual/automated (as per usage and quota assigned). >> >> 2. Convert this to a heat template, upload to swift(for future >> reference), and call heat api. >> Why heat ? >> It will allow you to create n+ number of VM at once, for example can >> create a full lab, having different instance flavor on different networks. >> >> 3. Update VM deployment status and access info back in the web >> application. >> >> Tools: >> web app: Django or node js >> >> Docs: >> https://docs.openstack.org/api-ref/orchestration/v1/ >> https://docs.openstack.org/heat/rocky/template_guide/hot_guide.html >> https://docs.openstack.org/ocata/cli-reference/heat.html#heat-stack-create >> >> Regards >> >> On Tue, Aug 2, 2022 at 9:14 AM KK CHN wrote: >> >>> Hi everyone! >>> I need a guide or advice or anything. >>> >>> I am administering a private cloud in openstack. As I am using the >>> Horizon dashboard for the VM provisioning and administering activities. >>> But the users are sending requests in mail and creating the VMs for them. >>> can we make workflow automation for the request and approval and >>> provisioning automatically in the Horizon dashboard on approval? >>> >>> Has anyone already done /have idea on these kinds of solutions for your >>> openstack cloud ??. Is there some kind of documentation that could help >>> me?. >>> >>> Any guidance much appreciated, where to start, what to refer and which >>> tool/programming language best if I need to code from scratch. 
>>>
>>> thanks in advance.
>>> Krish
>>>
>>

-- 

Adriano Vieira Petrich
Software Engineer
He/Him/His
Red Hat GmbH, Registered seat: Werner von Siemens Ring 14, D-85630 Grasbrunn, Germany
Commercial register: Amtsgericht Muenchen/Munich, HRB 153243, Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From smooney at redhat.com  Tue Aug  2 12:35:24 2022
From: smooney at redhat.com (Sean Mooney)
Date: Tue, 02 Aug 2022 13:35:24 +0100
Subject: Horizon dashboard query
In-Reply-To: <20220802085807.Horde.ImuuQV5l-gxRWtDua4lN2OX@webmail.nde.ag>
Message-ID: 

On Tue, 2022-08-02 at 17:13 +0530, KK CHN wrote:
> The use case is, the user requests need to be controlled. Cannot provide
> each project for each user. Even Though your infrastructure is large
> enough to support each project for each user ( for a large user base
> automation with approval/rejection like this is a fair practice.)

Normal users will only see the projects they are a member of, not all
projects, so this sounds like an admin use case rather than an end-user
use case, correct?
> 
> 
> On Tue, Aug 2, 2022 at 2:33 PM Eugen Block wrote:
> 
> > I don't have any suggestions for the automation here but a question.
> > Couldn't you just give each user a project so they can login and
> > create their instances themselves?
> > 
> > 
> > Zitat von Amit Uniyal :
> > 
> > > Hello,
> > > 
> > > You want to automate and provide a web interface solution.
> > > 
> > > The simplest way would be to create a new web application (which can have
> > > only 2-3 pages) with an input form asking for VM details and usage.
> > > 
> > > 1. Get all info in json format, update it as per available image, flavor
> > > details.
> > > 
> > > Now you have all details of request, you can add an approval system here
> > > manual/automated (as per usage and quota assigned).
> > > 
> > > 2. Convert this to a heat template, upload to swift(for future
> > reference),
> > > and call heat api.
> > > Why heat ?
> > > It will allow you to create n+ number of VM at once, for example can
> > > create a full lab, having different instance flavor on different
> > networks.
> > > 
> > > 3. Update VM deployment status and access info back in the web
> > application.
> > > 
> > > Tools:
> > > web app: Django or node js
> > > 
> > > Docs:
> > > https://docs.openstack.org/api-ref/orchestration/v1/
> > > https://docs.openstack.org/heat/rocky/template_guide/hot_guide.html
> > > 
> > https://docs.openstack.org/ocata/cli-reference/heat.html#heat-stack-create
> > > 
> > > Regards
> > > 
> > > On Tue, Aug 2, 2022 at 9:14 AM KK CHN wrote:
> > > 
> > > > Hi everyone!
> > > > I need a guide or advice or anything.
> > > > 
> > > > I am administering a private cloud in openstack. As I am using the
> > Horizon
> > > > dashboard for the VM provisioning and administering activities.
> > > > But the users are sending requests in mail and creating the VMs for
> > them.
> > > > can we make workflow automation for the request and approval and
> > > > provisioning automatically in the Horizon dashboard on approval?
> > > > 
> > > > Has anyone already done /have idea on these kinds of solutions for your
> > > > openstack cloud ??. Is there some kind of documentation that could help
> > > > me?.
> > > > > > > > Any guidance much appreciated, where to start, what to refer and which > > > > tool/programming language best if I need to code from scratch. > > > > > > > > thanks in advance. > > > > Krish > > > > > > > > > > > > > > From ueha.ayumu at fujitsu.com Tue Aug 2 12:44:01 2022 From: ueha.ayumu at fujitsu.com (ueha.ayumu at fujitsu.com) Date: Tue, 2 Aug 2022 12:44:01 +0000 Subject: [ceilometer][gnocchi][tacker] Internal Server Error in devstack for Zuul gate job In-Reply-To: References: Message-ID: Hi Radek Thanks for your information! Umm.. When I checked the log, the following workarounds was suggested. ----------- If you cannot immediately regenerate your protos, some other possible workarounds are: 1. Downgrade the protobuf package to 3.20.x or lower. 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower). More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates ----------- The trigger seems to be that requirements upgraded the version of protobuf from 3.20.1 to 4.21.X. https://opendev.org/openstack/requirements/commit/0f0b7024ece8a47316e4d9775f09f0e8d53b4edb Anyway, we will try workaround "2" as a temporary fix and wait for the telemetry team to solve this problem. Thank you for your help! Best Regards, Ueha -----Original Message----- From: Rados?aw Piliszek Sent: Tuesday, August 2, 2022 7:03 PM To: Ueha, Ayumu/?? ? Cc: openstack-discuss at lists.openstack.org Subject: Re: [ceilometer][gnocchi][tacker] Internal Server Error in devstack for Zuul gate job Hi Ueha, It seems gnocchi is failing and requires a regeneration of the protobuf client: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_712/851478/2/check/tacker-functional-devstack-multinode-sol/71262e6/controller-tacker/logs/screen-gnocchi-api.txt --- Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 devstack at gnocchi-api.service[61040]: File "/usr/local/lib/python3.8/dist-packages/google/protobuf/descriptor.py", line 755, in __new__ Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 devstack at gnocchi-api.service[61040]: _message.Message._CheckCalledFromGeneratedFile() Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 devstack at gnocchi-api.service[61040]: TypeError: Descriptors cannot not be created directly. Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 devstack at gnocchi-api.service[61040]: If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. --- Kind regards, Radek -yoctozepto On Tue, 2 Aug 2022 at 11:53, ueha.ayumu at fujitsu.com wrote: > > Hi telemetry team, > > > > I?m Ueha from Tacker team, > > The Zuul gate job of Tacker failed with the following error. Do you know the solution? > > The gate job has failed, so we would appreciate it if you could deal it with high priority. > > Thanks! > > > > for reference, the same error occurring in the ceilometer patch. 
> > (https://review.opendev.org/c/openstack/ceilometer/+/851338 ?s > telemetry-dsvm-integration-centos-9s job) > > > > ------------------ > > ++ /opt/stack/ceilometer/devstack/plugin.sh:start_ceilometer:322 : /usr/local/bin/ceilometer-upgrade > > 2022-07-29 05:20:42.125 61523 DEBUG ceilometer.cmd.storage [-] > Upgrading Gnocchi resource types upgrade > /opt/stack/ceilometer/ceilometer/cmd/storage.py:42 > > 2022-07-29 05:20:42.228 61523 CRITICAL ceilometer [-] Unhandled > error: gnocchiclient.exceptions.ClientException: Internal Server Error > (HTTP 500) > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer Traceback (most recent call last): > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/usr/local/bin/ceilometer-upgrade", line 10, in > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer sys.exit(upgrade()) > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/opt/stack/ceilometer/ceilometer/cmd/storage.py", line 49, in upgrade > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer tenacity.Retrying( > > ......... omit ......... > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/usr/local/lib/python3.8/dist-packages/gnocchiclient/client.py", line 52, in request > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer raise exceptions.from_response(resp, method) > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer > gnocchiclient.exceptions.ClientException: Internal Server Error (HTTP > 500) > > 2022-07-29 05:20:42.228 61523 ERROR ceilometer > > ------------------ > > Full log: > https://zuul.opendev.org/t/openstack/build/71262e66ecf34827a8a3435657a > a9b3f > > > > Best Regards, > > Ueha > > From fungi at yuggoth.org Tue Aug 2 13:40:36 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 2 Aug 2022 13:40:36 +0000 Subject: [security-sig] Cancelling our August meeting Message-ID: <20220802134035.t347pwpiuef364r2@yuggoth.org> Apologies for the late notice, but I have travel for an appointment conflicting with our meeting and will be unable to chair it this month. If someone else wants to run the meeting please feel free, though we don't have any pressing agenda items so it's probably fine to skip. I'm also open to holding the meeting next week instead, if anybody is interested in rescheduling. Barring any ad hoc meetings in the interim, our next scheduled meeting is 15:00 UTC on Thursday, September 1. In the meantime, Security SIG folks can be found in the #openstack-security channel on OFTC or by prepending subjects to this ML with [security-sig] of course! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Tue Aug 2 14:05:01 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 2 Aug 2022 14:05:01 +0000 Subject: [security-sig] Any interest in getting together at the PTG? Message-ID: <20220802140500.vyekwfeuhc3t7wpg@yuggoth.org> I was going to bring this up during the monthly meeting, but since I'll be unavailable the next best solution is to raise it here... Are any of the Security SIG participants planning or at least hoping to be in Columbus for the PTG this October? And if so, is there any interest in carving out some spacetime in the schedule for security discussions? We still have another week to decide, but I'm happy to put in a request for us if anyone wants that. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From vfisarov at redhat.com Tue Aug 2 14:46:05 2022 From: vfisarov at redhat.com (Veronika Fisarova) Date: Tue, 2 Aug 2022 16:46:05 +0200 Subject: [TripleO], Undercloud deployment is failing with with the NTP error Message-ID: Good day everyone, I was trying to deploy TripleO via tripleo-quickstart following this guide . Working environment has satisfactory requirements following specifications in the guide. Python 3.6.8. is installed. The following deployment script was used: bash quickstart.sh --tags all -R wallaby -X --teardown all -v $VIRTHOST and the deployment is failing with the following error: 2022-08-02 13:05:13,693 p=21797 u=root n=ansible | 2022-08-02 13:05:13.692871 | 008acbda-7e58-575b-6e97-00000000078a | TASK | Ensure system is NTP time synced 2022-08-02 13:10:04,147 p=21797 u=root n=ansible | 2022-08-02 13:10:04.146932 | 008acbda-7e58-575b-6e97-00000000078a | FATAL | Ensure system is NTP time synced | undercloud | error={"changed": true, "cmd": ["chronyc", "waitsync", "30"], "delta": "0:04:50.254002", "end": "2022-08-02 13:10:04.121815", "msg": "non-zero return code", "rc": 1, "start": "2022-08-02 13:05:13.867813", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 19, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 21, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 22, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 23, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 24, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 25, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 26, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 27, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 28, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 29, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 30, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, 
refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 7, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 8, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 21, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 22, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 23, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 24, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 25, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 26, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 27, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 28, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 29, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 30, refid: 00000000, correction: 0.000000000, skew: 0.000"]} 2022-08-02 13:10:04,178 p=21797 u=root n=ansible | PLAY RECAP ********************************************************************* 2022-08-02 13:10:04,178 p=21797 u=root n=ansible | localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0 2022-08-02 13:10:04,179 p=21797 u=root n=ansible | undercloud : ok=209 changed=110 unreachable=0 failed=1 skipped=50 rescued=0 ignored=2 For more information, please see the copy of the partially deployed undercloud here and here . As far as I know, I can SSH to the deployed undercloud, but I cannot ensure if the undercloud is working properly and how many functions are missing. If you would need any additional information, please email me or use the IRC nick: deydra Any help or suggestions appreciated. -- *Veronika Fisarova* Openstack Validation Framework - Intern Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Aug 2 14:53:18 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 2 Aug 2022 14:53:18 +0000 Subject: [horizon][security-sig][tc] XStatic and JS dependencies Message-ID: <20220802145317.dimrja6x6hx6ood6@yuggoth.org> We discussed this at the PTG, and sending out a recap has been on my to do list for a few months. Sorry about the delay! The tl;dr here is that OpenStack is effectively serving as the upstream distributor for (often vulnerable versions of) JavaScript software we didn't make, and downstream consumers like GNU/Linux distributions are then directly repackaging our copies of these things, and assuming we've taken care of any security vulnerabilities for them. In the early days, we needed a way to make it possible for Horizon to consume JavaScript libraries installed on systems, typically supplied by Linux distros via system level package management. 
XStatic was and still is a good framework for this, providing a means of bundling up metadata which can be adjusted in deployments to tie in local copies of JS libs so that Horizon can find them, and also allowing us to conveniently use the same sorts of mappings for upstream testing as well. The original idea was that we were shipping XStatic metadata packages, and distributions would take care of supplying the actual JS bits since that's not our specialty as a community. To make our own testing easier, we embedded copies of the actual JS within our XStatic packages, which is where all of this started going horribly wrong. The rationale at the time was that distro package maintainers would know to devendor (unbundle and replace) the JS inside our XStatic packages, substituting standard ones from their distribution. What has happened instead, likely due to a confluence of convenience and lack of clear communication on our part, is that the distros took the entirety of these XStatic packages and redistributed them, JS and all. Why is that bad? Well, our approach to Python dependencies is that we freeze the versions we use at the time of our coordinated release, on the assumption that distros backport any relevant security fixes for them. That's generally okay because we're not shipping those dependencies ourselves, someone else does and takes care of fixing and communicating vulnerabilities in them. For many of the XStatic packages Horizon relies on, however, distros are treating us as the upstream supplier of that software, including the JS bundled inside it. We don't (and really can't reliably) track security vulnerabilities in our dependencies, and JS is no exception in this regard. Unfortunately, when it comes to the outdated versions of JS libs we're redistributing inside those packages, many distros assume we are doing exactly that. Worse, it's not just the Linux distros, but even our own deployment projects following the same pattern. How can we address this? The first and fastest thing we should do is to start clearly communicating that we make no effort to track or fix vulnerabilities for JS inside our XStatic packages. Readme files in all of them should get a big disclaimer added, as visible as possible, that the included "convenience" copies of JS libs in them are not for production use and should be considered insecure, to be replaced at deployment time with actual upstream copies instead (this will be noticed not just when browsing our source code, but for any subsequent releases on PyPI too). Also update any places in our documentation where XStatic packages are mentioned to point this situation out. Ideally, we should reach out to distros and deployment projects we know are installing our copies of the embedded JS libs to make sure they're aware of this problem too. In the long term, we really need to gut the JS content from the source trees of our XStatic packages and make our test jobs use something like NPM or Yarn to fetch the relevant versions at deployment time, installing them into the expected locations pointed to by the XStatic metadata. This will make it more obvious that we're not taking responsibility for the security of those libraries, since we're no longer directly supplying them. I've only got the loosest understanding of how XStatic integration works, and not much familiarity with Horizon or its testing, so am relying on Horizon's contributors to weigh in on whether this course of action is sensible and, more importantly, achievable. 
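To make the long-term idea a little more concrete, the rough shape of what a
test job could do instead of carrying the files in our own trees would be
something like the following sketch (package names, versions and the target
path are purely illustrative, not a worked-out proposal):

  # fetch pinned upstream releases at job/deployment time
  npm install --no-save jquery@3.6.0 bootstrap@3.4.1

  # then copy the fetched files into whatever location the relevant XStatic
  # metadata points Horizon at (the exact target directory depends on the
  # xstatic package and the deployment)
  cp node_modules/jquery/dist/jquery.min.js /path/the/xstatic/metadata/points-to/

The important part is that the JS would then clearly come from its real
upstream rather than from a copy we appear to be maintaining.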
-- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Tue Aug 2 15:05:48 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 2 Aug 2022 15:05:48 +0000 Subject: [kolla][openstack-charms][openstack-chef][openstack-helm][openstackansible][packaging-sig][puppet-openstack][tripleo] XStatic and JS dependencies In-Reply-To: <20220802145317.dimrja6x6hx6ood6@yuggoth.org> References: <20220802145317.dimrja6x6hx6ood6@yuggoth.org> Message-ID: <20220802150548.docnhzb4fywptebk@yuggoth.org> I'm sending this reply separately so I can bring the topic to the attention of all our deployment projects without bloating the subject line of the first post, since it seems like at least some of them are falling into this trap and I'm not sure how to tell which ones (if any) aren't. I've also included the Packaging SIG in order to hopefully reach some of our downstream distribution package maintainers. In short, the XStatic packages we rely on for Horizon's integration of JavaScript libraries include convenience copies of those JS libs which are not to be assumed safe for production use, since we're not the actual authors of that code and are unable address known security vulnerabilities in them. See my longer message for all the details: https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029825.html -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From mrunge at matthias-runge.de Tue Aug 2 17:49:13 2022 From: mrunge at matthias-runge.de (Matthias Runge) Date: Tue, 2 Aug 2022 19:49:13 +0200 Subject: [ceilometer][gnocchi][tacker] Internal Server Error in devstack for Zuul gate job In-Reply-To: References: Message-ID: Hi, For a couple of years, Gnocchi is not anymore part of OpenStack, it is independent[1]. Especially, it is not part of OpenStack Telemetry. As you see, Gnocchi can use some help. Matthias [1] https://julien.danjou.info/gnocchi-independence/ > Am 02.08.2022 um 14:44 schrieb ueha.ayumu at fujitsu.com: > > Hi Radek > > Thanks for your information! > Umm.. When I checked the log, the following workarounds was suggested. > ----------- > If you cannot immediately regenerate your protos, some other possible workarounds are: > 1. Downgrade the protobuf package to 3.20.x or lower. > 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower). > More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates > ----------- > > The trigger seems to be that requirements upgraded the version of protobuf from 3.20.1 to 4.21.X. > https://opendev.org/openstack/requirements/commit/0f0b7024ece8a47316e4d9775f09f0e8d53b4edb > > Anyway, we will try workaround "2" as a temporary fix and wait for the telemetry team to solve this problem. > Thank you for your help! > > Best Regards, > Ueha > > -----Original Message----- > From: Rados?aw Piliszek > Sent: Tuesday, August 2, 2022 7:03 PM > To: Ueha, Ayumu/?? ? 
> Cc: openstack-discuss at lists.openstack.org > Subject: Re: [ceilometer][gnocchi][tacker] Internal Server Error in devstack for Zuul gate job > > Hi Ueha, > > It seems gnocchi is failing and requires a regeneration of the protobuf client: > > https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_712/851478/2/check/tacker-functional-devstack-multinode-sol/71262e6/controller-tacker/logs/screen-gnocchi-api.txt > > --- > > Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 > devstack at gnocchi-api.service[61040]: File > "/usr/local/lib/python3.8/dist-packages/google/protobuf/descriptor.py", > line 755, in __new__ > Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 > devstack at gnocchi-api.service[61040]: > _message.Message._CheckCalledFromGeneratedFile() > Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 > devstack at gnocchi-api.service[61040]: TypeError: Descriptors cannot not be created directly. > Jul 29 05:20:41.105688 ubuntu-focal-ovh-bhs1-0030564625 > devstack at gnocchi-api.service[61040]: If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. > > --- > > Kind regards, > Radek > -yoctozepto > > On Tue, 2 Aug 2022 at 11:53, ueha.ayumu at fujitsu.com wrote: >> >> Hi telemetry team, >> >> >> >> I?m Ueha from Tacker team, >> >> The Zuul gate job of Tacker failed with the following error. Do you know the solution? >> >> The gate job has failed, so we would appreciate it if you could deal it with high priority. >> >> Thanks! >> >> >> >> for reference, the same error occurring in the ceilometer patch. >> >> (https://review.opendev.org/c/openstack/ceilometer/+/851338 ?s >> telemetry-dsvm-integration-centos-9s job) >> >> >> >> ------------------ >> >> ++ /opt/stack/ceilometer/devstack/plugin.sh:start_ceilometer:322 : /usr/local/bin/ceilometer-upgrade >> >> 2022-07-29 05:20:42.125 61523 DEBUG ceilometer.cmd.storage [-] >> Upgrading Gnocchi resource types upgrade >> /opt/stack/ceilometer/ceilometer/cmd/storage.py:42 >> >> 2022-07-29 05:20:42.228 61523 CRITICAL ceilometer [-] Unhandled >> error: gnocchiclient.exceptions.ClientException: Internal Server Error >> (HTTP 500) >> >> 2022-07-29 05:20:42.228 61523 ERROR ceilometer Traceback (most recent call last): >> >> 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/usr/local/bin/ceilometer-upgrade", line 10, in >> >> 2022-07-29 05:20:42.228 61523 ERROR ceilometer sys.exit(upgrade()) >> >> 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/opt/stack/ceilometer/ceilometer/cmd/storage.py", line 49, in upgrade >> >> 2022-07-29 05:20:42.228 61523 ERROR ceilometer tenacity.Retrying( >> >> ......... omit ......... 
>> >> 2022-07-29 05:20:42.228 61523 ERROR ceilometer File "/usr/local/lib/python3.8/dist-packages/gnocchiclient/client.py", line 52, in request >> >> 2022-07-29 05:20:42.228 61523 ERROR ceilometer raise exceptions.from_response(resp, method) >> >> 2022-07-29 05:20:42.228 61523 ERROR ceilometer >> gnocchiclient.exceptions.ClientException: Internal Server Error (HTTP >> 500) >> >> 2022-07-29 05:20:42.228 61523 ERROR ceilometer >> >> ------------------ >> >> Full log: >> https://zuul.opendev.org/t/openstack/build/71262e66ecf34827a8a3435657a >> a9b3f >> >> >> >> Best Regards, >> >> Ueha >> >> From fungi at yuggoth.org Tue Aug 2 17:59:24 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 2 Aug 2022 17:59:24 +0000 Subject: [ceilometer][gnocchi][tacker] Internal Server Error in devstack for Zuul gate job In-Reply-To: References: Message-ID: <20220802175923.hcfs4kmqezifzubr@yuggoth.org> On 2022-08-02 19:49:13 +0200 (+0200), Matthias Runge wrote: > For a couple of years, Gnocchi is not anymore part of OpenStack, > it is independent[1]. Especially, it is not part of OpenStack > Telemetry. As you see, Gnocchi can use some help. [...] In particular, Gnocchi left because its maintainers realized that not being part of OpenStack would help them get more contributors, wider usage, and better long-term maintenance. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From mrunge at matthias-runge.de Tue Aug 2 18:35:10 2022 From: mrunge at matthias-runge.de (Matthias Runge) Date: Tue, 2 Aug 2022 20:35:10 +0200 Subject: [kolla][ops] Anyone using the collectd and/or telegraf? In-Reply-To: References: Message-ID: <428696F7-5918-4736-8261-0B09D72AF82B@matthias-runge.de> Hi, With my collectd upstream hat: do we know why it was removed from ubuntu? What is missing? Matthias > Am 27.07.2022 um 16:35 schrieb Rados?aw Piliszek : > > Hi Kolla-flavoured OpenStackers, > > Any of you using the collectd and/or telegraf with Kolla Ansible? > The core team is looking to deprecate their support as it's not tested > and collectd is now gone from Ubuntu Jammy. > Please reply to this mail. > > Cheers, > Radek > -yoctozepto > From mrunge at matthias-runge.de Tue Aug 2 18:44:38 2022 From: mrunge at matthias-runge.de (Matthias Runge) Date: Tue, 2 Aug 2022 20:44:38 +0200 Subject: [kolla][ops] Anyone using the collectd and/or telegraf? In-Reply-To: <428696F7-5918-4736-8261-0B09D72AF82B@matthias-runge.de> References: <428696F7-5918-4736-8261-0B09D72AF82B@matthias-runge.de> Message-ID: <7E64FABB-35AD-4E9D-9AB4-5D4677737529@matthias-runge.de> Hi, I found https://bugs.launchpad.net/ubuntu/+source/collectd/+bug/1971093 which hints, it could go back in, once liboping (only used for the ping plugin) is added back to ubuntu. For the OpenStack use case, the ping plugin does not give much info anyways. Matthias > Am 02.08.2022 um 20:35 schrieb Matthias Runge : > > Hi, > > With my collectd upstream hat: do we know why it was removed from ubuntu? What is missing? > > Matthias > >> Am 27.07.2022 um 16:35 schrieb Rados?aw Piliszek : >> >> Hi Kolla-flavoured OpenStackers, >> >> Any of you using the collectd and/or telegraf with Kolla Ansible? >> The core team is looking to deprecate their support as it's not tested >> and collectd is now gone from Ubuntu Jammy. >> Please reply to this mail. 
>> >> Cheers, >> Radek >> -yoctozepto >> > From kennelson11 at gmail.com Tue Aug 2 21:24:48 2022 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 2 Aug 2022 16:24:48 -0500 Subject: Reminder! PTG October 2022 Team Signup Deadline Message-ID: Hello Everyone, Don't forget to sign your team up for the next Project Teams Gathering (PTG), which will be held in Columbus, OH from Monday, October 17 to Thursday, October 20th, 2022! If you haven't already done so and your team is interested in participating, please complete the survey[1] by August 12th, 2022 at 7:00 UTC. Then make sure to register[2] for the PTG before prices go up on August 15th! Also, please book in the official hotel block for a discounted rate[3]. Booking in the PTG hotel block helps us keep costs low for all attendees, so please encourage your teams to book here. Thanks! -Kendall (diablo_rojo) [1] Team Survey: https://openinfrafoundation.formstack.com/forms/oct2022_ptg_team_signup [2] PTG Registration: https://openinfra-ptg.eventbrite.com [3] Hotel Block: https://www.hyatt.com/en-US/group-booking/CMHRC/G-L0RT -------------- next part -------------- An HTML attachment was scrubbed... URL: From tony at bakeyournoodle.com Tue Aug 2 23:09:52 2022 From: tony at bakeyournoodle.com (Tony Breeds) Date: Wed, 3 Aug 2022 09:09:52 +1000 Subject: [security-sig] Any interest in getting together at the PTG? In-Reply-To: <20220802140500.vyekwfeuhc3t7wpg@yuggoth.org> References: <20220802140500.vyekwfeuhc3t7wpg@yuggoth.org> Message-ID: On Wed, 3 Aug 2022 at 00:18, Jeremy Stanley wrote: > > I was going to bring this up during the monthly meeting, but since > I'll be unavailable the next best solution is to raise it here... > > Are any of the Security SIG participants planning or at least hoping > to be in Columbus for the PTG this October? And if so, is there any > interest in carving out some spacetime in the schedule for security > discussions? We still have another week to decide, but I'm happy to > put in a request for us if anyone wants that. I'll be at the PTG and keen to "pitch in" with the security-sig. So that's at least 2 people ;P Yours Tony. From fereshtehloghmani at gmail.com Wed Aug 3 09:49:00 2022 From: fereshtehloghmani at gmail.com (fereshteh loghmani) Date: Wed, 3 Aug 2022 14:19:00 +0430 Subject: resize or migrate problem In-Reply-To: <0e10ad6bb29e5c7dc0845abcb6fc2d314b42cf54.camel@redhat.com> References: <0e10ad6bb29e5c7dc0845abcb6fc2d314b42cf54.camel@redhat.com> Message-ID: hello thank you for your response. about your questions: has the image been deleted in glance? - yes my image has been deleted in glance what release of openstack are you using and what storage backend are you using? - openstack_release: "victoria" -i don't use ceph and I use dedicated storage. and the backing file format: raw On Tue, Aug 2, 2022 at 2:37 PM Sean Mooney wrote: > On Tue, 2022-08-02 at 12:33 +0430, fereshteh loghmani wrote: > > hello > > I use OpenStack and I create some VMs in different compute in one region. > > when I resize the VM. if the compute doesn't have enough space, > > automatically the vm will be migrated to another compute. but my problem > is > > here that when it migrated, the _base file doesn't migrate and remains in > > the first compute that it doesn't have enough space. because of this, the > > VM appears with an error status in OpenStack. if I move the _base from > > first compute to final compute and reboot the server this problem has > been > > solved. 
> > so i need to not transfer _base manually and if it migrated to another
> > compute the disk and _base transfer with each other.
> > could you please help me to solve this problem?
>
> the _base file shoudl be automatically copied when requried.
> has the image been deleted in glance?
> what release of openstack are you using and what storage backend are you
> using
> for nova and glance.
> can i assume nova is useing the defaul qcow2 backend?
>
> > thanks in advance
>

From ignaziocassano at gmail.com  Wed Aug 3 10:27:47 2022
From: ignaziocassano at gmail.com (Ignazio Cassano)
Date: Wed, 3 Aug 2022 12:27:47 +0200
Subject: [openstack] how to speed up live migration?
Message-ID: 

Hello All,
I am looking for a solution to speed up live migration.
On instances where RAM is used heavily, like Java application servers, live
migration takes a long time (more than 20 minutes for an 8GB RAM instance),
and converge mode is already set to True in nova.conf.
I also tried with post_copy, but it does not change anything.
After the first live migration (very slow), if I try to migrate again it is
very fast. I presume the first migration is slow because of memory
fragmentation when an instance has been running on the same compute node for
a long time.
I am looking for a solution, considering that on my compute nodes I can have
a little RAM overcommit. In any case I am increasing the number of compute
nodes to reduce it.
Thanks
Ignazio
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From felix.huettner at mail.schwarz  Wed Aug 3 10:55:07 2022
From: felix.huettner at mail.schwarz (Felix Hüttner)
Date: Wed, 3 Aug 2022 10:55:07 +0000
Subject: [openstack] how to speed up live migration?
In-Reply-To: 
References: 
Message-ID: 

Hi Ignazio,

Is it the actual live migration that takes long (e.g. the libvirt migration
you can watch with 'virsh domjobinfo') or the whole live-migration process as
observed by Nova?

We have seen it a few times that the thing that actually takes long is
plugging the neutron port on the target hypervisor (although I think this
only applies to ml2-ovs).
For us this seems to happen because the neutron-openvswitch-agent can take
some time to assemble the firewall rules for the security group of the port
(especially if you use large remote security groups). This would also explain
why migrating back is fast, because the neutron-openvswitch-agent on the
source will have the information cached.

Alternatively you could have multiple live-migrations queued for the same
source hypervisor, but nova only handles them one-by-one (unless you set
max_concurrent_live_migrations).

--
Felix Huettner

From: Ignazio Cassano
Sent: Wednesday, August 3, 2022 12:28 PM
To: openstack-discuss
Subject: [openstack] how to speed up live migration?

Hello All,
I am looking for a solution to speed up live migration.
On instances where RAM is used heavily, like Java application servers, live
migration takes a long time (more than 20 minutes for an 8GB RAM instance),
and converge mode is already set to True in nova.conf.
I also tried with post_copy, but it does not change anything.
After the first live migration (very slow), if I try to migrate again it is
very fast. I presume the first migration is slow because of memory
fragmentation when an instance has been running on the same compute node for
a long time.
I am looking for a solution, considering that on my compute nodes I can have
a little RAM overcommit.
In any case I am increasing the number of compute nodes to reduce it.
Thanks
Ignazio

This e-mail may contain confidential content and is intended solely for use
by the designated recipient. If you are not the intended recipient, please
notify the sender immediately and delete this e-mail. Information on data
protection can be found here.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From senrique at redhat.com  Wed Aug 3 11:00:00 2022
From: senrique at redhat.com (Sofia Enriquez)
Date: Wed, 3 Aug 2022 08:00:00 -0300
Subject: [cinder] Bug deputy report for week of 08-03-2022
Message-ID: 

This is a bug report from 07-27-2022 to 08-03-2022.
Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting
-----------------------------------------------------------------------------------------
Low
- https://bugs.launchpad.net/cinder/+bug/1983237 "Cannot retype volume of lvm
  to volume of fujitsu." Unassigned.
- https://bugs.launchpad.net/cinder/+bug/1983287 "Infinidat Cinder driver
  fails to backup attached volume." Fix proposed to master.

Cheers,
Sofia

-- 
Sofía Enriquez
she/her
Software Engineer
Red Hat PnT
IRC: @enriquetaso @RedHat
Red Hat
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From smooney at redhat.com  Wed Aug 3 12:04:33 2022
From: smooney at redhat.com (Sean Mooney)
Date: Wed, 3 Aug 2022 13:04:33 +0100
Subject: Re: resize or migrate problem
In-Reply-To: 
References: <0e10ad6bb29e5c7dc0845abcb6fc2d314b42cf54.camel@redhat.com>
Message-ID: 

On Wed, Aug 3, 2022 at 10:49 AM fereshteh loghmani wrote:
>
> hello
> thank you for your response.
> about your questions:
> has the image been deleted in glance?
> - yes my image has been deleted in glance

Ack, that is what I thought might be the case.

> what release of openstack are you using and what storage backend are you using?
> - openstack_release: "victoria"
> -i don't use ceph and I use dedicated storage.
> and the backing file format: raw

OK, so there was a known issue where, if the image was deleted and the backing
file was not present on the source or could not be copied for some reason, you
would hit the behaviour you observed.
Normally, if the backing file is not accessible on the source node, it will be
downloaded from Glance on the destination, which does not work when the image
has been deleted.
There was a bug directly related to this which I believe is fixed, but I'm not
sure whether the fix landed in Victoria. I can't find the bug on Launchpad
quickly, but if we look at Gerrit or the Nova code I think Lee Yarwood fixed
it 12-24 months ago. I might be misremembering, but I know there was an edge
case for deleted images with evacuate, and I think it also affects cold
migrate.

>
> On Tue, Aug 2, 2022 at 2:37 PM Sean Mooney wrote:
>>
>> On Tue, 2022-08-02 at 12:33 +0430, fereshteh loghmani wrote:
>> > hello
>> > I use OpenStack and I create some VMs in different compute in one region.
>> > when I resize the VM. if the compute doesn't have enough space,
>> > automatically the vm will be migrated to another compute. but my problem is
>> > here that when it migrated, the _base file doesn't migrate and remains in
>> > the first compute that it doesn't have enough space. because of this, the
>> > VM appears with an error status in OpenStack. if I move the _base from
>> > first compute to final compute and reboot the server this problem has been
>> > solved.
>> > so i need to not transfer _base manually and if it migrated to another
>> > compute the disk and _base transfer with each other.
>> > could you please help me to solve this problem?
>>
>> the _base file should be automatically copied when required.
>> has the image been deleted in glance?
>> what release of openstack are you using and what storage backend are you using
>> for nova and glance.
>> can i assume nova is using the default qcow2 backend?
>>
>> > thanks in advance
>>

From kdhall at binghamton.edu  Wed Aug  3 14:35:02 2022
From: kdhall at binghamton.edu (Dave Hall)
Date: Wed, 3 Aug 2022 10:35:02 -0400
Subject: [glance] Error in store configuration. Adding images to store is disabled.
Message-ID: 

Hello,

(Any help on this would be greatly appreciated. I've been chasing this for 2
weeks now...)

I posted the other day that I had solved this issue, but now it's back. The
primary glance-api error message is:

Error in store configuration. Adding images to store is disabled.

So why is my store disabled? Is there some command or config line that controls
this? Shouldn't the store be enabled by default?

The relevant stanza in my openstack_user_config.yml is

image_hosts:
#  infra38:
#    ip: 172.29.236.38
#    container_vars:
#      limit_container_types: glance
#      glance_remote_client:
#        - what: "172.29.244.27:/images"
#          where: "/var/lib/glance/images"
#          type: "nfs"
#          options: "_netdev,vers=3,proto=tcp,sec=sys"

  infra38:
    ip: 172.29.236.38
    container_vars:
      glance_default_store: file
      glance_nfs_local_directory: "images"
      glance_nfs_client:
        - server: "172.29.244.27"
          remote_path: "/images"
          local_path: "/var/lib/glance/images"
          type: "nfs"
          options: "_netdev,vers=3,proto=tcp,sec=sys,noauto,user"
          config_overrides: "{}"

The stanza that's commented out produced the same result. My user_variables.yml
does not have any lines pertaining to glance. Also, I've removed the stanzas for
the other 2 infra hosts.
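For comparison, the same NFS client settings can also be expressed globally in
user_variables.yml rather than per-host in openstack_user_config.yml; a sketch
reusing the values from the stanza above (not verified in this deployment):

glance_default_store: file
glance_nfs_client:
  - server: "172.29.244.27"
    remote_path: "/images"
    local_path: "/var/lib/glance/images"
    type: "nfs"
    options: "_netdev,vers=3,proto=tcp,sec=sys"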
The glance-api.conf file from the container is:

[DEFAULT]
# Disable stderr logging
use_stderr = False
debug = False
use_journal = True
fatal_deprecations = False
bind_host = 172.29.238.205
bind_port = 9292
http_keepalive = True
digest_algorithm = sha256
backlog = 4096
workers = 16
cinder_catalog_info = volumev3:cinderv3:internalURL
enable_v2_api = True
transport_url = rabbit://glance:9e7f205ef620983a542b4f915420e50d33f774@172.29.238.84:5671,glance:9e7f205ef620983a542b4f915420e50d33f774@172.29.238.174:5671,glance:9e7f205ef620983a542b4f915420e50d33f774@172.29.239.33:5671//glance?ssl=1&ssl_version=TLSv1_2&ssl_ca_file=
scrub_time = 43200
image_cache_dir = /var/lib/glance/cache/
image_cache_stall_time = 86400
image_cache_max_size = 10737418240
# defaults to true if RBD is used as default store
show_image_direct_url = False
show_multiple_locations = True
enabled_backends = file:file,http:http,cinder:cinder

[task]
task_executor = taskflow

[database]
connection = mysql+pymysql://glance:e6dd6f2ca946c9e6f72bb864387a@172.29.236.36/glance?charset=utf8&ssl_verify_cert=true
max_overflow = 50
max_pool_size = 5
pool_timeout = 30
connection_recycle_time = 600

[keystone_authtoken]
insecure = False
auth_type = password
auth_url = http://172.29.236.36:5000/v3
www_authenticate_uri = http://172.29.236.36:5000
project_domain_id = default
user_domain_id = default
project_name = service
username = glance
password = c96a36e76208ee26851c78670d34dcaff1c870
region_name = RegionOne
service_token_roles_required = False
service_token_roles = service
service_type = image
memcached_servers = 172.29.239.168:11211,172.29.239.242:11211,172.29.236.80:11211
token_cache_time = 300
# if your memcached server is shared, use these settings to avoid cache poisoning
memcache_security_strategy = ENCRYPT
memcache_secret_key = 0b65b4b99155a6430e923fc9c24d9674

[oslo_policy]
policy_file = policy.yaml
policy_default_rule = default
policy_dirs = policy.d

[oslo_messaging_notifications]
topics = notifications
driver = messagingv2
transport_url = rabbit://glance:9e7f205ef620983a542b4f915420e50d33f774@172.29.238.84:5671,glance:9e7f205ef620983a542b4f915420e50d33f774@172.29.238.174:5671,glance:9e7f205ef620983a542b4f915420e50d33f774@172.29.239.33:5671//glance?ssl=1&ssl_version=TLSv1_2&ssl_ca_file=

[paste_deploy]
flavor = keystone+cachemanagement

[glance_store]
default_backend = file

[file]
filesystem_store_datadir = /var/lib/glance/images/

[profiler]
enabled = False

[oslo_middleware]
enable_proxy_headers_parsing = True

[cors]
allow_headers = origin,content-md5,x-image-meta-checksum,x-storage-token,accept-encoding,x-auth-token,x-identity-status,x-roles,x-service-catalog,x-user-id,x-tenant-id,x-openstack-request-id
allow_methods = GET,POST,PUT,PATCH,DELETE
allowed_origin = https://osa-portal.cs.binghamton.edu

I'd be glad to run glance-api in the foreground with debug (in the container, of
course), but it's not obvious from the .service file how to do that. I'd be glad
to read source code, but a pointer or two would be handy. If NFS (or Hao's GPFS)
just aren't supported anymore, please tell me.

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall at binghamton.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jonathan.rosser at rd.bbc.co.uk  Wed Aug  3 15:55:26 2022
From: jonathan.rosser at rd.bbc.co.uk (Jonathan Rosser)
Date: Wed, 3 Aug 2022 16:55:26 +0100
Subject: [glance] Error in store configuration. Adding images to store is disabled.
In-Reply-To: References: Message-ID: You should set this up like the example given previously by James Denton, with the variables set in user-variables.yml, not in openstack-user-config.yml. For reference we run a CI job that tests glance+NFS and the configuration used in that is here https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/templates/user_variables_nfs.yml.j2. You would be able to reproduce that using an all-in-one setup by adding 'nfs' to the SCENARIO environment variable. If you enable debug on a service and want to watch the log, you can use journalctl -fu , it should not be necessary to specifically run it in the foreground. Hope this helps, Jonathan. On 03/08/2022 15:35, Dave Hall wrote: > Hello, > > (Any help on this would be greatly appreciated.? I've been chasing > this for 2 weeks now...) > > I posted the other day that I had solved this issue, but now it's > back.? The primary glance-api error message is: > > Error in store configuration. Adding images to store is disabled. > > So why is my store disabled?? Is there some command or config line > that controls this?? Shouldn't the store be enabled by default? > > The relevant stanza in my openstack-user-config.yml is > > image_hosts: > # ?infra38: > # ? ?ip: 172.29.236.38 > # ? ?container_vars: > # ? ? ?limit_container_types: glance > # ? ? ?glance_remote_client: > # ? ? ? ?- what: "172.29.244.27:/images" > # ? ? ? ? ?where: "/var/lib/glance/images" > # ? ? ? ? ?type: "nfs" > # ? ? ? ? ?options: "_netdev,vers=3,proto=tcp,sec=sys" > > ? infra38: > ? ? ip: 172.29.236.38 > ? ? container_vars: > ? ? ? glance_default_store: file > ? ? ? glance_nfs_local_directory: "images" > ? ? ? glance_nfs_client: > ? ? ? ? - server: "172.29.244.27" > ? ? ? ? ? remote_path: "/images" > ? ? ? ? ? local_path: "/var/lib/glance/images" > ? ? ? ? ? type: "nfs" > ? ? ? ? ? options: "_netdev,vers=3,proto=tcp,sec=sys,noauto,user" > ? ? ? ? ? config_overrides: "{}" > > The stanza that's commented out produced the same result. My > user_variables.yaml does not have any lines pertaining to glance.? > Also, I've removed the stanzas for the other 2 infra hosts. 
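Following the journalctl suggestion above, a concrete sketch (the unit name is an
assumption; list the glance units inside the container to confirm what it is
called in your deployment):

# inside the glance container
systemctl list-units 'glance*'
journalctl -fu glance-api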
> > The glance-api.conf file from the container is: > > [DEFAULT] > # Disable stderr logging > use_stderr = False > debug = False > use_journal = True > fatal_deprecations = False > bind_host = 172.29.238.205 > bind_port = 9292 > http_keepalive = True > digest_algorithm = sha256 > backlog = 4096 > workers = 16 > cinder_catalog_info = volumev3:cinderv3:internalURL > enable_v2_api = True > transport_url = > rabbit://glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.238.84:5671 > ,glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.238.174:5671 > ,glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.239.33:5671//glance?ssl=1&ssl_version=TLSv1_2&ssl_ca_file= > > scrub_time = 43200 > image_cache_dir = /var/lib/glance/cache/ > image_cache_stall_time = 86400 > image_cache_max_size = 10737418240 > # defaults to true if RBD is used as default store > show_image_direct_url = False > show_multiple_locations = True > enabled_backends = file:file,http:http,cinder:cinder > > [task] > task_executor = taskflow > > [database] > connection = > mysql+pymysql://glance:e6dd6f2ca946c9e6f72bb864387a at 172.29.236.36/glance?charset=utf8&ssl_verify_cert=true > > max_overflow = 50 > max_pool_size = 5 > pool_timeout = 30 > connection_recycle_time = 600 > > [keystone_authtoken] > insecure = False > auth_type = password > auth_url = http://172.29.236.36:5000/v3 > www_authenticate_uri = http://172.29.236.36:5000 > project_domain_id = default > user_domain_id = default > project_name = service > username = glance > password = c96a36e76208ee26851c78670d34dcaff1c870 > region_name = RegionOne > service_token_roles_required = False > service_token_roles = service > service_type = image > memcached_servers = 172.29.239.168:11211 > ,172.29.239.242:11211 > ,172.29.236.80:11211 > > token_cache_time = 300 > # if your memcached server is shared, use these settings to avoid > cache poisoning > memcache_security_strategy = ENCRYPT > memcache_secret_key = 0b65b4b99155a6430e923fc9c24d9674 > > [oslo_policy] > policy_file = policy.yaml > policy_default_rule = default > policy_dirs = policy.d > > [oslo_messaging_notifications] > topics = notifications > driver = messagingv2 > transport_url = > rabbit://glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.238.84:5671 > ,glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.238.174:5671 > ,glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.239.33:5671//glance?ssl=1&ssl_version=TLSv1_2&ssl_ca_file= > > > [paste_deploy] > flavor = keystone+cachemanagement > > [glance_store] > default_backend = file > > [file] > filesystem_store_datadir = /var/lib/glance/images/ > > [profiler] > enabled = False > > [oslo_middleware] > enable_proxy_headers_parsing = True > > [cors] > allow_headers = > origin,content-md5,x-image-meta-checksum,x-storage-token,accept-encoding,x-auth-token,x-identity-status,x-roles,x-service-catalog,x-user-id,x-tenant-id,x-openstack-request-id > allow_methods = GET,POST,PUT,PATCH,DELETE > allowed_origin = https://osa-portal.cs.binghamton.edu > > I'd be glad to run glance-api in the foreground with debug (in the > container, or course), but it's not obvious from the .service file how > to do that.? I'd be glad to read source code, but a pointer or two > would be handy.? If NFS (or Hao's GPFS) just aren't supported anymore > please tell me. > > Thanks. > > -Dave > > -- > Dave Hall > Binghamton University > kdhall at binghamton.edu > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aleksandr.komarov at itglobal.com Wed Aug 3 16:30:41 2022 From: aleksandr.komarov at itglobal.com (Komarov, Aleksandr) Date: Wed, 3 Aug 2022 16:30:41 +0000 Subject: Large files multipart upload fails - OpenStack Object Storage (swift) Message-ID: <34f7cc2975664b1eb99c198267af4731@itglobal.com> Hi to all! Please suggest how to fix the situation. When uploading files larger than 100GB, at the stage of multipart-manifest validation, we get 404 error for some segments that were previously uploaded successfully. As a result, the client receives a 400 response. Uploading a large file is not possible. Logs: Aug 1 13:18:30 swift01-object01 object-server: IP - - [01/Aug/2022:13:18:30 +0000] "PUT /mpathc/12130/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/000 00032" 201 - "PUT http://domain-name/v1/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" "tx30e82374b98845bab6883-0062e7d225" "proxy-server 8367" 128.7641 "-" 21577 0 Aug 1 13:18:30 swift01-object01 container-server: IP - - [01/Aug/2022:13:18:30 +0000] "PUT /mpathb/952/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00 000032" 201 - "PUT http://domain-name/mpathe/12130/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" "tx30e82374b98845bab6883-0062 e7d225" "object-server 24710" 0.0005 "-" 13211 0 Aug 1 14:01:46 swift01-object01 object-server: IP - - [01/Aug/2022:14:01:46 +0000] "HEAD /mpathc/12130/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" 404 - "HEAD http://domain-name/v1/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" "txf09676de38964ca299b77-0062e7dcc8" "proxy-server 8367" 0.0003 "-" 21579 0 At the same time, some segments successfully found. With files less than 50GB everything is fine. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Aug 4 08:46:23 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 04 Aug 2022 14:16:23 +0530 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Aug 4 at 1500 UTC Message-ID: <1826808cdb8.fcb93941134756.5495608853118394686@ghanshyammann.com> Hello Everyone, Below is the agenda for Today TC meeting schedule at 1500 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting == Agenda for Today's TC meeting == * Roll call * Follow up on past action items * Gate health check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary * 2023.1 cycle PTG Planning * 2023.1 cycle Technical Election planning * RBAC feedback in ops meetup ** https://etherpad.opendev.org/p/rbac-zed-ptg#L171 ** https://review.opendev.org/c/openstack/governance/+/847418 * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open -gmann From dev.faz at gmail.com Thu Aug 4 09:34:39 2022 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Thu, 4 Aug 2022 11:34:39 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: Message-ID: Hi, take a look at: https://docs.openstack.org/nova/latest/admin/configuring-migrations.html#advanced-configuration-for-kvm-and-qemu esp. Auto-convergence and Post-copy Fabian Am Mi., 3. Aug. 2022 um 12:43 Uhr schrieb Ignazio Cassano : > > Hello All, > I am looking for a solution to speed up live migration. 
> Instances where ram is used heavily like java application servers, live migration take a long time (more than 20 minutes for 8GB ram instance) and converge mode is already set to True in nova.conf. > I also tried with post_copy but it does not change. > After the first live migration (very solow) if I try to migrate again it is very fast. > I presume the first migration is slow because memory fragmentation when an instance is running on the same compute node for a long time. > I am looking for a solution considering the on my computing node I can have > a little ram overcommit. Any case I am increasing the number of compute nodes to reduce it. > Thanks > Ignazio > From ignaziocassano at gmail.com Thu Aug 4 09:58:14 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 4 Aug 2022 11:58:14 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: Message-ID: Hello, menu thanks for your reply. I tried both post copy and converge but the live migrations is always very slow for applications servers like tomcat with java. The very strange behaviour is that if I migrate from compute node A to compute node B it takes more than 20 minutes. After that, if I migrate from compute B to compute A, it takes few seconds. I do not know if its is because dirty memory iscleaned during the first live migration. Live migration network si 10Gbs so I do not think the first live migration is affected by network performances. Ignazio Il giorno gio 4 ago 2022 alle ore 11:34 Fabian Zimmermann ha scritto: > Hi, > > take a look at: > > https://docs.openstack.org/nova/latest/admin/configuring-migrations.html#advanced-configuration-for-kvm-and-qemu > > esp. Auto-convergence and Post-copy > > Fabian > > Am Mi., 3. Aug. 2022 um 12:43 Uhr schrieb Ignazio Cassano > : > > > > Hello All, > > I am looking for a solution to speed up live migration. > > Instances where ram is used heavily like java application servers, live > migration take a long time (more than 20 minutes for 8GB ram instance) and > converge mode is already set to True in nova.conf. > > I also tried with post_copy but it does not change. > > After the first live migration (very solow) if I try to migrate again it > is very fast. > > I presume the first migration is slow because memory fragmentation when > an instance is running on the same compute node for a long time. > > I am looking for a solution considering the on my computing node I can > have > > a little ram overcommit. Any case I am increasing the number of compute > nodes to reduce it. > > Thanks > > Ignazio > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lokendrarathour at gmail.com Thu Aug 4 11:37:58 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Thu, 4 Aug 2022 17:07:58 +0530 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed Message-ID: Hi Team, I was trying to integrate External Ceph with Triple0 Wallaby, and at the end of deployment in step4 getting the below error: 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | Create containers from /var/lib/tripleo-config/container-startup-config/step_4 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | overcloud-controller-2 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 18:37:24.530812 | | WARNING | ERROR: Can't run container nova_libvirt_init_secret stderr: 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | overcloud-novacompute-0 | error={"changed": false, "msg": "Failed containers: nova_libvirt_init_secret"} 2022-08-03 18:37:44,282 p=507732 u *external-ceph.conf:* parameter_defaults: # Enable use of RBD backend in nova-compute NovaEnableRbdBackend: True # Enable use of RBD backend in cinder-volume CinderEnableRbdBackend: True # Backend to use for cinder-backup CinderBackupBackend: ceph # Backend to use for glance GlanceBackend: rbd # Name of the Ceph pool hosting Nova ephemeral images NovaRbdPoolName: vms # Name of the Ceph pool hosting Cinder volumes CinderRbdPoolName: volumes # Name of the Ceph pool hosting Cinder backups CinderBackupRbdPoolName: backups # Name of the Ceph pool hosting Glance images GlanceRbdPoolName: images # Name of the user to authenticate with the external Ceph cluster CephClientUserName: admin # The cluster FSID CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' # The CephX user auth key CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' # The list of Ceph monitors CephExternalMonHost: 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' ~ Have tried checking and validating the ceph client details and they seem to be correct, further digging the container log I could see something like this : [root at overcloud-novacompute-0 containers]# tail -f nova_libvirt_init_secret.log tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such file or directory tail: no files remaining [root at overcloud-novacompute-0 containers]# tail -f stdouts/nova_libvirt_init_secret.log 2022-08-04T11:48:47.689898197+05:30 stdout F ------------------------------------------------ 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh secrets for: ceph:admin 2022-08-04T11:48:47.690590594+05:30 stdout F Error: /etc/ceph/ceph.conf was not found 2022-08-04T11:48:47.690625088+05:30 stdout F Path to nova_libvirt_init_secret was ceph:admin 2022-08-04T16:20:29.643785538+05:30 stdout F ------------------------------------------------ 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh secrets for: ceph:admin 2022-08-04T16:20:29.644785532+05:30 stdout F Error: 
/etc/ceph/ceph.conf was not found 2022-08-04T16:20:29.644785532+05:30 stdout F Path to nova_libvirt_init_secret was ceph:admin ^C [root at overcloud-novacompute-0 containers]# tail -f stdouts/nova_compute_init_log.log -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... URL: From fpantano at redhat.com Thu Aug 4 13:51:45 2022 From: fpantano at redhat.com (Francesco Pantano) Date: Thu, 4 Aug 2022 15:51:45 +0200 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed In-Reply-To: References: Message-ID: Hi, ceph is supposed to be configured by this tripleo-ansible role [1], which is triggered by tht on external_deploy_steps [2]. In theory adding [3] should just work, assuming you customize the ceph cluster mon ip addresses, fsid and a few other related variables. >From your previous email I suspect in your external-ceph.yaml you missed the TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml (see [3]). Thanks, Francesco [1] https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client [2] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cephadm/ceph-client.yaml#L93 [3] https://github.com/openstack/tripleo-heat-templates/blob/master/environments/external-ceph.yaml On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour wrote: > Hi Team, > I was trying to integrate External Ceph with Triple0 Wallaby, and at the > end of deployment in step4 getting the below error: > > 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 > 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | > Create containers from > /var/lib/tripleo-config/container-startup-config/step_4 > 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 > 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | > /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | > overcloud-controller-2 > 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 > 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | > Create containers managed by Podman for > /var/lib/tripleo-config/container-startup-config/step_4 > 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 > 18:37:24.530812 | | WARNING | > ERROR: Can't run container nova_libvirt_init_secret > stderr: > 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 > 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | > Create containers managed by Podman for > /var/lib/tripleo-config/container-startup-config/step_4 | > overcloud-novacompute-0 | error={"changed": false, "msg": "Failed > containers: nova_libvirt_init_secret"} > 2022-08-03 18:37:44,282 p=507732 u > > > *external-ceph.conf:* > > parameter_defaults: > # Enable use of RBD backend in nova-compute > NovaEnableRbdBackend: True > # Enable use of RBD backend in cinder-volume > CinderEnableRbdBackend: True > # Backend to use for cinder-backup > CinderBackupBackend: ceph > # Backend to use for glance > GlanceBackend: rbd > # Name of the Ceph pool hosting Nova ephemeral images > NovaRbdPoolName: vms > # Name of the Ceph pool hosting Cinder volumes > CinderRbdPoolName: volumes > # Name of the Ceph pool hosting Cinder backups > CinderBackupRbdPoolName: backups > # Name of the Ceph pool hosting Glance images > GlanceRbdPoolName: images > # Name of the user to authenticate with the external Ceph cluster > CephClientUserName: admin > # The cluster FSID > CephClusterFSID: 
'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' > # The CephX user auth key > CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' > # The list of Ceph monitors > CephExternalMonHost: > 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' > ~ > > > Have tried checking and validating the ceph client details and they seem > to be correct, further digging the container log I could see something like > this : > > [root at overcloud-novacompute-0 containers]# tail -f > nova_libvirt_init_secret.log > tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such file > or directory > tail: no files remaining > [root at overcloud-novacompute-0 containers]# tail -f > stdouts/nova_libvirt_init_secret.log > 2022-08-04T11:48:47.689898197+05:30 stdout F > ------------------------------------------------ > 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh secrets > for: ceph:admin > 2022-08-04T11:48:47.690590594+05:30 stdout F Error: /etc/ceph/ceph.conf > was not found > 2022-08-04T11:48:47.690625088+05:30 stdout F Path to > nova_libvirt_init_secret was ceph:admin > 2022-08-04T16:20:29.643785538+05:30 stdout F > ------------------------------------------------ > 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh secrets > for: ceph:admin > 2022-08-04T16:20:29.644785532+05:30 stdout F Error: /etc/ceph/ceph.conf > was not found > 2022-08-04T16:20:29.644785532+05:30 stdout F Path to > nova_libvirt_init_secret was ceph:admin > ^C > [root at overcloud-novacompute-0 containers]# tail -f > stdouts/nova_compute_init_log.log > > -- > ~ Lokendra > skype: lokendrarathour > > > -- Francesco Pantano GPG KEY: F41BD75C -------------- next part -------------- An HTML attachment was scrubbed... URL: From pdeore at redhat.com Thu Aug 4 13:52:39 2022 From: pdeore at redhat.com (Pranali Deore) Date: Thu, 4 Aug 2022 19:22:39 +0530 Subject: [Glance] Weekly Meeting Cancelled Message-ID: Hello, Since most of the team members will not be around today, so cancelling today's glance weekly meeting. Thanks & Regards, Pranali -------------- next part -------------- An HTML attachment was scrubbed... URL: From anbanerj at redhat.com Thu Aug 4 14:39:56 2022 From: anbanerj at redhat.com (Ananya Banerjee) Date: Thu, 4 Aug 2022 16:39:56 +0200 Subject: [tripleo] gate blocker - tripleo-ci-centos-9-content-provider-wallaby Message-ID: Hi, We have a failure in tripleo-ci-centos-9-content-provider-wallaby which is blocking the gate. We are reverting [1] to clear the gate. Related Bug : https://bugs.launchpad.net/tripleo/+bug/1983585 [1] https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/multi-node-bridge/templates/zuul-multi-node-bridge-ovs.repo.j2#L27 Thanks, Ananya -- Ananya Banerjee, RHCSA, RHCE-OSP Software Engineer Red Hat EMEA anbanerj at redhat.com M: +491784949931 IM: frenzy_friday @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Thu Aug 4 14:56:48 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 4 Aug 2022 16:56:48 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: Message-ID: <20220804145648.wfyjocspjljc36uf@localhost> On 03/08, Ignazio Cassano wrote: > Hello All, > I am looking for a solution to speed up live migration. > Instances where ram is used heavily like java application servers, live > migration take a long time (more than 20 minutes for 8GB ram instance) and > converge mode is already set to True in nova.conf. 
Hi, Probably doesn't affect your case, but I assume you are using ephemeral nova boot volumes. Have you tried using only Cinder volumes on the VM? Cheers, Gorka. > I also tried with post_copy but it does not change. > After the first live migration (very solow) if I try to migrate again it is > very fast. > I presume the first migration is slow because memory fragmentation when an > instance is running on the same compute node for a long time. > I am looking for a solution considering the on my computing node I can have > a little ram overcommit. Any case I am increasing the number of compute > nodes to reduce it. > Thanks > Ignazio From ignaziocassano at gmail.com Thu Aug 4 15:09:18 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 4 Aug 2022 17:09:18 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: <20220804145648.wfyjocspjljc36uf@localhost> References: <20220804145648.wfyjocspjljc36uf@localhost> Message-ID: HI, I am using cinder volumes. Ignazio Il giorno gio 4 ago 2022 alle ore 16:56 Gorka Eguileor ha scritto: > On 03/08, Ignazio Cassano wrote: > > Hello All, > > I am looking for a solution to speed up live migration. > > Instances where ram is used heavily like java application servers, live > > migration take a long time (more than 20 minutes for 8GB ram instance) > and > > converge mode is already set to True in nova.conf. > > Hi, > > Probably doesn't affect your case, but I assume you are using ephemeral > nova boot volumes. > > Have you tried using only Cinder volumes on the VM? > > Cheers, > Gorka. > > > > I also tried with post_copy but it does not change. > > After the first live migration (very solow) if I try to migrate again it > is > > very fast. > > I presume the first migration is slow because memory fragmentation when > an > > instance is running on the same compute node for a long time. > > I am looking for a solution considering the on my computing node I can > have > > a little ram overcommit. Any case I am increasing the number of compute > > nodes to reduce it. > > Thanks > > Ignazio > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clay.gerrard at gmail.com Thu Aug 4 15:42:38 2022 From: clay.gerrard at gmail.com (Clay Gerrard) Date: Thu, 4 Aug 2022 10:42:38 -0500 Subject: Large files multipart upload fails - OpenStack Object Storage (swift) In-Reply-To: <34f7cc2975664b1eb99c198267af4731@itglobal.com> References: <34f7cc2975664b1eb99c198267af4731@itglobal.com> Message-ID: Those logs are curious. After uploading file-segments, the final request to create a static large object manifest will validate all of the referenced segments with HEAD requests. All the referenced SLO segments have to return a successful 2XX response before the manifest object will be created. I can see you included the partial logs from transaction tx30e82374b98845bab6883-0062e7d225 and txf09676de38964ca299b77-0062e7dcc8 - but it's not entirely clear what happened between the object-server PUT that created the file-segment object (response 201) and the HEAD that failed (response 404). It appears both requests went to the same node (swift0-1-object01) and device (/mpathc) - so unless the file was deleted (or expired, or corrupted, or rebalanced) I would not expect the 404 response. Can you check on the file-segment object AFTER the SLO validation fails? Does it keep returning 404? Where did it go?! On Wed, Aug 3, 2022 at 12:26 PM Komarov, Aleksandr < aleksandr.komarov at itglobal.com> wrote: > Hi to all! 
> > > > Please suggest how to fix the situation. > > > > When uploading files larger than 100GB, at the stage of multipart-manifest > validation, we get 404 error for some segments that were previously > uploaded successfully. As a result, the client receives a 400 response. > Uploading a large file is not possible. > > > > Logs: > > Aug 1 13:18:30 swift01-object01 object-server: IP - - > [01/Aug/2022:13:18:30 +0000] "PUT > /mpathc/12130/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/000 > > 00032" 201 - "PUT > http://domain-name/v1/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" > "tx30e82374b98845bab6883-0062e7d225" "proxy-server 8367" 128.7641 "-" 21577 > 0 > > Aug 1 13:18:30 swift01-object01 container-server: IP - - > [01/Aug/2022:13:18:30 +0000] "PUT > /mpathb/952/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00 > > 000032" 201 - "PUT > http://domain-name/mpathe/12130/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" > "tx30e82374b98845bab6883-0062 > > e7d225" "object-server 24710" 0.0005 "-" 13211 0 > > Aug 1 14:01:46 swift01-object01 object-server: IP - - > [01/Aug/2022:14:01:46 +0000] "HEAD > /mpathc/12130/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" > 404 - "HEAD > http://domain-name/v1/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" > "txf09676de38964ca299b77-0062e7dcc8" "proxy-server 8367" 0.0003 "-" 21579 0 > > > > At the same time, some segments successfully found. With files less than > 50GB everything is fine. > -- Clay Gerrard -------------- next part -------------- An HTML attachment was scrubbed... URL: From haleyb.dev at gmail.com Thu Aug 4 18:54:34 2022 From: haleyb.dev at gmail.com (Brian Haley) Date: Thu, 4 Aug 2022 14:54:34 -0400 Subject: [all] Dynamic Zuul results table in Gerrit 3 In-Reply-To: References: Message-ID: <36ab44ee-3f25-cfe6-609d-817b7f2189e0@gmail.com> Sorry to respond to such an old thread, but figured the context might help. Sometime in the past couple of months, the Zuul status script (thanks Radoslaw!) that's run in the tamper/greasemonkey browser extension stopped working. Not sure if anyone else noticed and/or has a workaround for it, I'm just not good enough with javascript to fix it myself :( Thanks, -Brian On 12/3/20 04:22, Rados?aw Piliszek wrote: > Hello Fellow OpenStack and OpenDev Folks! > > TL;DR click on [3] and enjoy. > > I am starting this thread to not hijack the discussion happening on [1]. > > First of all, I would like to thank gibi (Balazs Gibizer) for hacking > a way to get the place to render the table in the first place (pun > intended). > > I have been a long-time-now user of [2]. > I have improved and customised it for myself but never really got to > share back the changes I made. > The new Gerrit obviously broke the whole script so it was of no use to > share at that particular state. > However, inspired by gibi's work, I decided to finally sit down and > fix it to work with Gerrit 3 and here it comes: [3]. > Works well on Chrome with Tampermonkey. Not tested others. > > I hope you will enjoy this little helper (I do). > > I know the script looks super fugly but it generally boils down to a > mix of styles of 3 people and Gerrit having funky UI rendering. > > Finally, I'd also like to thank hrw (Marcin Juszkiewicz) for linking > me to the original Michel's script in 2019. 
> > [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-November/019051.html > [2] https://opendev.org/x/coats/src/commit/444c95738677593dcfed0cfd9667d4c4f0d596a3/coats/openstack_gerrit_zuul_status.user.js > [3] https://gist.github.com/yoctozepto/7ea1271c299d143388b7c1b1802ee75e > > Kind regards, > -yoctozepto > From kdhall at binghamton.edu Fri Aug 5 02:04:12 2022 From: kdhall at binghamton.edu (Dave Hall) Date: Thu, 4 Aug 2022 22:04:12 -0400 Subject: [glance] Error in store configuration. Adding images to store is disabled. In-Reply-To: References: Message-ID: Jonathan, James, A quick thank you - I wanted to report that I finally got it working. A lot of fiddling around with UID/GID settings on the NFS share. (Oddly, Cinder didn't need this.) One thing that had me going for a bit - I thought I'd test the NFS permissions outside of the glance-api process by using 'su -c "touch /var/lib/glance/images/testfile" glance'. When it didn't work, I assumed I didn't have NFS configured correctly. It turns out that the above command won't work if the shell for user glance is set to /bin/false. So, in the case of problems I would propose this as a debugging/verification technique - make sure that user glance can write to the NFS share by 'chsh -s /bin/bash glance' and then the above 'su -c' command. -Dave -- Dave Hall Binghamton University kdhall at binghamton.edu 607-760-2328 (Cell) 607-777-4641 (Office) On Wed, Aug 3, 2022 at 11:56 AM Jonathan Rosser < jonathan.rosser at rd.bbc.co.uk> wrote: > You should set this up like the example given previously by James Denton, > with the variables set in user-variables.yml, not in > openstack-user-config.yml. > > For reference we run a CI job that tests glance+NFS and the configuration > used in that is here > https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/templates/user_variables_nfs.yml.j2. > You would be able to reproduce that using an all-in-one setup by adding > 'nfs' to the SCENARIO environment variable. > > If you enable debug on a service and want to watch the log, you can use > journalctl -fu , it should not be necessary to specifically run > it in the foreground. > > Hope this helps, > Jonathan. > On 03/08/2022 15:35, Dave Hall wrote: > > Hello, > > (Any help on this would be greatly appreciated. I've been chasing this > for 2 weeks now...) > > I posted the other day that I had solved this issue, but now it's back. > The primary glance-api error message is: > > Error in store configuration. Adding images to store is disabled. > > So why is my store disabled? Is there some command or config line that > controls this? Shouldn't the store be enabled by default? > > The relevant stanza in my openstack-user-config.yml is > > image_hosts: > # infra38: > # ip: 172.29.236.38 > # container_vars: > # limit_container_types: glance > # glance_remote_client: > # - what: "172.29.244.27:/images" > # where: "/var/lib/glance/images" > # type: "nfs" > # options: "_netdev,vers=3,proto=tcp,sec=sys" > > infra38: > ip: 172.29.236.38 > container_vars: > glance_default_store: file > glance_nfs_local_directory: "images" > glance_nfs_client: > - server: "172.29.244.27" > remote_path: "/images" > local_path: "/var/lib/glance/images" > type: "nfs" > options: "_netdev,vers=3,proto=tcp,sec=sys,noauto,user" > config_overrides: "{}" > > The stanza that's commented out produced the same result. My > user_variables.yaml does not have any lines pertaining to glance. Also, > I've removed the stanzas for the other 2 infra hosts. 
> > The glance-api.conf file from the container is: > > [DEFAULT] > # Disable stderr logging > use_stderr = False > debug = False > use_journal = True > fatal_deprecations = False > bind_host = 172.29.238.205 > bind_port = 9292 > http_keepalive = True > digest_algorithm = sha256 > backlog = 4096 > workers = 16 > cinder_catalog_info = volumev3:cinderv3:internalURL > enable_v2_api = True > transport_url = rabbit:// > glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.238.84:5671, > glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.238.174:5671, > glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.239.33:5671//glance?ssl=1&ssl_version=TLSv1_2&ssl_ca_file= > scrub_time = 43200 > image_cache_dir = /var/lib/glance/cache/ > image_cache_stall_time = 86400 > image_cache_max_size = 10737418240 > # defaults to true if RBD is used as default store > show_image_direct_url = False > show_multiple_locations = True > enabled_backends = file:file,http:http,cinder:cinder > > [task] > task_executor = taskflow > > [database] > connection = mysql+pymysql:// > glance:e6dd6f2ca946c9e6f72bb864387a at 172.29.236.36/glance?charset=utf8&ssl_verify_cert=true > max_overflow = 50 > max_pool_size = 5 > pool_timeout = 30 > connection_recycle_time = 600 > > [keystone_authtoken] > insecure = False > auth_type = password > auth_url = http://172.29.236.36:5000/v3 > www_authenticate_uri = http://172.29.236.36:5000 > project_domain_id = default > user_domain_id = default > project_name = service > username = glance > password = c96a36e76208ee26851c78670d34dcaff1c870 > region_name = RegionOne > service_token_roles_required = False > service_token_roles = service > service_type = image > memcached_servers = 172.29.239.168:11211,172.29.239.242:11211, > 172.29.236.80:11211 > token_cache_time = 300 > # if your memcached server is shared, use these settings to avoid cache > poisoning > memcache_security_strategy = ENCRYPT > memcache_secret_key = 0b65b4b99155a6430e923fc9c24d9674 > > [oslo_policy] > policy_file = policy.yaml > policy_default_rule = default > policy_dirs = policy.d > > [oslo_messaging_notifications] > topics = notifications > driver = messagingv2 > transport_url = rabbit:// > glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.238.84:5671, > glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.238.174:5671, > glance:9e7f205ef620983a542b4f915420e50d33f774 at 172.29.239.33:5671//glance?ssl=1&ssl_version=TLSv1_2&ssl_ca_file= > > [paste_deploy] > flavor = keystone+cachemanagement > > [glance_store] > default_backend = file > > [file] > filesystem_store_datadir = /var/lib/glance/images/ > > [profiler] > enabled = False > > [oslo_middleware] > enable_proxy_headers_parsing = True > > [cors] > allow_headers = > origin,content-md5,x-image-meta-checksum,x-storage-token,accept-encoding,x-auth-token,x-identity-status,x-roles,x-service-catalog,x-user-id,x-tenant-id,x-openstack-request-id > allow_methods = GET,POST,PUT,PATCH,DELETE > allowed_origin = https://osa-portal.cs.binghamton.edu > > I'd be glad to run glance-api in the foreground with debug (in the > container, or course), but it's not obvious from the .service file how to > do that. I'd be glad to read source code, but a pointer or two would be > handy. If NFS (or Hao's GPFS) just aren't supported anymore please tell me. > > Thanks. > > -Dave > > -- > Dave Hall > Binghamton University > kdhall at binghamton.edu > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tdtemccna at gmail.com Fri Aug 5 03:06:26 2022 From: tdtemccna at gmail.com (Turritopsis Dohrnii Teo En Ming) Date: Fri, 5 Aug 2022 11:06:26 +0800 Subject: How do I deploy OpenStack open source cloud software on Red Hat Enterprise Linux 9.0? Message-ID: Subject: How do I deploy OpenStack open source cloud software on Red Hat Enterprise Linux 9.0? Good day from Singapore, How do I deploy OpenStack open source cloud software on Red Hat Enterprise Linux 9.0? I have just downloaded RHEL 9.0 ISO using RHEL Free Developer Subscription and installed it as a virtual machine inside Oracle VM VirtualBox. Windows 10 on my laptop is the host. Alternatively, can I deploy Red Hat OpenStack Platform in a Virtual Private Server (VPS)? I can make another order for a VPS just for this purpose. I already have 2 VPS servers just for deploying Virtualmin/Webmin web hosting control panel. If you can give me links to excellent and well-written guides and tutorials, it would be most useful. I want to try out open source software developed by NASA. Also, I have deployed Nextcloud previously. Now I want to try out OpenStack. Thank you. Regards, Mr. Turritopsis Dohrnii Teo En Ming Targeted Individual in Singapore 5 Aug 2022 Friday Blogs: https://tdtemcerts.blogspot.com https://tdtemcerts.wordpress.com From abishop at redhat.com Fri Aug 5 03:29:23 2022 From: abishop at redhat.com (Alan Bishop) Date: Thu, 4 Aug 2022 20:29:23 -0700 Subject: How do I deploy OpenStack open source cloud software on Red Hat Enterprise Linux 9.0? In-Reply-To: References: Message-ID: On Thu, Aug 4, 2022 at 8:13 PM Turritopsis Dohrnii Teo En Ming < tdtemccna at gmail.com> wrote: > Subject: How do I deploy OpenStack open source cloud software on Red > Hat Enterprise Linux 9.0? > > Good day from Singapore, > > How do I deploy OpenStack open source cloud software on Red Hat > Enterprise Linux 9.0? > Hi Turritopsis, I'm sure you'll find several people who will offer excellent suggestions from which you can choose. Two that quickly come to mind are RDO [1], and its enterprise derivative, Red Hat OpenStack Platform (RHOSP) [2]. [1] https://www.rdoproject.org// [2] https://www.redhat.com/en/technologies/linux-platforms/openstack-platform Basically, CentOS and RDO are community supported software, and RHEL and RHOSP are the equivalent enterprise editions. Alan I have just downloaded RHEL 9.0 ISO using RHEL Free Developer > Subscription and installed it as a virtual machine inside Oracle VM > VirtualBox. Windows 10 on my laptop is the host. > > Alternatively, can I deploy Red Hat OpenStack Platform in a Virtual > Private Server (VPS)? I can make another order for a VPS just for this > purpose. I already have 2 VPS servers just for deploying > Virtualmin/Webmin web hosting control panel. > > If you can give me links to excellent and well-written guides and > tutorials, it would be most useful. > > I want to try out open source software developed by NASA. > > Also, I have deployed Nextcloud previously. Now I want to try out > OpenStack. > > Thank you. > > Regards, > > Mr. Turritopsis Dohrnii Teo En Ming > Targeted Individual in Singapore > 5 Aug 2022 Friday > Blogs: > https://tdtemcerts.blogspot.com > https://tdtemcerts.wordpress.com > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tdtemccna at gmail.com Fri Aug 5 03:35:04 2022 From: tdtemccna at gmail.com (Turritopsis Dohrnii Teo En Ming) Date: Fri, 5 Aug 2022 11:35:04 +0800 Subject: How do I deploy OpenStack open source cloud software on Red Hat Enterprise Linux 9.0? In-Reply-To: References: Message-ID: On Fri, 5 Aug 2022 at 11:29, Alan Bishop wrote: > > > > On Thu, Aug 4, 2022 at 8:13 PM Turritopsis Dohrnii Teo En Ming wrote: >> >> Subject: How do I deploy OpenStack open source cloud software on Red >> Hat Enterprise Linux 9.0? >> >> Good day from Singapore, >> >> How do I deploy OpenStack open source cloud software on Red Hat >> Enterprise Linux 9.0? > > > Hi Turritopsis, > > I'm sure you'll find several people who will offer excellent suggestions from which you can choose. > > Two that quickly come to mind are RDO [1], and its enterprise derivative, Red Hat OpenStack Platform (RHOSP) [2]. > > [1] https://www.rdoproject.org// > [2] https://www.redhat.com/en/technologies/linux-platforms/openstack-platform > > Basically, CentOS and RDO are community supported software, and RHEL and RHOSP are the equivalent enterprise editions. > > Alan > >> I have just downloaded RHEL 9.0 ISO using RHEL Free Developer >> Subscription and installed it as a virtual machine inside Oracle VM >> VirtualBox. Windows 10 on my laptop is the host. >> >> Alternatively, can I deploy Red Hat OpenStack Platform in a Virtual >> Private Server (VPS)? I can make another order for a VPS just for this >> purpose. I already have 2 VPS servers just for deploying >> Virtualmin/Webmin web hosting control panel. >> >> If you can give me links to excellent and well-written guides and >> tutorials, it would be most useful. >> >> I want to try out open source software developed by NASA. >> >> Also, I have deployed Nextcloud previously. Now I want to try out OpenStack. >> >> Thank you. >> Thank you. I will check them out. Regards, Mr. Turritopsis Dohrnii Teo En Ming Targeted Individual in Singapore From skaplons at redhat.com Fri Aug 5 07:14:42 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 05 Aug 2022 09:14:42 +0200 Subject: [neutron] CI meeting on Aug 9 cancelled Message-ID: <7863784.tWJXoaVgUF@p1> Hi, I will be on PTO next week and on last CI meeting we decided to cancel CI meeting on Tuesday 9th of August. Have a great week and see You on the meeting on August 16th. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From geguileo at redhat.com Fri Aug 5 08:17:32 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Fri, 5 Aug 2022 10:17:32 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> Message-ID: <20220805081732.ppdumseft3cziiuu@localhost> On 04/08, Ignazio Cassano wrote: > HI, > I am using cinder volumes. > Ignazio > Hi, In that case there is no volume data being copied for the instance migration, and volume attach on the destination should not account for more than 30 seconds of those 20 minutes, so not much improvement possible there. Cheers, Gorka. > Il giorno gio 4 ago 2022 alle ore 16:56 Gorka Eguileor > ha scritto: > > > On 03/08, Ignazio Cassano wrote: > > > Hello All, > > > I am looking for a solution to speed up live migration. 
> > > Instances where ram is used heavily like java application servers, live > > > migration take a long time (more than 20 minutes for 8GB ram instance) > > and > > > converge mode is already set to True in nova.conf. > > > > Hi, > > > > Probably doesn't affect your case, but I assume you are using ephemeral > > nova boot volumes. > > > > Have you tried using only Cinder volumes on the VM? > > > > Cheers, > > Gorka. > > > > > > > I also tried with post_copy but it does not change. > > > After the first live migration (very solow) if I try to migrate again it > > is > > > very fast. > > > I presume the first migration is slow because memory fragmentation when > > an > > > instance is running on the same compute node for a long time. > > > I am looking for a solution considering the on my computing node I can > > have > > > a little ram overcommit. Any case I am increasing the number of compute > > > nodes to reduce it. > > > Thanks > > > Ignazio > > > > From ignaziocassano at gmail.com Fri Aug 5 08:24:38 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 5 Aug 2022 10:24:38 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: <20220805081732.ppdumseft3cziiuu@localhost> References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> Message-ID: Hello, firstly let me to thank you for reply and sorry if I come back to ask why when I do the first migration from A to B it takes 20 minutes and then, when I migrate from B to A it takes few seconds. I wonder if after the first migration memory is reorganized. In the first live migration it lost time to get memory pages ? Ignazio Il giorno ven 5 ago 2022 alle ore 10:17 Gorka Eguileor ha scritto: > On 04/08, Ignazio Cassano wrote: > > HI, > > I am using cinder volumes. > > Ignazio > > > > Hi, > > In that case there is no volume data being copied for the instance > migration, and volume attach on the destination should not account for > more than 30 seconds of those 20 minutes, so not much improvement > possible there. > > Cheers, > Gorka. > > > Il giorno gio 4 ago 2022 alle ore 16:56 Gorka Eguileor < > geguileo at redhat.com> > > ha scritto: > > > > > On 03/08, Ignazio Cassano wrote: > > > > Hello All, > > > > I am looking for a solution to speed up live migration. > > > > Instances where ram is used heavily like java application servers, > live > > > > migration take a long time (more than 20 minutes for 8GB ram > instance) > > > and > > > > converge mode is already set to True in nova.conf. > > > > > > Hi, > > > > > > Probably doesn't affect your case, but I assume you are using ephemeral > > > nova boot volumes. > > > > > > Have you tried using only Cinder volumes on the VM? > > > > > > Cheers, > > > Gorka. > > > > > > > > > > I also tried with post_copy but it does not change. > > > > After the first live migration (very solow) if I try to migrate > again it > > > is > > > > very fast. > > > > I presume the first migration is slow because memory fragmentation > when > > > an > > > > instance is running on the same compute node for a long time. > > > > I am looking for a solution considering the on my computing node I > can > > > have > > > > a little ram overcommit. Any case I am increasing the number of > compute > > > > nodes to reduce it. > > > > Thanks > > > > Ignazio > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From radoslaw.piliszek at gmail.com Fri Aug 5 08:40:43 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Fri, 5 Aug 2022 10:40:43 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> Message-ID: On Fri, 5 Aug 2022 at 10:28, Ignazio Cassano wrote: > > why when I do the first migration from A to B it takes 20 minutes and then, when I migrate from B to A it takes few seconds. > I wonder if after the first migration memory is reorganized. > In the first live migration it lost time to get memory pages ? Just curious. Did you try migrating the same instance again after that, i.e. again from A to B. Is it still fast or is it slow again? Does it only happen with long-running instances? -yoctozepto From ignaziocassano at gmail.com Fri Aug 5 08:42:03 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 5 Aug 2022 10:42:03 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> Message-ID: Hi, I am going to try it. Thanks Il giorno ven 5 ago 2022 alle ore 10:40 Rados?aw Piliszek < radoslaw.piliszek at gmail.com> ha scritto: > On Fri, 5 Aug 2022 at 10:28, Ignazio Cassano > wrote: > > > > why when I do the first migration from A to B it takes 20 minutes and > then, when I migrate from B to A it takes few seconds. > > I wonder if after the first migration memory is reorganized. > > In the first live migration it lost time to get memory pages ? > > Just curious. Did you try migrating the same instance again after > that, i.e. again from A to B. Is it still fast or is it slow again? > Does it only happen with long-running instances? > > -yoctozepto > -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Fri Aug 5 08:49:54 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Fri, 5 Aug 2022 10:49:54 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> Message-ID: <20220805084954.yvja6hxl2ebeqzkb@localhost> On 05/08, Ignazio Cassano wrote: > Hello, firstly let me to thank you for reply and sorry if I come back to > ask why when I do the first migration from A to B it takes 20 minutes and > then, when I migrate from B to A it takes few seconds. > I wonder if after the first migration memory is reorganized. > In the first live migration it lost time to get memory pages ? > Ignazio > Hi, I work on Cinder, so my knowledge on live migrations is mostly limited to the attach/detach flow of the volumes. I thought that maybe if you were using ephemeral nova volumes (non-cinder) maybe the volume had not yet been deleted from the old node, or maybe it was using a qcow2 base file for multiple instances on the source (each using a different chain on top of it) and this qcow2 was not originally present in the destination (hence the time to copy it), so when we do a migration back since there are other instances that were also using it on the destination (original location) only de difference needs to be copied. But these are just brainstorming ideas, since I don't really know how Nova handles all this. 
I would recommend setting Nova log to debug mode in both source and destination nodes and look at where the time difference really is, in case it's not where you think it is. Cheers, Gorka. > Il giorno ven 5 ago 2022 alle ore 10:17 Gorka Eguileor > ha scritto: > > > On 04/08, Ignazio Cassano wrote: > > > HI, > > > I am using cinder volumes. > > > Ignazio > > > > > > > Hi, > > > > In that case there is no volume data being copied for the instance > > migration, and volume attach on the destination should not account for > > more than 30 seconds of those 20 minutes, so not much improvement > > possible there. > > > > Cheers, > > Gorka. > > > > > Il giorno gio 4 ago 2022 alle ore 16:56 Gorka Eguileor < > > geguileo at redhat.com> > > > ha scritto: > > > > > > > On 03/08, Ignazio Cassano wrote: > > > > > Hello All, > > > > > I am looking for a solution to speed up live migration. > > > > > Instances where ram is used heavily like java application servers, > > live > > > > > migration take a long time (more than 20 minutes for 8GB ram > > instance) > > > > and > > > > > converge mode is already set to True in nova.conf. > > > > > > > > Hi, > > > > > > > > Probably doesn't affect your case, but I assume you are using ephemeral > > > > nova boot volumes. > > > > > > > > Have you tried using only Cinder volumes on the VM? > > > > > > > > Cheers, > > > > Gorka. > > > > > > > > > > > > > I also tried with post_copy but it does not change. > > > > > After the first live migration (very solow) if I try to migrate > > again it > > > > is > > > > > very fast. > > > > > I presume the first migration is slow because memory fragmentation > > when > > > > an > > > > > instance is running on the same compute node for a long time. > > > > > I am looking for a solution considering the on my computing node I > > can > > > > have > > > > > a little ram overcommit. Any case I am increasing the number of > > compute > > > > > nodes to reduce it. > > > > > Thanks > > > > > Ignazio > > > > > > > > > > > > From ignaziocassano at gmail.com Fri Aug 5 09:05:53 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 5 Aug 2022 11:05:53 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> Message-ID: Hi, migration from A to B (750 sec) migration from B to A (10 sec) Migration from A to B (10 sec) Ignazio Il giorno ven 5 ago 2022 alle ore 10:40 Rados?aw Piliszek < radoslaw.piliszek at gmail.com> ha scritto: > On Fri, 5 Aug 2022 at 10:28, Ignazio Cassano > wrote: > > > > why when I do the first migration from A to B it takes 20 minutes and > then, when I migrate from B to A it takes few seconds. > > I wonder if after the first migration memory is reorganized. > > In the first live migration it lost time to get memory pages ? > > Just curious. Did you try migrating the same instance again after > that, i.e. again from A to B. Is it still fast or is it slow again? > Does it only happen with long-running instances? > > -yoctozepto > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Aug 5 09:18:28 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 5 Aug 2022 11:18:28 +0200 Subject: [openstack] how to speed up live migration? 
In-Reply-To: <20220805084954.yvja6hxl2ebeqzkb@localhost> References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> <20220805084954.yvja6hxl2ebeqzkb@localhost> Message-ID: Hi, this is the volume attached on netapp nfs about the vm I am migrating: qemu-img info volume-002ff8af-9067-4f84-a01c-d147cdd1f70dqimage: volume-002ff8af-9067-4f84-a01c-d147cdd1f70d file format: raw virtual size: 40G (42949672960 bytes) disk size: 21G As you can see it is raw and it does not ha base image. Ignazio Il giorno ven 5 ago 2022 alle ore 10:49 Gorka Eguileor ha scritto: > On 05/08, Ignazio Cassano wrote: > > Hello, firstly let me to thank you for reply and sorry if I come back to > > ask why when I do the first migration from A to B it takes 20 minutes and > > then, when I migrate from B to A it takes few seconds. > > I wonder if after the first migration memory is reorganized. > > In the first live migration it lost time to get memory pages ? > > Ignazio > > > > Hi, > > I work on Cinder, so my knowledge on live migrations is mostly limited > to the attach/detach flow of the volumes. > > I thought that maybe if you were using ephemeral nova volumes > (non-cinder) maybe the volume had not yet been deleted from the old > node, or maybe it was using a qcow2 base file for multiple instances on > the source (each using a different chain on top of it) and this qcow2 > was not originally present in the destination (hence the time to copy > it), so when we do a migration back since there are other instances that > were also using it on the destination (original location) only de > difference needs to be copied. > > But these are just brainstorming ideas, since I don't really know how > Nova handles all this. > > I would recommend setting Nova log to debug mode in both source and > destination nodes and look at where the time difference really is, in > case it's not where you think it is. > > Cheers, > Gorka. > > > > Il giorno ven 5 ago 2022 alle ore 10:17 Gorka Eguileor < > geguileo at redhat.com> > > ha scritto: > > > > > On 04/08, Ignazio Cassano wrote: > > > > HI, > > > > I am using cinder volumes. > > > > Ignazio > > > > > > > > > > Hi, > > > > > > In that case there is no volume data being copied for the instance > > > migration, and volume attach on the destination should not account for > > > more than 30 seconds of those 20 minutes, so not much improvement > > > possible there. > > > > > > Cheers, > > > Gorka. > > > > > > > Il giorno gio 4 ago 2022 alle ore 16:56 Gorka Eguileor < > > > geguileo at redhat.com> > > > > ha scritto: > > > > > > > > > On 03/08, Ignazio Cassano wrote: > > > > > > Hello All, > > > > > > I am looking for a solution to speed up live migration. > > > > > > Instances where ram is used heavily like java application > servers, > > > live > > > > > > migration take a long time (more than 20 minutes for 8GB ram > > > instance) > > > > > and > > > > > > converge mode is already set to True in nova.conf. > > > > > > > > > > Hi, > > > > > > > > > > Probably doesn't affect your case, but I assume you are using > ephemeral > > > > > nova boot volumes. > > > > > > > > > > Have you tried using only Cinder volumes on the VM? > > > > > > > > > > Cheers, > > > > > Gorka. > > > > > > > > > > > > > > > > I also tried with post_copy but it does not change. > > > > > > After the first live migration (very solow) if I try to migrate > > > again it > > > > > is > > > > > > very fast. 
> > > > > > I presume the first migration is slow because memory > fragmentation > > > when > > > > > an > > > > > > instance is running on the same compute node for a long time. > > > > > > I am looking for a solution considering the on my computing node > I > > > can > > > > > have > > > > > > a little ram overcommit. Any case I am increasing the number of > > > compute > > > > > > nodes to reduce it. > > > > > > Thanks > > > > > > Ignazio > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Aug 5 09:27:12 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 5 Aug 2022 11:27:12 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> <20220805084954.yvja6hxl2ebeqzkb@localhost> Message-ID: Migrating again to a new node (COMPUTE C) it takes 10 sec. The first migration from A to B (750 sec) is slow in migrating memory : *migration running for 30 secs, memory 89% remaining; (bytes processed=1258508063, remaining=15356194816, total=17184923648)2022-08-05 10:47:23.910 55600 INFO nova.virt.libvirt.driver [req-ff02667e-9d38-4a08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 60 secs, memory 87% remaining; (bytes processed=1489083638, remaining=15035801600, total=17184923648)08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 90 secs, memory 86% remaining; (bytes processed=1689004421, remaining=14802731008, total=17184923648)* and so on Il giorno ven 5 ago 2022 alle ore 11:18 Ignazio Cassano < ignaziocassano at gmail.com> ha scritto: > Hi, this is the volume attached on netapp nfs about the vm I am migrating: > qemu-img info volume-002ff8af-9067-4f84-a01c-d147cdd1f70dqimage: > volume-002ff8af-9067-4f84-a01c-d147cdd1f70d > file format: raw > virtual size: 40G (42949672960 bytes) > disk size: 21G > > As you can see it is raw and it does not ha base image. > Ignazio > > > > Il giorno ven 5 ago 2022 alle ore 10:49 Gorka Eguileor < > geguileo at redhat.com> ha scritto: > >> On 05/08, Ignazio Cassano wrote: >> > Hello, firstly let me to thank you for reply and sorry if I come back to >> > ask why when I do the first migration from A to B it takes 20 minutes >> and >> > then, when I migrate from B to A it takes few seconds. >> > I wonder if after the first migration memory is reorganized. >> > In the first live migration it lost time to get memory pages ? >> > Ignazio >> > >> >> Hi, >> >> I work on Cinder, so my knowledge on live migrations is mostly limited >> to the attach/detach flow of the volumes. >> >> I thought that maybe if you were using ephemeral nova volumes >> (non-cinder) maybe the volume had not yet been deleted from the old >> node, or maybe it was using a qcow2 base file for multiple instances on >> the source (each using a different chain on top of it) and this qcow2 >> was not originally present in the destination (hence the time to copy >> it), so when we do a migration back since there are other instances that >> were also using it on the destination (original location) only de >> difference needs to be copied. >> >> But these are just brainstorming ideas, since I don't really know how >> Nova handles all this. 
>> >> I would recommend setting Nova log to debug mode in both source and >> destination nodes and look at where the time difference really is, in >> case it's not where you think it is. >> >> Cheers, >> Gorka. >> >> >> > Il giorno ven 5 ago 2022 alle ore 10:17 Gorka Eguileor < >> geguileo at redhat.com> >> > ha scritto: >> > >> > > On 04/08, Ignazio Cassano wrote: >> > > > HI, >> > > > I am using cinder volumes. >> > > > Ignazio >> > > > >> > > >> > > Hi, >> > > >> > > In that case there is no volume data being copied for the instance >> > > migration, and volume attach on the destination should not account for >> > > more than 30 seconds of those 20 minutes, so not much improvement >> > > possible there. >> > > >> > > Cheers, >> > > Gorka. >> > > >> > > > Il giorno gio 4 ago 2022 alle ore 16:56 Gorka Eguileor < >> > > geguileo at redhat.com> >> > > > ha scritto: >> > > > >> > > > > On 03/08, Ignazio Cassano wrote: >> > > > > > Hello All, >> > > > > > I am looking for a solution to speed up live migration. >> > > > > > Instances where ram is used heavily like java application >> servers, >> > > live >> > > > > > migration take a long time (more than 20 minutes for 8GB ram >> > > instance) >> > > > > and >> > > > > > converge mode is already set to True in nova.conf. >> > > > > >> > > > > Hi, >> > > > > >> > > > > Probably doesn't affect your case, but I assume you are using >> ephemeral >> > > > > nova boot volumes. >> > > > > >> > > > > Have you tried using only Cinder volumes on the VM? >> > > > > >> > > > > Cheers, >> > > > > Gorka. >> > > > > >> > > > > >> > > > > > I also tried with post_copy but it does not change. >> > > > > > After the first live migration (very solow) if I try to migrate >> > > again it >> > > > > is >> > > > > > very fast. >> > > > > > I presume the first migration is slow because memory >> fragmentation >> > > when >> > > > > an >> > > > > > instance is running on the same compute node for a long time. >> > > > > > I am looking for a solution considering the on my computing >> node I >> > > can >> > > > > have >> > > > > > a little ram overcommit. Any case I am increasing the number of >> > > compute >> > > > > > nodes to reduce it. >> > > > > > Thanks >> > > > > > Ignazio >> > > > > >> > > > > >> > > >> > > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From marios at redhat.com Fri Aug 5 09:29:25 2022 From: marios at redhat.com (Marios Andreou) Date: Fri, 5 Aug 2022 12:29:25 +0300 Subject: [tripleo] gate blocker - tripleo-ci-centos-9-content-provider-wallaby In-Reply-To: References: Message-ID: thanks to amoralej we are now unblocked here https://bugs.launchpad.net/tripleo/+bug/1983585/comments/6 should be good to recheck your tripleo stable/wallaby things On Thu, Aug 4, 2022 at 5:56 PM Ananya Banerjee wrote: > Hi, > > We have a failure in tripleo-ci-centos-9-content-provider-wallaby which > is blocking the gate. We are reverting [1] to clear the gate. > Related Bug : https://bugs.launchpad.net/tripleo/+bug/1983585 > > [1] > https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/multi-node-bridge/templates/zuul-multi-node-bridge-ovs.repo.j2#L27 > > Thanks, > Ananya > -- > > Ananya Banerjee, RHCSA, RHCE-OSP > > Software Engineer > > Red Hat EMEA > > anbanerj at redhat.com > M: +491784949931 IM: frenzy_friday > @RedHat Red Hat > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... 
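(For reference, the debug-mode suggestion quoted above is a one-line change on each compute node, assuming the usual /etc/nova/nova.conf layout:

  [DEFAULT]
  debug = True

followed by a restart of nova-compute; the libvirt driver then logs considerably more detail for each migration iteration.)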
URL: From radoslaw.piliszek at gmail.com Fri Aug 5 09:32:53 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Fri, 5 Aug 2022 11:32:53 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> Message-ID: On Fri, 5 Aug 2022 at 11:06, Ignazio Cassano wrote: > > Hi, > migration from A to B (750 sec) > migration from B to A (10 sec) > Migration from A to B (10 sec) Interesting! So it indeed looks like a dirty/cold case. However, as Gorka and others have already mentioned - you need to really pinpoint WHAT takes that long. Which involved component does its thing for too long. It could be that in these 740 secs there is actually no real throughput happening, just some thing waiting for timeout to progress on the 2nd try. -yoctozepto From ignaziocassano at gmail.com Fri Aug 5 09:40:55 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 5 Aug 2022 11:40:55 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> Message-ID: Hi, I sent in my previous email what happens in the first lige migration: *migration running for 30 secs, memory 89% remaining; (bytes processed=1258508063, remaining=15356194816, total=17184923648)2022-08-05 10:47:23.910 55600 INFO nova.virt.libvirt.driver [req-ff02667e-9d38-4a08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 60 secs, memory 87% remaining; (bytes processed=1489083638, remaining=15035801600, total=17184923648)08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 90 secs, memory 86% remaining; (bytes processed=1689004421, remaining=14802731008, total=17184923648)* and so on Tcpudumping from A to B I can see a lot traffic. Do you suggest to enable debug ? It seems clear that the memory content migration is slow. Ignazio Il giorno ven 5 ago 2022 alle ore 11:33 Rados?aw Piliszek < radoslaw.piliszek at gmail.com> ha scritto: > On Fri, 5 Aug 2022 at 11:06, Ignazio Cassano > wrote: > > > > Hi, > > migration from A to B (750 sec) > > migration from B to A (10 sec) > > Migration from A to B (10 sec) > > Interesting! So it indeed looks like a dirty/cold case. However, as > Gorka and others have already mentioned - you need to really pinpoint > WHAT takes that long. Which involved component does its thing for too > long. > It could be that in these 740 secs there is actually no real > throughput happening, just some thing waiting for timeout to progress > on the 2nd try. > > -yoctozepto > -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Fri Aug 5 09:45:15 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Fri, 5 Aug 2022 11:45:15 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> <20220805084954.yvja6hxl2ebeqzkb@localhost> Message-ID: <20220805094515.oglqd4y6mu2vshdl@localhost> On 05/08, Ignazio Cassano wrote: > Migrating again to a new node (COMPUTE C) it takes 10 sec. 
> The first migration from A to B (750 sec) is slow in migrating memory : > > > *migration running for 30 secs, memory 89% remaining; (bytes > processed=1258508063, remaining=15356194816, total=17184923648)2022-08-05 > 10:47:23.910 55600 INFO nova.virt.libvirt.driver > [req-ff02667e-9d38-4a08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca > 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: > d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 60 secs, memory > 87% remaining; (bytes processed=1489083638, remaining=15035801600, > total=17184923648)08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca > 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: > d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 90 secs, memory > 86% remaining; (bytes processed=1689004421, remaining=14802731008, > total=17184923648)* > > and so on That sounds crazy to me. Unless the first node has more load or more network usage than the others, or the VM isn't actually running on Compute B so the migration is not really of a running VM... > > Il giorno ven 5 ago 2022 alle ore 11:18 Ignazio Cassano < > ignaziocassano at gmail.com> ha scritto: > > > Hi, this is the volume attached on netapp nfs about the vm I am migrating: > > qemu-img info volume-002ff8af-9067-4f84-a01c-d147cdd1f70dqimage: > > volume-002ff8af-9067-4f84-a01c-d147cdd1f70d > > file format: raw > > virtual size: 40G (42949672960 bytes) > > disk size: 21G > > > > As you can see it is raw and it does not ha base image. > > Ignazio > > > > > > > > Il giorno ven 5 ago 2022 alle ore 10:49 Gorka Eguileor < > > geguileo at redhat.com> ha scritto: > > > >> On 05/08, Ignazio Cassano wrote: > >> > Hello, firstly let me to thank you for reply and sorry if I come back to > >> > ask why when I do the first migration from A to B it takes 20 minutes > >> and > >> > then, when I migrate from B to A it takes few seconds. > >> > I wonder if after the first migration memory is reorganized. > >> > In the first live migration it lost time to get memory pages ? > >> > Ignazio > >> > > >> > >> Hi, > >> > >> I work on Cinder, so my knowledge on live migrations is mostly limited > >> to the attach/detach flow of the volumes. > >> > >> I thought that maybe if you were using ephemeral nova volumes > >> (non-cinder) maybe the volume had not yet been deleted from the old > >> node, or maybe it was using a qcow2 base file for multiple instances on > >> the source (each using a different chain on top of it) and this qcow2 > >> was not originally present in the destination (hence the time to copy > >> it), so when we do a migration back since there are other instances that > >> were also using it on the destination (original location) only de > >> difference needs to be copied. > >> > >> But these are just brainstorming ideas, since I don't really know how > >> Nova handles all this. > >> > >> I would recommend setting Nova log to debug mode in both source and > >> destination nodes and look at where the time difference really is, in > >> case it's not where you think it is. > >> > >> Cheers, > >> Gorka. > >> > >> > >> > Il giorno ven 5 ago 2022 alle ore 10:17 Gorka Eguileor < > >> geguileo at redhat.com> > >> > ha scritto: > >> > > >> > > On 04/08, Ignazio Cassano wrote: > >> > > > HI, > >> > > > I am using cinder volumes. 
> >> > > > Ignazio > >> > > > > >> > > > >> > > Hi, > >> > > > >> > > In that case there is no volume data being copied for the instance > >> > > migration, and volume attach on the destination should not account for > >> > > more than 30 seconds of those 20 minutes, so not much improvement > >> > > possible there. > >> > > > >> > > Cheers, > >> > > Gorka. > >> > > > >> > > > Il giorno gio 4 ago 2022 alle ore 16:56 Gorka Eguileor < > >> > > geguileo at redhat.com> > >> > > > ha scritto: > >> > > > > >> > > > > On 03/08, Ignazio Cassano wrote: > >> > > > > > Hello All, > >> > > > > > I am looking for a solution to speed up live migration. > >> > > > > > Instances where ram is used heavily like java application > >> servers, > >> > > live > >> > > > > > migration take a long time (more than 20 minutes for 8GB ram > >> > > instance) > >> > > > > and > >> > > > > > converge mode is already set to True in nova.conf. > >> > > > > > >> > > > > Hi, > >> > > > > > >> > > > > Probably doesn't affect your case, but I assume you are using > >> ephemeral > >> > > > > nova boot volumes. > >> > > > > > >> > > > > Have you tried using only Cinder volumes on the VM? > >> > > > > > >> > > > > Cheers, > >> > > > > Gorka. > >> > > > > > >> > > > > > >> > > > > > I also tried with post_copy but it does not change. > >> > > > > > After the first live migration (very solow) if I try to migrate > >> > > again it > >> > > > > is > >> > > > > > very fast. > >> > > > > > I presume the first migration is slow because memory > >> fragmentation > >> > > when > >> > > > > an > >> > > > > > instance is running on the same compute node for a long time. > >> > > > > > I am looking for a solution considering the on my computing > >> node I > >> > > can > >> > > > > have > >> > > > > > a little ram overcommit. Any case I am increasing the number of > >> > > compute > >> > > > > > nodes to reduce it. > >> > > > > > Thanks > >> > > > > > Ignazio > >> > > > > > >> > > > > > >> > > > >> > > > >> > >> From ignaziocassano at gmail.com Fri Aug 5 09:53:18 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 5 Aug 2022 11:53:18 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: <20220805094515.oglqd4y6mu2vshdl@localhost> References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> <20220805084954.yvja6hxl2ebeqzkb@localhost> <20220805094515.oglqd4y6mu2vshdl@localhost> Message-ID: When the instance is migrated again from te second to the first it takes 10 seconds. If first node has more loads on network or memory, it should take a long time in any case. Keep in mind I am not using hugepages but default configuration. I am convinced that it is about how the memory of an instance is managed after it runs for a long time on a node Ignazio Il giorno ven 5 ago 2022 alle ore 11:45 Gorka Eguileor ha scritto: > On 05/08, Ignazio Cassano wrote: > > Migrating again to a new node (COMPUTE C) it takes 10 sec. 
> > The first migration from A to B (750 sec) is slow in migrating memory : > > > > > > *migration running for 30 secs, memory 89% remaining; (bytes > > processed=1258508063, remaining=15356194816, total=17184923648)2022-08-05 > > 10:47:23.910 55600 INFO nova.virt.libvirt.driver > > [req-ff02667e-9d38-4a08-9c63-013ed1064218 > 66adb965bef64eaaab2af93ade87e2ca > > 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: > > d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 60 secs, > memory > > 87% remaining; (bytes processed=1489083638, remaining=15035801600, > > total=17184923648)08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca > > 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: > > d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 90 secs, > memory > > 86% remaining; (bytes processed=1689004421, remaining=14802731008, > > total=17184923648)* > > > > and so on > > That sounds crazy to me. Unless the first node has more load or more > network usage than the others, or the VM isn't actually running on > Compute B so the migration is not really of a running VM... > > > > > > > Il giorno ven 5 ago 2022 alle ore 11:18 Ignazio Cassano < > > ignaziocassano at gmail.com> ha scritto: > > > > > Hi, this is the volume attached on netapp nfs about the vm I am > migrating: > > > qemu-img info volume-002ff8af-9067-4f84-a01c-d147cdd1f70dqimage: > > > volume-002ff8af-9067-4f84-a01c-d147cdd1f70d > > > file format: raw > > > virtual size: 40G (42949672960 bytes) > > > disk size: 21G > > > > > > As you can see it is raw and it does not ha base image. > > > Ignazio > > > > > > > > > > > > Il giorno ven 5 ago 2022 alle ore 10:49 Gorka Eguileor < > > > geguileo at redhat.com> ha scritto: > > > > > >> On 05/08, Ignazio Cassano wrote: > > >> > Hello, firstly let me to thank you for reply and sorry if I come > back to > > >> > ask why when I do the first migration from A to B it takes 20 > minutes > > >> and > > >> > then, when I migrate from B to A it takes few seconds. > > >> > I wonder if after the first migration memory is reorganized. > > >> > In the first live migration it lost time to get memory pages ? > > >> > Ignazio > > >> > > > >> > > >> Hi, > > >> > > >> I work on Cinder, so my knowledge on live migrations is mostly limited > > >> to the attach/detach flow of the volumes. > > >> > > >> I thought that maybe if you were using ephemeral nova volumes > > >> (non-cinder) maybe the volume had not yet been deleted from the old > > >> node, or maybe it was using a qcow2 base file for multiple instances > on > > >> the source (each using a different chain on top of it) and this qcow2 > > >> was not originally present in the destination (hence the time to copy > > >> it), so when we do a migration back since there are other instances > that > > >> were also using it on the destination (original location) only de > > >> difference needs to be copied. > > >> > > >> But these are just brainstorming ideas, since I don't really know how > > >> Nova handles all this. > > >> > > >> I would recommend setting Nova log to debug mode in both source and > > >> destination nodes and look at where the time difference really is, in > > >> case it's not where you think it is. > > >> > > >> Cheers, > > >> Gorka. > > >> > > >> > > >> > Il giorno ven 5 ago 2022 alle ore 10:17 Gorka Eguileor < > > >> geguileo at redhat.com> > > >> > ha scritto: > > >> > > > >> > > On 04/08, Ignazio Cassano wrote: > > >> > > > HI, > > >> > > > I am using cinder volumes. 
> > >> > > > Ignazio > > >> > > > > > >> > > > > >> > > Hi, > > >> > > > > >> > > In that case there is no volume data being copied for the instance > > >> > > migration, and volume attach on the destination should not > account for > > >> > > more than 30 seconds of those 20 minutes, so not much improvement > > >> > > possible there. > > >> > > > > >> > > Cheers, > > >> > > Gorka. > > >> > > > > >> > > > Il giorno gio 4 ago 2022 alle ore 16:56 Gorka Eguileor < > > >> > > geguileo at redhat.com> > > >> > > > ha scritto: > > >> > > > > > >> > > > > On 03/08, Ignazio Cassano wrote: > > >> > > > > > Hello All, > > >> > > > > > I am looking for a solution to speed up live migration. > > >> > > > > > Instances where ram is used heavily like java application > > >> servers, > > >> > > live > > >> > > > > > migration take a long time (more than 20 minutes for 8GB ram > > >> > > instance) > > >> > > > > and > > >> > > > > > converge mode is already set to True in nova.conf. > > >> > > > > > > >> > > > > Hi, > > >> > > > > > > >> > > > > Probably doesn't affect your case, but I assume you are using > > >> ephemeral > > >> > > > > nova boot volumes. > > >> > > > > > > >> > > > > Have you tried using only Cinder volumes on the VM? > > >> > > > > > > >> > > > > Cheers, > > >> > > > > Gorka. > > >> > > > > > > >> > > > > > > >> > > > > > I also tried with post_copy but it does not change. > > >> > > > > > After the first live migration (very solow) if I try to > migrate > > >> > > again it > > >> > > > > is > > >> > > > > > very fast. > > >> > > > > > I presume the first migration is slow because memory > > >> fragmentation > > >> > > when > > >> > > > > an > > >> > > > > > instance is running on the same compute node for a long > time. > > >> > > > > > I am looking for a solution considering the on my computing > > >> node I > > >> > > can > > >> > > > > have > > >> > > > > > a little ram overcommit. Any case I am increasing the > number of > > >> > > compute > > >> > > > > > nodes to reduce it. > > >> > > > > > Thanks > > >> > > > > > Ignazio > > >> > > > > > > >> > > > > > > >> > > > > >> > > > > >> > > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Fri Aug 5 09:57:21 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Fri, 5 Aug 2022 11:57:21 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: <20220805094515.oglqd4y6mu2vshdl@localhost> References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> <20220805084954.yvja6hxl2ebeqzkb@localhost> <20220805094515.oglqd4y6mu2vshdl@localhost> Message-ID: On Fri, 5 Aug 2022 at 11:51, Gorka Eguileor wrote: > > On 05/08, Ignazio Cassano wrote: > > Migrating again to a new node (COMPUTE C) it takes 10 sec. 
> > The first migration from A to B (750 sec) is slow in migrating memory : > > > > > > *migration running for 30 secs, memory 89% remaining; (bytes > > processed=1258508063, remaining=15356194816, total=17184923648)2022-08-05 > > 10:47:23.910 55600 INFO nova.virt.libvirt.driver > > [req-ff02667e-9d38-4a08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca > > 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: > > d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 60 secs, memory > > 87% remaining; (bytes processed=1489083638, remaining=15035801600, > > total=17184923648)08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca > > 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: > > d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 90 secs, memory > > 86% remaining; (bytes processed=1689004421, remaining=14802731008, > > total=17184923648)* > > > > and so on > > That sounds crazy to me. Unless the first node has more load or more > network usage than the others, or the VM isn't actually running on > Compute B so the migration is not really of a running VM... Wow, I agree it looks crazy just like Gorka has said. Indeed, by looking at the counters, it seems the process is progressing at approx. the rate of 1.4 GB per minute so approx. 12 minutes total makes perfect sense. So it really boils down to "why is the memory migration so slow?". More like a topic to discuss with libvirt, QEMU and KVM folks as I doubt nova (and the rest of the OpenStack stuff) has any impact on it. -yoctozepto From radoslaw.piliszek at gmail.com Fri Aug 5 10:04:34 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Fri, 5 Aug 2022 12:04:34 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> <20220805084954.yvja6hxl2ebeqzkb@localhost> <20220805094515.oglqd4y6mu2vshdl@localhost> Message-ID: On Fri, 5 Aug 2022 at 12:00, Ignazio Cassano wrote: > > When the instance is migrated again from te second to the first it takes 10 seconds. > If first node has more loads on network or memory, it should take a long time in any case. > Keep in mind I am not using hugepages but default configuration. > > I am convinced that it is about how the memory of an instance is managed after it runs for a long time on a node Just keep in mind the transfer rates you get are VERY LOW for anything RAM-like. It's around 20 MiB/s - my old HDD could go faster than that with mediocre fragmentation. ;-) It's more likely it spends time waiting for something instead of doing real work. -yoctozepto From smooney at redhat.com Fri Aug 5 10:34:41 2022 From: smooney at redhat.com (Sean Mooney) Date: Fri, 5 Aug 2022 11:34:41 +0100 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> <20220805084954.yvja6hxl2ebeqzkb@localhost> <20220805094515.oglqd4y6mu2vshdl@localhost> Message-ID: one thing to be aware of is if the vm writes even a singel byte to a memory page during the migration then entire page needs to be transferred again. not just that one byte which gets expensive if you use hugepages as a one byte write gets amplified to at 2mb or 1GB page copy. even for the default 4k pages its expensive. post-copy adn auto converge help with that to a degree but yes it sounds like this might be memory related but it could still be a network bandwidth limitation. 
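(For anyone following the thread, the knobs mentioned here live in the [libvirt] section of nova.conf on the compute nodes; a sketch only, with illustrative values:

  [libvirt]
  # throttle (auto-converge) a guest that dirties pages faster than they can be copied
  live_migration_permit_auto_converge = true
  # allow switching to post-copy when the pre-copy phase does not converge
  live_migration_permit_post_copy = true
  # seconds to allow the memory copy before Nova steps in
  live_migration_completion_timeout = 800
  # maximum pause, in milliseconds, tolerated for the final switch-over
  live_migration_downtime = 500

Whether these help depends on how fast the guest rewrites its memory relative to the bandwidth of the migration network.)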
using jumbo frames on the migration network may help as well as disabling tcp slow start. im not sure if there is really anything that can be done to increase the initial migration time beyond that. On Fri, Aug 5, 2022 at 11:26 AM Rados?aw Piliszek wrote: > > On Fri, 5 Aug 2022 at 12:00, Ignazio Cassano wrote: > > > > When the instance is migrated again from te second to the first it takes 10 seconds. > > If first node has more loads on network or memory, it should take a long time in any case. > > Keep in mind I am not using hugepages but default configuration. > > > > I am convinced that it is about how the memory of an instance is managed after it runs for a long time on a node > > Just keep in mind the transfer rates you get are VERY LOW for anything > RAM-like. It's around 20 MiB/s - my old HDD could go faster than that > with mediocre fragmentation. ;-) > It's more likely it spends time waiting for something instead of doing > real work. > > -yoctozepto > From ignaziocassano at gmail.com Fri Aug 5 10:47:39 2022 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 5 Aug 2022 12:47:39 +0200 Subject: [openstack] how to speed up live migration? In-Reply-To: References: <20220804145648.wfyjocspjljc36uf@localhost> <20220805081732.ppdumseft3cziiuu@localhost> <20220805084954.yvja6hxl2ebeqzkb@localhost> <20220805094515.oglqd4y6mu2vshdl@localhost> Message-ID: Hi Sean, I am going to test it. At this time live migration interfaces are bonded on tow 10 Gbs nic but they are used also for tenant and providers networks. I have a free nic (1gbs) on a vlan where there is no traffic.... Do you think I can try to switch on the above nic also if it only 1 gbs ? Il giorno ven 5 ago 2022 alle ore 12:34 Sean Mooney ha scritto: > one thing to be aware of is if the vm writes even a singel byte to a > memory page during the migration then entire page needs to be > transferred again. > not just that one byte which gets expensive if you use hugepages as a > one byte write gets amplified to at 2mb or 1GB page copy. > > even for the default 4k pages its expensive. post-copy adn auto > converge help with that to a degree but yes > it sounds like this might be memory related but it could still be a > network bandwidth limitation. > > using jumbo frames on the migration network may help as well as > disabling tcp slow start. > > im not sure if there is really anything that can be done to increase > the initial migration time beyond that. > > On Fri, Aug 5, 2022 at 11:26 AM Rados?aw Piliszek > wrote: > > > > On Fri, 5 Aug 2022 at 12:00, Ignazio Cassano > wrote: > > > > > > When the instance is migrated again from te second to the first it > takes 10 seconds. > > > If first node has more loads on network or memory, it should take a > long time in any case. > > > Keep in mind I am not using hugepages but default configuration. > > > > > > I am convinced that it is about how the memory of an instance is > managed after it runs for a long time on a node > > > > Just keep in mind the transfer rates you get are VERY LOW for anything > > RAM-like. It's around 20 MiB/s - my old HDD could go faster than that > > with mediocre fragmentation. ;-) > > It's more likely it spends time waiting for something instead of doing > > real work. > > > > -yoctozepto > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
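(If the migration traffic is to be moved onto a dedicated interface, the usual approach is to point each compute node's nova.conf at that node's address on the chosen network; the address below is a placeholder:

  [libvirt]
  live_migration_inbound_addr = 192.0.2.11

That said, a quiet 1 Gbps link is still only a tenth of the bonded 2x10 Gbps path mentioned above, so for memory-heavy guests the shared 10 Gbps bond will usually finish sooner unless it is genuinely saturated.)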
URL: From skaplons at redhat.com Fri Aug 5 12:24:25 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 05 Aug 2022 14:24:25 +0200 Subject: [oslo][stable] Backport of the default value of the config option change Message-ID: <21589662.aDxSllVl8Y@p1> Hi, Some time ago oslo.messaging changed default value of the "heartbeat_in_pthread" config option to "True" [1]. As was noticed some time ago, this don't works well with nova-compute - see bug [2] for details. Recently we noticed in our downstream Red Hat OpenStack, that it's not only nova-compute which don't works well with it and can hangs. We saw the same issue in various neutron agent processes. And it seems that it can be the same for any non-wsgi service which is using rabbitmq to send heartbeats. So giving all of that, I just proposed change of the default value of that config option to be "False" again [3]. And my question is - would it be possible and acceptable to backport such change up to stable/wallaby (if and when it will be approved for master of course). IMO this could be useful for users as using this option set as "True" be default don't makes any sense for the non-wsgi applications really and may cause more bad then good things really. What are You opinions about it? [1] https://review.opendev.org/c/openstack/oslo.messaging/+/747395 [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 [3] https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From aleksandr.komarov at itglobal.com Fri Aug 5 07:23:49 2022 From: aleksandr.komarov at itglobal.com (Komarov, Aleksandr) Date: Fri, 5 Aug 2022 07:23:49 +0000 Subject: Large files multipart upload fails - OpenStack Object Storage (swift) In-Reply-To: References: <34f7cc2975664b1eb99c198267af4731@itglobal.com> Message-ID: <5821c326c5994691a78869ee63f11841@itglobal.com> Hi! Thank you for your answer. We updated *.ring.gz files on swift-proxy nodes and rebooted them. After that the problem was solved. All HEAD validation requests returned 2XX. I think we should have done this after rebuilding the rings, but we didn't. From: Clay Gerrard Sent: Thursday, August 4, 2022 6:43 PM To: Komarov, Aleksandr Cc: openstack-discuss at lists.openstack.org Subject: Re: Large files multipart upload fails - OpenStack Object Storage (swift) Those logs are curious. After uploading file-segments, the final request to create a static large object manifest will validate all of the referenced segments with HEAD requests. All the referenced SLO segments have to return a successful 2XX response before the manifest object will be created. I can see you included the partial logs from transaction tx30e82374b98845bab6883-0062e7d225 and txf09676de38964ca299b77-0062e7dcc8 - but it's not entirely clear what happened between the object-server PUT that created the file-segment object (response 201) and the HEAD that failed (response 404). It appears both requests went to the same node (swift0-1-object01) and device (/mpathc) - so unless the file was deleted (or expired, or corrupted, or rebalanced) I would not expect the 404 response. Can you check on the file-segment object AFTER the SLO validation fails? Does it keep returning 404? Where did it go?! On Wed, Aug 3, 2022 at 12:26 PM Komarov, Aleksandr > wrote: Hi to all! 
Please suggest how to fix the situation. When uploading files larger than 100GB, at the stage of multipart-manifest validation, we get 404 error for some segments that were previously uploaded successfully. As a result, the client receives a 400 response. Uploading a large file is not possible. Logs: Aug 1 13:18:30 swift01-object01 object-server: IP - - [01/Aug/2022:13:18:30 +0000] "PUT /mpathc/12130/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/000 00032" 201 - "PUT http://domain-name/v1/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" "tx30e82374b98845bab6883-0062e7d225" "proxy-server 8367" 128.7641 "-" 21577 0 Aug 1 13:18:30 swift01-object01 container-server: IP - - [01/Aug/2022:13:18:30 +0000] "PUT /mpathb/952/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00 000032" 201 - "PUT http://domain-name/mpathe/12130/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" "tx30e82374b98845bab6883-0062 e7d225" "object-server 24710" 0.0005 "-" 13211 0 Aug 1 14:01:46 swift01-object01 object-server: IP - - [01/Aug/2022:14:01:46 +0000] "HEAD /mpathc/12130/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" 404 - "HEAD http://domain-name/v1/AUTH_5affc785b3454cddb2d1f69ce62b4441/bucket/.file-segments/150.file/00000032" "txf09676de38964ca299b77-0062e7dcc8" "proxy-server 8367" 0.0003 "-" 21579 0 At the same time, some segments successfully found. With files less than 50GB everything is fine. -- Clay Gerrard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Fri Aug 5 13:30:22 2022 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Fri, 5 Aug 2022 10:30:22 -0300 Subject: [manila] Feature Proposal Freeze in effect Message-ID: Greetings, Zorillas and interested stackers! As we defined in the release schedule, this week is the Manila Feature Proposal Freeze [1]. It means that all featureful changes should be already proposed and substantially completed with unit, functional and integration tests by EOW. If you have not managed to submit your featureful changes to gerrit yet, and would like to request an exception, please let me know by responding to this email thread. In case your exception is granted, we can get you some extra time. [1] https://releases.openstack.org/zed/schedule.html#z-manila-fpfreeze Thank you! carloss -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Aug 5 13:32:19 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 5 Aug 2022 13:32:19 +0000 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: <21589662.aDxSllVl8Y@p1> References: <21589662.aDxSllVl8Y@p1> Message-ID: <20220805133218.jxrv6ohygscuvex7@yuggoth.org> On 2022-08-05 14:24:25 +0200 (+0200), Slawek Kaplonski wrote: [...] > I just proposed change of the default value of that config option > to be "False" again. And my question is - would it be possible and > acceptable to backport such change up to stable/wallaby [...] In the past, stable branch maintainers have generally held that changing configuration defaults was only possible in the version under development, and not backportable. However, there is this carve-out in the policy document: "It?s nevertheless allowed to backport fixes for other bugs if their safety can be easily proved." 
https://docs.openstack.org/project-team-guide/stable-branches.html -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jpodivin at redhat.com Fri Aug 5 14:45:45 2022 From: jpodivin at redhat.com (Jiri Podivin) Date: Fri, 5 Aug 2022 16:45:45 +0200 Subject: [all][TC] Bare rechecks stats week of 25.07 In-Reply-To: <20220728151534.y336vusncblmmlqz@yuggoth.org> References: <15225123.PIt3FUKRBJ@p1> <20220728151534.y336vusncblmmlqz@yuggoth.org> Message-ID: Thanks Jeremy, I'll keep it in mind. On Thu, Jul 28, 2022 at 5:26 PM Jeremy Stanley wrote: > On 2022-07-28 16:33:46 +0200 (+0200), Jiri Podivin wrote: > [...] > > What is the preferred way of stating the reason for a recheck? I've kind > of > > assumed that it's something like writing: > > > > " recheck: this is broken" > > > > but that doesn't trigger the CI. I've tried looking it up in the Zuul and > > Opendev docs, but couldn't find anything. > [...] > > The regular expression for it is configured here: > > > https://opendev.org/openstack/project-config/src/commit/6b12602/zuul.d/pipelines.yaml#L24 > > What I usually do is "recheck because of nondeterminism in test foo" > (or whatever the reason). > > Don't prepend anything before the word "recheck" and also follow the > word with a space before you add any other text. I think Zuul must > assume a required space after the regular expression since that > doesn't seem to be encoded in the regex you see there. Make sure the > "recheck" appears on the very first line of the comment too. Aside > from that, you can add anything you like after it. > > Switching in/out of WIP and leaving votes at the same time as the > recheck should also be avoided as they seem to cause the expression > not to match. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Fri Aug 5 17:50:50 2022 From: kennelson11 at gmail.com (Kendall Nelson) Date: Fri, 5 Aug 2022 12:50:50 -0500 Subject: Fwd: October PTG Update In-Reply-To: References: Message-ID: Hello Everyone, I wanted to provide an update on the October Project Teams Gathering (PTG). After discussing with community members and prospective sponsors, we have made the decision to pivot the event to 100% virtual to be as inclusive as possible. I know that this is not the ideal outcome and there are a lot of factors contributing to this decision, but we think it?s the best decision for the global community at this time. We are continuing to receive feedback on the value and importance of in-person PTGs; we just need to wait for other factors to be less of an obstacle If you have already registered, you should have been contacted for a full refund. We are going to use the same week for a virtual PTG and it will be free to attend, just like the previous ones. The registration is already live [1]. If you have any questions or feedback on this decision, please let us know. We will continue to monitor feedback as we make plans for a 2023 PTG. Thank you, Kendall (diablo_rojo) [1] https://openinfra-ptg.eventbrite.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Fri Aug 5 18:17:57 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 05 Aug 2022 23:47:57 +0530 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: <21589662.aDxSllVl8Y@p1> References: <21589662.aDxSllVl8Y@p1> Message-ID: <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote --- > Hi, > > Some time ago oslo.messaging changed default value of the "heartbeat_in_pthread" config option to "True" [1]. > As was noticed some time ago, this don't works well with nova-compute - see bug [2] for details. > Recently we noticed in our downstream Red Hat OpenStack, that it's not only nova-compute which don't works well with it and can hangs. We saw the same issue in various neutron agent processes. And it seems that it can be the same for any non-wsgi service which is using rabbitmq to send heartbeats. > So giving all of that, I just proposed change of the default value of that config option to be "False" again [3]. > And my question is - would it be possible and acceptable to backport such change up to stable/wallaby (if and when it will be approved for master of course). IMO this could be useful for users as using this option set as "True" be default don't makes any sense for the non-wsgi applications really and may cause more bad then good things really. What are You opinions about it? This is tricky, in general the default value change should not be backported because it change the default behavior and so does the compatibility. But along with considering the cases do not work with the current default value (you mentioned in this email), we should consider if this worked in any other case or not. If so then I think we should not backport this and tell operator to override it to False as workaround for stable branch fixes. -gmann > > [1] https://review.opendev.org/c/openstack/oslo.messaging/+/747395 > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 > [3] https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ > > -- > Slawek Kaplonski > Principal Software Engineer > Red Hat From gmann at ghanshyammann.com Fri Aug 5 19:14:04 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 06 Aug 2022 00:44:04 +0530 Subject: [all][tc] What's happening in Technical Committee: summary 2022 Aug 5: Reading: 5 min Message-ID: <1826f6dd1a2.e30ac89d307365.5803008888551857942@ghanshyammann.com> Hello Everyone, Here is this week's summary of the Technical Committee activities. 1. TC Meetings: ============ * We had this week's meeting on Aug 4. Most of the topics I am summarizing in this email. This was a video call and recordings are available @https://www.youtube.com/watch?v=2CFrDKJsnEY * Next TC weekly meeting will be on Aug 11 Thursday at 15:00 UTC, feel free to add the topic on the agenda[1] by Aug 10. 2. What we completed this week: ========================= * Added charmed k8s operators to OpenStack Charms[2] 3. Activities In progress: ================== TC Tracker for Zed cycle ------------------------------ * Zed tracker etherpad includes the TC working items[3], Two are completed and others items are in-progress. Open Reviews ----------------- * Three open reviews for ongoing activities[4]. 2023.1 cycle Technical Election (TC + PTL) planning ------------------------------------------------------------- As we are in the R-9 week, its time to plan for the 2023.1 cycle election. 
As first step, we have volunteer to help executing this election. In TC meeting, Amy, Knikolla, and Slaweq volunteer to help and Ian might be available any ways. If you would like to get involved in the elections help, please let us know in #openstack-tc IRC channel. 2023.1 cycle TC PTG planning ------------------------------------ In TC meeting, we were gathering the data on who all will be traveling to in-person PTG or we should consider virtual PTG. But as you might have seen the announcement today, coming PTG will be 100% virtual[5]. 2021 User Survey TC Question Analysis ----------------------------------------------- No update on this. The survey summary is up for review[6]. Feel free to check and provide feedback. Zed cycle Leaderless projects ---------------------------------- Dale Smith volunteer to be PTL for Adjutant project [7] Fixing Zuul config error ---------------------------- Requesting projects with zuul config error to look into those and fix them which should not take much time[8][9]. Project updates ------------------- * Retire openstack-helm-addons[10] 4. How to contact the TC: ==================== If you would like to discuss or give feedback to TC, you can reach out to us in multiple ways: 1. Email: you can send the email with tag [tc] on openstack-discuss ML[11]. 2. Weekly meeting: The Technical Committee conduct a weekly meeting every Thursday 15 UTC [12] 3. Ping us using 'tc-members' nickname on #openstack-tc IRC channel. [1] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting [2] https://review.opendev.org/c/openstack/governance/+/849997 [3] https://etherpad.opendev.org/p/tc-zed-tracker [4] https://review.opendev.org/q/projects:openstack/governance+status:open [5] https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029879.html [6] https://review.opendev.org/c/openstack/governance/+/836888 [7] https://review.opendev.org/c/openstack/governance/+/849606 [8] https://etherpad.opendev.org/p/zuul-config-error-openstack [9] http://lists.openstack.org/pipermail/openstack-discuss/2022-May/028603.html [10] https://review.opendev.org/c/openstack/governance/+/849997 [11] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss [12] http://eavesdrop.openstack.org/#Technical_Committee_Meeting -gmann From alex.kavanagh at canonical.com Fri Aug 5 19:36:47 2022 From: alex.kavanagh at canonical.com (Alex Kavanagh) Date: Fri, 5 Aug 2022 20:36:47 +0100 Subject: [charms] General release of charmhub OpenStack xena/stable, ceph pacific/stable, OVN 21.09/stable tracks for Ubuntu 20.04 (focal) LTS Message-ID: Hi All The title may seem a bit odd because surely yoga is the current release, and zed is the current cycle? This is true, so a little explanation is in order. As the charm-guide [1] explains, in the previous cycle, the OpenStack, and supporting, charms moved from a single stable charm (for each component) that supported multiple releases, to a charm+track that targets a specific release [2]. The current release for the OpenStack charms is 'yoga' and thus there is a yoga/stable track. For a full explanation, please see [1]. Today, as part of our ongoing charmhub stable migration work this cycle, we have (re)released the 21.10 charms (that supported queens -> xena) as "xena" charms for OpenStack, "pacific" charms for Ceph, and "21.09" charms for OVN. 
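(As an illustration, selecting one of these tracks is just a channel argument to Juju; the charm and application names below are only examples, and for an existing deployment please consult the charm upgrade documentation linked below before refreshing.)

  juju deploy nova-compute --channel xena/stable
  juju refresh cinder --channel xena/stable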
The xena, pacific and 21.09-OVN channel charms have the same features as the existing 21.10 charms, include bug-fixes and relevant backports since the 21.10 charms were released, and have been re-tested and verified as a set prior to being released to the xena/stable channel. The full list of charms for a "Xena" OpenStack system, including ceph, OVN, mysql8, hacluster, rabbitmq-server, vault and others can be seen in the docs at [3]. The xena/pacific/21.09 charms have been designed to be upgradable from the 21.10 charms if you are running a xena OpenStack, but please consult the docs [4] about how to go about upgrading charms. The charms also are able to perform the OpenStack upgrade from wallaby -> xena, and again please consult the docs [5] for upgrading OpenStack. Our next stable maintenance release will be wallaby/pacific/20.12 (ovn), so please keep an eye out on the mailing lists. Thanks! [1] https://docs.openstack.org/charm-guide/latest/project/charm-delivery.html [2] https://juju.is/docs/olm/deploy-a-charm-from-charmhub#heading--specify-a-charmed-operator-channel [3] https://docs.openstack.org/charm-guide/latest/project/charm-delivery.html#channels-and-tracks-for-openstack-charms [4] https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/upgrade-charms.html [5] https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/upgrade-openstack.html -- Alex Kavanagh - Software Engineer OpenStack Engineering - Data Centre Development - Canonical Ltd -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Fri Aug 5 20:17:02 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Fri, 5 Aug 2022 22:17:02 +0200 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> References: <21589662.aDxSllVl8Y@p1> <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> Message-ID: <2ecb5760-8b2e-beb3-7734-48d8b786aeff@est.tech> With a stable maintainer hat on, i have to say that both Jeremy and Ghanshyam are right: - in general, backporting default config value changes are not allowed according to stable policy - in some situations, though, if there are other reasons / it is more beneficial to backport the patch (based on thorough consideration by the team) then the team can make an exception and backport it anyway So in this case that should be considered (as Ghanshyam wrote), whether it could have worked for some components, and what impact could it have, and maybe it is safer to not backport but inform the operators to change the config value for the failing cases. Cheers, El?d On 2022. 08. 05. 20:17, Ghanshyam Mann wrote: > ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote --- > > Hi, > > > > Some time ago oslo.messaging changed default value of the "heartbeat_in_pthread" config option to "True" [1]. > > As was noticed some time ago, this don't works well with nova-compute - see bug [2] for details. > > Recently we noticed in our downstream Red Hat OpenStack, that it's not only nova-compute which don't works well with it and can hangs. We saw the same issue in various neutron agent processes. And it seems that it can be the same for any non-wsgi service which is using rabbitmq to send heartbeats. > > So giving all of that, I just proposed change of the default value of that config option to be "False" again [3]. 
> > And my question is - would it be possible and acceptable to backport such change up to stable/wallaby (if and when it will be approved for master of course). IMO this could be useful for users as using this option set as "True" be default don't makes any sense for the non-wsgi applications really and may cause more bad then good things really. What are You opinions about it? > > This is tricky, in general the default value change should not be backported because it change > the default behavior and so does the compatibility. But along with considering the cases do not > work with the current default value (you mentioned in this email), we should consider if this worked > in any other case or not. If so then I think we should not backport this and tell operator to override > it to False as workaround for stable branch fixes. > > -gmann > > > > > [1] https://review.opendev.org/c/openstack/oslo.messaging/+/747395 > > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 > > [3] https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ > > > > -- > > Slawek Kaplonski > > Principal Software Engineer > > Red Hat > From elod.illes at est.tech Fri Aug 5 20:27:01 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Fri, 5 Aug 2022 22:27:01 +0200 Subject: [release] Release countdown for week R-8, Aug 08 - 12 Message-ID: <81c6e026-6463-2fbd-fa9d-e7ab8e113f63@est.tech> General Information ------------------- The following cycle-with-intermediary deliverables have not done any intermediary release yet during this cycle: Ironic: * bifrost * ironic-prometheus-exporter * ironic-python-agent-builder * ironic-ui * networking-baremetal * networking-generic-switch OpenStackSDK: * python-openstackclient Swift: * swift Since these deliverables are traditionally released after our warning, we did not propose to change the release model to cycle-with-rc for them this time. Please let us know if any of the above project is considering the release model change anyway and we propose those patches. Otherwise immediately release an intermediary release, or acknowledge that a release needs to be done before RC1! We also published a proposed release schedule for the upcoming Antelope cycle. Please check out the separate thread: https://lists.openstack.org/pipermail/openstack-discuss/2022-July/029767.html Upcoming Deadlines & Dates -------------------------- Non-client library freeze: August 26th, 2022 (R-6 week) Client library freeze: September 1st, 2022 (R-5 week) Zed-3 milestone: September 1st, 2022 (R-5 week) El?d Ill?s irc: elodilles From smooney at redhat.com Fri Aug 5 21:39:52 2022 From: smooney at redhat.com (Sean Mooney) Date: Fri, 5 Aug 2022 22:39:52 +0100 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> References: <21589662.aDxSllVl8Y@p1> <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> Message-ID: On Fri, Aug 5, 2022 at 7:40 PM Ghanshyam Mann wrote: > > ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote --- > > Hi, > > > > Some time ago oslo.messaging changed default value of the "heartbeat_in_pthread" config option to "True" [1]. > > As was noticed some time ago, this don't works well with nova-compute - see bug [2] for details. > > Recently we noticed in our downstream Red Hat OpenStack, that it's not only nova-compute which don't works well with it and can hangs. We saw the same issue in various neutron agent processes. 
And it seems that it can be the same for any non-wsgi service which is using rabbitmq to send heartbeats. > > So giving all of that, I just proposed change of the default value of that config option to be "False" again [3]. > > And my question is - would it be possible and acceptable to backport such change up to stable/wallaby (if and when it will be approved for master of course). IMO this could be useful for users as using this option set as "True" be default don't makes any sense for the non-wsgi applications really and may cause more bad then good things really. What are You opinions about it? > > This is tricky, in general the default value change should not be backported because it change > the default behavior and so does the compatibility. But along with considering the cases do not > work with the current default value (you mentioned in this email), we should consider if this worked > in any other case or not. If so then I think we should not backport this and tell operator to override > it to False as workaround for stable branch fixes. as afar as i am aware the only impact of setting the default to false for wsgi applications is running under mod_wsgi or uwsgi may have the heatbeat greenthread killed when the wsgi server susspand the application after a time out following the processing of an api request. there is no known negitive impact to this other then a log message that can safely be ignored on both rabbitmq and the api log relating to the amqp messing connection being closed and repopend. keeping the value at true can cause the nova compute agent, neutron agent and i susppoct nova conductor/schduler to hang following a rabbitmq disconnect. that can leave the relevnet service unresponcei until its restarted. so having the default set to true is known to breake several services but tehre are no know issue that are caused by setting it to false that impact the operation fo any service. so i have a stong preference for setting thsi to false by default on stable branches. > > -gmann > > > > > [1] https://review.opendev.org/c/openstack/oslo.messaging/+/747395 > > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 > > [3] https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ > > > > -- > > Slawek Kaplonski > > Principal Software Engineer > > Red Hat > From fungi at yuggoth.org Sat Aug 6 13:11:39 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sat, 6 Aug 2022 13:11:39 +0000 Subject: [tc] August 2022 OpenInfra Board Sync In-Reply-To: <20220630142207.rwtyc3apyhd2gyjv@yuggoth.org> Message-ID: <20220806131138.r7fdlciv6xtbsq6n@yuggoth.org> I'm forwarding the meeting invitation from Julia for Wednesday, 2022-08-10 20:00-21:00 UTC: ----- Forwarded message from Julia Kreger ----- Date: Fri, 05 Aug 2022 22:30:33 +0000 Subject: Invitation: OpenInfra Board/OpenStack TC - Informal call @ Wed Aug 10, 2022 3pm - 4pm (CDT) OpenInfra Board/OpenStack TC - Informal call Wednesday Aug 10, 2022 ? 3pm ? 4pm Central Time - Chicago Location https://us02web.zoom.us/j/83715721248?pwd=bm5rYlBaWTJwRUM3T3k2cCtEVk9DZz09 https://www.google.com/url?q=https%3A%2F%2Fus02web.zoom.us%2Fj%2F83715721248%3Fpwd%3Dbm5rYlBaWTJwRUM3T3k2cCtEVk9DZz09&sa=D&ust=1660170600000000&usg=AOvVaw0pXPysIZQwGG0FI-kTciET Greetings everyone,We are going to have an informal call between the OpenInfra Board members who are interested, and members of the OpenStack TC to build mutual context. 
There will not be a presentation, just discussion of where we perceive things are at, where they are going, and provide an opportunity to ask questions.Please feel free to forward this invite, but attendance is *not* mandatory.Etherpad: https://etherpad.opendev.org/p/board-tc-informal-discussionThanks everyone!-Julia??????????Julia Kreger is inviting you to a scheduled Zoom meeting.Join Zoom Meetinghttps://us02web.zoom.us/j/83715721248?pwd=bm5rYlBaWTJwRUM3T3k2cCtEVk9DZz09Meeting ID: 837 1572 1248Passcode: 825541One tap mobile+16694449171,,83715721248#,,,,*825541# US+16699006833,,83715721248#,,,,*825541# US (San Jose)Dial by your location        +1 669 444 9171 US        +1 669 900 6833 US (San Jose)        +1 253 215 8782 US (Tacoma)        +1 346 248 7799 US (Houston)        +1 301 715 8592 US (Washington DC)        +1 312 626 6799 US (Chicago)        +1 386 347 5053 US        +1 564 217 2000 US        +1 646 931 3860 US        +1 929 205 6099 US (New York)Meeting ID: 837 1572 1248Passcode: 825541Find your local number: https://us02web.zoom.us/u/kUW73ftHL?????????? ----- End forwarded message ----- From kdhall at binghamton.edu Sat Aug 6 18:52:03 2022 From: kdhall at binghamton.edu (Dave Hall) Date: Sat, 6 Aug 2022 14:52:03 -0400 Subject: [Cinder][NFS][Openstack-Ansible] Message-ID: Hello, I seem to have gotten myself in a bit of a mess trying to set up Cinder with an NFS back-end. After working with Glance and NFS, I started on Cinder. I noticed immediately that there weren't any NFS mounts in the Cinder-API containers like there were in the Glance-API containers. Also that there were no NFS packages in the Cinder-API containers. In reading some Cinder documentation, I also got the impression that each Cinder host/container needs to have its own NFS store. Pawing through the playbooks and documentation I saw that unlike Glance, Cinder is split into two pieces - Cinder-API and Cinder-Volume. I found cinder-volume.yml.example in env.d, activated it, and created Cinder-Volume containers on my 3 infra hosts. I also created 3 separate NFS shares and changed the storage-hosts section of my openstack_user_config.yml accordingly. After this I found that while I was able to create volumes, the prep_volume part of launching an instance was failing. 
Digging in, I found:

# openstack volume service list
+------------------+-------------------------------------------------------+------+---------+-------+----------------------------+
| Binary           | Host                                                  | Zone | Status  | State | Updated At                 |
+------------------+-------------------------------------------------------+------+---------+-------+----------------------------+
| cinder-volume    | C6220-9@nfs_volume                                    | nova | enabled | down  | 2022-07-23T02:46:13.000000 |
| cinder-volume    | C6220-10@nfs_volume                                   | nova | enabled | down  | 2022-07-23T02:46:14.000000 |
| cinder-volume    | C6220-11@nfs_volume                                   | nova | enabled | down  | 2022-07-23T02:46:14.000000 |
| cinder-scheduler | infra36-cinder-api-container-da8e100f                 | nova | enabled | up    | 2022-08-06T13:29:10.000000 |
| cinder-scheduler | infra38-cinder-api-container-27219f93                 | nova | enabled | up    | 2022-08-06T13:29:10.000000 |
| cinder-scheduler | infra37-cinder-api-container-ea7f847b                 | nova | enabled | up    | 2022-08-06T13:29:10.000000 |
| cinder-volume    | C6220-9@nfs_volume1                                   | nova | enabled | up    | 2022-08-06T13:29:10.000000 |
| cinder-volume    | infra37-cinder-volumes-container-5b9635ad@nfs_volume  | nova | enabled | down  | 2022-08-04T18:32:53.000000 |
| cinder-volume    | infra36-cinder-volumes-container-77190057@nfs_volume1 | nova | enabled | down  | 2022-08-06T13:03:03.000000 |
| cinder-volume    | infra38-cinder-volumes-container-a7bcfc9b@nfs_volume  | nova | enabled | down  | 2022-08-04T18:32:53.000000 |
| cinder-volume    | infra37-cinder-volumes-container-5b9635ad@nfs_volume2 | nova | enabled | down  | 2022-08-06T13:03:05.000000 |
| cinder-volume    | C6220-10@nfs_volume2                                  | nova | enabled | up    | 2022-08-06T13:29:10.000000 |
| cinder-volume    | C6220-11@nfs_volume3                                  | nova | enabled | up    | 2022-08-06T13:29:10.000000 |
| cinder-volume    | infra38-cinder-volumes-container-a7bcfc9b@nfs_volume3 | nova | enabled | down  | 2022-08-06T13:03:03.000000 |
+------------------+-------------------------------------------------------+------+---------+-------+----------------------------+

Thinking I could save this, I used containers-lxc-destroy.yml to destroy my cinder-volumes containers and deactivated cinder-volume.yml.example. Then I ran setup-hosts.yml, which has restored the cinder-volumes containers even though is_metal: false has been removed.

Clearly a stronger intervention will be required. I would like to fully get rid of the cinder-volumes containers and go back to an is_metal: true scenario. I also need to get rid of the unnumbered nfs_volume references, which I assume are in some cinder config file somewhere. 
Below is a clip from my openstack_user_config.yml: storage_hosts: infra36: ip: 172.29.236.36 container_vars: cinder_backends: nfs_volume1: volume_backend_name: NFS_VOLUME1 volume_driver: cinder.volume.drivers.nfs.NfsDriver nfs_mount_options: "rsize=65535,wsize=65535,timeo=1200,actimeo=120" nfs_shares_config: /etc/cinder/nfs_shares_volume1 shares: - { ip: "172.29.244.27", share: "/NFS_VOLUME1" } infra37: ip: 172.29.236.37 container_vars: cinder_backends: nfs_volume2: volume_backend_name: NFS_VOLUME2 volume_driver: cinder.volume.drivers.nfs.NfsDriver nfs_mount_options: "rsize=65535,wsize=65535,timeo=1200,actimeo=120" nfs_shares_config: /etc/cinder/nfs_shares_volume2 shares: - { ip: "172.29.244.27", share: "/NFS_VOLUME2" } infra38: ip: 172.29.236.38 container_vars: cinder_backends: nfs_volume3: volume_backend_name: NFS_VOLUME3 volume_driver: cinder.volume.drivers.nfs.NfsDriver nfs_mount_options: "rsize=65535,wsize=65535,timeo=1200,actimeo=120" nfs_shares_config: /etc/cinder/nfs_shares_volume3 shares: - { ip: "172.29.244.27", share: "/NFS_VOLUME3" } Any advice would be greatly appreciated. Thanks. -Dave -- Dave Hall Binghamton University kdhall at binghamton.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhall at binghamton.edu Sun Aug 7 14:11:27 2022 From: kdhall at binghamton.edu (Dave Hall) Date: Sun, 7 Aug 2022 10:11:27 -0400 Subject: [Cinder][NFS][Openstack-Ansible] Cinder-Volume Mess - Containers and Metal by Accident In-Reply-To: References: Message-ID: Hello, Please pardon the repost - I noticed this morning that I didn't finish the subject line. Problem summary: I have a bunch of lingering non-functional cinder definitions and I'm looking for guidance on how to clean them up. Thanks. -Dave -- Dave Hall Binghamton University kdhall at binghamton.edu On Sat, Aug 6, 2022 at 2:52 PM Dave Hall wrote: > Hello, > > I seem to have gotten myself in a bit of a mess trying to set up Cinder > with an NFS back-end. After working with Glance and NFS, I started on > Cinder. I noticed immediately that there weren't any NFS mounts in the > Cinder-API containers like there were in the Glance-API containers. Also > that there were no NFS packages in the Cinder-API containers. > > In reading some Cinder documentation, I also got the impression that each > Cinder host/container needs to have its own NFS store. > > Pawing through the playbooks and documentation I saw that unlike Glance, > Cinder is split into two pieces - Cinder-API and Cinder-Volume. I found > cinder-volume.yml.example in env.d, activated it, and created Cinder-Volume > containers on my 3 infra hosts. I also created 3 separate NFS shares and > changed the storage-hosts section of my openstack_user_config.yml > accordingly. > > After this I found that while I was able to create volumes, the > prep_volume part of launching an instance was failing. 
> > Digging in, I found: > > # openstack volume service list > > +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ > | Binary | Host > | Zone | Status | State | Updated At | > > +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ > | cinder-volume | C6220-9 at nfs_volume > | nova | enabled | down | 2022-07-23T02:46:13.000000 | > | cinder-volume | C6220-10 at nfs_volume > | nova | enabled | down | 2022-07-23T02:46:14.000000 | > | cinder-volume | C6220-11 at nfs_volume > | nova | enabled | down | 2022-07-23T02:46:14.000000 | > | cinder-scheduler | infra36-cinder-api-container-da8e100f > | nova | enabled | up | 2022-08-06T13:29:10.000000 | > | cinder-scheduler | infra38-cinder-api-container-27219f93 > | nova | enabled | up | 2022-08-06T13:29:10.000000 | > | cinder-scheduler | infra37-cinder-api-container-ea7f847b > | nova | enabled | up | 2022-08-06T13:29:10.000000 | > | cinder-volume | C6220-9 at nfs_volume1 > | nova | enabled | up | 2022-08-06T13:29:10.000000 | > | cinder-volume | infra37-cinder-volumes-container-5b9635ad at nfs_volume > | nova | enabled | down | 2022-08-04T18:32:53.000000 | > | cinder-volume | infra36-cinder-volumes-container-77190057 at nfs_volume1 > | nova | enabled | down | 2022-08-06T13:03:03.000000 | > | cinder-volume | infra38-cinder-volumes-container-a7bcfc9b at nfs_volume > | nova | enabled | down | 2022-08-04T18:32:53.000000 | > | cinder-volume | infra37-cinder-volumes-container-5b9635ad at nfs_volume2 > | nova | enabled | down | 2022-08-06T13:03:05.000000 | > | cinder-volume | C6220-10 at nfs_volume2 > | nova | enabled | up | 2022-08-06T13:29:10.000000 | > | cinder-volume | C6220-11 at nfs_volume3 > | nova | enabled | up | 2022-08-06T13:29:10.000000 | > | cinder-volume | infra38-cinder-volumes-container-a7bcfc9b at nfs_volume3 > | nova | enabled | down | 2022-08-06T13:03:03.000000 | > > +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ > > Thinking I could save this, I used containers-lxc-destroy.yml to destroy > my cinder-volumes containers and deactivated cinder-volume.yml.example. > Then I ran setup-hosts.yml, which has restored the cinder-volumes > containers even though is_metal: false has been removed. > > Clearly a stronger intervention will be required. I would like to fully > get rid of the cinder-volumes containers and go back to an is_metal: true > scenario. I also need to get rid of the unnumbered nfs_volume referenes, > which I assume are in some cinder config file somewhere. 
> > Below is a clip from my openstack_user_config.yml: > > storage_hosts: > infra36: > ip: 172.29.236.36 > container_vars: > cinder_backends: > nfs_volume1: > volume_backend_name: NFS_VOLUME1 > volume_driver: cinder.volume.drivers.nfs.NfsDriver > nfs_mount_options: > "rsize=65535,wsize=65535,timeo=1200,actimeo=120" > nfs_shares_config: /etc/cinder/nfs_shares_volume1 > shares: > - { ip: "172.29.244.27", share: "/NFS_VOLUME1" } > infra37: > ip: 172.29.236.37 > container_vars: > cinder_backends: > nfs_volume2: > volume_backend_name: NFS_VOLUME2 > volume_driver: cinder.volume.drivers.nfs.NfsDriver > nfs_mount_options: > "rsize=65535,wsize=65535,timeo=1200,actimeo=120" > nfs_shares_config: /etc/cinder/nfs_shares_volume2 > shares: > - { ip: "172.29.244.27", share: "/NFS_VOLUME2" } > infra38: > ip: 172.29.236.38 > container_vars: > cinder_backends: > nfs_volume3: > volume_backend_name: NFS_VOLUME3 > volume_driver: cinder.volume.drivers.nfs.NfsDriver > nfs_mount_options: > "rsize=65535,wsize=65535,timeo=1200,actimeo=120" > nfs_shares_config: /etc/cinder/nfs_shares_volume3 > shares: > - { ip: "172.29.244.27", share: "/NFS_VOLUME3" } > > Any advice would be greatly appreciated. > > Thanks. > > -Dave > > -- > Dave Hall > Binghamton University > kdhall at binghamton.edu > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Sun Aug 7 14:40:04 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Sun, 7 Aug 2022 16:40:04 +0200 Subject: [Cinder][NFS][Openstack-Ansible] Cinder-Volume Mess - Containers and Metal by Accident In-Reply-To: References: Message-ID: Hi, 1. When you remove definition from openstack_user_config, items from inventory won't be dropped automatically. For that you would need to use scripts/inventory-manage.py -r . You should indeed destroy containers first. 2. In order to remove cider-volume definitions from API, you would need to run cinder-manage service remove https://docs.openstack.org/cinder/rocky/man/cinder-manage.html#cinder-service cinder-manage binary can be found on cinder-api containers inside virtualenv, ie /openstack/venvs/cinder-/bin/cinder-manage ??, 7 ???. 2022 ?., 16:22 Dave Hall : > Hello, > > Please pardon the repost - I noticed this morning that I didn't finish the > subject line. > > Problem summary: I have a bunch of lingering non-functional cinder > definitions and I'm looking for guidance on how to clean them up. > > Thanks. > > -Dave > > -- > Dave Hall > Binghamton University > kdhall at binghamton.edu > > > > On Sat, Aug 6, 2022 at 2:52 PM Dave Hall wrote: > >> Hello, >> >> I seem to have gotten myself in a bit of a mess trying to set up Cinder >> with an NFS back-end. After working with Glance and NFS, I started on >> Cinder. I noticed immediately that there weren't any NFS mounts in the >> Cinder-API containers like there were in the Glance-API containers. Also >> that there were no NFS packages in the Cinder-API containers. >> >> In reading some Cinder documentation, I also got the impression that each >> Cinder host/container needs to have its own NFS store. >> >> Pawing through the playbooks and documentation I saw that unlike Glance, >> Cinder is split into two pieces - Cinder-API and Cinder-Volume. I found >> cinder-volume.yml.example in env.d, activated it, and created Cinder-Volume >> containers on my 3 infra hosts. I also created 3 separate NFS shares and >> changed the storage-hosts section of my openstack_user_config.yml >> accordingly. 
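Dmitriy's two cleanup steps above could look roughly like the commands below; the container and service names are only examples lifted from the service list earlier in the thread, and the paths are assumptions that may differ per deployment:

    # on the deployment host: drop the stale container from the OSA inventory
    cd /opt/openstack-ansible                  # path assumed
    ./scripts/inventory-manage.py -r infra36-cinder-volumes-container-77190057

    # inside a cinder-api container: remove the dead cinder-volume service rows
    # (the venv path and <version> suffix are placeholders)
    /openstack/venvs/cinder-<version>/bin/cinder-manage service remove \
        cinder-volume infra36-cinder-volumes-container-77190057@nfs_volume1

After that, openstack volume service list should no longer show the removed entries.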
>> >> After this I found that while I was able to create volumes, the >> prep_volume part of launching an instance was failing. >> >> Digging in, I found: >> >> # openstack volume service list >> >> +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ >> | Binary | Host >> | Zone | Status | State | Updated At | >> >> +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ >> | cinder-volume | C6220-9 at nfs_volume >> | nova | enabled | down | 2022-07-23T02:46:13.000000 | >> | cinder-volume | C6220-10 at nfs_volume >> | nova | enabled | down | 2022-07-23T02:46:14.000000 | >> | cinder-volume | C6220-11 at nfs_volume >> | nova | enabled | down | 2022-07-23T02:46:14.000000 | >> | cinder-scheduler | infra36-cinder-api-container-da8e100f >> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >> | cinder-scheduler | infra38-cinder-api-container-27219f93 >> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >> | cinder-scheduler | infra37-cinder-api-container-ea7f847b >> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >> | cinder-volume | C6220-9 at nfs_volume1 >> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >> | cinder-volume | infra37-cinder-volumes-container-5b9635ad at nfs_volume >> | nova | enabled | down | 2022-08-04T18:32:53.000000 | >> | cinder-volume | infra36-cinder-volumes-container-77190057 at nfs_volume1 >> | nova | enabled | down | 2022-08-06T13:03:03.000000 | >> | cinder-volume | infra38-cinder-volumes-container-a7bcfc9b at nfs_volume >> | nova | enabled | down | 2022-08-04T18:32:53.000000 | >> | cinder-volume | infra37-cinder-volumes-container-5b9635ad at nfs_volume2 >> | nova | enabled | down | 2022-08-06T13:03:05.000000 | >> | cinder-volume | C6220-10 at nfs_volume2 >> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >> | cinder-volume | C6220-11 at nfs_volume3 >> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >> | cinder-volume | infra38-cinder-volumes-container-a7bcfc9b at nfs_volume3 >> | nova | enabled | down | 2022-08-06T13:03:03.000000 | >> >> +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ >> >> Thinking I could save this, I used containers-lxc-destroy.yml to destroy >> my cinder-volumes containers and deactivated cinder-volume.yml.example. >> Then I ran setup-hosts.yml, which has restored the cinder-volumes >> containers even though is_metal: false has been removed. >> >> Clearly a stronger intervention will be required. I would like to fully >> get rid of the cinder-volumes containers and go back to an is_metal: true >> scenario. I also need to get rid of the unnumbered nfs_volume referenes, >> which I assume are in some cinder config file somewhere. 
>> >> Below is a clip from my openstack_user_config.yml: >> >> storage_hosts: >> infra36: >> ip: 172.29.236.36 >> container_vars: >> cinder_backends: >> nfs_volume1: >> volume_backend_name: NFS_VOLUME1 >> volume_driver: cinder.volume.drivers.nfs.NfsDriver >> nfs_mount_options: >> "rsize=65535,wsize=65535,timeo=1200,actimeo=120" >> nfs_shares_config: /etc/cinder/nfs_shares_volume1 >> shares: >> - { ip: "172.29.244.27", share: "/NFS_VOLUME1" } >> infra37: >> ip: 172.29.236.37 >> container_vars: >> cinder_backends: >> nfs_volume2: >> volume_backend_name: NFS_VOLUME2 >> volume_driver: cinder.volume.drivers.nfs.NfsDriver >> nfs_mount_options: >> "rsize=65535,wsize=65535,timeo=1200,actimeo=120" >> nfs_shares_config: /etc/cinder/nfs_shares_volume2 >> shares: >> - { ip: "172.29.244.27", share: "/NFS_VOLUME2" } >> infra38: >> ip: 172.29.236.38 >> container_vars: >> cinder_backends: >> nfs_volume3: >> volume_backend_name: NFS_VOLUME3 >> volume_driver: cinder.volume.drivers.nfs.NfsDriver >> nfs_mount_options: >> "rsize=65535,wsize=65535,timeo=1200,actimeo=120" >> nfs_shares_config: /etc/cinder/nfs_shares_volume3 >> shares: >> - { ip: "172.29.244.27", share: "/NFS_VOLUME3" } >> >> Any advice would be greatly appreciated. >> >> Thanks. >> >> -Dave >> >> -- >> Dave Hall >> Binghamton University >> kdhall at binghamton.edu >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Sun Aug 7 15:05:34 2022 From: satish.txt at gmail.com (Satish Patel) Date: Sun, 7 Aug 2022 11:05:34 -0400 Subject: Guide for Openstack installation with HA, OVN ? In-Reply-To: References: Message-ID: <0AA11851-279F-46E3-80ED-E907271DD70E@gmail.com> I have created some OVN related content for HA in my blog check it out. https://satishdotpatel.github.io/openstack-ansible-multinode-ovn/ Sent from my iPhone > On Jul 13, 2022, at 2:21 AM, Alvaro Soto wrote: > > ? > Any idea on what you want to use to deploy your cluster? > > https://docs.openstack.org/openstack-ansible/latest/ > https://wiki.openstack.org/wiki/TripleO > > ??? > > Cheers! > >> On Wed, Jul 13, 2022 at 12:53 AM ??? wrote: >> >> Hello >> >> >> >> I've tried minimal installation of openstack following official documentation >> >> Now I want to install openstack with >> >> >> >> 1. High availability configuration - keystone, placement, neutron, glance, cinder, ... >> >> 2. Open Virtual Network for neutron driver >> >> >> >> But it's hard to find documentation online >> >> Could you provide some links for the material? >> >> >> >> Thank you! >> > > > -- > > Alvaro Soto > > Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > ---------------------------------------------------------- > Great people talk about ideas, > ordinary people talk about things, > small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Sun Aug 7 15:08:54 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Mon, 8 Aug 2022 00:08:54 +0900 Subject: [Cinder][NFS][Openstack-Ansible] Cinder-Volume Mess - Containers and Metal by Accident In-Reply-To: References: Message-ID: I'm not familiar with the setup created by openstack-ansible but let me leave a few comments from cinder's perspective. - cinder api does not require access to backend storage. Only cinder-volume communicates with backend storage. Thus missing nfs mount/tools in cinder-api is what we definitely expect. 
- Usually you need to deploy cinder-volume as an act/sby process. Your cinder-volume would run on only one of the controller nodes managed by a cluster technology like pacemaker. You can split cinder-volume per backend but in such case you'd run each c-vol in act/sby mode. - Recent cinder supports deploying cinder-volume in active/active mode with lock management by tooz. I'm not sure whether this is supported by openstack-ansible as well as it is well tested with NFS backend. - When you deploy cinder-volume in active/active mode, cinder-volume should run in all controller nodes and all processes should have access to the same storage backends. If you use different backend setting for each controller then you'd lost control about some of your volumes when the associated controller node goes down. I'd recommend checking whether you expect deploying cinder-volume in active/standby or in active/active. In case you want active/active then you should have multiple cinder-volume processes with the same backend definition. On Sun, Aug 7, 2022 at 11:55 PM Dmitriy Rabotyagov wrote: > Hi, > > 1. When you remove definition from openstack_user_config, items from > inventory won't be dropped automatically. For that you would need to use > scripts/inventory-manage.py -r . You should indeed destroy > containers first. > > 2. In order to remove cider-volume definitions from API, you would need to > run cinder-manage service remove > > > https://docs.openstack.org/cinder/rocky/man/cinder-manage.html#cinder-service > > cinder-manage binary can be found on cinder-api containers inside > virtualenv, ie /openstack/venvs/cinder-/bin/cinder-manage > > > ??, 7 ???. 2022 ?., 16:22 Dave Hall : > >> Hello, >> >> Please pardon the repost - I noticed this morning that I didn't finish >> the subject line. >> >> Problem summary: I have a bunch of lingering non-functional cinder >> definitions and I'm looking for guidance on how to clean them up. >> >> Thanks. >> >> -Dave >> >> -- >> Dave Hall >> Binghamton University >> kdhall at binghamton.edu >> >> >> >> On Sat, Aug 6, 2022 at 2:52 PM Dave Hall wrote: >> >>> Hello, >>> >>> I seem to have gotten myself in a bit of a mess trying to set up Cinder >>> with an NFS back-end. After working with Glance and NFS, I started on >>> Cinder. I noticed immediately that there weren't any NFS mounts in the >>> Cinder-API containers like there were in the Glance-API containers. Also >>> that there were no NFS packages in the Cinder-API containers. >>> >>> In reading some Cinder documentation, I also got the impression that >>> each Cinder host/container needs to have its own NFS store. >>> >>> Pawing through the playbooks and documentation I saw that unlike Glance, >>> Cinder is split into two pieces - Cinder-API and Cinder-Volume. I found >>> cinder-volume.yml.example in env.d, activated it, and created Cinder-Volume >>> containers on my 3 infra hosts. I also created 3 separate NFS shares and >>> changed the storage-hosts section of my openstack_user_config.yml >>> accordingly. >>> >>> After this I found that while I was able to create volumes, the >>> prep_volume part of launching an instance was failing. 
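To make the active/active points above concrete, a rough cinder.conf sketch follows; all values are invented, and since the replies in this thread note that the NFS driver does not claim active/active support, the backend shown is RBD:

    [DEFAULT]
    # every cinder-volume that should act as one A/A cluster uses the same name
    cluster = cinder-cluster-1
    enabled_backends = rbd1

    [coordination]
    # DLM used via tooz; the etcd endpoint here is an assumption
    backend_url = etcd3+http://172.29.236.100:2379

    [rbd1]
    volume_backend_name = rbd1
    volume_driver = cinder.volume.drivers.rbd.RBDDriver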
>>> >>> Digging in, I found: >>> >>> # openstack volume service list >>> >>> +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ >>> | Binary | Host >>> | Zone | Status | State | Updated At | >>> >>> +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ >>> | cinder-volume | C6220-9 at nfs_volume >>> | nova | enabled | down | 2022-07-23T02:46:13.000000 | >>> | cinder-volume | C6220-10 at nfs_volume >>> | nova | enabled | down | 2022-07-23T02:46:14.000000 | >>> | cinder-volume | C6220-11 at nfs_volume >>> | nova | enabled | down | 2022-07-23T02:46:14.000000 | >>> | cinder-scheduler | infra36-cinder-api-container-da8e100f >>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>> | cinder-scheduler | infra38-cinder-api-container-27219f93 >>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>> | cinder-scheduler | infra37-cinder-api-container-ea7f847b >>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>> | cinder-volume | C6220-9 at nfs_volume1 >>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>> | cinder-volume | infra37-cinder-volumes-container-5b9635ad at nfs_volume >>> | nova | enabled | down | 2022-08-04T18:32:53.000000 | >>> | cinder-volume | >>> infra36-cinder-volumes-container-77190057 at nfs_volume1 | nova | enabled >>> | down | 2022-08-06T13:03:03.000000 | >>> | cinder-volume | infra38-cinder-volumes-container-a7bcfc9b at nfs_volume >>> | nova | enabled | down | 2022-08-04T18:32:53.000000 | >>> | cinder-volume | >>> infra37-cinder-volumes-container-5b9635ad at nfs_volume2 | nova | enabled >>> | down | 2022-08-06T13:03:05.000000 | >>> | cinder-volume | C6220-10 at nfs_volume2 >>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>> | cinder-volume | C6220-11 at nfs_volume3 >>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>> | cinder-volume | >>> infra38-cinder-volumes-container-a7bcfc9b at nfs_volume3 | nova | enabled >>> | down | 2022-08-06T13:03:03.000000 | >>> >>> +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ >>> >>> Thinking I could save this, I used containers-lxc-destroy.yml to destroy >>> my cinder-volumes containers and deactivated cinder-volume.yml.example. >>> Then I ran setup-hosts.yml, which has restored the cinder-volumes >>> containers even though is_metal: false has been removed. >>> >>> Clearly a stronger intervention will be required. I would like to fully >>> get rid of the cinder-volumes containers and go back to an is_metal: true >>> scenario. I also need to get rid of the unnumbered nfs_volume referenes, >>> which I assume are in some cinder config file somewhere. 
>>> >>> Below is a clip from my openstack_user_config.yml: >>> >>> storage_hosts: >>> infra36: >>> ip: 172.29.236.36 >>> container_vars: >>> cinder_backends: >>> nfs_volume1: >>> volume_backend_name: NFS_VOLUME1 >>> volume_driver: cinder.volume.drivers.nfs.NfsDriver >>> nfs_mount_options: >>> "rsize=65535,wsize=65535,timeo=1200,actimeo=120" >>> nfs_shares_config: /etc/cinder/nfs_shares_volume1 >>> shares: >>> - { ip: "172.29.244.27", share: "/NFS_VOLUME1" } >>> infra37: >>> ip: 172.29.236.37 >>> container_vars: >>> cinder_backends: >>> nfs_volume2: >>> volume_backend_name: NFS_VOLUME2 >>> volume_driver: cinder.volume.drivers.nfs.NfsDriver >>> nfs_mount_options: >>> "rsize=65535,wsize=65535,timeo=1200,actimeo=120" >>> nfs_shares_config: /etc/cinder/nfs_shares_volume2 >>> shares: >>> - { ip: "172.29.244.27", share: "/NFS_VOLUME2" } >>> infra38: >>> ip: 172.29.236.38 >>> container_vars: >>> cinder_backends: >>> nfs_volume3: >>> volume_backend_name: NFS_VOLUME3 >>> volume_driver: cinder.volume.drivers.nfs.NfsDriver >>> nfs_mount_options: >>> "rsize=65535,wsize=65535,timeo=1200,actimeo=120" >>> nfs_shares_config: /etc/cinder/nfs_shares_volume3 >>> shares: >>> - { ip: "172.29.244.27", share: "/NFS_VOLUME3" } >>> >>> Any advice would be greatly appreciated. >>> >>> Thanks. >>> >>> -Dave >>> >>> -- >>> Dave Hall >>> Binghamton University >>> kdhall at binghamton.edu >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Sun Aug 7 15:36:09 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Sun, 7 Aug 2022 17:36:09 +0200 Subject: [Cinder][NFS][Openstack-Ansible] Cinder-Volume Mess - Containers and Metal by Accident In-Reply-To: References: Message-ID: I have actually a question on unrelated topic, but related to the active/standby of cinder-volume. Is there any way to explicitly tell to the backend that it should be active? As well as would it restore active mode if at some point become unreachable and then restore? Recently I tried to look through docs and config options, but was not able to find an answer. ??, 7 ???. 2022 ?., 17:09 Takashi Kajinami : > > - Usually you need to deploy cinder-volume as an act/sby process. Your > cinder-volume would run > on only one of the controller nodes managed by a cluster technology like > pacemaker. > You can split cinder-volume per backend but in such case you'd run each > c-vol in act/sby mode. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Sun Aug 7 15:47:20 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Sun, 7 Aug 2022 17:47:20 +0200 Subject: [Cinder][NFS][Openstack-Ansible] Cinder-Volume Mess - Containers and Metal by Accident In-Reply-To: References: Message-ID: Another question - are features that tooz support with etcd enough for cinder active/active to work? IIRC, for NFS openstack-ansible deploy in active/standby mode by default. Only ceph goes "active/active" but we don't define coordination url for that - only define cluster_name... Though, we do provide a role/way to get etcd, so no blocker to get coordination_url for cinder if etcd covers all cinder needs ??, 7 ???. 2022 ?., 17:36 Dmitriy Rabotyagov : > I have actually a question on unrelated topic, but related to the > active/standby of cinder-volume. > > Is there any way to explicitly tell to the backend that it should be > active? As well as would it restore active mode if at some point become > unreachable and then restore? 
Recently I tried to look through docs and > config options, but was not able to find an answer. > > ??, 7 ???. 2022 ?., 17:09 Takashi Kajinami : > >> >> - Usually you need to deploy cinder-volume as an act/sby process. Your >> cinder-volume would run >> on only one of the controller nodes managed by a cluster technology >> like pacemaker. >> You can split cinder-volume per backend but in such case you'd run each >> c-vol in act/sby mode. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhall at binghamton.edu Sun Aug 7 17:46:36 2022 From: kdhall at binghamton.edu (Dave Hall) Date: Sun, 7 Aug 2022 13:46:36 -0400 Subject: [Cinder][NFS][Openstack-Ansible] Cinder-Volume Mess - Containers and Metal by Accident In-Reply-To: References: Message-ID: Thank you both for your responses and insight. It is very much appreciated. Dmitriy, Is there any sort of document/webpage about OpenStack-Ansible Internals? I didn't know anything about the scripts/inventory-manage.py script. I will study it further and get a full understanding. I'd also be interested to know how OpenStack-Ansible is put together - how the inventory is generated and managed, how the playbooks in /etc/ansible are managed and interpreted, etc. I'm also going to be wondering how to clean up the cinder-volume processes running on metal - until I get this active/active vs. active/standby sorted out I may need to drop to a single cinder-volume process. What's important for me right now is to be able to deploy instances ASAP and sort out the resilience issues later. Takashi, Thank you for clearing up my misconceptions and helping me to understand cinder's logical structure. In the long run I will definitely aim for a 3-way active/active deployment. -Dave -- Dave Hall Binghamton University kdhall at binghamton.edu On Sun, Aug 7, 2022 at 11:10 AM Takashi Kajinami wrote: > I'm not familiar with the setup created by openstack-ansible but let me > leave a few comments > from cinder's perspective. > > - cinder api does not require access to backend storage. Only > cinder-volume communicates with > backend storage. Thus missing nfs mount/tools in cinder-api is what we > definitely expect. > > - Usually you need to deploy cinder-volume as an act/sby process. Your > cinder-volume would run > on only one of the controller nodes managed by a cluster technology like > pacemaker. > You can split cinder-volume per backend but in such case you'd run each > c-vol in act/sby mode. > > - Recent cinder supports deploying cinder-volume in active/active mode > with lock management by > tooz. I'm not sure whether this is supported by openstack-ansible as > well as it is well tested with > NFS backend. > > - When you deploy cinder-volume in active/active mode, cinder-volume > should run in all controller nodes > and all processes should have access to the same storage backends. If > you use different backend setting > for each controller then you'd lost control about some of your volumes > when the associated controller node > goes down. > > I'd recommend checking whether you expect deploying cinder-volume in > active/standby or in > active/active. In case you want active/active then you should have > multiple cinder-volume processes > with the same backend definition. > > On Sun, Aug 7, 2022 at 11:55 PM Dmitriy Rabotyagov < > noonedeadpunk at gmail.com> wrote: > >> Hi, >> >> 1. When you remove definition from openstack_user_config, items from >> inventory won't be dropped automatically. 
For that you would need to use >> scripts/inventory-manage.py -r . You should indeed destroy >> containers first. >> >> 2. In order to remove cider-volume definitions from API, you would need >> to run cinder-manage service remove >> >> >> https://docs.openstack.org/cinder/rocky/man/cinder-manage.html#cinder-service >> >> cinder-manage binary can be found on cinder-api containers inside >> virtualenv, ie /openstack/venvs/cinder-/bin/cinder-manage >> >> >> ??, 7 ???. 2022 ?., 16:22 Dave Hall : >> >>> Hello, >>> >>> Please pardon the repost - I noticed this morning that I didn't finish >>> the subject line. >>> >>> Problem summary: I have a bunch of lingering non-functional cinder >>> definitions and I'm looking for guidance on how to clean them up. >>> >>> Thanks. >>> >>> -Dave >>> >>> -- >>> Dave Hall >>> Binghamton University >>> kdhall at binghamton.edu >>> >>> >>> >>> On Sat, Aug 6, 2022 at 2:52 PM Dave Hall wrote: >>> >>>> Hello, >>>> >>>> I seem to have gotten myself in a bit of a mess trying to set up Cinder >>>> with an NFS back-end. After working with Glance and NFS, I started on >>>> Cinder. I noticed immediately that there weren't any NFS mounts in the >>>> Cinder-API containers like there were in the Glance-API containers. Also >>>> that there were no NFS packages in the Cinder-API containers. >>>> >>>> In reading some Cinder documentation, I also got the impression that >>>> each Cinder host/container needs to have its own NFS store. >>>> >>>> Pawing through the playbooks and documentation I saw that unlike >>>> Glance, Cinder is split into two pieces - Cinder-API and Cinder-Volume. I >>>> found cinder-volume.yml.example in env.d, activated it, and created >>>> Cinder-Volume containers on my 3 infra hosts. I also created 3 separate >>>> NFS shares and changed the storage-hosts section of my >>>> openstack_user_config.yml accordingly. >>>> >>>> After this I found that while I was able to create volumes, the >>>> prep_volume part of launching an instance was failing. 
>>>> >>>> Digging in, I found: >>>> >>>> # openstack volume service list >>>> >>>> +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ >>>> | Binary | Host >>>> | Zone | Status | State | Updated At | >>>> >>>> +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ >>>> | cinder-volume | C6220-9 at nfs_volume >>>> | nova | enabled | down | 2022-07-23T02:46:13.000000 | >>>> | cinder-volume | C6220-10 at nfs_volume >>>> | nova | enabled | down | 2022-07-23T02:46:14.000000 | >>>> | cinder-volume | C6220-11 at nfs_volume >>>> | nova | enabled | down | 2022-07-23T02:46:14.000000 | >>>> | cinder-scheduler | infra36-cinder-api-container-da8e100f >>>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>>> | cinder-scheduler | infra38-cinder-api-container-27219f93 >>>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>>> | cinder-scheduler | infra37-cinder-api-container-ea7f847b >>>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>>> | cinder-volume | C6220-9 at nfs_volume1 >>>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>>> | cinder-volume | >>>> infra37-cinder-volumes-container-5b9635ad at nfs_volume | nova | enabled >>>> | down | 2022-08-04T18:32:53.000000 | >>>> | cinder-volume | >>>> infra36-cinder-volumes-container-77190057 at nfs_volume1 | nova | enabled >>>> | down | 2022-08-06T13:03:03.000000 | >>>> | cinder-volume | >>>> infra38-cinder-volumes-container-a7bcfc9b at nfs_volume | nova | enabled >>>> | down | 2022-08-04T18:32:53.000000 | >>>> | cinder-volume | >>>> infra37-cinder-volumes-container-5b9635ad at nfs_volume2 | nova | enabled >>>> | down | 2022-08-06T13:03:05.000000 | >>>> | cinder-volume | C6220-10 at nfs_volume2 >>>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>>> | cinder-volume | C6220-11 at nfs_volume3 >>>> | nova | enabled | up | 2022-08-06T13:29:10.000000 | >>>> | cinder-volume | >>>> infra38-cinder-volumes-container-a7bcfc9b at nfs_volume3 | nova | enabled >>>> | down | 2022-08-06T13:03:03.000000 | >>>> >>>> +------------------+-------------------------------------------------------+------+---------+-------+----------------------------+ >>>> >>>> Thinking I could save this, I used containers-lxc-destroy.yml to >>>> destroy my cinder-volumes containers and deactivated >>>> cinder-volume.yml.example. Then I ran setup-hosts.yml, which has restored >>>> the cinder-volumes containers even though is_metal: false has been removed. >>>> >>>> Clearly a stronger intervention will be required. I would like to >>>> fully get rid of the cinder-volumes containers and go back to an is_metal: >>>> true scenario. I also need to get rid of the unnumbered nfs_volume >>>> referenes, which I assume are in some cinder config file somewhere. 
>>>> >>>> Below is a clip from my openstack_user_config.yml: >>>> >>>> storage_hosts: >>>> infra36: >>>> ip: 172.29.236.36 >>>> container_vars: >>>> cinder_backends: >>>> nfs_volume1: >>>> volume_backend_name: NFS_VOLUME1 >>>> volume_driver: cinder.volume.drivers.nfs.NfsDriver >>>> nfs_mount_options: >>>> "rsize=65535,wsize=65535,timeo=1200,actimeo=120" >>>> nfs_shares_config: /etc/cinder/nfs_shares_volume1 >>>> shares: >>>> - { ip: "172.29.244.27", share: "/NFS_VOLUME1" } >>>> infra37: >>>> ip: 172.29.236.37 >>>> container_vars: >>>> cinder_backends: >>>> nfs_volume2: >>>> volume_backend_name: NFS_VOLUME2 >>>> volume_driver: cinder.volume.drivers.nfs.NfsDriver >>>> nfs_mount_options: >>>> "rsize=65535,wsize=65535,timeo=1200,actimeo=120" >>>> nfs_shares_config: /etc/cinder/nfs_shares_volume2 >>>> shares: >>>> - { ip: "172.29.244.27", share: "/NFS_VOLUME2" } >>>> infra38: >>>> ip: 172.29.236.38 >>>> container_vars: >>>> cinder_backends: >>>> nfs_volume3: >>>> volume_backend_name: NFS_VOLUME3 >>>> volume_driver: cinder.volume.drivers.nfs.NfsDriver >>>> nfs_mount_options: >>>> "rsize=65535,wsize=65535,timeo=1200,actimeo=120" >>>> nfs_shares_config: /etc/cinder/nfs_shares_volume3 >>>> shares: >>>> - { ip: "172.29.244.27", share: "/NFS_VOLUME3" } >>>> >>>> Any advice would be greatly appreciated. >>>> >>>> Thanks. >>>> >>>> -Dave >>>> >>>> -- >>>> Dave Hall >>>> Binghamton University >>>> kdhall at binghamton.edu >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From abishop at redhat.com Sun Aug 7 19:19:33 2022 From: abishop at redhat.com (Alan Bishop) Date: Sun, 7 Aug 2022 12:19:33 -0700 Subject: [Cinder][NFS][Openstack-Ansible] Cinder-Volume Mess - Containers and Metal by Accident In-Reply-To: References: Message-ID: On Sun, Aug 7, 2022 at 8:56 AM Dmitriy Rabotyagov wrote: > Another question - are features that tooz support with etcd enough for > cinder active/active to work? > Here's some general info about running cinder-volume in A/A mode: - Very few drivers support A/A. Drivers need to explicitly declare their support, and there's no conformance test to ensure A/A mode actually works (drivers merely claim it should work). The RBD driver definitely supports A/A, and NFS is currently *not* supported. - You need to configure two things for a driver to run A/A; 1) configure the 'cluster' setting in cinder.conf, 2) you need to configure a suitable DLM for the [coordination]backend_url in cinder.conf. From a DLM perspective, your best option is tooz's etcd3gw driver. Bear in mind the etcd3 driver is deprecated [1]. [1] https://review.opendev.org/c/openstack/tooz/+/833107 > IIRC, for NFS openstack-ansible deploy in active/standby mode by default. > Only ceph goes "active/active" but we don't define coordination url for > that - only define cluster_name... > As I mentioned above, you definitely need to configure the coordination url with a suitable DLM for A/A to work properly. Things may seem to work when doing some light testing, but the cinder-volume services will eventually start stepping all over each other once the activity load increases. Alan > Though, we do provide a role/way to get etcd, so no blocker to get > coordination_url for cinder if etcd covers all cinder needs > > ??, 7 ???. 2022 ?., 17:36 Dmitriy Rabotyagov : > >> I have actually a question on unrelated topic, but related to the >> active/standby of cinder-volume. >> >> Is there any way to explicitly tell to the backend that it should be >> active? 
As well as would it restore active mode if at some point become >> unreachable and then restore? Recently I tried to look through docs and >> config options, but was not able to find an answer. >> >> ??, 7 ???. 2022 ?., 17:09 Takashi Kajinami : >> >>> >>> - Usually you need to deploy cinder-volume as an act/sby process. Your >>> cinder-volume would run >>> on only one of the controller nodes managed by a cluster technology >>> like pacemaker. >>> You can split cinder-volume per backend but in such case you'd run >>> each c-vol in act/sby mode. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Sun Aug 7 22:16:23 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Sun, 7 Aug 2022 23:16:23 +0100 Subject: [kolla-ansible] why does not the bootstrap create the docker bridge? Message-ID: Hi, The bootstrap installs docker but it does not create the docker bridge, why? Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Mon Aug 8 06:20:29 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Mon, 8 Aug 2022 08:20:29 +0200 Subject: [Cinder][NFS][Openstack-Ansible] Cinder-Volume Mess - Containers and Metal by Accident In-Reply-To: References: Message-ID: > From a DLM perspective, your best option is tooz's etcd3gw driver. Well, from tooz drivers feature comparison I thought that the best driver would be kazoo, that's why I asked if etcd (provided by etcd3gw driver) is smth that fully satisfies needs and can be used safely. Also, am I right, that for Active/Passive, instead of defining `cluster` you would need to define `volume_backend_name` (or `host`? Don't really remember) to be exactly the same value for the group that should act as a/p for the specific backend? ??, 7 ???. 2022 ?. ? 21:19, Alan Bishop : > > > > On Sun, Aug 7, 2022 at 8:56 AM Dmitriy Rabotyagov wrote: >> >> Another question - are features that tooz support with etcd enough for cinder active/active to work? > > > Here's some general info about running cinder-volume in A/A mode: > > - Very few drivers support A/A. Drivers need to explicitly declare their support, and there's no conformance test to ensure A/A mode actually works (drivers merely claim it should work). The RBD driver definitely supports A/A, and NFS is currently *not* supported. > > - You need to configure two things for a driver to run A/A; 1) configure the 'cluster' setting in cinder.conf, 2) you need to configure a suitable DLM for the [coordination]backend_url in cinder.conf. From a DLM perspective, your best option is tooz's etcd3gw driver. Bear in mind the etcd3 driver is deprecated [1]. > > [1] https://review.opendev.org/c/openstack/tooz/+/833107 > >> >> IIRC, for NFS openstack-ansible deploy in active/standby mode by default. Only ceph goes "active/active" but we don't define coordination url for that - only define cluster_name... > > > As I mentioned above, you definitely need to configure the coordination url with a suitable DLM for A/A to work properly. Things may seem to work when doing some light testing, but the cinder-volume services will eventually start stepping all over each other once the activity load increases. > > Alan > >> >> Though, we do provide a role/way to get etcd, so no blocker to get coordination_url for cinder if etcd covers all cinder needs >> >> ??, 7 ???. 
2022 ?., 17:36 Dmitriy Rabotyagov : >>> >>> I have actually a question on unrelated topic, but related to the active/standby of cinder-volume. >>> >>> Is there any way to explicitly tell to the backend that it should be active? As well as would it restore active mode if at some point become unreachable and then restore? Recently I tried to look through docs and config options, but was not able to find an answer. >>> >>> ??, 7 ???. 2022 ?., 17:09 Takashi Kajinami : >>>> >>>> >>>> - Usually you need to deploy cinder-volume as an act/sby process. Your cinder-volume would run >>>> on only one of the controller nodes managed by a cluster technology like pacemaker. >>>> You can split cinder-volume per backend but in such case you'd run each c-vol in act/sby mode. From pierre at stackhpc.com Mon Aug 8 08:16:08 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 8 Aug 2022 10:16:08 +0200 Subject: [all] Automatic update of Python jobs in Zuul project configuration Message-ID: Hello, A patch submitted to python-blazarclient by yoctozepto [1] made me realise the automatic change of Python jobs from yoga to zed missed various repositories [2], mostly python clients. Is there a reason for this? Also, one can easily find projects which are using even older job templates (xena or earlier). Thanks, Pierre Riteau (priteau) [1] https://review.opendev.org/c/openstack/python-blazarclient/+/851903 [2] https://codesearch.openstack.org/?q=openstack-python3-yoga-jobs&i=nope&literal=nope&files=&excludeFiles=&repos= -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Mon Aug 8 08:25:05 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 8 Aug 2022 10:25:05 +0200 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: References: <21589662.aDxSllVl8Y@p1> <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> Message-ID: Hello all: I understand that by default we don't allow backporting a config knob default value. But I'm with Sean and his explanation. For "uwsgi" applications, if pthread is False, the only drawback will be the reconnection of the MQ socket. But in the case described by Slawek, the problem is more relevant because once the agent has been disconnected for a long time from the MQ, it is not possible to reconnect again and the agent needs to be manually restarted. I would backport the patch setting this config knob to False. Regards. On Sat, Aug 6, 2022 at 12:08 AM Sean Mooney wrote: > On Fri, Aug 5, 2022 at 7:40 PM Ghanshyam Mann > wrote: > > > > ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote --- > > > Hi, > > > > > > Some time ago oslo.messaging changed default value of the > "heartbeat_in_pthread" config option to "True" [1]. > > > As was noticed some time ago, this don't works well with nova-compute > - see bug [2] for details. > > > Recently we noticed in our downstream Red Hat OpenStack, that it's > not only nova-compute which don't works well with it and can hangs. We saw > the same issue in various neutron agent processes. And it seems that it can > be the same for any non-wsgi service which is using rabbitmq to send > heartbeats. > > > So giving all of that, I just proposed change of the default value of > that config option to be "False" again [3]. > > > And my question is - would it be possible and acceptable to backport > such change up to stable/wallaby (if and when it will be approved for > master of course). 
IMO this could be useful for users as using this option > set as "True" be default don't makes any sense for the non-wsgi > applications really and may cause more bad then good things really. What > are You opinions about it? > > > > This is tricky, in general the default value change should not be > backported because it change > > the default behavior and so does the compatibility. But along with > considering the cases do not > > work with the current default value (you mentioned in this email), we > should consider if this worked > > in any other case or not. If so then I think we should not backport this > and tell operator to override > > it to False as workaround for stable branch fixes. > as afar as i am aware the only impact of setting the default to false > for wsgi applications is > running under mod_wsgi or uwsgi may have the heatbeat greenthread > killed when the wsgi server susspand the application > after a time out following the processing of an api request. > > there is no known negitive impact to this other then a log message > that can safely be ignored on both rabbitmq and the api log relating > to the amqp messing connection being closed and repopend. > > keeping the value at true can cause the nova compute agent, neutron > agent and i susppoct nova conductor/schduler to hang following a > rabbitmq disconnect. > that can leave the relevnet service unresponcei until its restarted. > > so having the default set to true is known to breake several services > but tehre are no know issue that are caused by setting it to false > that impact the operation fo any service. > > so i have a stong preference for setting thsi to false by default on > stable branches. > > > > -gmann > > > > > > > > [1] https://review.opendev.org/c/openstack/oslo.messaging/+/747395 > > > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 > > > [3] https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ > > > > > > -- > > > Slawek Kaplonski > > > Principal Software Engineer > > > Red Hat > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From egarciar at redhat.com Mon Aug 8 08:59:29 2022 From: egarciar at redhat.com (Elvira Garcia Ruiz) Date: Mon, 8 Aug 2022 10:59:29 +0200 Subject: [neutron] Bug Deputy Report August 01 - 07 Message-ID: Hi, I was the bug deputy last week. Find the summary below: High: ------- - [ovn]router_interface port probability cannot be up https://bugs.launchpad.net/neutron/+bug/1983530 Fix: https://review.opendev.org/c/openstack/neutron/+/851997 - l2_pop does unnecessary db fetches for incompatible network types https://bugs.launchpad.net/neutron/+bug/1983558 Fix: https://review.opendev.org/c/openstack/neutron/+/852089 Medium: ----------- - [OVN] Old DNS servers are served by DHCP after dns_servers option is updated https://bugs.launchpad.net/neutron/+bug/1983270 Assigned to me Low: ------- - Neutron should clean up ACLs in OVN NB DB when a remote security group is deleted https://bugs.launchpad.net/neutron/+bug/1983600 Unassigned Kind regards! Elvira -------------- next part -------------- An HTML attachment was scrubbed... URL: From dbengt at redhat.com Mon Aug 8 09:05:20 2022 From: dbengt at redhat.com (Daniel Mats Niklas Bengtsson) Date: Mon, 8 Aug 2022 11:05:20 +0200 Subject: [Oslo] IRC meeting. Message-ID: Hi there, I'm sorry, I was on vacation and then I had personal problems. We are still in the summer period and I think there are still people on vacation, I suggest resuming meetings in September. What do you think? 
From radoslaw.piliszek at gmail.com Mon Aug 8 10:37:57 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 8 Aug 2022 12:37:57 +0200 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: References: <21589662.aDxSllVl8Y@p1> <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> Message-ID: Hi all, May this config option support "auto" by default and autodetect whether the application is running under mod_wsgi (and uwsgi if it also has the issue with green threads but here I'm not really sure...) and then decide on the best option? This way I would consider this backporting a fix (i.e. the library tries better to work in the target environment). As a final thought, bear in mind there are operators who have already overwritten the default, the deployment projects can help as well. -yoctozepto On Mon, 8 Aug 2022 at 10:30, Rodolfo Alonso Hernandez wrote: > > Hello all: > > I understand that by default we don't allow backporting a config knob default value. But I'm with Sean and his explanation. For "uwsgi" applications, if pthread is False, the only drawback will be the reconnection of the MQ socket. But in the case described by Slawek, the problem is more relevant because once the agent has been disconnected for a long time from the MQ, it is not possible to reconnect again and the agent needs to be manually restarted. I would backport the patch setting this config knob to False. > > Regards. > > > On Sat, Aug 6, 2022 at 12:08 AM Sean Mooney wrote: >> >> On Fri, Aug 5, 2022 at 7:40 PM Ghanshyam Mann wrote: >> > >> > ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote --- >> > > Hi, >> > > >> > > Some time ago oslo.messaging changed default value of the "heartbeat_in_pthread" config option to "True" [1]. >> > > As was noticed some time ago, this don't works well with nova-compute - see bug [2] for details. >> > > Recently we noticed in our downstream Red Hat OpenStack, that it's not only nova-compute which don't works well with it and can hangs. We saw the same issue in various neutron agent processes. And it seems that it can be the same for any non-wsgi service which is using rabbitmq to send heartbeats. >> > > So giving all of that, I just proposed change of the default value of that config option to be "False" again [3]. >> > > And my question is - would it be possible and acceptable to backport such change up to stable/wallaby (if and when it will be approved for master of course). IMO this could be useful for users as using this option set as "True" be default don't makes any sense for the non-wsgi applications really and may cause more bad then good things really. What are You opinions about it? >> > >> > This is tricky, in general the default value change should not be backported because it change >> > the default behavior and so does the compatibility. But along with considering the cases do not >> > work with the current default value (you mentioned in this email), we should consider if this worked >> > in any other case or not. If so then I think we should not backport this and tell operator to override >> > it to False as workaround for stable branch fixes. >> as afar as i am aware the only impact of setting the default to false >> for wsgi applications is >> running under mod_wsgi or uwsgi may have the heatbeat greenthread >> killed when the wsgi server susspand the application >> after a time out following the processing of an api request. 
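The "auto" behaviour suggested above could, very roughly, look like the sketch below. This is illustrative only and not oslo.messaging's actual implementation; it relies on uwsgi and mod_wsgi exposing importable modules inside the hosting process:

    def _running_under_external_wsgi_server():
        # uwsgi and mod_wsgi each expose a module that is only importable
        # when the interpreter is embedded in the respective server
        try:
            import uwsgi  # noqa: F401
            return True
        except ImportError:
            pass
        try:
            import mod_wsgi  # noqa: F401
            return True
        except ImportError:
            return False

    # a hypothetical "auto" default: use a native heartbeat thread only when
    # the process is embedded in an external wsgi server
    heartbeat_in_pthread = _running_under_external_wsgi_server()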
>> >> there is no known negitive impact to this other then a log message >> that can safely be ignored on both rabbitmq and the api log relating >> to the amqp messing connection being closed and repopend. >> >> keeping the value at true can cause the nova compute agent, neutron >> agent and i susppoct nova conductor/schduler to hang following a >> rabbitmq disconnect. >> that can leave the relevnet service unresponcei until its restarted. >> >> so having the default set to true is known to breake several services >> but tehre are no know issue that are caused by setting it to false >> that impact the operation fo any service. >> >> so i have a stong preference for setting thsi to false by default on >> stable branches. >> > >> > -gmann >> > >> > > >> > > [1] https://review.opendev.org/c/openstack/oslo.messaging/+/747395 >> > > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 >> > > [3] https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ >> > > >> > > -- >> > > Slawek Kaplonski >> > > Principal Software Engineer >> > > Red Hat >> > >> >> From radoslaw.piliszek at gmail.com Mon Aug 8 10:40:42 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 8 Aug 2022 12:40:42 +0200 Subject: [kolla-ansible] why does not the bootstrap create the docker bridge? In-Reply-To: References: Message-ID: This is to avoid networking conflicts. The default Docker bridge takes over a certain IP address space. In Kolla Ansible we specifically use host networking to avoid surprising operators (and OpenStack daemons) with additional routing layers. -yoctozepto On Mon, 8 Aug 2022 at 00:23, wodel youchi wrote: > > Hi, > > The bootstrap installs docker but it does not create the docker bridge, why? > > Regards. From fungi at yuggoth.org Mon Aug 8 11:44:51 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 8 Aug 2022 11:44:51 +0000 Subject: [all] Automatic update of Python jobs in Zuul project configuration In-Reply-To: References: Message-ID: <20220808114451.2temslurr25ci6ex@yuggoth.org> On 2022-08-08 10:16:08 +0200 (+0200), Pierre Riteau wrote: > A patch submitted to python-blazarclient by yoctozepto [1] made me realise > the automatic change of Python jobs from yoga to zed missed various > repositories [2], mostly python clients. Is there a reason for this? At least some of those have simply never merged the changes proposed for them at the start of the cycle: https://review.opendev.org/q/topic:add-zed-python-jobtemplates+is:open > Also, one can easily find projects which are using even older job templates > (xena or earlier). [...] Ditto: https://review.opendev.org/q/topic:add-yoga-python-jobtemplates+is:open https://review.opendev.org/q/topic:add-xena-python-jobtemplates+is:open https://review.opendev.org/q/topic:add-wallaby-python-jobtemplates+is:open https://review.opendev.org/q/topic:add-victoria-python-jobtemplates+is:open -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 963 bytes
Desc: not available
URL: 

From pierre at stackhpc.com  Mon Aug 8 12:50:30 2022
From: pierre at stackhpc.com (Pierre Riteau)
Date: Mon, 8 Aug 2022 14:50:30 +0200
Subject: [all] Automatic update of Python jobs in Zuul project configuration
In-Reply-To: <20220808114451.2temslurr25ci6ex@yuggoth.org>
References: <20220808114451.2temslurr25ci6ex@yuggoth.org>
Message-ID: 

On Mon, 8 Aug 2022 at 13:55, Jeremy Stanley wrote:
> On 2022-08-08 10:16:08 +0200 (+0200), Pierre Riteau wrote:
> > A patch submitted to python-blazarclient by yoctozepto [1] made me realise
> > the automatic change of Python jobs from yoga to zed missed various
> > repositories [2], mostly python clients. Is there a reason for this?
>
> At least some of those have simply never merged the changes proposed
> for them at the start of the cycle:
>
> https://review.opendev.org/q/topic:add-zed-python-jobtemplates+is:open

Of course, I should have checked for this as well. So two out of four
blazar repositories (blazar-nova and python-blazarclient) have not
received the automatic patch. If it's not clear why, I will raise it at
the next cycle if it happens again and we can look at logs.

> > Also, one can easily find projects which are using even older job templates
> > (xena or earlier).
> [...]
>
> Ditto:
>
> https://review.opendev.org/q/topic:add-yoga-python-jobtemplates+is:open
> https://review.opendev.org/q/topic:add-xena-python-jobtemplates+is:open
> https://review.opendev.org/q/topic:add-wallaby-python-jobtemplates+is:open
> https://review.opendev.org/q/topic:add-victoria-python-jobtemplates+is:open

I wasn't implying here that automatic patches had been missed, but just
pointing out that some projects are trailing behind.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fungi at yuggoth.org  Mon Aug 8 13:07:32 2022
From: fungi at yuggoth.org (Jeremy Stanley)
Date: Mon, 8 Aug 2022 13:07:32 +0000
Subject: [all] Automatic update of Python jobs in Zuul project configuration
In-Reply-To: 
References: <20220808114451.2temslurr25ci6ex@yuggoth.org>
Message-ID: <20220808130731.4i452qf7rvohvayw@yuggoth.org>

On 2022-08-08 14:50:30 +0200 (+0200), Pierre Riteau wrote:
[...]
> two out of four blazar repositories (blazar-nova and
> python-blazarclient) have not received the automatic patch. If
> it's not clear why, I will raise it at the next cycle if it
> happens again and we can look at logs.
[...]

Agreed, we're discussing it currently in #openstack-release to see
if we can understand what happened. Normally those changes are
auto-proposed during branch creation, so those projects should have
received proposals the moment the release requests to create
stable/yoga branches in them merged. There is no indication those
changes ever got proposed, and it's been too long now for us to
still have CI logs from that timeframe, so we're having to do a bit
of theorizing.

In contrast, I can see that python-blazarclient got a similar change
proposed to move to the yoga template when stable/xena was created,
so that would seem to point to being a more recent issue.
-- 
Jeremy Stanley
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 963 bytes
Desc: not available
URL: 
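For anyone who has not dealt with these job templates before: each repository opts into the per-release unit test jobs by referencing a shared template in its Zuul project stanza, and the auto-proposed patches tracked in the topics above simply bump that template name each cycle. A minimal illustrative stanza is sketched below; apart from the zed template name taken from the topic above, the surrounding templates vary per project and are only an example.

    - project:
        templates:
          - check-requirements
          - openstack-python3-zed-jobs
          - publish-openstack-docs-pti
          - release-notes-jobs-python3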
From gmann at ghanshyammann.com  Mon Aug 8 14:15:23 2022
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Mon, 08 Aug 2022 19:45:23 +0530
Subject: [all] Automatic update of Python jobs in Zuul project configuration
In-Reply-To: <20220808130731.4i452qf7rvohvayw@yuggoth.org>
References: <20220808114451.2temslurr25ci6ex@yuggoth.org> <20220808130731.4i452qf7rvohvayw@yuggoth.org>
Message-ID: <1827dcf73d8.cd76d8b0501335.3791791131190669606@ghanshyammann.com>

 ---- On Mon, 08 Aug 2022 18:37:32 +0530 Jeremy Stanley wrote ---
 > On 2022-08-08 14:50:30 +0200 (+0200), Pierre Riteau wrote:
 > [...]
 > > two out of four blazar repositories (blazar-nova and
 > > python-blazarclient) have not received the automatic patch. If
 > > it's not clear why, I will raise it at the next cycle if it
 > > happens again and we can look at logs.
 > [...]
 >
 > Agreed, we're discussing it currently in #openstack-release to see
 > if we can understand what happened. Normally those changes are
 > auto-proposed during branch creation, so those projects should have
 > received proposals the moment the release requests to create
 > stable/yoga branches in them merged. There is no indication those
 > changes ever got proposed, and it's been too long now for us to
 > still have CI logs from that timeframe, so we're having to do a bit
 > of theorizing.
 >
 > In contrast, I can see that python-blazarclient got a similar change
 > proposed to move to the yoga template when stable/xena was created,
 > so that would seem to point to being a more recent issue.

There were also other cases where the template update was missed or was
not correct (for example, repos with an independent release model). That
is why at the Zed PTG we agreed to remove these release-specific templates
and the auto-generated patches that update the template name. From this
cycle onwards, we will make the Python version changes in a single generic
template, so the Python testing template name does not need to be updated
in every release.

-gmann

 > --
 > Jeremy Stanley
 >

From fungi at yuggoth.org  Mon Aug 8 14:23:22 2022
From: fungi at yuggoth.org (Jeremy Stanley)
Date: Mon, 8 Aug 2022 14:23:22 +0000
Subject: [all] Automatic update of Python jobs in Zuul project configuration
In-Reply-To: <1827dcf73d8.cd76d8b0501335.3791791131190669606@ghanshyammann.com>
References: <20220808114451.2temslurr25ci6ex@yuggoth.org> <20220808130731.4i452qf7rvohvayw@yuggoth.org> <1827dcf73d8.cd76d8b0501335.3791791131190669606@ghanshyammann.com>
Message-ID: <20220808142322.mwv5unv6xcnvzfo7@yuggoth.org>

On 2022-08-08 19:45:23 +0530 (+0530), Ghanshyam Mann wrote:
[...]
> There were also other cases where the template update was missed or
> was not correct (for example, repos with an independent release
> model). That is why at the Zed PTG we agreed to remove these
> release-specific templates and the auto-generated patches that update
> the template name. From this cycle onwards, we will make the Python
> version changes in a single generic template, so the Python testing
> template name does not need to be updated in every release.

Thanks for the reminder. Do you happen to know whether the release
automation has been updated to accommodate this policy change?
-- 
Jeremy Stanley
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 963 bytes
Desc: not available
URL: 

From gmann at ghanshyammann.com  Mon Aug 8 14:58:16 2022
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Mon, 08 Aug 2022 20:28:16 +0530
Subject: [all] Automatic update of Python jobs in Zuul project configuration
In-Reply-To: <20220808142322.mwv5unv6xcnvzfo7@yuggoth.org>
References: <20220808114451.2temslurr25ci6ex@yuggoth.org> <20220808130731.4i452qf7rvohvayw@yuggoth.org> <1827dcf73d8.cd76d8b0501335.3791791131190669606@ghanshyammann.com> <20220808142322.mwv5unv6xcnvzfo7@yuggoth.org>
Message-ID: <1827df6b609.d4756961506875.4924950027350720065@ghanshyammann.com>

 ---- On Mon, 08 Aug 2022 19:53:22 +0530 Jeremy Stanley wrote ---
 > On 2022-08-08 19:45:23 +0530 (+0530), Ghanshyam Mann wrote:
 > [...]
 > > There were also other cases where the template update was missed or
 > > was not correct (for example, repos with an independent release
 > > model). That is why at the Zed PTG we agreed to remove these
 > > release-specific templates and the auto-generated patches that
 > > update the template name. From this cycle onwards, we will make the
 > > Python version changes in a single generic template, so the Python
 > > testing template name does not need to be updated in every release.
 >
 > Thanks for the reminder. Do you happen to know whether the release
 > automation has been updated to accommodate this policy change?

Not yet; that is still on my to-do list, along with rebasing my job
template patch. The plan is to use the same auto-generated approach to
move the release-specific template to the generic one this time, and from
the next release onwards we will only update the generic template, with
proper notification to projects so they can make sure the new Python
version jobs work fine for their repos.

-gmann

 > --
 > Jeremy Stanley
 >

From zigo at debian.org  Mon Aug 8 15:12:51 2022
From: zigo at debian.org (Thomas Goirand)
Date: Mon, 8 Aug 2022 17:12:51 +0200
Subject: [announcement] [debian] Debian Buster with OpenStack Rocky will receive LTS support
Message-ID: <17d2f2ea-f6a8-c2b0-7fae-98ef8d9766c2@debian.org>

Hi!

After some discussion with the folks taking care of the Debian LTS
project, we decided that we will attempt to support OpenStack Rocky,
running on Debian 10 (aka: Buster), for another 2 years, together with
the rest of the distribution.

In the past, we decided that we wouldn't do it, and that OpenStack would
be part of the list of unsupported packages, because it was too difficult
to maintain. However, it is my opinion that this has changed, because:

- The codebase is evolving slower
- I do have all the tooling to re-do a Rocky deployment if needed
- There are far fewer CVEs per year than there used to be (10 years ago
  it was one every 2 weeks on average; now we see something like 2 serious
  issues per year...).
- I'm confident that if there's a grave issue, I'll be able to find help
  from this wonderful community (it happened in the past...).

Also, some sponsors of the Debian LTS project are actually running
OpenStack Rocky on Debian. So it makes sense to help them.

It is to be noted that I will not be the person doing the actual security
backports, but I'll be there in case the person doing it needs help from
me (for example, to do a deployment and manual regression testing).
Hoping this will help OpenStack users running on Debian, Cheers, Thomas Goirand (zigo) From opensrloo at gmail.com Mon Aug 8 15:34:20 2022 From: opensrloo at gmail.com (Ruby Loo) Date: Mon, 8 Aug 2022 11:34:20 -0400 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: References: <21589662.aDxSllVl8Y@p1> <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> Message-ID: As one data point. We (yahoo) encountered this issue when we were trying out the wallaby release. It took a few days elapsed to figure out, because all the services were newly using wallaby with config settings that we weren't familiar with. (That's what happens when you're trying to upgrade from ocata...) It would be great for others, for this to 'just work'. (I don't recall now, if anyone mentioned it upstream or not. Apologies for that.) If no code fix is done, maybe update the release notes (for nova, neutron, ...) of the affected releases to indicate that one might want to change this setting (since we did read the release notes). Thanks, --ruby On Mon, Aug 8, 2022 at 6:45 AM Rados?aw Piliszek < radoslaw.piliszek at gmail.com> wrote: > Hi all, > > May this config option support "auto" by default and autodetect > whether the application is running under mod_wsgi (and uwsgi if it > also has the issue with green threads but here I'm not really sure...) > and then decide on the best option? > This way I would consider this backporting a fix (i.e. the library > tries better to work in the target environment). > > As a final thought, bear in mind there are operators who have already > overwritten the default, the deployment projects can help as well. > > -yoctozepto > > On Mon, 8 Aug 2022 at 10:30, Rodolfo Alonso Hernandez > wrote: > > > > Hello all: > > > > I understand that by default we don't allow backporting a config knob > default value. But I'm with Sean and his explanation. For "uwsgi" > applications, if pthread is False, the only drawback will be the > reconnection of the MQ socket. But in the case described by Slawek, the > problem is more relevant because once the agent has been disconnected for a > long time from the MQ, it is not possible to reconnect again and the agent > needs to be manually restarted. I would backport the patch setting this > config knob to False. > > > > Regards. > > > > > > On Sat, Aug 6, 2022 at 12:08 AM Sean Mooney wrote: > >> > >> On Fri, Aug 5, 2022 at 7:40 PM Ghanshyam Mann > wrote: > >> > > >> > ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote --- > >> > > Hi, > >> > > > >> > > Some time ago oslo.messaging changed default value of the > "heartbeat_in_pthread" config option to "True" [1]. > >> > > As was noticed some time ago, this don't works well with > nova-compute - see bug [2] for details. > >> > > Recently we noticed in our downstream Red Hat OpenStack, that it's > not only nova-compute which don't works well with it and can hangs. We saw > the same issue in various neutron agent processes. And it seems that it can > be the same for any non-wsgi service which is using rabbitmq to send > heartbeats. > >> > > So giving all of that, I just proposed change of the default value > of that config option to be "False" again [3]. > >> > > And my question is - would it be possible and acceptable to > backport such change up to stable/wallaby (if and when it will be approved > for master of course). 
IMO this could be useful for users as using this > option set as "True" be default don't makes any sense for the non-wsgi > applications really and may cause more bad then good things really. What > are You opinions about it? > >> > > >> > This is tricky, in general the default value change should not be > backported because it change > >> > the default behavior and so does the compatibility. But along with > considering the cases do not > >> > work with the current default value (you mentioned in this email), we > should consider if this worked > >> > in any other case or not. If so then I think we should not backport > this and tell operator to override > >> > it to False as workaround for stable branch fixes. > >> as afar as i am aware the only impact of setting the default to false > >> for wsgi applications is > >> running under mod_wsgi or uwsgi may have the heatbeat greenthread > >> killed when the wsgi server susspand the application > >> after a time out following the processing of an api request. > >> > >> there is no known negitive impact to this other then a log message > >> that can safely be ignored on both rabbitmq and the api log relating > >> to the amqp messing connection being closed and repopend. > >> > >> keeping the value at true can cause the nova compute agent, neutron > >> agent and i susppoct nova conductor/schduler to hang following a > >> rabbitmq disconnect. > >> that can leave the relevnet service unresponcei until its restarted. > >> > >> so having the default set to true is known to breake several services > >> but tehre are no know issue that are caused by setting it to false > >> that impact the operation fo any service. > >> > >> so i have a stong preference for setting thsi to false by default on > >> stable branches. > >> > > >> > -gmann > >> > > >> > > > >> > > [1] https://review.opendev.org/c/openstack/oslo.messaging/+/747395 > >> > > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 > >> > > [3] > https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ > >> > > > >> > > -- > >> > > Slawek Kaplonski > >> > > Principal Software Engineer > >> > > Red Hat > >> > > >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Mon Aug 8 15:37:05 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Mon, 8 Aug 2022 17:37:05 +0200 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: References: <21589662.aDxSllVl8Y@p1> <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> Message-ID: Hey At the very least in OpenStack-Ansible we already handle that case, and have overwritten heartbeat_in_pthread for non-UWSGI services, which is already in stable branches. So backporting this new default setting would make us revert this patch and apply a set of new ones for uWSGI which is kind of nasty thing to do on stable branches. IIRC (can be wrong here), kolla-ansible and TripleO also adopted such changes in their codebase. So with quite high probability, if you use any deployment tooling, this should be already handled relatively well. We also can post a release note to stable branches about "known issue" instead of backporting a new default. ??, 8 ???. 2022 ?. ? 12:46, Rados?aw Piliszek : > > Hi all, > > May this config option support "auto" by default and autodetect > whether the application is running under mod_wsgi (and uwsgi if it > also has the issue with green threads but here I'm not really sure...) 
> and then decide on the best option? > This way I would consider this backporting a fix (i.e. the library > tries better to work in the target environment). > > As a final thought, bear in mind there are operators who have already > overwritten the default, the deployment projects can help as well. > > -yoctozepto > > On Mon, 8 Aug 2022 at 10:30, Rodolfo Alonso Hernandez > wrote: > > > > Hello all: > > > > I understand that by default we don't allow backporting a config knob default value. But I'm with Sean and his explanation. For "uwsgi" applications, if pthread is False, the only drawback will be the reconnection of the MQ socket. But in the case described by Slawek, the problem is more relevant because once the agent has been disconnected for a long time from the MQ, it is not possible to reconnect again and the agent needs to be manually restarted. I would backport the patch setting this config knob to False. > > > > Regards. > > > > > > On Sat, Aug 6, 2022 at 12:08 AM Sean Mooney wrote: > >> > >> On Fri, Aug 5, 2022 at 7:40 PM Ghanshyam Mann wrote: > >> > > >> > ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote --- > >> > > Hi, > >> > > > >> > > Some time ago oslo.messaging changed default value of the "heartbeat_in_pthread" config option to "True" [1]. > >> > > As was noticed some time ago, this don't works well with nova-compute - see bug [2] for details. > >> > > Recently we noticed in our downstream Red Hat OpenStack, that it's not only nova-compute which don't works well with it and can hangs. We saw the same issue in various neutron agent processes. And it seems that it can be the same for any non-wsgi service which is using rabbitmq to send heartbeats. > >> > > So giving all of that, I just proposed change of the default value of that config option to be "False" again [3]. > >> > > And my question is - would it be possible and acceptable to backport such change up to stable/wallaby (if and when it will be approved for master of course). IMO this could be useful for users as using this option set as "True" be default don't makes any sense for the non-wsgi applications really and may cause more bad then good things really. What are You opinions about it? > >> > > >> > This is tricky, in general the default value change should not be backported because it change > >> > the default behavior and so does the compatibility. But along with considering the cases do not > >> > work with the current default value (you mentioned in this email), we should consider if this worked > >> > in any other case or not. If so then I think we should not backport this and tell operator to override > >> > it to False as workaround for stable branch fixes. > >> as afar as i am aware the only impact of setting the default to false > >> for wsgi applications is > >> running under mod_wsgi or uwsgi may have the heatbeat greenthread > >> killed when the wsgi server susspand the application > >> after a time out following the processing of an api request. > >> > >> there is no known negitive impact to this other then a log message > >> that can safely be ignored on both rabbitmq and the api log relating > >> to the amqp messing connection being closed and repopend. > >> > >> keeping the value at true can cause the nova compute agent, neutron > >> agent and i susppoct nova conductor/schduler to hang following a > >> rabbitmq disconnect. > >> that can leave the relevnet service unresponcei until its restarted. 
> >> > >> so having the default set to true is known to breake several services > >> but tehre are no know issue that are caused by setting it to false > >> that impact the operation fo any service. > >> > >> so i have a stong preference for setting thsi to false by default on > >> stable branches. > >> > > >> > -gmann > >> > > >> > > > >> > > [1] https://review.opendev.org/c/openstack/oslo.messaging/+/747395 > >> > > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 > >> > > [3] https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ > >> > > > >> > > -- > >> > > Slawek Kaplonski > >> > > Principal Software Engineer > >> > > Red Hat > >> > > >> > >> > From smooney at redhat.com Mon Aug 8 16:15:08 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 8 Aug 2022 17:15:08 +0100 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: References: <21589662.aDxSllVl8Y@p1> <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> Message-ID: On Mon, Aug 8, 2022 at 4:37 PM Dmitriy Rabotyagov wrote: > > Hey > > At the very least in OpenStack-Ansible we already handle that case, > and have overwritten heartbeat_in_pthread for non-UWSGI services, > which is already in stable branches. So backporting this new default > setting would make us revert this patch and apply a set of new ones > for uWSGI which is kind of nasty thing to do on stable branches. > > IIRC (can be wrong here), kolla-ansible and TripleO also adopted such > changes in their codebase. Tripleo is specificly broken by the current default in wallaby Slawek raised this question of backporting partly because we are trying to decied fi we need to backport this downstream only for our osp product or modify tripleo/puppet to override this. we would strongly prefer not to ship a different default in our product then upstream if we can avoid it but we likely cannot release with the current defaut without either changing this downstream or upstrema in ooo. > So with quite high probability, if you use > any deployment tooling, this should be already handled relatively > well. > > We also can post a release note to stable branches about "known issue" > instead of backporting a new default. > > ??, 8 ???. 2022 ?. ? 12:46, Rados?aw Piliszek : > > > > Hi all, > > > > May this config option support "auto" by default and autodetect > > whether the application is running under mod_wsgi (and uwsgi if it > > also has the issue with green threads but here I'm not really sure...) > > and then decide on the best option? > > This way I would consider this backporting a fix (i.e. the library > > tries better to work in the target environment). > > > > As a final thought, bear in mind there are operators who have already > > overwritten the default, the deployment projects can help as well. > > > > -yoctozepto > > > > On Mon, 8 Aug 2022 at 10:30, Rodolfo Alonso Hernandez > > wrote: > > > > > > Hello all: > > > > > > I understand that by default we don't allow backporting a config knob default value. But I'm with Sean and his explanation. For "uwsgi" applications, if pthread is False, the only drawback will be the reconnection of the MQ socket. But in the case described by Slawek, the problem is more relevant because once the agent has been disconnected for a long time from the MQ, it is not possible to reconnect again and the agent needs to be manually restarted. I would backport the patch setting this config knob to False. > > > > > > Regards. 
> > > > > > > > > On Sat, Aug 6, 2022 at 12:08 AM Sean Mooney wrote: > > >> > > >> On Fri, Aug 5, 2022 at 7:40 PM Ghanshyam Mann wrote: > > >> > > > >> > ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote --- > > >> > > Hi, > > >> > > > > >> > > Some time ago oslo.messaging changed default value of the "heartbeat_in_pthread" config option to "True" [1]. > > >> > > As was noticed some time ago, this don't works well with nova-compute - see bug [2] for details. > > >> > > Recently we noticed in our downstream Red Hat OpenStack, that it's not only nova-compute which don't works well with it and can hangs. We saw the same issue in various neutron agent processes. And it seems that it can be the same for any non-wsgi service which is using rabbitmq to send heartbeats. > > >> > > So giving all of that, I just proposed change of the default value of that config option to be "False" again [3]. > > >> > > And my question is - would it be possible and acceptable to backport such change up to stable/wallaby (if and when it will be approved for master of course). IMO this could be useful for users as using this option set as "True" be default don't makes any sense for the non-wsgi applications really and may cause more bad then good things really. What are You opinions about it? > > >> > > > >> > This is tricky, in general the default value change should not be backported because it change > > >> > the default behavior and so does the compatibility. But along with considering the cases do not > > >> > work with the current default value (you mentioned in this email), we should consider if this worked > > >> > in any other case or not. If so then I think we should not backport this and tell operator to override > > >> > it to False as workaround for stable branch fixes. > > >> as afar as i am aware the only impact of setting the default to false > > >> for wsgi applications is > > >> running under mod_wsgi or uwsgi may have the heatbeat greenthread > > >> killed when the wsgi server susspand the application > > >> after a time out following the processing of an api request. > > >> > > >> there is no known negitive impact to this other then a log message > > >> that can safely be ignored on both rabbitmq and the api log relating > > >> to the amqp messing connection being closed and repopend. > > >> > > >> keeping the value at true can cause the nova compute agent, neutron > > >> agent and i susppoct nova conductor/schduler to hang following a > > >> rabbitmq disconnect. > > >> that can leave the relevnet service unresponcei until its restarted. > > >> > > >> so having the default set to true is known to breake several services > > >> but tehre are no know issue that are caused by setting it to false > > >> that impact the operation fo any service. > > >> > > >> so i have a stong preference for setting thsi to false by default on > > >> stable branches. 
> > >> > > > >> > -gmann > > >> > > > >> > > > > >> > > [1] https://review.opendev.org/c/openstack/oslo.messaging/+/747395 > > >> > > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 > > >> > > [3] https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ > > >> > > > > >> > > -- > > >> > > Slawek Kaplonski > > >> > > Principal Software Engineer > > >> > > Red Hat > > >> > > > >> > > >> > > > From gmann at ghanshyammann.com Mon Aug 8 16:44:45 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 08 Aug 2022 22:14:45 +0530 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Aug 11 at 1500 UTC Message-ID: <1827e583106.105f647cf518957.4828420962111334675@ghanshyammann.com> Hello Everyone, The technical Committee's next weekly meeting is scheduled for 2022 Aug 11, at 1500 UTC. If you would like to add topics for discussion, please add them to the below wiki page by Wednesday, Aug 10 at 2100 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting -gmann From alex.kavanagh at canonical.com Mon Aug 8 20:14:23 2022 From: alex.kavanagh at canonical.com (Alex Kavanagh) Date: Mon, 8 Aug 2022 21:14:23 +0100 Subject: [charms] Team Delegation proposal In-Reply-To: References: Message-ID: Hi Chris On Thu, 28 Jul 2022 at 21:46, Chris MacNaughton < chris.macnaughton at canonical.com> wrote: > Hello All, > > > I would like to propose some new ACLs in Gerrit for the openstack-charms > project: > > - openstack-core-charms > - ceph-charms > - network-charms > - stable-maintenance > > I think the names need to be tweaked slightly: - charms-openstack - charms-ceph - charms-ovn - charms-maintenance This is to keep it inline/similar to the other charms-* groups. Back to the email proper: > > With an increasing focus split among the openstack-charmers team, I'm > observing that people are focused on more specific subsets of the charms, > and would like to propose that new ACLs are created to allow us to > recognize that officially. I've chosen the breakdown above as it aligns > neatly with where the focus lines are at this point, letting developers > work on their specific focus areas. > Whilst this is a reasonable idea, I'll admit to being slightly worried that it will solidify the lines between the teams; but perhaps that's what's going to happen anway? > > This proposal would not reduce permissions for anybody who is currently a > core on the openstack-charms project and, in fact, future subteam core > members could aspire to full openstack-charmers core as well. Ideally, this > approach will let us escalate developers to "core" developers for the > subteam(s) where they have demonstrated the expertise we expect in a > core-charmer. It also allows a more gradual escalation to being a core in > the openstack-charms project, making it a progression rather than a single > destination. > Not a bad idea. > > As a related addition, I'm appending to this proposal the creation of a > stable-maintenance ACL which would allow members to manage backports > without a full core-charmer grant. > I guess with this one we'd have to be careful when assigning. Landing things in stable releases is basically a measure of how well we, as a team, manage regressions and how much work that creates in terms of managing our very large stable charm set. I'd be keen to hold this one back if we can. So why trial it with `charms-ceph`? 
Cheers Alex --- > --- > Chris MacNaughton > -- Alex Kavanagh - Software Engineer OpenStack Engineering - Data Centre Development - Canonical Ltd -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.rydberg at cleura.com Tue Aug 9 07:09:41 2022 From: tobias.rydberg at cleura.com (Tobias Rydberg) Date: Tue, 9 Aug 2022 09:09:41 +0200 Subject: [publiccloud-sig] A new start for Public Cloud SIG In-Reply-To: References: Message-ID: <1bfd030b-2ad7-dd55-4daa-786080f13fe7@cleura.com> Hi all, On small change, the meeting will take place in the #openstack-operators channel, since the publiccloud channel isn't registered. Talk to you all in Wednesday! BR, Tobias On 2022-07-07 09:54, Tobias Rydberg wrote: > Hi everyone, > > In Berlin it became clear that there is a big interest in restarting > the Public Cloud SIG. Thank you all for your contributions in that > forum session [0] and the interest in participating in the work of > this SIG. > > A lot of good ideas of what we should focus on was identified, with a > clear focus of interoperability and standardization, to make the > experience of using OpenStack as an end-user even better. > Standardization of images and flavors? - naming, metadata etc - being > one of them, working closely with InterOp WG regarding the checks and > governance of the OpenStack Powered Program another. The ultimate goal > could be to reach a state where it is possible to start to federate > between the public clouds, but for that to be possible on a more > global scale we need to start with aligning the simple things. > > To kick this off, we will start with bi-weekly IRC meetings again, > shape the goals kick of some work towards identified goals. Since we > have an IRC channel (#openstck-publiccloud) my suggestion is that we > will start there. Let's decide on suggestions for day and time for our > bi-weekly meetings during the kick off meeting. > > Kick off meeting > =========== > When: Wednesday 10th of August at 1400 UTC > Where: IRC in channel #openstack-publiccloud > > > I created an etherpad for our first meeting [1], feel free to add > items to the agenda or other suggestions on goals etc that you might > have prior to the meeting. > > Hope to see at IRC in August! > > [0] https://etherpad.opendev.org/p/berlin-summit-future-public-cloud-sig > [1] https://etherpad.opendev.org/p/publiccloud-sig-kickoff > > BR, > Tobias Rydberg > -- *Tobias Rydberg* Solution Architect Email: tobias.rydberg at cleura.com cleura.com ? The European cloud! Read our press release ? City Network becomes Cleura -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3626 bytes Desc: S/MIME Cryptographic Signature URL: From zigo at debian.org Tue Aug 9 07:41:14 2022 From: zigo at debian.org (Thomas Goirand) Date: Tue, 9 Aug 2022 09:41:14 +0200 Subject: [publiccloud-sig] A new start for Public Cloud SIG In-Reply-To: References: Message-ID: On 7/7/22 09:54, Tobias Rydberg wrote: > Hi everyone, > > In Berlin it became clear that there is a big interest in restarting the > Public Cloud SIG. Thank you all for your contributions in that forum > session [0] and the interest in participating in the work of this SIG. 
> > A lot of good ideas of what we should focus on was identified, with a > clear focus of interoperability and standardization, to make the > experience of using OpenStack as an end-user even better. > Standardization of images and flavors? - naming, metadata etc - being > one of them, working closely with InterOp WG regarding the checks and > governance of the OpenStack Powered Program another. The ultimate goal > could be to reach a state where it is possible to start to federate > between the public clouds, but for that to be possible on a more global > scale we need to start with aligning the simple things. > > To kick this off, we will start with bi-weekly IRC meetings again, shape > the goals kick of some work towards identified goals. Since we have an > IRC channel (#openstck-publiccloud) Typo: #openstack-publiccloud > my suggestion is that we will start > there. Let's decide on suggestions for day and time for our bi-weekly > meetings during the kick off meeting. > > Kick off meeting > =========== > When: Wednesday 10th of August at 1400 UTC > Where: IRC in channel #openstack-publiccloud Noted. I hope to be there with my colleague Kevin. Thanks Tobias for taking care of organizing this. Cheers, Thomas Goirand (zigo) From tkajinam at redhat.com Tue Aug 9 08:31:34 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Tue, 9 Aug 2022 17:31:34 +0900 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: References: <21589662.aDxSllVl8Y@p1> <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> Message-ID: I tend to approve the backport as an exception, based on the following points. - The old default has been used for a long time and it has been proven to be stable. It changes behavior but it does not require any change in the other components (No change is required in rabbitmq, for example). - The reason we made that switch was to just get rid of "noisy" heartbeat warning, which did not affect actual functionality, and I don't expect any risks with restoring these warning logs. - On the other hand, we've learned the current default causes broken functionality of non-wsgi services, which has a huge impact based on our current architecture. The issue was already confirmed by multiple organizations. What is worse, debugging the issue is quite difficult - Earlier it was suggested that the users should configure the parameter according to the process architecture, but it's not quite easy to determine the proper setup unless you have basic understanding about OpenStack architecture. Also, not all deployment toolings support setting options per service (Neither Puppet OpenStack or TripleO supports it now). Using the more "safe" default would be much beneficial for users/operators. In the meantime I'd also look into the way to override the option in deployment toolings, with my hat as Puppet OpenStack Core and TripleO core on, but backporting the change is something worth justifying IMHO. On Tue, Aug 9, 2022 at 1:30 AM Sean Mooney wrote: > On Mon, Aug 8, 2022 at 4:37 PM Dmitriy Rabotyagov > wrote: > > > > Hey > > > > At the very least in OpenStack-Ansible we already handle that case, > > and have overwritten heartbeat_in_pthread for non-UWSGI services, > > which is already in stable branches. So backporting this new default > > setting would make us revert this patch and apply a set of new ones > > for uWSGI which is kind of nasty thing to do on stable branches. 
> > > > IIRC (can be wrong here), kolla-ansible and TripleO also adopted such > > changes in their codebase. > Tripleo is specificly broken by the current default in wallaby > Slawek raised this question of backporting partly because we are > trying to decied fi we > need to backport this downstream only for our osp product or modify > tripleo/puppet to override > this. > > we would strongly prefer not to ship a different default in our > product then upstream if we can avoid it > but we likely cannot release with the current defaut without either > changing this downstream or upstrema in ooo. > > > So with quite high probability, if you use > > any deployment tooling, this should be already handled relatively > > well. > > > > We also can post a release note to stable branches about "known issue" > > instead of backporting a new default. > > > > ??, 8 ???. 2022 ?. ? 12:46, Rados?aw Piliszek < > radoslaw.piliszek at gmail.com>: > > > > > > Hi all, > > > > > > May this config option support "auto" by default and autodetect > > > whether the application is running under mod_wsgi (and uwsgi if it > > > also has the issue with green threads but here I'm not really sure...) > > > and then decide on the best option? > > > This way I would consider this backporting a fix (i.e. the library > > > tries better to work in the target environment). > > > > > > As a final thought, bear in mind there are operators who have already > > > overwritten the default, the deployment projects can help as well. > > > > > > -yoctozepto > > > > > > On Mon, 8 Aug 2022 at 10:30, Rodolfo Alonso Hernandez > > > wrote: > > > > > > > > Hello all: > > > > > > > > I understand that by default we don't allow backporting a config > knob default value. But I'm with Sean and his explanation. For "uwsgi" > applications, if pthread is False, the only drawback will be the > reconnection of the MQ socket. But in the case described by Slawek, the > problem is more relevant because once the agent has been disconnected for a > long time from the MQ, it is not possible to reconnect again and the agent > needs to be manually restarted. I would backport the patch setting this > config knob to False. > > > > > > > > Regards. > > > > > > > > > > > > On Sat, Aug 6, 2022 at 12:08 AM Sean Mooney > wrote: > > > >> > > > >> On Fri, Aug 5, 2022 at 7:40 PM Ghanshyam Mann < > gmann at ghanshyammann.com> wrote: > > > >> > > > > >> > ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote > --- > > > >> > > Hi, > > > >> > > > > > >> > > Some time ago oslo.messaging changed default value of the > "heartbeat_in_pthread" config option to "True" [1]. > > > >> > > As was noticed some time ago, this don't works well with > nova-compute - see bug [2] for details. > > > >> > > Recently we noticed in our downstream Red Hat OpenStack, that > it's not only nova-compute which don't works well with it and can hangs. We > saw the same issue in various neutron agent processes. And it seems that it > can be the same for any non-wsgi service which is using rabbitmq to send > heartbeats. > > > >> > > So giving all of that, I just proposed change of the default > value of that config option to be "False" again [3]. > > > >> > > And my question is - would it be possible and acceptable to > backport such change up to stable/wallaby (if and when it will be approved > for master of course). 
IMO this could be useful for users as using this > option set as "True" be default don't makes any sense for the non-wsgi > applications really and may cause more bad then good things really. What > are You opinions about it? > > > >> > > > > >> > This is tricky, in general the default value change should not be > backported because it change > > > >> > the default behavior and so does the compatibility. But along > with considering the cases do not > > > >> > work with the current default value (you mentioned in this > email), we should consider if this worked > > > >> > in any other case or not. If so then I think we should not > backport this and tell operator to override > > > >> > it to False as workaround for stable branch fixes. > > > >> as afar as i am aware the only impact of setting the default to > false > > > >> for wsgi applications is > > > >> running under mod_wsgi or uwsgi may have the heatbeat greenthread > > > >> killed when the wsgi server susspand the application > > > >> after a time out following the processing of an api request. > > > >> > > > >> there is no known negitive impact to this other then a log message > > > >> that can safely be ignored on both rabbitmq and the api log relating > > > >> to the amqp messing connection being closed and repopend. > > > >> > > > >> keeping the value at true can cause the nova compute agent, neutron > > > >> agent and i susppoct nova conductor/schduler to hang following a > > > >> rabbitmq disconnect. > > > >> that can leave the relevnet service unresponcei until its restarted. > > > >> > > > >> so having the default set to true is known to breake several > services > > > >> but tehre are no know issue that are caused by setting it to false > > > >> that impact the operation fo any service. > > > >> > > > >> so i have a stong preference for setting thsi to false by default on > > > >> stable branches. > > > >> > > > > >> > -gmann > > > >> > > > > >> > > > > > >> > > [1] > https://review.opendev.org/c/openstack/oslo.messaging/+/747395 > > > >> > > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 > > > >> > > [3] > https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ > > > >> > > > > > >> > > -- > > > >> > > Slawek Kaplonski > > > >> > > Principal Software Engineer > > > >> > > Red Hat > > > >> > > > > >> > > > >> > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gibi at redhat.com Tue Aug 9 12:27:34 2022 From: gibi at redhat.com (Balazs Gibizer) Date: Tue, 09 Aug 2022 14:27:34 +0200 Subject: [nova] Review guide for PCI tracking for Placement patches In-Reply-To: <4BIUER.4Z8ORYASOYO42@redhat.com> References: <4BIUER.4Z8ORYASOYO42@redhat.com> Message-ID: Hi, Top posting as I wanted to give an update on implementation progress of the feature. As before I have one main patch series and one additional side track providing improvements to PCI DeviceSpec handling that is not mandatory for the feature itself. The main series starts at [5] but it now has a bug fix dependency [6] pulled before it. Now the main series is in a mergable state as it contains the complete PCI inventory handling for the feature and this logic can be enabled independently from the, yet to be written, scheduling part. 
1833394042 Allow enabling PCI tracking in Placement <-- inventory reporting can be enabled by nova.conf
1eabfde2a1 Handle PCI dev reconf with allocations <-- allocation healing works
74dc70ad04 Heal PCI allocation during resize
e1af40959a Heal missing simple PCI allocation in the resource tracker
a520649516 Retry /reshape at provider generation conflict
f7e1ed838f Move provider_tree RP creation to PciResourceProvider <-- inventory reporting works
2d08e28eb3 Stop if tracking is disable after it was enabled before
742bc26da0 Support [pci]device_spec reconfiguration
e445964d59 Reject devname based device_spec config
b796b56622 Ignore PCI devs with physical_network tag
81ba9cf1bf Reject mixed VF rc and trait config
734fa580c3 Reject PCI dependent device config
fd725ce577 Extend device_spec with resource_class and traits
5f4128b188 Basics for PCI Placement reporting
a588df760f Rename whitelist in tests <-- this is where the side track branches out
646e1e69be Rename exception.PciConfigInvalidWhitelist to PciConfigInvalidSpec
d26ff3b695 Rename [pci]passthrough_whitelist to device_spec
d275c20bca Add compute restart capability for libvirt func tests
5b3e6c1146 Poison /sys access via various calls in test <-- main series starts
575c15df7a Update RequestSpec.pci_request for resize <-- bugfix for 1983753
7b0a1e2b30 Reproducer for bug 1983753

The side track starts at [7] at the middle of the main series.

983dfe69d6 Move __str__ to the PciAddressSpec base class
bc24686626 Fix type annotation of pci.Whitelist class
6941757d06 Remove unused PF checking from get_function_by_ifname
6c7903c11c Clean up mapping input to address spec types
30d7c1eadf Remove dead code from PhysicalPciAddress
af649c184b Fix PciAddressSpec descendants to call super.__init__
238a6174e8 Unparent PciDeviceSpec from PciAddressSpec
6836dba493 Extra tests for remote managed dev spec
5d85ec7829 Add more test coverage for devname base dev spec
a588df760f Rename whitelist in tests <-- this is the common base with the main series

Next I will continue with the last part, the scheduling side, of the feature.

Any feedback is highly appreciated.

Cheers,
gibi

[1] https://specs.openstack.org/openstack/nova-specs/specs/zed/approved/pci-device-tracking-in-placement.html
[2] https://review.opendev.org/c/openstack/nova-specs/+/791047
[3] https://review.opendev.org/q/topic:bp/pci-device-tracking-in-placement
[4] https://review.opendev.org/q/topic:bp/pci-device-spec-cleanup
[5] https://review.opendev.org/c/openstack/nova/+/844627/
[6] https://review.opendev.org/c/openstack/nova/+/852296
[7] https://review.opendev.org/c/openstack/nova/+/844625
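For reviewers who want to try the inventory-reporting part of the series, the nova.conf shape it introduces (per the spec [1] and the commit subjects above, e.g. "Rename [pci]passthrough_whitelist to device_spec" and "Extend device_spec with resource_class and traits") looks roughly like the sketch below. The option name used to enable the reporting, and the vendor/product IDs and trait values, are illustrative assumptions rather than text copied from the patches.

    [pci]
    # opt in to PCI tracking in Placement (see "Allow enabling PCI tracking
    # in Placement" above); treat the exact option name as an assumption
    report_in_placement = true
    # device_spec replaces [pci]passthrough_whitelist; resource_class and
    # traits are the new keys, the values below are made up for illustration
    device_spec = {"vendor_id": "10de", "product_id": "1eb8", "resource_class": "CUSTOM_MY_GPU", "traits": "CUSTOM_MY_TRAIT"}

With a device spec along these lines, the matching PCI devices on the compute are reported as inventories of the given resource class in the compute's provider tree, which is what the reshape and allocation-healing patches in the list manage; the scheduling side that consumes those inventories is the part still to come.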
Name: OpenPGP_signature Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From noonedeadpunk at gmail.com Tue Aug 9 15:52:54 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 9 Aug 2022 17:52:54 +0200 Subject: [openstack-ansible] Meeting on August 16 is cancelled Message-ID: Hi everyone, Since several core participants won't be able to present at the meeting it was decided to cancel the Team meeting on 16th of August 2022. From tony at bakeyournoodle.com Tue Aug 9 15:57:41 2022 From: tony at bakeyournoodle.com (Tony Breeds) Date: Tue, 9 Aug 2022 10:57:41 -0500 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: References: <21589662.aDxSllVl8Y@p1> <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> Message-ID: Hi All, As others have said this isn't something we do without consideration. My feel from this thread is that the risks are somewhat low and understood. I think we've had a discussion and it's okay. I think it's okay to do this backport. We should obviously include a release note that calls this out and hopefully there is room in the semver to make this a minor update (as opposed to a patch) to also "lag that this *may* not be Tony. From fungi at yuggoth.org Tue Aug 9 16:41:59 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 9 Aug 2022 16:41:59 +0000 Subject: [all][TC] Bare rechecks stats week of 25.07 In-Reply-To: <20220728151534.y336vusncblmmlqz@yuggoth.org> References: <15225123.PIt3FUKRBJ@p1> <20220728151534.y336vusncblmmlqz@yuggoth.org> Message-ID: <20220809164159.gwstamhe3wxuyi4g@yuggoth.org> On 2022-07-28 15:15:34 +0000 (+0000), Jeremy Stanley wrote: [...] > follow the word "recheck" with a space before you add any other > text. I think Zuul must assume a required space after the regular > expression since that doesn't seem to be encoded in the regex you > see there. [...] Just to follow up, I misinterpreted the results of an earlier experiment when trying to confirm this. Based on revisiting it today with https://review.opendev.org/852605 it looks like you don't need a space between "recheck" and other text, so "recheck: whatever" also works fine. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From swogatpradhan22 at gmail.com Tue Aug 9 17:14:29 2022 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 9 Aug 2022 22:44:29 +0530 Subject: Security vulnerabilities in Horizon dashboard | Openstack Wallaby | Tripleo | Openstack Horizon In-Reply-To: References: Message-ID: Hi, Any ideas? On Mon, Aug 1, 2022 at 12:29 PM Swogat Pradhan wrote: > Hi, > I am setting up an openstack wallaby cloud for a client using tripleo. > After setting everything up the client ran a WEB scan and found some > vulnerabilities (attached snapshot for reference). > > Can you please guide me on how to fix these vulnerabilities in the > dashboard service? > > With regards, > Swogat pradhan > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fungi at yuggoth.org Tue Aug 9 17:56:01 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 9 Aug 2022 17:56:01 +0000 Subject: [horizon][security-sig][tripleo] Security vulnerabilities in Horizon dashboard | Openstack Wallaby In-Reply-To: References: Message-ID: <20220809175600.ztmlg6uanxtm5xam@yuggoth.org> On 2022-08-09 22:44:29 +0530 (+0530), Swogat Pradhan wrote: > Any ideas? [...] Hopefully you saw my earlier reply, but if not I'll link to the archived copy: https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029796.html -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From cjeanner at redhat.com Wed Aug 10 05:54:53 2022 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Wed, 10 Aug 2022 07:54:53 +0200 Subject: Correct way to add firewall rules in tripleo | Wallaby In-Reply-To: References: Message-ID: <5416d922-b90a-40a4-99da-96a6b2f51dbc@redhat.com> Hello there, I think the "action" keyword is wrong, it should actually be "jump". As stated in the error message, "action" should be insert/append - the drop/accept are actually "jump" values. I'll push a patch against the doc shortly to update that. Cheers, C. On 7/20/22 19:37, Swogat Pradhan wrote: > Hi, > I am trying to add a rule for zabbix in my tripleo wallaby setup on top > of centos 8 stream. > i followed > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/security_hardening.html > > > > but got the error message: > > ?[ERROR]: Failed, module return: {'msg': 'value of action must be one of: > append, insert, got: accept', 'failed': True, 'invocation': {'module_args': > {'state': 'present', 'action': 'accept', 'jump': 'ACCEPT', 'chain': 'INPUT', > 'protocol': 'tcp', 'source': '172.25.161.50', 'ctstate': ['NEW'], > 'ip_version': > 'ipv4', 'comment': '301 allow zabbix ipv4', 'destination_port': '10050', > 'table': 'filter', 'match': [], 'syn': 'ignore', 'flush': False}}, > 'warnings': > ["The value 10050 (type int) in a string field was converted to '10050' > (type > string). If this does not look like what you expect, quote the entire > value to > ensure it does not change."], '_ansible_parsed': True} > ?[ERROR]: Failed, return data: {'stdout': None, 'stderr': None, 'msg': > 'value > of action must be one of: append, insert, got: accept', 'cmd': None, > 'rc': 0, > 'failed': True} > 2022-07-21 01:27:33.335477 | 48d539a1-1679-1e80-25fd-000000005aa1 | > ? TASK | Manage firewall rules > 2022-07-21 01:27:33.351515 | 48d539a1-1679-1e80-25fd-000000005542 | > ?FATAL | Manage firewall rules | overcloud-controller-0 | > error={"changed": false, "cmd": null, "msg": "value of action must be > one of: append, insert, got: accept", "rc": 0, "stderr": null, "stdout": > null} > > > When i tried the following link: > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/15/html/security_and_hardening_guide/using_director_to_configure_security_hardening > > my script is running fine but rules are not updated in iptables for zabbix. > > Can you please suggest a correct approach to open port 10050 in tripleo? > > With regards, > Swogat Pradhan -- C?dric Jeanneret (He/Him/His) Sr. Software Engineer - OpenStack Platform Deployment Framework TC Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: OpenPGP_signature Type: application/pgp-signature Size: 840 bytes Desc: OpenPGP digital signature URL: From kkchn.in at gmail.com Wed Aug 10 06:03:27 2022 From: kkchn.in at gmail.com (KK CHN) Date: Wed, 10 Aug 2022 11:33:27 +0530 Subject: Openstack Graphical processors virtualisation Message-ID: 1. Does Openstack support GPU virtualization ? We are running a cloud using ussuri with KVM hypervisors. Is creation of vGPUs possible? Does KVM support GPU virtualization or any limitations? 2. What are the supported GPU types? (GRID GPU or GPUs attached to physical blades ) 3. If it supports GPU attached to physical blades, will live migration of the VM be supported. Will OpenStack be able to identify the next host which has an attached GPU and perform live migration, in case of the Failure of One Blade with attached GPU with resident VM. 4. What are the virtualization options if vGPU options are not supported by a particular GPU model? Thanks in advance, Krish -------------- next part -------------- An HTML attachment was scrubbed... URL: From alsotoes at gmail.com Wed Aug 10 06:41:41 2022 From: alsotoes at gmail.com (Alvaro Soto) Date: Wed, 10 Aug 2022 01:41:41 -0500 Subject: Openstack Graphical processors virtualisation In-Reply-To: References: Message-ID: Hey, just a little help with one question. 1- https://docs.openstack.org/nova/yoga/admin/virtual-gpu.html https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/instances_and_images_guide/ch-virtual_gpu On Wed, Aug 10, 2022 at 1:17 AM KK CHN wrote: > 1. Does Openstack support GPU virtualization ? We are running a cloud > using ussuri with KVM hypervisors. Is creation of vGPUs possible? > > Does KVM support GPU virtualization or any limitations? > > 2. What are the supported GPU types? (GRID GPU or GPUs attached to > physical blades ) > > 3. If it supports GPU attached to physical blades, will live migration of > the VM be supported. Will OpenStack be able to identify the next host which > has an attached GPU and perform live migration, in case of the Failure of > One Blade with attached GPU with resident VM. > > 4. What are the virtualization options if vGPU options are not supported > by a particular GPU model? > > > Thanks in advance, > Krish > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From yasufum.o at gmail.com Wed Aug 10 09:05:23 2022 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Wed, 10 Aug 2022 18:05:23 +0900 Subject: [tacker] IRC meeting on Aug 16 cancelled Message-ID: <3f1364ea-b658-5fcb-88b3-89b89daf7f68@gmail.com> Hi team, Since many of us are going to have a vacation next week, I'll cancel the meeting on Aug 16th. Thanks, Yasufumi From smooney at redhat.com Wed Aug 10 10:23:20 2022 From: smooney at redhat.com (Sean Mooney) Date: Wed, 10 Aug 2022 11:23:20 +0100 Subject: Openstack Graphical processors virtualisation In-Reply-To: References: Message-ID: our main vgpu expert is on PTO for a few weeks but ill try and respond inline. On Wed, Aug 10, 2022 at 8:07 AM Alvaro Soto wrote: > > Hey, just a little help with one question. 
> > 1- > https://docs.openstack.org/nova/yoga/admin/virtual-gpu.html > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/instances_and_images_guide/ch-virtual_gpu yep thos are the docs for nova's vGPU support we also support pci passthough of a full gpu to a vm for usecause that need that. > > On Wed, Aug 10, 2022 at 1:17 AM KK CHN wrote: >> >> 1. Does Openstack support GPU virtualization ? We are running a cloud using ussuri with KVM hypervisors. Is creation of vGPUs possible? yes if you have an nvidia gpu and pay for there licnese server ectra you can config nova to expose those vgpus to the guests. if you have amd gpus and those support there mxgpu feature then that just exposes the vGPUs as stanard sriov VFs instead of vfio-mediated devices so you use pci passthough instead of the vGPU mdev feature in that case to consume them. >> >> Does KVM support GPU virtualization or any limitations? yes however nvidia have not yet upstream support for vgpu/mdev live migration. redhat nvidia and others are currently activly working on upstreamoing that to the kernel qemu and libvirt but that is the main limiation. depending on your release nova also does not support move operations like cold migration however those have been added in more recent releases. >> >> 2. What are the supported GPU types? (GRID GPU or GPUs attached to physical blades ) both >> >> 3. If it supports GPU attached to physical blades, will live migration of the VM be supported.Will OpenStack be able to identify the next host which has an attached GPU and perform live migration, in case of the Failure of One Blade with attached GPU with resident VM. no live migration is not possibel with mdev or sriov vf attachment currently. cold migration and evacuate are supported in more recent releases of openstack in the case of maintance or hardware failure. >> >> 4. What are the virtualization options if vGPU options are not supported by a particular GPU model? pci passthough of the fulll gpu or sriov if the card supprots that. >> >> >> Thanks in advance, >> Krish > > > > -- > > Alvaro Soto > > Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > ---------------------------------------------------------- > Great people talk about ideas, > ordinary people talk about things, > small people talk... about other people. From michal.arbet at ultimum.io Wed Aug 10 10:28:05 2022 From: michal.arbet at ultimum.io (Michal Arbet) Date: Wed, 10 Aug 2022 12:28:05 +0200 Subject: Need help on rabbitmq In-Reply-To: References: Message-ID: Hi, Do you mean oslo.messaging ? Or ? Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Po???? 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet at ultimum.io *https://ultimum.io * LinkedIn | Twitter | Facebook ?t 26. 7. 2022 v 15:39 odes?latel Satish Patel napsal: > It?s hard to guess without release and version info. I had issue like that > which fixed by upgrade of ampq library in wallaby release. > > Sent from my iPhone > > On Jul 26, 2022, at 9:05 AM, Michal Arbet wrote: > > ? > Hi, > > We also had issues with rabbitmq and heartbeats, did you investigate if > this is bug ? Or was it regular issue in your case ? > > Thanks > Michal Arbet > Openstack Engineer > > Ultimum Technologies a.s. > Na Po???? 1047/26, 11000 Praha 1 > Czech Republic > > +420 604 228 897 > michal.arbet at ultimum.io > *https://ultimum.io * > > LinkedIn | Twitter > | Facebook > > > > ?t 21. 6. 
2022 v 9:57 odes?latel AJ_ sunny napsal: > >> Hi team >> >> I am using kolla-ansible based openstack infra and getting below error in >> logs seems frequent rabbitmq disconnections >> >> <0.5023.16> closing AMQP connection <0.5023.16> (10.80.0.13:40356 -> >> 10.80.0.13:5672 - mod_wsgi:19:d3196668-57e5-46dc-8b69-78d73b5873a0): >> missed heartbeats from client, timeout: 60s >> AMQP server on 10.80.0.13:5672 is unreachable: [Errno 104] Connection >> reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno >> 104] Connection reset by peer >> >> Is this bug or any resolution for fixing this issue? >> >> >> Thanks >> Arihant Jain >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Aug 10 11:05:57 2022 From: smooney at redhat.com (Sean Mooney) Date: Wed, 10 Aug 2022 12:05:57 +0100 Subject: Need help on rabbitmq In-Reply-To: References: Message-ID: this is not a bug its expected behavior. you can and should ignore it On Wed, Aug 10, 2022 at 11:50 AM Michal Arbet wrote: > Hi, > > Do you mean oslo.messaging ? Or ? > > > Michal Arbet > Openstack Engineer > > Ultimum Technologies a.s. > Na Po???? 1047/26, 11000 Praha 1 > Czech Republic > > +420 604 228 897 > michal.arbet at ultimum.io > *https://ultimum.io * > > LinkedIn | Twitter > | Facebook > > > > ?t 26. 7. 2022 v 15:39 odes?latel Satish Patel > napsal: > >> It?s hard to guess without release and version info. I had issue like >> that which fixed by upgrade of ampq library in wallaby release. >> >> Sent from my iPhone >> >> On Jul 26, 2022, at 9:05 AM, Michal Arbet >> wrote: >> >> ? >> Hi, >> >> We also had issues with rabbitmq and heartbeats, did you investigate if >> this is bug ? Or was it regular issue in your case ? >> >> Thanks >> Michal Arbet >> Openstack Engineer >> >> Ultimum Technologies a.s. >> Na Po???? 1047/26, 11000 Praha 1 >> Czech Republic >> >> +420 604 228 897 >> michal.arbet at ultimum.io >> *https://ultimum.io * >> >> LinkedIn | >> Twitter | Facebook >> >> >> >> ?t 21. 6. 2022 v 9:57 odes?latel AJ_ sunny napsal: >> >>> Hi team >>> >>> I am using kolla-ansible based openstack infra and getting below error >>> in logs seems frequent rabbitmq disconnections >>> >>> <0.5023.16> closing AMQP connection <0.5023.16> (10.80.0.13:40356 -> >>> 10.80.0.13:5672 - mod_wsgi:19:d3196668-57e5-46dc-8b69-78d73b5873a0): >>> missed heartbeats from client, timeout: 60s >>> AMQP server on 10.80.0.13:5672 is unreachable: [Errno 104] Connection >>> reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno >>> 104] Connection reset by peer >>> >>> Is this bug or any resolution for fixing this issue? >>> >>> >>> Thanks >>> Arihant Jain >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcjong12110 at gmail.com Wed Aug 10 06:56:23 2022 From: jcjong12110 at gmail.com (=?UTF-8?B?7KCV7J6s7LKg?=) Date: Wed, 10 Aug 2022 15:56:23 +0900 Subject: About the meaning of compute nodes Message-ID: If there are many nodes, is it possible to create a high-performance VM? (e.g. Can 2 1 core cpu nodes create 1 2 core cpu VM) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From iurygregory at gmail.com Wed Aug 10 12:18:12 2022 From: iurygregory at gmail.com (Iury Gregory) Date: Wed, 10 Aug 2022 09:18:12 -0300 Subject: [PTG] [ironic] Antelope PTG new Etherpad Message-ID: Hello everyone, We changed the etherpad link to track the topics we will discuss at the PTG, instead of https://etherpad.opendev.org/p/ironic-ptg-planing-Columbus-OH we will be using https://etherpad.opendev.org/p/ironic-antelope-ptg Feel free to add any topics =) -- *Att[]'s* *Iury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Ironic PTL * *Senior Software Engineer at Red Hat Brazil* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From Danny.Webb at thehutgroup.com Wed Aug 10 12:23:16 2022 From: Danny.Webb at thehutgroup.com (Danny Webb) Date: Wed, 10 Aug 2022 12:23:16 +0000 Subject: Openstack Graphical processors virtualisation In-Reply-To: References: Message-ID: worth also mentioning that MIG isn't currently supported in Openstack. We just finished a POC with some a100 and a40 cards and the vgpu setup wasn't too hard to do, but you definitely needed to read a combination of the NVIDIA docs and the openstack docs to get a working setup. There is one open bug for mediated devices that you should be aware of though it looks like a fix is in the works: https://bugs.launchpad.net/nova/+bug/1900800 For straight PCI passthrough I found some of the tooling around it in openstack lacking compared to for the vgpu devices but that may have just been me missing something. Eg, I could only really see the PCI devices in the DB but couldn't find a way to see them using the SDK / cli. ________________________________ From: Sean Mooney Sent: 10 August 2022 11:23 To: Alvaro Soto Cc: KK CHN ; openstack-discuss Subject: Re: Openstack Graphical processors virtualisation CAUTION: This email originates from outside THG our main vgpu expert is on PTO for a few weeks but ill try and respond inline. On Wed, Aug 10, 2022 at 8:07 AM Alvaro Soto wrote: > > Hey, just a little help with one question. > > 1- > https://docs.openstack.org/nova/yoga/admin/virtual-gpu.html > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/instances_and_images_guide/ch-virtual_gpu yep thos are the docs for nova's vGPU support we also support pci passthough of a full gpu to a vm for usecause that need that. > > On Wed, Aug 10, 2022 at 1:17 AM KK CHN wrote: >> >> 1. Does Openstack support GPU virtualization ? We are running a cloud using ussuri with KVM hypervisors. Is creation of vGPUs possible? yes if you have an nvidia gpu and pay for there licnese server ectra you can config nova to expose those vgpus to the guests. if you have amd gpus and those support there mxgpu feature then that just exposes the vGPUs as stanard sriov VFs instead of vfio-mediated devices so you use pci passthough instead of the vGPU mdev feature in that case to consume them. >> >> Does KVM support GPU virtualization or any limitations? yes however nvidia have not yet upstream support for vgpu/mdev live migration. redhat nvidia and others are currently activly working on upstreamoing that to the kernel qemu and libvirt but that is the main limiation. depending on your release nova also does not support move operations like cold migration however those have been added in more recent releases. >> >> 2. What are the supported GPU types? (GRID GPU or GPUs attached to physical blades ) both >> >> 3. 
If it supports GPU attached to physical blades, will live migration of the VM be supported.Will OpenStack be able to identify the next host which has an attached GPU and perform live migration, in case of the Failure of One Blade with attached GPU with resident VM. no live migration is not possibel with mdev or sriov vf attachment currently. cold migration and evacuate are supported in more recent releases of openstack in the case of maintance or hardware failure. >> >> 4. What are the virtualization options if vGPU options are not supported by a particular GPU model? pci passthough of the fulll gpu or sriov if the card supprots that. >> >> >> Thanks in advance, >> Krish > > > > -- > > Alvaro Soto > > Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > ---------------------------------------------------------- > Great people talk about ideas, > ordinary people talk about things, > small people talk... about other people. Danny Webb Principal OpenStack Engineer The Hut Group Tel: Email: Danny.Webb at thehutgroup.com For the purposes of this email, the "company" means The Hut Group Limited, a company registered in England and Wales (company number 6539496) whose registered office is at Fifth Floor, Voyager House, Chicago Avenue, Manchester Airport, M90 3DQ and/or any of its respective subsidiaries. Confidentiality Notice This e-mail is confidential and intended for the use of the named recipient only. If you are not the intended recipient please notify us by telephone immediately on +44(0)1606 811888 or return it to us by e-mail. Please then delete it from your system and note that any use, dissemination, forwarding, printing or copying is strictly prohibited. Any views or opinions are solely those of the author and do not necessarily represent those of the company. Encryptions and Viruses Please note that this e-mail and any attachments have not been encrypted. They may therefore be liable to be compromised. Please also note that it is your responsibility to scan this e-mail and any attachments for viruses. We do not, to the extent permitted by law, accept any liability (whether in contract, negligence or otherwise) for any virus infection and/or external compromise of security and/or confidentiality in relation to transmissions sent by e-mail. Monitoring Activity and use of the company's systems is monitored to secure its effective use and operation and for other lawful business purposes. Communications using these systems will also be monitored and may be recorded to secure effective use and operation and for other lawful business purposes. hgvyjuv -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Aug 10 12:48:37 2022 From: smooney at redhat.com (Sean Mooney) Date: Wed, 10 Aug 2022 13:48:37 +0100 Subject: Openstack Graphical processors virtualisation In-Reply-To: References: Message-ID: On Wed, Aug 10, 2022 at 1:23 PM Danny Webb wrote: > > worth also mentioning that MIG isn't currently supported in Openstack. We just finished a POC with some a100 and a40 cards and the vgpu setup wasn't too hard to do, but you definitely needed to read a combination of the NVIDIA docs and the openstack docs to get a working setup. actually from a nova perspective mig is supported however you have to preconfigure the devices. mig just moves the mdevs to the VFs and you need to precreate the VFs and gpu compute instances but we dont plan to change that going forward. 
https://bugs.launchpad.net/nova/+bug/1900800 is a valid bug but its not related to mig. recreating the mdevs on host reboot is a complex task currently but its being worked on. > > There is one open bug for mediated devices that you should be aware of though it looks like a fix is in the works: > > https://bugs.launchpad.net/nova/+bug/1900800 > > For straight PCI passthrough I found some of the tooling around it in openstack lacking compared to for the vgpu devices but that may have just been me missing something. Eg, I could only really see the PCI devices in the DB but couldn't find a way to see them using the SDK / cli. yes that is by design rather then an oversight, we are chanign that in the zed cycle as we will start trracking pci device in placement going forward. but our intent is not to expose them via the nova api. > ________________________________ > From: Sean Mooney > Sent: 10 August 2022 11:23 > To: Alvaro Soto > Cc: KK CHN ; openstack-discuss > Subject: Re: Openstack Graphical processors virtualisation > > CAUTION: This email originates from outside THG > > our main vgpu expert is on PTO for a few weeks but ill try and respond inline. > > On Wed, Aug 10, 2022 at 8:07 AM Alvaro Soto wrote: > > > > Hey, just a little help with one question. > > > > 1- > > https://docs.openstack.org/nova/yoga/admin/virtual-gpu.html > > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/instances_and_images_guide/ch-virtual_gpu > yep thos are the docs for nova's vGPU support we also support pci > passthough of a full gpu to a vm for usecause that need that. > > > > On Wed, Aug 10, 2022 at 1:17 AM KK CHN wrote: > >> > >> 1. Does Openstack support GPU virtualization ? We are running a cloud using ussuri with KVM hypervisors. Is creation of vGPUs possible? > > yes if you have an nvidia gpu and pay for there licnese server ectra > you can config nova to expose those vgpus to the guests. > if you have amd gpus and those support there mxgpu feature then that > just exposes the vGPUs as stanard sriov VFs instead of vfio-mediated > devices > so you use pci passthough instead of the vGPU mdev feature in that > case to consume them. > >> > >> Does KVM support GPU virtualization or any limitations? > yes however nvidia have not yet upstream support for vgpu/mdev live > migration. redhat nvidia and others are currently activly working on > upstreamoing that to the kernel > qemu and libvirt but that is the main limiation. depending on your > release nova also does not support move operations like cold migration > however those have been > added in more recent releases. > >> > >> 2. What are the supported GPU types? (GRID GPU or GPUs attached to physical blades ) > both > >> > >> 3. If it supports GPU attached to physical blades, will live migration of the VM be supported.Will OpenStack be able to identify the next host which has an attached GPU and perform live migration, in case of the Failure of One Blade with attached GPU with resident VM. > > no live migration is not possibel with mdev or sriov vf attachment currently. > cold migration and evacuate are supported in more recent releases of > openstack in the case of maintance or hardware failure. > >> > >> 4. What are the virtualization options if vGPU options are not supported by a particular GPU model? > pci passthough of the fulll gpu or sriov if the card supprots that. > >> > >> > >> Thanks in advance, > >> Krish > > > > > > > > -- > > > > Alvaro Soto > > > > Note: My work hours may not be your work hours. 
Please do not feel the need to respond during a time that is not convenient for you. > > ---------------------------------------------------------- > > Great people talk about ideas, > > ordinary people talk about things, > > small people talk... about other people. > > Danny Webb > Principal OpenStack Engineer > The Hut Group > > Tel: > Email: Danny.Webb at thehutgroup.com > > > For the purposes of this email, the "company" means The Hut Group Limited, a company registered in England and Wales (company number 6539496) whose registered office is at Fifth Floor, Voyager House, Chicago Avenue, Manchester Airport, M90 3DQ and/or any of its respective subsidiaries. > > Confidentiality Notice > This e-mail is confidential and intended for the use of the named recipient only. If you are not the intended recipient please notify us by telephone immediately on +44(0)1606 811888 or return it to us by e-mail. Please then delete it from your system and note that any use, dissemination, forwarding, printing or copying is strictly prohibited. Any views or opinions are solely those of the author and do not necessarily represent those of the company. > > Encryptions and Viruses > Please note that this e-mail and any attachments have not been encrypted. They may therefore be liable to be compromised. Please also note that it is your responsibility to scan this e-mail and any attachments for viruses. We do not, to the extent permitted by law, accept any liability (whether in contract, negligence or otherwise) for any virus infection and/or external compromise of security and/or confidentiality in relation to transmissions sent by e-mail. > > Monitoring > Activity and use of the company's systems is monitored to secure its effective use and operation and for other lawful business purposes. Communications using these systems will also be monitored and may be recorded to secure effective use and operation and for other lawful business purposes. > > hgvyjuv From berndbausch at gmail.com Wed Aug 10 13:34:49 2022 From: berndbausch at gmail.com (Bernd Bausch) Date: Wed, 10 Aug 2022 22:34:49 +0900 Subject: About the meaning of compute nodes In-Reply-To: References: Message-ID: A VM is contained in a single compute node. There is no way for a VM to "span" several nodes. On 2022/08/10 3:56 PM, ??? wrote: > If there are many nodes, is it possible to create a high-performance VM? > (e.g. Can 2 1 core cpu nodes create 1 2 core cpu VM) From satish.txt at gmail.com Wed Aug 10 13:39:46 2022 From: satish.txt at gmail.com (Satish Patel) Date: Wed, 10 Aug 2022 09:39:46 -0400 Subject: [neutron][ovn] ovn-bgp-agent EVPN mode question Message-ID: Folks, I am trying to set up the following lab and so far everything looks good and working but I have some doubts or questions to make sure it's really working as expected. 
Lab Ref: https://ltomasbo.wordpress.com/2021/06/25/openstack-networking-with-evpn/ rack-1-host-1 (controller) rack-1-host-2 (compute1 - This is hosting cr-lrp ports, inshort router) rack-2-host-1 (compute2) I have created two vms vagrant at rack-1-host-1:~$ nova list nova CLI is deprecated and will be a removed in a future release +--------------------------------------+------+--------+------------+-------------+--------------------------------------+ | ID | Name | Status | Task State | Power State | Networks | +--------------------------------------+------+--------+------------+-------------+--------------------------------------+ | aecb4f10-c46f-4551-b112-44e4dc007e88 | vm1 | ACTIVE | - | Running | private-test=10.0.0.105 | | ceae14b9-70c2-4dbc-8071-0d64d9a0ca84 | vm2 | ACTIVE | - | Running | private-test=10.0.0.86, 172.16.1.200 | +--------------------------------------+------+--------+------------+-------------+--------------------------------------+ # on rack-1-host-2 when i spun up vm2 which endup on rack-1-host-2 hence it created vrf-2001 on dummy lo-2001 interface and exposed vm2 ip address 10.0.0.86/32 96: vrf-2001: mtu 65575 qdisc noqueue state UP group default qlen 1000 link/ether 22:cc:25:b3:7b:96 brd ff:ff:ff:ff:ff:ff 97: br-2001: mtu 1500 qdisc noqueue master vrf-2001 state UP group default qlen 1000 link/ether 0a:c3:23:7a:8f:0c brd ff:ff:ff:ff:ff:ff inet6 fe80::851:67ff:fe64:b2c3/64 scope link valid_lft forever preferred_lft forever 98: vxlan-2001: mtu 1500 qdisc noqueue master br-2001 state UNKNOWN group default qlen 1000 link/ether 0a:c3:23:7a:8f:0c brd ff:ff:ff:ff:ff:ff inet6 fe80::8c3:23ff:fe7a:8f0c/64 scope link valid_lft forever preferred_lft forever 99: lo-2001: mtu 1500 qdisc noqueue master vrf-2001 state UNKNOWN group default qlen 1000 link/ether d6:60:da:91:2e:6d brd ff:ff:ff:ff:ff:ff inet 10.0.0.86/32 scope global lo-2001 valid_lft forever preferred_lft forever inet6 fe80::d460:daff:fe91:2e6d/64 scope link valid_lft forever preferred_lft forever # on rack-2-host-1 when i created vm1 which endup on rack-2-host-1 but it doesn't expose the vm ip address. Is that normal behavior? When I attach a floating ip to vm2 then why does my floating ip address not get exposed in BGP? Thank you in advance -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon at csail.mit.edu Wed Aug 10 13:56:19 2022 From: jon at csail.mit.edu (Jonathan Proulx) Date: Wed, 10 Aug 2022 09:56:19 -0400 Subject: About the meaning of compute nodes In-Reply-To: References: Message-ID: <20220810135619.hrn2c7pm2c7m6orl@csail.mit.edu> On Wed, Aug 10, 2022 at 10:34:49PM +0900, Bernd Bausch wrote: :A VM is contained in a single compute node. There is no way for a VM to :"span" several nodes. This is true in the case of OpenStack and VMs in general. There are/were attempts at creating "Single System Image" clusters that are kindof the inverse of conventional VMs and pool multiple physical hosts in to a single virtual system. I used OpenMOSIX ~20 years ago in this context but at that time there were quite a few limitations as to how you had to compile your binaries to actually take advantage and that wasn't possible at the time for most of my use cases. Obviously this is very old information. 
OpenMOSIX seems defunct but the proprietary version seems to still exists: https://en.wikipedia.org/wiki/OpenMosix https://en.wikipedia.org/wiki/MOSIX https://mosix.cs.huji.ac.il/index.html This is completely unrelated to and as far as I know incompatible with OpenStack, but if that is of interest to you may provide a path to further inquiry. -Jon : :On 2022/08/10 3:56 PM, ??? wrote: :> If there are many nodes, is it possible to create a high-performance VM? :> (e.g. Can 2 1 core cpu nodes create 1 2 core cpu VM) : From elod.illes at est.tech Wed Aug 10 13:58:15 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Wed, 10 Aug 2022 15:58:15 +0200 Subject: [all] Proposed Antelope cycle schedule In-Reply-To: References: Message-ID: <1449e74b-9c0b-74b3-cb7b-63a63037a453@est.tech> Hi, During last week's Release Management meeting we discussed the schedule plan and decided to propose another, which is one week shorter (with release date March 22nd), to give time for the PTG (March 27 - March 31?) before the holidays in early April of 2023. See the 2nd alternative: https://review.opendev.org/c/openstack/releases/+/852741 (see generated page for better readability [1]) (note: the milestone dates were not changed as there's no better, evenly paced dates as far as we see) Please review this as well and give us feedback which one is better. Also, we would like to ask Foundation and Technical Committee to decide between the 2 options based on the reviews. (For the 1st alternative see my below mail) [1] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_dd6/852741/1/check/openstack-tox-docs/dd6af3f/docs/antelope/schedule.html Thanks, El?d Ill?s irc: elodilles On 2022. 07. 28. 18:04, El?d Ill?s wrote: > Hi, > > As we are beyond Zed milestone 2 for more than 2 weeks now, it's time > to start planning the next, Antelope cycle and its release schedule: > > Antelope schedule: > https://review.opendev.org/c/openstack/releases/+/850753 > > (or see its generated page [1]) > > Feel free to review it and comment on the patch if there is something > that should be considered for the schedule. > > [1] > https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_699/850753/5/check/openstack-tox-docs/699fc2d/docs/antelope/schedule.html > > Thanks, > > El?d Ill?s > irc: elodilles > From senrique at redhat.com Wed Aug 10 14:20:54 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 10 Aug 2022 11:20:54 -0300 Subject: [cinder] Bug deputy report for week of 08-10-2022 Message-ID: This is a bug report from 08-03-2022 to 08-10-2022. Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting ----------------------------------------------------------------------------------------- Low - https://bugs.launchpad.net/cinder/+bug/1984169 "RBD pool stats are reported inaccurately." Fix proposed to master. - https://bugs.launchpad.net/cinder/+bug/1984000 "Infinidat Cinder driver consistency groups feature is broken." Fix proposed to master. Cheers, Sofia -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marios at redhat.com Wed Aug 10 14:32:21 2022 From: marios at redhat.com (Marios Andreou) Date: Wed, 10 Aug 2022 17:32:21 +0300 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade Message-ID: o/ tripleo please hold rechecks on master tripleo branch - we have a gate blocker for tripleo-ci-centos-9-undercloud-upgrade https://bugs.launchpad.net/tripleo/+bug/1984175 package/conflict/mirror issue not clear yet - grateful for pointers if you have them From fungi at yuggoth.org Wed Aug 10 16:04:11 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 10 Aug 2022 16:04:11 +0000 Subject: About the meaning of compute nodes In-Reply-To: <20220810135619.hrn2c7pm2c7m6orl@csail.mit.edu> References: <20220810135619.hrn2c7pm2c7m6orl@csail.mit.edu> Message-ID: <20220810160410.mbupxneb2gy77orj@yuggoth.org> On 2022-08-10 09:56:19 -0400 (-0400), Jonathan Proulx wrote: [...] > Obviously this is very old information. OpenMOSIX seems defunct but > the proprietary version seems to still exists: [...] Funny, I was going to reply with something similar, along with some of the less flexible approaches like MIPCH, DIPC and PVM. It appears that openMOSIX was succeeded by LinuxPMI and OpenSSI, but I'm having trouble tracking down where development of those happened or is still happening, if it is. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From amoralej at redhat.com Wed Aug 10 16:05:45 2022 From: amoralej at redhat.com (Alfredo Moralejo Alonso) Date: Wed, 10 Aug 2022 18:05:45 +0200 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade In-Reply-To: References: Message-ID: Hi, I'd say AFS mirror is out of sync. Standard CentOS repo: [root at 77b891bf9fd8 ~]# dnf repoquery -q NetworkManager-ovs NetworkManager-ovs-1:1.39.10-1.el9.x86_64 NetworkManager-ovs-1:1.39.12-1.el9.x86_64 NetworkManager-ovs-1:1.39.5-1.el9.x86_64 NetworkManager-ovs-1:1.39.6-1.el9.x86_64 NetworkManager-ovs-1:1.39.7-2.el9.x8 Checking AFS: # dnf repoquery --repofrompath=afs, http://mirror.regionone.vexxhost-nodepool-sf.rdoproject.org/centos-stream/9-stream/AppStream/x86_64/os/ --disablerepo="*" --enablerepo=afs -q NetworkManager-ovs NetworkManager-ovs-1:1.39.10-1.el9.x86_64 NetworkManager-ovs-1:1.39.3-1.el9.x86_64 NetworkManager-ovs-1:1.39.5-1.el9.x86_64 NetworkManager-ovs-1:1.39.6-1.el9.x86_64 NetworkManager-ovs-1:1.39.7-2.el9.x86_64 I hope that helps, Alfredo On Wed, Aug 10, 2022 at 4:42 PM Marios Andreou wrote: > o/ tripleo > > please hold rechecks on master tripleo branch - we have a gate blocker > for tripleo-ci-centos-9-undercloud-upgrade > > https://bugs.launchpad.net/tripleo/+bug/1984175 > > package/conflict/mirror issue not clear yet - grateful for pointers if > you have them > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Aug 10 16:21:04 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 10 Aug 2022 16:21:04 +0000 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade In-Reply-To: References: Message-ID: <20220810162103.yttjwgjsztuo24n7@yuggoth.org> On 2022-08-10 18:05:45 +0200 (+0200), Alfredo Moralejo Alonso wrote: > I'd say AFS mirror is out of sync. [...] > http://mirror.regionone.vexxhost-nodepool-sf.rdoproject.org/centos-stream/9-stream/AppStream/x86_64/os/ [...] To be clear, this is RDO's SoftwareFactory CentOS mirror. 
Is it backed by OpenDev's AFS tree, or are you creating that mirror independently? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed Aug 10 16:24:53 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 10 Aug 2022 16:24:53 +0000 Subject: About the meaning of compute nodes In-Reply-To: <20220810160410.mbupxneb2gy77orj@yuggoth.org> References: <20220810135619.hrn2c7pm2c7m6orl@csail.mit.edu> <20220810160410.mbupxneb2gy77orj@yuggoth.org> Message-ID: <20220810162453.bk3gdmxfslaskrnq@yuggoth.org> On 2022-08-10 16:04:11 +0000 (+0000), Jeremy Stanley wrote: > On 2022-08-10 09:56:19 -0400 (-0400), Jonathan Proulx wrote: > [...] > > Obviously this is very old information. OpenMOSIX seems defunct but > > the proprietary version seems to still exists: > [...] > > Funny, I was going to reply with something similar, along with some > of the less flexible approaches like MIPCH, DIPC and PVM. It appears > that openMOSIX was succeeded by LinuxPMI and OpenSSI, but I'm having > trouble tracking down where development of those happened or is > still happening, if it is. Oh, and as far as being incompatible with OpenStack, maybe not. You could probably still deploy it directly with Ironic or through Nova's baremetal driver, perhaps relying on Cyborg for lifecycle management of any accelerator hardware connected to the servers, and so on. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed Aug 10 16:34:01 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 10 Aug 2022 16:34:01 +0000 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade In-Reply-To: <20220810162103.yttjwgjsztuo24n7@yuggoth.org> References: <20220810162103.yttjwgjsztuo24n7@yuggoth.org> Message-ID: <20220810163401.tovpqpllht37t7pb@yuggoth.org> On 2022-08-10 16:21:04 +0000 (+0000), Jeremy Stanley wrote: > On 2022-08-10 18:05:45 +0200 (+0200), Alfredo Moralejo Alonso wrote: > > I'd say AFS mirror is out of sync. > [...] > > http://mirror.regionone.vexxhost-nodepool-sf.rdoproject.org/centos-stream/9-stream/AppStream/x86_64/os/ > [...] > > To be clear, this is RDO's SoftwareFactory CentOS mirror. Is it > backed by OpenDev's AFS tree, or are you creating that mirror > independently? Poking around in it a bit, it does seem to be. In that case, we're mirroring from rsync://mirror.facebook.net/centos-stream/9-stream/ and the timestamp at the base of that tree indicates it was last updated around 12:00 UTC today. The cronjob for that runs every 6 hours, so should refresh ~1.5 hours from now. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed Aug 10 16:46:08 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 10 Aug 2022 16:46:08 +0000 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade In-Reply-To: References: Message-ID: <20220810164608.rl5334guqc7y2dtn@yuggoth.org> On 2022-08-10 18:05:45 +0200 (+0200), Alfredo Moralejo Alonso wrote: > Standard CentOS repo: > > [root at 77b891bf9fd8 ~]# dnf repoquery -q NetworkManager-ovs > NetworkManager-ovs-1:1.39.10-1.el9.x86_64 > NetworkManager-ovs-1:1.39.12-1.el9.x86_64 > NetworkManager-ovs-1:1.39.5-1.el9.x86_64 > NetworkManager-ovs-1:1.39.6-1.el9.x86_64 > NetworkManager-ovs-1:1.39.7-2.el9.x8 > > Checking AFS: > > # dnf repoquery --repofrompath=afs, > http://mirror.regionone.vexxhost-nodepool-sf.rdoproject.org/centos-stream/9-stream/AppStream/x86_64/os/ > --disablerepo="*" --enablerepo=afs -q NetworkManager-ovs > > NetworkManager-ovs-1:1.39.10-1.el9.x86_64 > NetworkManager-ovs-1:1.39.3-1.el9.x86_64 > NetworkManager-ovs-1:1.39.5-1.el9.x86_64 > NetworkManager-ovs-1:1.39.6-1.el9.x86_64 > NetworkManager-ovs-1:1.39.7-2.el9.x86_64 [...] Looks like it's Facebook's mirror which is behind, and that's what we're pulling from... wget -qO- http://mirror.facebook.net/centos-stream/9-stream/AppStream/x86_64/os/Packages/ | grep NetworkManager-ovs | sed 's/.*href="\([^"]*\)".*/\1/' NetworkManager-ovs-1.39.3-1.el9.x86_64.rpm NetworkManager-ovs-1.39.5-1.el9.x86_64.rpm NetworkManager-ovs-1.39.6-1.el9.x86_64.rpm NetworkManager-ovs-1.39.7-2.el9.x86_64.rpm NetworkManager-ovs-1.39.10-1.el9.x86_64.rpm -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From amoralej at redhat.com Wed Aug 10 17:16:58 2022 From: amoralej at redhat.com (Alfredo Moralejo Alonso) Date: Wed, 10 Aug 2022 19:16:58 +0200 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade In-Reply-To: <20220810164608.rl5334guqc7y2dtn@yuggoth.org> References: <20220810164608.rl5334guqc7y2dtn@yuggoth.org> Message-ID: On Wed, Aug 10, 2022 at 7:00 PM Jeremy Stanley wrote: > On 2022-08-10 18:05:45 +0200 (+0200), Alfredo Moralejo Alonso wrote: > > Standard CentOS repo: > > > > [root at 77b891bf9fd8 ~]# dnf repoquery -q NetworkManager-ovs > > NetworkManager-ovs-1:1.39.10-1.el9.x86_64 > > NetworkManager-ovs-1:1.39.12-1.el9.x86_64 > > NetworkManager-ovs-1:1.39.5-1.el9.x86_64 > > NetworkManager-ovs-1:1.39.6-1.el9.x86_64 > > NetworkManager-ovs-1:1.39.7-2.el9.x8 > > > > Checking AFS: > > > > # dnf repoquery --repofrompath=afs, > > > http://mirror.regionone.vexxhost-nodepool-sf.rdoproject.org/centos-stream/9-stream/AppStream/x86_64/os/ > > --disablerepo="*" --enablerepo=afs -q NetworkManager-ovs > > > > NetworkManager-ovs-1:1.39.10-1.el9.x86_64 > > NetworkManager-ovs-1:1.39.3-1.el9.x86_64 > > NetworkManager-ovs-1:1.39.5-1.el9.x86_64 > > NetworkManager-ovs-1:1.39.6-1.el9.x86_64 > > NetworkManager-ovs-1:1.39.7-2.el9.x86_64 > [...] > > Looks like it's Facebook's mirror which is behind, and that's what > we're pulling from... 
> > wget -qO- > http://mirror.facebook.net/centos-stream/9-stream/AppStream/x86_64/os/Packages/ > | grep NetworkManager-ovs | sed 's/.*href="\([^"]*\)".*/\1/' > > NetworkManager-ovs-1.39.3-1.el9.x86_64.rpm > NetworkManager-ovs-1.39.5-1.el9.x86_64.rpm > NetworkManager-ovs-1.39.6-1.el9.x86_64.rpm > NetworkManager-ovs-1.39.7-2.el9.x86_64.rpm > NetworkManager-ovs-1.39.10-1.el9.x86_64.rpm > > Yep, that seems to be the actual issue. As you noted, the mirror I mentioned is from RDO but it's backed by opendev AFS. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amoralej at redhat.com Wed Aug 10 21:44:04 2022 From: amoralej at redhat.com (Alfredo Moralejo Alonso) Date: Wed, 10 Aug 2022 23:44:04 +0200 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade In-Reply-To: References: <20220810164608.rl5334guqc7y2dtn@yuggoth.org> Message-ID: On Wed, Aug 10, 2022 at 7:16 PM Alfredo Moralejo Alonso wrote: > > > > On Wed, Aug 10, 2022 at 7:00 PM Jeremy Stanley wrote: > >> On 2022-08-10 18:05:45 +0200 (+0200), Alfredo Moralejo Alonso wrote: >> > Standard CentOS repo: >> > >> > [root at 77b891bf9fd8 ~]# dnf repoquery -q NetworkManager-ovs >> > NetworkManager-ovs-1:1.39.10-1.el9.x86_64 >> > NetworkManager-ovs-1:1.39.12-1.el9.x86_64 >> > NetworkManager-ovs-1:1.39.5-1.el9.x86_64 >> > NetworkManager-ovs-1:1.39.6-1.el9.x86_64 >> > NetworkManager-ovs-1:1.39.7-2.el9.x8 >> > >> > Checking AFS: >> > >> > # dnf repoquery --repofrompath=afs, >> > >> http://mirror.regionone.vexxhost-nodepool-sf.rdoproject.org/centos-stream/9-stream/AppStream/x86_64/os/ >> > --disablerepo="*" --enablerepo=afs -q NetworkManager-ovs >> > >> > NetworkManager-ovs-1:1.39.10-1.el9.x86_64 >> > NetworkManager-ovs-1:1.39.3-1.el9.x86_64 >> > NetworkManager-ovs-1:1.39.5-1.el9.x86_64 >> > NetworkManager-ovs-1:1.39.6-1.el9.x86_64 >> > NetworkManager-ovs-1:1.39.7-2.el9.x86_64 >> [...] >> >> Looks like it's Facebook's mirror which is behind, and that's what >> we're pulling from... >> >> wget -qO- >> http://mirror.facebook.net/centos-stream/9-stream/AppStream/x86_64/os/Packages/ >> | grep NetworkManager-ovs | sed 's/.*href="\([^"]*\)".*/\1/' >> >> NetworkManager-ovs-1.39.3-1.el9.x86_64.rpm >> NetworkManager-ovs-1.39.5-1.el9.x86_64.rpm >> NetworkManager-ovs-1.39.6-1.el9.x86_64.rpm >> NetworkManager-ovs-1.39.7-2.el9.x86_64.rpm >> NetworkManager-ovs-1.39.10-1.el9.x86_64.rpm >> >> > Yep, that seems to be the actual issue. > Sent https://review.opendev.org/c/opendev/system-config/+/852793 to switch to rackspace mirror > As you noted, the mirror I mentioned is from RDO but it's backed by > opendev AFS. > > > >> -- >> Jeremy Stanley >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Aug 10 21:46:46 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 11 Aug 2022 03:16:46 +0530 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Aug 11 at 1500 UTC In-Reply-To: <1827e583106.105f647cf518957.4828420962111334675@ghanshyammann.com> References: <1827e583106.105f647cf518957.4828420962111334675@ghanshyammann.com> Message-ID: <18289b9698d.b5b05123855359.7312959443034906149@ghanshyammann.com> Hello Everyone, Below is the agenda for tomorrow's TC meeting schedule at 1500 UTC. 
https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * Roll call * Follow up on past action items * Gate health check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary * 2023.1 cycle PTG Planning ** Encourage projects to schedule 'operator hours' as a separate slot in PTG(avoiding conflicts among other projects 'operator hours') * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open -gmann ---- On Mon, 08 Aug 2022 22:14:45 +0530 Ghanshyam Mann wrote --- > Hello Everyone, > > The technical Committee's next weekly meeting is scheduled for 2022 Aug 11, at 1500 UTC. > > If you would like to add topics for discussion, please add them to the below wiki page by > Wednesday, Aug 10 at 2100 UTC. > > https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting > > -gmann > > > > From lokendrarathour at gmail.com Thu Aug 11 03:45:26 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Thu, 11 Aug 2022 09:15:26 +0530 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed In-Reply-To: References: Message-ID: Hi Thanks, for the inputs, we could see the miss, now we have added the required miss : "TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml" Now with this setting if we deploy the setup in wallaby, we are getting error as: PLAY [External deployment step 1] ********************************************** 2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | TASK | External deployment step 1 2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | OK | External deployment step 1 | undercloud -> localhost | result={ "changed": false, "msg": "Use --start-at-task 'External deployment step 1' to resume from this task" } [WARNING]: ('undercloud -> localhost', '525400d4-7124-4a42-664c-0000000000a8') missing from stats 2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s 2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | INCLUDED | /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml | undercloud 2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | TASK | Set some tripleo-ansible facts 2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | OK | Set some tripleo-ansible facts | undercloud 2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | 0.03s 2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | TASK | Container image prepare 2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | FATAL | Container image prepare | *undercloud | error={"changed": false, "error": "None: Max retries exceeded with url: /v2/ (Caused by None)", "msg": "Error running container image prepare: None: Max retries exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}* 2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | TIMING | tripleo_container_image_prepare : Container image prepare | undercloud | 0:06:13.385607 | 72.12s This gets failed at step 1, As this is wallaby and based on the document (Use an external Ceph cluster with the Overcloud ? TripleO 3.0.0 documentation (openstack.org) ) we should only pass this external-ceph.yaml for the external ceph intergration. But it is not happening. Few things to note: 1. 
Container Prepare: (undercloud) [stack at undercloud ~]$ cat containers-prepare-parameter.yaml # Generated with the following on 2022-06-28T18:56:38.642315 # # openstack tripleo container image prepare default --local-push-destination --output-env-file /home/stack/containers-prepare-parameter.yaml # parameter_defaults: ContainerImagePrepare: - push_destination: true set: name_prefix: openstack- name_suffix: '' namespace: myserver.com:5000/tripleowallaby neutron_driver: ovn rhel_containers: false tag: current-tripleo tag_from_label: rdo_version (undercloud) [stack at undercloud ~]$ 2. this is SSL based deployment. Any idea for the error, the issue is seen only once we have the external ceph integration enabled. Best Regards, Lokendra On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano wrote: > Hi, > ceph is supposed to be configured by this tripleo-ansible role [1], which > is triggered by tht on external_deploy_steps [2]. > In theory adding [3] should just work, assuming you customize the ceph > cluster mon ip addresses, fsid and a few other related variables. > From your previous email I suspect in your external-ceph.yaml you missed > the TripleO resource OS::TripleO::Services::CephExternal: > ../deployment/cephadm/ceph-client.yaml > (see [3]). > > Thanks, > Francesco > > > [1] > https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client > [2] > https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cephadm/ceph-client.yaml#L93 > [3] > https://github.com/openstack/tripleo-heat-templates/blob/master/environments/external-ceph.yaml > > On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour > wrote: > >> Hi Team, >> I was trying to integrate External Ceph with Triple0 Wallaby, and at the >> end of deployment in step4 getting the below error: >> >> 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 >> 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | >> Create containers from >> /var/lib/tripleo-config/container-startup-config/step_4 >> 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 >> 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | >> /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | >> overcloud-controller-2 >> 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 >> 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | >> Create containers managed by Podman for >> /var/lib/tripleo-config/container-startup-config/step_4 >> 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 >> 18:37:24.530812 | | WARNING | >> ERROR: Can't run container nova_libvirt_init_secret >> stderr: >> 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 >> 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | >> Create containers managed by Podman for >> /var/lib/tripleo-config/container-startup-config/step_4 | >> overcloud-novacompute-0 | error={"changed": false, "msg": "Failed >> containers: nova_libvirt_init_secret"} >> 2022-08-03 18:37:44,282 p=507732 u >> >> >> *external-ceph.conf:* >> >> parameter_defaults: >> # Enable use of RBD backend in nova-compute >> NovaEnableRbdBackend: True >> # Enable use of RBD backend in cinder-volume >> CinderEnableRbdBackend: True >> # Backend to use for cinder-backup >> CinderBackupBackend: ceph >> # Backend to use for glance >> GlanceBackend: rbd >> # Name of the Ceph pool hosting Nova ephemeral images >> NovaRbdPoolName: vms >> # Name of the Ceph pool hosting Cinder volumes >> CinderRbdPoolName: 
volumes >> # Name of the Ceph pool hosting Cinder backups >> CinderBackupRbdPoolName: backups >> # Name of the Ceph pool hosting Glance images >> GlanceRbdPoolName: images >> # Name of the user to authenticate with the external Ceph cluster >> CephClientUserName: admin >> # The cluster FSID >> CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' >> # The CephX user auth key >> CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' >> # The list of Ceph monitors >> CephExternalMonHost: >> 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' >> ~ >> >> >> Have tried checking and validating the ceph client details and they seem >> to be correct, further digging the container log I could see something like >> this : >> >> [root at overcloud-novacompute-0 containers]# tail -f >> nova_libvirt_init_secret.log >> tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such >> file or directory >> tail: no files remaining >> [root at overcloud-novacompute-0 containers]# tail -f >> stdouts/nova_libvirt_init_secret.log >> 2022-08-04T11:48:47.689898197+05:30 stdout F >> ------------------------------------------------ >> 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh secrets >> for: ceph:admin >> 2022-08-04T11:48:47.690590594+05:30 stdout F Error: /etc/ceph/ceph.conf >> was not found >> 2022-08-04T11:48:47.690625088+05:30 stdout F Path to >> nova_libvirt_init_secret was ceph:admin >> 2022-08-04T16:20:29.643785538+05:30 stdout F >> ------------------------------------------------ >> 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh secrets >> for: ceph:admin >> 2022-08-04T16:20:29.644785532+05:30 stdout F Error: /etc/ceph/ceph.conf >> was not found >> 2022-08-04T16:20:29.644785532+05:30 stdout F Path to >> nova_libvirt_init_secret was ceph:admin >> ^C >> [root at overcloud-novacompute-0 containers]# tail -f >> stdouts/nova_compute_init_log.log >> >> -- >> ~ Lokendra >> skype: lokendrarathour >> >> >> > > -- > Francesco Pantano > GPG KEY: F41BD75C > -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marios at redhat.com Thu Aug 11 05:03:15 2022 From: marios at redhat.com (Marios Andreou) Date: Thu, 11 Aug 2022 08:03:15 +0300 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade In-Reply-To: References: <20220810164608.rl5334guqc7y2dtn@yuggoth.org> Message-ID: thanks very much for your help Alfredo and Jeremy Seems the facebook mirror didn't refresh/sync yet - at least the last run in the build history is from ~1 UTC this morning and it is still hitting the same issue https://0eee8cd31d7046899357-239a29fa7add7ba7ba8e1040fbce5f75.ssl.cf2.rackcdn.com/850594/2/check/tripleo-ci-centos-9-undercloud-upgrade/f6a7920/logs/undercloud/home/zuul/undercloud_upgrade.log On Thu, Aug 11, 2022 at 1:12 AM Alfredo Moralejo Alonso wrote: > > > > On Wed, Aug 10, 2022 at 7:16 PM Alfredo Moralejo Alonso wrote: >> >> >> >> >> On Wed, Aug 10, 2022 at 7:00 PM Jeremy Stanley wrote: >>> >>> On 2022-08-10 18:05:45 +0200 (+0200), Alfredo Moralejo Alonso wrote: >>> > Standard CentOS repo: >>> > >>> > [root at 77b891bf9fd8 ~]# dnf repoquery -q NetworkManager-ovs >>> > NetworkManager-ovs-1:1.39.10-1.el9.x86_64 >>> > NetworkManager-ovs-1:1.39.12-1.el9.x86_64 >>> > NetworkManager-ovs-1:1.39.5-1.el9.x86_64 >>> > NetworkManager-ovs-1:1.39.6-1.el9.x86_64 >>> > NetworkManager-ovs-1:1.39.7-2.el9.x8 >>> > >>> > Checking AFS: >>> > >>> > # dnf repoquery --repofrompath=afs, >>> > http://mirror.regionone.vexxhost-nodepool-sf.rdoproject.org/centos-stream/9-stream/AppStream/x86_64/os/ >>> > --disablerepo="*" --enablerepo=afs -q NetworkManager-ovs >>> > >>> > NetworkManager-ovs-1:1.39.10-1.el9.x86_64 >>> > NetworkManager-ovs-1:1.39.3-1.el9.x86_64 >>> > NetworkManager-ovs-1:1.39.5-1.el9.x86_64 >>> > NetworkManager-ovs-1:1.39.6-1.el9.x86_64 >>> > NetworkManager-ovs-1:1.39.7-2.el9.x86_64 >>> [...] >>> >>> Looks like it's Facebook's mirror which is behind, and that's what >>> we're pulling from... >>> >>> wget -qO- http://mirror.facebook.net/centos-stream/9-stream/AppStream/x86_64/os/Packages/ | grep NetworkManager-ovs | sed 's/.*href="\([^"]*\)".*/\1/' >>> >>> NetworkManager-ovs-1.39.3-1.el9.x86_64.rpm >>> NetworkManager-ovs-1.39.5-1.el9.x86_64.rpm >>> NetworkManager-ovs-1.39.6-1.el9.x86_64.rpm >>> NetworkManager-ovs-1.39.7-2.el9.x86_64.rpm >>> NetworkManager-ovs-1.39.10-1.el9.x86_64.rpm >>> >> >> Yep, that seems to be the actual issue. > > > Sent https://review.opendev.org/c/opendev/system-config/+/852793 to switch to rackspace mirror > >> >> As you noted, the mirror I mentioned is from RDO but it's backed by opendev AFS. >> >> >>> >>> -- >>> Jeremy Stanley From yves.guimard at gmail.com Thu Aug 11 06:48:49 2022 From: yves.guimard at gmail.com (Yves Gd) Date: Thu, 11 Aug 2022 08:48:49 +0200 Subject: [kolla] Best way to restart controllers Message-ID: Hi all, We are looking for the best way to process system upgrades on our controller nodes and especially the best way to restart them (we have 3 controllers nodes with routers ha). If we just restart them one by one (waiting long time between each), we have already seen rabbitmq cluster errors (needing a complete restart of rabbitmq). With a kolla stop, a restart and a kolla deploy, some docker containers are restarted on others nodes. The administration guides describes how to add or remove a node but not this simple case. How do you proceed securely? Thanks for all, Yves -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thierry at openstack.org Thu Aug 11 09:09:58 2022 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 11 Aug 2022 11:09:58 +0200 Subject: [all] Proposed Antelope cycle schedule In-Reply-To: <1449e74b-9c0b-74b3-cb7b-63a63037a453@est.tech> References: <1449e74b-9c0b-74b3-cb7b-63a63037a453@est.tech> Message-ID: <7f3c8429-b6b7-8c42-da2e-f933d3b993b4@openstack.org> El?d Ill?s wrote: > [...] > Please review this as well and give us feedback which one is better. > Also, we would like to ask Foundation and Technical Committee to decide > between the 2 options based on the reviews. From a Foundation marketing perspective both solutions can work. The only difference is where a PTG could happen, given religious holidays in the early weeks of April. If we pick the 24-week option[1], we keep the option to hold a PTG the week of March 27. If we pick the 25-week option[2], if we wanted to do a PTG within the first weeks of the cycle, our only option would be to hold the PTG on the same week as release week. So... small preference for the 24-week option as people are generally more available to participate in release announcements if PTG week is not happening at the same time, so it gives us more flexibility. That said, the TC should ultimately pick between the two options, as there may be other factors playing in. [1] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_dd6/852741/1/check/openstack-tox-docs/dd6af3f/docs/antelope/schedule.html [2] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_699/850753/5/check/openstack-tox-docs/699fc2d/docs/antelope/schedule.html -- Thierry Carrez (ttx) From kkchn.in at gmail.com Thu Aug 11 10:19:00 2022 From: kkchn.in at gmail.com (KK CHN) Date: Thu, 11 Aug 2022 15:49:00 +0530 Subject: Metering, billing software components for Openstack Message-ID: List, We are running our datacenter using the Ussuri version. Planning to upgrade to higher versions soon. 1. What are the metering, billing and metric solutions for ussuri and other latest openstack versions ? 2. The ceilometer and gnocchi is the way forward or ? People are using any latest tools for the best results pls share your thoughts. Any thoughts most welcome. Krish -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrunge at matthias-runge.de Thu Aug 11 12:29:15 2022 From: mrunge at matthias-runge.de (Matthias Runge) Date: Thu, 11 Aug 2022 14:29:15 +0200 Subject: Metering, billing software components for Openstack In-Reply-To: References: Message-ID: <3d132405-106f-4353-eb32-4786b5cb435b@matthias-runge.de> On 11/08/2022 12:19, KK CHN wrote: > List, > > We are running our datacenter using the Ussuri version.? Planning to > upgrade to higher versions soon. > > 1. What are? the metering, billing and metric solutions for ussuri and > other latest openstack versions ? > > 2. The ceilometer and gnocchi? is the way forward or? ?? People are > using any latest tools for the best results pls share your thoughts. > There is the official OpenStack project Cloudkitty for billing and chargeback. It uses the Gnocchi API. 
Matthias From mark at stackhpc.com Thu Aug 11 12:44:25 2022 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 11 Aug 2022 13:44:25 +0100 Subject: Metering, billing software components for Openstack In-Reply-To: <3d132405-106f-4353-eb32-4786b5cb435b@matthias-runge.de> References: <3d132405-106f-4353-eb32-4786b5cb435b@matthias-runge.de> Message-ID: On Thu, 11 Aug 2022, 13:42 Matthias Runge, wrote: > On 11/08/2022 12:19, KK CHN wrote: > > List, > > > > We are running our datacenter using the Ussuri version. Planning to > > upgrade to higher versions soon. > > > > 1. What are the metering, billing and metric solutions for ussuri and > > other latest openstack versions ? > > > > 2. The ceilometer and gnocchi is the way forward or ? People are > > using any latest tools for the best results pls share your thoughts. > > > > There is the official OpenStack project Cloudkitty for billing and > chargeback. It uses the Gnocchi API. > It can also use Prometheus as a data source, if you aren't running ceilometer & gnocchi. > > Matthias > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Thu Aug 11 14:13:52 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 11 Aug 2022 14:13:52 +0000 Subject: [all][tc] August 2022 OpenInfra Board Sync In-Reply-To: <20220630142207.rwtyc3apyhd2gyjv@yuggoth.org> References: <20220630142207.rwtyc3apyhd2gyjv@yuggoth.org> Message-ID: <20220811141352.3rk3rx2ea3ic2226@yuggoth.org> The discussion was held yesterday (2022-08-10) at 20:00 UTC and ran for roughly 65 minutes. Many thanks to Julia for hosting that conference call! You can find the rough notes here: https://etherpad.opendev.org/p/r.685e1bd99e2f305c70148c08a54236e0 Takeaways are that the TC will explicitly invite board members to continuation of this or related topics during their PTG sessions once the schedule is finalized, and we'll also arrange another hour-long call like yesterday's to occur on Wednesday, November 16 at 20:00 UTC (meeting hold to follow once I work out how to craft an ICS attachment for it). Thanks to everyone from the board, TC, and broader community who participated! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From pierre at stackhpc.com Thu Aug 11 14:20:32 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 11 Aug 2022 16:20:32 +0200 Subject: About the meaning of compute nodes In-Reply-To: References: Message-ID: On Wed, 10 Aug 2022 at 15:52, Bernd Bausch wrote: > A VM is contained in a single compute node. There is no way for a VM to > "span" several nodes. > > On 2022/08/10 3:56 PM, ??? wrote: > > If there are many nodes, is it possible to create a high-performance VM? > > (e.g. Can 2 1 core cpu nodes create 1 2 core cpu VM) > A brief Google search found this research project called GiantVM which is implementing a distributed hypervisor [1] [2]. But of course in the context of OpenStack or production systems in general, your answer is completely valid. [1] https://giantvm.github.io [2] https://dl.acm.org/doi/pdf/10.1145/3505251 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From johfulto at redhat.com Thu Aug 11 16:29:25 2022 From: johfulto at redhat.com (John Fulton) Date: Thu, 11 Aug 2022 12:29:25 -0400 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed In-Reply-To: References: Message-ID: The ceph container should no longer be needed for external ceph configuration (since the move from ceph-ansible to cephadm) but if removing the ceph env files makes the error go away, then try adding it back and then following these steps to prepare the ceph container on your undercloud before deploying. https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html#container-options On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour wrote: > Hi Thanks, > for the inputs, we could see the miss, > now we have added the required miss : > "TripleO resource > OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml" > > Now with this setting if we deploy the setup in wallaby, we are > getting error as: > > > PLAY [External deployment step 1] > ********************************************** > 2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | > TASK | External deployment step 1 > 2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | > OK | External deployment step 1 | undercloud -> localhost | result={ > "changed": false, > "msg": "Use --start-at-task 'External deployment step 1' to resume > from this task" > } > [WARNING]: ('undercloud -> localhost', > '525400d4-7124-4a42-664c-0000000000a8') > missing from stats > 2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | > TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s > 2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | > INCLUDED | > /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml > | undercloud > 2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | > TASK | Set some tripleo-ansible facts > 2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | > OK | Set some tripleo-ansible facts | undercloud > 2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | > TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | > 0.03s > 2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | > TASK | Container image prepare > 2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | > FATAL | Container image prepare | *undercloud | error={"changed": false, > "error": "None: Max retries exceeded with url: /v2/ (Caused by None)", > "msg": "Error running container image prepare: None: Max retries exceeded > with url: /v2/ (Caused by None)", "params": {}, "success": false}* > 2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | > TIMING | tripleo_container_image_prepare : Container image prepare | > undercloud | 0:06:13.385607 | 72.12s > > This gets failed at step 1, As this is wallaby and based on the document (Use > an external Ceph cluster with the Overcloud ? TripleO 3.0.0 documentation > (openstack.org) > ) > we should only pass this external-ceph.yaml for the external ceph > intergration. > But it is not happening. > > > Few things to note: > 1. 
Container Prepare: > > (undercloud) [stack at undercloud ~]$ cat containers-prepare-parameter.yaml > # Generated with the following on 2022-06-28T18:56:38.642315 > # > # openstack tripleo container image prepare default > --local-push-destination --output-env-file > /home/stack/containers-prepare-parameter.yaml > # > > > parameter_defaults: > ContainerImagePrepare: > - push_destination: true > set: > name_prefix: openstack- > name_suffix: '' > namespace: myserver.com:5000/tripleowallaby > neutron_driver: ovn > rhel_containers: false > tag: current-tripleo > tag_from_label: rdo_version > (undercloud) [stack at undercloud ~]$ > > 2. this is SSL based deployment. > > Any idea for the error, the issue is seen only once we have the external > ceph integration enabled. > > Best Regards, > Lokendra > > > > > On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano > wrote: > >> Hi, >> ceph is supposed to be configured by this tripleo-ansible role [1], which >> is triggered by tht on external_deploy_steps [2]. >> In theory adding [3] should just work, assuming you customize the ceph >> cluster mon ip addresses, fsid and a few other related variables. >> From your previous email I suspect in your external-ceph.yaml you missed >> the TripleO resource OS::TripleO::Services::CephExternal: >> ../deployment/cephadm/ceph-client.yaml >> (see [3]). >> >> Thanks, >> Francesco >> >> >> [1] >> https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client >> [2] >> https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cephadm/ceph-client.yaml#L93 >> [3] >> https://github.com/openstack/tripleo-heat-templates/blob/master/environments/external-ceph.yaml >> >> On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour < >> lokendrarathour at gmail.com> wrote: >> >>> Hi Team, >>> I was trying to integrate External Ceph with Triple0 Wallaby, and at the >>> end of deployment in step4 getting the below error: >>> >>> 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 >>> 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | >>> Create containers from >>> /var/lib/tripleo-config/container-startup-config/step_4 >>> 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 >>> 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | >>> /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | >>> overcloud-controller-2 >>> 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 >>> 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | >>> Create containers managed by Podman for >>> /var/lib/tripleo-config/container-startup-config/step_4 >>> 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 >>> 18:37:24.530812 | | WARNING | >>> ERROR: Can't run container nova_libvirt_init_secret >>> stderr: >>> 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 >>> 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | >>> Create containers managed by Podman for >>> /var/lib/tripleo-config/container-startup-config/step_4 | >>> overcloud-novacompute-0 | error={"changed": false, "msg": "Failed >>> containers: nova_libvirt_init_secret"} >>> 2022-08-03 18:37:44,282 p=507732 u >>> >>> >>> *external-ceph.conf:* >>> >>> parameter_defaults: >>> # Enable use of RBD backend in nova-compute >>> NovaEnableRbdBackend: True >>> # Enable use of RBD backend in cinder-volume >>> CinderEnableRbdBackend: True >>> # Backend to use for cinder-backup >>> CinderBackupBackend: ceph >>> # Backend to use for glance >>> 
GlanceBackend: rbd >>> # Name of the Ceph pool hosting Nova ephemeral images >>> NovaRbdPoolName: vms >>> # Name of the Ceph pool hosting Cinder volumes >>> CinderRbdPoolName: volumes >>> # Name of the Ceph pool hosting Cinder backups >>> CinderBackupRbdPoolName: backups >>> # Name of the Ceph pool hosting Glance images >>> GlanceRbdPoolName: images >>> # Name of the user to authenticate with the external Ceph cluster >>> CephClientUserName: admin >>> # The cluster FSID >>> CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' >>> # The CephX user auth key >>> CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' >>> # The list of Ceph monitors >>> CephExternalMonHost: >>> 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' >>> ~ >>> >>> >>> Have tried checking and validating the ceph client details and they seem >>> to be correct, further digging the container log I could see something like >>> this : >>> >>> [root at overcloud-novacompute-0 containers]# tail -f >>> nova_libvirt_init_secret.log >>> tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such >>> file or directory >>> tail: no files remaining >>> [root at overcloud-novacompute-0 containers]# tail -f >>> stdouts/nova_libvirt_init_secret.log >>> 2022-08-04T11:48:47.689898197+05:30 stdout F >>> ------------------------------------------------ >>> 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh secrets >>> for: ceph:admin >>> 2022-08-04T11:48:47.690590594+05:30 stdout F Error: /etc/ceph/ceph.conf >>> was not found >>> 2022-08-04T11:48:47.690625088+05:30 stdout F Path to >>> nova_libvirt_init_secret was ceph:admin >>> 2022-08-04T16:20:29.643785538+05:30 stdout F >>> ------------------------------------------------ >>> 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh secrets >>> for: ceph:admin >>> 2022-08-04T16:20:29.644785532+05:30 stdout F Error: /etc/ceph/ceph.conf >>> was not found >>> 2022-08-04T16:20:29.644785532+05:30 stdout F Path to >>> nova_libvirt_init_secret was ceph:admin >>> ^C >>> [root at overcloud-novacompute-0 containers]# tail -f >>> stdouts/nova_compute_init_log.log >>> >>> -- >>> ~ Lokendra >>> skype: lokendrarathour >>> >>> >>> >> >> -- >> Francesco Pantano >> GPG KEY: F41BD75C >> > > > -- > ~ Lokendra > skype: lokendrarathour > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Thu Aug 11 17:33:17 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 11 Aug 2022 10:33:17 -0700 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade In-Reply-To: References: <20220810164608.rl5334guqc7y2dtn@yuggoth.org> Message-ID: On Wed, Aug 10, 2022, at 10:16 AM, Alfredo Moralejo Alonso wrote: > Yep, that seems to be the actual issue. > > As you noted, the mirror I mentioned is from RDO but it's backed by opendev AFS. > Note, we can and do change how our mirrors are configured, and we do not treat this as an externally stable interface. A good example of this is when we changed the pypi bandersnatch mirror in AFS to a caching proxy. I'm not aware of any plans that would break this currently, and don't expect it to change anytime soon. But be aware that if we need to change it, we can and will. Another example is how we recently dropped source packages from our mirrors. Our jobs don't rely on them and if they do they can fetch them from upstream. 
Clark From sandeepggn93 at gmail.com Fri Aug 12 05:45:43 2022 From: sandeepggn93 at gmail.com (Sandeep Yadav) Date: Fri, 12 Aug 2022 11:15:43 +0530 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade In-Reply-To: References: <20220810164608.rl5334guqc7y2dtn@yuggoth.org> Message-ID: Hello All, As per [1], Issue on Centos stream 9 build infra got solved last night. The Facebook mirror is in sync now[2] for the earlier affected package. The upgrade job is back to green.[3] [1] https://lists.centos.org/pipermail/centos-devel/2022-August/120525.html [2] ~~~ $ wget -qO- http://mirror.facebook.net/centos-stream/9-stream/AppStream/x86_64/os/Packages/ | grep NetworkManager-ovs | sed 's/.*href="\([^"]*\)".*/\1/' NetworkManager-ovs-1.39.5-1.el9.x86_64.rpm NetworkManager-ovs-1.39.6-1.el9.x86_64.rpm NetworkManager-ovs-1.39.7-2.el9.x86_64.rpm NetworkManager-ovs-1.39.10-1.el9.x86_64.rpm NetworkManager-ovs-1.39.12-1.el9.x86_64.rpm ~~~ [3] https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-undercloud-upgrade Thank You Sandeep On Thu, Aug 11, 2022 at 11:24 PM Clark Boylan wrote: > On Wed, Aug 10, 2022, at 10:16 AM, Alfredo Moralejo Alonso wrote: > > Yep, that seems to be the actual issue. > > > > As you noted, the mirror I mentioned is from RDO but it's backed by > opendev AFS. > > > > Note, we can and do change how our mirrors are configured, and we do not > treat this as an externally stable interface. A good example of this is > when we changed the pypi bandersnatch mirror in AFS to a caching proxy. I'm > not aware of any plans that would break this currently, and don't expect it > to change anytime soon. But be aware that if we need to change it, we can > and will. > > Another example is how we recently dropped source packages from our > mirrors. Our jobs don't rely on them and if they do they can fetch them > from upstream. > > Clark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Aug 12 11:27:51 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 12 Aug 2022 11:27:51 +0000 Subject: [tripleo] gate blocker tripleo-ci-centos-9-undercloud-upgrade In-Reply-To: References: <20220810164608.rl5334guqc7y2dtn@yuggoth.org> Message-ID: <20220812112750.acktycci7dup2kjv@yuggoth.org> On 2022-08-12 11:15:43 +0530 (+0530), Sandeep Yadav wrote: > As per [1], Issue on Centos stream 9 build infra got solved last > night. > > The Facebook mirror is in sync now[2] for the earlier affected > package. The upgrade job is back to green.[3] [...] Unfortunately, that's about the same time Ian logged that we completed the work to shift synchronization off Facebook's mirror and back to Rackspace's again. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From elod.illes at est.tech Fri Aug 12 14:26:51 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Fri, 12 Aug 2022 16:26:51 +0200 Subject: [release] Release countdown for week R-7, Aug 15 - 19 Message-ID: <853650ff-f749-4692-59d4-08305730b9fa@est.tech> Development Focus ----------------- We are entering the last weeks of the Zed development cycle. From now until the final release, we'll send a countdown email like this every week. It's probably a good time for teams to take stock of their library and client work that needs to be completed yet. 
The non-client library freeze is coming up, followed closely by the client lib freeze. Please plan accordingly to avoid any last minute rushes to get key functionality in. General Information ------------------- Next week is the Extra-ATC freeze, in preparation for elections. All contributions to OpenStack are valuable, but some are not expressed as Gerrit code changes. Please list active contributors to your project team who do not have a code contribution this cycle, and therefore won't automatically be considered an Active Technical Contributor and allowed to vote. This is done by adding extra-atcs to https://opendev.org/openstack/governance/src/branch/master/reference/projects.yaml before the Extra-ATC freeze on August 18th, 2022. A quick reminder of the upcoming freeze dates. Those vary depending on deliverable type: * General libraries (except client libraries) need to have their last feature release before Non-client library freeze (August 25th, 2022). Their stable branches are cut early. * Client libraries (think python-*client libraries) need to have their last feature release before Client library freeze (September 1st, 2022) * Deliverables following a cycle-with-rc model (that would be most services) observe a Feature freeze on that same date, September 1st, 2022. Any feature addition beyond that date should be discussed on the mailing-list and get PTL approval. After feature freeze, cycle-with-rc deliverables need to produce a first release candidate (and a stable branch) before RC1 deadline (September 15th, 2022) * Deliverables following cycle-with-intermediary model can release as necessary, but in all cases before Final RC deadline (September 29th, 2022) Finally, now is also a good time to start planning what highlights you want for your deliverables in the cycle highlights. The deadline to submit an initial version for those is set to Feature freeze (September 1st, 2022). Background on cycle-highlights: http://lists.openstack.org/pipermail/openstack-dev/2017-December/125613.html Project Team Guide, Cycle-Highlights: https://docs.openstack.org/project-team-guide/release-management.html#cycle-highlights knelson [at] openstack.org/diablo_rojo on IRC is available if you need help selecting or writing your highlights Upcoming Deadlines & Dates -------------------------- Extra-ATC freeze: August 18th, 2022 (R-7 week) Non-client library freeze: August 25th, 2022 (R-6 week) Client library freeze: September 1st, 2022 (R-5 week) Zed-3 milestone: September 1st, 2022 (R-5 week) Next PTG: October 17-21, 2022 (virtual PTG!!!) El?d Ill?s irc: elodilles From jimmy at openinfra.dev Fri Aug 12 14:28:59 2022 From: jimmy at openinfra.dev (Jimmy McArthur) Date: Fri, 12 Aug 2022 09:28:59 -0500 Subject: OpenStack User Survey Message-ID: Hi Everyone - Just a quick reminder to please take the OpenStack User Survey [1]? This vitally important survey provides direct, anonymized feedback to our Project Teams to help them build better software. If you run a deployment, no matter the size, please take 30 min out of your day to take this survey! Thank you, Jimmy [1] https://openstack.org/user-survey/ From the.wade.albright at gmail.com Fri Aug 12 15:13:49 2022 From: the.wade.albright at gmail.com (Wade Albright) Date: Fri, 12 Aug 2022 08:13:49 -0700 Subject: [ironic][xena] problems updating redfish_password for existing node In-Reply-To: References: Message-ID: So I seem to have run into a new issue after upgrading to the newer versions to fix the password change issue. 
Now I am randomly getting errors like the below. Once I hit this error for a given node, no operations work on the node. I thought maybe it was an issue with the node itself, but it doesn't seem like it. The BMC seems to be working fine. After a conductor restart, things start working again. Has anyone seen something like this? Log example: 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils [req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 - - - - -] Node ef5a2502-680b-4933-a0ee-6737e57ce1c5 failed deploy step {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions. ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback (most recent call last): 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 697, in _update_chunk_length 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils self.chunk_left = int(line, 16) 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils ValueError: invalid literal for int() with base 16: b'' 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During handling of the above exception, another exception occurred: 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback (most recent call last): 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 438, in _error_catcher 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils yield 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 764, in read_chunked 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils self._update_chunk_length() 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 701, in _update_chunk_length 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise InvalidChunkLength(self, line) 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read) 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During handling of the above exception, another exception occurred: 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback (most recent call last): 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 760, in generate 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for chunk in self.raw.stream(chunk_size, decode_content=True): 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 572, in stream 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for line in self.read_chunked(amt, decode_content=decode_content): 2022-08-12 07:45:33.227 1563371 ERROR 
ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 793, in read_chunked 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils self._original_response.close() 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__ 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils self.gen.throw(type, value, traceback) 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 455, in _error_catcher 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise ProtocolError("Connection broken: %r" % e, e) 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes r ead)) On Wed, Jul 20, 2022 at 2:04 PM Wade Albright wrote: > I forgot to mention, that using session auth solved the problem after > upgrading to the newer versions that include the two mentioned patches. > > On Wed, Jul 20, 2022 at 7:36 AM Wade Albright > wrote: > >> Switching to session auth solved the problem, and it seems like the >> better way to go anyway for equipment that supports it. Thanks again for >> all your help! >> >> Wade >> >> On Tue, Jul 19, 2022 at 5:37 PM Julia Kreger >> wrote: >> >>> Just to provide a brief update for the mailing list. It looks like >>> this is a case of use of Basic Auth with the BMC, where we were not >>> catching the error properly... and thus not reporting the >>> authentication failure to ironic so it would catch, and initiate a new >>> client with the most up to date password. The default, typically used >>> path is Session based authentication as BMCs generally handle internal >>> session/user login tracking in a far better fashion. But not every BMC >>> supports sessions. >>> >>> Fix in review[0] :) >>> >>> -Julia >>> [0] https://review.opendev.org/c/openstack/sushy/+/850425 >>> >>> On Mon, Jul 18, 2022 at 4:15 PM Julia Kreger >>> wrote: >>> > >>> > Excellent, hopefully I'll be able to figure out why Sushy is not doing >>> > the needful... Or if it is and Ironic is not picking up on it. >>> > >>> > Anyway, I've posted >>> > https://review.opendev.org/c/openstack/ironic/+/850259 which might >>> > handle this issue. Obviously a work in progress, but it represents >>> > what I think is happening inside of ironic itself leading into sushy >>> > when cache access occurs. >>> > >>> > On Mon, Jul 18, 2022 at 4:04 PM Wade Albright >>> > wrote: >>> > > >>> > > Sounds good, I will do that tomorrow. Thanks Julia. >>> > > >>> > > On Mon, Jul 18, 2022 at 3:27 PM Julia Kreger < >>> juliaashleykreger at gmail.com> wrote: >>> > >> >>> > >> Debug would be best. I think I have an idea what is going on, and >>> this >>> > >> is a similar variation. If you want, you can email them directly to >>> > >> me. Specifically only need entries reported by the sushy library and >>> > >> ironic.drivers.modules.redfish.utils. >>> > >> >>> > >> On Mon, Jul 18, 2022 at 3:20 PM Wade Albright >>> > >> wrote: >>> > >> > >>> > >> > I'm happy to supply some logs, what verbosity level should i use? >>> And should I just embed the logs in email to the list or upload somewhere? >>> > >> > >>> > >> > On Mon, Jul 18, 2022 at 3:14 PM Julia Kreger < >>> juliaashleykreger at gmail.com> wrote: >>> > >> >> >>> > >> >> If you could supply some conductor logs, that would be helpful. 
>>> It >>> > >> >> should be re-authenticating, but obviously we have a larger bug >>> there >>> > >> >> we need to find the root issue behind. >>> > >> >> >>> > >> >> On Mon, Jul 18, 2022 at 3:06 PM Wade Albright >>> > >> >> wrote: >>> > >> >> > >>> > >> >> > I was able to use the patches to update the code, but >>> unfortunately the problem is still there for me. >>> > >> >> > >>> > >> >> > I also tried an RPM upgrade to the versions Julia mentioned >>> had the fixes, namely Sushy 3.12.1 - Released May 2022 and Ironic 18.2.1 - >>> Released in January 2022. But it did not fix the problem. >>> > >> >> > >>> > >> >> > I am able to consistently reproduce the error. >>> > >> >> > - step 1: change BMC password directly on the node itself >>> > >> >> > - step 2: update BMC password (redfish_password) in ironic >>> with 'openstack baremetal node set --driver-info >>> redfish_password='newpass' >>> > >> >> > >>> > >> >> > After step 1 there are errors in the logs entries like >>> "Session authentication appears to have been lost at some point in time" >>> and eventually it puts the node into maintenance mode and marks the power >>> state as "none." >>> > >> >> > After step 2 and taking the host back out of maintenance mode, >>> it goes through a similar set of log entries puts the node into MM again. >>> > >> >> > >>> > >> >> > After the above steps, a conductor restart fixes the problem >>> and operations work normally again. Given this it seems like there is still >>> some kind of caching issue. >>> > >> >> > >>> > >> >> > On Sat, Jul 16, 2022 at 6:01 PM Wade Albright < >>> the.wade.albright at gmail.com> wrote: >>> > >> >> >> >>> > >> >> >> Hi Julia, >>> > >> >> >> >>> > >> >> >> Thank you so much for the reply! Hopefully this is the issue. >>> I'll try out the patches next week and report back. I'll also email you on >>> Monday about the versions, that would be very helpful to know. >>> > >> >> >> >>> > >> >> >> Thanks again, really appreciate it. >>> > >> >> >> >>> > >> >> >> Wade >>> > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > >> >> >> On Sat, Jul 16, 2022 at 4:36 PM Julia Kreger < >>> juliaashleykreger at gmail.com> wrote: >>> > >> >> >>> >>> > >> >> >>> Greetings! >>> > >> >> >>> >>> > >> >> >>> I believe you need two patches, one in ironic and one in >>> sushy. >>> > >> >> >>> >>> > >> >> >>> Sushy: >>> > >> >> >>> https://review.opendev.org/c/openstack/sushy/+/832860 >>> > >> >> >>> >>> > >> >> >>> Ironic: >>> > >> >> >>> https://review.opendev.org/c/openstack/ironic/+/820588 >>> > >> >> >>> >>> > >> >> >>> I think it is variation, and the comment about working after >>> you restart the conductor is the big signal to me. I?m on a phone on a bad >>> data connection, if you email me on Monday I can see what versions the >>> fixes would be in. >>> > >> >> >>> >>> > >> >> >>> For the record, it is a session cache issue, the bug was >>> that the service didn?t quite know what to do when auth fails. >>> > >> >> >>> >>> > >> >> >>> -Julia >>> > >> >> >>> >>> > >> >> >>> >>> > >> >> >>> On Fri, Jul 15, 2022 at 2:55 PM Wade Albright < >>> the.wade.albright at gmail.com> wrote: >>> > >> >> >>>> >>> > >> >> >>>> Hi, >>> > >> >> >>>> >>> > >> >> >>>> I'm hitting a problem when trying to update the >>> redfish_password for an existing node. I'm curious to know if anyone else >>> has encountered this problem. I'm not sure if I'm just doing something >>> wrong or if there is a bug. Or if the problem is unique to my setup. 
>>> > >> >> >>>> >>> > >> >> >>>> I have a node already added into ironic with all the driver >>> details set, and things are working fine. I am able to run deployments. >>> > >> >> >>>> >>> > >> >> >>>> Now I need to change the redfish password on the host. So I >>> update the password for redfish access on the host, then use an 'openstack >>> baremetal node set --driver-info redfish_password=' command >>> to set the new redfish_password. >>> > >> >> >>>> >>> > >> >> >>>> Once this has been done, deployment no longer works. I see >>> redfish authentication errors in the logs and the operation fails. I waited >>> a bit to see if there might just be a delay in updating the password, but >>> after awhile it still didn't work. >>> > >> >> >>>> >>> > >> >> >>>> I restarted the conductor, and after that things work fine >>> again. So it seems like the password is cached or something. Is there a way >>> to force the password to update? I even tried removing the redfish >>> credentials and re-adding them, but that didn't work either. Only a >>> conductor restart seems to make the new password work. >>> > >> >> >>>> >>> > >> >> >>>> We are running Xena, using rpm installation on Oracle Linux >>> 8.5. >>> > >> >> >>>> >>> > >> >> >>>> Thanks in advance for any help with this issue. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Aug 12 18:00:55 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 12 Aug 2022 18:00:55 +0000 Subject: [tc] November 2022 OpenInfra Board Sync Message-ID: <20220812180054.va6w642kpuzqsr4b@yuggoth.org> The Open Infrastructure Foundation Board of Directors is endeavoring to engage in regular check-ins with official OpenInfra projects. The goal is for a loosely structured discussion one-hour in length, involving members of the board and the OpenStack TC, along with other interested community members. This is not intended to be a formal presentation, and no materials need to be prepared in advance. I've started an Etherpad where participants can brainstorm potential topics of conversation, time-permitting: https://etherpad.opendev.org/p/2022-11-board-openstack-sync At the conclusion of the August 10 discussion, we agreed to tentatively schedule the next call for 20:00 UTC on Wednesday, November 16, so I've attached a calendar file which can serve as a convenient schedule hold for this, in case anyone needs it. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: 2022-11-board-openstack-sync.ics Type: text/calendar Size: 620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From the.wade.albright at gmail.com Fri Aug 12 21:14:11 2022 From: the.wade.albright at gmail.com (Wade Albright) Date: Fri, 12 Aug 2022 14:14:11 -0700 Subject: [ironic][xena] problems updating redfish_password for existing node In-Reply-To: References: Message-ID: I'm not sure why this problem only now started showing up, but it appears to be unrelated to Ironic. I was able to reproduce it directly outside of Ironic using a simple python program using urllib to get URLs from the BMC/redfish interface. Seems to be some combination of a buggy server SSL implementation and newer openssl 1.1.1. Apparently it doesn't happen using openssl 1.0. I've found some information about possible workarounds but haven't figured it out yet. 
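For anyone who wants to poke at the same thing, a minimal standalone check along the lines below (the BMC address is a placeholder; certificate verification is disabled only because these BMCs typically use self-signed certificates) exercises the same HTTPS request path outside of Ironic. The Redfish service root normally requires no authentication, so it is a convenient target:

    import ssl
    import urllib.request

    # Placeholder BMC address; point this at the Redfish service root.
    url = "https://bmc.example.com/redfish/v1/"

    # Skip certificate verification for self-signed BMC certs (test only).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen(url, context=ctx) as resp:
        print(resp.status, len(resp.read()))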
If I do I'll update this thread just in case anyone else runs into it. On Fri, Aug 12, 2022 at 8:13 AM Wade Albright wrote: > So I seem to have run into a new issue after upgrading to the newer > versions to fix the password change issue. > > Now I am randomly getting errors like the below. Once I hit this error for > a given node, no operations work on the node. I thought maybe it was an > issue with the node itself, but it doesn't seem like it. The BMC seems to > be working fine. > > After a conductor restart, things start working again. Has anyone seen > something like this? > > Log example: > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > [req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 - - - - -] Node > ef5a2502-680b-4933-a0ee-6737e57ce1c5 failed deploy step {'step': > 'write_image', 'priority': > 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: > ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", > InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions. > ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length > b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback > (most recent call last): > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 697, in > _update_chunk_length > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > self.chunk_left = int(line, 16) > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils ValueError: > invalid literal for int() with base 16: b'' > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During > handling of the above exception, another exception occurred: > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback > (most recent call last): > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 438, in > _error_catcher > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils yield > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 764, in > read_chunked > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > self._update_chunk_length() > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 701, in > _update_chunk_length > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise > InvalidChunkLength(self, line) > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 > bytes read) > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During > handling of the above exception, another exception occurred: > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback > (most recent call last): > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/requests/models.py", line 760, in > generate > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for chunk > in 
self.raw.stream(chunk_size, decode_content=True): > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 572, in > stream > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for line > in self.read_chunked(amt, decode_content=decode_content): > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 793, in > read_chunked > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > self._original_response.close() > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__ > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > self.gen.throw(type, value, traceback) > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 455, in > _error_catcher > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise > ProtocolError("Connection broken: %r" % e, e) > 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > urllib3.exceptions.ProtocolError: ("Connection broken: > InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got > length b'', 0 bytes r > ead)) > > On Wed, Jul 20, 2022 at 2:04 PM Wade Albright > wrote: > >> I forgot to mention, that using session auth solved the problem after >> upgrading to the newer versions that include the two mentioned patches. >> >> On Wed, Jul 20, 2022 at 7:36 AM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>> Switching to session auth solved the problem, and it seems like the >>> better way to go anyway for equipment that supports it. Thanks again for >>> all your help! >>> >>> Wade >>> >>> On Tue, Jul 19, 2022 at 5:37 PM Julia Kreger < >>> juliaashleykreger at gmail.com> wrote: >>> >>>> Just to provide a brief update for the mailing list. It looks like >>>> this is a case of use of Basic Auth with the BMC, where we were not >>>> catching the error properly... and thus not reporting the >>>> authentication failure to ironic so it would catch, and initiate a new >>>> client with the most up to date password. The default, typically used >>>> path is Session based authentication as BMCs generally handle internal >>>> session/user login tracking in a far better fashion. But not every BMC >>>> supports sessions. >>>> >>>> Fix in review[0] :) >>>> >>>> -Julia >>>> [0] https://review.opendev.org/c/openstack/sushy/+/850425 >>>> >>>> On Mon, Jul 18, 2022 at 4:15 PM Julia Kreger >>>> wrote: >>>> > >>>> > Excellent, hopefully I'll be able to figure out why Sushy is not doing >>>> > the needful... Or if it is and Ironic is not picking up on it. >>>> > >>>> > Anyway, I've posted >>>> > https://review.opendev.org/c/openstack/ironic/+/850259 which might >>>> > handle this issue. Obviously a work in progress, but it represents >>>> > what I think is happening inside of ironic itself leading into sushy >>>> > when cache access occurs. >>>> > >>>> > On Mon, Jul 18, 2022 at 4:04 PM Wade Albright >>>> > wrote: >>>> > > >>>> > > Sounds good, I will do that tomorrow. Thanks Julia. >>>> > > >>>> > > On Mon, Jul 18, 2022 at 3:27 PM Julia Kreger < >>>> juliaashleykreger at gmail.com> wrote: >>>> > >> >>>> > >> Debug would be best. I think I have an idea what is going on, and >>>> this >>>> > >> is a similar variation. If you want, you can email them directly to >>>> > >> me. 
Specifically only need entries reported by the sushy library >>>> and >>>> > >> ironic.drivers.modules.redfish.utils. >>>> > >> >>>> > >> On Mon, Jul 18, 2022 at 3:20 PM Wade Albright >>>> > >> wrote: >>>> > >> > >>>> > >> > I'm happy to supply some logs, what verbosity level should i >>>> use? And should I just embed the logs in email to the list or upload >>>> somewhere? >>>> > >> > >>>> > >> > On Mon, Jul 18, 2022 at 3:14 PM Julia Kreger < >>>> juliaashleykreger at gmail.com> wrote: >>>> > >> >> >>>> > >> >> If you could supply some conductor logs, that would be helpful. >>>> It >>>> > >> >> should be re-authenticating, but obviously we have a larger bug >>>> there >>>> > >> >> we need to find the root issue behind. >>>> > >> >> >>>> > >> >> On Mon, Jul 18, 2022 at 3:06 PM Wade Albright >>>> > >> >> wrote: >>>> > >> >> > >>>> > >> >> > I was able to use the patches to update the code, but >>>> unfortunately the problem is still there for me. >>>> > >> >> > >>>> > >> >> > I also tried an RPM upgrade to the versions Julia mentioned >>>> had the fixes, namely Sushy 3.12.1 - Released May 2022 and Ironic 18.2.1 - >>>> Released in January 2022. But it did not fix the problem. >>>> > >> >> > >>>> > >> >> > I am able to consistently reproduce the error. >>>> > >> >> > - step 1: change BMC password directly on the node itself >>>> > >> >> > - step 2: update BMC password (redfish_password) in ironic >>>> with 'openstack baremetal node set --driver-info >>>> redfish_password='newpass' >>>> > >> >> > >>>> > >> >> > After step 1 there are errors in the logs entries like >>>> "Session authentication appears to have been lost at some point in time" >>>> and eventually it puts the node into maintenance mode and marks the power >>>> state as "none." >>>> > >> >> > After step 2 and taking the host back out of maintenance >>>> mode, it goes through a similar set of log entries puts the node into MM >>>> again. >>>> > >> >> > >>>> > >> >> > After the above steps, a conductor restart fixes the problem >>>> and operations work normally again. Given this it seems like there is still >>>> some kind of caching issue. >>>> > >> >> > >>>> > >> >> > On Sat, Jul 16, 2022 at 6:01 PM Wade Albright < >>>> the.wade.albright at gmail.com> wrote: >>>> > >> >> >> >>>> > >> >> >> Hi Julia, >>>> > >> >> >> >>>> > >> >> >> Thank you so much for the reply! Hopefully this is the >>>> issue. I'll try out the patches next week and report back. I'll also email >>>> you on Monday about the versions, that would be very helpful to know. >>>> > >> >> >> >>>> > >> >> >> Thanks again, really appreciate it. >>>> > >> >> >> >>>> > >> >> >> Wade >>>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> On Sat, Jul 16, 2022 at 4:36 PM Julia Kreger < >>>> juliaashleykreger at gmail.com> wrote: >>>> > >> >> >>> >>>> > >> >> >>> Greetings! >>>> > >> >> >>> >>>> > >> >> >>> I believe you need two patches, one in ironic and one in >>>> sushy. >>>> > >> >> >>> >>>> > >> >> >>> Sushy: >>>> > >> >> >>> https://review.opendev.org/c/openstack/sushy/+/832860 >>>> > >> >> >>> >>>> > >> >> >>> Ironic: >>>> > >> >> >>> https://review.opendev.org/c/openstack/ironic/+/820588 >>>> > >> >> >>> >>>> > >> >> >>> I think it is variation, and the comment about working >>>> after you restart the conductor is the big signal to me. I?m on a phone on >>>> a bad data connection, if you email me on Monday I can see what versions >>>> the fixes would be in. 
>>>> > >> >> >>> >>>> > >> >> >>> For the record, it is a session cache issue, the bug was >>>> that the service didn?t quite know what to do when auth fails. >>>> > >> >> >>> >>>> > >> >> >>> -Julia >>>> > >> >> >>> >>>> > >> >> >>> >>>> > >> >> >>> On Fri, Jul 15, 2022 at 2:55 PM Wade Albright < >>>> the.wade.albright at gmail.com> wrote: >>>> > >> >> >>>> >>>> > >> >> >>>> Hi, >>>> > >> >> >>>> >>>> > >> >> >>>> I'm hitting a problem when trying to update the >>>> redfish_password for an existing node. I'm curious to know if anyone else >>>> has encountered this problem. I'm not sure if I'm just doing something >>>> wrong or if there is a bug. Or if the problem is unique to my setup. >>>> > >> >> >>>> >>>> > >> >> >>>> I have a node already added into ironic with all the >>>> driver details set, and things are working fine. I am able to run >>>> deployments. >>>> > >> >> >>>> >>>> > >> >> >>>> Now I need to change the redfish password on the host. So >>>> I update the password for redfish access on the host, then use an >>>> 'openstack baremetal node set --driver-info >>>> redfish_password=' command to set the new redfish_password. >>>> > >> >> >>>> >>>> > >> >> >>>> Once this has been done, deployment no longer works. I see >>>> redfish authentication errors in the logs and the operation fails. I waited >>>> a bit to see if there might just be a delay in updating the password, but >>>> after awhile it still didn't work. >>>> > >> >> >>>> >>>> > >> >> >>>> I restarted the conductor, and after that things work fine >>>> again. So it seems like the password is cached or something. Is there a way >>>> to force the password to update? I even tried removing the redfish >>>> credentials and re-adding them, but that didn't work either. Only a >>>> conductor restart seems to make the new password work. >>>> > >> >> >>>> >>>> > >> >> >>>> We are running Xena, using rpm installation on Oracle >>>> Linux 8.5. >>>> > >> >> >>>> >>>> > >> >> >>>> Thanks in advance for any help with this issue. >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From fv at spots.edu Fri Aug 12 21:35:32 2022 From: fv at spots.edu (Father Vlasie) Date: Fri, 12 Aug 2022 14:35:32 -0700 Subject: [Ansible] [Yoga] Not recognising Rocky Linux as supported Message-ID: Hello everyone! I am trying to deploy Openstack Ansible with Rocky Linux 8.6 on my target hosts. The documentation here (https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/targethosts.html) says " ? Centos 8 Stream 64-bit * Derivitives: Rocky Linux? under the list of supported operation systems. Ansible does not seem to check for the existence of RHEL 8 systems. :( But when I run ?openstack-ansible setup-hosts.yml? I get the error: TASK [Check for a supported Operating System] ***************************************************************************************** fatal: [infra1]: FAILED! => { "assertion": "(ansible_facts['distribution'] == 'Debian' and ansible_facts['distribution_release'] == 'bullseye') or (ansible_facts['distribution'] == 'Ubuntu' and ansible_facts['distribution_release'] == 'focal') or (ansible_facts['distribution'] == 'Ubuntu' and ansible_facts['distribution_release'] == 'jammy') or (ansible_facts['os_family'] == 'RedHat' and ansible_facts['distribution_major_version'] == '9')", "changed": false, "evaluated_to": false, "msg": "The only supported platforms for this release are Debian 11 (Bullseye), Ubuntu 20.04 LTS (Focal), Ubuntu 22.04 (Jammy) and CentOS 9 Stream.\n? 
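For reference, that assertion is evaluated against the standard Ansible distribution facts, which can be inspected on a target host with something like the following (the host pattern is a placeholder for the inventory name):

    ansible infra1 -m setup -a 'filter=ansible_distribution*'
    ansible infra1 -m setup -a 'filter=ansible_os_family'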
What should I do? Thank you! From cboylan at sapwetik.org Fri Aug 12 21:54:22 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 12 Aug 2022 14:54:22 -0700 Subject: [Ansible] [Yoga] Not recognising Rocky Linux as supported In-Reply-To: References: Message-ID: <20b428f6-9f49-4e58-a380-b1c49dc6072e@www.fastmail.com> On Fri, Aug 12, 2022, at 2:35 PM, Father Vlasie wrote: > Hello everyone! > > I am trying to deploy Openstack Ansible with Rocky Linux 8.6 on my > target hosts. The documentation here > (https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/targethosts.html) > says " ? Centos 8 Stream 64-bit * Derivitives: Rocky Linux? under the > list of supported operation systems. > > Ansible does not seem to check for the existence of RHEL 8 systems. :( > > But when I run ?openstack-ansible setup-hosts.yml? I get the error: > > TASK [Check for a supported Operating System] > ***************************************************************************************** > fatal: [infra1]: FAILED! => { > "assertion": "(ansible_facts['distribution'] == 'Debian' and > ansible_facts['distribution_release'] == 'bullseye') or > (ansible_facts['distribution'] == 'Ubuntu' and > ansible_facts['distribution_release'] == 'focal') or > (ansible_facts['distribution'] == 'Ubuntu' and > ansible_facts['distribution_release'] == 'jammy') or > (ansible_facts['os_family'] == 'RedHat' and > ansible_facts['distribution_major_version'] == '9')", > "changed": false, > "evaluated_to": false, > "msg": "The only supported platforms for this release are Debian 11 > (Bullseye), Ubuntu 20.04 LTS (Focal), Ubuntu 22.04 (Jammy) and CentOS 9 > Stream.\n? Comparing https://opendev.org/openstack/openstack-ansible/src/branch/stable/yoga/playbooks/openstack-hosts-setup.yml#L56-L60 to https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/openstack-hosts-setup.yml#L56-L59 it appears that OpenStack Ansible has removed CentOS Stream/RHEL/Rocky 8 support on master, but the assertion allows it under Yoga. Based on the failure above you must be running master OSA? I would try running the Yoga branch instead. The master branch documentation likely needs to be updated as well. > > What should I do? > > Thank you! From arnaud.morin at gmail.com Fri Aug 12 22:03:07 2022 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Fri, 12 Aug 2022 22:03:07 +0000 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: References: <21589662.aDxSllVl8Y@p1> <1826f3a70ed.11b631ede304077.7225094803174603413@ghanshyammann.com> Message-ID: Hey all, For the record, we, at OVH, were also affected by this when deploying a new version of oslo.messaging. Now that we are aware of this, it's ok to configure this correctly in our deployment, but we would appreciate having the correct value by default in stable release (so +1 for backport). Cheers, On 09.08.22 - 10:57, Tony Breeds wrote: > Hi All, > As others have said this isn't something we do without > consideration. My feel from this thread is that the risks are > somewhat low and understood. I think we've had a discussion and it's > okay. > > I think it's okay to do this backport. We should obviously include a > release note that calls this out and hopefully there is room in the > semver to make this a minor update (as opposed to a patch) to also > "lag that this *may* not be > > Tony. 
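As a general pattern for anyone hit by a library default change, the value can be pinned explicitly in each service's configuration so behaviour no longer depends on which oslo.messaging release happens to be installed. A minimal sketch, using heartbeat_in_pthread purely as an illustration of the pattern (it may or may not be the option discussed in this thread):

    # e.g. /etc/nova/nova.conf, or any other service using oslo.messaging
    [oslo_messaging_rabbit]
    # Pin the value you have validated rather than relying on the library default.
    heartbeat_in_pthread = false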
> From the.wade.albright at gmail.com Fri Aug 12 22:10:50 2022 From: the.wade.albright at gmail.com (Wade Albright) Date: Fri, 12 Aug 2022 15:10:50 -0700 Subject: [ironic][xena] problems updating redfish_password for existing node In-Reply-To: References: Message-ID: Sorry for the spam. The openssl issue may have been a red herring. I am not able to reproduce the issue directly with my own python code. I was trying to fetch something that required authentication. After I added the correct auth info it works fine. I am not able to cause the same error as is happening in the Ironic logs. Anyway I'll do some more testing and report back. On Fri, Aug 12, 2022 at 2:14 PM Wade Albright wrote: > I'm not sure why this problem only now started showing up, but it appears > to be unrelated to Ironic. I was able to reproduce it directly outside of > Ironic using a simple python program using urllib to get URLs from the > BMC/redfish interface. Seems to be some combination of a buggy server SSL > implementation and newer openssl 1.1.1. Apparently it doesn't happen using > openssl 1.0. > > I've found some information about possible workarounds but haven't figured > it out yet. If I do I'll update this thread just in case anyone else runs > into it. > > On Fri, Aug 12, 2022 at 8:13 AM Wade Albright > wrote: > >> So I seem to have run into a new issue after upgrading to the newer >> versions to fix the password change issue. >> >> Now I am randomly getting errors like the below. Once I hit this error >> for a given node, no operations work on the node. I thought maybe it was an >> issue with the node itself, but it doesn't seem like it. The BMC seems to >> be working fine. >> >> After a conductor restart, things start working again. Has anyone seen >> something like this? >> >> Log example: >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> [req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 - - - - -] Node >> ef5a2502-680b-4933-a0ee-6737e57ce1c5 failed deploy step {'step': >> 'write_image', 'priority': >> 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: >> ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", >> InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions. 
>> ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length >> b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback >> (most recent call last): >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 697, in >> _update_chunk_length >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> self.chunk_left = int(line, 16) >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils ValueError: >> invalid literal for int() with base 16: b'' >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During >> handling of the above exception, another exception occurred: >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback >> (most recent call last): >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 438, in >> _error_catcher >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils yield >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 764, in >> read_chunked >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> self._update_chunk_length() >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 701, in >> _update_chunk_length >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise >> InvalidChunkLength(self, line) >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 >> bytes read) >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During >> handling of the above exception, another exception occurred: >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback >> (most recent call last): >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/requests/models.py", line 760, in >> generate >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for >> chunk in self.raw.stream(chunk_size, decode_content=True): >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 572, in >> stream >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for line >> in self.read_chunked(amt, decode_content=decode_content): >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 793, in >> read_chunked >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> self._original_response.close() >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__ >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> self.gen.throw(type, value, traceback) >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 455, in >> _error_catcher >> 
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise >> ProtocolError("Connection broken: %r" % e, e) >> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> urllib3.exceptions.ProtocolError: ("Connection broken: >> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >> length b'', 0 bytes r >> ead)) >> >> On Wed, Jul 20, 2022 at 2:04 PM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>> I forgot to mention, that using session auth solved the problem after >>> upgrading to the newer versions that include the two mentioned patches. >>> >>> On Wed, Jul 20, 2022 at 7:36 AM Wade Albright < >>> the.wade.albright at gmail.com> wrote: >>> >>>> Switching to session auth solved the problem, and it seems like the >>>> better way to go anyway for equipment that supports it. Thanks again for >>>> all your help! >>>> >>>> Wade >>>> >>>> On Tue, Jul 19, 2022 at 5:37 PM Julia Kreger < >>>> juliaashleykreger at gmail.com> wrote: >>>> >>>>> Just to provide a brief update for the mailing list. It looks like >>>>> this is a case of use of Basic Auth with the BMC, where we were not >>>>> catching the error properly... and thus not reporting the >>>>> authentication failure to ironic so it would catch, and initiate a new >>>>> client with the most up to date password. The default, typically used >>>>> path is Session based authentication as BMCs generally handle internal >>>>> session/user login tracking in a far better fashion. But not every BMC >>>>> supports sessions. >>>>> >>>>> Fix in review[0] :) >>>>> >>>>> -Julia >>>>> [0] https://review.opendev.org/c/openstack/sushy/+/850425 >>>>> >>>>> On Mon, Jul 18, 2022 at 4:15 PM Julia Kreger >>>>> wrote: >>>>> > >>>>> > Excellent, hopefully I'll be able to figure out why Sushy is not >>>>> doing >>>>> > the needful... Or if it is and Ironic is not picking up on it. >>>>> > >>>>> > Anyway, I've posted >>>>> > https://review.opendev.org/c/openstack/ironic/+/850259 which might >>>>> > handle this issue. Obviously a work in progress, but it represents >>>>> > what I think is happening inside of ironic itself leading into sushy >>>>> > when cache access occurs. >>>>> > >>>>> > On Mon, Jul 18, 2022 at 4:04 PM Wade Albright >>>>> > wrote: >>>>> > > >>>>> > > Sounds good, I will do that tomorrow. Thanks Julia. >>>>> > > >>>>> > > On Mon, Jul 18, 2022 at 3:27 PM Julia Kreger < >>>>> juliaashleykreger at gmail.com> wrote: >>>>> > >> >>>>> > >> Debug would be best. I think I have an idea what is going on, and >>>>> this >>>>> > >> is a similar variation. If you want, you can email them directly >>>>> to >>>>> > >> me. Specifically only need entries reported by the sushy library >>>>> and >>>>> > >> ironic.drivers.modules.redfish.utils. >>>>> > >> >>>>> > >> On Mon, Jul 18, 2022 at 3:20 PM Wade Albright >>>>> > >> wrote: >>>>> > >> > >>>>> > >> > I'm happy to supply some logs, what verbosity level should i >>>>> use? And should I just embed the logs in email to the list or upload >>>>> somewhere? >>>>> > >> > >>>>> > >> > On Mon, Jul 18, 2022 at 3:14 PM Julia Kreger < >>>>> juliaashleykreger at gmail.com> wrote: >>>>> > >> >> >>>>> > >> >> If you could supply some conductor logs, that would be >>>>> helpful. It >>>>> > >> >> should be re-authenticating, but obviously we have a larger >>>>> bug there >>>>> > >> >> we need to find the root issue behind. 
>>>>> > >> >> >>>>> > >> >> On Mon, Jul 18, 2022 at 3:06 PM Wade Albright >>>>> > >> >> wrote: >>>>> > >> >> > >>>>> > >> >> > I was able to use the patches to update the code, but >>>>> unfortunately the problem is still there for me. >>>>> > >> >> > >>>>> > >> >> > I also tried an RPM upgrade to the versions Julia mentioned >>>>> had the fixes, namely Sushy 3.12.1 - Released May 2022 and Ironic 18.2.1 - >>>>> Released in January 2022. But it did not fix the problem. >>>>> > >> >> > >>>>> > >> >> > I am able to consistently reproduce the error. >>>>> > >> >> > - step 1: change BMC password directly on the node itself >>>>> > >> >> > - step 2: update BMC password (redfish_password) in ironic >>>>> with 'openstack baremetal node set --driver-info >>>>> redfish_password='newpass' >>>>> > >> >> > >>>>> > >> >> > After step 1 there are errors in the logs entries like >>>>> "Session authentication appears to have been lost at some point in time" >>>>> and eventually it puts the node into maintenance mode and marks the power >>>>> state as "none." >>>>> > >> >> > After step 2 and taking the host back out of maintenance >>>>> mode, it goes through a similar set of log entries puts the node into MM >>>>> again. >>>>> > >> >> > >>>>> > >> >> > After the above steps, a conductor restart fixes the problem >>>>> and operations work normally again. Given this it seems like there is still >>>>> some kind of caching issue. >>>>> > >> >> > >>>>> > >> >> > On Sat, Jul 16, 2022 at 6:01 PM Wade Albright < >>>>> the.wade.albright at gmail.com> wrote: >>>>> > >> >> >> >>>>> > >> >> >> Hi Julia, >>>>> > >> >> >> >>>>> > >> >> >> Thank you so much for the reply! Hopefully this is the >>>>> issue. I'll try out the patches next week and report back. I'll also email >>>>> you on Monday about the versions, that would be very helpful to know. >>>>> > >> >> >> >>>>> > >> >> >> Thanks again, really appreciate it. >>>>> > >> >> >> >>>>> > >> >> >> Wade >>>>> > >> >> >> >>>>> > >> >> >> >>>>> > >> >> >> >>>>> > >> >> >> On Sat, Jul 16, 2022 at 4:36 PM Julia Kreger < >>>>> juliaashleykreger at gmail.com> wrote: >>>>> > >> >> >>> >>>>> > >> >> >>> Greetings! >>>>> > >> >> >>> >>>>> > >> >> >>> I believe you need two patches, one in ironic and one in >>>>> sushy. >>>>> > >> >> >>> >>>>> > >> >> >>> Sushy: >>>>> > >> >> >>> https://review.opendev.org/c/openstack/sushy/+/832860 >>>>> > >> >> >>> >>>>> > >> >> >>> Ironic: >>>>> > >> >> >>> https://review.opendev.org/c/openstack/ironic/+/820588 >>>>> > >> >> >>> >>>>> > >> >> >>> I think it is variation, and the comment about working >>>>> after you restart the conductor is the big signal to me. I?m on a phone on >>>>> a bad data connection, if you email me on Monday I can see what versions >>>>> the fixes would be in. >>>>> > >> >> >>> >>>>> > >> >> >>> For the record, it is a session cache issue, the bug was >>>>> that the service didn?t quite know what to do when auth fails. >>>>> > >> >> >>> >>>>> > >> >> >>> -Julia >>>>> > >> >> >>> >>>>> > >> >> >>> >>>>> > >> >> >>> On Fri, Jul 15, 2022 at 2:55 PM Wade Albright < >>>>> the.wade.albright at gmail.com> wrote: >>>>> > >> >> >>>> >>>>> > >> >> >>>> Hi, >>>>> > >> >> >>>> >>>>> > >> >> >>>> I'm hitting a problem when trying to update the >>>>> redfish_password for an existing node. I'm curious to know if anyone else >>>>> has encountered this problem. I'm not sure if I'm just doing something >>>>> wrong or if there is a bug. Or if the problem is unique to my setup. 
>>>>> > >> >> >>>> >>>>> > >> >> >>>> I have a node already added into ironic with all the >>>>> driver details set, and things are working fine. I am able to run >>>>> deployments. >>>>> > >> >> >>>> >>>>> > >> >> >>>> Now I need to change the redfish password on the host. So >>>>> I update the password for redfish access on the host, then use an >>>>> 'openstack baremetal node set --driver-info >>>>> redfish_password=' command to set the new redfish_password. >>>>> > >> >> >>>> >>>>> > >> >> >>>> Once this has been done, deployment no longer works. I >>>>> see redfish authentication errors in the logs and the operation fails. I >>>>> waited a bit to see if there might just be a delay in updating the >>>>> password, but after awhile it still didn't work. >>>>> > >> >> >>>> >>>>> > >> >> >>>> I restarted the conductor, and after that things work >>>>> fine again. So it seems like the password is cached or something. Is there >>>>> a way to force the password to update? I even tried removing the redfish >>>>> credentials and re-adding them, but that didn't work either. Only a >>>>> conductor restart seems to make the new password work. >>>>> > >> >> >>>> >>>>> > >> >> >>>> We are running Xena, using rpm installation on Oracle >>>>> Linux 8.5. >>>>> > >> >> >>>> >>>>> > >> >> >>>> Thanks in advance for any help with this issue. >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From fv at spots.edu Fri Aug 12 22:43:15 2022 From: fv at spots.edu (Father Vlasie) Date: Fri, 12 Aug 2022 15:43:15 -0700 Subject: [Ansible] [Yoga] Not recognising Rocky Linux as supported In-Reply-To: References: Message-ID: <56CF4D3E-D9CE-4A39-9468-218751689E2E@spots.edu> > On Aug 12, 2022, at 3:13 PM, openstack-discuss-request at lists.openstack.org wrote: > > Comparing https://opendev.org/openstack/openstack-ansible/src/branch/stable/yoga/playbooks/openstack-hosts-setup.yml#L56-L60 to https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/openstack-hosts-setup.yml#L56-L59 it appears that OpenStack Ansible has removed CentOS Stream/RHEL/Rocky 8 support on master, but the assertion allows it under Yoga. Based on the failure above you must be running master OSA? I would try running the Yoga branch instead. > > The master branch documentation likely needs to be updated as well. That worked perfectly, thank you! :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Sat Aug 13 00:11:48 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 13 Aug 2022 05:41:48 +0530 Subject: [all][tc] What's happening in Technical Committee: summary 2022 Aug 12: Reading: 5 min Message-ID: <182948aeb88.127419949981346.6003691795628138147@ghanshyammann.com> Hello Everyone, Here is this week's summary of the Technical Committee activities. 1. TC Meetings: ============ * We had this week's meeting on Aug 11. Most of the meeting discussions are summarized in this email. Meeting full logs are available @https://meetings.opendev.org/meetings/tc/2022/tc.2022-08-11-15.00.log.html * Next TC weekly meeting will be on Aug 18 Thursday at 15:00 UTC, feel free to add the topic on the agenda[1] by Aug 17. 2. What we completed this week: ========================= * None in this week. 3. Activities In progress: ================== TC Tracker for Zed cycle ------------------------------ * Zed tracker etherpad includes the TC working items[3], Two are completed and others items are in-progress. 
Open Reviews ----------------- * Two open reviews for ongoing activities[4].

OpenStack TC + Board member meeting ------------------------------------------------ As you might know, the OpenStack TC and Board members had an informal meeting on Aug 10[5][6]; I am summarizing the discussion here. The OpenStack TC presented the 'OpenStack Updates' at the Board meeting in Berlin. One of the key parts of the updates was 'OpenStack community Challenges', and we could not finish the discussion or brainstorming on these in the Berlin meeting. We decided to continue the discussion on those (Slide#21)[7]. In this call, we could only cover the first challenge, 'Less Interaction with Operators & Users'. There is no doubt that we are facing this issue at a noticeable level (RBAC is a good example where we lacked operator/user feedback). Many things, including the ML and events, were discussed around why it is happening and how to improve it. Having the ops meetup as a separate event is something many members raised concerns about, and it would be good to merge these events into other community events like the Summit, PTG, etc. As a first step towards that, we can invite and welcome operators to the October virtual PTG and give them a feel for how merging the events with the developer events can benefit both groups. On that, Dan suggested the idea of scheduling 'Operator Hours' per project at the PTG, where we will keep the discussion operator-centric and in the format they want. We continued the discussion on this in the TC weekly meeting, where we had mixed suggestions on the 'Operator Hours' schedule: either ask each project to schedule it in their room and have operators join, or the other way around. Before deciding on any format, we will be asking operators about their preference. We will continue the discussion on the 'OpenStack community Challenges' in the next call, which is scheduled for 20:00 UTC on Wednesday, November 16[8]. If anyone is interested, feel free to join it.

2023.1 cycle Technical Election (TC + PTL) planning ------------------------------------------------------------- Not much update on this, but we will see the election schedule proposed soon.

2023.1 cycle TC PTG planning ------------------------------------ We discussed how best to encourage and invite operators to the PTG (as written in the section above), and we will proceed based on the operators' feedback. We will continue the discussion in the next meeting as well.

2021 User Survey TC Question Analysis ----------------------------------------------- No update on this. The survey summary is up for review[9]. Feel free to check it and provide feedback.

Zed cycle Leaderless projects ---------------------------------- Dale Smith volunteered to be PTL for the Adjutant project [10]

Fixing Zuul config error ---------------------------- Requesting projects with Zuul config errors to look into and fix them, which should not take much time[11][12].

Project updates ------------------- * Retire openstack-helm-addons[13]

4. How to contact the TC: ==================== If you would like to discuss or give feedback to the TC, you can reach out to us in multiple ways: 1. Email: you can send an email with the tag [tc] on the openstack-discuss ML[14]. 2. Weekly meeting: the Technical Committee conducts a weekly meeting every Thursday at 15:00 UTC [15] 3. Ping us using the 'tc-members' nickname on the #openstack-tc IRC channel.
[1] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting [2] https://review.opendev.org/c/openstack/governance/+/849997 [3] https://etherpad.opendev.org/p/tc-zed-tracker [4] https://review.opendev.org/q/projects:openstack/governance+status:open [5] https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029886.html [6] https://etherpad.opendev.org/p/2022-08-board-openstack-sync [7] https://docs.google.com/presentation/d/1yCiZy_9A6hURXD0BRdr33OlK7Ugch5EgSvFzL-jvDIg/edit#slide=id.g12fc62c62e2_0_114 [8] https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029968.html [9] https://review.opendev.org/c/openstack/governance/+/836888 [10] https://review.opendev.org/c/openstack/governance/+/849606 [11] https://etherpad.opendev.org/p/zuul-config-error-openstack [12] http://lists.openstack.org/pipermail/openstack-discuss/2022-May/028603.html [13] https://review.opendev.org/c/openstack/governance/+/849997 [14] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss [15] http://eavesdrop.openstack.org/#Technical_Committee_Meeting -gmann From juliaashleykreger at gmail.com Sat Aug 13 04:05:52 2022 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Fri, 12 Aug 2022 21:05:52 -0700 Subject: [ironic][xena] problems updating redfish_password for existing node In-Reply-To: References: Message-ID: Two questions: 1) do you see open sockets to the BMCs in netstat output? 2) is your code using ?connection: close?? Or are you using sushy? Honestly, this seems *really* weird with current sushy versions, and is kind of reminiscent of a cached session which is using kept alive sockets. If you could grep out req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 to see what the prior couple of conductor actions were, that would give us better context as to what is going on. -Julia On Fri, Aug 12, 2022 at 3:11 PM Wade Albright wrote: > Sorry for the spam. The openssl issue may have been a red herring. I am > not able to reproduce the issue directly with my own python code. I was > trying to fetch something that required authentication. After I added the > correct auth info it works fine. I am not able to cause the same error as > is happening in the Ironic logs. > > Anyway I'll do some more testing and report back. > > On Fri, Aug 12, 2022 at 2:14 PM Wade Albright > wrote: > >> I'm not sure why this problem only now started showing up, but it appears >> to be unrelated to Ironic. I was able to reproduce it directly outside of >> Ironic using a simple python program using urllib to get URLs from the >> BMC/redfish interface. Seems to be some combination of a buggy server SSL >> implementation and newer openssl 1.1.1. Apparently it doesn't happen using >> openssl 1.0. >> >> I've found some information about possible workarounds but haven't >> figured it out yet. If I do I'll update this thread just in case anyone >> else runs into it. >> >> On Fri, Aug 12, 2022 at 8:13 AM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>> So I seem to have run into a new issue after upgrading to the newer >>> versions to fix the password change issue. >>> >>> Now I am randomly getting errors like the below. Once I hit this error >>> for a given node, no operations work on the node. I thought maybe it was an >>> issue with the node itself, but it doesn't seem like it. The BMC seems to >>> be working fine. >>> >>> After a conductor restart, things start working again. Has anyone seen >>> something like this? 
>>> >>> Log example: >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils [- - - - -] >>> Node ef5a2502-680b-4933-a0ee-6737e57ce1c5 failed deploy step {'step': >>> 'write_image', 'priority': >>> 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: >>> ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", >>> InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions. >>> ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length >>> b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback >>> (most recent call last): >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 697, in >>> _update_chunk_length >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>> self.chunk_left = int(line, 16) >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils ValueError: >>> invalid literal for int() with base 16: b'' >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During >>> handling of the above exception, another exception occurred: >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback >>> (most recent call last): >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 438, in >>> _error_catcher >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils yield >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 764, in >>> read_chunked >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>> self._update_chunk_length() >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 701, in >>> _update_chunk_length >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise >>> InvalidChunkLength(self, line) >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>> urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 >>> bytes read) >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During >>> handling of the above exception, another exception occurred: >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback >>> (most recent call last): >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>> "/usr/local/lib/python3.6/site-packages/requests/models.py", line 760, in >>> generate >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for >>> chunk in self.raw.stream(chunk_size, decode_content=True): >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 572, in >>> stream >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for >>> line in self.read_chunked(amt, decode_content=decode_content): >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 793, in >>> read_chunked >>> 2022-08-12 07:45:33.227 
1563371 ERROR ironic.conductor.utils >>> self._original_response.close() >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>> "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__ >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>> self.gen.throw(type, value, traceback) >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 455, in >>> _error_catcher >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise >>> ProtocolError("Connection broken: %r" % e, e) >>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>> urllib3.exceptions.ProtocolError: ("Connection broken: >>> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >>> length b'', 0 bytes r >>> ead)) >>> >>> On Wed, Jul 20, 2022 at 2:04 PM Wade Albright < >>> the.wade.albright at gmail.com> wrote: >>> >>>> I forgot to mention, that using session auth solved the problem after >>>> upgrading to the newer versions that include the two mentioned patches. >>>> >>>> On Wed, Jul 20, 2022 at 7:36 AM Wade Albright < >>>> the.wade.albright at gmail.com> wrote: >>>> >>>>> Switching to session auth solved the problem, and it seems like the >>>>> better way to go anyway for equipment that supports it. Thanks again for >>>>> all your help! >>>>> >>>>> Wade >>>>> >>>>> On Tue, Jul 19, 2022 at 5:37 PM Julia Kreger < >>>>> juliaashleykreger at gmail.com> wrote: >>>>> >>>>>> Just to provide a brief update for the mailing list. It looks like >>>>>> this is a case of use of Basic Auth with the BMC, where we were not >>>>>> catching the error properly... and thus not reporting the >>>>>> authentication failure to ironic so it would catch, and initiate a new >>>>>> client with the most up to date password. The default, typically used >>>>>> path is Session based authentication as BMCs generally handle internal >>>>>> session/user login tracking in a far better fashion. But not every BMC >>>>>> supports sessions. >>>>>> >>>>>> Fix in review[0] :) >>>>>> >>>>>> -Julia >>>>>> [0] https://review.opendev.org/c/openstack/sushy/+/850425 >>>>>> >>>>>> On Mon, Jul 18, 2022 at 4:15 PM Julia Kreger >>>>>> wrote: >>>>>> > >>>>>> > Excellent, hopefully I'll be able to figure out why Sushy is not >>>>>> doing >>>>>> > the needful... Or if it is and Ironic is not picking up on it. >>>>>> > >>>>>> > Anyway, I've posted >>>>>> > https://review.opendev.org/c/openstack/ironic/+/850259 which might >>>>>> > handle this issue. Obviously a work in progress, but it represents >>>>>> > what I think is happening inside of ironic itself leading into sushy >>>>>> > when cache access occurs. >>>>>> > >>>>>> > On Mon, Jul 18, 2022 at 4:04 PM Wade Albright >>>>>> > wrote: >>>>>> > > >>>>>> > > Sounds good, I will do that tomorrow. Thanks Julia. >>>>>> > > >>>>>> > > On Mon, Jul 18, 2022 at 3:27 PM Julia Kreger < >>>>>> juliaashleykreger at gmail.com> wrote: >>>>>> > >> >>>>>> > >> Debug would be best. I think I have an idea what is going on, >>>>>> and this >>>>>> > >> is a similar variation. If you want, you can email them directly >>>>>> to >>>>>> > >> me. Specifically only need entries reported by the sushy library >>>>>> and >>>>>> > >> ironic.drivers.modules.redfish.utils. >>>>>> > >> >>>>>> > >> On Mon, Jul 18, 2022 at 3:20 PM Wade Albright >>>>>> > >> wrote: >>>>>> > >> > >>>>>> > >> > I'm happy to supply some logs, what verbosity level should i >>>>>> use? 
And should I just embed the logs in email to the list or upload >>>>>> somewhere? >>>>>> > >> > >>>>>> > >> > On Mon, Jul 18, 2022 at 3:14 PM Julia Kreger < >>>>>> juliaashleykreger at gmail.com> wrote: >>>>>> > >> >> >>>>>> > >> >> If you could supply some conductor logs, that would be >>>>>> helpful. It >>>>>> > >> >> should be re-authenticating, but obviously we have a larger >>>>>> bug there >>>>>> > >> >> we need to find the root issue behind. >>>>>> > >> >> >>>>>> > >> >> On Mon, Jul 18, 2022 at 3:06 PM Wade Albright >>>>>> > >> >> wrote: >>>>>> > >> >> > >>>>>> > >> >> > I was able to use the patches to update the code, but >>>>>> unfortunately the problem is still there for me. >>>>>> > >> >> > >>>>>> > >> >> > I also tried an RPM upgrade to the versions Julia mentioned >>>>>> had the fixes, namely Sushy 3.12.1 - Released May 2022 and Ironic 18.2.1 - >>>>>> Released in January 2022. But it did not fix the problem. >>>>>> > >> >> > >>>>>> > >> >> > I am able to consistently reproduce the error. >>>>>> > >> >> > - step 1: change BMC password directly on the node itself >>>>>> > >> >> > - step 2: update BMC password (redfish_password) in ironic >>>>>> with 'openstack baremetal node set --driver-info >>>>>> redfish_password='newpass' >>>>>> > >> >> > >>>>>> > >> >> > After step 1 there are errors in the logs entries like >>>>>> "Session authentication appears to have been lost at some point in time" >>>>>> and eventually it puts the node into maintenance mode and marks the power >>>>>> state as "none." >>>>>> > >> >> > After step 2 and taking the host back out of maintenance >>>>>> mode, it goes through a similar set of log entries puts the node into MM >>>>>> again. >>>>>> > >> >> > >>>>>> > >> >> > After the above steps, a conductor restart fixes the >>>>>> problem and operations work normally again. Given this it seems like there >>>>>> is still some kind of caching issue. >>>>>> > >> >> > >>>>>> > >> >> > On Sat, Jul 16, 2022 at 6:01 PM Wade Albright < >>>>>> the.wade.albright at gmail.com> wrote: >>>>>> > >> >> >> >>>>>> > >> >> >> Hi Julia, >>>>>> > >> >> >> >>>>>> > >> >> >> Thank you so much for the reply! Hopefully this is the >>>>>> issue. I'll try out the patches next week and report back. I'll also email >>>>>> you on Monday about the versions, that would be very helpful to know. >>>>>> > >> >> >> >>>>>> > >> >> >> Thanks again, really appreciate it. >>>>>> > >> >> >> >>>>>> > >> >> >> Wade >>>>>> > >> >> >> >>>>>> > >> >> >> >>>>>> > >> >> >> >>>>>> > >> >> >> On Sat, Jul 16, 2022 at 4:36 PM Julia Kreger < >>>>>> juliaashleykreger at gmail.com> wrote: >>>>>> > >> >> >>> >>>>>> > >> >> >>> Greetings! >>>>>> > >> >> >>> >>>>>> > >> >> >>> I believe you need two patches, one in ironic and one in >>>>>> sushy. >>>>>> > >> >> >>> >>>>>> > >> >> >>> Sushy: >>>>>> > >> >> >>> https://review.opendev.org/c/openstack/sushy/+/832860 >>>>>> > >> >> >>> >>>>>> > >> >> >>> Ironic: >>>>>> > >> >> >>> https://review.opendev.org/c/openstack/ironic/+/820588 >>>>>> > >> >> >>> >>>>>> > >> >> >>> I think it is variation, and the comment about working >>>>>> after you restart the conductor is the big signal to me. I?m on a phone on >>>>>> a bad data connection, if you email me on Monday I can see what versions >>>>>> the fixes would be in. >>>>>> > >> >> >>> >>>>>> > >> >> >>> For the record, it is a session cache issue, the bug was >>>>>> that the service didn?t quite know what to do when auth fails. 
>>>>>> > >> >> >>> >>>>>> > >> >> >>> -Julia >>>>>> > >> >> >>> >>>>>> > >> >> >>> >>>>>> > >> >> >>> On Fri, Jul 15, 2022 at 2:55 PM Wade Albright < >>>>>> the.wade.albright at gmail.com> wrote: >>>>>> > >> >> >>>> >>>>>> > >> >> >>>> Hi, >>>>>> > >> >> >>>> >>>>>> > >> >> >>>> I'm hitting a problem when trying to update the >>>>>> redfish_password for an existing node. I'm curious to know if anyone else >>>>>> has encountered this problem. I'm not sure if I'm just doing something >>>>>> wrong or if there is a bug. Or if the problem is unique to my setup. >>>>>> > >> >> >>>> >>>>>> > >> >> >>>> I have a node already added into ironic with all the >>>>>> driver details set, and things are working fine. I am able to run >>>>>> deployments. >>>>>> > >> >> >>>> >>>>>> > >> >> >>>> Now I need to change the redfish password on the host. >>>>>> So I update the password for redfish access on the host, then use an >>>>>> 'openstack baremetal node set --driver-info >>>>>> redfish_password=' command to set the new redfish_password. >>>>>> > >> >> >>>> >>>>>> > >> >> >>>> Once this has been done, deployment no longer works. I >>>>>> see redfish authentication errors in the logs and the operation fails. I >>>>>> waited a bit to see if there might just be a delay in updating the >>>>>> password, but after awhile it still didn't work. >>>>>> > >> >> >>>> >>>>>> > >> >> >>>> I restarted the conductor, and after that things work >>>>>> fine again. So it seems like the password is cached or something. Is there >>>>>> a way to force the password to update? I even tried removing the redfish >>>>>> credentials and re-adding them, but that didn't work either. Only a >>>>>> conductor restart seems to make the new password work. >>>>>> > >> >> >>>> >>>>>> > >> >> >>>> We are running Xena, using rpm installation on Oracle >>>>>> Linux 8.5. >>>>>> > >> >> >>>> >>>>>> > >> >> >>>> Thanks in advance for any help with this issue. >>>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Sat Aug 13 05:53:03 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Sat, 13 Aug 2022 07:53:03 +0200 Subject: [Ansible] [Yoga] Not recognising Rocky Linux as supported In-Reply-To: <20b428f6-9f49-4e58-a380-b1c49dc6072e@www.fastmail.com> References: <20b428f6-9f49-4e58-a380-b1c49dc6072e@www.fastmail.com> Message-ID: Yes, we have dropped Rocky 8 support as well as CentOS 8 Stream for master. Neil Hanlon was working on adding Rocky 9 support instead and we hopefully will backport it to Yoga as well. Documentation should be fixed indeed. ??, 13 ???. 2022 ?., 00:01 Clark Boylan : > On Fri, Aug 12, 2022, at 2:35 PM, Father Vlasie wrote: > > Hello everyone! > > > > I am trying to deploy Openstack Ansible with Rocky Linux 8.6 on my > > target hosts. The documentation here > > ( > https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/targethosts.html) > > > says " ? Centos 8 Stream 64-bit * Derivitives: Rocky Linux? under > the > > list of supported operation systems. > > > > Ansible does not seem to check for the existence of RHEL 8 systems. :( > > > > But when I run ?openstack-ansible setup-hosts.yml? I get the error: > > > > TASK [Check for a supported Operating System] > > > ***************************************************************************************** > > fatal: [infra1]: FAILED! 
=> { > > "assertion": "(ansible_facts['distribution'] == 'Debian' and > > ansible_facts['distribution_release'] == 'bullseye') or > > (ansible_facts['distribution'] == 'Ubuntu' and > > ansible_facts['distribution_release'] == 'focal') or > > (ansible_facts['distribution'] == 'Ubuntu' and > > ansible_facts['distribution_release'] == 'jammy') or > > (ansible_facts['os_family'] == 'RedHat' and > > ansible_facts['distribution_major_version'] == '9')", > > "changed": false, > > "evaluated_to": false, > > "msg": "The only supported platforms for this release are Debian 11 > > (Bullseye), Ubuntu 20.04 LTS (Focal), Ubuntu 22.04 (Jammy) and CentOS 9 > > Stream.\n? > > Comparing > https://opendev.org/openstack/openstack-ansible/src/branch/stable/yoga/playbooks/openstack-hosts-setup.yml#L56-L60 > to > https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/openstack-hosts-setup.yml#L56-L59 > it appears that OpenStack Ansible has removed CentOS Stream/RHEL/Rocky 8 > support on master, but the assertion allows it under Yoga. Based on the > failure above you must be running master OSA? I would try running the Yoga > branch instead. > > The master branch documentation likely needs to be updated as well. > > > > > What should I do? > > > > Thank you! > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fv at spots.edu Mon Aug 15 02:27:04 2022 From: fv at spots.edu (Father Vlasie) Date: Sun, 14 Aug 2022 19:27:04 -0700 Subject: [Ansible] [Yoga] Set rsyslog not to containerise Message-ID: Hello everyone! I am trying to deploy Openstack Ansible with Rocky Linux 8.6. I have a rsyslog host that is somewhat underpowered (RAM and CPU) and I thought it might help to run rsyslog without containerisation. I have searched but I have not been able to find an example of this.My guess is that I need to create a file in /etc/openstack_deploy/env.d What syntax and commands do I need? I have searched but I could not find any examples... Thank you very much! Father Vlasie From janders at redhat.com Mon Aug 15 04:02:09 2022 From: janders at redhat.com (Jacob Anders) Date: Mon, 15 Aug 2022 14:02:09 +1000 Subject: [ironic] Proposing Jacob Anders to sushy-core In-Reply-To: References: Message-ID: Hi Julia and Iury, (apologies for delayed response, I've been on leave) Thank you for your nomination and the encouragement to be more active in the reviews space - I will try! Cheers, Jacob On Tue, Aug 2, 2022 at 4:39 AM Julia Kreger wrote: > Greetings! > > So, from the knowledge/contribution of code standpoint, I agree. > However, I don't see much in the way of reviews in stackalytics[0]. > Generally I would prefer to encourage this before granting core > privileges. It is not a requirement to be a core to engage in code > review. > > -Julia > > > [0]: https://www.stackalytics.io/?user_id=janders%40redhat.com > > On Mon, Aug 1, 2022 at 10:13 AM Iury Gregory > wrote: > > > > Hello ironic-cores and sushy-cores, > > > > I would like to propose Jacob Anders (janders irc) for sushy-core. > > He made great contributions to improve sushy to cover corner cases from > different HW in the last releases, you can find some of his contributions > in [1], please vote with +1/-1. 
> > > > [1] > https://review.opendev.org/q/owner:janders%2540redhat.com+project:openstack/sushy+status:merged > > > > -- > > Att[]'s > > Iury Gregory Melo Ferreira > > MSc in Computer Science at UFCG > > Ironic PTL > > Senior Software Engineer at Red Hat Brazil > > Social: https://www.linkedin.com/in/iurygregory > > E-mail: iurygregory at gmail.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Mon Aug 15 09:35:07 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Mon, 15 Aug 2022 11:35:07 +0200 Subject: [Ansible] [Yoga] Set rsyslog not to containerise In-Reply-To: References: Message-ID: Hey, Well, first of all in Yoga we don't send any logs with rsyslog. This is deprecated since Train I guess, and we haven't managed to fully remove roles and reference in docs. All logs are stored in systemd-journald. Logs from containers are available on metal hosts. You still can forward journald to rsyslog, but maybe you want better central logging solution overall? As example, in ops repo we do have community-driven role for elk deployment https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/elk_metrics_7x Another way can be to use journald-remote, and we have playbook and role for that as well: https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/infra-journal-remote.yml But tool itself has several nasty bugs and not maintained well upstream. Regarding your original question, you can see example of how to deploy some service on metal in our docs: https://docs.openstack.org/openstack-ansible/latest/reference/inventory/configure-inventory.html#deploying-directly-on-hosts ??, 15 ???. 2022 ?., 04:34 Father Vlasie : > Hello everyone! > > I am trying to deploy Openstack Ansible with Rocky Linux 8.6. > > I have a rsyslog host that is somewhat underpowered (RAM and CPU) and I > thought it might help to run rsyslog without containerisation. > > I have searched but I have not been able to find an example of this.My > guess is that I need to create a file in /etc/openstack_deploy/env.d > > What syntax and commands do I need? > > I have searched but I could not find any examples... > > Thank you very much! > > Father Vlasie > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsalali.server at gmail.com Mon Aug 15 06:33:31 2022 From: fsalali.server at gmail.com (Faezeh Salali) Date: Mon, 15 Aug 2022 11:03:31 +0430 Subject: docker.repo file Message-ID: Hi on Kolla ansible victoria version in bootstrap command this error is displayed: failed to fetch key at https://download.docker.com/linux/rocky/gpg , error was: HTTP Error 404: Not Found my compute os is rocky Linux 8 and it seems there is some problem with the docker repo on compute and /etc/yum.repos.d/docker.repo directory. Please send me the link of the baseurl and gpgkey. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tuandatk25a at gmail.com Mon Aug 15 09:59:27 2022 From: tuandatk25a at gmail.com (Vu Tuan Dat) Date: Mon, 15 Aug 2022 16:59:27 +0700 Subject: [Octavia] Monitor through Listener Prometheus Protocol Message-ID: HI, I installed Prometheus on controller nodes. However, My prometheus instance cannot connect to Listener Prometheus endpoint because it's binded to VIP only in network namespace. Anyone can help? Thanks in advanced, Dat -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From johnsomor at gmail.com Mon Aug 15 15:07:37 2022 From: johnsomor at gmail.com (Michael Johnson) Date: Mon, 15 Aug 2022 08:07:37 -0700 Subject: [Octavia] Monitor through Listener Prometheus Protocol In-Reply-To: References: Message-ID: You are correct that the Prometheus endpoint on Octavia Amphora load balancers is on the VIP network. The Prometheus endpoint is implemented as a listener on the load balancer. You will need to have a route from your Prometheus instances to the VIP network of the load balancer. Michael On Mon, Aug 15, 2022 at 6:23 AM Vu Tuan Dat wrote: > > HI, > > I installed Prometheus on controller nodes. However, My prometheus instance cannot connect to Listener Prometheus endpoint because it's binded to VIP only in network namespace. > > Anyone can help? > > Thanks in advanced, > Dat From cboylan at sapwetik.org Mon Aug 15 19:01:41 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Mon, 15 Aug 2022 12:01:41 -0700 Subject: docker.repo file In-Reply-To: References: Message-ID: <2b808d03-759a-48dc-a03b-cbf3570d11fe@www.fastmail.com> On Sun, Aug 14, 2022, at 11:33 PM, Faezeh Salali wrote: > Hi > on Kolla ansible victoria version in bootstrap command this error is > displayed: > failed to fetch key at https://download.docker.com/linux/rocky/gpg , > error was: HTTP Error 404: Not Found > my compute os is rocky Linux 8 and it seems there is some problem with > the docker repo on compute and /etc/yum.repos.d/docker.repo directory. > Please send me the link of the baseurl and gpgkey. > Thank you. There is no Rocky Linux dir at https://download.docker.com/linux/. Instead I suspect that you will want to use either the CentOS or the RHEL packages from that location. https://opendev.org/openstack/kolla-ansible/src/branch/master/doc/source/reference/deployment-and-bootstrapping/bootstrap-servers.rst#bootstrap-servers-docker-package-repos also indicates that if you don't set enable_docker_repo then Kolla will use the distro provided packages. That would install docker using the packages shipped by Rocky. Clark From fv at spots.edu Mon Aug 15 20:05:54 2022 From: fv at spots.edu (Father Vlasie) Date: Mon, 15 Aug 2022 13:05:54 -0700 Subject: [Ansible] [Yoga] Infra LXC Cache Failure Message-ID: <40C5C9DF-3BD4-43A8-A334-34BC0AA03A65@spots.edu> Hello everyone! I am trying to deploy Openstack Ansible with Rocky Linux 8.6. I have 3 infra nodes all running the same software/hardware specifications but the 3rd one keeps giving me the following error. This occurs when running setup-hosts.yml ---------- TASK [lxc_hosts : Ensure that the LXC cache has been prepared] ******************************************************************************* FAILED - RETRYING: [infra3]: Ensure that the LXC cache has been prepared (120 retries left). FAILED - RETRYING: [infra3]: Ensure that the LXC cache has been prepared (119 retries left). fatal: [infra3]: FAILED! => {"ansible_job_id": "205392349782.93369", "attempts": 3, "changed": true, "cmd": "chroot /var/lib/machines/rocky-8-amd64 /opt/cache-prep-commands.sh > /var/log/lxc-cache-prep-commands.log 2>&1", "delta": "0:00:16.353232", "end": "2022-08-15 12:59:42.058169", "finished": 1, "msg": "non-zero return code", "rc": 1, "results_file": "/root/.ansible_async/205392349782.93369", "start": "2022-08-15 12:59:25.704937", "started": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} ---------- Thank you very much for any help! 
Father Vlasie From fv at spots.edu Mon Aug 15 21:03:53 2022 From: fv at spots.edu (Father Vlasie) Date: Mon, 15 Aug 2022 14:03:53 -0700 Subject: [Ansible] [Yoga] Infra LXC Cache Failure In-Reply-To: <40C5C9DF-3BD4-43A8-A334-34BC0AA03A65@spots.edu> References: <40C5C9DF-3BD4-43A8-A334-34BC0AA03A65@spots.edu> Message-ID: Oh no! I forgot to restart after disabling selinux? Problem solved! > On Aug 15, 2022, at 1:05 PM, Father Vlasie wrote: > > Hello everyone! > > I am trying to deploy Openstack Ansible with Rocky Linux 8.6. > > I have 3 infra nodes all running the same software/hardware specifications but the 3rd one keeps giving me the following error. > > This occurs when running setup-hosts.yml > > ---------- > > TASK [lxc_hosts : Ensure that the LXC cache has been prepared] ******************************************************************************* > FAILED - RETRYING: [infra3]: Ensure that the LXC cache has been prepared (120 retries left). > FAILED - RETRYING: [infra3]: Ensure that the LXC cache has been prepared (119 retries left). > fatal: [infra3]: FAILED! => {"ansible_job_id": "205392349782.93369", "attempts": 3, "changed": true, "cmd": "chroot /var/lib/machines/rocky-8-amd64 /opt/cache-prep-commands.sh > /var/log/lxc-cache-prep-commands.log 2>&1", "delta": "0:00:16.353232", "end": "2022-08-15 12:59:42.058169", "finished": 1, "msg": "non-zero return code", "rc": 1, "results_file": "/root/.ansible_async/205392349782.93369", "start": "2022-08-15 12:59:25.704937", "started": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} > > ---------- > > > Thank you very much for any help! > > Father Vlasie From bcafarel at redhat.com Mon Aug 15 21:49:00 2022 From: bcafarel at redhat.com (Bernard Cafarelli) Date: Mon, 15 Aug 2022 23:49:00 +0200 Subject: [neutron] Bug deputy report (week starting on Aug-8-2022) Message-ID: Hello neutrinos, it is time again to start a new bug deputy rotation: https://wiki.openstack.org/wiki/Network/Meetings#Bug_deputy and my turn for last week's bugs A relatively quiet week (it is the holiday season in quite a few parts of the world!) with fixes already merged for most bugs. 
Two bugs to highlight though (and in need of some neutron eyes): a manila job failing with IPv6 VMs (1940324) and a live-migration issue (1986003) Critical * neutron-tempest-plugin failing on networking-bagpipe tests - https://bugs.launchpad.net/neutron/+bug/1986534 networking-bagpipe also needed update after neutron-lib constants rehoming, fix merged: https://review.opendev.org/c/openstack/networking-bagpipe/+/852867 High * [CI][Devstack] Failing to SSH to test VMs with IPv6 - https://bugs.launchpad.net/neutron/+bug/1940324 LVM job for manila fails on IPv6 hosts with socket.timeout: timed out, if CI/IPv6 gurus can take a look and help manila team Medium * neutron-dynamic-routing: Continuous warning because of missing context wrapper - https://bugs.launchpad.net/neutron/+bug/1984238 Previous fix missed one context wrapper, fix by tkajinam https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/852775 Low * Replace "click" with "cliff" - https://bugs.launchpad.net/neutron/+bug/1985049 Several patches in progress to update tools using click, see LP for details Original started as bug to add click to requirements - https://bugs.launchpad.net/neutron/+bug/1984109 * [OVN] Remove "FIPAddDeleteEvent" event - https://bugs.launchpad.net/neutron/+bug/1985096 Unassigned cleanup bug, workaround is not needed since OVN v20.03.0 * DeprecationWarning: Using the 'user' argument is deprecated in version '2.18' and will be removed in version '3.0', please use the 'user_id' argument instead - https://bugs.launchpad.net/neutron/+bug/1986418 Fix merged https://review.opendev.org/c/openstack/neutron/+/853050 * DeprecationWarning: invalid escape sequence \u - https://bugs.launchpad.net/neutron/+bug/1986421 Fix merged https://review.opendev.org/c/openstack/neutron/+/853051 * PkgResourcesDeprecationWarning: is an invalid version and will not be supported in a future release - https://bugs.launchpad.net/neutron/+bug/1986428 Fix also merged: https://review.opendev.org/c/openstack/neutron/+/853053 Thanks Takashi Natsume for this series of deprecation fixes Untriaged * Exception in concurrent port binding activation - https://bugs.launchpad.net/neutron/+bug/1986003 Live migration issue that I did not have time to dig into - any takers? Have a nice week! -- Bernard Cafarelli -------------- next part -------------- An HTML attachment was scrubbed... URL: From the.wade.albright at gmail.com Mon Aug 15 18:47:36 2022 From: the.wade.albright at gmail.com (Wade Albright) Date: Mon, 15 Aug 2022 11:47:36 -0700 Subject: [ironic][xena] problems updating redfish_password for existing node In-Reply-To: References: Message-ID: 1) There are sockets open briefly when the conductor is trying to connect. After three tries the node is set to maintenance mode and there are no more sockets open. 2) My (extremely simple) code was not using connection: close. I was just running "requests.get(" https://10.12.104.174/redfish/v1/Systems/System.Embedded.1", verify=False, auth=('xxxx', 'xxxx'))" in a loop. I just tried it with headers={'Connection':'close'} and it doesn't seem to make any difference. Works fine either way. I was able to confirm that the problem only happens when using session auth. With basic auth it doesn't happen. Versions I'm using here are ironic 18.2.1 and sushy 3.12.2. 
Here are some fresh logs from the node having the problem: 2022-08-15 10:34:21.726 208875 INFO ironic.conductor.task_manager [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploying" from state "active"; target provision state is "active" 2022-08-15 10:34:22.553 208875 INFO ironic.conductor.utils [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 current power state is 'power on', requested state is 'power off'. 2022-08-15 10:34:35.185 208875 INFO ironic.conductor.utils [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Successfully set node 0c304cea-8ae2-4a12-b658-dec05c190f88 power state to power off by power off. 2022-08-15 10:34:35.200 208875 WARNING ironic.common.pxe_utils [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] IPv6 is enabled and the DHCP driver appears set to a plugin aside from "neutron". Node 0c304cea-8ae2-4a12-b658-dec05c190f88 may not receive proper DHCPv6 provided boot parameters. 2022-08-15 10:34:38.246 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Deploying on node 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20, 'argsinfo': None, 'interface': 'deploy'}] 2022-08-15 10:34:38.255 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Executing {'step': 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'} on node 0c304cea-8ae2-4a12-b658-dec05c190f88 2022-08-15 10:35:27.158 208875 INFO ironic.drivers.modules.ansible.deploy [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Ansible pre-deploy step complete on node 0c304cea-8ae2-4a12-b658-dec05c190f88 2022-08-15 10:35:27.159 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 finished deploy step {'step': 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'} 2022-08-15 10:35:27.160 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Deploying on node 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface': 
'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20, 'argsinfo': None, 'interface': 'deploy'}] 2022-08-15 10:35:27.176 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Executing {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node 0c304cea-8ae2-4a12-b658-dec05c190f88 2022-08-15 10:35:32.037 208875 INFO ironic.conductor.utils [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Successfully set node 0c304cea-8ae2-4a12-b658-dec05c190f88 power state to power on by rebooting. 2022-08-15 10:35:32.037 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node 0c304cea-8ae2-4a12-b658-dec05c190f88 being executed asynchronously, waiting for driver. 2022-08-15 10:35:32.051 208875 INFO ironic.conductor.task_manager [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "wait call-back" from state "deploying"; target provision state is "active" 2022-08-15 10:39:54.726 208875 INFO ironic.conductor.task_manager [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploying" from state "wait call-back"; target provision state is "active" 2022-08-15 10:39:54.741 208875 INFO ironic.conductor.deployments [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Deploying on node 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20, 'argsinfo': None, 'interface': 'deploy'}] 2022-08-15 10:39:54.748 208875 INFO ironic.conductor.deployments [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Executing {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} on node 0c304cea-8ae2-4a12-b658-dec05c190f88 2022-08-15 10:42:24.738 208875 WARNING ironic.drivers.modules.agent_base [-] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping heartbeat processing (will retry on the next heartbeat): ironic.common.exception.NodeLocked: Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host sjc06-c01-irn01.ops.ringcentral.com, please retry after the current operation is completed. 2022-08-15 10:44:29.788 208875 WARNING ironic.drivers.modules.agent_base [-] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping heartbeat processing (will retry on the next heartbeat): ironic.common.exception.NodeLocked: Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host sjc06-c01-irn01.ops.ringcentral.com, please retry after the current operation is completed. 
2022-08-15 10:47:24.830 208875 WARNING ironic.drivers.modules.agent_base [-] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping heartbeat processing (will retry on the next heartbeat): ironic.common.exception.NodeLocked: Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host sjc06-c01-irn01.ops.ringcentral.com, please retry after the current operation is completed. 2022-08-15 11:05:59.544 208875 INFO ironic.drivers.modules.ansible.deploy [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Ansible complete deploy on node 0c304cea-8ae2-4a12-b658-dec05c190f88 2022-08-15 11:06:00.141 208875 ERROR ironic.conductor.utils [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 failed deploy step {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) 2022-08-15 11:06:00.218 208875 ERROR ironic.conductor.task_manager [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploy failed" from state "deploying"; target provision state is "active": requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) 2022-08-15 11:06:28.774 208875 WARNING ironic.conductor.manager [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During sync_power_state, could not get power state for node 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 1 of 3. Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)).: requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) 2022-08-15 11:06:53.710 208875 WARNING ironic.conductor.manager [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During sync_power_state, could not get power state for node 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 2 of 3. Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)).: requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) 2022-08-15 11:07:53.727 208875 WARNING ironic.conductor.manager [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During sync_power_state, could not get power state for node 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 3 of 3. Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)).: requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) 2022-08-15 11:08:53.704 208875 ERROR ironic.conductor.manager [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During sync_power_state, max retries exceeded for node 0c304cea-8ae2-4a12-b658-dec05c190f88, node state None does not match expected state 'power on'. Updating DB state to 'None' Switching node to maintenance mode. 
Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) 2022-08-15 11:13:53.750 208875 ERROR ironic.conductor.manager [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During sync_power_state, max retries exceeded for node 0c304cea-8ae2-4a12-b658-dec05c190f88, node state None does not match expected state 'None'. Updating DB state to 'None' Switching node to maintenance mode. Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) On Fri, Aug 12, 2022 at 9:06 PM Julia Kreger wrote: > Two questions: > > 1) do you see open sockets to the BMCs in netstat output? > 2) is your code using ?connection: close?? Or are you using sushy? > > Honestly, this seems *really* weird with current sushy versions, and is > kind of reminiscent of a cached session which is using kept alive sockets. > > If you could grep out req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 to see > what the prior couple of conductor actions were, that would give us better > context as to what is going on. > > -Julia > > On Fri, Aug 12, 2022 at 3:11 PM Wade Albright > wrote: > >> Sorry for the spam. The openssl issue may have been a red herring. I am >> not able to reproduce the issue directly with my own python code. I was >> trying to fetch something that required authentication. After I added the >> correct auth info it works fine. I am not able to cause the same error as >> is happening in the Ironic logs. >> >> Anyway I'll do some more testing and report back. >> >> On Fri, Aug 12, 2022 at 2:14 PM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>> I'm not sure why this problem only now started showing up, but it >>> appears to be unrelated to Ironic. I was able to reproduce it directly >>> outside of Ironic using a simple python program using urllib to get URLs >>> from the BMC/redfish interface. Seems to be some combination of a buggy >>> server SSL implementation and newer openssl 1.1.1. Apparently it doesn't >>> happen using openssl 1.0. >>> >>> I've found some information about possible workarounds but haven't >>> figured it out yet. If I do I'll update this thread just in case anyone >>> else runs into it. >>> >>> On Fri, Aug 12, 2022 at 8:13 AM Wade Albright < >>> the.wade.albright at gmail.com> wrote: >>> >>>> So I seem to have run into a new issue after upgrading to the newer >>>> versions to fix the password change issue. >>>> >>>> Now I am randomly getting errors like the below. Once I hit this error >>>> for a given node, no operations work on the node. I thought maybe it was an >>>> issue with the node itself, but it doesn't seem like it. The BMC seems to >>>> be working fine. >>>> >>>> After a conductor restart, things start working again. Has anyone seen >>>> something like this? 
>>>> >>>> Log example: >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils [- - - - >>>> -] Node ef5a2502-680b-4933-a0ee-6737e57ce1c5 failed deploy step {'step': >>>> 'write_image', 'priority': >>>> 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: >>>> ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", >>>> InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions. >>>> ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got >>>> length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes >>>> read)) >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback >>>> (most recent call last): >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 697, in >>>> _update_chunk_length >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> self.chunk_left = int(line, 16) >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> ValueError: invalid literal for int() with base 16: b'' >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During >>>> handling of the above exception, another exception occurred: >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback >>>> (most recent call last): >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 438, in >>>> _error_catcher >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils yield >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 764, in >>>> read_chunked >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> self._update_chunk_length() >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 701, in >>>> _update_chunk_length >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise >>>> InvalidChunkLength(self, line) >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 >>>> bytes read) >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During >>>> handling of the above exception, another exception occurred: >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback >>>> (most recent call last): >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>>> "/usr/local/lib/python3.6/site-packages/requests/models.py", line 760, in >>>> generate >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for >>>> chunk in self.raw.stream(chunk_size, decode_content=True): >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 572, in >>>> stream >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for >>>> line in self.read_chunked(amt, decode_content=decode_content): >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>>> 
"/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 793, in >>>> read_chunked >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> self._original_response.close() >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>>> "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__ >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> self.gen.throw(type, value, traceback) >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >>>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 455, in >>>> _error_catcher >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise >>>> ProtocolError("Connection broken: %r" % e, e) >>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>> urllib3.exceptions.ProtocolError: ("Connection broken: >>>> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >>>> length b'', 0 bytes r >>>> ead)) >>>> >>>> On Wed, Jul 20, 2022 at 2:04 PM Wade Albright < >>>> the.wade.albright at gmail.com> wrote: >>>> >>>>> I forgot to mention, that using session auth solved the problem after >>>>> upgrading to the newer versions that include the two mentioned patches. >>>>> >>>>> On Wed, Jul 20, 2022 at 7:36 AM Wade Albright < >>>>> the.wade.albright at gmail.com> wrote: >>>>> >>>>>> Switching to session auth solved the problem, and it seems like the >>>>>> better way to go anyway for equipment that supports it. Thanks again for >>>>>> all your help! >>>>>> >>>>>> Wade >>>>>> >>>>>> On Tue, Jul 19, 2022 at 5:37 PM Julia Kreger < >>>>>> juliaashleykreger at gmail.com> wrote: >>>>>> >>>>>>> Just to provide a brief update for the mailing list. It looks like >>>>>>> this is a case of use of Basic Auth with the BMC, where we were not >>>>>>> catching the error properly... and thus not reporting the >>>>>>> authentication failure to ironic so it would catch, and initiate a >>>>>>> new >>>>>>> client with the most up to date password. The default, typically used >>>>>>> path is Session based authentication as BMCs generally handle >>>>>>> internal >>>>>>> session/user login tracking in a far better fashion. But not every >>>>>>> BMC >>>>>>> supports sessions. >>>>>>> >>>>>>> Fix in review[0] :) >>>>>>> >>>>>>> -Julia >>>>>>> [0] https://review.opendev.org/c/openstack/sushy/+/850425 >>>>>>> >>>>>>> On Mon, Jul 18, 2022 at 4:15 PM Julia Kreger >>>>>>> wrote: >>>>>>> > >>>>>>> > Excellent, hopefully I'll be able to figure out why Sushy is not >>>>>>> doing >>>>>>> > the needful... Or if it is and Ironic is not picking up on it. >>>>>>> > >>>>>>> > Anyway, I've posted >>>>>>> > https://review.opendev.org/c/openstack/ironic/+/850259 which might >>>>>>> > handle this issue. Obviously a work in progress, but it represents >>>>>>> > what I think is happening inside of ironic itself leading into >>>>>>> sushy >>>>>>> > when cache access occurs. >>>>>>> > >>>>>>> > On Mon, Jul 18, 2022 at 4:04 PM Wade Albright >>>>>>> > wrote: >>>>>>> > > >>>>>>> > > Sounds good, I will do that tomorrow. Thanks Julia. >>>>>>> > > >>>>>>> > > On Mon, Jul 18, 2022 at 3:27 PM Julia Kreger < >>>>>>> juliaashleykreger at gmail.com> wrote: >>>>>>> > >> >>>>>>> > >> Debug would be best. I think I have an idea what is going on, >>>>>>> and this >>>>>>> > >> is a similar variation. If you want, you can email them >>>>>>> directly to >>>>>>> > >> me. Specifically only need entries reported by the sushy >>>>>>> library and >>>>>>> > >> ironic.drivers.modules.redfish.utils. 
>>>>>>> > >> >>>>>>> > >> On Mon, Jul 18, 2022 at 3:20 PM Wade Albright >>>>>>> > >> wrote: >>>>>>> > >> > >>>>>>> > >> > I'm happy to supply some logs, what verbosity level should i >>>>>>> use? And should I just embed the logs in email to the list or upload >>>>>>> somewhere? >>>>>>> > >> > >>>>>>> > >> > On Mon, Jul 18, 2022 at 3:14 PM Julia Kreger < >>>>>>> juliaashleykreger at gmail.com> wrote: >>>>>>> > >> >> >>>>>>> > >> >> If you could supply some conductor logs, that would be >>>>>>> helpful. It >>>>>>> > >> >> should be re-authenticating, but obviously we have a larger >>>>>>> bug there >>>>>>> > >> >> we need to find the root issue behind. >>>>>>> > >> >> >>>>>>> > >> >> On Mon, Jul 18, 2022 at 3:06 PM Wade Albright >>>>>>> > >> >> wrote: >>>>>>> > >> >> > >>>>>>> > >> >> > I was able to use the patches to update the code, but >>>>>>> unfortunately the problem is still there for me. >>>>>>> > >> >> > >>>>>>> > >> >> > I also tried an RPM upgrade to the versions Julia >>>>>>> mentioned had the fixes, namely Sushy 3.12.1 - Released May 2022 and Ironic >>>>>>> 18.2.1 - Released in January 2022. But it did not fix the problem. >>>>>>> > >> >> > >>>>>>> > >> >> > I am able to consistently reproduce the error. >>>>>>> > >> >> > - step 1: change BMC password directly on the node itself >>>>>>> > >> >> > - step 2: update BMC password (redfish_password) in >>>>>>> ironic with 'openstack baremetal node set --driver-info >>>>>>> redfish_password='newpass' >>>>>>> > >> >> > >>>>>>> > >> >> > After step 1 there are errors in the logs entries like >>>>>>> "Session authentication appears to have been lost at some point in time" >>>>>>> and eventually it puts the node into maintenance mode and marks the power >>>>>>> state as "none." >>>>>>> > >> >> > After step 2 and taking the host back out of maintenance >>>>>>> mode, it goes through a similar set of log entries puts the node into MM >>>>>>> again. >>>>>>> > >> >> > >>>>>>> > >> >> > After the above steps, a conductor restart fixes the >>>>>>> problem and operations work normally again. Given this it seems like there >>>>>>> is still some kind of caching issue. >>>>>>> > >> >> > >>>>>>> > >> >> > On Sat, Jul 16, 2022 at 6:01 PM Wade Albright < >>>>>>> the.wade.albright at gmail.com> wrote: >>>>>>> > >> >> >> >>>>>>> > >> >> >> Hi Julia, >>>>>>> > >> >> >> >>>>>>> > >> >> >> Thank you so much for the reply! Hopefully this is the >>>>>>> issue. I'll try out the patches next week and report back. I'll also email >>>>>>> you on Monday about the versions, that would be very helpful to know. >>>>>>> > >> >> >> >>>>>>> > >> >> >> Thanks again, really appreciate it. >>>>>>> > >> >> >> >>>>>>> > >> >> >> Wade >>>>>>> > >> >> >> >>>>>>> > >> >> >> >>>>>>> > >> >> >> >>>>>>> > >> >> >> On Sat, Jul 16, 2022 at 4:36 PM Julia Kreger < >>>>>>> juliaashleykreger at gmail.com> wrote: >>>>>>> > >> >> >>> >>>>>>> > >> >> >>> Greetings! >>>>>>> > >> >> >>> >>>>>>> > >> >> >>> I believe you need two patches, one in ironic and one in >>>>>>> sushy. >>>>>>> > >> >> >>> >>>>>>> > >> >> >>> Sushy: >>>>>>> > >> >> >>> https://review.opendev.org/c/openstack/sushy/+/832860 >>>>>>> > >> >> >>> >>>>>>> > >> >> >>> Ironic: >>>>>>> > >> >> >>> https://review.opendev.org/c/openstack/ironic/+/820588 >>>>>>> > >> >> >>> >>>>>>> > >> >> >>> I think it is variation, and the comment about working >>>>>>> after you restart the conductor is the big signal to me. 
I?m on a phone on >>>>>>> a bad data connection, if you email me on Monday I can see what versions >>>>>>> the fixes would be in. >>>>>>> > >> >> >>> >>>>>>> > >> >> >>> For the record, it is a session cache issue, the bug was >>>>>>> that the service didn?t quite know what to do when auth fails. >>>>>>> > >> >> >>> >>>>>>> > >> >> >>> -Julia >>>>>>> > >> >> >>> >>>>>>> > >> >> >>> >>>>>>> > >> >> >>> On Fri, Jul 15, 2022 at 2:55 PM Wade Albright < >>>>>>> the.wade.albright at gmail.com> wrote: >>>>>>> > >> >> >>>> >>>>>>> > >> >> >>>> Hi, >>>>>>> > >> >> >>>> >>>>>>> > >> >> >>>> I'm hitting a problem when trying to update the >>>>>>> redfish_password for an existing node. I'm curious to know if anyone else >>>>>>> has encountered this problem. I'm not sure if I'm just doing something >>>>>>> wrong or if there is a bug. Or if the problem is unique to my setup. >>>>>>> > >> >> >>>> >>>>>>> > >> >> >>>> I have a node already added into ironic with all the >>>>>>> driver details set, and things are working fine. I am able to run >>>>>>> deployments. >>>>>>> > >> >> >>>> >>>>>>> > >> >> >>>> Now I need to change the redfish password on the host. >>>>>>> So I update the password for redfish access on the host, then use an >>>>>>> 'openstack baremetal node set --driver-info >>>>>>> redfish_password=' command to set the new redfish_password. >>>>>>> > >> >> >>>> >>>>>>> > >> >> >>>> Once this has been done, deployment no longer works. I >>>>>>> see redfish authentication errors in the logs and the operation fails. I >>>>>>> waited a bit to see if there might just be a delay in updating the >>>>>>> password, but after awhile it still didn't work. >>>>>>> > >> >> >>>> >>>>>>> > >> >> >>>> I restarted the conductor, and after that things work >>>>>>> fine again. So it seems like the password is cached or something. Is there >>>>>>> a way to force the password to update? I even tried removing the redfish >>>>>>> credentials and re-adding them, but that didn't work either. Only a >>>>>>> conductor restart seems to make the new password work. >>>>>>> > >> >> >>>> >>>>>>> > >> >> >>>> We are running Xena, using rpm installation on Oracle >>>>>>> Linux 8.5. >>>>>>> > >> >> >>>> >>>>>>> > >> >> >>>> Thanks in advance for any help with this issue. >>>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhitman at groupw.com Mon Aug 15 20:42:36 2022 From: swhitman at groupw.com (Stuart Whitman) Date: Mon, 15 Aug 2022 20:42:36 +0000 Subject: [kolla-ansible][octavia] need networking help Message-ID: Hello, I enabled Octavia on a kolla-ansible installed Openstack cluster. When I try to launch a loadbalancer instance, the octavia-worker.log file reports: "WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance." I researched enough to know that the problem has to do with networking between the controller and the lb-mgmt-net network. I initially overlooked this in the kolla-ansible Octavia documentation: "If using a VLAN provider network, ensure that the traffic is also bridged to Open vSwitch on the controllers." But, I don't know how to do it. Help to create the necessary bridge would be greatly appreciated. Thanks, -Stu _____________________________________ The information contained in this e-mail and any attachments from Group W may contain confidential and/or proprietary information and is intended only for the named recipient to whom it was originally addressed. 
If you are not the intended recipient, be aware that any disclosure, distribution, or copying of this e-mail or its attachments is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately of that fact by return e-mail and permanently delete the e-mail and any attachments to it. From juliaashleykreger at gmail.com Tue Aug 16 00:27:30 2022 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 15 Aug 2022 17:27:30 -0700 Subject: [ironic][xena] problems updating redfish_password for existing node In-Reply-To: References: Message-ID: Well, that is weird. If I grok it as-is, it almost looks like the BMC returned an empty response.... It is not a failure mode we've seen or had reported (afaik) up to this point. That being said, we should be able to invalidate the session and launch a new client... I suspect https://review.opendev.org/c/openstack/sushy/+/853209 should fix things up. I suspect we will look at just invalidating the session upon any error. On Mon, Aug 15, 2022 at 11:47 AM Wade Albright wrote: > > 1) There are sockets open briefly when the conductor is trying to connect. After three tries the node is set to maintenance mode and there are no more sockets open. > 2) My (extremely simple) code was not using connection: close. I was just running "requests.get("https://10.12.104.174/redfish/v1/Systems/System.Embedded.1", verify=False, auth=('xxxx', 'xxxx'))" in a loop. I just tried it with headers={'Connection':'close'} and it doesn't seem to make any difference. Works fine either way. > > I was able to confirm that the problem only happens when using session auth. With basic auth it doesn't happen. > > Versions I'm using here are ironic 18.2.1 and sushy 3.12.2. > > Here are some fresh logs from the node having the problem: > > 2022-08-15 10:34:21.726 208875 INFO ironic.conductor.task_manager [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploying" from state "active"; target provision state is "active" > 2022-08-15 10:34:22.553 208875 INFO ironic.conductor.utils [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 current power state is 'power on', requested state is 'power off'. > 2022-08-15 10:34:35.185 208875 INFO ironic.conductor.utils [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Successfully set node 0c304cea-8ae2-4a12-b658-dec05c190f88 power state to power off by power off. > 2022-08-15 10:34:35.200 208875 WARNING ironic.common.pxe_utils [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] IPv6 is enabled and the DHCP driver appears set to a plugin aside from "neutron". Node 0c304cea-8ae2-4a12-b658-dec05c190f88 may not receive proper DHCPv6 provided boot parameters. 
> 2022-08-15 10:34:38.246 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Deploying on node 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20, 'argsinfo': None, 'interface': 'deploy'}] > 2022-08-15 10:34:38.255 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Executing {'step': 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'} on node 0c304cea-8ae2-4a12-b658-dec05c190f88 > 2022-08-15 10:35:27.158 208875 INFO ironic.drivers.modules.ansible.deploy [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Ansible pre-deploy step complete on node 0c304cea-8ae2-4a12-b658-dec05c190f88 > 2022-08-15 10:35:27.159 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 finished deploy step {'step': 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'} > 2022-08-15 10:35:27.160 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Deploying on node 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20, 'argsinfo': None, 'interface': 'deploy'}] > 2022-08-15 10:35:27.176 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Executing {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node 0c304cea-8ae2-4a12-b658-dec05c190f88 > 2022-08-15 10:35:32.037 208875 INFO ironic.conductor.utils [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Successfully set node 0c304cea-8ae2-4a12-b658-dec05c190f88 power state to power on by rebooting. > 2022-08-15 10:35:32.037 208875 INFO ironic.conductor.deployments [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node 0c304cea-8ae2-4a12-b658-dec05c190f88 being executed asynchronously, waiting for driver. 
> 2022-08-15 10:35:32.051 208875 INFO ironic.conductor.task_manager [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 b679510ddb6540ca9454e26841f65c89 - default default] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "wait call-back" from state "deploying"; target provision state is "active" > 2022-08-15 10:39:54.726 208875 INFO ironic.conductor.task_manager [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploying" from state "wait call-back"; target provision state is "active" > 2022-08-15 10:39:54.741 208875 INFO ironic.conductor.deployments [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Deploying on node 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo': None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20, 'argsinfo': None, 'interface': 'deploy'}] > 2022-08-15 10:39:54.748 208875 INFO ironic.conductor.deployments [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Executing {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} on node 0c304cea-8ae2-4a12-b658-dec05c190f88 > 2022-08-15 10:42:24.738 208875 WARNING ironic.drivers.modules.agent_base [-] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping heartbeat processing (will retry on the next heartbeat): ironic.common.exception.NodeLocked: Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host sjc06-c01-irn01.ops.ringcentral.com, please retry after the current operation is completed. > 2022-08-15 10:44:29.788 208875 WARNING ironic.drivers.modules.agent_base [-] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping heartbeat processing (will retry on the next heartbeat): ironic.common.exception.NodeLocked: Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host sjc06-c01-irn01.ops.ringcentral.com, please retry after the current operation is completed. > 2022-08-15 10:47:24.830 208875 WARNING ironic.drivers.modules.agent_base [-] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping heartbeat processing (will retry on the next heartbeat): ironic.common.exception.NodeLocked: Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host sjc06-c01-irn01.ops.ringcentral.com, please retry after the current operation is completed. 
> 2022-08-15 11:05:59.544 208875 INFO ironic.drivers.modules.ansible.deploy [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Ansible complete deploy on node 0c304cea-8ae2-4a12-b658-dec05c190f88 > 2022-08-15 11:06:00.141 208875 ERROR ironic.conductor.utils [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 failed deploy step {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) > 2022-08-15 11:06:00.218 208875 ERROR ironic.conductor.task_manager [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploy failed" from state "deploying"; target provision state is "active": requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) > 2022-08-15 11:06:28.774 208875 WARNING ironic.conductor.manager [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During sync_power_state, could not get power state for node 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 1 of 3. Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)).: requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) > 2022-08-15 11:06:53.710 208875 WARNING ironic.conductor.manager [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During sync_power_state, could not get power state for node 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 2 of 3. Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)).: requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) > 2022-08-15 11:07:53.727 208875 WARNING ironic.conductor.manager [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During sync_power_state, could not get power state for node 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 3 of 3. Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)).: requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) > 2022-08-15 11:08:53.704 208875 ERROR ironic.conductor.manager [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During sync_power_state, max retries exceeded for node 0c304cea-8ae2-4a12-b658-dec05c190f88, node state None does not match expected state 'power on'. Updating DB state to 'None' Switching node to maintenance mode. 
Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) > 2022-08-15 11:13:53.750 208875 ERROR ironic.conductor.manager [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During sync_power_state, max retries exceeded for node 0c304cea-8ae2-4a12-b658-dec05c190f88, node state None does not match expected state 'None'. Updating DB state to 'None' Switching node to maintenance mode. Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) > > > On Fri, Aug 12, 2022 at 9:06 PM Julia Kreger wrote: >> >> Two questions: >> >> 1) do you see open sockets to the BMCs in netstat output? >> 2) is your code using ?connection: close?? Or are you using sushy? >> >> Honestly, this seems *really* weird with current sushy versions, and is kind of reminiscent of a cached session which is using kept alive sockets. >> >> If you could grep out req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 to see what the prior couple of conductor actions were, that would give us better context as to what is going on. >> >> -Julia >> >> On Fri, Aug 12, 2022 at 3:11 PM Wade Albright wrote: >>> >>> Sorry for the spam. The openssl issue may have been a red herring. I am not able to reproduce the issue directly with my own python code. I was trying to fetch something that required authentication. After I added the correct auth info it works fine. I am not able to cause the same error as is happening in the Ironic logs. >>> >>> Anyway I'll do some more testing and report back. >>> >>> On Fri, Aug 12, 2022 at 2:14 PM Wade Albright wrote: >>>> >>>> I'm not sure why this problem only now started showing up, but it appears to be unrelated to Ironic. I was able to reproduce it directly outside of Ironic using a simple python program using urllib to get URLs from the BMC/redfish interface. Seems to be some combination of a buggy server SSL implementation and newer openssl 1.1.1. Apparently it doesn't happen using openssl 1.0. >>>> >>>> I've found some information about possible workarounds but haven't figured it out yet. If I do I'll update this thread just in case anyone else runs into it. >>>> >>>> On Fri, Aug 12, 2022 at 8:13 AM Wade Albright wrote: >>>>> >>>>> So I seem to have run into a new issue after upgrading to the newer versions to fix the password change issue. >>>>> >>>>> Now I am randomly getting errors like the below. Once I hit this error for a given node, no operations work on the node. I thought maybe it was an issue with the node itself, but it doesn't seem like it. The BMC seems to be working fine. >>>>> >>>>> After a conductor restart, things start working again. Has anyone seen something like this? >>>>> >>>>> Log example: >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils [- - - - -] Node ef5a2502-680b-4933-a0ee-6737e57ce1c5 failed deploy step {'step': 'write_image', 'priority': >>>>> 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions. 
>>>>> ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)) >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback (most recent call last): >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 697, in _update_chunk_length >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils self.chunk_left = int(line, 16) >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils ValueError: invalid literal for int() with base 16: b'' >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During handling of the above exception, another exception occurred: >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback (most recent call last): >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 438, in _error_catcher >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils yield >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 764, in read_chunked >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils self._update_chunk_length() >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 701, in _update_chunk_length >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils raise InvalidChunkLength(self, line) >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read) >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During handling of the above exception, another exception occurred: >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback (most recent call last): >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 760, in generate >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for chunk in self.raw.stream(chunk_size, decode_content=True): >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 572, in stream >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for line in self.read_chunked(amt, decode_content=decode_content): >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 793, in read_chunked >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils self._original_response.close() >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__ >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils self.gen.throw(type, value, traceback) >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 455, in _error_catcher >>>>> 2022-08-12 
07:45:33.227 1563371 ERROR ironic.conductor.utils raise ProtocolError("Connection broken: %r" % e, e) >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes r >>>>> ead)) >>>>> >>>>> On Wed, Jul 20, 2022 at 2:04 PM Wade Albright wrote: >>>>>> >>>>>> I forgot to mention, that using session auth solved the problem after upgrading to the newer versions that include the two mentioned patches. >>>>>> >>>>>> On Wed, Jul 20, 2022 at 7:36 AM Wade Albright wrote: >>>>>>> >>>>>>> Switching to session auth solved the problem, and it seems like the better way to go anyway for equipment that supports it. Thanks again for all your help! >>>>>>> >>>>>>> Wade >>>>>>> >>>>>>> On Tue, Jul 19, 2022 at 5:37 PM Julia Kreger wrote: >>>>>>>> >>>>>>>> Just to provide a brief update for the mailing list. It looks like >>>>>>>> this is a case of use of Basic Auth with the BMC, where we were not >>>>>>>> catching the error properly... and thus not reporting the >>>>>>>> authentication failure to ironic so it would catch, and initiate a new >>>>>>>> client with the most up to date password. The default, typically used >>>>>>>> path is Session based authentication as BMCs generally handle internal >>>>>>>> session/user login tracking in a far better fashion. But not every BMC >>>>>>>> supports sessions. >>>>>>>> >>>>>>>> Fix in review[0] :) >>>>>>>> >>>>>>>> -Julia >>>>>>>> [0] https://review.opendev.org/c/openstack/sushy/+/850425 >>>>>>>> >>>>>>>> On Mon, Jul 18, 2022 at 4:15 PM Julia Kreger >>>>>>>> wrote: >>>>>>>> > >>>>>>>> > Excellent, hopefully I'll be able to figure out why Sushy is not doing >>>>>>>> > the needful... Or if it is and Ironic is not picking up on it. >>>>>>>> > >>>>>>>> > Anyway, I've posted >>>>>>>> > https://review.opendev.org/c/openstack/ironic/+/850259 which might >>>>>>>> > handle this issue. Obviously a work in progress, but it represents >>>>>>>> > what I think is happening inside of ironic itself leading into sushy >>>>>>>> > when cache access occurs. >>>>>>>> > >>>>>>>> > On Mon, Jul 18, 2022 at 4:04 PM Wade Albright >>>>>>>> > wrote: >>>>>>>> > > >>>>>>>> > > Sounds good, I will do that tomorrow. Thanks Julia. >>>>>>>> > > >>>>>>>> > > On Mon, Jul 18, 2022 at 3:27 PM Julia Kreger wrote: >>>>>>>> > >> >>>>>>>> > >> Debug would be best. I think I have an idea what is going on, and this >>>>>>>> > >> is a similar variation. If you want, you can email them directly to >>>>>>>> > >> me. Specifically only need entries reported by the sushy library and >>>>>>>> > >> ironic.drivers.modules.redfish.utils. >>>>>>>> > >> >>>>>>>> > >> On Mon, Jul 18, 2022 at 3:20 PM Wade Albright >>>>>>>> > >> wrote: >>>>>>>> > >> > >>>>>>>> > >> > I'm happy to supply some logs, what verbosity level should i use? And should I just embed the logs in email to the list or upload somewhere? >>>>>>>> > >> > >>>>>>>> > >> > On Mon, Jul 18, 2022 at 3:14 PM Julia Kreger wrote: >>>>>>>> > >> >> >>>>>>>> > >> >> If you could supply some conductor logs, that would be helpful. It >>>>>>>> > >> >> should be re-authenticating, but obviously we have a larger bug there >>>>>>>> > >> >> we need to find the root issue behind. >>>>>>>> > >> >> >>>>>>>> > >> >> On Mon, Jul 18, 2022 at 3:06 PM Wade Albright >>>>>>>> > >> >> wrote: >>>>>>>> > >> >> > >>>>>>>> > >> >> > I was able to use the patches to update the code, but unfortunately the problem is still there for me. 
>>>>>>>> > >> >> > >>>>>>>> > >> >> > I also tried an RPM upgrade to the versions Julia mentioned had the fixes, namely Sushy 3.12.1 - Released May 2022 and Ironic 18.2.1 - Released in January 2022. But it did not fix the problem. >>>>>>>> > >> >> > >>>>>>>> > >> >> > I am able to consistently reproduce the error. >>>>>>>> > >> >> > - step 1: change BMC password directly on the node itself >>>>>>>> > >> >> > - step 2: update BMC password (redfish_password) in ironic with 'openstack baremetal node set --driver-info redfish_password='newpass' >>>>>>>> > >> >> > >>>>>>>> > >> >> > After step 1 there are errors in the logs entries like "Session authentication appears to have been lost at some point in time" and eventually it puts the node into maintenance mode and marks the power state as "none." >>>>>>>> > >> >> > After step 2 and taking the host back out of maintenance mode, it goes through a similar set of log entries puts the node into MM again. >>>>>>>> > >> >> > >>>>>>>> > >> >> > After the above steps, a conductor restart fixes the problem and operations work normally again. Given this it seems like there is still some kind of caching issue. >>>>>>>> > >> >> > >>>>>>>> > >> >> > On Sat, Jul 16, 2022 at 6:01 PM Wade Albright wrote: >>>>>>>> > >> >> >> >>>>>>>> > >> >> >> Hi Julia, >>>>>>>> > >> >> >> >>>>>>>> > >> >> >> Thank you so much for the reply! Hopefully this is the issue. I'll try out the patches next week and report back. I'll also email you on Monday about the versions, that would be very helpful to know. >>>>>>>> > >> >> >> >>>>>>>> > >> >> >> Thanks again, really appreciate it. >>>>>>>> > >> >> >> >>>>>>>> > >> >> >> Wade >>>>>>>> > >> >> >> >>>>>>>> > >> >> >> >>>>>>>> > >> >> >> >>>>>>>> > >> >> >> On Sat, Jul 16, 2022 at 4:36 PM Julia Kreger wrote: >>>>>>>> > >> >> >>> >>>>>>>> > >> >> >>> Greetings! >>>>>>>> > >> >> >>> >>>>>>>> > >> >> >>> I believe you need two patches, one in ironic and one in sushy. >>>>>>>> > >> >> >>> >>>>>>>> > >> >> >>> Sushy: >>>>>>>> > >> >> >>> https://review.opendev.org/c/openstack/sushy/+/832860 >>>>>>>> > >> >> >>> >>>>>>>> > >> >> >>> Ironic: >>>>>>>> > >> >> >>> https://review.opendev.org/c/openstack/ironic/+/820588 >>>>>>>> > >> >> >>> >>>>>>>> > >> >> >>> I think it is variation, and the comment about working after you restart the conductor is the big signal to me. I?m on a phone on a bad data connection, if you email me on Monday I can see what versions the fixes would be in. >>>>>>>> > >> >> >>> >>>>>>>> > >> >> >>> For the record, it is a session cache issue, the bug was that the service didn?t quite know what to do when auth fails. >>>>>>>> > >> >> >>> >>>>>>>> > >> >> >>> -Julia >>>>>>>> > >> >> >>> >>>>>>>> > >> >> >>> >>>>>>>> > >> >> >>> On Fri, Jul 15, 2022 at 2:55 PM Wade Albright wrote: >>>>>>>> > >> >> >>>> >>>>>>>> > >> >> >>>> Hi, >>>>>>>> > >> >> >>>> >>>>>>>> > >> >> >>>> I'm hitting a problem when trying to update the redfish_password for an existing node. I'm curious to know if anyone else has encountered this problem. I'm not sure if I'm just doing something wrong or if there is a bug. Or if the problem is unique to my setup. >>>>>>>> > >> >> >>>> >>>>>>>> > >> >> >>>> I have a node already added into ironic with all the driver details set, and things are working fine. I am able to run deployments. >>>>>>>> > >> >> >>>> >>>>>>>> > >> >> >>>> Now I need to change the redfish password on the host. 
So I update the password for redfish access on the host, then use an 'openstack baremetal node set --driver-info redfish_password=' command to set the new redfish_password. >>>>>>>> > >> >> >>>> >>>>>>>> > >> >> >>>> Once this has been done, deployment no longer works. I see redfish authentication errors in the logs and the operation fails. I waited a bit to see if there might just be a delay in updating the password, but after awhile it still didn't work. >>>>>>>> > >> >> >>>> >>>>>>>> > >> >> >>>> I restarted the conductor, and after that things work fine again. So it seems like the password is cached or something. Is there a way to force the password to update? I even tried removing the redfish credentials and re-adding them, but that didn't work either. Only a conductor restart seems to make the new password work. >>>>>>>> > >> >> >>>> >>>>>>>> > >> >> >>>> We are running Xena, using rpm installation on Oracle Linux 8.5. >>>>>>>> > >> >> >>>> >>>>>>>> > >> >> >>>> Thanks in advance for any help with this issue. From gagehugo at gmail.com Tue Aug 16 04:27:13 2022 From: gagehugo at gmail.com (Gage Hugo) Date: Mon, 15 Aug 2022 23:27:13 -0500 Subject: [openstack-helm] No meeting tomorrow Message-ID: Hey team, Since there's nothing on the agenda for tomorrow's meeting, it has been cancelled. We will meet as scheduled next week. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliver.weinmann at me.com Tue Aug 16 05:51:47 2022 From: oliver.weinmann at me.com (Oliver Weinmann) Date: Tue, 16 Aug 2022 07:51:47 +0200 Subject: [kolla-ansible][octavia] need networking help In-Reply-To: References: Message-ID: <27BA5E06-6C57-4FAA-9C3B-0B548925D96E@me.com> Hi, Yes, this is not really documented. I used the following guide: https://cloudbase.it/openstack-on-arm64-lbaas/ It is for Arm, but the setup is the same. It basically describes how to create additional virtual interfaces. If this is not working, and you have spare physical interfaces, try to use them instead of virtual ones. Sent from my iPhone > On 16.08.2022 at 00:07, Stuart Whitman wrote: > > Hello, > > I enabled Octavia on a kolla-ansible installed Openstack > cluster. When I try to launch a loadbalancer instance, the > octavia-worker.log file reports: > "WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] > Could not connect to instance." > > I researched enough to know that the problem has to do with networking > between the controller and the lb-mgmt-net network. I initially > overlooked this in the kolla-ansible Octavia documentation: > "If using a VLAN provider network, ensure that the traffic is also bridged > to Open vSwitch on the controllers." But, I don't know how to do it. > > Help to create the necessary bridge would be greatly appreciated. > > Thanks, > -Stu > _____________________________________ > The information contained in this e-mail and any attachments from Group W may contain confidential and/or proprietary information and is intended only for the named recipient to whom it was originally addressed. If you are not the intended recipient, be aware that any disclosure, distribution, or copying of this e-mail or its attachments is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately of that fact by return e-mail and permanently delete the e-mail and any attachments to it. > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Danny.Webb at thehutgroup.com Tue Aug 16 08:11:20 2022 From: Danny.Webb at thehutgroup.com (Danny Webb) Date: Tue, 16 Aug 2022 08:11:20 +0000 Subject: [kolla-ansible][octavia] need networking help In-Reply-To: <27BA5E06-6C57-4FAA-9C3B-0B548925D96E@me.com> References: <27BA5E06-6C57-4FAA-9C3B-0B548925D96E@me.com> Message-ID: The way we've done it is to have a vlan tagged interface on the controllers with an IP on the lb-mgmt-network. It's simpler to setup than trying to plug in an ovs interface in on the controllers and making it work that way. ________________________________ From: Oliver Weinmann Sent: 16 August 2022 06:51 To: Stuart Whitman Cc: openstack-discuss at lists.openstack.org Subject: Re: [kolla-ansible][octavia] need networking help CAUTION: This email originates from outside THG ________________________________ Hi, Yes this is not really documented. I used the following guide: https://cloudbase.it/openstack-on-arm64-lbaas/ It is for Arm, but the setup is the same. It basically describes how to create additional virtual Interfaces. If this is not working, and you have spare physical Interfaces, try to use them instead of virtual. Von meinem iPhone gesendet Am 16.08.2022 um 00:07 schrieb Stuart Whitman : ?Hello, I enabled Octavia on a kolla-ansible installed Openstack cluster. When I try to launch a loadbalancer instance, the octavia-worker.log file reports: "WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance." I researched enough to know that the problem has to do with networking between the controller and the lb-mgmt-net network. I initially overlooked this in the kolla-ansible Octavia documentation: "If using a VLAN provider network, ensure that the traffic is also bridged to Open vSwitch on the controllers." But, I don't know how to do it. Help to create the necessary bridge would be greatly appreciated. Thanks, -Stu _____________________________________ The information contained in this e-mail and any attachments from Group W may contain confidential and/or proprietary information and is intended only for the named recipient to whom it was originally addressed. If you are not the intended recipient, be aware that any disclosure, distribution, or copying of this e-mail or its attachments is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately of that fact by return e-mail and permanently delete the e-mail and any attachments to it. Danny Webb Principal OpenStack Engineer The Hut Group Tel: Email: Danny.Webb at thehutgroup.com For the purposes of this email, the "company" means The Hut Group Limited, a company registered in England and Wales (company number 6539496) whose registered office is at Fifth Floor, Voyager House, Chicago Avenue, Manchester Airport, M90 3DQ and/or any of its respective subsidiaries. Confidentiality Notice This e-mail is confidential and intended for the use of the named recipient only. If you are not the intended recipient please notify us by telephone immediately on +44(0)1606 811888 or return it to us by e-mail. Please then delete it from your system and note that any use, dissemination, forwarding, printing or copying is strictly prohibited. Any views or opinions are solely those of the author and do not necessarily represent those of the company. Encryptions and Viruses Please note that this e-mail and any attachments have not been encrypted. They may therefore be liable to be compromised. 
Please also note that it is your responsibility to scan this e-mail and any attachments for viruses. We do not, to the extent permitted by law, accept any liability (whether in contract, negligence or otherwise) for any virus infection and/or external compromise of security and/or confidentiality in relation to transmissions sent by e-mail. Monitoring Activity and use of the company's systems is monitored to secure its effective use and operation and for other lawful business purposes. Communications using these systems will also be monitored and may be recorded to secure effective use and operation and for other lawful business purposes. hgvyjuv -------------- next part -------------- An HTML attachment was scrubbed... URL: From nico.lueck at giz.de Tue Aug 16 08:41:41 2022 From: nico.lueck at giz.de (Lueck, Nico GIZ) Date: Tue, 16 Aug 2022 08:41:41 +0000 Subject: [all] GovStack community seeking for infrastructure expertise Message-ID: Do you believe in the potential of digital public goods for positive impact? Ever wished for more digitally-enabled government services? GovStack [1] is a community-driven initiative on a mission to accelerate the digital transformation of government services in order to better serve citizens - and we want you to join us! We create and maintain an openly licenced blueprint for a modular architecture based on reusable building blocks [2]. Our various working groups work to define the (non-)functional and API specifications for building blocks, leveraging existing standards such as OpenStack. These specifications are then used for implementations in our partner countries across the world, to facilitate the delivery of foundational, life-enhancing services. Learn more about the call for contributions here [3]. We are specifically looking for experts with experience in infrastructure and security. The initiative is founded by the German Federal Ministry for Economic Cooperation and Development, Deutsche Gesellschaft fuer Internationale Zusammenarbeit (GIZ), the Ministry of Foreign Affairs of the Republic of Estonia, the International Telecommunication Union (ITU) and the Digital Impact Alliance. Thank you to the team of the Open Infrastructure Foundation for their support. Also, thank you to the whole community and please feel free to directly contact us. Nico (Lueck) [1] https://www.govstack.global/ [2] https://govstack.gitbook.io/ [3] https://www.govstack.global/call-for-expression-of-interest/ ________________________________ Deutsche Gesellschaft fuer Internationale Zusammenarbeit (GIZ) GmbH; Sitz der Gesellschaft Bonn und Eschborn/Registered offices Bonn and Eschborn, Germany; Registergericht/Registered at Amtsgericht Bonn, Germany; Eintragungs-Nr./Registration no. HRB 18384 und/and Amtsgericht Frankfurt am Main, Germany; Eintragungs-Nr./Registration no. HRB 12394; USt-IdNr./VAT ID no. 
DE 113891176; Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: Jochen Flasbarth; Vorstand/Management Board: Tanja Goenner (Vorstandssprecherin/Chair of the Management Board), Ingrid-Gabriela Hoven, Thorsten Schaefer-Guembel From wchy1001 at gmail.com Tue Aug 16 09:32:31 2022 From: wchy1001 at gmail.com (W Ch) Date: Tue, 16 Aug 2022 17:32:31 +0800 Subject: [kolla-ansible][octavia] need networking help In-Reply-To: References: Message-ID: Hi Stuart: Usually, you need to add a bridge to all network nodes. You can use "ovs-vsctl add-br {br-name}" to add an OVS bridge, then add a physical port to that bridge by executing "ovs-vsctl add-port {bridge} {port}". Alternatively, you can append the physical port to the neutron_external_interface variable in globals.yml; in this case, kolla will create the OVS bridge automatically. In both cases, you need to set octavia_network_interface and configure the external switch properly. If you really don't know how this works, I suggest you use "octavia_network_type: tenant" (ref: [0]); in this case, kolla-ansible will set up the Octavia management network for you and you don't need to do anything. [0]: https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html#development-or-testing Thanks. Stuart Whitman wrote on Tue, 16 Aug 2022 at 06:28: > Hello, > > I enabled Octavia on a kolla-ansible installed Openstack > cluster. When I try to launch a loadbalancer instance, the > octavia-worker.log file reports: > "WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] > Could not connect to instance." > > I researched enough to know that the problem has to do with networking > between the controller and the lb-mgmt-net network. I initially > overlooked this in the kolla-ansible Octavia documentation: > "If using a VLAN provider network, ensure that the traffic is also bridged > to Open vSwitch on the controllers." But, I don't know how to do it. > > Help to create the necessary bridge would be greatly appreciated. > > Thanks, > -Stu > _____________________________________ > The information contained in this e-mail and any attachments from Group W > may contain confidential and/or proprietary information and is intended > only for the named recipient to whom it was originally addressed. If you > are not the intended recipient, be aware that any disclosure, distribution, > or copying of this e-mail or its attachments is strictly prohibited. If you > have received this e-mail in error, please notify the sender immediately of > that fact by return e-mail and permanently delete the e-mail and any > attachments to it. > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
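A minimal sketch of the bridge setup described above, assuming a spare physical NIC named eno2 carries the lb-mgmt VLAN (the bridge and interface names here are placeholders, not values taken from this thread):

  # On each controller/network node: create the OVS bridge and attach the physical port
  ovs-vsctl add-br br-lbmgmt
  ovs-vsctl add-port br-lbmgmt eno2

  # Or let kolla-ansible create the bridge by naming the port in globals.yml instead:
  # neutron_external_interface: "eno2"

Either way, the external switch still has to carry the lb-mgmt VLAN to these nodes, as noted above.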
From kkchn.in at gmail.com Tue Aug 16 09:56:58 2022 From: kkchn.in at gmail.com (KK CHN) Date: Tue, 16 Aug 2022 15:26:58 +0530 Subject: Metering, billing software components for Openstack In-Reply-To: <3d132405-106f-4353-eb32-4786b5cb435b@matthias-runge.de> References: <3d132405-106f-4353-eb32-4786b5cb435b@matthias-runge.de> Message-ID: I have 2 queries here. My preference is for query 2 (query 1 only asks why it failed with [1] but succeeded with [2] on the same VM). Query 1. There are installation instructions which I followed in [1]. But on my fresh Ubuntu 20.04 LTS (focal), devstack fails multiple times with the below local.conf entries. Could someone point out why this fails? Without the Ceilometer and Gnocchi enable_plugin lines, my devstack installs successfully. With these lines it always fails with an SQLAlchemy "module can't be loaded" error (dbcounter plugin = dbcounter, module unable to load, every time). [1] https://docs.openstack.org/cloudkitty/latest/admin/devstack.html I only succeeded in installing devstack with Ceilometer and Gnocchi when I followed [2] on Ubuntu 20.04. [2] https://wiki.openstack.org/wiki/CloudKitty/Devstack . This time stack.sh completed successfully after 1229 sec. Why does only [2] work for me, while [1] always fails with an error like "sqlalchemy.exc unable to load the module. dbcounter plugin = dbcounter"? My working copy of local.conf is cat > local.conf << EOF [[local|localrc]] ADMIN_PASSWORD=secret DATABASE_PASSWORD=secret RABBIT_PASSWORD=secret SERVICE_PASSWORD=secret HOST_IP=10.184.48.94 # ceilometer enable_service ceilometer-acompute ceilometer-acentral ceilometer-anotification ceilometer-collector enable_service ceilometer-alarm-notifier ceilometer-alarm-evaluator enable_service ceilometer-api # horizon enable_service horizon # cloudkitty enable_plugin cloudkitty https://github.com/stackforge/cloudkitty master enable_service ck-api ck-proc EOF This completes the devstack ./stack.sh run with the finish message. Query 2. Once devstack completes the installation, how do I access or view Ceilometer/CloudKitty/Gnocchi for the billing and metering system? Or how do I use and view the installed features of Ceilometer/CloudKitty/Gnocchi output / usage statistics of this devstack cloud installation? Thank you, Krish On Thu, Aug 11, 2022 at 6:05 PM Matthias Runge wrote: > On 11/08/2022 12:19, KK CHN wrote: > > List, > > > > We are running our datacenter using the Ussuri version. Planning to > > upgrade to higher versions soon. > > > > 1. What are the metering, billing and metric solutions for ussuri and > > other latest openstack versions ? > > > > 2. The ceilometer and gnocchi is the way forward or ? People are > > using any latest tools for the best results pls share your thoughts. > > > > There is the official OpenStack project Cloudkitty for billing and > chargeback. It uses the Gnocchi API. > > Matthias > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kurt at garloff.de Tue Aug 16 10:43:45 2022 From: kurt at garloff.de (Kurt Garloff) Date: Tue, 16 Aug 2022 12:43:45 +0200 Subject: [all] GovStack community seeking for infrastructure expertise In-Reply-To: References: Message-ID: <39b7d738-5336-b9d3-de5b-dfc3ea098e62@garloff.de> Hi Nico, Sounds like a great initiative. I believe there is a significant overlap between our Sovereign Cloud Stack (SCS) [1] initiative and what you are trying to achieve. There was some contact between GIZ and us at some point, although this may not have reached you, unfortunately. Let's ensure we work together and join forces where it makes sense. [1] https://scs.community/ -- Kurt Garloff Cologne, Germany On 16.08.22 10:41, Lueck, Nico GIZ wrote: > Do you believe in the potential of digital public goods for positive impact? Ever wished for more digitally-enabled government services? > > GovStack [1] is a community-driven initiative on a mission to accelerate the digital transformation of government services in order to better serve citizens - and we want you to join us! > > We create and maintain an openly licenced blueprint for a modular architecture based on reusable building blocks [2].
Our various working groups work to define the (non-)functional and API specifications for building blocks, leveraging existing standards such as OpenStack. These specifications are then used for implementations in our partner countries across the world, to facilitate the delivery of foundational, life-enhancing services. > > Learn more about the call for contributions here [3]. We are specifically looking for experts with experience in infrastructure and security. > > The initiative is founded by the German Federal Ministry for Economic Cooperation and Development, Deutsche Gesellschaft fuer Internationale Zusammenarbeit (GIZ), the Ministry of Foreign Affairs of the Republic of Estonia, the International Telecommunication Union (ITU) and the Digital Impact Alliance. > > Thank you to the team of the Open Infrastructure Foundation for their support. > > Also, thank you to the whole community and please feel free to directly contact us. > Nico (Lueck) > > > [1] https://www.govstack.global/ > [2] https://govstack.gitbook.io/ > [3] https://www.govstack.global/call-for-expression-of-interest/ > > ________________________________ > Deutsche Gesellschaft fuer Internationale Zusammenarbeit (GIZ) GmbH; > Sitz der Gesellschaft Bonn und Eschborn/Registered offices Bonn and Eschborn, Germany; > Registergericht/Registered at Amtsgericht Bonn, Germany; Eintragungs-Nr./Registration no. HRB 18384 und/and Amtsgericht Frankfurt am Main, Germany; Eintragungs-Nr./Registration no. HRB 12394; > USt-IdNr./VAT ID no. DE 113891176; > Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: Jochen Flasbarth; > Vorstand/Management Board: Tanja Goenner (Vorstandssprecherin/Chair of the Management Board), Ingrid-Gabriela Hoven, Thorsten Schaefer-Guembel > From fv at spots.edu Tue Aug 16 12:21:07 2022 From: fv at spots.edu (Father Vlasie) Date: Tue, 16 Aug 2022 05:21:07 -0700 Subject: [openstack-ansible] [yoga] utility_container failure Message-ID: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> Hello everyone, I have happily progressed to the second step of running the playbooks, namely "openstack-ansible setup-infrastructure.yml" Everything looks good except for just one error which is mystifying me: ---------------- TASK [Get list of repo packages] ********************************************************************************************************************************************************** fatal: [infra1_utility_container-5ec32cb5]: FAILED! => {"changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: ", "redirected": false, "status": -1, "url": "http://192.168.3.9:8181/constraints/upper_constraints_cached.txt"} ---------------- 192.168.3.9 is the IP listed in user_variables.yml under haproxy_keepalived_internal_vip_cidr Any help or pointers would be very much appreciated! Thank you, Father Vlasie From skaplons at redhat.com Tue Aug 16 12:32:25 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 16 Aug 2022 14:32:25 +0200 Subject: [neutron] CI meeting - agenda for 16.08.2022 Message-ID: <2389518.EluMbIsB9I@p1> Hi, Agenda for today's Neutron CI meeting is at [1]. This will be video meeting, which we will have on [2]. [1] https://etherpad.opendev.org/p/neutron-ci-meetings [2] https://meetpad.opendev.org/neutron-ci-meetings -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From james.denton at rackspace.com Tue Aug 16 13:32:39 2022 From: james.denton at rackspace.com (James Denton) Date: Tue, 16 Aug 2022 13:32:39 +0000 Subject: [openstack-ansible] [yoga] utility_container failure In-Reply-To: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> References: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> Message-ID: Hello, That error means the repo server at 192.168.3.9:8181 is unavailable. The repo server sits behind haproxy, which should be listening on 192.168.3.9 port 8181 on the active (primary) node. You can verify this by issuing a ?curl -v https://192.168.3.9:8181/?. You might check the haproxy service status and/or keepalived status to ensure they are operating properly. If the IP cannot be bound to the correct interface, keepalive may not start. James Denton Rackspace Private Cloud From: Father Vlasie Date: Tuesday, August 16, 2022 at 7:38 AM To: openstack-discuss at lists.openstack.org Subject: [openstack-ansible] [yoga] utility_container failure CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Hello everyone, I have happily progressed to the second step of running the playbooks, namely "openstack-ansible setup-infrastructure.yml" Everything looks good except for just one error which is mystifying me: ---------------- TASK [Get list of repo packages] ********************************************************************************************************************************************************** fatal: [infra1_utility_container-5ec32cb5]: FAILED! => {"changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: ", "redirected": false, "status": -1, "url": "https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2F192.168.3.9%3A8181%2Fconstraints%2Fupper_constraints_cached.txt&data=05%7C01%7Cjames.denton%40rackspace.com%7Ca51d530625ae4bcaed1008da7f84329f%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637962503012704928%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=viIOEpR8sqc0TxoeDiYSMVEJeE%2FhkE7pxCubo49VsTQ%3D&reserved=0"} ---------------- 192.168.3.9 is the IP listed in user_variables.yml under haproxy_keepalived_internal_vip_cidr Any help or pointers would be very much appreciated! Thank you, Father Vlasie -------------- next part -------------- An HTML attachment was scrubbed... URL: From froyo at redhat.com Tue Aug 16 15:09:11 2022 From: froyo at redhat.com (Fernando Royo) Date: Tue, 16 Aug 2022 17:09:11 +0200 Subject: [Neutron][Octavia][ovn-octavia-provider] proposing Fernando Royo for ovn-octavia-provider core reviewer In-Reply-To: References: <2642162.mvXUDI8C0e@p1> Message-ID: Thanks for the proposal and your support guys! El mar, 26 jul 2022 a las 21:51, Michael Johnson () escribi?: > +1 from me. He has done great work getting the status updates working > in the OVN provider. > > Michael > > On Tue, Jul 26, 2022 at 8:58 AM Luis Tomas Bolivar > wrote: > > > > +1 from me too! He is doing a great job on the ovn-octavia side! > > > > On Tue, Jul 26, 2022 at 3:28 PM Slawek Kaplonski > wrote: > >> > >> Hi, > >> > >> Dnia wtorek, 26 lipca 2022 14:18:41 CEST Lajos Katona pisze: > >> > Hi > >> > > >> > I would like to propose Fernando Royo (froyo) as a core reviewer to > >> > the ovn-octavia-provider project. 
> >> > Fernando is very active in the project (see [1] and [2]). > >> > > >> > As ovn-octavia-provider is a link between Neutron and Octavia I ask > both > >> > Neutron and Octavia cores to vote by answering to this thread, to > have a > >> > final decision. > >> > Thanks for your consideration. > >> > > >> > [1]: > >> > https://review.opendev.org/q/owner:froyo%2540redhat.com > >> > [2]: > >> > > https://www.stackalytics.io/report/contribution?module=neutron-group&project_type=openstack&days=60 > >> > > >> > Cheers > >> > Lajos > >> > > >> > >> Definitely +1 for Fernando :) > >> > >> -- > >> Slawek Kaplonski > >> Principal Software Engineer > >> Red Hat > > > > > > > > -- > > LUIS TOM?S BOL?VAR > > Principal Software Engineer > > Red Hat > > Madrid, Spain > > ltomasbo at redhat.com > > > > -- Fernando Royo S?nchez Senior Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From dtantsur at redhat.com Tue Aug 16 15:49:31 2022 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Tue, 16 Aug 2022 17:49:31 +0200 Subject: [ironic][horizon] Call for help with ironic-ui (Was: Horizon angular based plugins are broken after migrating to angularjs 1.8.2.2) In-Reply-To: References: Message-ID: Hi folks, I'd like to repeat our call for help with ironic-ui. The ironic team lacks both JavaScript expertise and free cycles to recover ironic-ui after the horizon changes. If nothing changes, ironic-ui will not be released as part of Zed. So far a few patches have been proposed but none passed the CI: https://review.opendev.org/q/project:openstack%252Fironic-ui. If you can spare some time, please respond here or reach out to TheJulia or myself on IRC. Thanks! Dmitry On Thu, Jun 16, 2022 at 5:55 AM vishal manchanda < manchandavishal143 at gmail.com> wrote: > Hello everyone, > > Horizon team recently migrated XStatic-Angular 1.5.8.0->1.8.2.2 [1] and > after that many horizon angular based plugins start failing [2]. We already > fixed these issues in horizon [3]. So I guess we need to fix the similar > thing in horizon plugins. > > Why did we migrated Angular 1.5.8.0->1.8.2.2? > The angularjs version is updated 1.5.8.0->1.8.2.2 to include the CVE fixed > in the latest version. Also, there is a security bug reported for the same > [4]. > > I have also created an etherpad to track the progress for failed horizon > plugins [5]. > > Action items for the horizon plugins team either fixed their plugins or > review/merge the patch pushed by horizon team members on priority basis to > fix the gate. > > The npm jobs are failing in below horizon plugins : > ? ironic-ui > ? octavia-dashboard > ? senlin-dashboard > ? designate-dashboard > ? vitrage-dashboard > ? murano-dashboard > ? zaqar-ui > ? zun-ui > ? magnum-ui > > In case of any queries, please feel free to reach out to Horizon channel > #openstack-horizon. 
> > Thanks & Regards, > Vishal Manchanda(irc: vishalmanchanda) > [1] https://review.opendev.org/c/openstack/requirements/+/844099 > [2] https://review.opendev.org/c/openstack/horizon/+/845733 > [3] https://review.opendev.org/c/openstack/horizon/+/843346 > [4] https://bugs.launchpad.net/horizon/+bug/1955556 > [5] > https://etherpad.opendev.org/p/Fix_Horizon_Plugins_With_Angularjs_v1.8.2.2 > -- Red Hat GmbH , Registered seat: Werner von Siemens Ring 14, D-85630 Grasbrunn, Germany Commercial register: Amtsgericht Muenchen/Munich, HRB 153243,Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhitman at groupw.com Tue Aug 16 16:50:44 2022 From: swhitman at groupw.com (Stuart Whitman) Date: Tue, 16 Aug 2022 16:50:44 +0000 Subject: [kolla-ansible][octavia] need networking help In-Reply-To: References: Message-ID: Hello, I prefer for kolla to create the bridge automatically. Each node has two physical interfaces. The network_interface and neutron_external_interface options are set in the inventory file. > you need set octavia_network_interface and configure external switch properly I have octavia_network_interface set to "{{ api_interface }}" and api_interface is set to "{{ network_interface }}", the defaults. What do you mean by "configure external switch properly"? If you mean the external option when creating OpenStack networks, then I used the defaults in globals.yml which does not include that option. If you mean the physical switch, I'm using a low-budget switch I had lying around that is not configurable. Thanks for the help - everything else with kolla-ansible has been fairly easy. -Stu ---- From: W Ch Sent: Tuesday, August 16, 2022 5:32 AM To: Stuart Whitman Cc: openstack-discuss at lists.openstack.org Subject: Re: [kolla-ansible][octavia] need networking help Hi Stuart: Usually, you need to add a bridge to all network nodes, you can use "ovs-vsctl add-br {br-name}" to add a ovs bridge, then you need to add a physical port to that bridge by executing "ovs-vsctl add-port {bridge} {port}". another alternatives, you can append the physical port to neutron_external_interface variable in globals.yml. in this case, kolla will create the ovs bridge automatically. both of them, you need set octavia_network_interface and configure external switch properly. if you really don't know how this works, I propose you use "octavia_network_type: tenant" ref: [0] , in this case, kolla-ansible will setup the octavia management network for you, you don't need to do anything. [0]: https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html#development-or-testing thanks. Stuart Whitman ?2022?8?16??? 06:28??? Hello, I enabled Octavia on a kolla-ansible installed Openstack cluster. When I try to launch a loadbalancer instance, the octavia-worker.log file reports: "WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance." I researched enough to know that the problem has to do with networking between the controller and the lb-mgmt-net network. I initially overlooked this in the kolla-ansible Octavia documentation: "If using a VLAN provider network, ensure that the traffic is also bridged to Open vSwitch on the controllers." But, I don't know how to do it. Help to create the necessary bridge would be greatly appreciated. 
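For reference, a rough sketch of how the options discussed above might sit in /etc/kolla/globals.yml; the eth0/eth1 values are placeholders and not taken from this thread, only the variable names are the ones already mentioned:

network_interface: "eth0"
neutron_external_interface: "eth1"
octavia_network_interface: "{{ api_interface }}"
# simpler alternative suggested earlier in the thread: let kolla-ansible
# create the amphora management network as a tenant network
octavia_network_type: "tenant"

The actual values depend entirely on the host networking, so treat this only as an illustration of where the settings live.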
Thanks, -Stu _____________________________________ The information contained in this e-mail and any attachments from Group W may contain confidential and/or proprietary information and is intended only for the named recipient to whom it was originally addressed. If you are not the intended recipient, be aware that any disclosure, distribution, or copying of this e-mail or its attachments is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately of that fact by return e-mail and permanently delete the e-mail and any attachments to it. From fv at spots.edu Tue Aug 16 21:31:22 2022 From: fv at spots.edu (Father Vlasie) Date: Tue, 16 Aug 2022 14:31:22 -0700 Subject: [openstack-ansible] [yoga] utility_container failure In-Reply-To: References: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> Message-ID: Hello, Thank you very much for the reply! haproxy and keepalived both show status active on infra1 (my primary node). Curl shows "503 Service Unavailable No server is available to handle this request? (Also the URL is http not https?.) If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? Earlier in the output I find the following error (showing for all 3 infra nodes): ------------ TASK [systemd_mount : Set the state of the mount] ***************************************************************************************************************************************** fatal: [infra3_repo_container-7ca5db88]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.022275", "end": "2022-08-16 14:16:34.926861", "msg": "non-zero return code", "rc": 1, "start": "2022-08-16 14:16:34.904586", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} ?????? Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." Thank you again! Father Vlasie > On Aug 16, 2022, at 6:32 AM, James Denton wrote: > > Hello, > > That error means the repo server at 192.168.3.9:8181 is unavailable. The repo server sits behind haproxy, which should be listening on 192.168.3.9 port 8181 on the active (primary) node. You can verify this by issuing a ?curl -v https://192.168.3.9:8181/?. You might check the haproxy service status and/or keepalived status to ensure they are operating properly. If the IP cannot be bound to the correct interface, keepalive may not start. > > James Denton > Rackspace Private Cloud > > From: Father Vlasie > Date: Tuesday, August 16, 2022 at 7:38 AM > To: openstack-discuss at lists.openstack.org > Subject: [openstack-ansible] [yoga] utility_container failure > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! 
> > > Hello everyone, > > I have happily progressed to the second step of running the playbooks, namely "openstack-ansible setup-infrastructure.yml" > > Everything looks good except for just one error which is mystifying me: > > ---------------- > > TASK [Get list of repo packages] ********************************************************************************************************************************************************** > fatal: [infra1_utility_container-5ec32cb5]: FAILED! => {"changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: ", "redirected": false, "status": -1, "url": "https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2F192.168.3.9%3A8181%2Fconstraints%2Fupper_constraints_cached.txt&data=05%7C01%7Cjames.denton%40rackspace.com%7Ca51d530625ae4bcaed1008da7f84329f%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637962503012704928%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=viIOEpR8sqc0TxoeDiYSMVEJeE%2FhkE7pxCubo49VsTQ%3D&reserved=0"} > > ---------------- > > 192.168.3.9 is the IP listed in user_variables.yml under haproxy_keepalived_internal_vip_cidr > > Any help or pointers would be very much appreciated! > > Thank you, > > Father Vlasie > From wchy1001 at gmail.com Wed Aug 17 01:49:04 2022 From: wchy1001 at gmail.com (W Ch) Date: Wed, 17 Aug 2022 09:49:04 +0800 Subject: [kolla-ansible][octavia] need networking help In-Reply-To: References: Message-ID: Hi: what i mean of 'external switch' is your physical switch. from you description, you just need to configure octavia_amp_network in global.yml. the following is example: please ensure you have set enable_neutron_provider_networks = True before running octavia. octavia_amp_network: name: lb-mgmt-net provider_network_type: vlan provider_segmentation_id: 1000 //vlan id, ensure your physical switch port which connected to 'neutron_external_interface' allows this vlan_id pass (trunk, allow 1000) provider_physical_network: physnet1 //default is physnet1, you can check this in '/etc/kolla/neutron-openvswitch-agent/openvswitch_agent.ini' external: false shared: false subnet: name: lb-mgmt-subnet cidr: "10.1.2.0/24" //this should be the network cidr of vlan 1000. allocation_pool_start: "10.1.2.100" allocation_pool_end: "10.1.2.200" gateway_ip: "10.1.2.1" //this is the gateway for vlan_1000 , most time, this is the vlan 1000 interface ip address in your physical switch. enable_dhcp: yes anyway, the goal is that a vm with octavia_amp_network network is able to access your octavia_network_interface. thanks Stuart Whitman ?2022?8?17??? 00:50??? > Hello, > > I prefer for kolla to create the bridge automatically. Each node has two > physical interfaces. The network_interface and neutron_external_interface > options are set in the inventory file. > > > you need set octavia_network_interface and configure external switch > properly > > I have octavia_network_interface set to "{{ api_interface }}" and > api_interface > is set to "{{ network_interface }}", the defaults. > > What do you mean by "configure external switch properly"? If you mean the > external option when creating OpenStack networks, then I used the defaults > in globals.yml which does not include that option. If you mean the physical > switch, I'm using a low-budget switch I had lying around that is not > configurable. > > Thanks for the help - everything else with kolla-ansible has been fairly > easy. 
> > -Stu > > ---- > > From: W Ch > Sent: Tuesday, August 16, 2022 5:32 AM > To: Stuart Whitman > Cc: openstack-discuss at lists.openstack.org < > openstack-discuss at lists.openstack.org> > Subject: Re: [kolla-ansible][octavia] need networking help > > Hi Stuart: > > Usually, you need to add a bridge to all network nodes, you can use > "ovs-vsctl add-br {br-name}" to add a ovs bridge, then you need to add a > physical port to that bridge by executing "ovs-vsctl add-port {bridge} > {port}". > another alternatives, you can append the physical port to > neutron_external_interface variable in globals.yml. in this case, kolla > will create the ovs bridge automatically. > both of them, you need set octavia_network_interface and configure > external switch properly. > > if you really don't know how this works, I propose you use > "octavia_network_type: tenant" ref: [0] , in this case, kolla-ansible > will setup the octavia management network for you, you don't need to do > anything. > > [0]: > https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html#development-or-testing > > thanks. > > > Stuart Whitman ?2022?8?16??? 06:28??? > Hello, > > I enabled Octavia on a kolla-ansible installed Openstack > cluster. When I try to launch a loadbalancer instance, the > octavia-worker.log file reports: > "WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] > Could not connect to instance." > > I researched enough to know that the problem has to do with networking > between the controller and the lb-mgmt-net network. I initially > overlooked this in the kolla-ansible Octavia documentation: > "If using a VLAN provider network, ensure that the traffic is also bridged > to Open vSwitch on the controllers." But, I don't know how to do it. > > Help to create the necessary bridge would be greatly appreciated. > > Thanks, > -Stu > > _____________________________________ > The information contained in this e-mail and any attachments from Group W > may contain confidential and/or proprietary information and is intended > only for the named recipient to whom it was originally addressed. If you > are not the intended recipient, be aware that any disclosure, distribution, > or copying of this e-mail or its attachments is strictly prohibited. If you > have received this e-mail in error, please notify the sender immediately of > that fact by return e-mail and permanently delete the e-mail and any > attachments to it. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wchy1001 at gmail.com Wed Aug 17 02:03:40 2022 From: wchy1001 at gmail.com (W Ch) Date: Wed, 17 Aug 2022 10:03:40 +0800 Subject: [kolla-ansible][octavia] need networking help In-Reply-To: References: Message-ID: HI: sorry, i forget a point in last reply. octavia-worker node also needs to access the vm in lb-mgmt-subnet. for above example. please try to ping 10.1.2.1(gateway) in octavia-worker nodes. thanks. W Ch ?2022?8?17??? 09:49??? > Hi: > > what i mean of 'external switch' is your physical switch. > from you description, you just need to configure octavia_amp_network in > global.yml. the following is example: > > please ensure you have set enable_neutron_provider_networks = True before > running octavia. 
octavia_amp_network:
  name: lb-mgmt-net
  provider_network_type: vlan
  provider_segmentation_id: 1000   # vlan id; ensure the physical switch port connected to 'neutron_external_interface' allows this vlan to pass (trunk, allow 1000)
  provider_physical_network: physnet1   # default is physnet1; you can check this in '/etc/kolla/neutron-openvswitch-agent/openvswitch_agent.ini'
  external: false
  shared: false
  subnet:
    name: lb-mgmt-subnet
    cidr: "10.1.2.0/24"   # this should be the network cidr of vlan 1000
    allocation_pool_start: "10.1.2.100"
    allocation_pool_end: "10.1.2.200"
    gateway_ip: "10.1.2.1"   # the gateway for vlan 1000; most of the time this is the vlan 1000 interface ip address on your physical switch
    enable_dhcp: yes

Anyway, the goal is that a VM attached to the octavia_amp_network network is able to reach your octavia_network_interface.

thanks

Stuart Whitman ?2022?8?17??? 00:50???
> Hello,
>
> I prefer for kolla to create the bridge automatically. Each node has two
> physical interfaces. The network_interface and neutron_external_interface
> options are set in the inventory file.
>
> > you need set octavia_network_interface and configure external switch
> properly
>
> I have octavia_network_interface set to "{{ api_interface }}" and
> api_interface
> is set to "{{ network_interface }}", the defaults.
>
> What do you mean by "configure external switch properly"? If you mean the
> external option when creating OpenStack networks, then I used the defaults
> in globals.yml which does not include that option. If you mean the physical
> switch, I'm using a low-budget switch I had lying around that is not
> configurable.
>
> Thanks for the help - everything else with kolla-ansible has been fairly
> easy.
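A quick way to sanity-check the example network above after deployment (a sketch; the names match the example, and the path of the admin credentials file is an assumption based on kolla-ansible defaults):

source /etc/kolla/admin-openrc.sh
openstack network show lb-mgmt-net
openstack subnet show lb-mgmt-subnet
# from the octavia-worker hosts the vlan gateway should answer:
ping -c 3 10.1.2.1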
I initially >> overlooked this in the kolla-ansible Octavia documentation: >> "If using a VLAN provider network, ensure that the traffic is also bridged >> to Open vSwitch on the controllers." But, I don't know how to do it. >> >> Help to create the necessary bridge would be greatly appreciated. >> >> Thanks, >> -Stu >> >> _____________________________________ >> The information contained in this e-mail and any attachments from Group W >> may contain confidential and/or proprietary information and is intended >> only for the named recipient to whom it was originally addressed. If you >> are not the intended recipient, be aware that any disclosure, distribution, >> or copying of this e-mail or its attachments is strictly prohibited. If you >> have received this e-mail in error, please notify the sender immediately of >> that fact by return e-mail and permanently delete the e-mail and any >> attachments to it. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.denton at rackspace.com Wed Aug 17 02:31:13 2022 From: james.denton at rackspace.com (James Denton) Date: Wed, 17 Aug 2022 02:31:13 +0000 Subject: [openstack-ansible] [yoga] utility_container failure In-Reply-To: References: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> Message-ID: Hello, >> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? This will likely be the bond0 interface and not the individual bond member. However, the interface defined here will ultimately depend on the networking of that host, and should be an external facing one (i.e. the interface with the default gateway). In many environments, you?ll have something like this (or using 2 bonds, but same idea): * Bond0 (192.168.100.5/24 gw 192.168.100.1) * Em49 * Em50 * Br-mgmt (172.29.236.5/22) * Bond0.236 * Br-vxlan (172.29.240.5/22) * Bond0.240 * Br-storage (172.29.244.5/22) * Bond0.244 In this example, bond0 has the management IP 192.168.100.5 and br-mgmt is the ?container? bridge with an IP configured from the ?container? network (see cidr_networks in openstack_user_config.yml). FYI: LXC containers will automatically be assigned IPs from the ?container? network outside of the ?used_ips? range(s). The infra host will communicate with the containers via this br-mgmt interface. I?m using FQDNs for the VIPs, which are specified in openstack_user_config.yml here: global_overrides: internal_lb_vip_address: internalapi.openstack.rackspace.lab external_lb_vip_address: publicapi.openstack.rackspace.lab To avoid DNS resolution issues internally (or rather, to ensure the IP is configured in the config files and not the domain name) I?ll override with the IP and hard set the preferred interface(s): haproxy_keepalived_external_vip_cidr: "192.168.100.10/32" haproxy_keepalived_internal_vip_cidr: "172.29.236.10/32" haproxy_keepalived_external_interface: bond0 haproxy_keepalived_internal_interface: br-mgmt haproxy_bind_external_lb_vip_address: 192.168.100.10 haproxy_bind_internal_lb_vip_address: 172.29.236.10 With the above configuration, keepalived will manage two VIPs - one external and one internal, and endpoints will have the FQDN rather than IP. >> Curl shows "503 Service Unavailable No server is available to handle this request? Hard to say without seeing logs why this is happening, but I will assume that keepalived is having issues binding the IP to the interface. You might find the reason in syslog or ?journalctl -xe -f -u keepalived?. 
>> Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." You might try running ?umount /var/www/repo? and re-run the repo-install.yml playbook (or setup-infrastructure.yml). Hope that helps! James Denton Rackspace Private Cloud From: Father Vlasie Date: Tuesday, August 16, 2022 at 4:31 PM To: James Denton Cc: openstack-discuss at lists.openstack.org Subject: Re: [openstack-ansible] [yoga] utility_container failure CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Hello, Thank you very much for the reply! haproxy and keepalived both show status active on infra1 (my primary node). Curl shows "503 Service Unavailable No server is available to handle this request? (Also the URL is http not https?.) If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? Earlier in the output I find the following error (showing for all 3 infra nodes): ------------ TASK [systemd_mount : Set the state of the mount] ***************************************************************************************************************************************** fatal: [infra3_repo_container-7ca5db88]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.022275", "end": "2022-08-16 14:16:34.926861", "msg": "non-zero return code", "rc": 1, "start": "2022-08-16 14:16:34.904586", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} ?????? Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." Thank you again! Father Vlasie > On Aug 16, 2022, at 6:32 AM, James Denton wrote: > > Hello, > > That error means the repo server at 192.168.3.9:8181 is unavailable. The repo server sits behind haproxy, which should be listening on 192.168.3.9 port 8181 on the active (primary) node. You can verify this by issuing a ?curl -v https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2F192.168.3.9%3A8181%2F%25E2%2580%2599&data=05%7C01%7Cjames.denton%40rackspace.com%7C17cc3373086d47b3321408da7fceb08a%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637962823085464366%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=PSsQW%2FW9N5JXapmMal%2FrsUm%2B8IYkDFTxwqxaH3K7tbA%3D&reserved=0. You might check the haproxy service status and/or keepalived status to ensure they are operating properly. If the IP cannot be bound to the correct interface, keepalive may not start. > > James Denton > Rackspace Private Cloud > > From: Father Vlasie > Date: Tuesday, August 16, 2022 at 7:38 AM > To: openstack-discuss at lists.openstack.org > Subject: [openstack-ansible] [yoga] utility_container failure > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! 
> > > Hello everyone, > > I have happily progressed to the second step of running the playbooks, namely "openstack-ansible setup-infrastructure.yml" > > Everything looks good except for just one error which is mystifying me: > > ---------------- > > TASK [Get list of repo packages] ********************************************************************************************************************************************************** > fatal: [infra1_utility_container-5ec32cb5]: FAILED! => {"changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: ", "redirected": false, "status": -1, "url": "https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2F192.168.3.9%3A8181%2Fconstraints%2Fupper_constraints_cached.txt&data=05%7C01%7Cjames.denton%40rackspace.com%7C17cc3373086d47b3321408da7fceb08a%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637962823085620584%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=XhtQdDC8GpQuJXGbQxhlBkUOS3krH%2F9d%2FxU9hWB1Cts%3D&reserved=0"} > > ---------------- > > 192.168.3.9 is the IP listed in user_variables.yml under haproxy_keepalived_internal_vip_cidr > > Any help or pointers would be very much appreciated! > > Thank you, > > Father Vlasie > -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Wed Aug 17 04:52:43 2022 From: akekane at redhat.com (Abhishek Kekane) Date: Wed, 17 Aug 2022 10:22:43 +0530 Subject: [glance] Antelope PTG Planning Message-ID: Hi everyone, Virtual PTG for the Antelope cycle will be held between 17th - 21st October [1]. I've also created an etherpad [2] to collect ideas/topics for the PTG sessions. If you have anything to discuss, please don't hesitate to write it there. [1] https://openinfra-ptg.eventbrite.com/ [2] https://etherpad.opendev.org/p/antelope-ptg-glance-planning Thanks, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Wed Aug 17 07:36:52 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Wed, 17 Aug 2022 09:36:52 +0200 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: References: <21589662.aDxSllVl8Y@p1> Message-ID: <6101105.DvuYhMxLoT@p1> Hi, Dnia poniedzia?ek, 8 sierpnia 2022 17:37:05 CEST Dmitriy Rabotyagov pisze: > Hey > > At the very least in OpenStack-Ansible we already handle that case, > and have overwritten heartbeat_in_pthread for non-UWSGI services, > which is already in stable branches. So backporting this new default > setting would make us revert this patch and apply a set of new ones > for uWSGI which is kind of nasty thing to do on stable branches. I'm not sure I understand why You would need to revert changes in openstack-ansible if that oslo.messaging patch would be merged. It's "just" default value which would be changed but OSA can still configure it explicitly to the same value as default if needed, right? > > IIRC (can be wrong here), kolla-ansible and TripleO also adopted such > changes in their codebase. So with quite high probability, if you use > any deployment tooling, this should be already handled relatively > well. As Sean already mentioned, Tripleo didn't set this value and is using default. That's why it hit us now. 
Takashi proposed patch for it: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/852713 > > We also can post a release note to stable branches about "known issue" > instead of backporting a new default. > > ??, 8 ???. 2022 ?. ? 12:46, Rados?aw Piliszek : > > > > Hi all, > > > > May this config option support "auto" by default and autodetect > > whether the application is running under mod_wsgi (and uwsgi if it > > also has the issue with green threads but here I'm not really sure...) > > and then decide on the best option? > > This way I would consider this backporting a fix (i.e. the library > > tries better to work in the target environment). > > > > As a final thought, bear in mind there are operators who have already > > overwritten the default, the deployment projects can help as well. > > > > -yoctozepto > > > > On Mon, 8 Aug 2022 at 10:30, Rodolfo Alonso Hernandez > > wrote: > > > > > > Hello all: > > > > > > I understand that by default we don't allow backporting a config knob default value. But I'm with Sean and his explanation. For "uwsgi" applications, if pthread is False, the only drawback will be the reconnection of the MQ socket. But in the case described by Slawek, the problem is more relevant because once the agent has been disconnected for a long time from the MQ, it is not possible to reconnect again and the agent needs to be manually restarted. I would backport the patch setting this config knob to False. > > > > > > Regards. > > > > > > > > > On Sat, Aug 6, 2022 at 12:08 AM Sean Mooney wrote: > > >> > > >> On Fri, Aug 5, 2022 at 7:40 PM Ghanshyam Mann wrote: > > >> > > > >> > ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote --- > > >> > > Hi, > > >> > > > > >> > > Some time ago oslo.messaging changed default value of the "heartbeat_in_pthread" config option to "True" [1]. > > >> > > As was noticed some time ago, this don't works well with nova-compute - see bug [2] for details. > > >> > > Recently we noticed in our downstream Red Hat OpenStack, that it's not only nova-compute which don't works well with it and can hangs. We saw the same issue in various neutron agent processes. And it seems that it can be the same for any non-wsgi service which is using rabbitmq to send heartbeats. > > >> > > So giving all of that, I just proposed change of the default value of that config option to be "False" again [3]. > > >> > > And my question is - would it be possible and acceptable to backport such change up to stable/wallaby (if and when it will be approved for master of course). IMO this could be useful for users as using this option set as "True" be default don't makes any sense for the non-wsgi applications really and may cause more bad then good things really. What are You opinions about it? > > >> > > > >> > This is tricky, in general the default value change should not be backported because it change > > >> > the default behavior and so does the compatibility. But along with considering the cases do not > > >> > work with the current default value (you mentioned in this email), we should consider if this worked > > >> > in any other case or not. If so then I think we should not backport this and tell operator to override > > >> > it to False as workaround for stable branch fixes. 
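As a concrete illustration of that workaround, the override would look like this in the affected non-wsgi service's configuration file (section and option names as used by oslo.messaging):

[oslo_messaging_rabbit]
heartbeat_in_pthread = false

Setting it explicitly pins the behaviour regardless of which default the installed oslo.messaging release ships with.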
> > >> as afar as i am aware the only impact of setting the default to false > > >> for wsgi applications is > > >> running under mod_wsgi or uwsgi may have the heatbeat greenthread > > >> killed when the wsgi server susspand the application > > >> after a time out following the processing of an api request. > > >> > > >> there is no known negitive impact to this other then a log message > > >> that can safely be ignored on both rabbitmq and the api log relating > > >> to the amqp messing connection being closed and repopend. > > >> > > >> keeping the value at true can cause the nova compute agent, neutron > > >> agent and i susppoct nova conductor/schduler to hang following a > > >> rabbitmq disconnect. > > >> that can leave the relevnet service unresponcei until its restarted. > > >> > > >> so having the default set to true is known to breake several services > > >> but tehre are no know issue that are caused by setting it to false > > >> that impact the operation fo any service. > > >> > > >> so i have a stong preference for setting thsi to false by default on > > >> stable branches. > > >> > > > >> > -gmann > > >> > > > >> > > > > >> > > [1] https://review.opendev.org/c/openstack/oslo.messaging/+/747395 > > >> > > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 > > >> > > [3] https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ > > >> > > > > >> > > -- > > >> > > Slawek Kaplonski > > >> > > Principal Software Engineer > > >> > > Red Hat > > >> > > > >> > > >> > > > > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From ekuvaja at redhat.com Wed Aug 17 11:25:39 2022 From: ekuvaja at redhat.com (Erno Kuvaja) Date: Wed, 17 Aug 2022 12:25:39 +0100 Subject: [oslo][stable] Backport of the default value of the config option change In-Reply-To: <6101105.DvuYhMxLoT@p1> References: <21589662.aDxSllVl8Y@p1> <6101105.DvuYhMxLoT@p1> Message-ID: On Wed, 17 Aug 2022 at 08:56, Slawek Kaplonski wrote: > Hi, > > Dnia poniedzia?ek, 8 sierpnia 2022 17:37:05 CEST Dmitriy Rabotyagov pisze: > > Hey > > > > At the very least in OpenStack-Ansible we already handle that case, > > and have overwritten heartbeat_in_pthread for non-UWSGI services, > > which is already in stable branches. So backporting this new default > > setting would make us revert this patch and apply a set of new ones > > for uWSGI which is kind of nasty thing to do on stable branches. > > I'm not sure I understand why You would need to revert changes in > openstack-ansible if that oslo.messaging patch would be merged. It's "just" > default value which would be changed but OSA can still configure it > explicitly to the same value as default if needed, right? > > > > > IIRC (can be wrong here), kolla-ansible and TripleO also adopted such > > changes in their codebase. So with quite high probability, if you use > > any deployment tooling, this should be already handled relatively > > well. > > As Sean already mentioned, Tripleo didn't set this value and is using > default. That's why it hit us now. > Takashi proposed patch for it: > https://review.opendev.org/c/openstack/tripleo-heat-templates/+/852713 > > > > > We also can post a release note to stable branches about "known issue" > > instead of backporting a new default. > > > > ??, 8 ???. 2022 ?. ? 
12:46, Rados?aw Piliszek < > radoslaw.piliszek at gmail.com>: > > > > > > Hi all, > > > > > > May this config option support "auto" by default and autodetect > > > whether the application is running under mod_wsgi (and uwsgi if it > > > also has the issue with green threads but here I'm not really sure...) > > > and then decide on the best option? > > > This way I would consider this backporting a fix (i.e. the library > > > tries better to work in the target environment). > > > > > > As a final thought, bear in mind there are operators who have already > > > overwritten the default, the deployment projects can help as well. > > > > > > -yoctozepto > > > > > > On Mon, 8 Aug 2022 at 10:30, Rodolfo Alonso Hernandez > > > wrote: > > > > > > > > Hello all: > > > > > > > > I understand that by default we don't allow backporting a config > knob default value. But I'm with Sean and his explanation. For "uwsgi" > applications, if pthread is False, the only drawback will be the > reconnection of the MQ socket. But in the case described by Slawek, the > problem is more relevant because once the agent has been disconnected for a > long time from the MQ, it is not possible to reconnect again and the agent > needs to be manually restarted. I would backport the patch setting this > config knob to False. > > > > > > > > Regards. > > > > > > > > > > > > On Sat, Aug 6, 2022 at 12:08 AM Sean Mooney > wrote: > > > >> > > > >> On Fri, Aug 5, 2022 at 7:40 PM Ghanshyam Mann < > gmann at ghanshyammann.com> wrote: > > > >> > > > > >> > ---- On Fri, 05 Aug 2022 17:54:25 +0530 Slawek Kaplonski wrote > --- > > > >> > > Hi, > > > >> > > > > > >> > > Some time ago oslo.messaging changed default value of the > "heartbeat_in_pthread" config option to "True" [1]. > > > >> > > As was noticed some time ago, this don't works well with > nova-compute - see bug [2] for details. > > > >> > > Recently we noticed in our downstream Red Hat OpenStack, that > it's not only nova-compute which don't works well with it and can hangs. We > saw the same issue in various neutron agent processes. And it seems that it > can be the same for any non-wsgi service which is using rabbitmq to send > heartbeats. > > > >> > > So giving all of that, I just proposed change of the default > value of that config option to be "False" again [3]. > > > >> > > And my question is - would it be possible and acceptable to > backport such change up to stable/wallaby (if and when it will be approved > for master of course). IMO this could be useful for users as using this > option set as "True" be default don't makes any sense for the non-wsgi > applications really and may cause more bad then good things really. What > are You opinions about it? > > > >> > > > > >> > This is tricky, in general the default value change should not be > backported because it change > > > >> > the default behavior and so does the compatibility. But along > with considering the cases do not > > > >> > work with the current default value (you mentioned in this > email), we should consider if this worked > > > >> > in any other case or not. If so then I think we should not > backport this and tell operator to override > > > >> > it to False as workaround for stable branch fixes. 
> > > >> as afar as i am aware the only impact of setting the default to > false > > > >> for wsgi applications is > > > >> running under mod_wsgi or uwsgi may have the heatbeat greenthread > > > >> killed when the wsgi server susspand the application > > > >> after a time out following the processing of an api request. > > > >> > > > >> there is no known negitive impact to this other then a log message > > > >> that can safely be ignored on both rabbitmq and the api log relating > > > >> to the amqp messing connection being closed and repopend. > > > >> > > > >> keeping the value at true can cause the nova compute agent, neutron > > > >> agent and i susppoct nova conductor/schduler to hang following a > > > >> rabbitmq disconnect. > > > >> that can leave the relevnet service unresponcei until its restarted. > > > >> > > > >> so having the default set to true is known to breake several > services > > > >> but tehre are no know issue that are caused by setting it to false > > > >> that impact the operation fo any service. > > > >> > > > >> so i have a stong preference for setting thsi to false by default on > > > >> stable branches. > > > >> > > > > >> > -gmann > > > >> > > > > >> > > > > > >> > > [1] > https://review.opendev.org/c/openstack/oslo.messaging/+/747395 > > > >> > > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937 > > > >> > > [3] > https://review.opendev.org/c/openstack/oslo.messaging/+/852251/ > > > >> > > > > > >> > > -- > > > >> > > Slawek Kaplonski > > > >> > > Principal Software Engineer > > > >> > > Red Hat > > > >> > > > > >> > > > >> > > > > > > > > > > -- > Slawek Kaplonski > Principal Software Engineer > Red Hat > Hi All, +1 from me too for reverting the default value. Sounds like a nasty side effect with low risk and minor impact (sorry OSA) to fix. Slawek: I assume OSA did not decide to set it always explicitly but only when deploying non-wsgi services so they will need to flip the logic around if the default changes, which is indeed unfortunate. (Set it for wsgi leave default otherwise from leave it default for wsgi, set false if not) IMHO your "auto" proposal might be beneficial for future development but I'd be way more hesitant to greenlight backport for a new feature with new possible value which widens the risk surface quite significantly than reverting default value change that broke things. Thanks all for good conversation! - jokke -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Wed Aug 17 13:23:10 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 17 Aug 2022 10:23:10 -0300 Subject: [cinder] Bug deputy report for week of 08-17-2022 Message-ID: This is a bug report from 08-10-2022 to 08-17-2022. Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting ----------------------------------------------------------------------------------------- Medium - https://bugs.launchpad.net/cinder/+bug/1985962 "Wrong volume_quota exception msg when extending volume_size." Fix proposed to master. Low - https://bugs.launchpad.net/cinder/+bug/1985065 "[Storwize] mkhost command failure". Fix proposed to master. - https://bugs.launchpad.net/cinder/+bug/1986658 "NetApp driver is hitting the QoS policy limit due soft deletion." Fix proposed to master. Cheers, Sofia -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From the.wade.albright at gmail.com Tue Aug 16 16:22:24 2022 From: the.wade.albright at gmail.com (Wade Albright) Date: Tue, 16 Aug 2022 09:22:24 -0700 Subject: [ironic][xena] problems updating redfish_password for existing node In-Reply-To: References: Message-ID: Thanks Julia! When I get a chance I will test that out and report back. I may not be able to get to it right away as the system where I could reproduce this issue reliably is in use by another team now. On Mon, Aug 15, 2022 at 5:27 PM Julia Kreger wrote: > Well, that is weird. If I grok it as-is, it almost looks like the BMC > returned an empty response.... It is not a failure mode we've seen or > had reported (afaik) up to this point. > > That being said, we should be able to invalidate the session and > launch a new client... > > I suspect https://review.opendev.org/c/openstack/sushy/+/853209 > should fix things up. I suspect we will look at just invalidating the > session upon any error. > > On Mon, Aug 15, 2022 at 11:47 AM Wade Albright > wrote: > > > > 1) There are sockets open briefly when the conductor is trying to > connect. After three tries the node is set to maintenance mode and there > are no more sockets open. > > 2) My (extremely simple) code was not using connection: close. I was > just running "requests.get(" > https://10.12.104.174/redfish/v1/Systems/System.Embedded.1", > verify=False, auth=('xxxx', 'xxxx'))" in a loop. I just tried it with > headers={'Connection':'close'} and it doesn't seem to make any difference. > Works fine either way. > > > > I was able to confirm that the problem only happens when using session > auth. With basic auth it doesn't happen. > > > > Versions I'm using here are ironic 18.2.1 and sushy 3.12.2. > > > > Here are some fresh logs from the node having the problem: > > > > 2022-08-15 10:34:21.726 208875 INFO ironic.conductor.task_manager > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploying" > from state "active"; target provision state is "active" > > 2022-08-15 10:34:22.553 208875 INFO ironic.conductor.utils > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 current power state is 'power on', > requested state is 'power off'. > > 2022-08-15 10:34:35.185 208875 INFO ironic.conductor.utils > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Successfully set node > 0c304cea-8ae2-4a12-b658-dec05c190f88 power state to power off by power off. > > 2022-08-15 10:34:35.200 208875 WARNING ironic.common.pxe_utils > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] IPv6 is enabled and the > DHCP driver appears set to a plugin aside from "neutron". Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 may not receive proper DHCPv6 provided > boot parameters. 
> > 2022-08-15 10:34:38.246 208875 INFO ironic.conductor.deployments > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Deploying on node > 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': > 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'}, > {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': > 'deploy'}, {'step': 'write_image', 'priority': 80, 'argsinfo': None, > 'interface': 'deploy'}, {'step': 'tear_down_agent', 'priority': 40, > 'argsinfo': None, 'interface': 'deploy'}, {'step': > 'switch_to_tenant_network', 'priority': 30, 'argsinfo': None, 'interface': > 'deploy'}, {'step': 'boot_instance', 'priority': 20, 'argsinfo': None, > 'interface': 'deploy'}] > > 2022-08-15 10:34:38.255 208875 INFO ironic.conductor.deployments > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Executing {'step': > 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'} on > node 0c304cea-8ae2-4a12-b658-dec05c190f88 > > 2022-08-15 10:35:27.158 208875 INFO > ironic.drivers.modules.ansible.deploy > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Ansible pre-deploy step > complete on node 0c304cea-8ae2-4a12-b658-dec05c190f88 > > 2022-08-15 10:35:27.159 208875 INFO ironic.conductor.deployments > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 finished deploy step {'step': > 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'} > > 2022-08-15 10:35:27.160 208875 INFO ironic.conductor.deployments > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Deploying on node > 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': 'deploy', > 'priority': 100, 'argsinfo': None, 'interface': 'deploy'}, {'step': > 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}, > {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface': > 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo': > None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20, > 'argsinfo': None, 'interface': 'deploy'}] > > 2022-08-15 10:35:27.176 208875 INFO ironic.conductor.deployments > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Executing {'step': > 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node > 0c304cea-8ae2-4a12-b658-dec05c190f88 > > 2022-08-15 10:35:32.037 208875 INFO ironic.conductor.utils > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Successfully set node > 0c304cea-8ae2-4a12-b658-dec05c190f88 power state to power on by rebooting. > > 2022-08-15 10:35:32.037 208875 INFO ironic.conductor.deployments > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Deploy step {'step': > 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node > 0c304cea-8ae2-4a12-b658-dec05c190f88 being executed asynchronously, waiting > for driver. 
> > 2022-08-15 10:35:32.051 208875 INFO ironic.conductor.task_manager > [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 > b679510ddb6540ca9454e26841f65c89 - default default] Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "wait > call-back" from state "deploying"; target provision state is "active" > > 2022-08-15 10:39:54.726 208875 INFO ironic.conductor.task_manager > [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploying" > from state "wait call-back"; target provision state is "active" > > 2022-08-15 10:39:54.741 208875 INFO ironic.conductor.deployments > [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Deploying on node > 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': > 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}, > {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface': > 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo': > None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20, > 'argsinfo': None, 'interface': 'deploy'}] > > 2022-08-15 10:39:54.748 208875 INFO ironic.conductor.deployments > [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Executing {'step': > 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} on > node 0c304cea-8ae2-4a12-b658-dec05c190f88 > > 2022-08-15 10:42:24.738 208875 WARNING ironic.drivers.modules.agent_base > [-] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping > heartbeat processing (will retry on the next heartbeat): > ironic.common.exception.NodeLocked: Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host > sjc06-c01-irn01.ops.ringcentral.com, please retry after the current > operation is completed. > > 2022-08-15 10:44:29.788 208875 WARNING ironic.drivers.modules.agent_base > [-] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping > heartbeat processing (will retry on the next heartbeat): > ironic.common.exception.NodeLocked: Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host > sjc06-c01-irn01.ops.ringcentral.com, please retry after the current > operation is completed. > > 2022-08-15 10:47:24.830 208875 WARNING ironic.drivers.modules.agent_base > [-] Node 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping > heartbeat processing (will retry on the next heartbeat): > ironic.common.exception.NodeLocked: Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host > sjc06-c01-irn01.ops.ringcentral.com, please retry after the current > operation is completed. 
> > 2022-08-15 11:05:59.544 208875 INFO > ironic.drivers.modules.ansible.deploy > [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Ansible complete > deploy on node 0c304cea-8ae2-4a12-b658-dec05c190f88 > > 2022-08-15 11:06:00.141 208875 ERROR ironic.conductor.utils > [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 failed deploy step {'step': > 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} > with unexpected error: ("Connection broken: InvalidChunkLength(got length > b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): > requests.exceptions.ChunkedEncodingError: ("Connection broken: > InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got > length b'', 0 bytes read)) > > 2022-08-15 11:06:00.218 208875 ERROR ironic.conductor.task_manager > [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node > 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploy > failed" from state "deploying"; target provision state is "active": > requests.exceptions.ChunkedEncodingError: ("Connection broken: > InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got > length b'', 0 bytes read)) > > 2022-08-15 11:06:28.774 208875 WARNING ironic.conductor.manager > [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During > sync_power_state, could not get power state for node > 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 1 of 3. Error: ("Connection > broken: InvalidChunkLength(got length b'', 0 bytes read)", > InvalidChunkLength(got length b'', 0 bytes read)).: > requests.exceptions.ChunkedEncodingError: ("Connection broken: > InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got > length b'', 0 bytes read)) > > 2022-08-15 11:06:53.710 208875 WARNING ironic.conductor.manager > [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During > sync_power_state, could not get power state for node > 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 2 of 3. Error: ("Connection > broken: InvalidChunkLength(got length b'', 0 bytes read)", > InvalidChunkLength(got length b'', 0 bytes read)).: > requests.exceptions.ChunkedEncodingError: ("Connection broken: > InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got > length b'', 0 bytes read)) > > 2022-08-15 11:07:53.727 208875 WARNING ironic.conductor.manager > [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During > sync_power_state, could not get power state for node > 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 3 of 3. Error: ("Connection > broken: InvalidChunkLength(got length b'', 0 bytes read)", > InvalidChunkLength(got length b'', 0 bytes read)).: > requests.exceptions.ChunkedEncodingError: ("Connection broken: > InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got > length b'', 0 bytes read)) > > 2022-08-15 11:08:53.704 208875 ERROR ironic.conductor.manager > [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During > sync_power_state, max retries exceeded for node > 0c304cea-8ae2-4a12-b658-dec05c190f88, node state None does not match > expected state 'power on'. Updating DB state to 'None' Switching node to > maintenance mode. 
Error: ("Connection broken: InvalidChunkLength(got length > b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): > requests.exceptions.ChunkedEncodingError: ("Connection broken: > InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got > length b'', 0 bytes read)) > > 2022-08-15 11:13:53.750 208875 ERROR ironic.conductor.manager > [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During > sync_power_state, max retries exceeded for node > 0c304cea-8ae2-4a12-b658-dec05c190f88, node state None does not match > expected state 'None'. Updating DB state to 'None' Switching node to > maintenance mode. Error: ("Connection broken: InvalidChunkLength(got length > b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): > requests.exceptions.ChunkedEncodingError: ("Connection broken: > InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got > length b'', 0 bytes read)) > > > > > > On Fri, Aug 12, 2022 at 9:06 PM Julia Kreger < > juliaashleykreger at gmail.com> wrote: > >> > >> Two questions: > >> > >> 1) do you see open sockets to the BMCs in netstat output? > >> 2) is your code using ?connection: close?? Or are you using sushy? > >> > >> Honestly, this seems *really* weird with current sushy versions, and is > kind of reminiscent of a cached session which is using kept alive sockets. > >> > >> If you could grep out req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 to see > what the prior couple of conductor actions were, that would give us better > context as to what is going on. > >> > >> -Julia > >> > >> On Fri, Aug 12, 2022 at 3:11 PM Wade Albright < > the.wade.albright at gmail.com> wrote: > >>> > >>> Sorry for the spam. The openssl issue may have been a red herring. I > am not able to reproduce the issue directly with my own python code. I was > trying to fetch something that required authentication. After I added the > correct auth info it works fine. I am not able to cause the same error as > is happening in the Ironic logs. > >>> > >>> Anyway I'll do some more testing and report back. > >>> > >>> On Fri, Aug 12, 2022 at 2:14 PM Wade Albright < > the.wade.albright at gmail.com> wrote: > >>>> > >>>> I'm not sure why this problem only now started showing up, but it > appears to be unrelated to Ironic. I was able to reproduce it directly > outside of Ironic using a simple python program using urllib to get URLs > from the BMC/redfish interface. Seems to be some combination of a buggy > server SSL implementation and newer openssl 1.1.1. Apparently it doesn't > happen using openssl 1.0. > >>>> > >>>> I've found some information about possible workarounds but haven't > figured it out yet. If I do I'll update this thread just in case anyone > else runs into it. > >>>> > >>>> On Fri, Aug 12, 2022 at 8:13 AM Wade Albright < > the.wade.albright at gmail.com> wrote: > >>>>> > >>>>> So I seem to have run into a new issue after upgrading to the newer > versions to fix the password change issue. > >>>>> > >>>>> Now I am randomly getting errors like the below. Once I hit this > error for a given node, no operations work on the node. I thought maybe it > was an issue with the node itself, but it doesn't seem like it. The BMC > seems to be working fine. > >>>>> > >>>>> After a conductor restart, things start working again. Has anyone > seen something like this? 
> >>>>> > >>>>> Log example: > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils [- - - > - -] Node ef5a2502-680b-4933-a0ee-6737e57ce1c5 failed deploy step {'step': > 'write_image', 'priority': > >>>>> 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: > ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", > InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions. > >>>>> ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got > length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes > read)) > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > Traceback (most recent call last): > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 697, in > _update_chunk_length > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > self.chunk_left = int(line, 16) > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > ValueError: invalid literal for int() with base 16: b'' > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During > handling of the above exception, another exception occurred: > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > Traceback (most recent call last): > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 438, in > _error_catcher > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > yield > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 764, in > read_chunked > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > self._update_chunk_length() > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 701, in > _update_chunk_length > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > raise InvalidChunkLength(self, line) > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 > bytes read) > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During > handling of the above exception, another exception occurred: > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > Traceback (most recent call last): > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/requests/models.py", line 760, in > generate > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for > chunk in self.raw.stream(chunk_size, decode_content=True): > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 572, in > stream > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils for > line in self.read_chunked(amt, decode_content=decode_content): > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", 
line 793, in > read_chunked > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > self._original_response.close() > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__ > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > self.gen.throw(type, value, traceback) > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File > "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 455, in > _error_catcher > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > raise ProtocolError("Connection broken: %r" % e, e) > >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils > urllib3.exceptions.ProtocolError: ("Connection broken: > InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got > length b'', 0 bytes r > >>>>> ead)) > >>>>> > >>>>> On Wed, Jul 20, 2022 at 2:04 PM Wade Albright < > the.wade.albright at gmail.com> wrote: > >>>>>> > >>>>>> I forgot to mention, that using session auth solved the problem > after upgrading to the newer versions that include the two mentioned > patches. > >>>>>> > >>>>>> On Wed, Jul 20, 2022 at 7:36 AM Wade Albright < > the.wade.albright at gmail.com> wrote: > >>>>>>> > >>>>>>> Switching to session auth solved the problem, and it seems like > the better way to go anyway for equipment that supports it. Thanks again > for all your help! > >>>>>>> > >>>>>>> Wade > >>>>>>> > >>>>>>> On Tue, Jul 19, 2022 at 5:37 PM Julia Kreger < > juliaashleykreger at gmail.com> wrote: > >>>>>>>> > >>>>>>>> Just to provide a brief update for the mailing list. It looks like > >>>>>>>> this is a case of use of Basic Auth with the BMC, where we were > not > >>>>>>>> catching the error properly... and thus not reporting the > >>>>>>>> authentication failure to ironic so it would catch, and initiate > a new > >>>>>>>> client with the most up to date password. The default, typically > used > >>>>>>>> path is Session based authentication as BMCs generally handle > internal > >>>>>>>> session/user login tracking in a far better fashion. But not > every BMC > >>>>>>>> supports sessions. > >>>>>>>> > >>>>>>>> Fix in review[0] :) > >>>>>>>> > >>>>>>>> -Julia > >>>>>>>> [0] https://review.opendev.org/c/openstack/sushy/+/850425 > >>>>>>>> > >>>>>>>> On Mon, Jul 18, 2022 at 4:15 PM Julia Kreger > >>>>>>>> wrote: > >>>>>>>> > > >>>>>>>> > Excellent, hopefully I'll be able to figure out why Sushy is > not doing > >>>>>>>> > the needful... Or if it is and Ironic is not picking up on it. > >>>>>>>> > > >>>>>>>> > Anyway, I've posted > >>>>>>>> > https://review.opendev.org/c/openstack/ironic/+/850259 which > might > >>>>>>>> > handle this issue. Obviously a work in progress, but it > represents > >>>>>>>> > what I think is happening inside of ironic itself leading into > sushy > >>>>>>>> > when cache access occurs. > >>>>>>>> > > >>>>>>>> > On Mon, Jul 18, 2022 at 4:04 PM Wade Albright > >>>>>>>> > wrote: > >>>>>>>> > > > >>>>>>>> > > Sounds good, I will do that tomorrow. Thanks Julia. > >>>>>>>> > > > >>>>>>>> > > On Mon, Jul 18, 2022 at 3:27 PM Julia Kreger < > juliaashleykreger at gmail.com> wrote: > >>>>>>>> > >> > >>>>>>>> > >> Debug would be best. I think I have an idea what is going > on, and this > >>>>>>>> > >> is a similar variation. If you want, you can email them > directly to > >>>>>>>> > >> me. 
Specifically only need entries reported by the sushy > library and > >>>>>>>> > >> ironic.drivers.modules.redfish.utils. > >>>>>>>> > >> > >>>>>>>> > >> On Mon, Jul 18, 2022 at 3:20 PM Wade Albright > >>>>>>>> > >> wrote: > >>>>>>>> > >> > > >>>>>>>> > >> > I'm happy to supply some logs, what verbosity level should > i use? And should I just embed the logs in email to the list or upload > somewhere? > >>>>>>>> > >> > > >>>>>>>> > >> > On Mon, Jul 18, 2022 at 3:14 PM Julia Kreger < > juliaashleykreger at gmail.com> wrote: > >>>>>>>> > >> >> > >>>>>>>> > >> >> If you could supply some conductor logs, that would be > helpful. It > >>>>>>>> > >> >> should be re-authenticating, but obviously we have a > larger bug there > >>>>>>>> > >> >> we need to find the root issue behind. > >>>>>>>> > >> >> > >>>>>>>> > >> >> On Mon, Jul 18, 2022 at 3:06 PM Wade Albright > >>>>>>>> > >> >> wrote: > >>>>>>>> > >> >> > > >>>>>>>> > >> >> > I was able to use the patches to update the code, but > unfortunately the problem is still there for me. > >>>>>>>> > >> >> > > >>>>>>>> > >> >> > I also tried an RPM upgrade to the versions Julia > mentioned had the fixes, namely Sushy 3.12.1 - Released May 2022 and Ironic > 18.2.1 - Released in January 2022. But it did not fix the problem. > >>>>>>>> > >> >> > > >>>>>>>> > >> >> > I am able to consistently reproduce the error. > >>>>>>>> > >> >> > - step 1: change BMC password directly on the node > itself > >>>>>>>> > >> >> > - step 2: update BMC password (redfish_password) in > ironic with 'openstack baremetal node set --driver-info > redfish_password='newpass' > >>>>>>>> > >> >> > > >>>>>>>> > >> >> > After step 1 there are errors in the logs entries like > "Session authentication appears to have been lost at some point in time" > and eventually it puts the node into maintenance mode and marks the power > state as "none." > >>>>>>>> > >> >> > After step 2 and taking the host back out of > maintenance mode, it goes through a similar set of log entries puts the > node into MM again. > >>>>>>>> > >> >> > > >>>>>>>> > >> >> > After the above steps, a conductor restart fixes the > problem and operations work normally again. Given this it seems like there > is still some kind of caching issue. > >>>>>>>> > >> >> > > >>>>>>>> > >> >> > On Sat, Jul 16, 2022 at 6:01 PM Wade Albright < > the.wade.albright at gmail.com> wrote: > >>>>>>>> > >> >> >> > >>>>>>>> > >> >> >> Hi Julia, > >>>>>>>> > >> >> >> > >>>>>>>> > >> >> >> Thank you so much for the reply! Hopefully this is the > issue. I'll try out the patches next week and report back. I'll also email > you on Monday about the versions, that would be very helpful to know. > >>>>>>>> > >> >> >> > >>>>>>>> > >> >> >> Thanks again, really appreciate it. > >>>>>>>> > >> >> >> > >>>>>>>> > >> >> >> Wade > >>>>>>>> > >> >> >> > >>>>>>>> > >> >> >> > >>>>>>>> > >> >> >> > >>>>>>>> > >> >> >> On Sat, Jul 16, 2022 at 4:36 PM Julia Kreger < > juliaashleykreger at gmail.com> wrote: > >>>>>>>> > >> >> >>> > >>>>>>>> > >> >> >>> Greetings! > >>>>>>>> > >> >> >>> > >>>>>>>> > >> >> >>> I believe you need two patches, one in ironic and one > in sushy. 
> >>>>>>>> > >> >> >>> > >>>>>>>> > >> >> >>> Sushy: > >>>>>>>> > >> >> >>> https://review.opendev.org/c/openstack/sushy/+/832860 > >>>>>>>> > >> >> >>> > >>>>>>>> > >> >> >>> Ironic: > >>>>>>>> > >> >> >>> > https://review.opendev.org/c/openstack/ironic/+/820588 > >>>>>>>> > >> >> >>> > >>>>>>>> > >> >> >>> I think it is variation, and the comment about > working after you restart the conductor is the big signal to me. I?m on a > phone on a bad data connection, if you email me on Monday I can see what > versions the fixes would be in. > >>>>>>>> > >> >> >>> > >>>>>>>> > >> >> >>> For the record, it is a session cache issue, the bug > was that the service didn?t quite know what to do when auth fails. > >>>>>>>> > >> >> >>> > >>>>>>>> > >> >> >>> -Julia > >>>>>>>> > >> >> >>> > >>>>>>>> > >> >> >>> > >>>>>>>> > >> >> >>> On Fri, Jul 15, 2022 at 2:55 PM Wade Albright < > the.wade.albright at gmail.com> wrote: > >>>>>>>> > >> >> >>>> > >>>>>>>> > >> >> >>>> Hi, > >>>>>>>> > >> >> >>>> > >>>>>>>> > >> >> >>>> I'm hitting a problem when trying to update the > redfish_password for an existing node. I'm curious to know if anyone else > has encountered this problem. I'm not sure if I'm just doing something > wrong or if there is a bug. Or if the problem is unique to my setup. > >>>>>>>> > >> >> >>>> > >>>>>>>> > >> >> >>>> I have a node already added into ironic with all the > driver details set, and things are working fine. I am able to run > deployments. > >>>>>>>> > >> >> >>>> > >>>>>>>> > >> >> >>>> Now I need to change the redfish password on the > host. So I update the password for redfish access on the host, then use an > 'openstack baremetal node set --driver-info > redfish_password=' command to set the new redfish_password. > >>>>>>>> > >> >> >>>> > >>>>>>>> > >> >> >>>> Once this has been done, deployment no longer works. > I see redfish authentication errors in the logs and the operation fails. I > waited a bit to see if there might just be a delay in updating the > password, but after awhile it still didn't work. > >>>>>>>> > >> >> >>>> > >>>>>>>> > >> >> >>>> I restarted the conductor, and after that things > work fine again. So it seems like the password is cached or something. Is > there a way to force the password to update? I even tried removing the > redfish credentials and re-adding them, but that didn't work either. Only a > conductor restart seems to make the new password work. > >>>>>>>> > >> >> >>>> > >>>>>>>> > >> >> >>>> We are running Xena, using rpm installation on > Oracle Linux 8.5. > >>>>>>>> > >> >> >>>> > >>>>>>>> > >> >> >>>> Thanks in advance for any help with this issue. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Wed Aug 17 19:38:52 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 17 Aug 2022 15:38:52 -0400 Subject: [cinder] propose to EOL cinderlib train and ussuri Message-ID: <0df831c8-bf55-4703-9dac-1fdf2dc5a6ae@gmail.com> At last week's cinder project midcycle [0], the team discussed recent fixes to keep the cinderlib CI functional in the oldest stable branches. At this point, stable/train is running the bare minimum of CI to keep the branch open, and stable/ussuri is running only one functional job in excess of the bare minimum. The only changes merged into either branch since each was tagged -em have been non-functional, so we believe that there is no demand for keeping the branches open. 
Thus, the following patches to EOL cinderlib train and ussuri have been posted: - https://review.opendev.org/c/openstack/releases/+/853534 - https://review.opendev.org/c/openstack/releases/+/853535 This email serves as notice of the intent of the cinder project to EOL the cinderlib train and ussuri branches. If you have comments or concerns, please reply to this email or leave a comment on the appropriate patch. [0] https://etherpad.opendev.org/p/cinder-zed-midcycles From rosmaita.fossdev at gmail.com Wed Aug 17 20:32:32 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 17 Aug 2022 16:32:32 -0400 Subject: [cinder][drivers] third-party CI for os-brick changes Message-ID: To all third-party CI maintainers, As you are aware, cinder third-party CI systems are required to run on all cinder changes. However, the os-brick library used in cinder CI testing is the latest appropriate *released* version of os-brick. Thus, it is possible for changes to be happening in os-brick development that might impact the functionality of your driver. If you aren't testing os-brick changes, you won't find out about these until *after* the next os-brick release, which is bad news all around. Therefore, at last week's cinder midcycle [0], the cinder project team agreed to require that cinder third-party CI systems run on all os-brick changes in addition to all cinder changes. This is a nice-to-have for the current (Zed) development cycle, but will be required in order for a driver to be considered 'supported' in the 2023.1 (Antelope) release [1]. If you have comments or concerns about this policy, please reply on the list to this email or put an item on the agenda [2] for the cinder weekly meeting. [0] https://etherpad.opendev.org/p/cinder-zed-midcycles [1] https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers#What_changes_should_I_test_on.3F [2] https://etherpad.opendev.org/p/cinder-zed-meetings From fv at spots.edu Wed Aug 17 21:16:26 2022 From: fv at spots.edu (Father Vlasie) Date: Wed, 17 Aug 2022 14:16:26 -0700 Subject: [openstack-ansible] [yoga] utility_container failure In-Reply-To: References: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> Message-ID: <5C79D786-B1AF-425B-9BEE-D683FA94A907@spots.edu> Hello, I am very appreciative of your help! I think my interface setup might be questionable. I did not realise that the nodes need to talk to each other on the external IP. I thought that was only for communication with entities external to the cluster. My bond0 is associated with br-vlan so I put the external IP there and set br-vlan as the external interface in user_variables. The nodes can now ping each other on the external network. This is how I have user_variables configured: ??? haproxy_keepalived_external_vip_cidr: ?192.168.2.9/26" haproxy_keepalived_internal_vip_cidr: "192.168.3.9/32" haproxy_keepalived_external_interface: br-vlan haproxy_keepalived_internal_interface: br-mgmt haproxy_bind_external_lb_vip_address: 192.168.2.9 haproxy_bind_internal_lb_vip_address: 192.168.3.9 ??? My IP addresses are configured thusly (one sample from each node type): ??? infra1 bond0->br-vlan 192.168.2.13 br-mgmt 192.168.3.13 br-vxlan 192.168.30.13 br-storage compute1 br-vlan br-mgmt 192.168.3.16 br-vxlan 192.168.30.16 br-storage 192.168.20.16 log1 br-vlan br-mgmt 192.168.3.19 br-vxlan br-storage ??? I have destroyed all of my containers and I am running setup-hosts again. Here?s to hoping it all turns out this time! 
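As a rough sanity check once the playbooks have run (and assuming the VIP addresses above are the ones I actually want), my plan is to confirm on whichever infra node keepalived reports as MASTER that the VIPs are really bound and that haproxy is answering on the repo port, with something like:

ip -4 addr show br-mgmt | grep 192.168.3.9     # internal VIP should appear here
ip -4 addr show br-vlan | grep 192.168.2.9     # external VIP should appear here
systemctl status keepalived haproxy
curl -v http://192.168.3.9:8181/constraints/upper_constraints_cached.txt

If the curl still returns 503 I will collect the keepalived and haproxy journal output before going any further.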
Very gratefully, FV > On Aug 16, 2022, at 7:31 PM, James Denton wrote: > > Hello, > > >> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? > > This will likely be the bond0 interface and not the individual bond member. However, the interface defined here will ultimately depend on the networking of that host, and should be an external facing one (i.e. the interface with the default gateway). > > In many environments, you?ll have something like this (or using 2 bonds, but same idea): > > ? Bond0 (192.168.100.5/24 gw 192.168.100.1) > ? Em49 > ? Em50 > ? Br-mgmt (172.29.236.5/22) > ? Bond0.236 > ? Br-vxlan (172.29.240.5/22) > ? Bond0.240 > ? Br-storage (172.29.244.5/22) > ? Bond0.244 > > In this example, bond0 has the management IP 192.168.100.5 and br-mgmt is the ?container? bridge with an IP configured from the ?container? network (see cidr_networks in openstack_user_config.yml). FYI: LXC containers will automatically be assigned IPs from the ?container? network outside of the ?used_ips? range(s). The infra host will communicate with the containers via this br-mgmt interface. > > I?m using FQDNs for the VIPs, which are specified in openstack_user_config.yml here: > > global_overrides: > internal_lb_vip_address: internalapi.openstack.rackspace.lab > external_lb_vip_address: publicapi.openstack.rackspace.lab > > To avoid DNS resolution issues internally (or rather, to ensure the IP is configured in the config files and not the domain name) I?ll override with the IP and hard set the preferred interface(s): > > haproxy_keepalived_external_vip_cidr: "192.168.100.10/32" > haproxy_keepalived_internal_vip_cidr: "172.29.236.10/32" > haproxy_keepalived_external_interface: bond0 > haproxy_keepalived_internal_interface: br-mgmt > haproxy_bind_external_lb_vip_address: 192.168.100.10 > haproxy_bind_internal_lb_vip_address: 172.29.236.10 > > With the above configuration, keepalived will manage two VIPs - one external and one internal, and endpoints will have the FQDN rather than IP. > > >> Curl shows "503 Service Unavailable No server is available to handle this request? > > Hard to say without seeing logs why this is happening, but I will assume that keepalived is having issues binding the IP to the interface. You might find the reason in syslog or ?journalctl -xe -f -u keepalived?. > > >> Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." > > You might try running ?umount /var/www/repo? and re-run the repo-install.yml playbook (or setup-infrastructure.yml). > > Hope that helps! > > James Denton > Rackspace Private Cloud > > From: Father Vlasie > Date: Tuesday, August 16, 2022 at 4:31 PM > To: James Denton > Cc: openstack-discuss at lists.openstack.org > Subject: Re: [openstack-ansible] [yoga] utility_container failure > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > > > Hello, > > Thank you very much for the reply! > > haproxy and keepalived both show status active on infra1 (my primary node). > > Curl shows "503 Service Unavailable No server is available to handle this request? > > (Also the URL is http not https?.) > > If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? 
> > Earlier in the output I find the following error (showing for all 3 infra nodes): > > ------------ > > TASK [systemd_mount : Set the state of the mount] ***************************************************************************************************************************************** > fatal: [infra3_repo_container-7ca5db88]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.022275", "end": "2022-08-16 14:16:34.926861", "msg": "non-zero return code", "rc": 1, "start": "2022-08-16 14:16:34.904586", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} > > ?????? > > Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." > > Thank you again! > > Father Vlasie > > > On Aug 16, 2022, at 6:32 AM, James Denton wrote: > > > > Hello, > > > > That error means the repo server at 192.168.3.9:8181 is unavailable. The repo server sits behind haproxy, which should be listening on 192.168.3.9 port 8181 on the active (primary) node. You can verify this by issuing a ?curl -vhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2F192.168.3.9%3A8181%2F%25E2%2580%2599&data=05%7C01%7Cjames.denton%40rackspace.com%7C17cc3373086d47b3321408da7fceb08a%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637962823085464366%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=PSsQW%2FW9N5JXapmMal%2FrsUm%2B8IYkDFTxwqxaH3K7tbA%3D&reserved=0. You might check the haproxy service status and/or keepalived status to ensure they are operating properly. If the IP cannot be bound to the correct interface, keepalive may not start. > > > > James Denton > > Rackspace Private Cloud > > > > From: Father Vlasie > > Date: Tuesday, August 16, 2022 at 7:38 AM > > To: openstack-discuss at lists.openstack.org > > Subject: [openstack-ansible] [yoga] utility_container failure > > > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > > > > > > Hello everyone, > > > > I have happily progressed to the second step of running the playbooks, namely "openstack-ansible setup-infrastructure.yml" > > > > Everything looks good except for just one error which is mystifying me: > > > > ---------------- > > > > TASK [Get list of repo packages] ********************************************************************************************************************************************************** > > fatal: [infra1_utility_container-5ec32cb5]: FAILED! 
=> {"changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: ", "redirected": false, "status": -1, "url": "https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2F192.168.3.9%3A8181%2Fconstraints%2Fupper_constraints_cached.txt&data=05%7C01%7Cjames.denton%40rackspace.com%7C17cc3373086d47b3321408da7fceb08a%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637962823085620584%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=XhtQdDC8GpQuJXGbQxhlBkUOS3krH%2F9d%2FxU9hWB1Cts%3D&reserved=0"} > > > > ---------------- > > > > 192.168.3.9 is the IP listed in user_variables.yml under haproxy_keepalived_internal_vip_cidr > > > > Any help or pointers would be very much appreciated! > > > > Thank you, > > > > Father Vlasie > > > From fv at spots.edu Wed Aug 17 22:22:34 2022 From: fv at spots.edu (Father Vlasie) Date: Wed, 17 Aug 2022 15:22:34 -0700 Subject: [openstack-ansible] [yoga] utility_container failure In-Reply-To: <5C79D786-B1AF-425B-9BEE-D683FA94A907@spots.edu> References: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> <5C79D786-B1AF-425B-9BEE-D683FA94A907@spots.edu> Message-ID: <8681B9FD-D061-4D97-A46D-91FFC33AFE96@spots.edu> Hello again! I have completed the run of setup-hosts successfully. However I am still seeing errors when running setup-infrastructure: ------ TASK [openstack.osa.glusterfs : Start glusterfs server] ********************************************************************** fatal: [infra1_repo_container-20deb465]: FAILED! => {"changed": false, "msg": "Unable to start service glusterd: Job for glusterd.service failed because the control process exited with error code.\nSee \"systemctl status glusterd.service\" and \"journalctl -xe\" for details.\n"} ------ TASK [systemd_mount : Set the state of the mount] **************************************************************************** fatal: [infra2_repo_container-6cd61edd]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.021452", "end": "2022-08-17 18:17:37.172187", "msg": "non-zero return code", "rc": 1, "start": "2022-08-17 18:17:37.150735", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} ------ fatal: [infra2_repo_container-6cd61edd]: FAILED! => {"attempts": 5, "changed": false, "cmd": ["mountpoint", "-q", "/var/www/repo"], "delta": "0:00:00.002310", "end": "2022-08-17 18:18:04.297940", "msg": "non-zero return code", "rc": 1, "start": "2022-08-17 18:18:04.295630", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} ------ infra1_repo_container-20deb465 : ok=30 changed=2 unreachable=0 failed=1 skipped=14 rescued=0 ignored=0 infra2_repo_container-6cd61edd : ok=66 changed=6 unreachable=0 failed=2 skipped=22 rescued=1 ignored=0 infra3_repo_container-7ca5db88 : ok=64 changed=6 unreachable=0 failed=2 skipped=22 rescued=1 ignored=0 ------ Again any help is much appreciated! Thank you, FV > On Aug 17, 2022, at 2:16 PM, Father Vlasie wrote: > > Hello, > > I am very appreciative of your help! > > I think my interface setup might be questionable. > > I did not realise that the nodes need to talk to each other on the external IP. 
I thought that was only for communication with entities external to the cluster. > > My bond0 is associated with br-vlan so I put the external IP there and set br-vlan as the external interface in user_variables. > > The nodes can now ping each other on the external network. > > This is how I have user_variables configured: > > ??? > > haproxy_keepalived_external_vip_cidr: ?192.168.2.9/26" > haproxy_keepalived_internal_vip_cidr: "192.168.3.9/32" > haproxy_keepalived_external_interface: br-vlan > haproxy_keepalived_internal_interface: br-mgmt > haproxy_bind_external_lb_vip_address: 192.168.2.9 > haproxy_bind_internal_lb_vip_address: 192.168.3.9 > > ??? > > My IP addresses are configured thusly (one sample from each node type): > > ??? > > infra1 > bond0->br-vlan 192.168.2.13 > br-mgmt 192.168.3.13 > br-vxlan 192.168.30.13 > br-storage > > compute1 > br-vlan > br-mgmt 192.168.3.16 > br-vxlan 192.168.30.16 > br-storage 192.168.20.16 > > log1 > br-vlan > br-mgmt 192.168.3.19 > br-vxlan > br-storage > > ??? > > I have destroyed all of my containers and I am running setup-hosts again. > > Here?s to hoping it all turns out this time! > > Very gratefully, > > FV > >> On Aug 16, 2022, at 7:31 PM, James Denton wrote: >> >> Hello, >> >>>> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? >> >> This will likely be the bond0 interface and not the individual bond member. However, the interface defined here will ultimately depend on the networking of that host, and should be an external facing one (i.e. the interface with the default gateway). >> >> In many environments, you?ll have something like this (or using 2 bonds, but same idea): >> >> ? Bond0 (192.168.100.5/24 gw 192.168.100.1) >> ? Em49 >> ? Em50 >> ? Br-mgmt (172.29.236.5/22) >> ? Bond0.236 >> ? Br-vxlan (172.29.240.5/22) >> ? Bond0.240 >> ? Br-storage (172.29.244.5/22) >> ? Bond0.244 >> >> In this example, bond0 has the management IP 192.168.100.5 and br-mgmt is the ?container? bridge with an IP configured from the ?container? network (see cidr_networks in openstack_user_config.yml). FYI: LXC containers will automatically be assigned IPs from the ?container? network outside of the ?used_ips? range(s). The infra host will communicate with the containers via this br-mgmt interface. >> >> I?m using FQDNs for the VIPs, which are specified in openstack_user_config.yml here: >> >> global_overrides: >> internal_lb_vip_address: internalapi.openstack.rackspace.lab >> external_lb_vip_address: publicapi.openstack.rackspace.lab >> >> To avoid DNS resolution issues internally (or rather, to ensure the IP is configured in the config files and not the domain name) I?ll override with the IP and hard set the preferred interface(s): >> >> haproxy_keepalived_external_vip_cidr: "192.168.100.10/32" >> haproxy_keepalived_internal_vip_cidr: "172.29.236.10/32" >> haproxy_keepalived_external_interface: bond0 >> haproxy_keepalived_internal_interface: br-mgmt >> haproxy_bind_external_lb_vip_address: 192.168.100.10 >> haproxy_bind_internal_lb_vip_address: 172.29.236.10 >> >> With the above configuration, keepalived will manage two VIPs - one external and one internal, and endpoints will have the FQDN rather than IP. >> >>>> Curl shows "503 Service Unavailable No server is available to handle this request? >> >> Hard to say without seeing logs why this is happening, but I will assume that keepalived is having issues binding the IP to the interface. 
You might find the reason in syslog or ?journalctl -xe -f -u keepalived?. >> >>>> Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." >> >> You might try running ?umount /var/www/repo? and re-run the repo-install.yml playbook (or setup-infrastructure.yml). >> >> Hope that helps! >> >> James Denton >> Rackspace Private Cloud >> >> From: Father Vlasie >> Date: Tuesday, August 16, 2022 at 4:31 PM >> To: James Denton >> Cc: openstack-discuss at lists.openstack.org >> Subject: Re: [openstack-ansible] [yoga] utility_container failure >> >> CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! >> >> >> Hello, >> >> Thank you very much for the reply! >> >> haproxy and keepalived both show status active on infra1 (my primary node). >> >> Curl shows "503 Service Unavailable No server is available to handle this request? >> >> (Also the URL is http not https?.) >> >> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? >> >> Earlier in the output I find the following error (showing for all 3 infra nodes): >> >> ------------ >> >> TASK [systemd_mount : Set the state of the mount] ***************************************************************************************************************************************** >> fatal: [infra3_repo_container-7ca5db88]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.022275", "end": "2022-08-16 14:16:34.926861", "msg": "non-zero return code", "rc": 1, "start": "2022-08-16 14:16:34.904586", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} >> >> ?????? >> >> Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." >> >> Thank you again! >> >> Father Vlasie >> >>> On Aug 16, 2022, at 6:32 AM, James Denton wrote: >>> >>> Hello, >>> >>> That error means the repo server at 192.168.3.9:8181 is unavailable. The repo server sits behind haproxy, which should be listening on 192.168.3.9 port 8181 on the active (primary) node. You can verify this by issuing a ?curl -vhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2F192.168.3.9%3A8181%2F%25E2%2580%2599&data=05%7C01%7Cjames.denton%40rackspace.com%7C17cc3373086d47b3321408da7fceb08a%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637962823085464366%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=PSsQW%2FW9N5JXapmMal%2FrsUm%2B8IYkDFTxwqxaH3K7tbA%3D&reserved=0. You might check the haproxy service status and/or keepalived status to ensure they are operating properly. If the IP cannot be bound to the correct interface, keepalive may not start. >>> >>> James Denton >>> Rackspace Private Cloud >>> >>> From: Father Vlasie >>> Date: Tuesday, August 16, 2022 at 7:38 AM >>> To: openstack-discuss at lists.openstack.org >>> Subject: [openstack-ansible] [yoga] utility_container failure >>> >>> CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! 
>>> >>> >>> Hello everyone, >>> >>> I have happily progressed to the second step of running the playbooks, namely "openstack-ansible setup-infrastructure.yml" >>> >>> Everything looks good except for just one error which is mystifying me: >>> >>> ---------------- >>> >>> TASK [Get list of repo packages] ********************************************************************************************************************************************************** >>> fatal: [infra1_utility_container-5ec32cb5]: FAILED! => {"changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: ", "redirected": false, "status": -1, "url": "https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2F192.168.3.9%3A8181%2Fconstraints%2Fupper_constraints_cached.txt&data=05%7C01%7Cjames.denton%40rackspace.com%7C17cc3373086d47b3321408da7fceb08a%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637962823085620584%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=XhtQdDC8GpQuJXGbQxhlBkUOS3krH%2F9d%2FxU9hWB1Cts%3D&reserved=0"} >>> >>> ---------------- >>> >>> 192.168.3.9 is the IP listed in user_variables.yml under haproxy_keepalived_internal_vip_cidr >>> >>> Any help or pointers would be very much appreciated! >>> >>> Thank you, >>> >>> Father Vlasie >>> >> > From james.denton at rackspace.com Thu Aug 18 00:18:57 2022 From: james.denton at rackspace.com (James Denton) Date: Thu, 18 Aug 2022 00:18:57 +0000 Subject: [openstack-ansible] [yoga] utility_container failure In-Reply-To: <8681B9FD-D061-4D97-A46D-91FFC33AFE96@spots.edu> References: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> <5C79D786-B1AF-425B-9BEE-D683FA94A907@spots.edu> <8681B9FD-D061-4D97-A46D-91FFC33AFE96@spots.edu> Message-ID: Hello, My recommendation is to try running these commands from the deploy node and see what the output is (or maybe try running the playbooks in verbose mode with -vvv): # ssh infra1_repo_container-20deb465 # systemctl status glusterd.service # journalctl -xe -u glusterd.service # exit ^^ Might also consider restarting glusterd and checking the journal to see if there?s an error. # ssh infra2_repo_container-6cd61edd # systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\") # systemctl status var-www-repo.mount # journalctl -xe # exit The issue may be obvious. Maybe not. If you can ship that output to paste.openstack.org we might be able to diagnose. The mountpoint command will return 0 if /var/www/repo is a mountpoint, and 1 if it is not a mountpoint. Looks like it is probably failing due to a previous task (ie. It is not being mounted). Understanding why glusterfs is failing may be key here. > I have destroyed all of my containers and I am running setup-hosts again Can you describe what you did here? Simply destroy the LXC containers or did you wipe the inventory, too? Thanks, James Denton Rackspace Private Cloud From: Father Vlasie Date: Wednesday, August 17, 2022 at 5:22 PM To: James Denton Cc: openstack-discuss at lists.openstack.org Subject: Re: [openstack-ansible] [yoga] utility_container failure CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Hello again! I have completed the run of setup-hosts successfully. 
However I am still seeing errors when running setup-infrastructure: ------ TASK [openstack.osa.glusterfs : Start glusterfs server] ********************************************************************** fatal: [infra1_repo_container-20deb465]: FAILED! => {"changed": false, "msg": "Unable to start service glusterd: Job for glusterd.service failed because the control process exited with error code.\nSee \"systemctl status glusterd.service\" and \"journalctl -xe\" for details.\n"} ------ TASK [systemd_mount : Set the state of the mount] **************************************************************************** fatal: [infra2_repo_container-6cd61edd]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.021452", "end": "2022-08-17 18:17:37.172187", "msg": "non-zero return code", "rc": 1, "start": "2022-08-17 18:17:37.150735", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} ------ fatal: [infra2_repo_container-6cd61edd]: FAILED! => {"attempts": 5, "changed": false, "cmd": ["mountpoint", "-q", "/var/www/repo"], "delta": "0:00:00.002310", "end": "2022-08-17 18:18:04.297940", "msg": "non-zero return code", "rc": 1, "start": "2022-08-17 18:18:04.295630", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} ------ infra1_repo_container-20deb465 : ok=30 changed=2 unreachable=0 failed=1 skipped=14 rescued=0 ignored=0 infra2_repo_container-6cd61edd : ok=66 changed=6 unreachable=0 failed=2 skipped=22 rescued=1 ignored=0 infra3_repo_container-7ca5db88 : ok=64 changed=6 unreachable=0 failed=2 skipped=22 rescued=1 ignored=0 ------ Again any help is much appreciated! Thank you, FV > On Aug 17, 2022, at 2:16 PM, Father Vlasie wrote: > > Hello, > > I am very appreciative of your help! > > I think my interface setup might be questionable. > > I did not realise that the nodes need to talk to each other on the external IP. I thought that was only for communication with entities external to the cluster. > > My bond0 is associated with br-vlan so I put the external IP there and set br-vlan as the external interface in user_variables. > > The nodes can now ping each other on the external network. > > This is how I have user_variables configured: > > ??? > > haproxy_keepalived_external_vip_cidr: ?192.168.2.9/26" > haproxy_keepalived_internal_vip_cidr: "192.168.3.9/32" > haproxy_keepalived_external_interface: br-vlan > haproxy_keepalived_internal_interface: br-mgmt > haproxy_bind_external_lb_vip_address: 192.168.2.9 > haproxy_bind_internal_lb_vip_address: 192.168.3.9 > > ??? > > My IP addresses are configured thusly (one sample from each node type): > > ??? > > infra1 > bond0->br-vlan 192.168.2.13 > br-mgmt 192.168.3.13 > br-vxlan 192.168.30.13 > br-storage > > compute1 > br-vlan > br-mgmt 192.168.3.16 > br-vxlan 192.168.30.16 > br-storage 192.168.20.16 > > log1 > br-vlan > br-mgmt 192.168.3.19 > br-vxlan > br-storage > > ??? > > I have destroyed all of my containers and I am running setup-hosts again. > > Here?s to hoping it all turns out this time! 
> > Very gratefully, > > FV > >> On Aug 16, 2022, at 7:31 PM, James Denton wrote: >> >> Hello, >> >>>> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? >> >> This will likely be the bond0 interface and not the individual bond member. However, the interface defined here will ultimately depend on the networking of that host, and should be an external facing one (i.e. the interface with the default gateway). >> >> In many environments, you?ll have something like this (or using 2 bonds, but same idea): >> >> ? Bond0 (192.168.100.5/24 gw 192.168.100.1) >> ? Em49 >> ? Em50 >> ? Br-mgmt (172.29.236.5/22) >> ? Bond0.236 >> ? Br-vxlan (172.29.240.5/22) >> ? Bond0.240 >> ? Br-storage (172.29.244.5/22) >> ? Bond0.244 >> >> In this example, bond0 has the management IP 192.168.100.5 and br-mgmt is the ?container? bridge with an IP configured from the ?container? network (see cidr_networks in openstack_user_config.yml). FYI: LXC containers will automatically be assigned IPs from the ?container? network outside of the ?used_ips? range(s). The infra host will communicate with the containers via this br-mgmt interface. >> >> I?m using FQDNs for the VIPs, which are specified in openstack_user_config.yml here: >> >> global_overrides: >> internal_lb_vip_address: internalapi.openstack.rackspace.lab >> external_lb_vip_address: publicapi.openstack.rackspace.lab >> >> To avoid DNS resolution issues internally (or rather, to ensure the IP is configured in the config files and not the domain name) I?ll override with the IP and hard set the preferred interface(s): >> >> haproxy_keepalived_external_vip_cidr: "192.168.100.10/32" >> haproxy_keepalived_internal_vip_cidr: "172.29.236.10/32" >> haproxy_keepalived_external_interface: bond0 >> haproxy_keepalived_internal_interface: br-mgmt >> haproxy_bind_external_lb_vip_address: 192.168.100.10 >> haproxy_bind_internal_lb_vip_address: 172.29.236.10 >> >> With the above configuration, keepalived will manage two VIPs - one external and one internal, and endpoints will have the FQDN rather than IP. >> >>>> Curl shows "503 Service Unavailable No server is available to handle this request? >> >> Hard to say without seeing logs why this is happening, but I will assume that keepalived is having issues binding the IP to the interface. You might find the reason in syslog or ?journalctl -xe -f -u keepalived?. >> >>>> Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." >> >> You might try running ?umount /var/www/repo? and re-run the repo-install.yml playbook (or setup-infrastructure.yml). >> >> Hope that helps! >> >> James Denton >> Rackspace Private Cloud >> >> From: Father Vlasie >> Date: Tuesday, August 16, 2022 at 4:31 PM >> To: James Denton >> Cc: openstack-discuss at lists.openstack.org >> Subject: Re: [openstack-ansible] [yoga] utility_container failure >> >> CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! >> >> >> Hello, >> >> Thank you very much for the reply! >> >> haproxy and keepalived both show status active on infra1 (my primary node). >> >> Curl shows "503 Service Unavailable No server is available to handle this request? >> >> (Also the URL is http not https?.) >> >> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? 
>> >> Earlier in the output I find the following error (showing for all 3 infra nodes): >> >> ------------ >> >> TASK [systemd_mount : Set the state of the mount] ***************************************************************************************************************************************** >> fatal: [infra3_repo_container-7ca5db88]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.022275", "end": "2022-08-16 14:16:34.926861", "msg": "non-zero return code", "rc": 1, "start": "2022-08-16 14:16:34.904586", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} >> >> ?????? >> >> Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." >> >> Thank you again! >> >> Father Vlasie >> >>> On Aug 16, 2022, at 6:32 AM, James Denton wrote: >>> >>> Hello, >>> >>> That error means the repo server at 192.168.3.9:8181 is unavailable. The repo server sits behind haproxy, which should be listening on 192.168.3.9 port 8181 on the active (primary) node. You can verify this by issuing a ?curl -vhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2F192.168.3.9%3A8181%2F%25E2%2580%2599&data=05%7C01%7Cjames.denton%40rackspace.com%7Cf703a2b6782d40dddd0608da809f0017%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637963717766518451%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7C&sdata=KQs9XMTGeIP7zKOwA6P1bVxzpIxCuzixlnniGAK38z8%3D&reserved=0. You might check the haproxy service status and/or keepalived status to ensure they are operating properly. If the IP cannot be bound to the correct interface, keepalive may not start. >>> >>> James Denton >>> Rackspace Private Cloud >>> >>> From: Father Vlasie >>> Date: Tuesday, August 16, 2022 at 7:38 AM >>> To: openstack-discuss at lists.openstack.org >>> Subject: [openstack-ansible] [yoga] utility_container failure >>> >>> CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! >>> >>> >>> Hello everyone, >>> >>> I have happily progressed to the second step of running the playbooks, namely "openstack-ansible setup-infrastructure.yml" >>> >>> Everything looks good except for just one error which is mystifying me: >>> >>> ---------------- >>> >>> TASK [Get list of repo packages] ********************************************************************************************************************************************************** >>> fatal: [infra1_utility_container-5ec32cb5]: FAILED! 
=> {"changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: ", "redirected": false, "status": -1, "url": "https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2F192.168.3.9%3A8181%2Fconstraints%2Fupper_constraints_cached.txt&data=05%7C01%7Cjames.denton%40rackspace.com%7Cf703a2b6782d40dddd0608da809f0017%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637963717766518451%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7C&sdata=6a6wq7o2UlSmwfeS3%2B5F5SXdsLrTuOVLwxuysHGsjhs%3D&reserved=0"} >>> >>> ---------------- >>> >>> 192.168.3.9 is the IP listed in user_variables.yml under haproxy_keepalived_internal_vip_cidr >>> >>> Any help or pointers would be very much appreciated! >>> >>> Thank you, >>> >>> Father Vlasie >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fv at spots.edu Thu Aug 18 02:17:17 2022 From: fv at spots.edu (Father Vlasie) Date: Wed, 17 Aug 2022 19:17:17 -0700 Subject: [openstack-ansible] [yoga] utility_container failure In-Reply-To: References: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> <5C79D786-B1AF-425B-9BEE-D683FA94A907@spots.edu> <8681B9FD-D061-4D97-A46D-91FFC33AFE96@spots.edu> Message-ID: Hello, > On Aug 17, 2022, at 5:18 PM, James Denton wrote: > > Hello, > > My recommendation is to try running these commands from the deploy node and see what the output is (or maybe try running the playbooks in verbose mode with -vvv): Here is the output from "setup-infrastructure.yml -vvv" https://paste.opendev.org/show/bCGUOb177z2oC5P3nR5Z/ > # ssh infra1_repo_container-20deb465 > # systemctl status glusterd.service > # journalctl -xe -u glusterd.service > # exit > > ^^ Might also consider restarting glusterd and checking the journal to see if there?s an error. Strangely I get "ssh: connect to host infra1_repo_container-20deb465 port 22: No route to host" > # ssh infra2_repo_container-6cd61edd > # systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\") > # systemctl status var-www-repo.mount > # journalctl -xe > # exit > A similar error for this too "ssh: connect to host infra2_repo_container-6cd61edd port 22: Network is unreachable" > The issue may be obvious. Maybe not. If you can ship that output to paste.openstack.org we might be able to diagnose. Here is the verbose output for the glusterfs error: https://paste.openstack.org/show/bw0qIhUzuZ1de0qjKzfK/ > > The mountpoint command will return 0 if /var/www/repo is a mountpoint, and 1 if it is not a mountpoint. Looks like it is probably failing due to a previous task (ie. It is not being mounted). Understanding why glusterfs is failing may be key here. > > > I have destroyed all of my containers and I am running setup-hosts again > > Can you describe what you did here? Simply destroy the LXC containers or did you wipe the inventory, too? I used the command: openstack-ansible lxc-containers-destroy.yml I answered affirmatively to the two questions asked about the removal of the containers and the container data. Thank you once again! FV > > Thanks, > James Denton > Rackspace Private Cloud > > From: Father Vlasie > Date: Wednesday, August 17, 2022 at 5:22 PM > To: James Denton > Cc: openstack-discuss at lists.openstack.org > Subject: Re: [openstack-ansible] [yoga] utility_container failure > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > > > Hello again! 
> > I have completed the run of setup-hosts successfully. > > However I am still seeing errors when running setup-infrastructure: > > ------ > > TASK [openstack.osa.glusterfs : Start glusterfs server] ********************************************************************** > fatal: [infra1_repo_container-20deb465]: FAILED! => {"changed": false, "msg": "Unable to start service glusterd: Job for glusterd.service failed because the control process exited with error code.\nSee \"systemctl status glusterd.service\" and \"journalctl -xe\" for details.\n"} > > ------ > > TASK [systemd_mount : Set the state of the mount] **************************************************************************** > fatal: [infra2_repo_container-6cd61edd]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.021452", "end": "2022-08-17 18:17:37.172187", "msg": "non-zero return code", "rc": 1, "start": "2022-08-17 18:17:37.150735", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} > > ------ > > fatal: [infra2_repo_container-6cd61edd]: FAILED! => {"attempts": 5, "changed": false, "cmd": ["mountpoint", "-q", "/var/www/repo"], "delta": "0:00:00.002310", "end": "2022-08-17 18:18:04.297940", "msg": "non-zero return code", "rc": 1, "start": "2022-08-17 18:18:04.295630", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} > > ------ > > infra1_repo_container-20deb465 : ok=30 changed=2 unreachable=0 failed=1 skipped=14 rescued=0 ignored=0 > infra2_repo_container-6cd61edd : ok=66 changed=6 unreachable=0 failed=2 skipped=22 rescued=1 ignored=0 > infra3_repo_container-7ca5db88 : ok=64 changed=6 unreachable=0 failed=2 skipped=22 rescued=1 ignored=0 > > ------ > > Again any help is much appreciated! > > Thank you, > > FV > > > On Aug 17, 2022, at 2:16 PM, Father Vlasie wrote: > > > > Hello, > > > > I am very appreciative of your help! > > > > I think my interface setup might be questionable. > > > > I did not realise that the nodes need to talk to each other on the external IP. I thought that was only for communication with entities external to the cluster. > > > > My bond0 is associated with br-vlan so I put the external IP there and set br-vlan as the external interface in user_variables. > > > > The nodes can now ping each other on the external network. > > > > This is how I have user_variables configured: > > > > ??? > > > > haproxy_keepalived_external_vip_cidr: ?192.168.2.9/26" > > haproxy_keepalived_internal_vip_cidr: "192.168.3.9/32" > > haproxy_keepalived_external_interface: br-vlan > > haproxy_keepalived_internal_interface: br-mgmt > > haproxy_bind_external_lb_vip_address: 192.168.2.9 > > haproxy_bind_internal_lb_vip_address: 192.168.3.9 > > > > ??? > > > > My IP addresses are configured thusly (one sample from each node type): > > > > ??? > > > > infra1 > > bond0->br-vlan 192.168.2.13 > > br-mgmt 192.168.3.13 > > br-vxlan 192.168.30.13 > > br-storage > > > > compute1 > > br-vlan > > br-mgmt 192.168.3.16 > > br-vxlan 192.168.30.16 > > br-storage 192.168.20.16 > > > > log1 > > br-vlan > > br-mgmt 192.168.3.19 > > br-vxlan > > br-storage > > > > ??? > > > > I have destroyed all of my containers and I am running setup-hosts again. 
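A side note on the "No route to host" / "Network is unreachable" ssh failures quoted above: the LXC containers are not necessarily reachable by name from the deploy host, so that result on its own does not mean the container is broken. One way around it is to attach from the infra host that actually owns the container. A minimal sketch, using the container name from the output above (adjust to your own names):

----------------

lxc-ls -f                                        # on infra1; lists containers, state and IPs
lxc-attach -n infra1_repo_container-20deb465     # shell inside the container, no ssh needed
systemctl status glusterd.service
journalctl -xe -u glusterd.service
gluster peer status                              # only meaningful once glusterd is up
exit

----------------

If glusterd refuses to start inside the container, its journal output is usually more telling than the playbook error itself.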
> > > > Here?s to hoping it all turns out this time! > > > > Very gratefully, > > > > FV > > > >> On Aug 16, 2022, at 7:31 PM, James Denton wrote: > >> > >> Hello, > >> > >>>> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? > >> > >> This will likely be the bond0 interface and not the individual bond member. However, the interface defined here will ultimately depend on the networking of that host, and should be an external facing one (i.e. the interface with the default gateway). > >> > >> In many environments, you?ll have something like this (or using 2 bonds, but same idea): > >> > >> ? Bond0 (192.168.100.5/24 gw 192.168.100.1) > >> ? Em49 > >> ? Em50 > >> ? Br-mgmt (172.29.236.5/22) > >> ? Bond0.236 > >> ? Br-vxlan (172.29.240.5/22) > >> ? Bond0.240 > >> ? Br-storage (172.29.244.5/22) > >> ? Bond0.244 > >> > >> In this example, bond0 has the management IP 192.168.100.5 and br-mgmt is the ?container? bridge with an IP configured from the ?container? network (see cidr_networks in openstack_user_config.yml). FYI: LXC containers will automatically be assigned IPs from the ?container? network outside of the ?used_ips? range(s). The infra host will communicate with the containers via this br-mgmt interface. > >> > >> I?m using FQDNs for the VIPs, which are specified in openstack_user_config.yml here: > >> > >> global_overrides: > >> internal_lb_vip_address: internalapi.openstack.rackspace.lab > >> external_lb_vip_address: publicapi.openstack.rackspace.lab > >> > >> To avoid DNS resolution issues internally (or rather, to ensure the IP is configured in the config files and not the domain name) I?ll override with the IP and hard set the preferred interface(s): > >> > >> haproxy_keepalived_external_vip_cidr: "192.168.100.10/32" > >> haproxy_keepalived_internal_vip_cidr: "172.29.236.10/32" > >> haproxy_keepalived_external_interface: bond0 > >> haproxy_keepalived_internal_interface: br-mgmt > >> haproxy_bind_external_lb_vip_address: 192.168.100.10 > >> haproxy_bind_internal_lb_vip_address: 172.29.236.10 > >> > >> With the above configuration, keepalived will manage two VIPs - one external and one internal, and endpoints will have the FQDN rather than IP. > >> > >>>> Curl shows "503 Service Unavailable No server is available to handle this request? > >> > >> Hard to say without seeing logs why this is happening, but I will assume that keepalived is having issues binding the IP to the interface. You might find the reason in syslog or ?journalctl -xe -f -u keepalived?. > >> > >>>> Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." > >> > >> You might try running ?umount /var/www/repo? and re-run the repo-install.yml playbook (or setup-infrastructure.yml). > >> > >> Hope that helps! > >> > >> James Denton > >> Rackspace Private Cloud > >> > >> From: Father Vlasie > >> Date: Tuesday, August 16, 2022 at 4:31 PM > >> To: James Denton > >> Cc: openstack-discuss at lists.openstack.org > >> Subject: Re: [openstack-ansible] [yoga] utility_container failure > >> > >> CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > >> > >> > >> Hello, > >> > >> Thank you very much for the reply! > >> > >> haproxy and keepalived both show status active on infra1 (my primary node). > >> > >> Curl shows "503 Service Unavailable No server is available to handle this request? > >> > >> (Also the URL is http not https?.) 
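A 503 from that curl means haproxy answered but had no live backend for the repo service. A short sketch for narrowing it down on the active infra node -- the admin socket path below is the one the openstack-ansible haproxy role normally configures, so adjust it if yours differs:

----------------

ip -br addr | grep 192.168.3.9        # confirm keepalived actually bound the internal VIP here
systemctl status haproxy keepalived
journalctl -u keepalived --no-pager | tail -n 20
echo "show stat" | socat UNIX-CONNECT:/var/run/haproxy.stat stdio | grep -i repo
curl -sI http://192.168.3.9:8181/constraints/upper_constraints_cached.txt

----------------

If the repo backends show as DOWN in the socket output, the problem is the repo containers themselves (the gluster/mount failures above) rather than haproxy or keepalived.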
> >> > >> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? > >> > >> Earlier in the output I find the following error (showing for all 3 infra nodes): > >> > >> ------------ > >> > >> TASK [systemd_mount : Set the state of the mount] ***************************************************************************************************************************************** > >> fatal: [infra3_repo_container-7ca5db88]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.022275", "end": "2022-08-16 14:16:34.926861", "msg": "non-zero return code", "rc": 1, "start": "2022-08-16 14:16:34.904586", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} > >> > >> ?????? > >> > >> Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." > >> > >> Thank you again! > >> > >> Father Vlasie > >> > >>> On Aug 16, 2022, at 6:32 AM, James Denton wrote: > >>> > >>> Hello, > >>> > >>> That error means the repo server at 192.168.3.9:8181 is unavailable. The repo server sits behind haproxy, which should be listening on 192.168.3.9 port 8181 on the active (primary) node. You can verify this by issuing a ?curl -vhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2F192.168.3.9%3A8181%2F%25E2%2580%2599&data=05%7C01%7Cjames.denton%40rackspace.com%7Cf703a2b6782d40dddd0608da809f0017%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637963717766518451%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7C&sdata=KQs9XMTGeIP7zKOwA6P1bVxzpIxCuzixlnniGAK38z8%3D&reserved=0. You might check the haproxy service status and/or keepalived status to ensure they are operating properly. If the IP cannot be bound to the correct interface, keepalive may not start. > >>> > >>> James Denton > >>> Rackspace Private Cloud > >>> > >>> From: Father Vlasie > >>> Date: Tuesday, August 16, 2022 at 7:38 AM > >>> To: openstack-discuss at lists.openstack.org > >>> Subject: [openstack-ansible] [yoga] utility_container failure > >>> > >>> CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > >>> > >>> > >>> Hello everyone, > >>> > >>> I have happily progressed to the second step of running the playbooks, namely "openstack-ansible setup-infrastructure.yml" > >>> > >>> Everything looks good except for just one error which is mystifying me: > >>> > >>> ---------------- > >>> > >>> TASK [Get list of repo packages] ********************************************************************************************************************************************************** > >>> fatal: [infra1_utility_container-5ec32cb5]: FAILED! 
=> {"changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: ", "redirected": false, "status": -1, "url": "https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2F192.168.3.9%3A8181%2Fconstraints%2Fupper_constraints_cached.txt&data=05%7C01%7Cjames.denton%40rackspace.com%7Cf703a2b6782d40dddd0608da809f0017%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637963717766518451%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7C&sdata=6a6wq7o2UlSmwfeS3%2B5F5SXdsLrTuOVLwxuysHGsjhs%3D&reserved=0"} > >>> > >>> ---------------- > >>> > >>> 192.168.3.9 is the IP listed in user_variables.yml under haproxy_keepalived_internal_vip_cidr > >>> > >>> Any help or pointers would be very much appreciated! > >>> > >>> Thank you, > >>> > >>> Father Vlasie > >>> > >> > > > From katonalala at gmail.com Thu Aug 18 05:16:22 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 18 Aug 2022 07:16:22 +0200 Subject: [Neutron][Octavia][ovn-octavia-provider] proposing Fernando Royo for ovn-octavia-provider core reviewer In-Reply-To: References: <2642162.mvXUDI8C0e@p1> Message-ID: Hi, Sorry for sitting too much on this (vacation period....), I added Fernando to ovn-octavia-provider-core group. Welcome Fernando in the group, and thanks for your work :-) Lajos Katona (lajoskatona) Fernando Royo ezt ?rta (id?pont: 2022. aug. 16., K, 17:09): > Thanks for the proposal and your support guys! > > El mar, 26 jul 2022 a las 21:51, Michael Johnson () > escribi?: > >> +1 from me. He has done great work getting the status updates working >> in the OVN provider. >> >> Michael >> >> On Tue, Jul 26, 2022 at 8:58 AM Luis Tomas Bolivar >> wrote: >> > >> > +1 from me too! He is doing a great job on the ovn-octavia side! >> > >> > On Tue, Jul 26, 2022 at 3:28 PM Slawek Kaplonski >> wrote: >> >> >> >> Hi, >> >> >> >> Dnia wtorek, 26 lipca 2022 14:18:41 CEST Lajos Katona pisze: >> >> > Hi >> >> > >> >> > I would like to propose Fernando Royo (froyo) as a core reviewer to >> >> > the ovn-octavia-provider project. >> >> > Fernando is very active in the project (see [1] and [2]). >> >> > >> >> > As ovn-octavia-provider is a link between Neutron and Octavia I ask >> both >> >> > Neutron and Octavia cores to vote by answering to this thread, to >> have a >> >> > final decision. >> >> > Thanks for your consideration. >> >> > >> >> > [1]: >> >> > https://review.opendev.org/q/owner:froyo%2540redhat.com >> >> > [2]: >> >> > >> https://www.stackalytics.io/report/contribution?module=neutron-group&project_type=openstack&days=60 >> >> > >> >> > Cheers >> >> > Lajos >> >> > >> >> >> >> Definitely +1 for Fernando :) >> >> >> >> -- >> >> Slawek Kaplonski >> >> Principal Software Engineer >> >> Red Hat >> > >> > >> > >> > -- >> > LUIS TOM?S BOL?VAR >> > Principal Software Engineer >> > Red Hat >> > Madrid, Spain >> > ltomasbo at redhat.com >> > >> >> > > -- > > Fernando Royo S?nchez > > Senior Software Engineer > > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From derekokeeffe85 at yahoo.ie Thu Aug 18 08:12:40 2022 From: derekokeeffe85 at yahoo.ie (Derek O keeffe) Date: Thu, 18 Aug 2022 08:12:40 +0000 (UTC) Subject: Weird behaviour on a provider net References: <1860297127.3838975.1660810360958.ref@mail.yahoo.com> Message-ID: <1860297127.3838975.1660810360958@mail.yahoo.com> Hi all, We have an Openstack Victoria cluster deployed and a provider network (IPv4) created that worked fine for months. 
Lately we've noticed that about 30-40% of the VM's that are spun up using this network are inaccessible over ssh,?Permission denied (publickey). Also, if you add a customisation script?to create a password for the Ubuntu user that fails too.? Initially we thought it was confined to one/two compute nodes but further testing this week shows that it can be any of the 6 compute nodes we have and always the original two that we identified as being the problem. If we create the VM's from the cli and send them to specific hosts everything works fine including the original two hosts that had issues. We're finding it hard to nail down the cause of this because it's not consistent behaviour so any tips on where to investigate would be welcome, we have a second provider net (IPv4 & IPv6) that works fine and have went through all the configs to see if there's any difference and all are identical over the hosts. Thanks in advance for any info. Regards,Derek -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-francois.taltavull at elca.ch Thu Aug 18 08:54:56 2022 From: jean-francois.taltavull at elca.ch (=?iso-8859-1?Q?Taltavull_Jean-Fran=E7ois?=) Date: Thu, 18 Aug 2022 08:54:56 +0000 Subject: [openstack-ansible][wallaby] lxc-containers-create.yml fails when specifying a container group Message-ID: Hello everyone, Whatever the container group, lxc-containers-create playbook fails when specifying the group with -l option. For example : $ sudo openstack-ansible lxc-containers-create.yml -l repo_container Fails with: TASK [lxc_container_create : Gather variables for each operating system] ******************************************************************************************************************************************************************************************************fatal: [p3controller1a_repo_container-b54c9eba]: FAILED! => {"msg": "No file was found when using first_found. Use errors='ignore' to allow this task to be skipped if no files are found"} fatal: [p3controller1b_repo_container-1c7f8c1f]: FAILED! => {"msg": "No file was found when using first_found. Use errors='ignore' to allow this task to be skipped if no files are found"} fatal: [p3controller1c_repo_container-1e73e8a7]: FAILED! => {"msg": "No file was found when using first_found. Use errors='ignore' to allow this task to be skipped if no files are found"} Any similar experience ? Any idea ? Thank you, Jean-Francois From noonedeadpunk at gmail.com Thu Aug 18 09:12:36 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 18 Aug 2022 11:12:36 +0200 Subject: [openstack-ansible][wallaby] lxc-containers-create.yml fails when specifying a container group In-Reply-To: References: Message-ID: Hey, You should always include "lxc_hosts" to the limit when running lxc-containers-create.yml ??, 18 ???. 2022 ?., 11:05 Taltavull Jean-Fran?ois < jean-francois.taltavull at elca.ch>: > Hello everyone, > > Whatever the container group, lxc-containers-create playbook fails when > specifying the group with -l option. > > For example : > $ sudo openstack-ansible lxc-containers-create.yml -l repo_container > > Fails with: > TASK [lxc_container_create : Gather variables for each operating system] > ******************************************************************************************************************************************************************************************************fatal: > [p3controller1a_repo_container-b54c9eba]: FAILED! 
=> {"msg": "No file was > found when using first_found. Use errors='ignore' to allow this task to be > skipped if no files are found"} > fatal: [p3controller1b_repo_container-1c7f8c1f]: FAILED! => {"msg": "No > file was found when using first_found. Use errors='ignore' to allow this > task to be skipped if no files are found"} > fatal: [p3controller1c_repo_container-1e73e8a7]: FAILED! => {"msg": "No > file was found when using first_found. Use errors='ignore' to allow this > task to be skipped if no files are found"} > > > Any similar experience ? Any idea ? > > Thank you, > Jean-Francois > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-francois.taltavull at elca.ch Thu Aug 18 09:37:02 2022 From: jean-francois.taltavull at elca.ch (=?utf-8?B?VGFsdGF2dWxsIEplYW4tRnJhbsOnb2lz?=) Date: Thu, 18 Aug 2022 09:37:02 +0000 Subject: [openstack-ansible][wallaby] lxc-containers-create.yml fails when specifying a container group In-Reply-To: References: Message-ID: <1797b04567e74243909ea3fc555747da@elca.ch> Thanks a lot Dmitriy, it works with -l ?repo_container,lxc_hosts? ! Jean-Francois From: Dmitriy Rabotyagov Sent: jeudi, 18 ao?t 2022 11:13 To: Taltavull Jean-Fran?ois Cc: openstack-discuss Subject: Re: [openstack-ansible][wallaby] lxc-containers-create.yml fails when specifying a container group EXTERNAL MESSAGE - This email comes from outside ELCA companies. Hey, You should always include "lxc_hosts" to the limit when running lxc-containers-create.yml ??, 18 ???. 2022 ?., 11:05 Taltavull Jean-Fran?ois >: Hello everyone, Whatever the container group, lxc-containers-create playbook fails when specifying the group with -l option. For example : $ sudo openstack-ansible lxc-containers-create.yml -l repo_container Fails with: TASK [lxc_container_create : Gather variables for each operating system] ******************************************************************************************************************************************************************************************************fatal: [p3controller1a_repo_container-b54c9eba]: FAILED! => {"msg": "No file was found when using first_found. Use errors='ignore' to allow this task to be skipped if no files are found"} fatal: [p3controller1b_repo_container-1c7f8c1f]: FAILED! => {"msg": "No file was found when using first_found. Use errors='ignore' to allow this task to be skipped if no files are found"} fatal: [p3controller1c_repo_container-1e73e8a7]: FAILED! => {"msg": "No file was found when using first_found. Use errors='ignore' to allow this task to be skipped if no files are found"} Any similar experience ? Any idea ? Thank you, Jean-Francois -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkchn.in at gmail.com Thu Aug 18 09:49:11 2022 From: kkchn.in at gmail.com (KK CHN) Date: Thu, 18 Aug 2022 15:19:11 +0530 Subject: Which deployment Method for a production cloud ? Message-ID: List, I want to install Openstack Xena version on a DELL ( R650 2 CPU and 40 cores . 2.4 TB) server machine. Operating System preference is Debian 10. I want to deploy this for production use with Ceph OSDs. There are four methods described in the URL : Confused which to select. https://docs.openstack.org/xena/deploy/ 1. Using Charms deployment 2. Deploy openstack using Ansible in Docker containers 3. Openstack - Ansible Deployment guide. 4. Triple O deployment guide Which method do I have to follow a full fledged production deployment ? why ? 
Going ahead with Ansible and Docker is a best option or not ? Your kind guidance requested.. Thanks in advance, Krish ps:- Once Completed this installation I have to extend this as a High available(HA) Multi Node cloud. (multi node production cloud with 3 Controllers, 6 compute Nodes and 3 Storage nodes with Ceph OSD pools) . Request to consider this future scalability requirement while answering my query. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Thu Aug 18 10:07:19 2022 From: wodel.youchi at gmail.com (wodel youchi) Date: Thu, 18 Aug 2022 11:07:19 +0100 Subject: [kolla-ansible] upgrade process when it involves OS and ceph major version Message-ID: Hi, What are the steps of upgrading an openstack platform when the upgrade involves: - OS upgrade: how to perform the upgrade? What type of node to upgrade first? Controllers, network or compute? - Ceph storage when using HCI deployment? Just general steps. Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Aug 18 10:50:49 2022 From: smooney at redhat.com (Sean Mooney) Date: Thu, 18 Aug 2022 11:50:49 +0100 Subject: Which deployment Method for a production cloud ? In-Reply-To: References: Message-ID: <413b828f669d71bd2d89539cfcc3c2738226b7e6.camel@redhat.com> On Thu, 2022-08-18 at 15:19 +0530, KK CHN wrote: > List, > > I want to install Openstack Xena version on a DELL ( R650 2 CPU and 40 > cores . 2.4 TB) server machine. Operating System preference is Debian 10. > I want to deploy this for production use with Ceph OSDs. > > > There are four methods described in the URL : Confused which to select. > > https://docs.openstack.org/xena/deploy/ > > 1. Using Charms deployment i think this mainly targets ubuntu not debian > 2. Deploy openstack using Ansible in Docker containers > 3. Openstack - Ansible Deployment guide. for both of the above i would suggest looking at kolla-ansible or openstack ansible (OSA) my personal prefernce between the two is kolla but both are production ready deploymnets with good upgrade support and fast release cadnace following a new upstream release. both support debian as far as im aware too. > 4. Triple O deployment guide if you want to use debian triplo is not an option there is aly the option of the debian openstack installer. zingo can tell you more about that. i have not personlly used it but i hear good things about it i belive its based on puppet which is not really my preference im much more partial to ansible based installers but if debian 10 is what you are targeting its proably a good option to consider. https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029910.html there was an anounchment about deabin lts/10 exteding there lts supprot for rocky recently so its well maintianed long term. kolla is still my preference but that is more as a developer and home user for its simplicity and feature set. OSA or the Debian installer might suite you bettter depending on your usecase. in general i would recommend using an existing installer for openstack vs creating your own or using the puppet/ansible moduels to build your own. long term if you use an excisting installer you will benifit form the maintaince and skill set of the comunity that maintain it. > > > Which method do I have to follow a full fledged production deployment ? most installers provide fully ha production ready deployments. 
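For anyone wanting a feel for what the kolla-ansible route looks like in practice, a rough sketch of the workflow is below. Hostnames, the inventory file name and the version pins are illustrative only -- the Xena quick start documents the supported ansible/kolla-ansible combinations:

----------------

python3 -m venv ~/kolla-venv && source ~/kolla-venv/bin/activate
pip install -U pip
pip install ansible 'kolla-ansible==13.*'          # 13.x is the Xena series; pin ansible per the quick start
sudo mkdir -p /etc/kolla && sudo chown $USER: /etc/kolla
cp ~/kolla-venv/share/kolla-ansible/etc_examples/kolla/* /etc/kolla/
cp ~/kolla-venv/share/kolla-ansible/ansible/inventory/multinode .
# edit /etc/kolla/globals.yml: base distro, internal/external VIPs, network interfaces,
# and the external ceph settings if the cluster is managed separately (e.g. with cephadm)
kolla-genpwd
kolla-ansible -i ./multinode bootstrap-servers
kolla-ansible -i ./multinode prechecks
kolla-ansible -i ./multinode deploy
kolla-ansible -i ./multinode post-deploy           # drops an admin-openrc.sh under /etc/kolla

----------------

The same inventory is reused later for scaling out to the HA multi-node layout, which is one of the reasons these installers are usually preferred over hand-rolled deployments.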
i would recomend mangain ceph externally via cephadm regardless of how you deploy openstack. > why ? > > Going ahead with Ansible and Docker is a best option or not ? that is what kolla does i think its a solid option for many usecases if you dont like docker or down want contaienrs you dont have to use them to deploy openstack > > Your kind guidance requested.. > > Thanks in advance, > Krish > > ps:- > > Once Completed this installation I have to extend this as a High > available(HA) Multi Node cloud. (multi node production cloud with 3 > Controllers, 6 compute Nodes and 3 Storage nodes with Ceph OSD pools) . > Request to consider this future scalability requirement while answering my > query. From james.denton at rackspace.com Thu Aug 18 12:40:17 2022 From: james.denton at rackspace.com (James Denton) Date: Thu, 18 Aug 2022 12:40:17 +0000 Subject: Weird behaviour on a provider net In-Reply-To: <1860297127.3838975.1660810360958@mail.yahoo.com> References: <1860297127.3838975.1660810360958.ref@mail.yahoo.com> <1860297127.3838975.1660810360958@mail.yahoo.com> Message-ID: Hi Derek, The symptoms you?ve described sound very much like a metadata service issue, whether the service is unavailable or not functioning properly. Sometimes this can be determined by looking at the VM console to see if access to http://169.254.169.254 is repeatedly tried (and failing). If it?s intermittent, there may be a particular instance of the metadata service having issues. For VMs on provider networks (w/ OVS or LXB, specifically), metadata access may provided via a static route that?s pushed by the DHCP server that answered the call. I recommend checking out the metadata agent and dhcp agent logs. If you?re using OVN, then it could still be a metadata issue but the solution could look different. Can you describe your topology? Hope that helps, James Denton Rackspace Private Cloud From: Derek O keeffe Date: Thursday, August 18, 2022 at 3:41 AM To: Openstack-discuss Subject: Weird behaviour on a provider net CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Hi all, We have an Openstack Victoria cluster deployed and a provider network (IPv4) created that worked fine for months. Lately we've noticed that about 30-40% of the VM's that are spun up using this network are inaccessible over ssh, Permission denied (publickey). Also, if you add a customisation script to create a password for the Ubuntu user that fails too. Initially we thought it was confined to one/two compute nodes but further testing this week shows that it can be any of the 6 compute nodes we have and always the original two that we identified as being the problem. If we create the VM's from the cli and send them to specific hosts everything works fine including the original two hosts that had issues. We're finding it hard to nail down the cause of this because it's not consistent behaviour so any tips on where to investigate would be welcome, we have a second provider net (IPv4 & IPv6) that works fine and have went through all the configs to see if there's any difference and all are identical over the hosts. Thanks in advance for any info. Regards, Derek -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Thu Aug 18 14:23:29 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 18 Aug 2022 16:23:29 +0200 Subject: [neutron] Drivers meeting agenda - 19.08.2022. 
Message-ID: Hi Neutron Drivers, The agenda for tomorrow's drivers meeting is at [1]. We have the following RFEs to discuss tomorrow: [RFE] Add possibility to define default security group rules (#link https://bugs.launchpad.net/neutron/+bug/1983053 ) [rfe][fwaas]support standard_attrs for firewall_group (#link https://bugs.launchpad.net/neutron/+bug/1986906 ) [1] https://wiki.openstack.org/wiki/Meetings/NeutronDrivers#Agenda See you at the meeting tomorrow. Lajos Katona (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Aug 18 14:39:18 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 18 Aug 2022 20:09:18 +0530 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Aug 18 at 1500 UTC Message-ID: <182b164ee7b.c1aeb4971287043.3881583483112578454@ghanshyammann.com> Hello Everyone, Below is the agenda for Today's TC meeting schedule at 1500 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * Roll call * Follow up on past action items * Gate health check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary * 2023.1 cycle PTG Planning ** Encourage projects to schedule 'operator hours' as a separate slot in PTG(avoiding conflicts among other projects 'operator hours') * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open -gmann From melwittt at gmail.com Thu Aug 18 18:02:19 2022 From: melwittt at gmail.com (melanie witt) Date: Thu, 18 Aug 2022 11:02:19 -0700 Subject: [nova] Is the VMware NSX CI still being maintained? Message-ID: Hi all, I have noticed lately the VMware NSX CI does not seem to be running since March/April of this year based on casual gerrit search [1]. More recently, maybe starting in May, I have seen it post comments on reviews but they say: Build failed - ext-nova-zuul : NOT_REGISTERED Does anyone know whether this job is still being maintained and if it will be fixed? Thanks, -melwitt [1] https://review.opendev.org/q/reviewer:%2522VMware+NSX+CI%2522+project:openstack/nova From andrew at etc.gen.nz Thu Aug 18 21:18:21 2022 From: andrew at etc.gen.nz (Andrew Ruthven) Date: Fri, 19 Aug 2022 09:18:21 +1200 Subject: [swift] Terraform has deprecated the Swift backend Message-ID: Hey, It has come to our attention that Terraform has recently added deprecation messages to a number of backends, including Swift[0], warning that these backends will be removed in a future version of Terraform. Unfortunately we don't have the bandwidth within Catalyst Cloud to pick this up, but I was hopeful that there'd be others on this list who share our concern. It looks as though Terraform has had the Swift backend marked as unmaintained since at least March 2020[1]. If there is another backend, or another method of managing Swift that isn't the S3 API then I'd be keen to hear about it. Kind regards, Andrew [0]?https://github.com/hashicorp/terraform/commit/7941b2fbdc33a42a68b9b32af51e09f7df35fe66 [1]?https://github.com/hashicorp/terraform/commit/c434db158e631b0bfddb92e1dd342b924880f29a -- Andrew Ruthven, Wellington, New Zealand andrew at etc.gen.nz | Catalyst Cloud: | This space intentionally left blank https://catalystcloud.nz | -------------- next part -------------- An HTML attachment was scrubbed... 
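One option, if the cloud in question exposes Swift's s3api middleware, is to keep the state in the same Swift cluster but reach it through Terraform's s3 backend. A sketch only -- the endpoint, container name and option names are illustrative, the exact backend arguments vary between Terraform releases, and without DynamoDB there is no state locking:

----------------

openstack ec2 credentials create        # Keystone-backed access/secret pair for the s3api layer
cat > backend.tf <<'EOF'
terraform {
  backend "s3" {
    bucket                      = "terraform-state"                  # a Swift container
    key                         = "prod/terraform.tfstate"
    region                      = "us-east-1"                        # required but ignored by Swift
    endpoint                    = "https://object-store.example.com" # your s3api endpoint
    skip_credentials_validation = true
    skip_region_validation      = true
    force_path_style            = true
  }
}
EOF
export AWS_ACCESS_KEY_ID=<access> AWS_SECRET_ACCESS_KEY=<secret>
terraform init -migrate-state           # moves existing state from the old swift backend

----------------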
URL: From skaplons at redhat.com Fri Aug 19 06:32:48 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 19 Aug 2022 08:32:48 +0200 Subject: [all][tc] What's happening in Technical Committee: summary 2022 Aug 19 Message-ID: <2472677.mjr10LBT5U@p1> Hi, Here is this week's summary of the Technical Committee activities. 1. TC Meetings: ============ * We had this week's meeting on Aug 18. Most of the meeting discussions are summarized in this email. Meeting full logs are available at https://meetings.opendev.org/meetings/tc/2022/tc.2022-08-18-15.00.log.html * Next TC weekly meeting will be on 25 Aug Thursday at 15:00 UTC, feel free to add the topic on the agenda[1] by Aug 3. 2. What we completed this week: ========================= 3. Activities In progress: ================== TC Tracker for Zed cycle ------------------------------ * Zed tracker etherpad includes the TC working items[2], Two are completed and others items are in-progress. Open Reviews ----------------- * Four open reviews for ongoing activities[3]. 2021 User Survey TC Question Analysis ----------------------------------------------- No update on this. The survey summary is up for review[4]. Feel free to check and provide feedback. Zed cycle Leaderless projects ---------------------------------- Dale Smith volunteer to be PTL for Adjutant project [5] Fixing Zuul config error ---------------------------- Requesting projects with zuul config error to look into those and fix them which should not take much time[6][7]. Project updates ------------------- * Retire openstack-help-addons [8] 4. How to contact the TC: ==================== If you would like to discuss or give feedback to TC, you can reach out to us in multiple ways: 1. Email: you can send the email with tag [tc] on openstack-discuss ML[9]. 2. Weekly meeting: The Technical Committee conduct a weekly meeting every Thursday 15 UTC [10] 3. Ping us using 'tc-members' nickname on #openstack-tc IRC channel. [1] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting [2] https://etherpad.opendev.org/p/tc-zed-tracker [3] https://review.opendev.org/q/projects:openstack/governance+status:open [4] https://review.opendev.org/c/openstack/governance/+/836888 [5] https://review.opendev.org/c/openstack/governance/+/849606 [6] https://etherpad.opendev.org/p/zuul-config-error-openstack [7] http://lists.openstack.org/pipermail/openstack-discuss/2022-May/028603.html [8] https://review.opendev.org/c/openstack/governance/+/851859 [9] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss [10] http://eavesdrop.openstack.org/#Technical_Committee_Meeting -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From lokendrarathour at gmail.com Fri Aug 19 07:44:57 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Fri, 19 Aug 2022 13:14:57 +0530 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed In-Reply-To: References: Message-ID: Hi Fulton, Thanks for the inputs and apologies for the delay in response. 
to my surprise passing the container prepare in standard worked for me, new container-prepare is: parameter_defaults: ContainerImagePrepare: - push_destination: true set: ceph_alertmanager_image: alertmanager ceph_alertmanager_namespace: quay.ceph.io/prometheus ceph_alertmanager_tag: v0.16.2 ceph_grafana_image: grafana ceph_grafana_namespace: quay.ceph.io/app-sre ceph_grafana_tag: 6.7.4 ceph_image: daemon ceph_namespace: quay.io/ceph ceph_node_exporter_image: node-exporter ceph_node_exporter_namespace: quay.ceph.io/prometheus ceph_node_exporter_tag: v0.17.0 ceph_prometheus_image: prometheus ceph_prometheus_namespace: quay.ceph.io/prometheus ceph_prometheus_tag: v2.7.2 ceph_tag: v6.0.7-stable-6.0-pacific-centos-stream8 name_prefix: openstack- name_suffix: '' namespace: myserver.com:5000/tripleowallaby neutron_driver: ovn rhel_containers: false tag: current-tripleo tag_from_label: rdo_version But if we see or look at these containers I do not see any such containers available. we have tried looking at Undercloud and overcloud. Also, the deployment is done when we are passing this config. Thanks once again. Also, we need to understand some use cases of using the storage from this external ceph, which can work as the mount for the VM as direct or Shared storage. Any idea or available document which tells more about how to consume external Ceph in the existing triple Overcloud? Do share in case you know any, please. Thanks once again for the support, it was really helpful On Thu, Aug 11, 2022 at 9:59 PM John Fulton wrote: > The ceph container should no longer be needed for external ceph > configuration (since the move from ceph-ansible to cephadm) but if removing > the ceph env files makes the error go away, then try adding it back and > then following these steps to prepare the ceph container on your undercloud > before deploying. 
> > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html#container-options > > On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour > wrote: > >> Hi Thanks, >> for the inputs, we could see the miss, >> now we have added the required miss : >> "TripleO resource >> OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml" >> >> Now with this setting if we deploy the setup in wallaby, we are >> getting error as: >> >> >> PLAY [External deployment step 1] >> ********************************************** >> 2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | >> TASK | External deployment step 1 >> 2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | >> OK | External deployment step 1 | undercloud -> localhost | result={ >> "changed": false, >> "msg": "Use --start-at-task 'External deployment step 1' to resume >> from this task" >> } >> [WARNING]: ('undercloud -> localhost', >> '525400d4-7124-4a42-664c-0000000000a8') >> missing from stats >> 2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | >> TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s >> 2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | >> INCLUDED | >> /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml >> | undercloud >> 2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | >> TASK | Set some tripleo-ansible facts >> 2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | >> OK | Set some tripleo-ansible facts | undercloud >> 2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | >> TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | >> 0.03s >> 2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | >> TASK | Container image prepare >> 2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | >> FATAL | Container image prepare | *undercloud | error={"changed": >> false, "error": "None: Max retries exceeded with url: /v2/ (Caused by >> None)", "msg": "Error running container image prepare: None: Max retries >> exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}* >> 2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | >> TIMING | tripleo_container_image_prepare : Container image prepare | >> undercloud | 0:06:13.385607 | 72.12s >> >> This gets failed at step 1, As this is wallaby and based on the document (Use >> an external Ceph cluster with the Overcloud ? TripleO 3.0.0 documentation >> (openstack.org) >> ) >> we should only pass this external-ceph.yaml for the external ceph >> intergration. >> But it is not happening. >> >> >> Few things to note: >> 1. Container Prepare: >> >> (undercloud) [stack at undercloud ~]$ cat containers-prepare-parameter.yaml >> # Generated with the following on 2022-06-28T18:56:38.642315 >> # >> # openstack tripleo container image prepare default >> --local-push-destination --output-env-file >> /home/stack/containers-prepare-parameter.yaml >> # >> >> >> parameter_defaults: >> ContainerImagePrepare: >> - push_destination: true >> set: >> name_prefix: openstack- >> name_suffix: '' >> namespace: myserver.com:5000/tripleowallaby >> neutron_driver: ovn >> rhel_containers: false >> tag: current-tripleo >> tag_from_label: rdo_version >> (undercloud) [stack at undercloud ~]$ >> >> 2. this is SSL based deployment. 
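(As an aside on the "Max retries exceeded with url: /v2/" failure quoted above: that message usually just means the image prepare step could not reach a registry /v2/ endpoint at all -- either the local push destination or the upstream source. A quick, hedged way to check from the undercloud, using the namespace shown in the pasted file; adjust the scheme to whatever the registry really serves:

----------------

curl -k https://myserver.com:5000/v2/ ; echo      # expect 200 or 401, not a timeout
curl -sk https://quay.io/v2/ ; echo               # confirms outbound/proxy access to the source registry
sudo openstack tripleo container image prepare \
  -e /home/stack/containers-prepare-parameter.yaml \
  --dry-run

----------------

If the dry run reproduces the same error outside the deploy, the problem is registry reachability/TLS rather than the external ceph environment file.)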
>> >> Any idea for the error, the issue is seen only once we have the external >> ceph integration enabled. >> >> Best Regards, >> Lokendra >> >> >> >> >> On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano >> wrote: >> >>> Hi, >>> ceph is supposed to be configured by this tripleo-ansible role [1], >>> which is triggered by tht on external_deploy_steps [2]. >>> In theory adding [3] should just work, assuming you customize the ceph >>> cluster mon ip addresses, fsid and a few other related variables. >>> From your previous email I suspect in your external-ceph.yaml you missed >>> the TripleO resource OS::TripleO::Services::CephExternal: >>> ../deployment/cephadm/ceph-client.yaml >>> (see [3]). >>> >>> Thanks, >>> Francesco >>> >>> >>> [1] >>> https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client >>> [2] >>> https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cephadm/ceph-client.yaml#L93 >>> [3] >>> https://github.com/openstack/tripleo-heat-templates/blob/master/environments/external-ceph.yaml >>> >>> On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour < >>> lokendrarathour at gmail.com> wrote: >>> >>>> Hi Team, >>>> I was trying to integrate External Ceph with Triple0 Wallaby, and at >>>> the end of deployment in step4 getting the below error: >>>> >>>> 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 >>>> 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | >>>> Create containers from >>>> /var/lib/tripleo-config/container-startup-config/step_4 >>>> 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 >>>> 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | >>>> /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | >>>> overcloud-controller-2 >>>> 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 >>>> 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | >>>> Create containers managed by Podman for >>>> /var/lib/tripleo-config/container-startup-config/step_4 >>>> 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 >>>> 18:37:24.530812 | | WARNING | >>>> ERROR: Can't run container nova_libvirt_init_secret >>>> stderr: >>>> 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 >>>> 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | >>>> Create containers managed by Podman for >>>> /var/lib/tripleo-config/container-startup-config/step_4 | >>>> overcloud-novacompute-0 | error={"changed": false, "msg": "Failed >>>> containers: nova_libvirt_init_secret"} >>>> 2022-08-03 18:37:44,282 p=507732 u >>>> >>>> >>>> *external-ceph.conf:* >>>> >>>> parameter_defaults: >>>> # Enable use of RBD backend in nova-compute >>>> NovaEnableRbdBackend: True >>>> # Enable use of RBD backend in cinder-volume >>>> CinderEnableRbdBackend: True >>>> # Backend to use for cinder-backup >>>> CinderBackupBackend: ceph >>>> # Backend to use for glance >>>> GlanceBackend: rbd >>>> # Name of the Ceph pool hosting Nova ephemeral images >>>> NovaRbdPoolName: vms >>>> # Name of the Ceph pool hosting Cinder volumes >>>> CinderRbdPoolName: volumes >>>> # Name of the Ceph pool hosting Cinder backups >>>> CinderBackupRbdPoolName: backups >>>> # Name of the Ceph pool hosting Glance images >>>> GlanceRbdPoolName: images >>>> # Name of the user to authenticate with the external Ceph cluster >>>> CephClientUserName: admin >>>> # The cluster FSID >>>> CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' >>>> # The CephX user auth key >>>> 
CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' >>>> # The list of Ceph monitors >>>> CephExternalMonHost: >>>> 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' >>>> ~ >>>> >>>> >>>> Have tried checking and validating the ceph client details and they >>>> seem to be correct, further digging the container log I could see something >>>> like this : >>>> >>>> [root at overcloud-novacompute-0 containers]# tail -f >>>> nova_libvirt_init_secret.log >>>> tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such >>>> file or directory >>>> tail: no files remaining >>>> [root at overcloud-novacompute-0 containers]# tail -f >>>> stdouts/nova_libvirt_init_secret.log >>>> 2022-08-04T11:48:47.689898197+05:30 stdout F >>>> ------------------------------------------------ >>>> 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh secrets >>>> for: ceph:admin >>>> 2022-08-04T11:48:47.690590594+05:30 stdout F Error: /etc/ceph/ceph.conf >>>> was not found >>>> 2022-08-04T11:48:47.690625088+05:30 stdout F Path to >>>> nova_libvirt_init_secret was ceph:admin >>>> 2022-08-04T16:20:29.643785538+05:30 stdout F >>>> ------------------------------------------------ >>>> 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh secrets >>>> for: ceph:admin >>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Error: /etc/ceph/ceph.conf >>>> was not found >>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Path to >>>> nova_libvirt_init_secret was ceph:admin >>>> ^C >>>> [root at overcloud-novacompute-0 containers]# tail -f >>>> stdouts/nova_compute_init_log.log >>>> >>>> -- >>>> ~ Lokendra >>>> skype: lokendrarathour >>>> >>>> >>>> >>> >>> -- >>> Francesco Pantano >>> GPG KEY: F41BD75C >>> >> >> >> -- >> ~ Lokendra >> skype: lokendrarathour >> >> >> -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregory.orange at pawsey.org.au Fri Aug 19 08:39:27 2022 From: gregory.orange at pawsey.org.au (Gregory Orange) Date: Fri, 19 Aug 2022 16:39:27 +0800 Subject: [kolla-ansible] upgrade process when it involves OS and ceph major version In-Reply-To: References: Message-ID: <4d97667a-48fa-576e-7cad-83218b94ea35@pawsey.org.au> Hi Wodel, We haven't done it yet (but will need to soon since we're on Train and stable/train branch is gone!) but this looks like a good place to start: https://docs.openstack.org/kolla-ansible/train/user/operating-kolla.html I'm not sure about this, but it came up in a web search and might help too: https://www.reversengineered.com/2019/05/10/upgrading-kolla-ansible-for-deploying-openstack/ HTH, Greg. On 18/8/22 18:07, wodel youchi wrote: > Hi, > > What are the steps of upgrading an openstack platform when the upgrade > involves: > > - OS upgrade: how to perform the upgrade? What type of node to upgrade > first? Controllers, network or compute? > > - Ceph storage when using HCI deployment? > > Just general steps. > > Regards -- Gregory Orange Cloud System Administrator Scientific Platforms Team building representative Pawsey Supercomputing Centre, CSIRO From rdhasman at redhat.com Fri Aug 19 08:42:00 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Fri, 19 Aug 2022 14:12:00 +0530 Subject: [cinder] Festival of XS reviews Message-ID: Hello Argonauts, We will be having our monthly festival of XS reviews today i.e. 19th August (Friday) from 1400-1600 UTC. 
Following are some additional details: Date: 19th August, 2022 Time: 1400-1600 UTC Meeting link: https://bluejeans.com/556681290 etherpad: https://etherpad.opendev.org/p/cinder-festival-of-reviews Thanks and regards Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From clay.gerrard at gmail.com Fri Aug 19 14:10:31 2022 From: clay.gerrard at gmail.com (Clay Gerrard) Date: Fri, 19 Aug 2022 09:10:31 -0500 Subject: [swift] Terraform has deprecated the Swift backend In-Reply-To: References: Message-ID: Do you use the swift backend for terraform state storage? It looks like they're dumping a bunch of other backends too - maybe they did a user-survey and they're just keeping the top-5 cloud providers or something. On Thu, Aug 18, 2022 at 4:24 PM Andrew Ruthven wrote: > Hey, > > It has come to our attention that Terraform has recently added deprecation > messages to a number of backends, including Swift[0], warning that these > backends will be removed in a future version of Terraform. Unfortunately we > don't have the bandwidth within Catalyst Cloud to pick this up, but I was > hopeful that there'd be others on this list who share our concern. > > It looks as though Terraform has had the Swift backend marked as > unmaintained since at least March 2020[1]. > > If there is another backend, or another method of managing Swift that > isn't the S3 API then I'd be keen to hear about it. > > Kind regards, > Andrew > > [0] > https://github.com/hashicorp/terraform/commit/7941b2fbdc33a42a68b9b32af51e09f7df35fe66 > [1] > https://github.com/hashicorp/terraform/commit/c434db158e631b0bfddb92e1dd342b924880f29a > > > -- > > Andrew Ruthven, Wellington, New Zealandandrew at etc.gen.nz | > Catalyst Cloud: | This space intentionally left blank > https://catalystcloud.nz | > > > -- Clay Gerrard 210 788 9431 -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Fri Aug 19 14:13:09 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Fri, 19 Aug 2022 16:13:09 +0200 Subject: [release] Release countdown for week R-6, Aug 22 - 26 Message-ID: <6ea064fb-7a9e-ceab-2eaf-1033e8ac49d5@est.tech> Development Focus ----------------- Work on libraries should be wrapping up, in preparation for the various library-related deadlines coming up. Now is a good time to make decisions on deferring feature work to the next development cycle in order to be able to focus on finishing already-started feature work. General Information ------------------- We are now getting close to the end of the cycle, and will be gradually freezing feature work on the various deliverables that make up the OpenStack release. This coming week is the deadline for general libraries (except client libraries): their last feature release needs to happen before "Non-client library freeze" on August 25th, 2022. Only bugfix releases will be allowed beyond this point. When requesting those library releases, you can also include the stable/zed branching request with the review (as an example, see the "branches" section here: https://opendev.org/openstack/releases/src/branch/master/deliverables/pike/os-brick.yaml#n2 In the next weeks we will have deadlines for: * Client libraries (think python-*client libraries), which need to have ? their last feature release before "Client library freeze" (September ? 1st, 2022) * Deliverables following a cycle-with-rc model (that would be most ? services), which observe a Feature freeze on that same date, ? 
September 1st, 2022. Any feature addition beyond that date should be ? discussed on the mailing-list and get PTL approval. As we are getting to the point of creating stable/zed branches, this would be a good point for teams to review membership in their $project-stable-maint groups. Once the stable/zed branches are cut for a repo, the ability to approve any necessary backports into those branches for Zed will be limited to the members of that stable team. If there are any questions about stable policy or stable team membership, please reach out in the #openstack-stable channel. Upcoming Deadlines & Dates -------------------------- Non-client library freeze: August 25th, 2022 (R-6 week) Client library freeze: September 1st, 2022 (R-5 week) Zed-3 milestone: September 1st, 2022 (R-5 week) Cycle Highlights Due: September 1st, 2022 (R-5 week) Zed final release: October 5th, 2022 El?d Ill?s irc: elodilles From gregory.orange at pawsey.org.au Fri Aug 19 14:25:18 2022 From: gregory.orange at pawsey.org.au (Gregory Orange) Date: Fri, 19 Aug 2022 22:25:18 +0800 Subject: [kolla-ansible] upgrade process when it involves OS and ceph major version In-Reply-To: <4d97667a-48fa-576e-7cad-83218b94ea35@pawsey.org.au> References: <4d97667a-48fa-576e-7cad-83218b94ea35@pawsey.org.au> Message-ID: <34b73769-ea4c-61a6-cf50-f9ad33537812@pawsey.org.au> I should also add that my understanding from conversations in IRC is that if operating system upgrade is needed, then you 1. upgrade Kolla-Ansible to the crossover version, then 2. rebuild that same control plane onto new machines with the newer operating system (or perhaps pull them out, rebuild and put back in again, depending on how many you have for redundancy) 3. upgrade Kolla-Ansible to a later version as needed For us, I /believe/ Ussuri is the crossover version because it supports both Ubuntu Bionic and Focal - we will need to check that before embarking on the upgrade. Can anyone confirm, also whether I am barking mad with the above understanding? On 19/8/22 16:39, Gregory Orange wrote: > Hi Wodel, > > We haven't done it yet (but will need to soon since we're on Train and > stable/train branch is gone!) but this looks like a good place to start: > > https://docs.openstack.org/kolla-ansible/train/user/operating-kolla.html > > I'm not sure about this, but it came up in a web search and might help too: > > https://www.reversengineered.com/2019/05/10/upgrading-kolla-ansible-for-deploying-openstack/ > > > HTH, > Greg. > > > On 18/8/22 18:07, wodel youchi wrote: >> Hi, >> >> What are the steps of upgrading an openstack platform when the upgrade >> involves: >> >> - OS upgrade: how to perform the upgrade? What type of node to upgrade >> first? Controllers, network or compute? >> >> - Ceph storage when using HCI deployment? >> >> Just general steps. From romain.chanu at univ-lyon1.fr Fri Aug 19 15:47:09 2022 From: romain.chanu at univ-lyon1.fr (CHANU ROMAIN) Date: Fri, 19 Aug 2022 15:47:09 +0000 Subject: [xena][placement] Xena placement upgrade leads to 500 on ubuntu focal Message-ID: Hello, I just did upgrade my Placement to Xena on Ubuntu Focal (20.04). 
When I tried to start the process I got this error and all HTTP requests receive an HTTP 500 error: 2022-08-19 15:05:43.960573 2022-08-19 15:05:43.960 43 INFO placement.requestlog [req-f4c4d4f1-5d59-49d3-aa3e-1e8a09fe02fe 3ec54dee59424109913d4628ae8dac4c 19e62bc767484849a2763937883a256e - default default] 192.168.236.5 "GET /resource_providers/791c09ed-57f3- 4bfc-9278-4af6c5c137d8/allocations" status: 500 len: 244 microversion: 1.0\x1b[00m 2022-08-19 15:05:44.094951 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap [req-527dce52-c207-43ee-80f2-016a6f031cf5 3ec54dee59424109913d4628ae8dac4c 19e62bc767484849a2763937883a256e - default default] Placement API unexpected error: unsupported callable: TypeError: unsupported callable 2022-08-19 15:05:44.094973 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap Traceback (most recent call last): 2022-08-19 15:05:44.094977 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 1135, in getfullargspec 2022-08-19 15:05:44.094980 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap sig = _signature_from_callable(func, 2022-08-19 15:05:44.094998 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2233, in _signature_from_callable 2022-08-19 15:05:44.095001 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap sig = _signature_from_callable( 2022-08-19 15:05:44.095004 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2304, in _signature_from_callable 2022-08-19 15:05:44.095007 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _signature_from_function(sigcls, obj, 2022-08-19 15:05:44.095010 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2168, in _signature_from_function 2022-08-19 15:05:44.095013 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap parameters.append(Parameter(name, annotation=annotation, 2022-08-19 15:05:44.095015 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2491, in __init__ 2022-08-19 15:05:44.095018 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap self._kind = _ParameterKind(kind) 2022-08-19 15:05:44.095021 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap RecursionError: maximum recursion depth exceeded 2022-08-19 15:05:44.095024 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap 2022-08-19 15:05:44.095026 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap The above exception was the direct cause of the following exception: 2022-08-19 15:05:44.095029 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap 2022-08-19 15:05:44.095032 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap Traceback (most recent call last): 2022-08-19 15:05:44.095035 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/placement/fault_wrap.py", line 39, in __call__ 2022-08-19 15:05:44.095038 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return self.application(environ, start_response) 2022-08-19 15:05:44.095040 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/webob/dec.py", line 129, in __call__ 2022-08-19 15:05:44.095043 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap resp = self.call_func(req, *args, **kw) 2022-08-19 15:05:44.095046 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/webob/dec.py", line 193, in call_func 2022-08-19 15:05:44.095049 2022-08-19 
15:05:44.077 46 ERROR placement.fault_wrap return self.func(req, *args, **kwargs) 2022-08-19 15:05:44.095052 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/microversion_parse/middleware.py", line 80, in __call__ 2022-08-19 15:05:44.095055 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap response = req.get_response(self.application) 2022-08-19 15:05:44.095057 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/webob/request.py", line 1313, in send 2022-08-19 15:05:44.095060 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap status, headers, app_iter = self.call_application( 2022-08-19 15:05:44.095063 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/webob/request.py", line 1278, in call_application 2022-08-19 15:05:44.095065 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap app_iter = application(self.environ, start_response) 2022-08-19 15:05:44.095068 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/placement/handler.py", line 215, in __call__ 2022-08-19 15:05:44.095071 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return dispatch(environ, start_response, self._map) 2022-08-19 15:05:44.095074 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/placement/handler.py", line 149, in dispatch 2022-08-19 15:05:44.095077 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return handler(environ, start_response) 2022-08-19 15:05:44.095083 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/webob/dec.py", line 129, in __call__ 2022-08-19 15:05:44.095086 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap resp = self.call_func(req, *args, **kw) 2022-08-19 15:05:44.095089 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/placement/wsgi_wrapper.py", line 29, in call_func 2022-08-19 15:05:44.095092 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap super(PlacementWsgify, self).call_func(req, *args, **kwargs) 2022-08-19 15:05:44.095094 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/webob/dec.py", line 193, in call_func 2022-08-19 15:05:44.095097 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return self.func(req, *args, **kwargs) 2022-08-19 15:05:44.095100 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/placement/util.py", line 64, in decorated_function 2022-08-19 15:05:44.095103 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return f(req) 2022-08-19 15:05:44.095106 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/placement/handlers/allocation.py", line 299, in list_for_resource_provider 2022-08-19 15:05:44.098861 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098864 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098867 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098870 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098873 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File 
"/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098876 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098893 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098896 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098899 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098905 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098907 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 255, in __call__ 2022-08-19 15:05:44.098910 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _check( 2022-08-19 15:05:44.098913 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098916 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098919 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 255, in __call__ 2022-08-19 15:05:44.098922 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _check( 2022-08-19 15:05:44.098925 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098928 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098930 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098933 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098936 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098939 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098941 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098944 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098947 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098950 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098952 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 255, in __call__ 2022-08-19 15:05:44.098955 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _check( 2022-08-19 15:05:44.098958 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098960 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098963 2022-08-19 15:05:44.077 46 ERROR 
placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 255, in __call__ 2022-08-19 15:05:44.098966 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _check( 2022-08-19 15:05:44.098968 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098971 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098974 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098977 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098980 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098982 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098985 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098991 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098993 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098996 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098999 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 255, in __call__ 2022-08-19 15:05:44.099002 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _check( 2022-08-19 15:05:44.099004 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist- packages/oslo_policy/_checks.py", line 75, in _check 2022-08-19 15:05:44.099007 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap argspec = inspect.getfullargspec(rule.__call__) 2022-08-19 15:05:44.099010 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 1144, in getfullargspec 2022-08-19 15:05:44.099013 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap raise TypeError('unsupported callable') from ex 2022-08-19 15:05:44.099016 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap TypeError: unsupported callable 2022-08-19 15:05:44.099018 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap \x1b[00m Placement-api is deployed in a container, so I got a fresh policy.yaml file. Did someone already face this? Do you have any idea how to fix this? Best regards, Romain -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4513 bytes Desc: not available URL: From alsotoes at gmail.com Fri Aug 19 15:49:22 2022 From: alsotoes at gmail.com (Alvaro Soto) Date: Fri, 19 Aug 2022 10:49:22 -0500 Subject: [swift] Terraform has deprecated the Swift backend In-Reply-To: References: Message-ID: S3 will do it. On Fri, Aug 19, 2022 at 9:21 AM Clay Gerrard wrote: > Do you use the swift backend for terraform state storage? 
It looks like > they're dumping a bunch of other backends too - maybe they did a > user-survey and they're just keeping the top-5 cloud providers or something. > > On Thu, Aug 18, 2022 at 4:24 PM Andrew Ruthven wrote: > >> Hey, >> >> It has come to our attention that Terraform has recently added >> deprecation messages to a number of backends, including Swift[0], warning >> that these backends will be removed in a future version of Terraform. >> Unfortunately we don't have the bandwidth within Catalyst Cloud to pick >> this up, but I was hopeful that there'd be others on this list who share >> our concern. >> >> It looks as though Terraform has had the Swift backend marked as >> unmaintained since at least March 2020[1]. >> >> If there is another backend, or another method of managing Swift that >> isn't the S3 API then I'd be keen to hear about it. >> >> Kind regards, >> Andrew >> >> [0] >> https://github.com/hashicorp/terraform/commit/7941b2fbdc33a42a68b9b32af51e09f7df35fe66 >> [1] >> https://github.com/hashicorp/terraform/commit/c434db158e631b0bfddb92e1dd342b924880f29a >> >> >> -- >> >> Andrew Ruthven, Wellington, New Zealandandrew at etc.gen.nz | >> Catalyst Cloud: | This space intentionally left blank >> https://catalystcloud.nz | >> >> >> > > -- > Clay Gerrard > 210 788 9431 > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Fri Aug 19 16:48:38 2022 From: kennelson11 at gmail.com (Kendall Nelson) Date: Fri, 19 Aug 2022 11:48:38 -0500 Subject: Fwd: Virtual PTG October 2022 Team Signup Reminder - Last Call In-Reply-To: References: Message-ID: Hello Everyone, This is your last call to sign up your team up for the next virtual Project Teams Gathering (PTG), which will be held from Monday, October 17 to Friday, October 21st, 2022! NOTE: Since we are now hosting the PTG virtually, we will be holding the event all week (Monday - Friday) and will be using the ?normal? virtual schedule of 4 hour blocks of time covering hours for Americas/APAC, APAC/Europe+Africa, and Europe+Africa/Americas. If you haven't already done so and your team is interested in participating, please complete the survey[1] by August 26th, 2022 at 7:00 UTC. Then make sure to register[2] - it?s free :) Thanks! -Kendall (diablo_rojo) [1] Team Survey: https://openinfrafoundation.formstack.com/forms/oct2022_ptg_team_signup [2] PTG Registration: https://openinfra-ptg.eventbrite.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Fri Aug 19 17:02:25 2022 From: johfulto at redhat.com (John Fulton) Date: Fri, 19 Aug 2022 13:02:25 -0400 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed In-Reply-To: References: Message-ID: On Fri, Aug 19, 2022 at 3:45 AM Lokendra Rathour wrote: > Hi Fulton, > Thanks for the inputs and apologies for the delay in response. 
> to my surprise passing the container prepare in standard worked for me, > new container-prepare is: > > parameter_defaults: > ContainerImagePrepare: > - push_destination: true > set: > ceph_alertmanager_image: alertmanager > ceph_alertmanager_namespace: quay.ceph.io/prometheus > ceph_alertmanager_tag: v0.16.2 > ceph_grafana_image: grafana > ceph_grafana_namespace: quay.ceph.io/app-sre > ceph_grafana_tag: 6.7.4 > ceph_image: daemon > ceph_namespace: quay.io/ceph > ceph_node_exporter_image: node-exporter > ceph_node_exporter_namespace: quay.ceph.io/prometheus > ceph_node_exporter_tag: v0.17.0 > ceph_prometheus_image: prometheus > ceph_prometheus_namespace: quay.ceph.io/prometheus > ceph_prometheus_tag: v2.7.2 > ceph_tag: v6.0.7-stable-6.0-pacific-centos-stream8 > name_prefix: openstack- > name_suffix: '' > namespace: myserver.com:5000/tripleowallaby > neutron_driver: ovn > rhel_containers: false > tag: current-tripleo > tag_from_label: rdo_version > > But if we see or look at these containers I do not see any such containers > available. we have tried looking at Undercloud and overcloud. > The undercloud can download continers from the sources above and then act as a container registry. It's described here: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/container_image_prepare.html > Also, the deployment is done when we are passing this config. > Thanks once again. > > Also, we need to understand some use cases of using the storage from this > external ceph, which can work as the mount for the VM as direct or Shared > storage. Any idea or available document which tells more about how to > consume external Ceph in the existing triple Overcloud? > Ceph can provide OpenStack Block, Object and File storage and TripleO supports a variety of integration options for them. TripleO can deploy Ceph as part of the OpenStack overcloud: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html TripleO can also deploy an OpenStack overcloud which uses an existing external ceph cluster: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/ceph_external.html At the end of both of these documents you can expect Glance, Nova, and Cinder to use Ceph block storage (RBD). You can also have OpenStack use Ceph object storage (RGW). When RGW is used, a command like "openstack container create foo" will create an object storage container (not to be confused with podman/docker) on CephRGW as if your overcloud were running OpenStack Swift. If you have TripleO deploy Ceph as part of the OpenStack overcloud, RGW will be deployed and configured for OpenStack object storage by default (in Wallaby+). The OpenStack Manila service can use CephFS as one of its backends. TripleO can deploy that too as described here: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_manila.html John > Do share in case you know any, please. > > Thanks once again for the support, it was really helpful > > > On Thu, Aug 11, 2022 at 9:59 PM John Fulton wrote: > >> The ceph container should no longer be needed for external ceph >> configuration (since the move from ceph-ansible to cephadm) but if removing >> the ceph env files makes the error go away, then try adding it back and >> then following these steps to prepare the ceph container on your undercloud >> before deploying. 
>> >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html#container-options >> >> On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour < >> lokendrarathour at gmail.com> wrote: >> >>> Hi Thanks, >>> for the inputs, we could see the miss, >>> now we have added the required miss : >>> "TripleO resource >>> OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml" >>> >>> Now with this setting if we deploy the setup in wallaby, we are >>> getting error as: >>> >>> >>> PLAY [External deployment step 1] >>> ********************************************** >>> 2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | >>> TASK | External deployment step 1 >>> 2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | >>> OK | External deployment step 1 | undercloud -> localhost | result={ >>> "changed": false, >>> "msg": "Use --start-at-task 'External deployment step 1' to resume >>> from this task" >>> } >>> [WARNING]: ('undercloud -> localhost', >>> '525400d4-7124-4a42-664c-0000000000a8') >>> missing from stats >>> 2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | >>> TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s >>> 2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | >>> INCLUDED | >>> /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml >>> | undercloud >>> 2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | >>> TASK | Set some tripleo-ansible facts >>> 2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | >>> OK | Set some tripleo-ansible facts | undercloud >>> 2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | >>> TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | >>> 0.03s >>> 2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | >>> TASK | Container image prepare >>> 2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | >>> FATAL | Container image prepare | *undercloud | error={"changed": >>> false, "error": "None: Max retries exceeded with url: /v2/ (Caused by >>> None)", "msg": "Error running container image prepare: None: Max retries >>> exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}* >>> 2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | >>> TIMING | tripleo_container_image_prepare : Container image prepare | >>> undercloud | 0:06:13.385607 | 72.12s >>> >>> This gets failed at step 1, As this is wallaby and based on the document >>> (Use an external Ceph cluster with the Overcloud ? TripleO 3.0.0 >>> documentation (openstack.org) >>> ) >>> we should only pass this external-ceph.yaml for the external ceph >>> intergration. >>> But it is not happening. >>> >>> >>> Few things to note: >>> 1. Container Prepare: >>> >>> (undercloud) [stack at undercloud ~]$ cat containers-prepare-parameter.yaml >>> # Generated with the following on 2022-06-28T18:56:38.642315 >>> # >>> # openstack tripleo container image prepare default >>> --local-push-destination --output-env-file >>> /home/stack/containers-prepare-parameter.yaml >>> # >>> >>> >>> parameter_defaults: >>> ContainerImagePrepare: >>> - push_destination: true >>> set: >>> name_prefix: openstack- >>> name_suffix: '' >>> namespace: myserver.com:5000/tripleowallaby >>> neutron_driver: ovn >>> rhel_containers: false >>> tag: current-tripleo >>> tag_from_label: rdo_version >>> (undercloud) [stack at undercloud ~]$ >>> >>> 2. 
this is SSL based deployment. >>> >>> Any idea for the error, the issue is seen only once we have the external >>> ceph integration enabled. >>> >>> Best Regards, >>> Lokendra >>> >>> >>> >>> >>> On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano >>> wrote: >>> >>>> Hi, >>>> ceph is supposed to be configured by this tripleo-ansible role [1], >>>> which is triggered by tht on external_deploy_steps [2]. >>>> In theory adding [3] should just work, assuming you customize the ceph >>>> cluster mon ip addresses, fsid and a few other related variables. >>>> From your previous email I suspect in your external-ceph.yaml you >>>> missed the TripleO resource OS::TripleO::Services::CephExternal: >>>> ../deployment/cephadm/ceph-client.yaml >>>> (see [3]). >>>> >>>> Thanks, >>>> Francesco >>>> >>>> >>>> [1] >>>> https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client >>>> [2] >>>> https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cephadm/ceph-client.yaml#L93 >>>> [3] >>>> https://github.com/openstack/tripleo-heat-templates/blob/master/environments/external-ceph.yaml >>>> >>>> On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour < >>>> lokendrarathour at gmail.com> wrote: >>>> >>>>> Hi Team, >>>>> I was trying to integrate External Ceph with Triple0 Wallaby, and at >>>>> the end of deployment in step4 getting the below error: >>>>> >>>>> 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 >>>>> 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | >>>>> Create containers from >>>>> /var/lib/tripleo-config/container-startup-config/step_4 >>>>> 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 >>>>> 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | >>>>> /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | >>>>> overcloud-controller-2 >>>>> 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 >>>>> 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | >>>>> Create containers managed by Podman for >>>>> /var/lib/tripleo-config/container-startup-config/step_4 >>>>> 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 >>>>> 18:37:24.530812 | | WARNING | >>>>> ERROR: Can't run container nova_libvirt_init_secret >>>>> stderr: >>>>> 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 >>>>> 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | >>>>> Create containers managed by Podman for >>>>> /var/lib/tripleo-config/container-startup-config/step_4 | >>>>> overcloud-novacompute-0 | error={"changed": false, "msg": "Failed >>>>> containers: nova_libvirt_init_secret"} >>>>> 2022-08-03 18:37:44,282 p=507732 u >>>>> >>>>> >>>>> *external-ceph.conf:* >>>>> >>>>> parameter_defaults: >>>>> # Enable use of RBD backend in nova-compute >>>>> NovaEnableRbdBackend: True >>>>> # Enable use of RBD backend in cinder-volume >>>>> CinderEnableRbdBackend: True >>>>> # Backend to use for cinder-backup >>>>> CinderBackupBackend: ceph >>>>> # Backend to use for glance >>>>> GlanceBackend: rbd >>>>> # Name of the Ceph pool hosting Nova ephemeral images >>>>> NovaRbdPoolName: vms >>>>> # Name of the Ceph pool hosting Cinder volumes >>>>> CinderRbdPoolName: volumes >>>>> # Name of the Ceph pool hosting Cinder backups >>>>> CinderBackupRbdPoolName: backups >>>>> # Name of the Ceph pool hosting Glance images >>>>> GlanceRbdPoolName: images >>>>> # Name of the user to authenticate with the external Ceph cluster >>>>> CephClientUserName: admin >>>>> # The 
cluster FSID >>>>> CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' >>>>> # The CephX user auth key >>>>> CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' >>>>> # The list of Ceph monitors >>>>> CephExternalMonHost: >>>>> 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' >>>>> ~ >>>>> >>>>> >>>>> Have tried checking and validating the ceph client details and they >>>>> seem to be correct, further digging the container log I could see something >>>>> like this : >>>>> >>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>> nova_libvirt_init_secret.log >>>>> tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such >>>>> file or directory >>>>> tail: no files remaining >>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>> stdouts/nova_libvirt_init_secret.log >>>>> 2022-08-04T11:48:47.689898197+05:30 stdout F >>>>> ------------------------------------------------ >>>>> 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh >>>>> secrets for: ceph:admin >>>>> 2022-08-04T11:48:47.690590594+05:30 stdout F Error: >>>>> /etc/ceph/ceph.conf was not found >>>>> 2022-08-04T11:48:47.690625088+05:30 stdout F Path to >>>>> nova_libvirt_init_secret was ceph:admin >>>>> 2022-08-04T16:20:29.643785538+05:30 stdout F >>>>> ------------------------------------------------ >>>>> 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh >>>>> secrets for: ceph:admin >>>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Error: >>>>> /etc/ceph/ceph.conf was not found >>>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Path to >>>>> nova_libvirt_init_secret was ceph:admin >>>>> ^C >>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>> stdouts/nova_compute_init_log.log >>>>> >>>>> -- >>>>> ~ Lokendra >>>>> skype: lokendrarathour >>>>> >>>>> >>>>> >>>> >>>> -- >>>> Francesco Pantano >>>> GPG KEY: F41BD75C >>>> >>> >>> >>> -- >>> ~ Lokendra >>> skype: lokendrarathour >>> >>> >>> > > -- > ~ Lokendra > skype: lokendrarathour > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew at etc.gen.nz Sat Aug 20 04:27:34 2022 From: andrew at etc.gen.nz (Andrew Ruthven) Date: Sat, 20 Aug 2022 16:27:34 +1200 Subject: [swift] Terraform has deprecated the Swift backend In-Reply-To: References: Message-ID: <32c78882b31f13a10518d7f25b69ce043772161d.camel@etc.gen.nz> Hey, Some of our customer are using the swift backend for terraform state storage. By the look of things they're going to drop all of the backends that are unmaintained. We've had a look at the S3 remote, sadly it isn't suitable as it uses DynamoDB for locking. We'd rather not tell customers they have to use services from another cloud provider. :) The locking in the Swift backend isn't that great either, as it appears to store the lock state in Swift which with eventual consistency... It seems that our current recommendation to customers will end up being to use the PostgreSQL backend which they can use by deploying a Trove instance. Cheers, Andrew On Fri, 2022-08-19 at 10:49 -0500, Alvaro Soto wrote: > S3 will do it. > > On Fri, Aug 19, 2022 at 9:21 AM Clay Gerrard > wrote: > > Do you use the swift backend for terraform state storage?? It looks > > like they're dumping a bunch of other backends too - maybe they did > > a user-survey and they're just keeping the top-5 cloud providers or > > something. 
> > > > On Thu, Aug 18, 2022 at 4:24 PM Andrew Ruthven > > wrote: > > > Hey, > > > > > > It has come to our attention that Terraform has recently added > > > deprecation messages to a number of backends, including Swift[0], > > > warning that these backends will be removed in a future version > > > of Terraform. Unfortunately we don't have the bandwidth within > > > Catalyst Cloud to pick this up, but I was hopeful that there'd be > > > others on this list who share our concern. > > > > > > It looks as though Terraform has had the Swift backend marked as > > > unmaintained since at least March 2020[1]. > > > > > > If there is another backend, or another method of managing Swift > > > that isn't the S3 API then I'd be keen to hear about it. > > > > > > Kind regards, > > > Andrew > > > > > > [0]? > > > https://github.com/hashicorp/terraform/commit/7941b2fbdc33a42a68b9b32af51e09f7df35fe66 > > > [1]? > > > https://github.com/hashicorp/terraform/commit/c434db158e631b0bfddb92e1dd342b924880f29a > > > > > > > > > -- > > > Andrew Ruthven, Wellington, New Zealand > > > andrew at etc.gen.nz | > > > Catalyst Cloud: | This space intentionally left blank > > > https://catalystcloud.nz | > > > > > > > > > -- > > Clay Gerrard > > 210 788 9431 > > -- Andrew Ruthven, Wellington, New Zealand andrew at etc.gen.nz | Catalyst Cloud: | This space intentionally left blank https://catalystcloud.nz | -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at binero.com Sat Aug 20 11:18:28 2022 From: tobias.urdin at binero.com (Tobias Urdin) Date: Sat, 20 Aug 2022 11:18:28 +0000 Subject: [swift] Terraform has deprecated the Swift backend In-Reply-To: <32c78882b31f13a10518d7f25b69ce043772161d.camel@etc.gen.nz> References: <32c78882b31f13a10518d7f25b69ce043772161d.camel@etc.gen.nz> Message-ID: <51E47B83-4308-48CA-BDE9-129691AE83AD@binero.com> Hello, This is a rather sad trend that we are seeing now, unfortunately that?s the reality when people don?t step up supporting the integrations they need in open source. We are seeing the same issue in Ceph RadosGW now that there is interest in deprecating the whole Barbican integration which I assume we are not alone in using. I for one will we be working on supporting the Ceph developers maintaining the Barbican support hopefully somebody with insight into Terraform can do the same and step up to maintain the Swift support. Best regards Tobias Sent from my iPhone On 20 Aug 2022, at 06:44, Andrew Ruthven wrote: ? Hey, Some of our customer are using the swift backend for terraform state storage. By the look of things they're going to drop all of the backends that are unmaintained. We've had a look at the S3 remote, sadly it isn't suitable as it uses DynamoDB for locking. We'd rather not tell customers they have to use services from another cloud provider. :) The locking in the Swift backend isn't that great either, as it appears to store the lock state in Swift which with eventual consistency... It seems that our current recommendation to customers will end up being to use the PostgreSQL backend which they can use by deploying a Trove instance. Cheers, Andrew On Fri, 2022-08-19 at 10:49 -0500, Alvaro Soto wrote: S3 will do it. On Fri, Aug 19, 2022 at 9:21 AM Clay Gerrard > wrote: Do you use the swift backend for terraform state storage? It looks like they're dumping a bunch of other backends too - maybe they did a user-survey and they're just keeping the top-5 cloud providers or something. 
On Thu, Aug 18, 2022 at 4:24 PM Andrew Ruthven > wrote: Hey, It has come to our attention that Terraform has recently added deprecation messages to a number of backends, including Swift[0], warning that these backends will be removed in a future version of Terraform. Unfortunately we don't have the bandwidth within Catalyst Cloud to pick this up, but I was hopeful that there'd be others on this list who share our concern. It looks as though Terraform has had the Swift backend marked as unmaintained since at least March 2020[1]. If there is another backend, or another method of managing Swift that isn't the S3 API then I'd be keen to hear about it. Kind regards, Andrew [0] https://github.com/hashicorp/terraform/commit/7941b2fbdc33a42a68b9b32af51e09f7df35fe66 [1] https://github.com/hashicorp/terraform/commit/c434db158e631b0bfddb92e1dd342b924880f29a -- Andrew Ruthven, Wellington, New Zealand andrew at etc.gen.nz | Catalyst Cloud: | This space intentionally left blank https://catalystcloud.nz | -- Clay Gerrard 210 788 9431 -- Andrew Ruthven, Wellington, New Zealand andrew at etc.gen.nz | Catalyst Cloud: | This space intentionally left blank https://catalystcloud.nz | -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Sat Aug 20 13:25:48 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 20 Aug 2022 18:55:48 +0530 Subject: [all] Proposed Antelope cycle schedule In-Reply-To: <7f3c8429-b6b7-8c42-da2e-f933d3b993b4@openstack.org> References: <1449e74b-9c0b-74b3-cb7b-63a63037a453@est.tech> <7f3c8429-b6b7-8c42-da2e-f933d3b993b4@openstack.org> Message-ID: <182bb6e5c9b.12238194e22294.3318056690748204523@ghanshyammann.com> ---- On Thu, 11 Aug 2022 14:39:58 +0530 Thierry Carrez wrote --- > El?d Ill?s wrote: > > [...] > > Please review this as well and give us feedback which one is better. > > Also, we would like to ask Foundation and Technical Committee to decide > > between the 2 options based on the reviews. > From a Foundation marketing perspective both solutions can work. The > only difference is where a PTG could happen, given religious holidays in > the early weeks of April. > > If we pick the 24-week option[1], we keep the option to hold a PTG the > week of March 27. > > If we pick the 25-week option[2], if we wanted to do a PTG within the > first weeks of the cycle, our only option would be to hold the PTG on > the same week as release week. > > So... small preference for the 24-week option as people are generally > more available to participate in release announcements if PTG week is > not happening at the same time, so it gives us more flexibility. > > That said, the TC should ultimately pick between the two options, as > there may be other factors playing in. IMO, 24th week one (852741) will be good as PTG and final release in same week might be hectic. 
-gmann > > [1] > https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_dd6/852741/1/check/openstack-tox-docs/dd6af3f/docs/antelope/schedule.html > > [2] > https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_699/850753/5/check/openstack-tox-docs/699fc2d/docs/antelope/schedule.html > > -- > Thierry Carrez (ttx) > > From fv at spots.edu Sat Aug 20 17:59:16 2022 From: fv at spots.edu (Father Vlasie) Date: Sat, 20 Aug 2022 10:59:16 -0700 Subject: [openstack-ansible] Reset Target Host Message-ID: <5A2D4C60-D5AD-429C-8842-FABE0F031F2F@spots.edu> Hello everyone, Is there a way to reset a target host with OpenStack-Ansible? What I mean is to remove all the changes that have been applied, say, to infra1 so that the scripts can be deployed afresh. (It seems that the only way to get back to square one is to do a OS reinstall?is that right?) I am trying to avoid a reinstall because I am deploying from a remote location. Thank you, Father Vlasie From james.denton at rackspace.com Thu Aug 18 02:38:09 2022 From: james.denton at rackspace.com (James Denton) Date: Thu, 18 Aug 2022 02:38:09 +0000 Subject: [openstack-ansible] [yoga] utility_container failure In-Reply-To: References: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> <5C79D786-B1AF-425B-9BEE-D683FA94A907@spots.edu> <8681B9FD-D061-4D97-A46D-91FFC33AFE96@spots.edu> Message-ID: Hello, > Strangely I get "ssh: connect to host infra1_repo_container-20deb465 port 22: No route to host" This could mean the hosts file doesn?t have an entry. It looks like the Ansible inventory has corresponding entries, so you?re probably fine there. From ?infra1?, you can try ?lxc-attach -n infra1_repo_container-20deb465? to attach to the container directly, and run those same commands mentioned earlier. For the infra2 container, you?ll want to connect from infra2 with ?lxc-attach?. Can you confirm if this is Yoga or Master? Also, are you running w/ Rocky Linux 8.6 (as a previous thread indicates)? TBH I have not tested that, yet, and am not sure of the gotchas. James Denton Rackspace Private Cloud From: Father Vlasie Date: Wednesday, August 17, 2022 at 9:18 PM To: James Denton Cc: openstack-discuss at lists.openstack.org Subject: Re: [openstack-ansible] [yoga] utility_container failure CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Hello, > On Aug 17, 2022, at 5:18 PM, James Denton wrote: > > Hello, > > My recommendation is to try running these commands from the deploy node and see what the output is (or maybe try running the playbooks in verbose mode with -vvv): Here is the output from "setup-infrastructure.yml -vvv" https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpaste.opendev.org%2Fshow%2FbCGUOb177z2oC5P3nR5Z%2F&data=05%7C01%7Cjames.denton%40rackspace.com%7Cf947b56a0f9249d40cec08da80bfca26%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637963858891500677%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=DV2%2Bji7aeHe7HhAwGFmT8tkCWh2SS%2BKeHpAm1JVMIhs%3D&reserved=0 > # ssh infra1_repo_container-20deb465 > # systemctl status glusterd.service > # journalctl -xe -u glusterd.service > # exit > > ^^ Might also consider restarting glusterd and checking the journal to see if there?s an error. 
Strangely I get "ssh: connect to host infra1_repo_container-20deb465 port 22: No route to host" > # ssh infra2_repo_container-6cd61edd > # systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\") > # systemctl status var-www-repo.mount > # journalctl -xe > # exit > A similar error for this too "ssh: connect to host infra2_repo_container-6cd61edd port 22: Network is unreachable" > The issue may be obvious. Maybe not. If you can ship that output to paste.openstack.org we might be able to diagnose. Here is the verbose output for the glusterfs error: https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpaste.openstack.org%2Fshow%2Fbw0qIhUzuZ1de0qjKzfK%2F&data=05%7C01%7Cjames.denton%40rackspace.com%7Cf947b56a0f9249d40cec08da80bfca26%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637963858891500677%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=j2S5KZZnTHH9vD0oKEZfLxaNmjGxlEyCKbzoxxvXm5M%3D&reserved=0 > > The mountpoint command will return 0 if /var/www/repo is a mountpoint, and 1 if it is not a mountpoint. Looks like it is probably failing due to a previous task (ie. It is not being mounted). Understanding why glusterfs is failing may be key here. > > > I have destroyed all of my containers and I am running setup-hosts again > > Can you describe what you did here? Simply destroy the LXC containers or did you wipe the inventory, too? I used the command: openstack-ansible lxc-containers-destroy.yml I answered affirmatively to the two questions asked about the removal of the containers and the container data. Thank you once again! FV > > Thanks, > James Denton > Rackspace Private Cloud > > From: Father Vlasie > Date: Wednesday, August 17, 2022 at 5:22 PM > To: James Denton > Cc: openstack-discuss at lists.openstack.org > Subject: Re: [openstack-ansible] [yoga] utility_container failure > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > > > Hello again! > > I have completed the run of setup-hosts successfully. > > However I am still seeing errors when running setup-infrastructure: > > ------ > > TASK [openstack.osa.glusterfs : Start glusterfs server] ********************************************************************** > fatal: [infra1_repo_container-20deb465]: FAILED! => {"changed": false, "msg": "Unable to start service glusterd: Job for glusterd.service failed because the control process exited with error code.\nSee \"systemctl status glusterd.service\" and \"journalctl -xe\" for details.\n"} > > ------ > > TASK [systemd_mount : Set the state of the mount] **************************************************************************** > fatal: [infra2_repo_container-6cd61edd]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.021452", "end": "2022-08-17 18:17:37.172187", "msg": "non-zero return code", "rc": 1, "start": "2022-08-17 18:17:37.150735", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} > > ------ > > fatal: [infra2_repo_container-6cd61edd]: FAILED! 
=> {"attempts": 5, "changed": false, "cmd": ["mountpoint", "-q", "/var/www/repo"], "delta": "0:00:00.002310", "end": "2022-08-17 18:18:04.297940", "msg": "non-zero return code", "rc": 1, "start": "2022-08-17 18:18:04.295630", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} > > ------ > > infra1_repo_container-20deb465 : ok=30 changed=2 unreachable=0 failed=1 skipped=14 rescued=0 ignored=0 > infra2_repo_container-6cd61edd : ok=66 changed=6 unreachable=0 failed=2 skipped=22 rescued=1 ignored=0 > infra3_repo_container-7ca5db88 : ok=64 changed=6 unreachable=0 failed=2 skipped=22 rescued=1 ignored=0 > > ------ > > Again any help is much appreciated! > > Thank you, > > FV > > > On Aug 17, 2022, at 2:16 PM, Father Vlasie wrote: > > > > Hello, > > > > I am very appreciative of your help! > > > > I think my interface setup might be questionable. > > > > I did not realise that the nodes need to talk to each other on the external IP. I thought that was only for communication with entities external to the cluster. > > > > My bond0 is associated with br-vlan so I put the external IP there and set br-vlan as the external interface in user_variables. > > > > The nodes can now ping each other on the external network. > > > > This is how I have user_variables configured: > > > > ??? > > > > haproxy_keepalived_external_vip_cidr: ?192.168.2.9/26" > > haproxy_keepalived_internal_vip_cidr: "192.168.3.9/32" > > haproxy_keepalived_external_interface: br-vlan > > haproxy_keepalived_internal_interface: br-mgmt > > haproxy_bind_external_lb_vip_address: 192.168.2.9 > > haproxy_bind_internal_lb_vip_address: 192.168.3.9 > > > > ??? > > > > My IP addresses are configured thusly (one sample from each node type): > > > > ??? > > > > infra1 > > bond0->br-vlan 192.168.2.13 > > br-mgmt 192.168.3.13 > > br-vxlan 192.168.30.13 > > br-storage > > > > compute1 > > br-vlan > > br-mgmt 192.168.3.16 > > br-vxlan 192.168.30.16 > > br-storage 192.168.20.16 > > > > log1 > > br-vlan > > br-mgmt 192.168.3.19 > > br-vxlan > > br-storage > > > > ??? > > > > I have destroyed all of my containers and I am running setup-hosts again. > > > > Here?s to hoping it all turns out this time! > > > > Very gratefully, > > > > FV > > > >> On Aug 16, 2022, at 7:31 PM, James Denton wrote: > >> > >> Hello, > >> > >>>> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? > >> > >> This will likely be the bond0 interface and not the individual bond member. However, the interface defined here will ultimately depend on the networking of that host, and should be an external facing one (i.e. the interface with the default gateway). > >> > >> In many environments, you?ll have something like this (or using 2 bonds, but same idea): > >> > >> ? Bond0 (192.168.100.5/24 gw 192.168.100.1) > >> ? Em49 > >> ? Em50 > >> ? Br-mgmt (172.29.236.5/22) > >> ? Bond0.236 > >> ? Br-vxlan (172.29.240.5/22) > >> ? Bond0.240 > >> ? Br-storage (172.29.244.5/22) > >> ? Bond0.244 > >> > >> In this example, bond0 has the management IP 192.168.100.5 and br-mgmt is the ?container? bridge with an IP configured from the ?container? network (see cidr_networks in openstack_user_config.yml). FYI: LXC containers will automatically be assigned IPs from the ?container? network outside of the ?used_ips? range(s). The infra host will communicate with the containers via this br-mgmt interface. 
> >> > >> I?m using FQDNs for the VIPs, which are specified in openstack_user_config.yml here: > >> > >> global_overrides: > >> internal_lb_vip_address: internalapi.openstack.rackspace.lab > >> external_lb_vip_address: publicapi.openstack.rackspace.lab > >> > >> To avoid DNS resolution issues internally (or rather, to ensure the IP is configured in the config files and not the domain name) I?ll override with the IP and hard set the preferred interface(s): > >> > >> haproxy_keepalived_external_vip_cidr: "192.168.100.10/32" > >> haproxy_keepalived_internal_vip_cidr: "172.29.236.10/32" > >> haproxy_keepalived_external_interface: bond0 > >> haproxy_keepalived_internal_interface: br-mgmt > >> haproxy_bind_external_lb_vip_address: 192.168.100.10 > >> haproxy_bind_internal_lb_vip_address: 172.29.236.10 > >> > >> With the above configuration, keepalived will manage two VIPs - one external and one internal, and endpoints will have the FQDN rather than IP. > >> > >>>> Curl shows "503 Service Unavailable No server is available to handle this request? > >> > >> Hard to say without seeing logs why this is happening, but I will assume that keepalived is having issues binding the IP to the interface. You might find the reason in syslog or ?journalctl -xe -f -u keepalived?. > >> > >>>> Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." > >> > >> You might try running ?umount /var/www/repo? and re-run the repo-install.yml playbook (or setup-infrastructure.yml). > >> > >> Hope that helps! > >> > >> James Denton > >> Rackspace Private Cloud > >> > >> From: Father Vlasie > >> Date: Tuesday, August 16, 2022 at 4:31 PM > >> To: James Denton > >> Cc: openstack-discuss at lists.openstack.org > >> Subject: Re: [openstack-ansible] [yoga] utility_container failure > >> > >> CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > >> > >> > >> Hello, > >> > >> Thank you very much for the reply! > >> > >> haproxy and keepalived both show status active on infra1 (my primary node). > >> > >> Curl shows "503 Service Unavailable No server is available to handle this request? > >> > >> (Also the URL is http not https?.) > >> > >> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0? > >> > >> Earlier in the output I find the following error (showing for all 3 infra nodes): > >> > >> ------------ > >> > >> TASK [systemd_mount : Set the state of the mount] ***************************************************************************************************************************************** > >> fatal: [infra3_repo_container-7ca5db88]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.022275", "end": "2022-08-16 14:16:34.926861", "msg": "non-zero return code", "rc": 1, "start": "2022-08-16 14:16:34.904586", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []} > >> > >> ?????? > >> > >> Running "systemctl status var-www-repo.mount? gives an output of ?Unit var-www-repo.mount could not be found." > >> > >> Thank you again! 
> >> > >> Father Vlasie > >> > >>> On Aug 16, 2022, at 6:32 AM, James Denton wrote: > >>> > >>> Hello, > >>> > >>> That error means the repo server at 192.168.3.9:8181 is unavailable. The repo server sits behind haproxy, which should be listening on 192.168.3.9 port 8181 on the active (primary) node. You can verify this by issuing a ?curl -vhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2F192.168.3.9%3A8181%2F%25E2%2580%2599&data=05%7C01%7Cjames.denton%40rackspace.com%7Cf947b56a0f9249d40cec08da80bfca26%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637963858891500677%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=%2BK4QPjACSUzlGKvXGmijRhDooI5CjuEvyIo%2BPJIM3is%3D&reserved=0. You might check the haproxy service status and/or keepalived status to ensure they are operating properly. If the IP cannot be bound to the correct interface, keepalive may not start. > >>> > >>> James Denton > >>> Rackspace Private Cloud > >>> > >>> From: Father Vlasie > >>> Date: Tuesday, August 16, 2022 at 7:38 AM > >>> To: openstack-discuss at lists.openstack.org > >>> Subject: [openstack-ansible] [yoga] utility_container failure > >>> > >>> CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > >>> > >>> > >>> Hello everyone, > >>> > >>> I have happily progressed to the second step of running the playbooks, namely "openstack-ansible setup-infrastructure.yml" > >>> > >>> Everything looks good except for just one error which is mystifying me: > >>> > >>> ---------------- > >>> > >>> TASK [Get list of repo packages] ********************************************************************************************************************************************************** > >>> fatal: [infra1_utility_container-5ec32cb5]: FAILED! => {"changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: ", "redirected": false, "status": -1, "url": "https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2F192.168.3.9%3A8181%2Fconstraints%2Fupper_constraints_cached.txt&data=05%7C01%7Cjames.denton%40rackspace.com%7Cf947b56a0f9249d40cec08da80bfca26%7C570057f473ef41c8bcbb08db2fc15c2b%7C0%7C0%7C637963858891500677%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=XT5cfYQY6iWcdK860squouNdjwSdubSD%2FdzNbWOdHPY%3D&reserved=0"} > >>> > >>> ---------------- > >>> > >>> 192.168.3.9 is the IP listed in user_variables.yml under haproxy_keepalived_internal_vip_cidr > >>> > >>> Any help or pointers would be very much appreciated! > >>> > >>> Thank you, > >>> > >>> Father Vlasie > >>> > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From venkat.reddybe at gmail.com Fri Aug 19 05:33:44 2022 From: venkat.reddybe at gmail.com (Venkat Reddy) Date: Fri, 19 Aug 2022 11:03:44 +0530 Subject: Floating IP Message-ID: Hi All Goodmorning, Please help me ,I have the requirement to assign floating IP automatically while creating an instance. Can please help me. -- Thanks & Regards, Venkat Reddy.S -------------- next part -------------- An HTML attachment was scrubbed... URL: From rishat.azizov at gmail.com Sat Aug 20 14:18:30 2022 From: rishat.azizov at gmail.com (=?UTF-8?B?0KDQuNGI0LDRgiDQkNC30LjQt9C+0LI=?=) Date: Sat, 20 Aug 2022 20:18:30 +0600 Subject: [octavia] Help with fix barbican client in octavia when use trust-scoped token Message-ID: Hello! 
I have error with terminated https loadbalancer, it described here: https://storyboard.openstack.org/#!/story/2007619 Could you please help with fix for barbican client? Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Sat Aug 20 19:08:41 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Sat, 20 Aug 2022 21:08:41 +0200 Subject: [octavia] Help with fix barbican client in octavia when use trust-scoped token In-Reply-To: References: Message-ID: Hey, It's not barbican client and issue, but how Octavia does create token out of application credentials. We've also catched that issue and tried to solve it from keystone side [1], but seems that code refactoring is required. While proposed workaround for keystone kind of works, I guess it might cause more serious security concerns, as basically creating token from application credentials token seems to be never supported by keystone. [1] https://bugs.launchpad.net/keystone/+bug/1959674 ??, 20 ???. 2022 ?., 20:29 ????? ?????? : > Hello! > > I have error with terminated https loadbalancer, it described here: > https://storyboard.openstack.org/#!/story/2007619 > > Could you please help with fix for barbican client? > Thank you. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From berndbausch at gmail.com Sun Aug 21 02:56:26 2022 From: berndbausch at gmail.com (Bernd Bausch) Date: Sun, 21 Aug 2022 11:56:26 +0900 Subject: Floating IP In-Reply-To: References: Message-ID: It's easily scripted. /openstack server create/ /myserver ... /followed by /openstack server add floating ip//myserver 192.168.1.123/. If this doesn't help you, consider explaining what your problem is. On 2022/08/19 2:33 PM, Venkat Reddy wrote: > Hi All Goodmorning, > > Please help me ,I have the requirement to assign?floating IP > automatically?while creating an instance. Can please help me. > > -- > Thanks & Regards, > > Venkat Reddy.S -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Sun Aug 21 12:43:24 2022 From: satish.txt at gmail.com (Satish Patel) Date: Sun, 21 Aug 2022 08:43:24 -0400 Subject: [openstack-ansible] Reset Target Host In-Reply-To: <5A2D4C60-D5AD-429C-8842-FABE0F031F2F@spots.edu> References: <5A2D4C60-D5AD-429C-8842-FABE0F031F2F@spots.edu> Message-ID: <614291AB-5F81-47EC-B340-0600DBC9C083@gmail.com> Hi, I would prefer reinstall to make everything clean and fresh. If it?s production the highly recommend reinstall. But sure you can just destroy containers and /var/lib/lxc and /openstack directory should be enough to restart deployment. There is a openstack-ansible-ops repo which has script to cleanup deployment also. Sent from my iPhone > On Aug 20, 2022, at 2:03 PM, Father Vlasie wrote: > > ?Hello everyone, > > Is there a way to reset a target host with OpenStack-Ansible? What I mean is to remove all the changes that have been applied, say, to infra1 so that the scripts can be deployed afresh. > > (It seems that the only way to get back to square one is to do a OS reinstall?is that right?) > > I am trying to avoid a reinstall because I am deploying from a remote location. 
> > Thank you, > > Father Vlasie From rishat.azizov at gmail.com Sat Aug 20 19:51:44 2022 From: rishat.azizov at gmail.com (=?utf-8?B?0KDQuNGI0LDRgiDQkNC30LjQt9C+0LI=?=) Date: Sun, 21 Aug 2022 01:51:44 +0600 Subject: [octavia] Help with fix barbican client in octavia when use trust-scoped token In-Reply-To: References: Message-ID: Hello, Dmitry, Thanks for answer, I think this issue with Barbican client inside Octavia, and it can be fixed in the same way as the problem with the neutron client was fixed by Ligxian Kong https://review.opendev.org/c/openstack/octavia/+/726042/ Or not? -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Sun Aug 21 20:17:11 2022 From: johnsomor at gmail.com (Michael Johnson) Date: Sun, 21 Aug 2022 13:17:11 -0700 Subject: [octavia] Help with fix barbican client in octavia when use trust-scoped token In-Reply-To: References: Message-ID: Hi, Can you please attach your traceback to the story? Michael On Sat, Aug 20, 2022 at 12:24 PM Dmitriy Rabotyagov wrote: > > Hey, > > It's not barbican client and issue, but how Octavia does create token out of application credentials. > > We've also catched that issue and tried to solve it from keystone side [1], but seems that code refactoring is required. > While proposed workaround for keystone kind of works, I guess it might cause more serious security concerns, as basically creating token from application credentials token seems to be never supported by keystone. > > [1] https://bugs.launchpad.net/keystone/+bug/1959674 > > > ??, 20 ???. 2022 ?., 20:29 ????? ?????? : >> >> Hello! >> >> I have error with terminated https loadbalancer, it described here: https://storyboard.openstack.org/#!/story/2007619 >> >> Could you please help with fix for barbican client? >> Thank you. From katonalala at gmail.com Mon Aug 22 07:28:17 2022 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 22 Aug 2022 09:28:17 +0200 Subject: Floating IP In-Reply-To: References: Message-ID: Hi, Perhaps you can use Heat and hot templates: https://docs.openstack.org/heat/latest/template_guide/basic_resources.html#create-and-associate-a-floating-ip-to-an-instance Lajos Katona (lajoskatona) Venkat Reddy ezt ?rta (id?pont: 2022. aug. 20., Szo, 20:41): > Hi All Goodmorning, > > Please help me ,I have the requirement to assign floating IP > automatically while creating an instance. Can please help me. > > -- > Thanks & Regards, > > Venkat Reddy.S > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ginton.4k at gmail.com Mon Aug 22 08:43:03 2022 From: ginton.4k at gmail.com (ginton 4k) Date: Mon, 22 Aug 2022 15:43:03 +0700 Subject: [OVS] Can't access instances on a compute and after restarted ovs-vswitchd it's back to normal Message-ID: *Bug Description* We are using openstack ussuri with openvswitch version 2.13.3 and ovn version 20.03.2. We are using ubuntu 18.04.5 LTS with kernel 4.15.0-122-generic. My question is sometimes and consistently all instances in one compute can't access and when hard reboot instance we got an error message: Unable to add port tapxxx to OVS bridge br-int. After restart ovs-vswitchd service all instances can be accessed. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From katonalala at gmail.com Mon Aug 22 09:39:15 2022 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 22 Aug 2022 11:39:15 +0200 Subject: [neutron] Bug deputy report for week of August 15 Message-ID: Hi Neutron Team I was the bug deputy in neutron last week, please check my summary. Needs attention ================= * OVN Metadata set up very slow (https://bugs.launchpad.net/neutron/+bug/1987060) * [stable/stein][stable/rocky}[stable/queens] CI jobs failing (https://bugs.launchpad.net/neutron/+bug/1986682 ) For stein I pushed 2 patches, not sure if with minimal effort it can be fixed: * https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/853794 * https://review.opendev.org/c/openstack/neutron/+/853608 In Progress ================= * [ovn-octavia-provider] HealthMonitor affecting several LBs (https://bugs.launchpad.net/neutron/+bug/1986977 ) In Progress ( https://review.opendev.org/c/openstack/ovn-octavia-provider/+/853681) * ovn: db sync fails if all ips are allocated in the subnet and metadata port is missing an ip (https://bugs.launchpad.net/neutron/+bug/1987135) In Progress: https://review.opendev.org/c/openstack/neutron/+/853840 Documentation bug ================= * Manually assign --device and --device-owner to a port does NOT binds the port immediately (https://bugs.launchpad.net/neutron/+bug/1986969 ) RFE proposals ================= * [rfe][fwaas]support standard_attrs for firewall_group (https://bugs.launchpad.net/neutron/+bug/1986906 ) we already discussed and approved this RFE on Drivers meeting * [RFE] Add a port extension to set/define the switchdev capabilities (https://bugs.launchpad.net/neutron/+bug/1987093 ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Aug 22 10:19:21 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 22 Aug 2022 11:19:21 +0100 Subject: Floating IP In-Reply-To: References: Message-ID: <6b926e4230f20a6ecbb33d6fcae968392c7a30d7.camel@redhat.com> On Mon, 2022-08-22 at 09:28 +0200, Lajos Katona wrote: > Hi, > Perhaps you can use Heat and hot templates: > https://docs.openstack.org/heat/latest/template_guide/basic_resources.html#create-and-associate-a-floating-ip-to-an-instance heat would be one way. nova does not support creating an instance and automatically assigining a floating ip atomicly its also not something we plan to add in the future. if creating the server and addign the floating ip is not somethign that works for you you can do it the other way create the neutron port and add the floating ip to it then boot the vm with that port. that way the first time the vms boots it will already have the floating ip assgined to it but tis still going to require multiple steps. > > Lajos Katona (lajoskatona) > > Venkat Reddy ezt ?rta (id?pont: 2022. aug. 20., > Szo, 20:41): > > > Hi All Goodmorning, > > > > Please help me ,I have the requirement to assign floating IP > > automatically while creating an instance. Can please help me. > > > > -- > > Thanks & Regards, > > > > Venkat Reddy.S > > From p.aminian.server at gmail.com Mon Aug 22 10:34:50 2022 From: p.aminian.server at gmail.com (Parsa Aminian) Date: Mon, 22 Aug 2022 15:04:50 +0430 Subject: docker.repo file In-Reply-To: <2b808d03-759a-48dc-a03b-cbf3570d11fe@www.fastmail.com> References: <2b808d03-759a-48dc-a03b-cbf3570d11fe@www.fastmail.com> Message-ID: there is no option on /etc/kolla/globals.yml file for enable_docker_repo . 
please tell me how can i enable this flag On Mon, Aug 15, 2022 at 11:51 PM Clark Boylan wrote: > On Sun, Aug 14, 2022, at 11:33 PM, Faezeh Salali wrote: > > Hi > > on Kolla ansible victoria version in bootstrap command this error is > > displayed: > > failed to fetch key at https://download.docker.com/linux/rocky/gpg , > > error was: HTTP Error 404: Not Found > > my compute os is rocky Linux 8 and it seems there is some problem with > > the docker repo on compute and /etc/yum.repos.d/docker.repo directory. > > Please send me the link of the baseurl and gpgkey. > > Thank you. > > There is no Rocky Linux dir at https://download.docker.com/linux/. > Instead I suspect that you will want to use either the CentOS or the RHEL > packages from that location. > > > https://opendev.org/openstack/kolla-ansible/src/branch/master/doc/source/reference/deployment-and-bootstrapping/bootstrap-servers.rst#bootstrap-servers-docker-package-repos > also indicates that if you don't set enable_docker_repo then Kolla will use > the distro provided packages. That would install docker using the packages > shipped by Rocky. > > Clark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Mon Aug 22 10:53:23 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 22 Aug 2022 12:53:23 +0200 Subject: docker.repo file In-Reply-To: References: Message-ID: Support for Rocky Linux 8 was only added in Xena. See the support matrix for Victoria, Rocky Linux is not listed: https://docs.openstack.org/kolla-ansible/victoria/user/support-matrix.html You may be able to install Docker by setting docker_yum_url to " https://download.docker.com/linux/centos", but some other configuration may break. On Mon, 15 Aug 2022 at 15:30, Faezeh Salali wrote: > Hi > on Kolla ansible victoria version in bootstrap command this error is > displayed: > failed to fetch key at https://download.docker.com/linux/rocky/gpg , > error was: HTTP Error 404: Not Found > my compute os is rocky Linux 8 and it seems there is some problem with the > docker repo on compute and /etc/yum.repos.d/docker.repo directory. > Please send me the link of the baseurl and gpgkey. > Thank you. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkchn.in at gmail.com Mon Aug 22 11:11:51 2022 From: kkchn.in at gmail.com (KK CHN) Date: Mon, 22 Aug 2022 16:41:51 +0530 Subject: Kolla-ansible Xena installation Error Message-ID: I am following the documentation https://docs.openstack.org/project-deploy-guide/kolla-ansible/xena/quickstart.html#host-machine-requirements on VM with Debian 10 distro But I am getting the errors as follows . Here is my /etc/hosts whats wrong here ?? ############################################## (venv) cloud at Xena:~$ cat /etc/hosts 127.0.0.1 localhost #127.0.1.1 Xena #10.184.48.94 localhost Xena # The following lines are desirable for IPv6 capable hosts ::1 localhost ip6-localhost ip6-loopback ff02::1 ip6-allnodes ff02::2 ip6-allrouters # BEGIN ANSIBLE GENERATED HOSTS 10.184.48.94 Xena # END ANSIBLE GENERATED HOSTS (venv) cloud at Xena:~$ ################################################### ERROR fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1} I am trying this installation on a Virtual Machine on Debian 10 OS. 
(venv) cloud at Xena:~$ kolla-ansible -i ./all-in-one bootstrap-servers Bootstrapping servers : ansible-playbook -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=bootstrap-servers /home/cloud/venv/share/kolla-ansible/ansible/kolla-host.yml --inventory ./all-in-one [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details PLAY [Gather facts for all hosts] *************************************************************************************************************************************** TASK [Gather facts] ***************************************************************************************************************************************************** ok: [localhost] TASK [Group hosts to determine when using --limit] ********************************************************************************************************************** ok: [localhost] [WARNING]: Could not match supplied host pattern, ignoring: all_using_limit_True PLAY [Gather facts for all hosts (if using --limit)] ******************************************************************************************************************** skipping: no hosts matched PLAY [Apply role baremetal] ********************************************************************************************************************************************* TASK [baremetal : include_tasks] **************************************************************************************************************************************** included: /home/cloud/venv/share/kolla-ansible/ansible/roles/baremetal/tasks/bootstrap-servers.yml for localhost TASK [baremetal : Ensure localhost in /etc/hosts] *********************************************************************************************************************** fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1} PLAY RECAP ************************************************************************************************************************************************************** localhost : ok=3 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0 Command failed ansible-playbook -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=bootstrap-servers /home/cloud/venv/share/kolla-ansible/ansible/kolla-host.yml --inventory ./all-in-one (venv) cloud at Xena:~$ Any hints welcome. Thank you all krish -------------- next part -------------- An HTML attachment was scrubbed... URL: From iurygregory at gmail.com Mon Aug 22 11:19:09 2022 From: iurygregory at gmail.com (Iury Gregory) Date: Mon, 22 Aug 2022 08:19:09 -0300 Subject: [ironic] Ilya Etingof Passed Away Message-ID: With a heavy heart, I am sorry to announce that after a long struggle with an illness, Ilya Etingof (etingof on IRC) passed away on Wednesday, 10th August, 2022. He is survived by his mother and two daughters. Born in Russia, Ilya had moved to Czechia and in 2015 joined Red Hat. In 2017, Ilya started contributing to the Ironic project. He was the lead contributor to the Sushy project, Sushy Redfish emulator, an important contributor to Ironic and worked tirelessly to make hardware management reliable and consistent. His important contributions included boot management, out-of-band hardware inspection and virtual media boot. 
Outside of the OpenStack community, he has maintained Python packages with a particular focus on SNMP. For example: pysnmp, pysmi, pyasn1, softboxen and many others. Ilya was dedicated to helping younger and under-represented people. He used to run a virtualisation technologies course at the Masaryk University in Brno where he spoke about OpenStack and Kubernetes. He was also an active participant in the Outreachy program where he mentored those who contributed to Open Source projects. Goodbye Ilya from the Ironic community, may you rest in peace. -- *Att[]'s* *Iury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Ironic PTL * *Senior Software Engineer at Red Hat Brazil* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliver.weinmann at me.com Mon Aug 22 11:31:24 2022 From: oliver.weinmann at me.com (Oliver Weinmann) Date: Mon, 22 Aug 2022 13:31:24 +0200 Subject: Kolla-ansible Xena installation Error In-Reply-To: References: Message-ID: Hi, Looks like the user ?cloud? you use doesn?t have root permissions. Or at least you need to specify its password in the all-in-one yaml file. Cheers, Oliver Von meinem iPhone gesendet > Am 22.08.2022 um 13:18 schrieb KK CHN : > > ? > > > I am following the documentation > > https://docs.openstack.org/project-deploy-guide/kolla-ansible/xena/quickstart.html#host-machine-requirements > on VM with Debian 10 distro > > > But I am getting the errors as follows . Here is my /etc/hosts whats wrong here ?? > > ############################################## > (venv) cloud at Xena:~$ cat /etc/hosts > 127.0.0.1 localhost > #127.0.1.1 Xena > > #10.184.48.94 localhost Xena > # The following lines are desirable for IPv6 capable hosts > ::1 localhost ip6-localhost ip6-loopback > ff02::1 ip6-allnodes > ff02::2 ip6-allrouters > > # BEGIN ANSIBLE GENERATED HOSTS > 10.184.48.94 Xena > # END ANSIBLE GENERATED HOSTS > (venv) cloud at Xena:~$ > > ################################################### > > ERROR fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1} > > I am trying this installation on a Virtual Machine on Debian 10 OS. 
> > > (venv) cloud at Xena:~$ kolla-ansible -i ./all-in-one bootstrap-servers > Bootstrapping servers : ansible-playbook -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=bootstrap-servers /home/cloud/venv/share/kolla-ansible/ansible/kolla-host.yml --inventory ./all-in-one > [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details > > PLAY [Gather facts for all hosts] *************************************************************************************************************************************** > > TASK [Gather facts] ***************************************************************************************************************************************************** > ok: [localhost] > > TASK [Group hosts to determine when using --limit] ********************************************************************************************************************** > ok: [localhost] > [WARNING]: Could not match supplied host pattern, ignoring: all_using_limit_True > > PLAY [Gather facts for all hosts (if using --limit)] ******************************************************************************************************************** > skipping: no hosts matched > > PLAY [Apply role baremetal] ********************************************************************************************************************************************* > > TASK [baremetal : include_tasks] **************************************************************************************************************************************** > included: /home/cloud/venv/share/kolla-ansible/ansible/roles/baremetal/tasks/bootstrap-servers.yml for localhost > > TASK [baremetal : Ensure localhost in /etc/hosts] *********************************************************************************************************************** > fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1} > > PLAY RECAP ************************************************************************************************************************************************************** > localhost : ok=3 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0 > > Command failed ansible-playbook -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=bootstrap-servers /home/cloud/venv/share/kolla-ansible/ansible/kolla-host.yml --inventory ./all-in-one > (venv) cloud at Xena:~$ > > > Any hints welcome. > > Thank you all > krish -------------- next part -------------- An HTML attachment was scrubbed... URL: From fereshtehloghmani at gmail.com Mon Aug 22 12:00:29 2022 From: fereshtehloghmani at gmail.com (fereshteh loghmani) Date: Mon, 22 Aug 2022 16:30:29 +0430 Subject: evacuate vm Message-ID: hello i use kolla ansiable with wallaby version. when i evacuate server some of my vm with centos OS have error about xfs and for solve this issue i change server to rescue mode and from rescue i use command xfs_repair -l and my server will be ok and boot successfully. could you help me about this issue that why some of my vm had xfs error ? -------------- next part -------------- An HTML attachment was scrubbed... 
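On the evacuate/XFS question above: an XFS root filesystem usually needs log recovery after an evacuation, because the instance is cut off mid-write when its hypervisor dies; when that recovery fails at boot, repairing from rescue mode is the usual way out, as described. A minimal sketch of that workflow (the server name and device are placeholders; xfs_repair must be run against an unmounted filesystem, and -L, which discards a dirty log, is the flag typically needed when the log itself is corrupt):

openstack server rescue <server>     # boots the instance from a rescue image, with the original disk attached as a secondary device
# inside the rescue system, assuming the old root filesystem shows up as /dev/vdb1:
xfs_repair /dev/vdb1                 # try without flags first
xfs_repair -L /dev/vdb1              # only if log recovery is impossible
openstack server unrescue <server>

If this happens to many instances, it is worth confirming that the source host was really fenced or powered off before the evacuation, since two hosts writing to the same disk corrupts far more than the XFS log.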
URL: From venkat.reddybe at gmail.com Mon Aug 22 11:17:56 2022 From: venkat.reddybe at gmail.com (Venkat Reddy) Date: Mon, 22 Aug 2022 16:47:56 +0530 Subject: Floating IP In-Reply-To: <6b926e4230f20a6ecbb33d6fcae968392c7a30d7.camel@redhat.com> References: <6b926e4230f20a6ecbb33d6fcae968392c7a30d7.camel@redhat.com> Message-ID: Hi Sean Mooney and Lajos Katona, Thanks a lot for your support. Actually, we are planning to create the VMs for my application using the API and a Python script. After creating a VM through the script we get a JSON response; from that response we get the instance ID, and through that ID we can find the VM's IPs. The application then connects to the VM over that public IP and does some processing. That is the overall idea. The application creates many VMs. Thanks & Regards, Venkat Reddy Sangam. On Mon, Aug 22, 2022 at 3:49 PM Sean Mooney <smooney at redhat.com> wrote: > On Mon, 2022-08-22 at 09:28 +0200, Lajos Katona wrote: > > Hi, > > Perhaps you can use Heat and hot templates: > > > https://docs.openstack.org/heat/latest/template_guide/basic_resources.html#create-and-associate-a-floating-ip-to-an-instance > > heat would be one way. > nova does not support creating an instance and automatically assigining a > floating ip atomicly its also not something we plan to add in the future. > > if creating the server and addign the floating ip is not somethign that > works for you you can do it the other way > > create the neutron port and add the floating ip to it then boot the vm > with that port. > that way the first time the vms boots it will already have the floating ip > assgined to it but tis still going to require > multiple steps. > > > > > Lajos Katona (lajoskatona) > > > > Venkat Reddy ezt írta (időpont: 2022. aug. > 20., > > Szo, 20:41): > > > > > Hi All Goodmorning, > > > > > > Please help me ,I have the requirement to assign floating IP > > > automatically while creating an instance. Can please help me. > > > > > > -- > > > Thanks & Regards, > > > > > > Venkat Reddy.S > > > > > -- Thanks & Regards, Venkat Reddy.S -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsb4000 at yandex.ru Mon Aug 22 12:53:56 2022 From: fsb4000 at yandex.ru (Igor Zhukov) Date: Mon, 22 Aug 2022 19:53:56 +0700 Subject: [Neutron] How to add Fake ML2 extension to Neutron? Message-ID: <5523221661172836@myt6-bbc622793f1b.qloud-c.yandex.net> Hi all! Sorry for a complete noob question, but I can't figure it out. So if I want to add the Fake ML2 extension, what should I do? I have a neutron server installed and I have the file: https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/extensions/fake_extension.py How do I configure the neutron server, where should I put the file, and should I create other files? How can I test that it works? From katonalala at gmail.com Mon Aug 22 18:31:38 2022 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 22 Aug 2022 20:31:38 +0200 Subject: [Neutron] How to add Fake ML2 extension to Neutron? In-Reply-To: <5523221661172836@myt6-bbc622793f1b.qloud-c.yandex.net> References: <5523221661172836@myt6-bbc622793f1b.qloud-c.yandex.net> Message-ID: Hi, The fake_extension is used only in unit tests to test the extension framework, i.e.
: https://opendev.org/openstack/neutron/src/branch/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L37 If you would like to write an API extension check neutron-lib/api/definitions/ (and you can find the extensions "counterpart" under neutron/extensions in neutron repository) You can also check other Networking projects like networking-bgvpn, neutron-dynamic-routing to have examples of API extensions. If you have an extension under neutron/extensions and there's somebody who uses it (see [1]) you will see it is loaded in neutron servers logs (something like this: "Loaded extension: address-group") and you can find it in the output of openstack extension list --network [1]: https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200 Best wishes Lajos Katona Igor Zhukov ezt ?rta (id?pont: 2022. aug. 22., H, 19:41): > Hi all! > Sorry for a complete noob question but I can't figure it out ? > So if I want to add Fake ML2 extension what should I do? > I have neutron server installed and I have the file: > https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/extensions/fake_extension.py > How to configure neutron server, where should I put the file, should I > create another files? How can I test that it works? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fv at spots.edu Mon Aug 22 19:47:02 2022 From: fv at spots.edu (Father Vlasie) Date: Mon, 22 Aug 2022 12:47:02 -0700 Subject: [openstack-ansible] [yoga] utility_container failure In-Reply-To: References: <9FEF486C-780F-46B2-B9A4-5DEFC215A139@spots.edu> <5C79D786-B1AF-425B-9BEE-D683FA94A907@spots.edu> <8681B9FD-D061-4D97-A46D-91FFC33AFE96@spots.edu> Message-ID: Hello, I found the problem! I had incorrect netmasks in my CIDRs. The containers were getting IPs that were not accessible from the nodes as a result. Thank you! > On Aug 17, 2022, at 7:38 PM, James Denton > wrote: > > Hello, > > > Strangely I get "ssh: connect to host infra1_repo_container-20deb465 port 22: No route to host" > > This could mean the hosts file doesn?t have an entry. It looks like the Ansible inventory has corresponding entries, so you?re probably fine there. From ?infra1?, you can try ?lxc-attach -n infra1_repo_container-20deb465? to attach to the container directly, and run those same commands mentioned earlier. For the infra2 container, you?ll want to connect from infra2 with ?lxc-attach?. > > Can you confirm if this is Yoga or Master? Also, are you running w/ Rocky Linux 8.6 (as a previous thread indicates)? TBH I have not tested that, yet, and am not sure of the gotchas. > > James Denton > Rackspace Private Cloud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fv at spots.edu Mon Aug 22 20:08:48 2022 From: fv at spots.edu (Father Vlasie) Date: Mon, 22 Aug 2022 13:08:48 -0700 Subject: [openstack-ansible][ceph][yoga] wait for all osd to be up Message-ID: <7275FB31-C4F3-4356-8C66-87F3757637BF@spots.edu> Hello everyone, I am running setup-infrastucture.yml. I have followed the ceph production example here: https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html I have set things up so the compute and storage nodes are the same machine (hyperconverged). And the storage devices are devoid of any volumes or partitions. I see the following error: ------ FAILED - RETRYING: [compute3 -> infra1_ceph-mon_container-0d679d8d]: wait for all osd to be up (1 retries left). 
fatal: [compute3 -> infra1_ceph-mon_container-0d679d8d(192.168.3.145)]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["ceph", "--cluster", "ceph", "osd", "stat", "-f", "json"], "delta": "0:00:00.223291", "end": "2022-08-22 19:36:29.473358", "msg": "", "rc": 0, "start": "2022-08-22 19:36:29.250067", "stderr": "", "stderr_lines": [], "stdout": "\n{\"epoch\":6,\"num_osds\":0,\"num_up_osds\":0,\"osd_up_since\":0,\"num_in_osds\":0,\"osd_in_since\":0,\"num_remapped_pgs\":0}", "stdout_lines": ["", "{\"epoch\":6,\"num_osds\":0,\"num_up_osds\":0,\"osd_up_since\":0,\"num_in_osds\":0,\"osd_in_since\":0,\"num_remapped_pgs\":0}"]} ------ I am not sure where to look to find more information. Any help would be much appreciated!
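num_osds:0 in that output means the mon side is reachable and simply no OSD was ever created on the storage hosts, so the place to look is the OSD hosts rather than the mon container. A rough set of checks, plus the variables ceph-ansible normally needs before it will create OSDs (variable names are from ceph-ansible and are worth double-checking against the version pulled in by this openstack-ansible release):

# on each OSD (compute/storage) host:
lsblk                                      # are the intended disks really empty: no partitions, LVM or old signatures?
sudo ceph-volume lvm list                  # did ceph-volume prepare anything at all?
sudo systemctl list-units 'ceph-osd@*'     # are any OSD services defined and running?

# in /etc/openstack_deploy/user_variables.yml, as an example; adjust the disks to the hardware:
devices:
  - /dev/sdb
  - /dev/sdc
# or, instead of listing disks explicitly:
osd_auto_discovery: true

Without one of those two settings the OSD playbooks have nothing to deploy, so the cluster comes up with zero OSDs even though the mons are healthy.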
Thanks for your time and input, Christian From fv at spots.edu Mon Aug 22 23:53:04 2022 From: fv at spots.edu (Father Vlasie) Date: Mon, 22 Aug 2022 16:53:04 -0700 Subject: [openstack-ansible][ceph][yoga] wait for all osd to be up In-Reply-To: <7275FB31-C4F3-4356-8C66-87F3757637BF@spots.edu> References: <7275FB31-C4F3-4356-8C66-87F3757637BF@spots.edu> Message-ID: <6873C352-1364-45AD-9346-E87F1DAF177D@spots.edu> I have done a bit more searching?the error is related to the _reporting_ on the OSDs. I tried to get some info from journalctl while the infrasrtucture playbook was running and all I could see was this: Aug 22 22:11:31 compute3 python3[57496]: ansible-ceph_volume Invoked with cluster=ceph action=list objectstore=bluestore dmcrypt=False batch_devices=[] osds_per_device=1 journal_size=5120 journal_devices=[] block_db_size=-1 block_db_devices=[] wal_devices=[] report=False destroy=True data=None data_vg=None journal=None journal_vg=None db=None db_vg=None wal=None wal_vg=None crush_device_class=None osd_fsid=None osd_id=None Aug 22 22:12:01 compute3 audit[57503]: USER_ACCT pid=57503 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:accounting grantors=pam_permit acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success' Aug 22 22:12:01 compute3 audit[57503]: CRED_ACQ pid=57503 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_permit,pam_cap acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success' Aug 22 22:12:01 compute3 audit[57503]: SYSCALL arch=c000003e syscall=1 success=yes exit=1 a0=7 a1=7ffe656d1100 a2=1 a3=7fe9c3d53371 items=0 ppid=1725 pid=57503 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=1445 comm="cron" exe="/usr/sbin/cron" key=(null) Aug 22 22:12:01 compute3 audit: PROCTITLE proctitle=2F7573722F7362696E2F43524F4E002D66 Aug 22 22:12:01 compute3 CRON[57503]: pam_unix(cron:session): session opened for user root by (uid=0) The only thing that stands out to me is that there are no devices listed but in all of the openstack-ansible ceph documentation devices are never mentioned so I assume they are being detected automatically, is that right? Thank you, FV > On Aug 22, 2022, at 1:08 PM, Father Vlasie wrote: > > > Hello everyone, > > I am running setup-infrastucture.yml. I have followed the ceph production example here: https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html > > I have set things up so the compute and storage nodes are the same machine (hyperconverged). And the storage devices are devoid of any volumes or partitions. > > I see the following error: > > ------ > > FAILED - RETRYING: [compute3 -> infra1_ceph-mon_container-0d679d8d]: wait for all osd to be up (1 retries left). > fatal: [compute3 -> infra1_ceph-mon_container-0d679d8d(192.168.3.145)]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["ceph", "--cluster", "ceph", "osd", "stat", "-f", "json"], "delta": "0:00:00.223291", "end": "2022-08-22 19:36:29.473358", "msg": "", "rc": 0, "start": "2022-08-22 19:36:29.250067", "stderr": "", "stderr_lines": [], "stdout": "\n{\"epoch\":6,\"num_osds\":0,\"num_up_osds\":0,\"osd_up_since\":0,\"num_in_osds\":0,\"osd_in_since\":0,\"num_remapped_pgs\":0}", "stdout_lines": ["", "{\"epoch\":6,\"num_osds\":0,\"num_up_osds\":0,\"osd_up_since\":0,\"num_in_osds\":0,\"osd_in_since\":0,\"num_remapped_pgs\":0}?]} > > ------ > > I am not sure where to look to find more information. Any help would be much appreciated! 
> > Thank you, > > FV From ildiko.vancsa at gmail.com Tue Aug 23 02:56:56 2022 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Mon, 22 Aug 2022 19:56:56 -0700 Subject: [edge] OpenStack updates for an Akraino presentation Message-ID: <42610260-4CA9-4DCB-AECB-F5DC38EE63B6@gmail.com> Hi OpenStack community, I?m reaching out to you as I would like to ask for a little help and pointers for a presentation I?m building about OpenStack. I was recently asked by the Akraino community to give a presentation about recently added features and enhancements and further updates in OpenStack. Akraino is a project that is primarily focused on edge computing use cases and building blocks of end-to-end edge infrastructures. To make my presentation relevant and most informative for them, I would like to highlight any recent development or ongoing work item that is, or can be, relevant for edge use cases. If there are features in your project that you think would be good to highlight or if your project team is working on relevant work items in the current release cycle, I would really appreciate if you could share pointers to any information about them! Thanks and Best Regards, Ildik? ??? Ildik? V?ncsa Senior Manager, Community & Ecosystem Open Infrastructure Foundation From gagehugo at gmail.com Tue Aug 23 03:01:22 2022 From: gagehugo at gmail.com (Gage Hugo) Date: Mon, 22 Aug 2022 22:01:22 -0500 Subject: [openstack-helm] No Meeting Tomorrow Message-ID: Hey team, Since there is nothing on the agenda for tomorrow's meeting, it has been cancelled. We will plan on meeting next week at the usual time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Aug 23 05:47:21 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 23 Aug 2022 07:47:21 +0200 Subject: [neutron] Switching the ML2 driver in-place from linuxbridge to OVN for an existing Cloud In-Reply-To: References: Message-ID: <2446920.D5JjJbiaP6@p1> Hi, Dnia poniedzia?ek, 22 sierpnia 2022 22:32:35 CEST Christian Rohmann pisze: > Hello openstack-discuss and neutron experts, > > I'd like to ask for your input and discussion on the idea of changing > the ML2 driver for an existing cloud, read: changing a tire while still > riding the bike. > > I actually like to find out if it's feasible to switch from the trusted > Linuxbridge driver to the more modern SDN stack of OVN - in place. > With all existing user networks, subnets, security groups and (running) > instances in the database already. And while I know there a push of > migrating > from OVS to OVN and a clear migration path documented > (https://docs.openstack.org/neutron/latest/ovn/migration.html), but they > are much more similar in their data plane. > > And to get this out of the way: I am not asking to do be able to do this > without downtime, interruptions, migrations or any active orchestration > of the process. > I just want to know all the possible options apart from setting up a new > cloud and asking folks to migrate all of their things over... > > 1) Are the data models of the user managed resources abstract (enough) > from the ML2 used? > So would the composition of a router, a network, some subnets, a few > security group and a few instances in a project just result in a > different instantiation of packet handling components, > but be otherwise transparent to the user? Yes, data models are the same so all networks, routers, subnets will be the same but implemented differently by different backend. 
The only significant difference may be network types as OVN works mostly with Geneve tunnel networks and with LB backend You are using VXLAN IIUC your email. > > 2) What could be possible migration strategies? > > While it might be a little early to think about the actual migration > steps. But just to consider more than a full cloud shutdown following a > cold start with modified neutron config then using the OVN ML2. > I know there is more than just the network and getting a virtual layer 2 > network. There are DHCP, DNS and the metadata service and last but not > least the gateway / router and the security groups. > But if using VXLAN for OVN (as Linuxbridge uses currently) could the > shift from one implementation to the other potentially be done node by > node? Or project by project by changing the network agents over > to nodes already running OVN? Even if You will keep vxlan networks with OVN backend (support is kind of limited really) You will not be able to have tunnels established between nodes with different backends so there will be no connectivity between VMs on hosts with different backends. > > If not, is there any other hot cut-over approach that would avoid having > to shutdown all the instances, but only cause them some network downtime > (until the next DHCP renew or similar?) TBH I don't know about any way to do that. We never tried and tested migration from LB to OVN backend. The only currently supported migration is from ML2/OVS to ML2/OVN backend and it depends on the Tripleo framework. > > > Has anybody ever done something similar or heard about this being done > anywhere? I don't know about anyone who did that but if there is someone, I would be happy to hear about how it was done and how it went :) > > > > Thanks for your time and input, > > > Christian > > > > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From gilles.mocellin at nuagelibre.org Tue Aug 23 08:36:19 2022 From: gilles.mocellin at nuagelibre.org (Gilles Mocellin) Date: Tue, 23 Aug 2022 10:36:19 +0200 Subject: [glance][cinder] Glance cinder backend or image cache ? Message-ID: Hello ! We're planning a new cluster without Ceph storage. We will use an iSCSI pureStorage array for all storage except object storage. PureStorage provide only a cinder driver. So we will need to user boot on volume nova instances. My question is about glance. PureStorage speak about Image Cache since Liberty: https://support.purestorage.com/Solutions/OpenStack/z_Legacy_OpenStack_Reference/OpenStack%C2%AE_Liberty%3A_A_Look_at_the_Glance_Image-Cache_for_Cinder But I wonder if using the cinder backend of glance is a better, simpler, transparent option ? From fsb4000 at yandex.ru Tue Aug 23 06:43:56 2022 From: fsb4000 at yandex.ru (Igor Zhukov) Date: Tue, 23 Aug 2022 13:43:56 +0700 Subject: [Neutron] How to add Fake ML2 extension to Neutron? In-Reply-To: References: <5523221661172836@myt6-bbc622793f1b.qloud-c.yandex.net> Message-ID: <1407401661237036@myt6-4218ece6190d.qloud-c.yandex.net> Hi. Thank you for the answers! I didn't know about `openstack extension list --network` and now I saved the command Yes, I saw networking-bgvpn and other Neutron projects and the blog: http://control-that-vm.blogspot.com/2014/07/understanding-pre-requisites-of-ml2.html?view=sidebar but I need a simple example. 
So I want to understand how this test plugin works. So I need to add `extension_drivers = neutron.tests.unit.plugins.ml2.drivers.ext_test:TestExtensionDriver` to /etc/neutron/plugins/ml2/ml2_conf.ini if I want to try the extension driver, right? > Hi,The fake_extension is used only in unit tests to test the extension framework, i.e. : > https://opendev.org/openstack/neutron/src/branch/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L37 > > If you would like to write an API extension check neutron-lib/api/definitions/ (and you can find the extensions "counterpart" under neutron/extensions in neutron repository) > > You can also check other Networking projects like networking-bgvpn, neutron-dynamic-routing to have examples of API extensions. > If you have an extension under neutron/extensions and there's somebody who uses it (see [1]) you will see it is loaded in neutron servers logs (something like this: "Loaded extension: address-group") and you can find it in the output of openstack extension list --network > > [1]: https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200 > > Best wishes > Lajos Katona > > Igor Zhukov ezt ?rta (id?pont: 2022. aug. 22., H, 19:41): > >> Hi all! >> >> Sorry for a complete noob question but I can't figure it out ? >> >> So if I want to add Fake ML2 extension what should I do? >> >> I have neutron server installed and I have the file: https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/extensions/fake_extension.py >> >> How to configure neutron server, where should I put the file, should I create another files? How can I test that it works? From ruslanas at lpic.lt Tue Aug 23 13:55:24 2022 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 23 Aug 2022 16:55:24 +0300 Subject: [TripleO] Adding Undercloud and overcloud container images to Foreman Message-ID: Hi all, Have anyone tried adding a container images registry from quay.io to the foreman installation? So far what I managed to achieve is - adding each image one by one, not a registry itself. I add it over Content > Products > (Added Product docker-registries) > Repositories > New Repository > Name > Type (select docker) > Upstream URL: quay.io > Upstream repository (tripleoussuri or tripleowallaby or any other) fails, it works only if I specify full image: tripleoussuri/centos-binary-heat-engine So my question, is there any way to copy whole repo using Foreman? Maybe I just go in incorrect place? Thanks. Regards -- Ruslanas G?ibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsb4000 at yandex.ru Tue Aug 23 14:04:23 2022 From: fsb4000 at yandex.ru (Igor Zhukov) Date: Tue, 23 Aug 2022 21:04:23 +0700 Subject: [Neutron] How to add Fake ML2 extension to Neutron? In-Reply-To: References: <5523221661172836@myt6-bbc622793f1b.qloud-c.yandex.net> Message-ID: <2183551661263463@myt5-b646bde4b8f3.qloud-c.yandex.net> Hi again! Do you know how to debug ML2 extension drivers? 
I created folder with two python files: vpc/extensions/vpc.py and vpc/plugins/ml2/drivers/vpc.py (also empty __init__.py files) I added to neuron.conf api_extensions_path = /path/to/vpc/extensions and I added to ml2_ini.conf extension_drivers = port_security, vpc.plugins.ml2.drivers.vpc:VpcExtensionDriver and my neutron.server.log has: INFO neutron.plugins.ml2.managers [-] Configured extension driver names: ['port_security', 'vpc_neutron.plugins.ml2.drivers.vpc:VpcExtensionDriver'] WARNING stevedore.named [-] Could not load vpc_neutron.plugins.ml2.drivers.vpc:VpcExtensionDriver .... INFO neutron.api.extensions [req-fd226631-b0cd-4ff8-956b-9470e7f26ebe - - - - -] Extension vpc_extension not supported by any of loaded plugins How can I find why the extension driver could not be loaded? > Hi,The fake_extension is used only in unit tests to test the extension framework, i.e. : > https://opendev.org/openstack/neutron/src/branch/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L37 > > If you would like to write an API extension check neutron-lib/api/definitions/ (and you can find the extensions "counterpart" under neutron/extensions in neutron repository) > > You can also check other Networking projects like networking-bgvpn, neutron-dynamic-routing to have examples of API extensions. > If you have an extension under neutron/extensions and there's somebody who uses it (see [1]) you will see it is loaded in neutron servers logs (something like this: "Loaded extension: address-group") and you can find it in the output of openstack extension list --network > > [1]: https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200 > > Best wishes > Lajos Katona > > Igor Zhukov ezt ?rta (id?pont: 2022. aug. 22., H, 19:41): > >> Hi all! >> >> Sorry for a complete noob question but I can't figure it out ? >> >> So if I want to add Fake ML2 extension what should I do? >> >> I have neutron server installed and I have the file: https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/extensions/fake_extension.py >> >> How to configure neutron server, where should I put the file, should I create another files? How can I test that it works? From simon at purestorage.com Tue Aug 23 15:27:44 2022 From: simon at purestorage.com (Simon Dodsley) Date: Tue, 23 Aug 2022 11:27:44 -0400 Subject: [glance][cinder] Glance cinder backend or image cache ? Message-ID: > Hello ! > > We're planning a new cluster without Ceph storage. > We will use an iSCSI pureStorage array for all storage except object > storage. > > PureStorage provide only a cinder driver. > So we will need to user boot on volume nova instances. > > My question is about glance. > PureStorage speak about Image Cache since Liberty: > https://support.purestorage.com/Solutions/OpenStack/z_Legacy_OpenStack_Reference/OpenStack%C2%AE_Liberty%3A_A_Look_at_the_Glance_Image-Cache_for_Cinder > > But I wonder if using the cinder backend of glance is a better, simpler, > transparent option ? The Image Cache you referenced is not specific to Pure. It is a common Cinder feature that many backend can leverage. You can use both the image cache and cinder as a glance store as well but they are redundant. It is best to pick one or the other. It depends on the images format you are using. For example if you are using QCOW2 as the format the glance cinder backend cannot optimize it, but the image cache can optimize. Simon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From satish.txt at gmail.com Tue Aug 23 15:37:38 2022 From: satish.txt at gmail.com (Satish Patel) Date: Tue, 23 Aug 2022 11:37:38 -0400 Subject: ovn-bgp-agent installation issue In-Reply-To: References: <95C66BE4-2944-45C6-A3C4-EBFE1FECDD25@gmail.com> Message-ID: Hi Luis, /cc - openstack discuss mailing list Thank you so much for clearing my doubts. I have a counter question. BGP Mode: - In bgp mode where i should run ovn-bgp-agent ? Network node or Compute node? EVPN Mode: - In evpn mode where i should run ovn-bgp-agent? - In my lab i am using your heck "How to use it without BGPVPN" to create vni mapping, do you think because of that i am not seeing vni pushed out to FRR automatically? https://ltomasbo.wordpress.com/2021/06/25/openstack-networking-with-evpn/ - In your demo link I can see you have bridge two different deployment clouds using the same VNI, In this scenario we need a router (or vrf) somewhere correct to bridge to different subnet IPs, correct? or is this L2 stretch? I am going to open a new thread to discuss issues related to expose_tenant_networks=True flag issue. On Tue, Aug 23, 2022 at 3:14 AM Luis Tomas Bolivar wrote: > Hi Satish! See inline > > On Mon, Aug 22, 2022 at 5:22 PM Satish Patel wrote: > >> Hi Luis, >> >> Welcome back from your vacation, sorry i didn't know you are on vacation. >> Hope you had a wonderful time. >> >> How do i start conversion on opendev thread? is there a mailing list or >> are you saying open thread here - >> https://storyboard.openstack.org/#!/project/x/ovn-bgp-agent >> > > Actually, you did it right (sending it to openstack-discuss), but it seems > at some point we stop adding it on our replies... probably my fault due to > replying on the phone while on vacation. > > For the next issue we can open a different thread and I'll try not to drop > the openstack-discuss list! xD > > It would be great though if you can sent a follow up to the previous > thread with the final outcome (fixed by updating, or adding config X, ..), > so that community can see the project is alive, and other people facing > similar issues can try your solution/config. > >> I have an update for you, after upgrading the OVN/OVS version to the >> latest and that resolved lots of issues. As you mentioned earlier running >> the latest code is very important to get proper functionality. >> > > Awesome! Yeah, it is a new project, so functionality/fixes are being added > regularly, therefore using the latest version is the best approach. The > plan is to soon create a more stable version/release, once we have enough > testing coverage and main functionality covered > >> >> You are saying in BGP mode only FIP will get exposed but not the tenant >> VM ( What is this flag for expose_tenant_network=True ?) >> > > Main idea was: > - BGP mode for exposing VMs/LBs on provider networks, or with FIPs attached > - EVPN mode for tenant networks > > To give you some more context, we started with the BGP mode, and did the > expose_tenant_network to expose also tenant IPs. However, the BGP mode is > lacking an API to decide what to expose. So we moved to the EVPN mode (with > networking-bgpvpn as the API) to expose the tenant networks. > > That said, you are right, and the expose_tenant_network flag is intended > for the BGP mode, to expose all tenant networks (so you need to ensure no > overlapping CIDRs). And, if that is not working (it may be, as there is no > testing coverage for it), it is something we can definitely look at and > fix. 
So, feel free to open a new thread on openstack-discuss, or bug in > storyboard (whatever works for you better), and I'll try to fix asap. > > >> >> You are saying in EVPN mode only tenant VM ip will get exposed but not >> FIP but in this design what is the use of getting tenant VM exposed but you >> can't expose FIP then how does external people access VMs? ( I am totally >> confused in EVPN design and use case because your tenant VM is already >> talking to each other using geneve tunnel so why do we need EVPN/L2VPN to >> expose them?) >> > > EVPN mode is indeed only for tenant VMs (and loadbalancer). Idea for this > mode is to connect (N/S traffic, as you mention E/W is still using the > normal geneve path) tenant networks between different OpenStack clusters. > EVPN will create a vxlan tunnel connecting them. So, the vni/vxlan id > selected (with networking-bgpvpn) is the vxlan tunnel encap id being used > to connect both tenant networks. > > So, the idea behind the EVPN mode is not to make your VMs in a tenant > network publicly accessible, but being able to access them from a different > tenant network in a different cloud (it actually does not need to be an > OpenStack cloud, anything connected to the same vxlan id. Perhaps this is a > bit more clear in this demo: > https://ltomasbo.wordpress.com/2021/10/01/ovn-bgp-agent-interconnecting-kubernetes-pods-and-openstack-tenant-vms-with-evpn/ > > >> >> This is current status of my LAB >> >> ovn-bgp-agent in BGP Mode: >> I am successfully able to expose FIP and provider IPs (as you mentioned) >> but when i use expose_tenant_network=True in config then getting error and >> vm tenant ips not getting exposed to bgp so i am not sure what is the use >> case of that flag. >> > > Yep, most probably we broke something here due to lack of upstream testing > for this flag. I'll check asap and try to fix it > >> >> ovn-bgp-agent in EVPN Mode: >> In this mode everything is working and tenant vm also gets exposed but >> when I attach FIP to vm those FIP ip are not getting exposed in BGP (as you >> mentioned in your reply). >> > Yes, EVPN mode is not intended for FIPs or VMs on provider networks. Only > for tenants > > >> But i am seeing one bug here where vni config not getting inserted in FRR >> https://opendev.org/x/ovn-bgp-agent/src/branch/master/ovn_bgp_agent/drivers/openstack/utils/frr.py#L26 >> >> >> Who will trigger that code and when will that code get triggered to >> configure FRR for vrf/vni ? >> > > This is being added by networking-bgpvpn. You can see some details about > the integration into my blogpost ( > https://ltomasbo.wordpress.com/2021/06/25/openstack-networking-with-evpn/) > or in the upstream documentation: > https://opendev.org/x/ovn-bgp-agent/src/branch/master/doc/source/contributor/evpn_mode_design.rst > > Note though, to make networking-bgpvpn to work with this integration, this > patch is needed (which btw I should include into the documentation): > https://review.opendev.org/c/openstack/networking-bgpvpn/+/803161 > > Cheers, > Luis > > >> >> >> >> >> >> On Mon, Aug 22, 2022 at 5:44 AM Luis Tomas Bolivar >> wrote: >> >>> Hi Satish, >>> >>> Sorry I was on vacation. Trying to get through my inbox. >>> >>> I'm not sure what is the current status, note there is a difference >>> between EVPN and BGP mode. BGP mode is for FIPs and VMs/LBs on the provider >>> network or with FIPs, while EVPN is for VMs on the tenant networks, without >>> FIPs. Right now you cannot mix them and need to choose either EVPN or BGP >>> mode. 
I'm planning to give it a try to make it multidriver, but I haven't >>> even started yet. >>> >>> BTW, as you did with the initial email, perhaps it is worth to have >>> conversations on the opendev thread, so that anyone else facing the same >>> issues can get some hints (or even provide feedback), that was one of the >>> main ideas about moving it in there. To be able to have better support. >>> >>> As for the questions: >>> # # on rack-2-host-1 >>> # when i created vm1 which endup on rack-2-host-1 but it doesn't expose >>> the vm ip address. Is that normal behavior? >>> >>> This should be exposed in the node with cr-lrp port, i.e., rack1-host2, >>> and it should be as simple as adding the IP to the lo-2001 dummy device >>> >>> # When I attach a floating ip to vm2 then why does my floating ip >>> address not get exposed in BGP? >>> This is what I mentioned before, you cannot merge BGP and EVPN mode. >>> >>> On Mon, Aug 8, 2022 at 1:41 PM Satish Patel >>> wrote: >>> >>>> Hi Luis, >>>> >>>> Sorry for bugging. I?m almost there now. Do you have any thought of >>>> following question >>>> >>>> Sent from my iPhone >>>> >>>> On Aug 5, 2022, at 1:16 AM, Satish Patel wrote: >>>> >>>> ? >>>> Good morning Luis, >>>> >>>> Quick question, I have following deployment as per your lab >>>> >>>> rack-1-host-1 (controller) >>>> rack-1-host-2 (compute1 - This is hosting cr-lrp ports, inshort router) >>>> rack-2-host-1 (compute2) >>>> >>>> I have created two vms >>>> >>>> vagrant at rack-1-host-1:~$ nova list >>>> nova CLI is deprecated and will be a removed in a future release >>>> >>>> +--------------------------------------+------+--------+------------+-------------+--------------------------------------+ >>>> | ID | Name | Status | Task State | >>>> Power State | Networks | >>>> >>>> +--------------------------------------+------+--------+------------+-------------+--------------------------------------+ >>>> | aecb4f10-c46f-4551-b112-44e4dc007e88 | vm1 | ACTIVE | - | >>>> Running | private-test=10.0.0.105 | >>>> | ceae14b9-70c2-4dbc-8071-0d64d9a0ca84 | vm2 | ACTIVE | - | >>>> Running | private-test=10.0.0.86, 172.16.1.200 | >>>> >>>> +--------------------------------------+------+--------+------------+-------------+--------------------------------------+ >>>> >>>> # on rack-1-host-2 >>>> when i spun up vm2 which endup on rack-1-host-2 hence it created >>>> vrf-2001 on dummy lo-2001 interface and exposed vm2 ip address >>>> 10.0.0.86/32 >>>> >>>> 96: vrf-2001: mtu 65575 qdisc noqueue state >>>> UP group default qlen 1000 >>>> link/ether 22:cc:25:b3:7b:96 brd ff:ff:ff:ff:ff:ff >>>> 97: br-2001: mtu 1500 qdisc noqueue >>>> master vrf-2001 state UP group default qlen 1000 >>>> link/ether 0a:c3:23:7a:8f:0c brd ff:ff:ff:ff:ff:ff >>>> inet6 fe80::851:67ff:fe64:b2c3/64 scope link >>>> valid_lft forever preferred_lft forever >>>> 98: vxlan-2001: mtu 1500 qdisc >>>> noqueue master br-2001 state UNKNOWN group default qlen 1000 >>>> link/ether 0a:c3:23:7a:8f:0c brd ff:ff:ff:ff:ff:ff >>>> inet6 fe80::8c3:23ff:fe7a:8f0c/64 scope link >>>> valid_lft forever preferred_lft forever >>>> 99: lo-2001: mtu 1500 qdisc noqueue >>>> master vrf-2001 state UNKNOWN group default qlen 1000 >>>> link/ether d6:60:da:91:2e:6d brd ff:ff:ff:ff:ff:ff >>>> inet 10.0.0.86/32 scope global lo-2001 >>>> valid_lft forever preferred_lft forever >>>> inet6 fe80::d460:daff:fe91:2e6d/64 scope link >>>> valid_lft forever preferred_lft forever >>>> >>>> >>>> # on rack-2-host-1 >>>> when i created vm1 which endup on rack-2-host-1 but it doesn't 
expose >>>> the vm ip address. Is that normal behavior? >>>> >>>> When I attach a floating ip to vm2 then why does my floating ip address >>>> not get exposed in BGP? >>>> >>>> Thank you in advance >>>> >>>> >>>> On Thu, Aug 4, 2022 at 2:30 PM Satish Patel >>>> wrote: >>>> >>>>> Update: Good news, I found what was wrong. >>>>> >>>>> After adding AS it works. In your doc you only added VNI but look like >>>>> it is required to add BGP AS. Or may be your doc is little older and new >>>>> code required AS. >>>>> >>>>> vagrant at rack-1-host-1:~$ sudo ovn-nbctl set logical_switch_port >>>>> c32dcd90-7820-44bd-894f-416e44b36aa0 external_ids:"neutron_bgpvpn\:as"=64999 >>>>> vagrant at rack-1-host-1:~$ sudo ovn-nbctl set logical_switch_port >>>>> f55c1d1e-4b5b-4d8c-b922-3ad4a9700c81 external_ids:"neutron_bgpvpn\:as"=64999 >>>>> >>>>> Now i can see it created VRF and exposed VM tenant ip in lo-2001 >>>>> >>>>> 73: vrf-2001: mtu 65575 qdisc noqueue state >>>>> UP group default qlen 1000 >>>>> link/ether b2:29:65:3b:ac:db brd ff:ff:ff:ff:ff:ff >>>>> 74: br-2001: mtu 1500 qdisc noqueue >>>>> master vrf-2001 state UP group default qlen 1000 >>>>> link/ether d6:0d:61:2d:b7:29 brd ff:ff:ff:ff:ff:ff >>>>> inet6 fe80::6830:47ff:febc:b10b/64 scope link >>>>> valid_lft forever preferred_lft forever >>>>> 75: vxlan-2001: mtu 1500 qdisc >>>>> noqueue master br-2001 state UNKNOWN group default qlen 1000 >>>>> link/ether d6:0d:61:2d:b7:29 brd ff:ff:ff:ff:ff:ff >>>>> inet6 fe80::d40d:61ff:fe2d:b729/64 scope link >>>>> valid_lft forever preferred_lft forever >>>>> 76: lo-2001: mtu 1500 qdisc noqueue >>>>> master vrf-2001 state UNKNOWN group default qlen 1000 >>>>> link/ether fe:a1:a0:87:76:7c brd ff:ff:ff:ff:ff:ff >>>>> inet 10.0.0.83/32 scope global lo-2001 >>>>> valid_lft forever preferred_lft forever >>>>> inet6 fe80::fca1:a0ff:fe87:767c/64 scope link >>>>> >>>>> >>>>> I am continuing doing testing and seeing if I hit any other bug.. >>>>> >>>>> On Thu, Aug 4, 2022 at 11:58 AM Satish Patel >>>>> wrote: >>>>> >>>>>> Luis, >>>>>> >>>>>> I am following your doc, tell me if that doc is outdated or not >>>>>> https://ltomasbo.wordpress.com/2021/06/25/openstack-networking-with-evpn/ >>>>>> >>>>>> I have upgraded pyroute2 to 0.7.2 but still getting the same error, >>>>>> if you look at carefully following logs you will see an agent saying I >>>>>> can't find VNI but it's there, i can see in ovn-nbctl list >>>>>> logical_switch_port. 
>>>>>> >>>>>> pyroute2 0.7.2 >>>>>> pyroute2.core 0.6.13 >>>>>> pyroute2.ethtool 0.6.13 >>>>>> pyroute2.ipdb 0.6.13 >>>>>> pyroute2.ipset 0.6.13 >>>>>> pyroute2.ndb 0.6.13 >>>>>> pyroute2.nftables 0.6.13 >>>>>> pyroute2.nslink 0.6.13 >>>>>> >>>>>> >>>>>> ovn-bgp-agent logs >>>>>> >>>>>> 2022-08-04 15:52:52.898 396714 DEBUG >>>>>> ovn_bgp_agent.drivers.openstack.utils.ovn [-] Either "neutron_bgpvpn:vni" >>>>>> or "neutron_bgpvpn:as" were not found or have an invalid value in the port >>>>>> f55c1d1e-4b5b-4d8c-b922-3ad4a9700c81 external_ids {'neutron:cidrs': ' >>>>>> 172.16.1.132/24', 'neutron:device_id': >>>>>> '43b2c756-d92c-4fe5-b17b-32d5ab9b1f37', 'neutron:device_owner': >>>>>> 'network:router_gateway', 'neutron:network_name': >>>>>> 'neutron-0d82e6b0-bf9f-484d-85bd-0ba38aab508d', 'neutron:port_name': '', >>>>>> 'neutron:project_id': '', 'neutron:revision_number': '4', >>>>>> 'neutron:security_group_ids': '', 'neutron_bgpvpn:vni': '2001'} >>>>>> get_evpn_info >>>>>> /usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/drivers/openstack/utils/ovn.py:251 >>>>>> >>>>>> 2022-08-04 15:52:52.899 396714 DEBUG >>>>>> ovn_bgp_agent.drivers.openstack.ovn_evpn_driver [-] No EVPN information for >>>>>> CR-LRP Port with IPs ['172.16.1.132/24']. Not exposing it. >>>>>> _expose_ip >>>>>> /usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/drivers/openstack/ovn_evpn_driver.py:220 >>>>>> >>>>>> >>>>>> In OVN i can see vni number >>>>>> >>>>>> _uuid : 3473d0ce-1348-4423-a2dc-6d4df4c06f74 >>>>>> addresses : [router] >>>>>> dhcpv4_options : [] >>>>>> dhcpv6_options : [] >>>>>> dynamic_addresses : [] >>>>>> enabled : true >>>>>> external_ids : {"neutron:cidrs"="172.16.1.132/24", >>>>>> "neutron:device_id"="43b2c756-d92c-4fe5-b17b-32d5ab9b1f37", >>>>>> "neutron:device_owner"="network:router_gateway", >>>>>> "neutron:network_name"=neutron-0d82e6b0-bf9f-484d-85bd-0ba38aab508d, >>>>>> "neutron:port_name"="", "neutron:project_id"="", >>>>>> "neutron:revision_number"="4", "neutron:security_group_ids"="", >>>>>> "neutron_bgpvpn:vni"="2001"} >>>>>> ha_chassis_group : [] >>>>>> name : "f55c1d1e-4b5b-4d8c-b922-3ad4a9700c81" >>>>>> options : {exclude-lb-vips-from-garp="true", >>>>>> nat-addresses=router, router-port=lrp-f55c1d1e-4b5b-4d8c-b922-3ad4a9700c81} >>>>>> parent_name : [] >>>>>> port_security : [] >>>>>> tag : [] >>>>>> tag_request : [] >>>>>> type : router >>>>>> up : true >>>>>> >>>>>> >>>>>> On Thu, Aug 4, 2022 at 11:39 AM Satish Patel >>>>>> wrote: >>>>>> >>>>>>> Hi Luis, >>>>>>> >>>>>>> This is what I have installed on the compute and controller nodes. >>>>>>> >>>>>>> pyroute2 0.6.13 >>>>>>> pyroute2.core 0.6.13 >>>>>>> pyroute2.ethtool 0.6.13 >>>>>>> pyroute2.ipdb 0.6.13 >>>>>>> pyroute2.ipset 0.6.13 >>>>>>> pyroute2.ndb 0.6.13 >>>>>>> pyroute2.nftables 0.6.13 >>>>>>> pyroute2.nslink 0.6.13 >>>>>>> >>>>>>> On Wed, Aug 3, 2022 at 10:10 AM Luis Tomas Bolivar < >>>>>>> ltomasbo at redhat.com> wrote: >>>>>>> >>>>>>>> What version of pyroute2 are you using? Maybe it is related to some >>>>>>>> bug in there in an old version (I hit quite a few). You can try to upgrade >>>>>>>> that. Also, as it is on the resync, you can also disable that by setting a >>>>>>>> very long time in there. >>>>>>>> >>>>>>>> >>>>>>>> On Wednesday, August 3, 2022, Satish Patel >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Any thoughts Luis. Thanks >>>>>>>>> >>>>>>>>> Sent from my iPhone >>>>>>>>> >>>>>>>>> On Jul 27, 2022, at 11:53 AM, Satish Patel >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> ? 
>>>>>>>>> Luis, >>>>>>>>> >>>>>>>>> If you look at this logs - >>>>>>>>> https://paste.opendev.org/show/buRbY415guvHFUtSapFK/ >>>>>>>>> >>>>>>>>> I am able to expose the tenant ip when I set "expose_tenant_networks=True" >>>>>>>>> in the ovn-bgp-agent.conf file. But interestingly it exposes ip for the >>>>>>>>> first time but when i delete vm and re-create new vm then its throws >>>>>>>>> following error which you can see full track in above link >>>>>>>>> >>>>>>>>> ERROR oslo_service.periodic_task [-] Error during BGPAgent.sync: KeyError: 'object does not exists' >>>>>>>>> >>>>>>>>> >>>>>>>>> I have just 1x controller and 1x compute machine at present. As >>>>>>>>> you said something is broken when setting up "expose_tenant_networks=True" >>>>>>>>> Let me know if you need more info or logs etc. i will continue to poke and >>>>>>>>> see if anything else I can find. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jul 27, 2022 at 10:35 AM Luis Tomas Bolivar < >>>>>>>>> ltomasbo at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Satish,sorry for the delay in replies, I'm on vacations and >>>>>>>>>> have limited connectivity. >>>>>>>>>> >>>>>>>>>> See soon comments/replies inline >>>>>>>>>> >>>>>>>>>> On Wed, Jul 27, 2022 at 4:58 AM Satish Patel < >>>>>>>>>> satish.txt at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Luis, >>>>>>>>>>> >>>>>>>>>>> Just checking incase you missed my last email. If you are on >>>>>>>>>>> vacation then ignore it :) Thank you. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sat, Jul 23, 2022 at 12:27 AM Satish Patel < >>>>>>>>>>> satish.txt at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Luis, >>>>>>>>>>>> >>>>>>>>>>>> As you suggested, checkout 5 commits back to avoid the >>>>>>>>>>>> Load_Balancer issue that made good progress. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Great to hear! >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> But now i am seeing very odd behavior where floating IP can get >>>>>>>>>>>> exposed to BGP but when i am trying to expose VM Tenant IP then I get a >>>>>>>>>>>> strange error. Here is the full logs output; >>>>>>>>>>>> https://paste.opendev.org/show/buRbY415guvHFUtSapFK/ >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> I was using the evpn mode for the tenant networks, so it has been >>>>>>>>>> a while since I tested the vm tenant ip with plain bgp. Perhaps we broke it >>>>>>>>>> with some of the new features/reshapes. >>>>>>>>>> >>>>>>>>>> Regarding the error, two things: >>>>>>>>>> - Do you have several hosts or just one? The VM IP should be >>>>>>>>>> exposed in the host holding the ovn cr-lrp port (ovn router gateway port >>>>>>>>>> connecting the router to the provider network). >>>>>>>>>> - The error there seems to be related to the re-sync task. I've >>>>>>>>>> hit some issues with pyroute2 in the past, please be sure to use a recent >>>>>>>>>> enough version (minimum supported is 0.6.4) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jul 22, 2022 at 8:55 AM Satish Patel < >>>>>>>>>>>> satish.txt at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Luis, >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you for reply, >>>>>>>>>>>>> >>>>>>>>>>>>> Let me tell you that your blog is wonderful. I have used your >>>>>>>>>>>>> method to install frr without any docker container. >>>>>>>>>>>>> >>>>>>>>>>>>> What is the workaround here? Can I tell ovn-bgp-agent to not >>>>>>>>>>>>> look for container of frr? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> You just need to point where the frr socket is on the host. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> I have notice one more thing that it?s trying to run ?copy >>>>>>>>>>>>> /tmp/blah running-config? but that command isn?t supported. >>>>>>>>>>>>> >>>>>>>>>>>>> I have tried to run copy command manually on vtysh shell to >>>>>>>>>>>>> see what options are available for copy command but there is only one >>>>>>>>>>>>> command available with copy which is ?copy running-config startup-config? >>>>>>>>>>>>> do you think I have wrong version of frr running? I?m running 7.2. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> I think the minimum version I tried was either 7.4 or 7.5. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>> Please let me know if any workaround here. Thank you >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> No easy workaround for this, as it is relying on that way to >>>>>>>>>> reconfigure the frr.conf >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Luis >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> Sent from my iPhone >>>>>>>>>>>>> >>>>>>>>>>>>> On Jul 22, 2022, at 2:41 AM, Luis Tomas Bolivar < >>>>>>>>>>>>> ltomasbo at redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> ? >>>>>>>>>>>>> Hi Satish, >>>>>>>>>>>>> >>>>>>>>>>>>> The one to use should be https://opendev.org/x/ovn-bgp-agent. >>>>>>>>>>>>> The one on my personal github repo was the initial PoC for it. But the >>>>>>>>>>>>> opendev one is the upstream effort to develop it, and is the one being >>>>>>>>>>>>> maintained/updated. >>>>>>>>>>>>> >>>>>>>>>>>>> Looking at your second logs, it seems you are missing FRR (and >>>>>>>>>>>>> its shell, vtysh) in the node. >>>>>>>>>>>>> >>>>>>>>>>>>> Actually, thinking about this: >>>>>>>>>>>>> "Unexpected error while running command. >>>>>>>>>>>>> Command: /usr/bin/vtysh --vty_socket /run/frr/ -c copy >>>>>>>>>>>>> /tmp/tmpiz5s_wvs running-config" >>>>>>>>>>>>> >>>>>>>>>>>>> The ovn-bgp-agent has been developed with "deploying on >>>>>>>>>>>>> containers" in mind, meaning it is assuming there is a frr container >>>>>>>>>>>>> running, and the container running the agent is trying to connect to the >>>>>>>>>>>>> same socket so that it can run the vtysh commands. Perhaps in your case the >>>>>>>>>>>>> frr socket is in a different location than /run/frr/ >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jul 22, 2022 at 6:27 AM Satish Patel < >>>>>>>>>>>>> satish.txt at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Folks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am trying to create lab of of ovn-bgp-agent using this blog >>>>>>>>>>>>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> So far everything went well but I'm stuck at the bgp-agent >>>>>>>>>>>>>> installation and I encounter following error when running bgp-agent. >>>>>>>>>>>>>> Any suggestions? >>>>>>>>>>>>>> >>>>>>>>>>>>>> root at rack-1-host-2:/home/vagrant/bgp-agent# bgp-agent >>>>>>>>>>>>>> 2022-07-22 04:02:39.123 111551 INFO bgp_agent.config [-] >>>>>>>>>>>>>> Logging enabled! 
>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 CRITICAL bgp-agent [-] >>>>>>>>>>>>>> Unhandled error: AssertionError >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent Traceback >>>>>>>>>>>>>> (most recent call last): >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/bin/bgp-agent", line 10, in >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>> sys.exit(start()) >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/agent.py", line 76, in >>>>>>>>>>>>>> start >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>> bgp_agent_launcher = service.launch(config.CONF, BGPAgent()) >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/agent.py", line 44, in >>>>>>>>>>>>>> __init__ >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>> self.agent_driver = driver_api.AgentDriverBase.get_instance( >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/platform/driver_api.py", >>>>>>>>>>>>>> line 25, in get_instance >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>> agent_driver = stevedore_driver.DriverManager( >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/stevedore/driver.py", line 54, in >>>>>>>>>>>>>> __init__ >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>> super(DriverManager, self).__init__( >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/stevedore/named.py", line 78, in >>>>>>>>>>>>>> __init__ >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent extensions >>>>>>>>>>>>>> = self._load_plugins(invoke_on_load, >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/stevedore/extension.py", line 221, >>>>>>>>>>>>>> in _load_plugins >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent ext = >>>>>>>>>>>>>> self._load_one_plugin(ep, >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/stevedore/named.py", line 156, in >>>>>>>>>>>>>> _load_one_plugin >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent return >>>>>>>>>>>>>> super(NamedExtensionManager, self)._load_one_plugin( >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/stevedore/extension.py", line 257, >>>>>>>>>>>>>> in _load_one_plugin >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent obj = >>>>>>>>>>>>>> plugin(*invoke_args, **invoke_kwds) >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/platform/osp/ovn_bgp_driver.py", >>>>>>>>>>>>>> line 64, in __init__ >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>> self._sb_idl = ovn.OvnSbIdl( >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/platform/osp/utils/ovn.py", >>>>>>>>>>>>>> line 62, in __init__ >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>> super(OvnSbIdl, self).__init__( 
>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/platform/osp/utils/ovn.py", >>>>>>>>>>>>>> line 31, in __init__ >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>> super(OvnIdl, self).__init__(remote, schema) >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 283, in >>>>>>>>>>>>>> __init__ >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent schema = >>>>>>>>>>>>>> schema_helper.get_idl_schema() >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 2323, in >>>>>>>>>>>>>> get_idl_schema >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>> self._keep_table_columns(schema, table, columns)) >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 2330, in >>>>>>>>>>>>>> _keep_table_columns >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent assert >>>>>>>>>>>>>> table_name in schema.tables >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent AssertionError >>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> After googling I found one more agent at >>>>>>>>>>>>>> https://opendev.org/x/ovn-bgp-agent and its also throwing an >>>>>>>>>>>>>> error. Which agent should I be using? >>>>>>>>>>>>>> >>>>>>>>>>>>>> root at rack-1-host-2:~# ovn-bgp-agent >>>>>>>>>>>>>> 2022-07-22 04:04:36.780 111761 INFO ovn_bgp_agent.config [-] >>>>>>>>>>>>>> Logging enabled! >>>>>>>>>>>>>> 2022-07-22 04:04:37.247 111761 INFO ovn_bgp_agent.agent [-] >>>>>>>>>>>>>> Service 'BGPAgent' stopped >>>>>>>>>>>>>> 2022-07-22 04:04:37.248 111761 INFO ovn_bgp_agent.agent [-] >>>>>>>>>>>>>> Service 'BGPAgent' starting >>>>>>>>>>>>>> 2022-07-22 04:04:37.248 111761 INFO >>>>>>>>>>>>>> ovn_bgp_agent.drivers.openstack.utils.frr [-] Add VRF leak for VRF >>>>>>>>>>>>>> ovn-bgp-vrf on router bgp 64999 >>>>>>>>>>>>>> 2022-07-22 04:04:37.248 111761 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>> Running privsep helper: ['sudo', 'privsep-helper', '--privsep_context', >>>>>>>>>>>>>> 'ovn_bgp_agent.privileged.vtysh_cmd', '--privsep_sock_path', >>>>>>>>>>>>>> '/tmp/tmp4cie9eiz/privsep.sock'] >>>>>>>>>>>>>> 2022-07-22 04:04:37.687 111761 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>> Spawned new privsep daemon via rootwrap >>>>>>>>>>>>>> 2022-07-22 04:04:37.598 111769 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>> privsep daemon starting >>>>>>>>>>>>>> 2022-07-22 04:04:37.613 111769 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>> privsep process running with uid/gid: 0/0 >>>>>>>>>>>>>> 2022-07-22 04:04:37.617 111769 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>> privsep process running with capabilities (eff/prm/inh): >>>>>>>>>>>>>> CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_NET_ADMIN|CAP_SYS_ADMIN/none >>>>>>>>>>>>>> 2022-07-22 04:04:37.617 111769 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>> privsep daemon running as pid 111769 >>>>>>>>>>>>>> 2022-07-22 04:04:37.987 111769 ERROR >>>>>>>>>>>>>> ovn_bgp_agent.privileged.vtysh [-] Unable to execute vtysh with >>>>>>>>>>>>>> ['/usr/bin/vtysh', '--vty_socket', '/run/frr/', '-c', 'copy >>>>>>>>>>>>>> /tmp/tmpiz5s_wvs running-config']. Exception: Unexpected error while >>>>>>>>>>>>>> running command. 
>>>>>>>>>>>>>> Command: /usr/bin/vtysh --vty_socket /run/frr/ -c copy >>>>>>>>>>>>>> /tmp/tmpiz5s_wvs running-config >>>>>>>>>>>>>> Exit code: 1 >>>>>>>>>>>>>> Stdout: '% Unknown command: copy /tmp/tmpiz5s_wvs >>>>>>>>>>>>>> running-config\n' >>>>>>>>>>>>>> Stderr: '' >>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/privileged/vtysh.py", >>>>>>>>>>>>>> line 30, in run_vtysh_config >>>>>>>>>>>>>> return processutils.execute(*full_args) >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/processutils.py", >>>>>>>>>>>>>> line 438, in execute >>>>>>>>>>>>>> raise ProcessExecutionError(exit_code=_returncode, >>>>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>>>>>>> Unexpected error while running command. >>>>>>>>>>>>>> Command: /usr/bin/vtysh --vty_socket /run/frr/ -c copy >>>>>>>>>>>>>> /tmp/tmpiz5s_wvs running-config >>>>>>>>>>>>>> Exit code: 1 >>>>>>>>>>>>>> Stdout: '% Unknown command: copy /tmp/tmpiz5s_wvs >>>>>>>>>>>>>> running-config\n' >>>>>>>>>>>>>> Stderr: '' >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service [-] >>>>>>>>>>>>>> Error starting thread.: >>>>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>>>>> running command. >>>>>>>>>>>>>> Command: /usr/bin/vtysh --vty_socket /run/frr/ -c copy >>>>>>>>>>>>>> /tmp/tmpiz5s_wvs running-config >>>>>>>>>>>>>> Exit code: 1 >>>>>>>>>>>>>> Stdout: '% Unknown command: copy /tmp/tmpiz5s_wvs >>>>>>>>>>>>>> running-config\n' >>>>>>>>>>>>>> Stderr: '' >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> File "/usr/local/lib/python3.8/dist-packages/oslo_service/service.py", line >>>>>>>>>>>>>> 806, in run_service >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> service.start() >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> File "/usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/agent.py", line >>>>>>>>>>>>>> 50, in start >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> self.agent_driver.start() >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/drivers/openstack/ovn_bgp_driver.py", >>>>>>>>>>>>>> line 73, in start >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> frr.vrf_leak(constants.OVN_BGP_VRF, CONF.bgp_AS, CONF.bgp_router_id) >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/drivers/openstack/utils/frr.py", >>>>>>>>>>>>>> line 110, in vrf_leak >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> _run_vtysh_config_with_tempfile(vrf_config) >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/drivers/openstack/utils/frr.py", >>>>>>>>>>>>>> line 93, in _run_vtysh_config_with_tempfile >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> ovn_bgp_agent.privileged.vtysh.run_vtysh_config(f.name) >>>>>>>>>>>>>> 2022-07-22 
04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> File "/usr/local/lib/python3.8/dist-packages/oslo_privsep/priv_context.py", >>>>>>>>>>>>>> line 271, in _wrap >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> return self.channel.remote_call(name, args, kwargs, >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> File "/usr/local/lib/python3.8/dist-packages/oslo_privsep/daemon.py", line >>>>>>>>>>>>>> 215, in remote_call >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> raise exc_type(*result[2]) >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>>>>> running command. >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> Command: /usr/bin/vtysh --vty_socket /run/frr/ -c copy /tmp/tmpiz5s_wvs >>>>>>>>>>>>>> running-config >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> Exit code: 1 >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> Stdout: '% Unknown command: copy /tmp/tmpiz5s_wvs running-config\n' >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> Stderr: '' >>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>> 2022-07-22 04:04:37.993 111761 INFO ovn_bgp_agent.agent [-] >>>>>>>>>>>>>> Service 'BGPAgent' stopping >>>>>>>>>>>>>> 2022-07-22 04:04:37.994 111761 INFO ovn_bgp_agent.agent [-] >>>>>>>>>>>>>> Service 'BGPAgent' stopped >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> LUIS TOM?S BOL?VAR >>>>>>>>>>>>> Principal Software Engineer >>>>>>>>>>>>> Red Hat >>>>>>>>>>>>> Madrid, Spain >>>>>>>>>>>>> ltomasbo at redhat.com >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> LUIS TOM?S BOL?VAR >>>>>>>>>> Principal Software Engineer >>>>>>>>>> Red Hat >>>>>>>>>> Madrid, Spain >>>>>>>>>> ltomasbo at redhat.com >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> LUIS TOM?S BOL?VAR >>>>>>>> Principal Software Engineer >>>>>>>> Red Hat >>>>>>>> Madrid, Spain >>>>>>>> ltomasbo at redhat.com >>>>>>>> >>>>>>>> >>>>>>>> >>> >>> -- >>> LUIS TOM?S BOL?VAR >>> Principal Software Engineer >>> Red Hat >>> Madrid, Spain >>> ltomasbo at redhat.com >>> >>> >> > > -- > LUIS TOM?S BOL?VAR > Principal Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Tue Aug 23 15:49:19 2022 From: satish.txt at gmail.com (Satish Patel) Date: Tue, 23 Aug 2022 11:49:19 -0400 Subject: [ovn-bgp-agent][neutron] - expose_tenant_networks bug Message-ID: Folks, I am setting up ovn-bgp-agent lab in "BGP mode" and i found everything working great except expose tenant network https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ Lab Summary: 1 controller node 3 compute node ovn-bgp-agent running on all compute node because i am using "enable_distributed_floating_ip=True" ovn-bgp-agent config: [DEFAULT] debug=False expose_tenant_networks=True driver=ovn_bgp_driver reconcile_interval=120 ovsdb_connection=unix:/var/run/openvswitch/db.sock I am not seeing my vm on tenant ip getting exposed but when i attach FIP which gets exposed in loopback address. 
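A couple of hedged checks that help narrow this down before digging into the trace below (as the reply further down explains, with expose_tenant_networks the tenant prefix is expected to be advertised from the chassis holding the OVN router gateway/cr-lrp port, not from the compute running the VM; the routing-table name in the third command is an assumption based on the agent's convention of naming the table after the provider bridge, e.g. br-ex):

    # which chassis currently hosts the router gateway (chassisredirect) port?
    ovn-sbctl find Port_Binding type=chassisredirect
    # on that chassis, inspect what the agent has wired up
    ip rule show
    ip route show table br-ex      # table name assumed from the provider bridge
    vtysh -c 'show ip bgp'         # prefixes FRR is actually advertising
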
here is the full trace of debug logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Wed Aug 24 06:54:30 2022 From: katonalala at gmail.com (Lajos Katona) Date: Wed, 24 Aug 2022 08:54:30 +0200 Subject: [Neutron] How to add Fake ML2 extension to Neutron? In-Reply-To: <2183551661263463@myt5-b646bde4b8f3.qloud-c.yandex.net> References: <5523221661172836@myt6-bbc622793f1b.qloud-c.yandex.net> <2183551661263463@myt5-b646bde4b8f3.qloud-c.yandex.net> Message-ID: Hi Igor, The line which is interesting for you: "Extension vpc_extension not supported by any of loaded plugins" In core Neutron for ml2 there is a list of supported extension aliases: https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200-L239 And there is a similar for l3 also: https://opendev.org/openstack/neutron/src/branch/master/neutron/services/l3_router/l3_router_plugin.py#L98-L110 Or similarly for QoS: https://opendev.org/openstack/neutron/src/branch/master/neutron/services/qos/qos_plugin.py#L76-L90 So you need a plugin that uses the extension. Good luck :-) Lajos Katona (lajoskatona) Igor Zhukov ezt ?rta (id?pont: 2022. aug. 23., K, 16:04): > Hi again! > > Do you know how to debug ML2 extension drivers? > > I created folder with two python files: vpc/extensions/vpc.py and > vpc/plugins/ml2/drivers/vpc.py (also empty __init__.py files) > > I added to neuron.conf > api_extensions_path = /path/to/vpc/extensions > > and I added to ml2_ini.conf > extension_drivers = port_security, > vpc.plugins.ml2.drivers.vpc:VpcExtensionDriver > > and my neutron.server.log has: > > INFO neutron.plugins.ml2.managers [-] Configured extension driver names: > ['port_security', 'vpc_neutron.plugins.ml2.drivers.vpc:VpcExtensionDriver'] > WARNING stevedore.named [-] Could not load > vpc_neutron.plugins.ml2.drivers.vpc:VpcExtensionDriver > .... > INFO neutron.api.extensions [req-fd226631-b0cd-4ff8-956b-9470e7f26ebe - - > - - -] Extension vpc_extension not supported by any of loaded plugins > > How can I find why the extension driver could not be loaded? > > > Hi,The fake_extension is used only in unit tests to test the extension > framework, i.e. : > > > https://opendev.org/openstack/neutron/src/branch/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L37 > > > > If you would like to write an API extension check > neutron-lib/api/definitions/ (and you can find the extensions "counterpart" > under neutron/extensions in neutron repository) > > > > You can also check other Networking projects like networking-bgvpn, > neutron-dynamic-routing to have examples of API extensions. > > If you have an extension under neutron/extensions and there's somebody > who uses it (see [1]) you will see it is loaded in neutron servers logs > (something like this: "Loaded extension: address-group") and you can find > it in the output of openstack extension list --network > > > > [1]: > https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200 > > > > Best wishes > > Lajos Katona > > > > Igor Zhukov ezt ?rta (id?pont: 2022. aug. 22., H, > 19:41): > > > >> Hi all! > >> > >> Sorry for a complete noob question but I can't figure it out ? > >> > >> So if I want to add Fake ML2 extension what should I do? 
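One detail worth spelling out about the stevedore warning quoted above: the names listed in ml2_conf.ini's extension_drivers are stevedore entry-point names, not module:Class paths, so an out-of-tree driver normally has to be installed as a Python package that registers itself under the neutron.ml2.extension_drivers namespace. A minimal sketch for the hypothetical vpc package from the question (package name, module path and class are taken from the question, not from any real project):

    # setup.cfg of the vpc package
    [entry_points]
    neutron.ml2.extension_drivers =
        vpc = vpc.plugins.ml2.drivers.vpc:VpcExtensionDriver

After pip-installing the package, extension_drivers = port_security,vpc should load; the separate "Extension vpc_extension not supported by any of loaded plugins" message only clears once a loaded plugin actually advertises that alias, which is the point made above about the supported extension alias lists.
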
> >> > >> I have neutron server installed and I have the file: > https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/extensions/fake_extension.py > >> > >> How to configure neutron server, where should I put the file, should I > create another files? How can I test that it works? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ltomasbo at redhat.com Wed Aug 24 06:58:16 2022 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Wed, 24 Aug 2022 08:58:16 +0200 Subject: [ovn-bgp-agent][neutron] - expose_tenant_networks bug In-Reply-To: References: Message-ID: On Tue, Aug 23, 2022 at 6:04 PM Satish Patel wrote: > Folks, > > I am setting up ovn-bgp-agent lab in "BGP mode" and i found everything > working great except expose tenant network > https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ > > Lab Summary: > > 1 controller node > 3 compute node > > ovn-bgp-agent running on all compute node because i am using > "enable_distributed_floating_ip=True" > > ovn-bgp-agent config: > > [DEFAULT] > debug=False > expose_tenant_networks=True > driver=ovn_bgp_driver > reconcile_interval=120 > ovsdb_connection=unix:/var/run/openvswitch/db.sock > > I am not seeing my vm on tenant ip getting exposed but when i attach FIP > which gets exposed in loopback address. here is the full trace of debug > logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/ > It is not exposed in any node, right? Note when expose_tenant_network is enabled, the traffic to the tenant VM is exposed in the node holding the cr-lrp (ovn router gateway port) for the router connecting the tenant network to the provider one. The FIP will be exposed in the node where the VM is. On the other hand, the error you see there should not happen, so I'll investigate why that is and also double check if the expose_tenant_network flag is broken somehow. Thanks! -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Wed Aug 24 12:36:00 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 24 Aug 2022 09:36:00 -0300 Subject: [cinder] Bug deputy report for week of 08-24-2022 Message-ID: This is a bug report from 08-17-2022 to 08-24-2022. Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting ----------------------------------------------------------------------------------------- Low - https://bugs.launchpad.net/cinder/+bug/1982945 "[docs] Cannot start VM from disk with burst QoS." Unassigned. - https://bugs.launchpad.net/cinder/+bug/1900406 "RBD support extends for in-use volume." Unassigned. Cheers, Sofia -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcin.juszkiewicz at linaro.org Wed Aug 24 12:38:07 2022 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Wed, 24 Aug 2022 14:38:07 +0200 Subject: High memory use with RabbitMQ on RHEL9 hosts Message-ID: If you run RabbitMQ on RHEL9 family hosts (CentOS Stream 9, RockyLinux 9, AlmaLinux 9) and you see 1.6GB memory use then consider limiting amount of file descriptors: > EL9 (both CentOS Stream 9 and Rocky Linux 9) have 1073741816 while > CentOS Stream 8 has 1048576 - after lowering it's acceptable > memory usage. 
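On systemd-based EL9 hosts, the simplest way to pull that descriptor limit back down for RabbitMQ is a unit override (a minimal sketch; the 65536 value is just an example in line with common RabbitMQ guidance, not something prescribed in the linked discussion):

    # systemctl edit rabbitmq-server
    [Service]
    LimitNOFILE=65536
    # then: systemctl daemon-reload && systemctl restart rabbitmq-server

With the descriptor ceiling back near the EL8 default, memory use returns to the expected level, which matches the observation quoted above.
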
https://github.com/rabbitmq/erlang-rpm/discussions/104 for more From the.wade.albright at gmail.com Wed Aug 24 00:02:46 2022 From: the.wade.albright at gmail.com (Wade Albright) Date: Tue, 23 Aug 2022 17:02:46 -0700 Subject: [ironic][xena] problems updating redfish_password for existing node In-Reply-To: References: Message-ID: I tested out the latest change posted here https://review.opendev.org/c/openstack/sushy/+/853209 This solved the issue, things work fine now with session auth. Thanks for that! Much appreciated. Wade On Tue, Aug 16, 2022 at 9:22 AM Wade Albright wrote: > Thanks Julia! When I get a chance I will test that out and report back. I > may not be able to get to it right away as the system where I could > reproduce this issue reliably is in use by another team now. > > > > On Mon, Aug 15, 2022 at 5:27 PM Julia Kreger > wrote: > >> Well, that is weird. If I grok it as-is, it almost looks like the BMC >> returned an empty response.... It is not a failure mode we've seen or >> had reported (afaik) up to this point. >> >> That being said, we should be able to invalidate the session and >> launch a new client... >> >> I suspect https://review.opendev.org/c/openstack/sushy/+/853209 >> should fix things up. I suspect we will look at just invalidating the >> session upon any error. >> >> On Mon, Aug 15, 2022 at 11:47 AM Wade Albright >> wrote: >> > >> > 1) There are sockets open briefly when the conductor is trying to >> connect. After three tries the node is set to maintenance mode and there >> are no more sockets open. >> > 2) My (extremely simple) code was not using connection: close. I was >> just running "requests.get(" >> https://10.12.104.174/redfish/v1/Systems/System.Embedded.1", >> verify=False, auth=('xxxx', 'xxxx'))" in a loop. I just tried it with >> headers={'Connection':'close'} and it doesn't seem to make any difference. >> Works fine either way. >> > >> > I was able to confirm that the problem only happens when using session >> auth. With basic auth it doesn't happen. >> > >> > Versions I'm using here are ironic 18.2.1 and sushy 3.12.2. >> > >> > Here are some fresh logs from the node having the problem: >> > >> > 2022-08-15 10:34:21.726 208875 INFO ironic.conductor.task_manager >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploying" >> from state "active"; target provision state is "active" >> > 2022-08-15 10:34:22.553 208875 INFO ironic.conductor.utils >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 current power state is 'power on', >> requested state is 'power off'. >> > 2022-08-15 10:34:35.185 208875 INFO ironic.conductor.utils >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Successfully set node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 power state to power off by power off. >> > 2022-08-15 10:34:35.200 208875 WARNING ironic.common.pxe_utils >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] IPv6 is enabled and the >> DHCP driver appears set to a plugin aside from "neutron". Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 may not receive proper DHCPv6 provided >> boot parameters. 
>> > 2022-08-15 10:34:38.246 208875 INFO ironic.conductor.deployments >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Deploying on node >> 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': >> 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'}, >> {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': >> 'deploy'}, {'step': 'write_image', 'priority': 80, 'argsinfo': None, >> 'interface': 'deploy'}, {'step': 'tear_down_agent', 'priority': 40, >> 'argsinfo': None, 'interface': 'deploy'}, {'step': >> 'switch_to_tenant_network', 'priority': 30, 'argsinfo': None, 'interface': >> 'deploy'}, {'step': 'boot_instance', 'priority': 20, 'argsinfo': None, >> 'interface': 'deploy'}] >> > 2022-08-15 10:34:38.255 208875 INFO ironic.conductor.deployments >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Executing {'step': >> 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'} on >> node 0c304cea-8ae2-4a12-b658-dec05c190f88 >> > 2022-08-15 10:35:27.158 208875 INFO >> ironic.drivers.modules.ansible.deploy >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Ansible pre-deploy step >> complete on node 0c304cea-8ae2-4a12-b658-dec05c190f88 >> > 2022-08-15 10:35:27.159 208875 INFO ironic.conductor.deployments >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 finished deploy step {'step': >> 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'} >> > 2022-08-15 10:35:27.160 208875 INFO ironic.conductor.deployments >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Deploying on node >> 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': 'deploy', >> 'priority': 100, 'argsinfo': None, 'interface': 'deploy'}, {'step': >> 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}, >> {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface': >> 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo': >> None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20, >> 'argsinfo': None, 'interface': 'deploy'}] >> > 2022-08-15 10:35:27.176 208875 INFO ironic.conductor.deployments >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Executing {'step': >> 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 >> > 2022-08-15 10:35:32.037 208875 INFO ironic.conductor.utils >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Successfully set node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 power state to power on by rebooting. 
>> > 2022-08-15 10:35:32.037 208875 INFO ironic.conductor.deployments >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Deploy step {'step': >> 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 being executed asynchronously, waiting >> for driver. >> > 2022-08-15 10:35:32.051 208875 INFO ironic.conductor.task_manager >> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338 >> b679510ddb6540ca9454e26841f65c89 - default default] Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "wait >> call-back" from state "deploying"; target provision state is "active" >> > 2022-08-15 10:39:54.726 208875 INFO ironic.conductor.task_manager >> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploying" >> from state "wait call-back"; target provision state is "active" >> > 2022-08-15 10:39:54.741 208875 INFO ironic.conductor.deployments >> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Deploying on node >> 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': >> 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}, >> {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface': >> 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo': >> None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20, >> 'argsinfo': None, 'interface': 'deploy'}] >> > 2022-08-15 10:39:54.748 208875 INFO ironic.conductor.deployments >> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Executing {'step': >> 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} on >> node 0c304cea-8ae2-4a12-b658-dec05c190f88 >> > 2022-08-15 10:42:24.738 208875 WARNING >> ironic.drivers.modules.agent_base [-] Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping >> heartbeat processing (will retry on the next heartbeat): >> ironic.common.exception.NodeLocked: Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host >> sjc06-c01-irn01.ops.ringcentral.com, please retry after the current >> operation is completed. >> > 2022-08-15 10:44:29.788 208875 WARNING >> ironic.drivers.modules.agent_base [-] Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping >> heartbeat processing (will retry on the next heartbeat): >> ironic.common.exception.NodeLocked: Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host >> sjc06-c01-irn01.ops.ringcentral.com, please retry after the current >> operation is completed. >> > 2022-08-15 10:47:24.830 208875 WARNING >> ironic.drivers.modules.agent_base [-] Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping >> heartbeat processing (will retry on the next heartbeat): >> ironic.common.exception.NodeLocked: Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host >> sjc06-c01-irn01.ops.ringcentral.com, please retry after the current >> operation is completed. 
>> > 2022-08-15 11:05:59.544 208875 INFO >> ironic.drivers.modules.ansible.deploy >> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Ansible complete >> deploy on node 0c304cea-8ae2-4a12-b658-dec05c190f88 >> > 2022-08-15 11:06:00.141 208875 ERROR ironic.conductor.utils >> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 failed deploy step {'step': >> 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} >> with unexpected error: ("Connection broken: InvalidChunkLength(got length >> b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): >> requests.exceptions.ChunkedEncodingError: ("Connection broken: >> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >> length b'', 0 bytes read)) >> > 2022-08-15 11:06:00.218 208875 ERROR ironic.conductor.task_manager >> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node >> 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploy >> failed" from state "deploying"; target provision state is "active": >> requests.exceptions.ChunkedEncodingError: ("Connection broken: >> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >> length b'', 0 bytes read)) >> > 2022-08-15 11:06:28.774 208875 WARNING ironic.conductor.manager >> [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During >> sync_power_state, could not get power state for node >> 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 1 of 3. Error: ("Connection >> broken: InvalidChunkLength(got length b'', 0 bytes read)", >> InvalidChunkLength(got length b'', 0 bytes read)).: >> requests.exceptions.ChunkedEncodingError: ("Connection broken: >> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >> length b'', 0 bytes read)) >> > 2022-08-15 11:06:53.710 208875 WARNING ironic.conductor.manager >> [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During >> sync_power_state, could not get power state for node >> 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 2 of 3. Error: ("Connection >> broken: InvalidChunkLength(got length b'', 0 bytes read)", >> InvalidChunkLength(got length b'', 0 bytes read)).: >> requests.exceptions.ChunkedEncodingError: ("Connection broken: >> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >> length b'', 0 bytes read)) >> > 2022-08-15 11:07:53.727 208875 WARNING ironic.conductor.manager >> [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During >> sync_power_state, could not get power state for node >> 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 3 of 3. Error: ("Connection >> broken: InvalidChunkLength(got length b'', 0 bytes read)", >> InvalidChunkLength(got length b'', 0 bytes read)).: >> requests.exceptions.ChunkedEncodingError: ("Connection broken: >> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >> length b'', 0 bytes read)) >> > 2022-08-15 11:08:53.704 208875 ERROR ironic.conductor.manager >> [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During >> sync_power_state, max retries exceeded for node >> 0c304cea-8ae2-4a12-b658-dec05c190f88, node state None does not match >> expected state 'power on'. Updating DB state to 'None' Switching node to >> maintenance mode. 
Error: ("Connection broken: InvalidChunkLength(got length >> b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): >> requests.exceptions.ChunkedEncodingError: ("Connection broken: >> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >> length b'', 0 bytes read)) >> > 2022-08-15 11:13:53.750 208875 ERROR ironic.conductor.manager >> [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During >> sync_power_state, max retries exceeded for node >> 0c304cea-8ae2-4a12-b658-dec05c190f88, node state None does not match >> expected state 'None'. Updating DB state to 'None' Switching node to >> maintenance mode. Error: ("Connection broken: InvalidChunkLength(got length >> b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)): >> requests.exceptions.ChunkedEncodingError: ("Connection broken: >> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >> length b'', 0 bytes read)) >> > >> > >> > On Fri, Aug 12, 2022 at 9:06 PM Julia Kreger < >> juliaashleykreger at gmail.com> wrote: >> >> >> >> Two questions: >> >> >> >> 1) do you see open sockets to the BMCs in netstat output? >> >> 2) is your code using ?connection: close?? Or are you using sushy? >> >> >> >> Honestly, this seems *really* weird with current sushy versions, and >> is kind of reminiscent of a cached session which is using kept alive >> sockets. >> >> >> >> If you could grep out req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 to see >> what the prior couple of conductor actions were, that would give us better >> context as to what is going on. >> >> >> >> -Julia >> >> >> >> On Fri, Aug 12, 2022 at 3:11 PM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>> >> >>> Sorry for the spam. The openssl issue may have been a red herring. I >> am not able to reproduce the issue directly with my own python code. I was >> trying to fetch something that required authentication. After I added the >> correct auth info it works fine. I am not able to cause the same error as >> is happening in the Ironic logs. >> >>> >> >>> Anyway I'll do some more testing and report back. >> >>> >> >>> On Fri, Aug 12, 2022 at 2:14 PM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>>> >> >>>> I'm not sure why this problem only now started showing up, but it >> appears to be unrelated to Ironic. I was able to reproduce it directly >> outside of Ironic using a simple python program using urllib to get URLs >> from the BMC/redfish interface. Seems to be some combination of a buggy >> server SSL implementation and newer openssl 1.1.1. Apparently it doesn't >> happen using openssl 1.0. >> >>>> >> >>>> I've found some information about possible workarounds but haven't >> figured it out yet. If I do I'll update this thread just in case anyone >> else runs into it. >> >>>> >> >>>> On Fri, Aug 12, 2022 at 8:13 AM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>>>> >> >>>>> So I seem to have run into a new issue after upgrading to the newer >> versions to fix the password change issue. >> >>>>> >> >>>>> Now I am randomly getting errors like the below. Once I hit this >> error for a given node, no operations work on the node. I thought maybe it >> was an issue with the node itself, but it doesn't seem like it. The BMC >> seems to be working fine. >> >>>>> >> >>>>> After a conductor restart, things start working again. Has anyone >> seen something like this? 
>> >>>>> >> >>>>> Log example: >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils [- - - >> - -] Node ef5a2502-680b-4933-a0ee-6737e57ce1c5 failed deploy step {'step': >> 'write_image', 'priority': >> >>>>> 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: >> ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", >> InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions. >> >>>>> ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got >> length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes >> read)) >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> Traceback (most recent call last): >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 697, in >> _update_chunk_length >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> self.chunk_left = int(line, 16) >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> ValueError: invalid literal for int() with base 16: b'' >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During >> handling of the above exception, another exception occurred: >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> Traceback (most recent call last): >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 438, in >> _error_catcher >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> yield >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 764, in >> read_chunked >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> self._update_chunk_length() >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 701, in >> _update_chunk_length >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> raise InvalidChunkLength(self, line) >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 >> bytes read) >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During >> handling of the above exception, another exception occurred: >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> Traceback (most recent call last): >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/requests/models.py", line 760, in >> generate >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> for chunk in self.raw.stream(chunk_size, decode_content=True): >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 572, in >> stream >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> for line in self.read_chunked(amt, decode_content=decode_content): >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> 
"/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 793, in >> read_chunked >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> self._original_response.close() >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__ >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> self.gen.throw(type, value, traceback) >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File >> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 455, in >> _error_catcher >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> raise ProtocolError("Connection broken: %r" % e, e) >> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils >> urllib3.exceptions.ProtocolError: ("Connection broken: >> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got >> length b'', 0 bytes r >> >>>>> ead)) >> >>>>> >> >>>>> On Wed, Jul 20, 2022 at 2:04 PM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>>>>> >> >>>>>> I forgot to mention, that using session auth solved the problem >> after upgrading to the newer versions that include the two mentioned >> patches. >> >>>>>> >> >>>>>> On Wed, Jul 20, 2022 at 7:36 AM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>>>>>> >> >>>>>>> Switching to session auth solved the problem, and it seems like >> the better way to go anyway for equipment that supports it. Thanks again >> for all your help! >> >>>>>>> >> >>>>>>> Wade >> >>>>>>> >> >>>>>>> On Tue, Jul 19, 2022 at 5:37 PM Julia Kreger < >> juliaashleykreger at gmail.com> wrote: >> >>>>>>>> >> >>>>>>>> Just to provide a brief update for the mailing list. It looks >> like >> >>>>>>>> this is a case of use of Basic Auth with the BMC, where we were >> not >> >>>>>>>> catching the error properly... and thus not reporting the >> >>>>>>>> authentication failure to ironic so it would catch, and initiate >> a new >> >>>>>>>> client with the most up to date password. The default, typically >> used >> >>>>>>>> path is Session based authentication as BMCs generally handle >> internal >> >>>>>>>> session/user login tracking in a far better fashion. But not >> every BMC >> >>>>>>>> supports sessions. >> >>>>>>>> >> >>>>>>>> Fix in review[0] :) >> >>>>>>>> >> >>>>>>>> -Julia >> >>>>>>>> [0] https://review.opendev.org/c/openstack/sushy/+/850425 >> >>>>>>>> >> >>>>>>>> On Mon, Jul 18, 2022 at 4:15 PM Julia Kreger >> >>>>>>>> wrote: >> >>>>>>>> > >> >>>>>>>> > Excellent, hopefully I'll be able to figure out why Sushy is >> not doing >> >>>>>>>> > the needful... Or if it is and Ironic is not picking up on it. >> >>>>>>>> > >> >>>>>>>> > Anyway, I've posted >> >>>>>>>> > https://review.opendev.org/c/openstack/ironic/+/850259 which >> might >> >>>>>>>> > handle this issue. Obviously a work in progress, but it >> represents >> >>>>>>>> > what I think is happening inside of ironic itself leading into >> sushy >> >>>>>>>> > when cache access occurs. >> >>>>>>>> > >> >>>>>>>> > On Mon, Jul 18, 2022 at 4:04 PM Wade Albright >> >>>>>>>> > wrote: >> >>>>>>>> > > >> >>>>>>>> > > Sounds good, I will do that tomorrow. Thanks Julia. >> >>>>>>>> > > >> >>>>>>>> > > On Mon, Jul 18, 2022 at 3:27 PM Julia Kreger < >> juliaashleykreger at gmail.com> wrote: >> >>>>>>>> > >> >> >>>>>>>> > >> Debug would be best. I think I have an idea what is going >> on, and this >> >>>>>>>> > >> is a similar variation. 
If you want, you can email them >> directly to >> >>>>>>>> > >> me. Specifically only need entries reported by the sushy >> library and >> >>>>>>>> > >> ironic.drivers.modules.redfish.utils. >> >>>>>>>> > >> >> >>>>>>>> > >> On Mon, Jul 18, 2022 at 3:20 PM Wade Albright >> >>>>>>>> > >> wrote: >> >>>>>>>> > >> > >> >>>>>>>> > >> > I'm happy to supply some logs, what verbosity level >> should i use? And should I just embed the logs in email to the list or >> upload somewhere? >> >>>>>>>> > >> > >> >>>>>>>> > >> > On Mon, Jul 18, 2022 at 3:14 PM Julia Kreger < >> juliaashleykreger at gmail.com> wrote: >> >>>>>>>> > >> >> >> >>>>>>>> > >> >> If you could supply some conductor logs, that would be >> helpful. It >> >>>>>>>> > >> >> should be re-authenticating, but obviously we have a >> larger bug there >> >>>>>>>> > >> >> we need to find the root issue behind. >> >>>>>>>> > >> >> >> >>>>>>>> > >> >> On Mon, Jul 18, 2022 at 3:06 PM Wade Albright >> >>>>>>>> > >> >> wrote: >> >>>>>>>> > >> >> > >> >>>>>>>> > >> >> > I was able to use the patches to update the code, but >> unfortunately the problem is still there for me. >> >>>>>>>> > >> >> > >> >>>>>>>> > >> >> > I also tried an RPM upgrade to the versions Julia >> mentioned had the fixes, namely Sushy 3.12.1 - Released May 2022 and Ironic >> 18.2.1 - Released in January 2022. But it did not fix the problem. >> >>>>>>>> > >> >> > >> >>>>>>>> > >> >> > I am able to consistently reproduce the error. >> >>>>>>>> > >> >> > - step 1: change BMC password directly on the node >> itself >> >>>>>>>> > >> >> > - step 2: update BMC password (redfish_password) in >> ironic with 'openstack baremetal node set --driver-info >> redfish_password='newpass' >> >>>>>>>> > >> >> > >> >>>>>>>> > >> >> > After step 1 there are errors in the logs entries like >> "Session authentication appears to have been lost at some point in time" >> and eventually it puts the node into maintenance mode and marks the power >> state as "none." >> >>>>>>>> > >> >> > After step 2 and taking the host back out of >> maintenance mode, it goes through a similar set of log entries puts the >> node into MM again. >> >>>>>>>> > >> >> > >> >>>>>>>> > >> >> > After the above steps, a conductor restart fixes the >> problem and operations work normally again. Given this it seems like there >> is still some kind of caching issue. >> >>>>>>>> > >> >> > >> >>>>>>>> > >> >> > On Sat, Jul 16, 2022 at 6:01 PM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>>>>>>> > >> >> >> >> >>>>>>>> > >> >> >> Hi Julia, >> >>>>>>>> > >> >> >> >> >>>>>>>> > >> >> >> Thank you so much for the reply! Hopefully this is >> the issue. I'll try out the patches next week and report back. I'll also >> email you on Monday about the versions, that would be very helpful to know. >> >>>>>>>> > >> >> >> >> >>>>>>>> > >> >> >> Thanks again, really appreciate it. >> >>>>>>>> > >> >> >> >> >>>>>>>> > >> >> >> Wade >> >>>>>>>> > >> >> >> >> >>>>>>>> > >> >> >> >> >>>>>>>> > >> >> >> >> >>>>>>>> > >> >> >> On Sat, Jul 16, 2022 at 4:36 PM Julia Kreger < >> juliaashleykreger at gmail.com> wrote: >> >>>>>>>> > >> >> >>> >> >>>>>>>> > >> >> >>> Greetings! >> >>>>>>>> > >> >> >>> >> >>>>>>>> > >> >> >>> I believe you need two patches, one in ironic and >> one in sushy. 
>> >>>>>>>> > >> >> >>> >> >>>>>>>> > >> >> >>> Sushy: >> >>>>>>>> > >> >> >>> >> https://review.opendev.org/c/openstack/sushy/+/832860 >> >>>>>>>> > >> >> >>> >> >>>>>>>> > >> >> >>> Ironic: >> >>>>>>>> > >> >> >>> >> https://review.opendev.org/c/openstack/ironic/+/820588 >> >>>>>>>> > >> >> >>> >> >>>>>>>> > >> >> >>> I think it is variation, and the comment about >> working after you restart the conductor is the big signal to me. I?m on a >> phone on a bad data connection, if you email me on Monday I can see what >> versions the fixes would be in. >> >>>>>>>> > >> >> >>> >> >>>>>>>> > >> >> >>> For the record, it is a session cache issue, the bug >> was that the service didn?t quite know what to do when auth fails. >> >>>>>>>> > >> >> >>> >> >>>>>>>> > >> >> >>> -Julia >> >>>>>>>> > >> >> >>> >> >>>>>>>> > >> >> >>> >> >>>>>>>> > >> >> >>> On Fri, Jul 15, 2022 at 2:55 PM Wade Albright < >> the.wade.albright at gmail.com> wrote: >> >>>>>>>> > >> >> >>>> >> >>>>>>>> > >> >> >>>> Hi, >> >>>>>>>> > >> >> >>>> >> >>>>>>>> > >> >> >>>> I'm hitting a problem when trying to update the >> redfish_password for an existing node. I'm curious to know if anyone else >> has encountered this problem. I'm not sure if I'm just doing something >> wrong or if there is a bug. Or if the problem is unique to my setup. >> >>>>>>>> > >> >> >>>> >> >>>>>>>> > >> >> >>>> I have a node already added into ironic with all >> the driver details set, and things are working fine. I am able to run >> deployments. >> >>>>>>>> > >> >> >>>> >> >>>>>>>> > >> >> >>>> Now I need to change the redfish password on the >> host. So I update the password for redfish access on the host, then use an >> 'openstack baremetal node set --driver-info >> redfish_password=' command to set the new redfish_password. >> >>>>>>>> > >> >> >>>> >> >>>>>>>> > >> >> >>>> Once this has been done, deployment no longer >> works. I see redfish authentication errors in the logs and the operation >> fails. I waited a bit to see if there might just be a delay in updating the >> password, but after awhile it still didn't work. >> >>>>>>>> > >> >> >>>> >> >>>>>>>> > >> >> >>>> I restarted the conductor, and after that things >> work fine again. So it seems like the password is cached or something. Is >> there a way to force the password to update? I even tried removing the >> redfish credentials and re-adding them, but that didn't work either. Only a >> conductor restart seems to make the new password work. >> >>>>>>>> > >> >> >>>> >> >>>>>>>> > >> >> >>>> We are running Xena, using rpm installation on >> Oracle Linux 8.5. >> >>>>>>>> > >> >> >>>> >> >>>>>>>> > >> >> >>>> Thanks in advance for any help with this issue. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ltomasbo at redhat.com Wed Aug 24 06:53:53 2022 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Wed, 24 Aug 2022 08:53:53 +0200 Subject: ovn-bgp-agent installation issue In-Reply-To: References: <95C66BE4-2944-45C6-A3C4-EBFE1FECDD25@gmail.com> Message-ID: On Tue, Aug 23, 2022 at 5:37 PM Satish Patel wrote: > Hi Luis, > > /cc - openstack discuss mailing list > > Thank you so much for clearing my doubts. I have a counter question. > > BGP Mode: > - In bgp mode where i should run ovn-bgp-agent ? Network node or Compute > node? 
> It need to be running in all the nodes, unless you are only interested in a limited functionality: - For FIP and VMs on the provider network you would only need them in the compute nodes - For the expose_tenant_network flag, you would need it on the network nodes, as the traffic is exposed through the ovn router gateway port - For octavia load balancer with ovn provider (where the VIP is on the provider network and the members in a tenant network) you will also need the agent running on the networker node, as the traffic is also injected into the OVN overlay at the networker node (through the ovn router gateway port) > EVPN Mode: > - In evpn mode where i should run ovn-bgp-agent? > Only in the networker nodes is enough, as this is where the IPs gets exposed and where the traffic gets injected into the OVN overlay through the ovn router gateway port - In my lab i am using your heck "How to use it without BGPVPN" to create > vni mapping, do you think because of that i am not seeing vni pushed out to > FRR automatically? > https://ltomasbo.wordpress.com/2021/06/25/openstack-networking-with-evpn/ > indeed, I think the hack may be a bit outdated, and now you need to set up both the VNI and the AS number: https://opendev.org/x/ovn-bgp-agent/src/branch/master/ovn_bgp_agent/drivers/openstack/watchers/evpn_watcher.py#L92-L93 Otherwise the event won't match and will be dismissed. Try also doing: ovn-nbctl set logical_switch_port XXX external_ids:"neutron_bgpvpn\:as"=65000 > - In your demo link I can see you have bridge two different deployment > clouds using the same VNI, In this scenario we need a router (or > vrf) somewhere correct to bridge to different subnet IPs, correct? or is > this L2 stretch? > Yes, in my demo I have: - One spine/leaf deployment configured with frr where and OpenStack deployment is connected to the leafs - One devstack deployment with Kuryr and Kubernetes (to use neutron ports for kubernetes pods, therefore connecting neutron ports to neutron ports under the hood) - One router VM where both spines are connected to, as well as the devstack VM So, it is all L3, with different networks on each side. In the router VM I have the next frr configuration: ... router bgp 65000 bgp log-neighbor-changes bgp graceful-shutdown neighbor downlink peer-group neighbor downlink remote-as internal neighbor downlink bfd neighbor eth0 interface peer-group downlink neighbor eth1 interface peer-group downlink neighbor uplink peer-group neighbor uplink remote-as external neighbor uplink bfd neighbor eth2 interface peer-group uplink address-family ipv4 unicast redistribute connected neighbor downlink default-originate neighbor downlink prefix-list only-host-prefixes in neighbor uplink prefix-list only-default-host-prefixes in exit-address-family address-family l2vpn evpn neighbor uplink activate neighbor downlink activate neighbor downlink route-reflector-client exit-address-family ... Where I define the connectivity to spines as internal and the one to devstack VM as external. And I just activated the l2vpn evpn so that the vni information is considered. > I am going to open a new thread to discuss issues related to > expose_tenant_networks=True flag issue. > Thanks > On Tue, Aug 23, 2022 at 3:14 AM Luis Tomas Bolivar > wrote: > >> Hi Satish! See inline >> >> On Mon, Aug 22, 2022 at 5:22 PM Satish Patel >> wrote: >> >>> Hi Luis, >>> >>> Welcome back from your vacation, sorry i didn't know you are on >>> vacation. Hope you had a wonderful time. >>> >>> How do i start conversion on opendev thread? 
is there a mailing list or >>> are you saying open thread here - >>> https://storyboard.openstack.org/#!/project/x/ovn-bgp-agent >>> >> >> Actually, you did it right (sending it to openstack-discuss), but it >> seems at some point we stop adding it on our replies... probably my fault >> due to replying on the phone while on vacation. >> >> For the next issue we can open a different thread and I'll try not to >> drop the openstack-discuss list! xD >> >> It would be great though if you can sent a follow up to the previous >> thread with the final outcome (fixed by updating, or adding config X, ..), >> so that community can see the project is alive, and other people facing >> similar issues can try your solution/config. >> > >>> I have an update for you, after upgrading the OVN/OVS version to the >>> latest and that resolved lots of issues. As you mentioned earlier running >>> the latest code is very important to get proper functionality. >>> >> >> Awesome! Yeah, it is a new project, so functionality/fixes are being >> added regularly, therefore using the latest version is the best approach. >> The plan is to soon create a more stable version/release, once we have >> enough testing coverage and main functionality covered >> >>> >>> You are saying in BGP mode only FIP will get exposed but not the tenant >>> VM ( What is this flag for expose_tenant_network=True ?) >>> >> >> Main idea was: >> - BGP mode for exposing VMs/LBs on provider networks, or with FIPs >> attached >> - EVPN mode for tenant networks >> >> To give you some more context, we started with the BGP mode, and did the >> expose_tenant_network to expose also tenant IPs. However, the BGP mode is >> lacking an API to decide what to expose. So we moved to the EVPN mode (with >> networking-bgpvpn as the API) to expose the tenant networks. >> >> That said, you are right, and the expose_tenant_network flag is intended >> for the BGP mode, to expose all tenant networks (so you need to ensure no >> overlapping CIDRs). And, if that is not working (it may be, as there is no >> testing coverage for it), it is something we can definitely look at and >> fix. So, feel free to open a new thread on openstack-discuss, or bug in >> storyboard (whatever works for you better), and I'll try to fix asap. >> >> >>> >>> You are saying in EVPN mode only tenant VM ip will get exposed but not >>> FIP but in this design what is the use of getting tenant VM exposed but you >>> can't expose FIP then how does external people access VMs? ( I am totally >>> confused in EVPN design and use case because your tenant VM is already >>> talking to each other using geneve tunnel so why do we need EVPN/L2VPN to >>> expose them?) >>> >> >> EVPN mode is indeed only for tenant VMs (and loadbalancer). Idea for this >> mode is to connect (N/S traffic, as you mention E/W is still using the >> normal geneve path) tenant networks between different OpenStack clusters. >> EVPN will create a vxlan tunnel connecting them. So, the vni/vxlan id >> selected (with networking-bgpvpn) is the vxlan tunnel encap id being used >> to connect both tenant networks. >> >> So, the idea behind the EVPN mode is not to make your VMs in a tenant >> network publicly accessible, but being able to access them from a different >> tenant network in a different cloud (it actually does not need to be an >> OpenStack cloud, anything connected to the same vxlan id. 
Perhaps this is a >> bit more clear in this demo: >> https://ltomasbo.wordpress.com/2021/10/01/ovn-bgp-agent-interconnecting-kubernetes-pods-and-openstack-tenant-vms-with-evpn/ >> >> >>> >>> This is current status of my LAB >>> >>> ovn-bgp-agent in BGP Mode: >>> I am successfully able to expose FIP and provider IPs (as you >>> mentioned) but when i use expose_tenant_network=True in config then >>> getting error and vm tenant ips not getting exposed to bgp so i am not sure >>> what is the use case of that flag. >>> >> >> Yep, most probably we broke something here due to lack of upstream >> testing for this flag. I'll check asap and try to fix it >> >>> >>> ovn-bgp-agent in EVPN Mode: >>> In this mode everything is working and tenant vm also gets exposed but >>> when I attach FIP to vm those FIP ip are not getting exposed in BGP (as you >>> mentioned in your reply). >>> >> Yes, EVPN mode is not intended for FIPs or VMs on provider networks. Only >> for tenants >> >> >>> But i am seeing one bug here where vni config not getting inserted in >>> FRR >>> https://opendev.org/x/ovn-bgp-agent/src/branch/master/ovn_bgp_agent/drivers/openstack/utils/frr.py#L26 >>> >>> >>> Who will trigger that code and when will that code get triggered to >>> configure FRR for vrf/vni ? >>> >> >> This is being added by networking-bgpvpn. You can see some details about >> the integration into my blogpost ( >> https://ltomasbo.wordpress.com/2021/06/25/openstack-networking-with-evpn/) >> or in the upstream documentation: >> https://opendev.org/x/ovn-bgp-agent/src/branch/master/doc/source/contributor/evpn_mode_design.rst >> >> Note though, to make networking-bgpvpn to work with this integration, >> this patch is needed (which btw I should include into the documentation): >> https://review.opendev.org/c/openstack/networking-bgpvpn/+/803161 >> >> Cheers, >> Luis >> >> >>> >>> >>> >>> >>> >>> On Mon, Aug 22, 2022 at 5:44 AM Luis Tomas Bolivar >>> wrote: >>> >>>> Hi Satish, >>>> >>>> Sorry I was on vacation. Trying to get through my inbox. >>>> >>>> I'm not sure what is the current status, note there is a difference >>>> between EVPN and BGP mode. BGP mode is for FIPs and VMs/LBs on the provider >>>> network or with FIPs, while EVPN is for VMs on the tenant networks, without >>>> FIPs. Right now you cannot mix them and need to choose either EVPN or BGP >>>> mode. I'm planning to give it a try to make it multidriver, but I haven't >>>> even started yet. >>>> >>>> BTW, as you did with the initial email, perhaps it is worth to have >>>> conversations on the opendev thread, so that anyone else facing the same >>>> issues can get some hints (or even provide feedback), that was one of the >>>> main ideas about moving it in there. To be able to have better support. >>>> >>>> As for the questions: >>>> # # on rack-2-host-1 >>>> # when i created vm1 which endup on rack-2-host-1 but it doesn't expose >>>> the vm ip address. Is that normal behavior? >>>> >>>> This should be exposed in the node with cr-lrp port, i.e., rack1-host2, >>>> and it should be as simple as adding the IP to the lo-2001 dummy device >>>> >>>> # When I attach a floating ip to vm2 then why does my floating ip >>>> address not get exposed in BGP? >>>> This is what I mentioned before, you cannot merge BGP and EVPN mode. >>>> >>>> On Mon, Aug 8, 2022 at 1:41 PM Satish Patel >>>> wrote: >>>> >>>>> Hi Luis, >>>>> >>>>> Sorry for bugging. I?m almost there now. 
Do you have any thought of >>>>> following question >>>>> >>>>> Sent from my iPhone >>>>> >>>>> On Aug 5, 2022, at 1:16 AM, Satish Patel wrote: >>>>> >>>>> ? >>>>> Good morning Luis, >>>>> >>>>> Quick question, I have following deployment as per your lab >>>>> >>>>> rack-1-host-1 (controller) >>>>> rack-1-host-2 (compute1 - This is hosting cr-lrp ports, inshort router) >>>>> rack-2-host-1 (compute2) >>>>> >>>>> I have created two vms >>>>> >>>>> vagrant at rack-1-host-1:~$ nova list >>>>> nova CLI is deprecated and will be a removed in a future release >>>>> >>>>> +--------------------------------------+------+--------+------------+-------------+--------------------------------------+ >>>>> | ID | Name | Status | Task State | >>>>> Power State | Networks | >>>>> >>>>> +--------------------------------------+------+--------+------------+-------------+--------------------------------------+ >>>>> | aecb4f10-c46f-4551-b112-44e4dc007e88 | vm1 | ACTIVE | - | >>>>> Running | private-test=10.0.0.105 | >>>>> | ceae14b9-70c2-4dbc-8071-0d64d9a0ca84 | vm2 | ACTIVE | - | >>>>> Running | private-test=10.0.0.86, 172.16.1.200 | >>>>> >>>>> +--------------------------------------+------+--------+------------+-------------+--------------------------------------+ >>>>> >>>>> # on rack-1-host-2 >>>>> when i spun up vm2 which endup on rack-1-host-2 hence it created >>>>> vrf-2001 on dummy lo-2001 interface and exposed vm2 ip address >>>>> 10.0.0.86/32 >>>>> >>>>> 96: vrf-2001: mtu 65575 qdisc noqueue state >>>>> UP group default qlen 1000 >>>>> link/ether 22:cc:25:b3:7b:96 brd ff:ff:ff:ff:ff:ff >>>>> 97: br-2001: mtu 1500 qdisc noqueue >>>>> master vrf-2001 state UP group default qlen 1000 >>>>> link/ether 0a:c3:23:7a:8f:0c brd ff:ff:ff:ff:ff:ff >>>>> inet6 fe80::851:67ff:fe64:b2c3/64 scope link >>>>> valid_lft forever preferred_lft forever >>>>> 98: vxlan-2001: mtu 1500 qdisc >>>>> noqueue master br-2001 state UNKNOWN group default qlen 1000 >>>>> link/ether 0a:c3:23:7a:8f:0c brd ff:ff:ff:ff:ff:ff >>>>> inet6 fe80::8c3:23ff:fe7a:8f0c/64 scope link >>>>> valid_lft forever preferred_lft forever >>>>> 99: lo-2001: mtu 1500 qdisc noqueue >>>>> master vrf-2001 state UNKNOWN group default qlen 1000 >>>>> link/ether d6:60:da:91:2e:6d brd ff:ff:ff:ff:ff:ff >>>>> inet 10.0.0.86/32 scope global lo-2001 >>>>> valid_lft forever preferred_lft forever >>>>> inet6 fe80::d460:daff:fe91:2e6d/64 scope link >>>>> valid_lft forever preferred_lft forever >>>>> >>>>> >>>>> # on rack-2-host-1 >>>>> when i created vm1 which endup on rack-2-host-1 but it doesn't expose >>>>> the vm ip address. Is that normal behavior? >>>>> >>>>> When I attach a floating ip to vm2 then why does my floating ip >>>>> address not get exposed in BGP? >>>>> >>>>> Thank you in advance >>>>> >>>>> >>>>> On Thu, Aug 4, 2022 at 2:30 PM Satish Patel >>>>> wrote: >>>>> >>>>>> Update: Good news, I found what was wrong. >>>>>> >>>>>> After adding AS it works. In your doc you only added VNI but look >>>>>> like it is required to add BGP AS. Or may be your doc is little older and >>>>>> new code required AS. 
>>>>>> >>>>>> vagrant at rack-1-host-1:~$ sudo ovn-nbctl set logical_switch_port >>>>>> c32dcd90-7820-44bd-894f-416e44b36aa0 external_ids:"neutron_bgpvpn\:as"=64999 >>>>>> vagrant at rack-1-host-1:~$ sudo ovn-nbctl set logical_switch_port >>>>>> f55c1d1e-4b5b-4d8c-b922-3ad4a9700c81 external_ids:"neutron_bgpvpn\:as"=64999 >>>>>> >>>>>> Now i can see it created VRF and exposed VM tenant ip in lo-2001 >>>>>> >>>>>> 73: vrf-2001: mtu 65575 qdisc noqueue >>>>>> state UP group default qlen 1000 >>>>>> link/ether b2:29:65:3b:ac:db brd ff:ff:ff:ff:ff:ff >>>>>> 74: br-2001: mtu 1500 qdisc noqueue >>>>>> master vrf-2001 state UP group default qlen 1000 >>>>>> link/ether d6:0d:61:2d:b7:29 brd ff:ff:ff:ff:ff:ff >>>>>> inet6 fe80::6830:47ff:febc:b10b/64 scope link >>>>>> valid_lft forever preferred_lft forever >>>>>> 75: vxlan-2001: mtu 1500 qdisc >>>>>> noqueue master br-2001 state UNKNOWN group default qlen 1000 >>>>>> link/ether d6:0d:61:2d:b7:29 brd ff:ff:ff:ff:ff:ff >>>>>> inet6 fe80::d40d:61ff:fe2d:b729/64 scope link >>>>>> valid_lft forever preferred_lft forever >>>>>> 76: lo-2001: mtu 1500 qdisc noqueue >>>>>> master vrf-2001 state UNKNOWN group default qlen 1000 >>>>>> link/ether fe:a1:a0:87:76:7c brd ff:ff:ff:ff:ff:ff >>>>>> inet 10.0.0.83/32 scope global lo-2001 >>>>>> valid_lft forever preferred_lft forever >>>>>> inet6 fe80::fca1:a0ff:fe87:767c/64 scope link >>>>>> >>>>>> >>>>>> I am continuing doing testing and seeing if I hit any other bug.. >>>>>> >>>>>> On Thu, Aug 4, 2022 at 11:58 AM Satish Patel >>>>>> wrote: >>>>>> >>>>>>> Luis, >>>>>>> >>>>>>> I am following your doc, tell me if that doc is outdated or not >>>>>>> https://ltomasbo.wordpress.com/2021/06/25/openstack-networking-with-evpn/ >>>>>>> >>>>>>> I have upgraded pyroute2 to 0.7.2 but still getting the same error, >>>>>>> if you look at carefully following logs you will see an agent saying I >>>>>>> can't find VNI but it's there, i can see in ovn-nbctl list >>>>>>> logical_switch_port. >>>>>>> >>>>>>> pyroute2 0.7.2 >>>>>>> pyroute2.core 0.6.13 >>>>>>> pyroute2.ethtool 0.6.13 >>>>>>> pyroute2.ipdb 0.6.13 >>>>>>> pyroute2.ipset 0.6.13 >>>>>>> pyroute2.ndb 0.6.13 >>>>>>> pyroute2.nftables 0.6.13 >>>>>>> pyroute2.nslink 0.6.13 >>>>>>> >>>>>>> >>>>>>> ovn-bgp-agent logs >>>>>>> >>>>>>> 2022-08-04 15:52:52.898 396714 DEBUG >>>>>>> ovn_bgp_agent.drivers.openstack.utils.ovn [-] Either "neutron_bgpvpn:vni" >>>>>>> or "neutron_bgpvpn:as" were not found or have an invalid value in the port >>>>>>> f55c1d1e-4b5b-4d8c-b922-3ad4a9700c81 external_ids {'neutron:cidrs': ' >>>>>>> 172.16.1.132/24', 'neutron:device_id': >>>>>>> '43b2c756-d92c-4fe5-b17b-32d5ab9b1f37', 'neutron:device_owner': >>>>>>> 'network:router_gateway', 'neutron:network_name': >>>>>>> 'neutron-0d82e6b0-bf9f-484d-85bd-0ba38aab508d', 'neutron:port_name': '', >>>>>>> 'neutron:project_id': '', 'neutron:revision_number': '4', >>>>>>> 'neutron:security_group_ids': '', 'neutron_bgpvpn:vni': '2001'} >>>>>>> get_evpn_info >>>>>>> /usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/drivers/openstack/utils/ovn.py:251 >>>>>>> >>>>>>> 2022-08-04 15:52:52.899 396714 DEBUG >>>>>>> ovn_bgp_agent.drivers.openstack.ovn_evpn_driver [-] No EVPN information for >>>>>>> CR-LRP Port with IPs ['172.16.1.132/24']. Not exposing it. 
>>>>>>> _expose_ip >>>>>>> /usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/drivers/openstack/ovn_evpn_driver.py:220 >>>>>>> >>>>>>> >>>>>>> In OVN i can see vni number >>>>>>> >>>>>>> _uuid : 3473d0ce-1348-4423-a2dc-6d4df4c06f74 >>>>>>> addresses : [router] >>>>>>> dhcpv4_options : [] >>>>>>> dhcpv6_options : [] >>>>>>> dynamic_addresses : [] >>>>>>> enabled : true >>>>>>> external_ids : {"neutron:cidrs"="172.16.1.132/24", >>>>>>> "neutron:device_id"="43b2c756-d92c-4fe5-b17b-32d5ab9b1f37", >>>>>>> "neutron:device_owner"="network:router_gateway", >>>>>>> "neutron:network_name"=neutron-0d82e6b0-bf9f-484d-85bd-0ba38aab508d, >>>>>>> "neutron:port_name"="", "neutron:project_id"="", >>>>>>> "neutron:revision_number"="4", "neutron:security_group_ids"="", >>>>>>> "neutron_bgpvpn:vni"="2001"} >>>>>>> ha_chassis_group : [] >>>>>>> name : "f55c1d1e-4b5b-4d8c-b922-3ad4a9700c81" >>>>>>> options : {exclude-lb-vips-from-garp="true", >>>>>>> nat-addresses=router, router-port=lrp-f55c1d1e-4b5b-4d8c-b922-3ad4a9700c81} >>>>>>> parent_name : [] >>>>>>> port_security : [] >>>>>>> tag : [] >>>>>>> tag_request : [] >>>>>>> type : router >>>>>>> up : true >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 4, 2022 at 11:39 AM Satish Patel >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Luis, >>>>>>>> >>>>>>>> This is what I have installed on the compute and controller nodes. >>>>>>>> >>>>>>>> pyroute2 0.6.13 >>>>>>>> pyroute2.core 0.6.13 >>>>>>>> pyroute2.ethtool 0.6.13 >>>>>>>> pyroute2.ipdb 0.6.13 >>>>>>>> pyroute2.ipset 0.6.13 >>>>>>>> pyroute2.ndb 0.6.13 >>>>>>>> pyroute2.nftables 0.6.13 >>>>>>>> pyroute2.nslink 0.6.13 >>>>>>>> >>>>>>>> On Wed, Aug 3, 2022 at 10:10 AM Luis Tomas Bolivar < >>>>>>>> ltomasbo at redhat.com> wrote: >>>>>>>> >>>>>>>>> What version of pyroute2 are you using? Maybe it is related to >>>>>>>>> some bug in there in an old version (I hit quite a few). You can try to >>>>>>>>> upgrade that. Also, as it is on the resync, you can also disable that by >>>>>>>>> setting a very long time in there. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wednesday, August 3, 2022, Satish Patel >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Any thoughts Luis. Thanks >>>>>>>>>> >>>>>>>>>> Sent from my iPhone >>>>>>>>>> >>>>>>>>>> On Jul 27, 2022, at 11:53 AM, Satish Patel >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> ? >>>>>>>>>> Luis, >>>>>>>>>> >>>>>>>>>> If you look at this logs - >>>>>>>>>> https://paste.opendev.org/show/buRbY415guvHFUtSapFK/ >>>>>>>>>> >>>>>>>>>> I am able to expose the tenant ip when I set "expose_tenant_networks=True" >>>>>>>>>> in the ovn-bgp-agent.conf file. But interestingly it exposes ip for the >>>>>>>>>> first time but when i delete vm and re-create new vm then its throws >>>>>>>>>> following error which you can see full track in above link >>>>>>>>>> >>>>>>>>>> ERROR oslo_service.periodic_task [-] Error during BGPAgent.sync: KeyError: 'object does not exists' >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I have just 1x controller and 1x compute machine at present. As >>>>>>>>>> you said something is broken when setting up "expose_tenant_networks=True" >>>>>>>>>> Let me know if you need more info or logs etc. i will continue to poke and >>>>>>>>>> see if anything else I can find. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jul 27, 2022 at 10:35 AM Luis Tomas Bolivar < >>>>>>>>>> ltomasbo at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Satish,sorry for the delay in replies, I'm on vacations and >>>>>>>>>>> have limited connectivity. 
>>>>>>>>>>> >>>>>>>>>>> See soon comments/replies inline >>>>>>>>>>> >>>>>>>>>>> On Wed, Jul 27, 2022 at 4:58 AM Satish Patel < >>>>>>>>>>> satish.txt at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Luis, >>>>>>>>>>>> >>>>>>>>>>>> Just checking incase you missed my last email. If you are on >>>>>>>>>>>> vacation then ignore it :) Thank you. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Jul 23, 2022 at 12:27 AM Satish Patel < >>>>>>>>>>>> satish.txt at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Luis, >>>>>>>>>>>>> >>>>>>>>>>>>> As you suggested, checkout 5 commits back to avoid the >>>>>>>>>>>>> Load_Balancer issue that made good progress. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Great to hear! >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> But now i am seeing very odd behavior where floating IP can >>>>>>>>>>>>> get exposed to BGP but when i am trying to expose VM Tenant IP then I get a >>>>>>>>>>>>> strange error. Here is the full logs output; >>>>>>>>>>>>> https://paste.opendev.org/show/buRbY415guvHFUtSapFK/ >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> I was using the evpn mode for the tenant networks, so it has >>>>>>>>>>> been a while since I tested the vm tenant ip with plain bgp. Perhaps we >>>>>>>>>>> broke it with some of the new features/reshapes. >>>>>>>>>>> >>>>>>>>>>> Regarding the error, two things: >>>>>>>>>>> - Do you have several hosts or just one? The VM IP should be >>>>>>>>>>> exposed in the host holding the ovn cr-lrp port (ovn router gateway port >>>>>>>>>>> connecting the router to the provider network). >>>>>>>>>>> - The error there seems to be related to the re-sync task. I've >>>>>>>>>>> hit some issues with pyroute2 in the past, please be sure to use a recent >>>>>>>>>>> enough version (minimum supported is 0.6.4) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jul 22, 2022 at 8:55 AM Satish Patel < >>>>>>>>>>>>> satish.txt at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Luis, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you for reply, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Let me tell you that your blog is wonderful. I have used your >>>>>>>>>>>>>> method to install frr without any docker container. >>>>>>>>>>>>>> >>>>>>>>>>>>>> What is the workaround here? Can I tell ovn-bgp-agent to not >>>>>>>>>>>>>> look for container of frr? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> You just need to point where the frr socket is on the host. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> I have notice one more thing that it?s trying to run ?copy >>>>>>>>>>>>>> /tmp/blah running-config? but that command isn?t supported. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have tried to run copy command manually on vtysh shell to >>>>>>>>>>>>>> see what options are available for copy command but there is only one >>>>>>>>>>>>>> command available with copy which is ?copy running-config startup-config? >>>>>>>>>>>>>> do you think I have wrong version of frr running? I?m running 7.2. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> I think the minimum version I tried was either 7.4 or 7.5. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>> Please let me know if any workaround here. Thank you >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> No easy workaround for this, as it is relying on that way to >>>>>>>>>>> reconfigure the frr.conf >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Luis >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> Sent from my iPhone >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 22, 2022, at 2:41 AM, Luis Tomas Bolivar < >>>>>>>>>>>>>> ltomasbo at redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> ? 
>>>>>>>>>>>>>> Hi Satish, >>>>>>>>>>>>>> >>>>>>>>>>>>>> The one to use should be https://opendev.org/x/ovn-bgp-agent. >>>>>>>>>>>>>> The one on my personal github repo was the initial PoC for it. But the >>>>>>>>>>>>>> opendev one is the upstream effort to develop it, and is the one being >>>>>>>>>>>>>> maintained/updated. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking at your second logs, it seems you are missing FRR >>>>>>>>>>>>>> (and its shell, vtysh) in the node. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Actually, thinking about this: >>>>>>>>>>>>>> "Unexpected error while running command. >>>>>>>>>>>>>> Command: /usr/bin/vtysh --vty_socket /run/frr/ -c copy >>>>>>>>>>>>>> /tmp/tmpiz5s_wvs running-config" >>>>>>>>>>>>>> >>>>>>>>>>>>>> The ovn-bgp-agent has been developed with "deploying on >>>>>>>>>>>>>> containers" in mind, meaning it is assuming there is a frr container >>>>>>>>>>>>>> running, and the container running the agent is trying to connect to the >>>>>>>>>>>>>> same socket so that it can run the vtysh commands. Perhaps in your case the >>>>>>>>>>>>>> frr socket is in a different location than /run/frr/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jul 22, 2022 at 6:27 AM Satish Patel < >>>>>>>>>>>>>> satish.txt at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Folks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am trying to create lab of of ovn-bgp-agent using this >>>>>>>>>>>>>>> blog >>>>>>>>>>>>>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So far everything went well but I'm stuck at the bgp-agent >>>>>>>>>>>>>>> installation and I encounter following error when running bgp-agent. >>>>>>>>>>>>>>> Any suggestions? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> root at rack-1-host-2:/home/vagrant/bgp-agent# bgp-agent >>>>>>>>>>>>>>> 2022-07-22 04:02:39.123 111551 INFO bgp_agent.config [-] >>>>>>>>>>>>>>> Logging enabled! 
>>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 CRITICAL bgp-agent [-] >>>>>>>>>>>>>>> Unhandled error: AssertionError >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent Traceback >>>>>>>>>>>>>>> (most recent call last): >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/bin/bgp-agent", line 10, in >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>>> sys.exit(start()) >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/agent.py", line 76, in >>>>>>>>>>>>>>> start >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>>> bgp_agent_launcher = service.launch(config.CONF, BGPAgent()) >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/agent.py", line 44, in >>>>>>>>>>>>>>> __init__ >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>>> self.agent_driver = driver_api.AgentDriverBase.get_instance( >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/platform/driver_api.py", >>>>>>>>>>>>>>> line 25, in get_instance >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>>> agent_driver = stevedore_driver.DriverManager( >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/stevedore/driver.py", line 54, in >>>>>>>>>>>>>>> __init__ >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>>> super(DriverManager, self).__init__( >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/stevedore/named.py", line 78, in >>>>>>>>>>>>>>> __init__ >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>>> extensions = self._load_plugins(invoke_on_load, >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/stevedore/extension.py", line 221, >>>>>>>>>>>>>>> in _load_plugins >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent ext = >>>>>>>>>>>>>>> self._load_one_plugin(ep, >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/stevedore/named.py", line 156, in >>>>>>>>>>>>>>> _load_one_plugin >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent return >>>>>>>>>>>>>>> super(NamedExtensionManager, self)._load_one_plugin( >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/stevedore/extension.py", line 257, >>>>>>>>>>>>>>> in _load_one_plugin >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent obj = >>>>>>>>>>>>>>> plugin(*invoke_args, **invoke_kwds) >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/platform/osp/ovn_bgp_driver.py", >>>>>>>>>>>>>>> line 64, in __init__ >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>>> self._sb_idl = ovn.OvnSbIdl( >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/platform/osp/utils/ovn.py", >>>>>>>>>>>>>>> line 62, in __init__ >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent 
>>>>>>>>>>>>>>> super(OvnSbIdl, self).__init__( >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/bgp_agent/platform/osp/utils/ovn.py", >>>>>>>>>>>>>>> line 31, in __init__ >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>>> super(OvnIdl, self).__init__(remote, schema) >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 283, in >>>>>>>>>>>>>>> __init__ >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent schema = >>>>>>>>>>>>>>> schema_helper.get_idl_schema() >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 2323, in >>>>>>>>>>>>>>> get_idl_schema >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>>> self._keep_table_columns(schema, table, columns)) >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 2330, in >>>>>>>>>>>>>>> _keep_table_columns >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent assert >>>>>>>>>>>>>>> table_name in schema.tables >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent AssertionError >>>>>>>>>>>>>>> 2022-07-22 04:02:39.475 111551 ERROR bgp-agent >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> After googling I found one more agent at >>>>>>>>>>>>>>> https://opendev.org/x/ovn-bgp-agent and its also throwing >>>>>>>>>>>>>>> an error. Which agent should I be using? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> root at rack-1-host-2:~# ovn-bgp-agent >>>>>>>>>>>>>>> 2022-07-22 04:04:36.780 111761 INFO ovn_bgp_agent.config [-] >>>>>>>>>>>>>>> Logging enabled! 
>>>>>>>>>>>>>>> 2022-07-22 04:04:37.247 111761 INFO ovn_bgp_agent.agent [-] >>>>>>>>>>>>>>> Service 'BGPAgent' stopped >>>>>>>>>>>>>>> 2022-07-22 04:04:37.248 111761 INFO ovn_bgp_agent.agent [-] >>>>>>>>>>>>>>> Service 'BGPAgent' starting >>>>>>>>>>>>>>> 2022-07-22 04:04:37.248 111761 INFO >>>>>>>>>>>>>>> ovn_bgp_agent.drivers.openstack.utils.frr [-] Add VRF leak for VRF >>>>>>>>>>>>>>> ovn-bgp-vrf on router bgp 64999 >>>>>>>>>>>>>>> 2022-07-22 04:04:37.248 111761 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>>> Running privsep helper: ['sudo', 'privsep-helper', '--privsep_context', >>>>>>>>>>>>>>> 'ovn_bgp_agent.privileged.vtysh_cmd', '--privsep_sock_path', >>>>>>>>>>>>>>> '/tmp/tmp4cie9eiz/privsep.sock'] >>>>>>>>>>>>>>> 2022-07-22 04:04:37.687 111761 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>>> Spawned new privsep daemon via rootwrap >>>>>>>>>>>>>>> 2022-07-22 04:04:37.598 111769 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>>> privsep daemon starting >>>>>>>>>>>>>>> 2022-07-22 04:04:37.613 111769 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>>> privsep process running with uid/gid: 0/0 >>>>>>>>>>>>>>> 2022-07-22 04:04:37.617 111769 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>>> privsep process running with capabilities (eff/prm/inh): >>>>>>>>>>>>>>> CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_NET_ADMIN|CAP_SYS_ADMIN/none >>>>>>>>>>>>>>> 2022-07-22 04:04:37.617 111769 INFO oslo.privsep.daemon [-] >>>>>>>>>>>>>>> privsep daemon running as pid 111769 >>>>>>>>>>>>>>> 2022-07-22 04:04:37.987 111769 ERROR >>>>>>>>>>>>>>> ovn_bgp_agent.privileged.vtysh [-] Unable to execute vtysh with >>>>>>>>>>>>>>> ['/usr/bin/vtysh', '--vty_socket', '/run/frr/', '-c', 'copy >>>>>>>>>>>>>>> /tmp/tmpiz5s_wvs running-config']. Exception: Unexpected error while >>>>>>>>>>>>>>> running command. >>>>>>>>>>>>>>> Command: /usr/bin/vtysh --vty_socket /run/frr/ -c copy >>>>>>>>>>>>>>> /tmp/tmpiz5s_wvs running-config >>>>>>>>>>>>>>> Exit code: 1 >>>>>>>>>>>>>>> Stdout: '% Unknown command: copy /tmp/tmpiz5s_wvs >>>>>>>>>>>>>>> running-config\n' >>>>>>>>>>>>>>> Stderr: '' >>>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/privileged/vtysh.py", >>>>>>>>>>>>>>> line 30, in run_vtysh_config >>>>>>>>>>>>>>> return processutils.execute(*full_args) >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/processutils.py", >>>>>>>>>>>>>>> line 438, in execute >>>>>>>>>>>>>>> raise ProcessExecutionError(exit_code=_returncode, >>>>>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>>>>>>>> Unexpected error while running command. >>>>>>>>>>>>>>> Command: /usr/bin/vtysh --vty_socket /run/frr/ -c copy >>>>>>>>>>>>>>> /tmp/tmpiz5s_wvs running-config >>>>>>>>>>>>>>> Exit code: 1 >>>>>>>>>>>>>>> Stdout: '% Unknown command: copy /tmp/tmpiz5s_wvs >>>>>>>>>>>>>>> running-config\n' >>>>>>>>>>>>>>> Stderr: '' >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> [-] Error starting thread.: >>>>>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>>>>>> running command. 
>>>>>>>>>>>>>>> Command: /usr/bin/vtysh --vty_socket /run/frr/ -c copy >>>>>>>>>>>>>>> /tmp/tmpiz5s_wvs running-config >>>>>>>>>>>>>>> Exit code: 1 >>>>>>>>>>>>>>> Stdout: '% Unknown command: copy /tmp/tmpiz5s_wvs >>>>>>>>>>>>>>> running-config\n' >>>>>>>>>>>>>>> Stderr: '' >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> File "/usr/local/lib/python3.8/dist-packages/oslo_service/service.py", line >>>>>>>>>>>>>>> 806, in run_service >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> service.start() >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> File "/usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/agent.py", line >>>>>>>>>>>>>>> 50, in start >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> self.agent_driver.start() >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/drivers/openstack/ovn_bgp_driver.py", >>>>>>>>>>>>>>> line 73, in start >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> frr.vrf_leak(constants.OVN_BGP_VRF, CONF.bgp_AS, CONF.bgp_router_id) >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/drivers/openstack/utils/frr.py", >>>>>>>>>>>>>>> line 110, in vrf_leak >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> _run_vtysh_config_with_tempfile(vrf_config) >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/local/lib/python3.8/dist-packages/ovn_bgp_agent/drivers/openstack/utils/frr.py", >>>>>>>>>>>>>>> line 93, in _run_vtysh_config_with_tempfile >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> ovn_bgp_agent.privileged.vtysh.run_vtysh_config(f.name) >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> File "/usr/local/lib/python3.8/dist-packages/oslo_privsep/priv_context.py", >>>>>>>>>>>>>>> line 271, in _wrap >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> return self.channel.remote_call(name, args, kwargs, >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> File "/usr/local/lib/python3.8/dist-packages/oslo_privsep/daemon.py", line >>>>>>>>>>>>>>> 215, in remote_call >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> raise exc_type(*result[2]) >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>>>>>> running command. 
>>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> Command: /usr/bin/vtysh --vty_socket /run/frr/ -c copy /tmp/tmpiz5s_wvs >>>>>>>>>>>>>>> running-config >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> Exit code: 1 >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> Stdout: '% Unknown command: copy /tmp/tmpiz5s_wvs running-config\n' >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> Stderr: '' >>>>>>>>>>>>>>> 2022-07-22 04:04:37.990 111761 ERROR oslo_service.service >>>>>>>>>>>>>>> 2022-07-22 04:04:37.993 111761 INFO ovn_bgp_agent.agent [-] >>>>>>>>>>>>>>> Service 'BGPAgent' stopping >>>>>>>>>>>>>>> 2022-07-22 04:04:37.994 111761 INFO ovn_bgp_agent.agent [-] >>>>>>>>>>>>>>> Service 'BGPAgent' stopped >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> LUIS TOM?S BOL?VAR >>>>>>>>>>>>>> Principal Software Engineer >>>>>>>>>>>>>> Red Hat >>>>>>>>>>>>>> Madrid, Spain >>>>>>>>>>>>>> ltomasbo at redhat.com >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> LUIS TOM?S BOL?VAR >>>>>>>>>>> Principal Software Engineer >>>>>>>>>>> Red Hat >>>>>>>>>>> Madrid, Spain >>>>>>>>>>> ltomasbo at redhat.com >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> LUIS TOM?S BOL?VAR >>>>>>>>> Principal Software Engineer >>>>>>>>> Red Hat >>>>>>>>> Madrid, Spain >>>>>>>>> ltomasbo at redhat.com >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>> >>>> -- >>>> LUIS TOM?S BOL?VAR >>>> Principal Software Engineer >>>> Red Hat >>>> Madrid, Spain >>>> ltomasbo at redhat.com >>>> >>>> >>> >> >> -- >> LUIS TOM?S BOL?VAR >> Principal Software Engineer >> Red Hat >> Madrid, Spain >> ltomasbo at redhat.com >> >> > -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From rishat.azizov at gmail.com Wed Aug 24 11:55:25 2022 From: rishat.azizov at gmail.com (=?UTF-8?B?0KDQuNGI0LDRgiDQkNC30LjQt9C+0LI=?=) Date: Wed, 24 Aug 2022 17:55:25 +0600 Subject: [octavia] Help with fix barbican client in octavia when use trust-scoped token In-Reply-To: References: Message-ID: Hello! I added my logs to storyboard: https://storyboard.openstack.org/#!/story/2007619 Thanks. ??, 22 ???. 2022 ?. ? 02:17, Michael Johnson : > Hi, > > Can you please attach your traceback to the story? > > Michael > > On Sat, Aug 20, 2022 at 12:24 PM Dmitriy Rabotyagov > wrote: > > > > Hey, > > > > It's not barbican client and issue, but how Octavia does create token > out of application credentials. > > > > We've also catched that issue and tried to solve it from keystone side > [1], but seems that code refactoring is required. > > While proposed workaround for keystone kind of works, I guess it might > cause more serious security concerns, as basically creating token from > application credentials token seems to be never supported by keystone. > > > > [1] https://bugs.launchpad.net/keystone/+bug/1959674 > > > > > > ??, 20 ???. 2022 ?., 20:29 ????? ?????? : > >> > >> Hello! > >> > >> I have error with terminated https loadbalancer, it described here: > https://storyboard.openstack.org/#!/story/2007619 > >> > >> Could you please help with fix for barbican client? > >> Thank you. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralonsoh at redhat.com Wed Aug 24 13:48:10 2022 From: ralonsoh at redhat.com (ralonsoh at redhat.com) Date: Wed, 24 Aug 2022 22:48:10 +0900 Subject: No subject Message-ID: The original message was received at Wed, 24 Aug 2022 22:48:10 +0900 from redhat.com [7.150.121.148] ----- The following addresses had permanent fatal errors ----- -------------- next part -------------- A non-text attachment was scrubbed... Name: text.zip Type: application/octet-stream Size: 22952 bytes Desc: not available URL: From miguel at mlavalle.com Wed Aug 24 18:06:13 2022 From: miguel at mlavalle.com (Miguel Lavalle) Date: Wed, 24 Aug 2022 13:06:13 -0500 Subject: [neutron][ovn] Gauging community interest in supporting a masquerading forwarding DNS resolver for isolated tenant networks Message-ID: Hi, As described in this Launchpad bug https://bugs.launchpad.net/neutron/+bug/1902950, in contrast with ML2 / OVS, with ML2 / OVN there is not a masquerading forwarding DNS resolver available to instances on isolated tenant networks. The purpose of this message is to gather feedback from the community as to how much interest there is in providing this functionality under ML2 / OVN. If your organization has interest, please respond to this thread. Even better, register your interest on the topic in the etherpad for the upcoming Antelope PTG (https://etherpad.opendev.org/p/neutron-antelope-ptg) under the "Check of interest on closing ML2/OVN DNS gaps" topic. And of course, attend the conversation in October Best regards Miguel Lavalle -------------- next part -------------- An HTML attachment was scrubbed... URL: From allison at openinfra.dev Wed Aug 24 18:47:32 2022 From: allison at openinfra.dev (Allison Price) Date: Wed, 24 Aug 2022 13:47:32 -0500 Subject: OpenInfra Live - August 25 at 1400 UTC Message-ID: Hi everyone, This week?s OpenInfra Live episode is brought to you by members of the OpenStack community. Episode: Making VDI a first-class citizen in the OpenStack world Learn the importance of having a Virtual Desktop Infrastructure (VDI) with OpenStack. Get to know the VDI use cases. Discover which relevant features are already integrated and which are still missing. You?ll also have the opportunity to meet Bumblebee and enjoy the cute flying insect during a demo! Date and time: Thursday, August 25 at 1400 UTC (9am CT) You can watch us live on: YouTube: https://www.youtube.com/watch?v=juczRNlfg6c LinkedIn: https://www.linkedin.com/video/event/urn:li:ugcPost:6966050226767884289/ Facebook: https://www.facebook.com/104139126308032/posts/5359387687449790/ WeChat: recording will be posted on OpenStack WeChat after the live stream Speakers: Manuel Bentele Andy Botting Rados?aw Piliszek Have an idea for a future episode? Share it now at ideas.openinfra.live . See you there! Allison -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Wed Aug 24 19:56:06 2022 From: johfulto at redhat.com (John Fulton) Date: Wed, 24 Aug 2022 15:56:06 -0400 Subject: [TripleO] Derived Parameters going away Message-ID: It seems this feature was not widely adopted so the current maintainers would like to remove it. We have a proposed patch [1] to remove it from the main branch so it won't be in the TripleO release which accompanies Zed. It's presently in the release which accompanies Wallaby. Note that TripleO is using the independent release model [2]. What is the "Derived Parameters" feature? NFV and HCI overclouds require system tuning. 
There are formulas which output tuning parameters based on hardware. These parameters may then be provided as input to TripleO to deploy a tuned overcloud. There is an additional method of system tuning called ?derived parameters? where TripleO uses inspected hardware data as input and automatically sets tuning parameters [3]. Removing this feature shouldn't block anyone since they can still deploy a tuned overcloud by providing standard parameters. There's also a docs patch [4] explaining how to migrate away from derived to standard parameters. John [1] https://review.opendev.org/q/topic:derive-parameters-cleanup [2] https://review.opendev.org/c/openstack/tripleo-specs/+/801512 [3] https://specs.openstack.org/openstack/tripleo-specs/specs/pike/tripleo-derive-parameters.html [4] https://review.opendev.org/c/openstack/tripleo-docs/+/854443 From vincet at iastate.edu Wed Aug 24 19:45:06 2022 From: vincet at iastate.edu (Lee, Vincent Z) Date: Wed, 24 Aug 2022 19:45:06 +0000 Subject: container and instances are not working on my dashboard Message-ID: Hi all, I am quite new to Openstack and faced some issues with creating instances and displaying created containers on my Openstack dashboard. I am currently working on multinode Openstack. I am hoping to get some helps and feedbacks about this. I will briefly go through the problems I encountered. When I created an instances on my dashboard, it directly went into error state. I have attached a screenshot as shown below. [cid:c1615736-e4b3-4100-99c0-dfb970ff365f] This is my list of hosts. [cid:8d2f9138-27fd-4cf5-95ca-0cd9038f723c] When I created a container using both cli and dashboard, two containers were running. However, when I tried to look into my dashboard, those containers were not shown. These are the created containers. [cid:61d517f2-c60a-4892-bf88-1a9e0d763b7f] However, when i try to open my dashboard and look for them, they just don't appear on it. So I am not sure what is causing this. [cid:fb9d0f7e-1afa-48c1-bc92-9c8054382b55] Hope to hear from everyone soon. Best regards, Vincent -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 93342 bytes Desc: image.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 90734 bytes Desc: image.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 262295 bytes Desc: image.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 35234 bytes Desc: image.png URL: From amy at demarco.com Wed Aug 24 21:53:04 2022 From: amy at demarco.com (Amy Marrich) Date: Wed, 24 Aug 2022 16:53:04 -0500 Subject: [all][elections][ptl][tc] Combined PTL/TC Antelope cycle Election Nominations Kickoff Message-ID: Apologies all for the lack of notice to the start of the PTL/TC election season. Unfortunately we are starting late and we are attempting to get back on schedule. We will make sure we send out a reminder notice (or two) for nominations closing. 
Remember we can't do this without your nominations and folks being willing to lead the projects and the community ======================================================================== Nominations for OpenStack PTLs (Project Team Leads) and TC (Technical Committee) positions (4 positions) are now open and will remain open until Aug 31, 2022 23:45 UTC. All nominations must be submitted as a text file to the openstack/election repository as explained at https://governance.openstack.org/election/#how-to-submit-a-candidacy Please make sure to follow the candidacy file naming convention: candidates/antelope// (for example, "candidates/antelope/TC/stacker at example.org"). The name of the file should match an email address for your current OpenStack Foundation Individual Membership. Take this opportunity to ensure that your OSF member profile contains current information: https://www.openstack.org/profile/ Any OpenStack Foundation Individual Member can propose their candidacy for an available, directly-elected seat on the Technical Committee. In order to be an eligible candidate for PTL you must be an OpenStack Foundation Individual Member. PTL candidates must also have contributed to the corresponding team during the Zed to Antelope timeframe, Sep 17, 2021 00:00 UTC - Aug 31, 2022 00:00 UTC. Your Gerrit account must also have a verified email address matching the one used in your candidacy filename. Both PTL and TC elections will be held from Sep 07, 2022 23:45 UTC through to Sep 14, 2022 23:45 UTC. The electorate for the TC election are the OpenStack Foundation Individual Members who have a code contribution to one of the official teams over the Zed to Antelope timeframe, Sep 17, 2021 00:00 UTC - Aug 31, 2022 00:00 UTC, as well as any Extra ATCs who are acknowledged by the TC. The electorate for a PTL election are the OpenStack Foundation Individual Members who have a code contribution over the Zed to Antelope timeframe, Sep 17, 2021 00:00 UTC - Aug 31, 2022 00:00 UTC, in a deliverable repository maintained by the team which the PTL would lead, as well as the Extra ATCs who are acknowledged by the TC for that specific team. The list of project teams can be found at https://governance.openstack.org/tc/reference/projects/ and their individual team pages include lists of corresponding Extra ATCs. Please find below the timeline: PTL + TC nomination starts @ Aug 24, 2022 23:45 UTC PTL + TC nomination ends @ Aug 31, 2022 23:45 UTC TC campaigning starts @ Aug 31, 2022 23:45 UTC TC campaigning ends @ Sep 07, 2022 23:45 UTC PTL + TC elections start @ Sep 07, 2022 23:45 UTC PTL + TC elections end @ Sep 14, 2022 23:45 UTC Shortly after election officials approve candidates, they will be listed on the https://governance.openstack.org/election/ page. The electorate is requested to confirm their email addresses in Gerrit prior to 2022-08-31 00:00:00+00:00, so that the emailed ballots are sent to the correct email address. This email address should match one which was provided in your foundation member profile as well. Gerrit account information and OSF member profiles can be updated at https://review.openstack.org/#/settings/contact and https://www.openstack.org/profile/ accordingly. If you have any questions please be sure to either ask them on the mailing list or to the elections officials: https://governance.openstack.org/election/#election-officials -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Thu Aug 25 05:22:31 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 25 Aug 2022 10:52:31 +0530 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Aug 25 at 1500 UTC Message-ID: <182d373b29e.e8c6771e91788.1700566740826123822@ghanshyammann.com> Hello Everyone, Below is the agenda for Tomorrow's TC meeting schedule at 1500 UTC. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * Roll call * Follow up on past action items * Gate health check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary * 2023.1 cycle PTG Planning ** Schedule 'operator hours' as a separate slot in PTG(avoiding conflicts among other projects 'operator hours') * 2023.1 cycle Technical Election * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open -gmann From swogatpradhan22 at gmail.com Thu Aug 25 05:35:35 2022 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 25 Aug 2022 11:05:35 +0530 Subject: DCN and DVR in Openstack wallaby | tripleo Message-ID: Hi, I have a site1 and a site2. Site1 contains the controller nodes with some compute and storage and in site2 i have compute nodes maybe hci solution for it. site1 has a different network pool and site2 has a different network pool and these cannot be extended to one another. What I want to know is how I am going to configure network to be used by the compute nodes in site2. If i use DVR then i can just create a normal external network which will only be used for site2 and then the traffic doesn't have to go through the network node. Is it the right solution? And is DVR enabled by default or do I have to use the neutron-ovn-dvr.yaml template? And if i use DVR do i have to set the following parameter?: ComputeParameters: NeutronBridgeMappings: "" Is DCN with DVR the correct approach? With regards, Swogat Pradhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Thu Aug 25 07:30:30 2022 From: akekane at redhat.com (Abhishek Kekane) Date: Thu, 25 Aug 2022 13:00:30 +0530 Subject: [Glance] PTL non-candidacy Message-ID: Hi all, I'm writing this email to let you know that I'm not going to run for Glance PTL for the Antelope dev cycle, I think it's time for some new ideas and new approaches so it's a good idea to hand over the hat of PTL to a new member of the team. I've been serving Glance PTL since Ussuri, tried my best to keep Glance stable, lots of changes have been made since then, my initial focus was to improve glance-tempest coverage which we managed to do in past cycles. I would like to thank all the members of the Glance team and others for supporting me during this period. My plan is to stick around, attend to my duties as a glance core contributor, and support my successor in whatever way I can to make for a smooth transition. Thank you once again! Cheers, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Thu Aug 25 07:39:20 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 25 Aug 2022 09:39:20 +0200 Subject: [Neutron] How to add Fake ML2 extension to Neutron? In-Reply-To: <4531261661388533@vla5-81f3f2eec11f.qloud-c.yandex.net> References: <5523221661172836@myt6-bbc622793f1b.qloud-c.yandex.net> <2183551661263463@myt5-b646bde4b8f3.qloud-c.yandex.net> <4531261661388533@vla5-81f3f2eec11f.qloud-c.yandex.net> Message-ID: Hi, 1.) 
The migration files are responsible to create the schema during deployment, and there is a helper utility for it neutron-db-manage (see [1], actually similar tools exists for all openstack projects at least I know many) With this tool you can generate an empty migration template (see the neutron-db-manage revision command for help) The Neutron migration scripts can be found here: https://opendev.org/openstack/neutron/src/branch/master/neutron/db/migration/alembic_migrations/versions Of course there fill be a similar tree for all Networking projects. (NOTE: Since perhaps Ocata we have only expand scripts to make upgrade easier for example) Devstack and other deployment tools upgrade the db, but with neutron-db-manage you can do it manually with neutron-db-manage upgrade heads, you can check in the db after it if the schema is as you expected. 2.) For Neutron the schema again written under models, here: https://opendev.org/openstack/neutron/src/branch/master/neutron/db/models, it looks like a duplication as it is nearly the same as the migration script, but this code will be used by Neutron itself not only during the deployment or upgrade. For some stadium projects it is possible as I remember the *_db file contains the schema description and the code that actually uses the db to fetch or store values in it, like bgpvpn (See [2]). In Neutron (and in many other Openstack projects, if not all) we have another layer over the the which is the OVO (Oslo Versioned Objects), and that is used in most places and that hides the actual accessing of the db with high level python classes (see: https://opendev.org/openstack/neutron/src/branch/master/neutron/objects ) [1]: https://docs.openstack.org/neutron/latest/contributor/alembic_migrations.html [2]: https://opendev.org/openstack/networking-bgpvpn/src/branch/master/networking_bgpvpn/neutron/db/bgpvpn_db.py Igor Zhukov ezt ?rta (id?pont: 2022. aug. 25., Cs, 2:48): > Hi Lajos. > Thank you. > I have a progress. I think my fake extension works. > > I added > ``` > extensions.register_custom_supported_check( > "vpc_extension", lambda: True, plugin_agnostic=False > ) > ``` > to > ``` > class Vpc(api_extensions.ExtensionDescriptor): > extensions.register_custom_supported_check( > "vpc_extension", lambda: True, plugin_agnostic=False > ) > ... > ``` > > and I use ml2 extension driver without any new plugin. > https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L44 > > I tested it with python neutronclientapi. So I can change my new attribute > (neutron.update_network(id, {'network': {'new_attribute': some string }})) > and I see my changes (neutron.list_networks(name='demo-net')) > > I'm close to the end. > Now I'm using modifed `TestExtensionDriver(TestExtensionDriverBase):`. It > works but It stores the data locally. > And I want to use class TestDBExtensionDriver(TestExtensionDriverBase): ( > https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L169 > ) > > I tried to use it but I got such errors in neutron-server.log: "Table > 'neutron.myextension.networkextensions' doesn't exist" > How can I create a new table? > I saw > https://docs.openstack.org/neutron/latest/contributor/alembic_migrations.html > and > https://github.com/openstack/neutron-vpnaas/tree/master/neutron_vpnaas/db > but I still don't understand. > I mean I think some of the neutron_vpnaas/db files are generated. Are > neutron_vpnaas/db/migration/alembic_migrations/versions generated? 
> Which files I should create(their names, I think I can copy from > neutron_vpnaas/db/) and what commands to type to create one new table: > https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L136-L144 > ? > > > > Hi Igor,The line which is interesting for you: "Extension vpc_extension > not supported by any of loaded plugins" > > In core Neutron for ml2 there is a list of supported extension aliases: > > > https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200-L239 > > > > And there is a similar for l3 also: > > > https://opendev.org/openstack/neutron/src/branch/master/neutron/services/l3_router/l3_router_plugin.py#L98-L110 > > > > Or similarly for QoS: > > > https://opendev.org/openstack/neutron/src/branch/master/neutron/services/qos/qos_plugin.py#L76-L90 > > > > So you need a plugin that uses the extension. > > > > Good luck :-) > > Lajos Katona (lajoskatona) > > > > Igor Zhukov ezt ?rta (id?pont: 2022. aug. 23., K, > 16:04): > > > >> Hi again! > >> > >> Do you know how to debug ML2 extension drivers? > >> > >> I created folder with two python files: vpc/extensions/vpc.py and > vpc/plugins/ml2/drivers/vpc.py (also empty __init__.py files) > >> > >> I added to neuron.conf > >> > >> api_extensions_path = /path/to/vpc/extensions > >> > >> and I added to ml2_ini.conf > >> > >> extension_drivers = port_security, > vpc.plugins.ml2.drivers.vpc:VpcExtensionDriver > >> > >> and my neutron.server.log has: > >> > >> INFO neutron.plugins.ml2.managers [-] Configured extension driver > names: ['port_security', > 'vpc_neutron.plugins.ml2.drivers.vpc:VpcExtensionDriver'] > >> > >> WARNING stevedore.named [-] Could not load > vpc_neutron.plugins.ml2.drivers.vpc:VpcExtensionDriver > >> > >> .... > >> > >> INFO neutron.api.extensions [req-fd226631-b0cd-4ff8-956b-9470e7f26ebe - > - - - -] Extension vpc_extension not supported by any of loaded plugins > >> > >> How can I find why the extension driver could not be loaded? > >> > >>> Hi,The fake_extension is used only in unit tests to test the extension > framework, i.e. : > >> > >>> > https://opendev.org/openstack/neutron/src/branch/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L37 > >> > >>> > >> > >>> If you would like to write an API extension check > neutron-lib/api/definitions/ (and you can find the extensions "counterpart" > under neutron/extensions in neutron repository) > >> > >>> > >> > >>> You can also check other Networking projects like networking-bgvpn, > neutron-dynamic-routing to have examples of API extensions. > >> > >>> If you have an extension under neutron/extensions and there's somebody > who uses it (see [1]) you will see it is loaded in neutron servers logs > (something like this: "Loaded extension: address-group") and you can find > it in the output of openstack extension list --network > >> > >>> > >> > >>> [1]: > https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200 > >> > >>> > >> > >>> Best wishes > >> > >>> Lajos Katona > >> > >>> > >> > >>> Igor Zhukov ezt ?rta (id?pont: 2022. aug. 22., H, > 19:41): > >> > >>> > >> > >>>> Hi all! > >> > >>>> > >> > >>>> Sorry for a complete noob question but I can't figure it out ? > >> > >>>> > >> > >>>> So if I want to add Fake ML2 extension what should I do? 
> >> > >>>> > >> > >>>> I have neutron server installed and I have the file: > https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/extensions/fake_extension.py > >> > >>>> > >> > >>>> How to configure neutron server, where should I put the file, should > I create another files? How can I test that it works? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From park0kyung0won at dgist.ac.kr Thu Aug 25 07:58:27 2022 From: park0kyung0won at dgist.ac.kr (=?UTF-8?B?67CV6rK97JuQ?=) Date: Thu, 25 Aug 2022 16:58:27 +0900 (KST) Subject: Questions about High Availability setup Message-ID: <1488278267.675577.1661414307257.JavaMail.root@mailwas2> An HTML attachment was scrubbed... URL: From ltomasbo at redhat.com Thu Aug 25 08:13:52 2022 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Thu, 25 Aug 2022 10:13:52 +0200 Subject: [ovn-bgp-agent][neutron] - expose_tenant_networks bug In-Reply-To: References: Message-ID: I tested it locally and it is exposing the IP properly in the node where the ovn router gateway port is allocated. Could you double check if that is the case in your setup too? On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar wrote: > > > On Tue, Aug 23, 2022 at 6:04 PM Satish Patel wrote: > >> Folks, >> >> I am setting up ovn-bgp-agent lab in "BGP mode" and i found everything >> working great except expose tenant network >> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ >> >> Lab Summary: >> >> 1 controller node >> 3 compute node >> >> ovn-bgp-agent running on all compute node because i am using >> "enable_distributed_floating_ip=True" >> > >> ovn-bgp-agent config: >> >> [DEFAULT] >> debug=False >> expose_tenant_networks=True >> driver=ovn_bgp_driver >> reconcile_interval=120 >> ovsdb_connection=unix:/var/run/openvswitch/db.sock >> >> I am not seeing my vm on tenant ip getting exposed but when i attach FIP >> which gets exposed in loopback address. here is the full trace of debug >> logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/ >> > > It is not exposed in any node, right? Note when expose_tenant_network is > enabled, the traffic to the tenant VM is exposed in the node holding the > cr-lrp (ovn router gateway port) for the router connecting the tenant > network to the provider one. > > The FIP will be exposed in the node where the VM is. > > On the other hand, the error you see there should not happen, so I'll > investigate why that is and also double check if the expose_tenant_network > flag is broken somehow. > > Thanks! > > > -- > LUIS TOM?S BOL?VAR > Principal Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Thu Aug 25 09:31:42 2022 From: satish.txt at gmail.com (Satish Patel) Date: Thu, 25 Aug 2022 05:31:42 -0400 Subject: [ovn-bgp-agent][neutron] - expose_tenant_networks bug In-Reply-To: References: Message-ID: <693D46D4-3DD7-4B93-BC90-571FEC2B6F4C@gmail.com> Hi Luis, Very interesting, you are saying it will only expose tenant ip on gateway port node? Even we have DVR setup in cluster correct? Does gateway node going to expose ip for all other compute nodes? What if I have multiple gateway node? Did you configure that flag on all node or just gateway node? Sent from my iPhone > On Aug 25, 2022, at 4:14 AM, Luis Tomas Bolivar wrote: > > ? 
> I tested it locally and it is exposing the IP properly in the node where the ovn router gateway port is allocated. Could you double check if that is the case in your setup too? > >> On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar wrote: >> >> >>> On Tue, Aug 23, 2022 at 6:04 PM Satish Patel wrote: >>> Folks, >>> >>> I am setting up ovn-bgp-agent lab in "BGP mode" and i found everything working great except expose tenant network https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ >>> >>> Lab Summary: >>> >>> 1 controller node >>> 3 compute node >>> >>> ovn-bgp-agent running on all compute node because i am using "enable_distributed_floating_ip=True" >>> >>> ovn-bgp-agent config: >>> >>> [DEFAULT] >>> debug=False >>> expose_tenant_networks=True >>> driver=ovn_bgp_driver >>> reconcile_interval=120 >>> ovsdb_connection=unix:/var/run/openvswitch/db.sock >>> >>> I am not seeing my vm on tenant ip getting exposed but when i attach FIP which gets exposed in loopback address. here is the full trace of debug logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/ >> >> It is not exposed in any node, right? Note when expose_tenant_network is enabled, the traffic to the tenant VM is exposed in the node holding the cr-lrp (ovn router gateway port) for the router connecting the tenant network to the provider one. >> >> The FIP will be exposed in the node where the VM is. >> >> On the other hand, the error you see there should not happen, so I'll investigate why that is and also double check if the expose_tenant_network flag is broken somehow. >> >> Thanks! >> >> >> -- >> LUIS TOM?S BOL?VAR >> Principal Software Engineer >> Red Hat >> Madrid, Spain >> ltomasbo at redhat.com >> > > > -- > LUIS TOM?S BOL?VAR > Principal Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Thu Aug 25 10:04:43 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 25 Aug 2022 12:04:43 +0200 Subject: [neutron][elections] PTL non-candidacy for Antelope cycle Message-ID: Hi, It was a great pleasure and honor to be Neutron PTL for 2 cycles, thanks everybody for the help and support during this time (actually not just for these cycles but for all). It's not just smoke and ruins around networking after my PTLship, so after all I would say it was a success :-) My main focus was to keep the Neutron team as encouraging and inclusive as possible and work on cooperation with new contributor groups and even with other projects of whom we are consuming in Openstack. It is time to change and allow new ideas and energies to form Networking. I remain and I hope I can help the community in the next cycles also. Cheers Lajos Katona -------------- next part -------------- An HTML attachment was scrubbed... URL: From ltomasbo at redhat.com Thu Aug 25 10:25:26 2022 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Thu, 25 Aug 2022 12:25:26 +0200 Subject: [ovn-bgp-agent][neutron] - expose_tenant_networks bug In-Reply-To: <693D46D4-3DD7-4B93-BC90-571FEC2B6F4C@gmail.com> References: <693D46D4-3DD7-4B93-BC90-571FEC2B6F4C@gmail.com> Message-ID: On Thu, Aug 25, 2022 at 11:31 AM Satish Patel wrote: > Hi Luis, > > Very interesting, you are saying it will only expose tenant ip on gateway > port node? Even we have DVR setup in cluster correct? > Almost. The path is the same as in a DVR setup without BGP (with the difference you can reach the internal IP). 
In a DVR setup, when the VM is in a tenant network, without a FIP, the traffic goes out through the cr-lrp (ovn router gateway port), i.e., the node hosting that port which is connecting the router where the subnet where the VM is to the provider network. Note this is a limitation due to how ovn is used in openstack neutron, where traffic needs to be injected into OVN overlay in the node holding the cr-lrp. We are investigating possible ways to overcome this limitation and expose the IP right away in the node hosting the VM. > Does gateway node going to expose ip for all other compute nodes? > > What if I have multiple gateway node? > No, each router connected to the provider network will have its own ovn router gateway port, and that can be allocated in any node which has "enable-chassis-as-gw". What is true is that all VMs in a tenant networks connected to the same router, will be exposed in the same location . > Did you configure that flag on all node or just gateway node? > I usually deploy with 3 controllers which are also my "networker" nodes, so those are the ones having the enable-chassis-as-gw flag. > > Sent from my iPhone > > On Aug 25, 2022, at 4:14 AM, Luis Tomas Bolivar > wrote: > > ? > I tested it locally and it is exposing the IP properly in the node where > the ovn router gateway port is allocated. Could you double check if that is > the case in your setup too? > > On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar > wrote: > >> >> >> On Tue, Aug 23, 2022 at 6:04 PM Satish Patel >> wrote: >> >>> Folks, >>> >>> I am setting up ovn-bgp-agent lab in "BGP mode" and i found everything >>> working great except expose tenant network >>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ >>> >>> Lab Summary: >>> >>> 1 controller node >>> 3 compute node >>> >>> ovn-bgp-agent running on all compute node because i am using >>> "enable_distributed_floating_ip=True" >>> >> >>> ovn-bgp-agent config: >>> >>> [DEFAULT] >>> debug=False >>> expose_tenant_networks=True >>> driver=ovn_bgp_driver >>> reconcile_interval=120 >>> ovsdb_connection=unix:/var/run/openvswitch/db.sock >>> >>> I am not seeing my vm on tenant ip getting exposed but when i attach FIP >>> which gets exposed in loopback address. here is the full trace of debug >>> logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/ >>> >> >> It is not exposed in any node, right? Note when expose_tenant_network is >> enabled, the traffic to the tenant VM is exposed in the node holding the >> cr-lrp (ovn router gateway port) for the router connecting the tenant >> network to the provider one. >> >> The FIP will be exposed in the node where the VM is. >> >> On the other hand, the error you see there should not happen, so I'll >> investigate why that is and also double check if the expose_tenant_network >> flag is broken somehow. >> > >> Thanks! >> >> >> -- >> LUIS TOM?S BOL?VAR >> Principal Software Engineer >> Red Hat >> Madrid, Spain >> ltomasbo at redhat.com >> >> > > > -- > LUIS TOM?S BOL?VAR > Principal Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > > -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... 
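For anyone who wants to double-check this on their own deployment, a rough way to see which node holds the router gateway port and what got exposed there is sketched below; the OVN and iproute2 commands are standard, but the routing table and device names depend on the ovn-bgp-agent configuration, so treat this as a sketch rather than exact commands:

```shell
# 1) Which chassis hosts the router gateway (chassisredirect) port?
ovn-sbctl find Port_Binding type=chassisredirect
#    -> the "chassis" column points at the node that should expose the tenant CIDR

# 2) On that node, check what ovn-bgp-agent installed/advertised:
ip rule show                  # per-prefix rules added by the agent
ip route show table br-ex     # assuming the provider bridge (and table) is named br-ex
vtysh -c 'show ip bgp'        # what FRR is actually advertising, if FRR is used
```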
URL: From skaplons at redhat.com Thu Aug 25 11:25:44 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 25 Aug 2022 13:25:44 +0200 Subject: [neutron][elections] PTL non-candidacy for Antelope cycle In-Reply-To: References: Message-ID: <3696835.nRp3kK83m1@p1> Hi, Thx a lot Lajos for all Your hard work as PTL :) You are great leader of this project. Dnia czwartek, 25 sierpnia 2022 12:04:43 CEST Lajos Katona pisze: > Hi, > It was a great pleasure and honor to be Neutron PTL for 2 cycles, > thanks everybody for the help and support during this time > (actually not just for these cycles but for all). > > It's not just smoke and ruins around networking after my PTLship, > so after all I would say it was a success :-) > > My main focus was to keep the Neutron team as encouraging > and inclusive as possible and work on cooperation with new contributor > groups > and even with other projects of whom we are consuming in Openstack. > > It is time to change and allow new ideas and energies to form Networking. > > I remain and I hope I can help the community in the next cycles also. > > Cheers > Lajos Katona > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From amy at demarco.com Thu Aug 25 13:46:33 2022 From: amy at demarco.com (Amy Marrich) Date: Thu, 25 Aug 2022 08:46:33 -0500 Subject: [neutron][elections] PTL non-candidacy for Antelope cycle In-Reply-To: References: Message-ID: Thanks for all your hard work with Neutron Lajos! Amy On Thu, Aug 25, 2022 at 5:17 AM Lajos Katona wrote: > Hi, > It was a great pleasure and honor to be Neutron PTL for 2 cycles, > thanks everybody for the help and support during this time > (actually not just for these cycles but for all). > > It's not just smoke and ruins around networking after my PTLship, > so after all I would say it was a success :-) > > My main focus was to keep the Neutron team as encouraging > and inclusive as possible and work on cooperation with new contributor > groups > and even with other projects of whom we are consuming in Openstack. > > It is time to change and allow new ideas and energies to form Networking. > > I remain and I hope I can help the community in the next cycles also. > > Cheers > Lajos Katona > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Thu Aug 25 13:48:30 2022 From: amy at demarco.com (Amy Marrich) Date: Thu, 25 Aug 2022 08:48:30 -0500 Subject: [Glance] PTL non-candidacy In-Reply-To: References: Message-ID: Thank you Abhishek for leading Glance for all these cycles!! Amy On Thu, Aug 25, 2022 at 2:44 AM Abhishek Kekane wrote: > Hi all, > > I'm writing this email to let you know that I'm not going to run for > Glance PTL for the Antelope dev cycle, I think it's time for some new ideas > and new approaches so it's a good idea to hand over the hat of PTL to a new > member of the team. > > I've been serving Glance PTL since Ussuri, tried my best to keep Glance > stable, lots of changes have been made since then, my initial focus was to > improve glance-tempest coverage which we managed to do in past cycles. > > I would like to thank all the members of the Glance team and others for > supporting me during this period. 
My plan is to stick around, attend to my > duties as a glance core contributor, and support my successor in whatever > way I can to make for a smooth transition. > > Thank you once again! > > Cheers, > Abhishek Kekane > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gibi at redhat.com Thu Aug 25 14:39:47 2022 From: gibi at redhat.com (Balazs Gibizer) Date: Thu, 25 Aug 2022 16:39:47 +0200 Subject: [neutron][elections] PTL non-candidacy for Antelope cycle In-Reply-To: References: Message-ID: On Thu, Aug 25 2022 at 12:04:43 PM +02:00:00, Lajos Katona wrote: > Hi, > It was a great pleasure and honor to be Neutron PTL for 2 cycles, > thanks everybody for the help and support during this time > (actually not just for these cycles but for all). > > It's not just smoke and ruins around networking after my PTLship, > so after all I would say it was a success :-) Thank you Lajos! > > My main focus was to keep the Neutron team as encouraging > and inclusive as possible and work on cooperation with new > contributor groups > and even with other projects of whom we are consuming in Openstack. > > It is time to change and allow new ideas and energies to form > Networking. > > I remain and I hope I can help the community in the next cycles also. > > Cheers > Lajos Katona > From gmann at ghanshyammann.com Thu Aug 25 14:51:57 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 25 Aug 2022 20:21:57 +0530 Subject: [tc][ptl] TC + Community Leaders interaction in 2023.1 Antelope Virtual PTG Message-ID: <182d57d0aad.bf056e04140280.2087630437258918780@ghanshyammann.com> Hello Everyone, We had successful TC+Community leaders interaction sessions in the last couple of PTG, and we will continue the same for 2023.1 cycle PTG also. I have created the poll to select the best suitable time. It will be for two hours (single or two sessions) either on Monday or Tuesday. Please add your preference in the below doodle poll (including TC members): - https://framadate.org/zsOqRxfVcmtjaPBC Also, I have created the below etherpad with the draft agenda, If you as PTL or project representatives or SIG Chair and would like to attend this session, please add your name: - https://etherpad.opendev.org/p/tc-leaders-interaction-2023-1 -gmann From gmann at ghanshyammann.com Thu Aug 25 14:52:06 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 25 Aug 2022 20:22:06 +0530 Subject: [all][tc] 2023.1 Antelope TC-PTG Planning Message-ID: <182d57d2a78.1127e41be140296.2872907929814651386@ghanshyammann.com> Hello Everyone, As you already know that the 2023.1 cycle virtual PTG will be held between Oct 17th - 21[1]. I have started the preparation for the Technical Committee PTG sessions. Please do the following: 1. Fill the below poll as per your availability. - https://framadate.org/yi8LNQaph5wrirks 2. Add the topics you would like to discuss to the below etherpad. - https://etherpad.opendev.org/p/tc-2023-1-ptg NOTE: this is not limited to TC members only; I would like all community members to fill the doodle poll and, add the topics you would like or want TC members to discuss in PTG. [1] https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030041.html -gmann From katonalala at gmail.com Thu Aug 25 15:11:24 2022 From: katonalala at gmail.com (Lajos Katona) Date: Thu, 25 Aug 2022 17:11:24 +0200 Subject: [neutron] Drivers meeting agenda - 26.08.2022. Message-ID: Hi Neutron Drivers, The agenda for tomorrow's drivers meeting is at [1]. 
We have the following RFEs to discuss tomorrow: * [RFE] Add a port extension to set/define the switchdev capabilities (#link https://bugs.launchpad.net/neutron/+bug/1987093 ) * [RFE] Add DSCP mark 44 (#link https://bugs.launchpad.net/neutron/+bug/1987378 ) And we have a topic for the On Demand agenda: * (amorin): Add new novaplug mechanism driver ** https://bugs.launchpad.net/neutron/+bug/1986969 ** https://review.opendev.org/c/openstack/neutron/+/854553 [1] https://wiki.openstack.org/wiki/Meetings/NeutronDrivers#Agenda See you at the meeting tomorrow. Lajos Katona (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From egarciar at redhat.com Thu Aug 25 15:13:23 2022 From: egarciar at redhat.com (Elvira Garcia Ruiz) Date: Thu, 25 Aug 2022 17:13:23 +0200 Subject: [neutron][elections] PTL non-candidacy for Antelope cycle In-Reply-To: References: Message-ID: Thank you for your dedication Lajos!! Looking forward to continue working with you o/ On Thu, Aug 25, 2022 at 12:09 PM Lajos Katona wrote: > Hi, > It was a great pleasure and honor to be Neutron PTL for 2 cycles, > thanks everybody for the help and support during this time > (actually not just for these cycles but for all). > > It's not just smoke and ruins around networking after my PTLship, > so after all I would say it was a success :-) > > My main focus was to keep the Neutron team as encouraging > and inclusive as possible and work on cooperation with new contributor > groups > and even with other projects of whom we are consuming in Openstack. > > It is time to change and allow new ideas and energies to form Networking. > > I remain and I hope I can help the community in the next cycles also. > > Cheers > Lajos Katona > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ykarel at redhat.com Thu Aug 25 15:15:13 2022 From: ykarel at redhat.com (Yatin Karel) Date: Thu, 25 Aug 2022 20:45:13 +0530 Subject: [neutron][elections] PTL non-candidacy for Antelope cycle In-Reply-To: References: Message-ID: Thanks Lajos for all your contributions as PTL \o/ Regards Yatin Karel On Thu, Aug 25, 2022 at 3:54 PM Lajos Katona wrote: > Hi, > It was a great pleasure and honor to be Neutron PTL for 2 cycles, > thanks everybody for the help and support during this time > (actually not just for these cycles but for all). > > It's not just smoke and ruins around networking after my PTLship, > so after all I would say it was a success :-) > > My main focus was to keep the Neutron team as encouraging > and inclusive as possible and work on cooperation with new contributor > groups > and even with other projects of whom we are consuming in Openstack. > > It is time to change and allow new ideas and energies to form Networking. > > I remain and I hope I can help the community in the next cycles also. > > Cheers > Lajos Katona > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Thu Aug 25 15:24:35 2022 From: jungleboyj at gmail.com (Jay Bryant) Date: Thu, 25 Aug 2022 10:24:35 -0500 Subject: [Glance] PTL non-candidacy In-Reply-To: References: Message-ID: <4f65fec5-9c7d-6594-c543-6af3c36d56b2@gmail.com> Abhishek, Thanks for leading Glance all these cycles.? Well done! 
Jay On 8/25/2022 2:30 AM, Abhishek Kekane wrote: > Hi all, > > I'm writing this email to let you know that I'm not going to run for > Glance PTL for the Antelope dev cycle, I think it's time for some new > ideas and new approaches so it's a good idea to hand over the hat of > PTL to a new member of the team. > > I've been serving Glance PTL since Ussuri, tried my best to keep > Glance stable, lots of changes have been made since then, my initial > focus was to improve glance-tempest coverage which we managed to do in > past cycles. > > I would like to thank all the members of the Glance team and others > for supporting me during this period. My plan is to stick around, > attend to my duties as a glance core contributor, and support my > successor in whatever way I can to make for a smooth transition. > > Thank you once again! > > Cheers, > Abhishek Kekane > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Thu Aug 25 15:33:34 2022 From: amy at demarco.com (Amy Marrich) Date: Thu, 25 Aug 2022 10:33:34 -0500 Subject: [OPS] Participation Poll for OPS Meetup at the PTG Message-ID: Hey all, Just putting this on the list for more visibility. On Monday we posted a poll from the OPS Meetup Twitter account for how Operators would prefer to interact with the Developers[0]. The poll closes on Monday but if you don't have twitter please reply here with your preference. We also have a poll for preferred times[1] Thanks, Amy 0 - https://twitter.com/osopsmeetup/status/1561708280455593984 1 - https://doodle.com/meeting/participate/id/bD9kR2yd -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel at mlavalle.com Thu Aug 25 16:54:12 2022 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 25 Aug 2022 11:54:12 -0500 Subject: [neutron][elections] PTL non-candidacy for Antelope cycle In-Reply-To: References: Message-ID: Hi Lajos, It's been a pleasure working in the Neutron core team under your leadership. Thanks for your many contributions and kind guidance. Best regards Miguel On Thu, Aug 25, 2022, 10:20 AM Yatin Karel wrote: > Thanks Lajos for all your contributions as PTL \o/ > > > > Regards > Yatin Karel > > On Thu, Aug 25, 2022 at 3:54 PM Lajos Katona wrote: > >> Hi, >> It was a great pleasure and honor to be Neutron PTL for 2 cycles, >> thanks everybody for the help and support during this time >> (actually not just for these cycles but for all). >> >> It's not just smoke and ruins around networking after my PTLship, >> so after all I would say it was a success :-) >> >> My main focus was to keep the Neutron team as encouraging >> and inclusive as possible and work on cooperation with new contributor >> groups >> and even with other projects of whom we are consuming in Openstack. >> >> It is time to change and allow new ideas and energies to form Networking. >> >> I remain and I hope I can help the community in the next cycles also. >> >> Cheers >> Lajos Katona >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From adivya1.singh at gmail.com Thu Aug 25 16:55:38 2022 From: adivya1.singh at gmail.com (Adivya Singh) Date: Thu, 25 Aug 2022 22:25:38 +0530 Subject: Regarding Designate Installation in ansible Playbook Message-ID: hi Team, I have made under conf.d a designate. yml file, and rerunning the setup of Openstack-setuphost but still it is not installing a Designate Container in OpenStack. 
Is there any Reason for this, Something i am doing right Regards Adivya Singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsb4000 at yandex.ru Thu Aug 25 00:48:53 2022 From: fsb4000 at yandex.ru (Igor Zhukov) Date: Thu, 25 Aug 2022 07:48:53 +0700 Subject: [Neutron] How to add Fake ML2 extension to Neutron? In-Reply-To: References: <5523221661172836@myt6-bbc622793f1b.qloud-c.yandex.net> <2183551661263463@myt5-b646bde4b8f3.qloud-c.yandex.net> Message-ID: <4531261661388533@vla5-81f3f2eec11f.qloud-c.yandex.net> Hi Lajos. Thank you. I have a progress. I think my fake extension works. I added ``` extensions.register_custom_supported_check( "vpc_extension", lambda: True, plugin_agnostic=False ) ``` to ``` class Vpc(api_extensions.ExtensionDescriptor): extensions.register_custom_supported_check( "vpc_extension", lambda: True, plugin_agnostic=False ) ... ``` and I use ml2 extension driver without any new plugin. https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L44 I tested it with python neutronclientapi. So I can change my new attribute (neutron.update_network(id, {'network': {'new_attribute': some string }})) and I see my changes (neutron.list_networks(name='demo-net')) I'm close to the end. Now I'm using modifed `TestExtensionDriver(TestExtensionDriverBase):`. It works but It stores the data locally. And I want to use class TestDBExtensionDriver(TestExtensionDriverBase): (https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L169) I tried to use it but I got such errors in neutron-server.log: "Table 'neutron.myextension.networkextensions' doesn't exist" How can I create a new table? I saw https://docs.openstack.org/neutron/latest/contributor/alembic_migrations.html and https://github.com/openstack/neutron-vpnaas/tree/master/neutron_vpnaas/db but I still don't understand. I mean I think some of the neutron_vpnaas/db files are generated. Are neutron_vpnaas/db/migration/alembic_migrations/versions generated? Which files I should create(their names, I think I can copy from neutron_vpnaas/db/) and what commands to type to create one new table: https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L136-L144 ? > Hi Igor,The line which is interesting for you: "Extension vpc_extension not supported by any of loaded plugins" > In core Neutron for ml2 there is a list of supported extension aliases: > https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200-L239 > > And there is a similar for l3 also: > https://opendev.org/openstack/neutron/src/branch/master/neutron/services/l3_router/l3_router_plugin.py#L98-L110 > > Or similarly for QoS: > https://opendev.org/openstack/neutron/src/branch/master/neutron/services/qos/qos_plugin.py#L76-L90 > > So you need a plugin that uses the extension. > > Good luck :-) > Lajos Katona (lajoskatona) > > Igor Zhukov ezt ?rta (id?pont: 2022. aug. 23., K, 16:04): > >> Hi again! >> >> Do you know how to debug ML2 extension drivers? 
>> >> I created folder with two python files: vpc/extensions/vpc.py and vpc/plugins/ml2/drivers/vpc.py (also empty __init__.py files) >> >> I added to neuron.conf >> >> api_extensions_path = /path/to/vpc/extensions >> >> and I added to ml2_ini.conf >> >> extension_drivers = port_security, vpc.plugins.ml2.drivers.vpc:VpcExtensionDriver >> >> and my neutron.server.log has: >> >> INFO neutron.plugins.ml2.managers [-] Configured extension driver names: ['port_security', 'vpc_neutron.plugins.ml2.drivers.vpc:VpcExtensionDriver'] >> >> WARNING stevedore.named [-] Could not load vpc_neutron.plugins.ml2.drivers.vpc:VpcExtensionDriver >> >> .... >> >> INFO neutron.api.extensions [req-fd226631-b0cd-4ff8-956b-9470e7f26ebe - - - - -] Extension vpc_extension not supported by any of loaded plugins >> >> How can I find why the extension driver could not be loaded? >> >>> Hi,The fake_extension is used only in unit tests to test the extension framework, i.e. : >> >>> https://opendev.org/openstack/neutron/src/branch/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L37 >> >>> >> >>> If you would like to write an API extension check neutron-lib/api/definitions/ (and you can find the extensions "counterpart" under neutron/extensions in neutron repository) >> >>> >> >>> You can also check other Networking projects like networking-bgvpn, neutron-dynamic-routing to have examples of API extensions. >> >>> If you have an extension under neutron/extensions and there's somebody who uses it (see [1]) you will see it is loaded in neutron servers logs (something like this: "Loaded extension: address-group") and you can find it in the output of openstack extension list --network >> >>> >> >>> [1]: https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200 >> >>> >> >>> Best wishes >> >>> Lajos Katona >> >>> >> >>> Igor Zhukov ezt ?rta (id?pont: 2022. aug. 22., H, 19:41): >> >>> >> >>>> Hi all! >> >>>> >> >>>> Sorry for a complete noob question but I can't figure it out ? >> >>>> >> >>>> So if I want to add Fake ML2 extension what should I do? >> >>>> >> >>>> I have neutron server installed and I have the file: https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/extensions/fake_extension.py >> >>>> >> >>>> How to configure neutron server, where should I put the file, should I create another files? How can I test that it works? From berndbausch at gmail.com Thu Aug 25 00:58:06 2022 From: berndbausch at gmail.com (Bernd Bausch) Date: Thu, 25 Aug 2022 09:58:06 +0900 Subject: container and instances are not working on my dashboard In-Reply-To: References: Message-ID: You have a single compute node. The error message when creating an instance is "no valid host was found". This indicates that, for some reason, your compute node can't launch the instance. The first thing to do is check Nova's status with /openstack compute service list/. It will tell you if Nova considers the compute service on /ara-pc/ running and enabled. If so, you need to look at the logs. I would start with the scheduler log for clues why no host could be found, then also the compute log on /ara-pc/. Why your CLI shows a list of containers, but the GUI doesn't: The first thing I'd check is whether you use the same project in the CLI and the GUI. On 2022/08/25 4:45 AM, Lee, Vincent Z wrote: > Hi all, > I am quite new to Openstack and faced some issues with creating > instances and displaying created containers on my Openstack dashboard. > I am currently working on multinode Openstack. 
I am hoping to get some > helps and feedbacks about this. I will briefly go through the problems > I encountered. > > When I created an instances on my dashboard, it directly went into > error state. I have attached a screenshot as shown below. > > This is my list of hosts. > > When I created a container using both cli and dashboard, two > containers were running. However, when I tried to look into my > dashboard, those containers were not shown. > These are the created containers. > However, when i try to open my dashboard and look for them, they just > don't appear on it. So I am not sure what is causing this. > > > Hope to hear from everyone?soon. > > Best regards, > Vincent -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 93342 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 90734 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 262295 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 35234 bytes Desc: not available URL: From hongbin034 at gmail.com Thu Aug 25 02:54:32 2022 From: hongbin034 at gmail.com (Hongbin Lu) Date: Thu, 25 Aug 2022 10:54:32 +0800 Subject: container and instances are not working on my dashboard In-Reply-To: References: Message-ID: Hi Vincent, Please provide the following information: * The version of your OpenStack deployment (master? stable/yoga? etc.) * How you installed Zun (devstack? kolla-ansible? manual install?) * The logs of your zun processes (zun-api, zun-compute, zun-wsproxy, zun-cni, kuryr-libnetwork). The reason you cannot find containers created in CLI via dashboard is probably due to tenant mismatch. Please double-check the tenant of the container created by CLI and dashboard. Best regards, Hongbin On Thu, Aug 25, 2022 at 4:10 AM Lee, Vincent Z wrote: > Hi all, > I am quite new to Openstack and faced some issues with creating instances > and displaying created containers on my Openstack dashboard. I am currently > working on multinode Openstack. I am hoping to get some helps and feedbacks > about this. I will briefly go through the problems I encountered. > > When I created an instances on my dashboard, it directly went into error > state. I have attached a screenshot as shown below. > > This is my list of hosts. > > When I created a container using both cli and dashboard, two containers > were running. However, when I tried to look into my dashboard, those > containers were not shown. > These are the created containers. > However, when i try to open my dashboard and look for them, they just > don't appear on it. So I am not sure what is causing this. > > > Hope to hear from everyone soon. > > Best regards, > Vincent > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 93342 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 90734 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
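Putting the advice from both replies into commands, a rough checklist could look like the following; log locations and unit names vary with the deployment method, so adjust the paths accordingly:

```shell
# Is the nova-compute service on ara-pc up and enabled?
openstack compute service list

# Why did the server go to ERROR? The fault field usually carries the reason:
openstack server show <server-id> -c status -c fault

# If it is "No valid host was found", check the scheduler log on the controller
# (the path depends on how OpenStack was installed):
grep -i "no valid host" /var/log/nova/nova-scheduler.log

# For the containers missing in Horizon: confirm CLI and dashboard use the same project
openstack token issue -c project_id
```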
Name: image.png Type: image/png Size: 262295 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 35234 bytes Desc: not available URL: From lokendrarathour at gmail.com Thu Aug 25 13:04:31 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Thu, 25 Aug 2022 18:34:31 +0530 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed In-Reply-To: References: Message-ID: Hi John, Thanks for the inputs. Now I see something strange. Deployment with external ceph is unstable, it got deployed once and we saw an error of VM not getting created because of some reasons, we were debugging and found that we found some NTP related observation, which we fixed and tried redeploying. Not it again got failed at step 4: 2022-08-25 17:34:29.036371 | 5254004d-021e-d4db-067d-000000007b1a | TASK | Create identity internal endpoint 2022-08-25 17:34:31.176105 | 5254004d-021e-d4db-067d-000000007b1a | FATAL | Create identity internal endpoint | undercloud | error={"changed": false, "extra_data": {"data": null, "details": "The request you have made requires authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The request you have made requires authentication.\",\"title\":\"Unauthorized\"} }\n"}, "msg": "Failed to list services: Client Error for url: https://overcloud-public.mydomain.com:13000/v3/services , The request you have made requires authentication."} To revalidate the case, I tried a fresh setup and saw that deployment again failed at step 4. and when we remove external ceph from the deployment command, we see that deployment is happening 100%. I see authorization errors, which I used to get earlier as well, but because of DNS we were able to resolve this. what could be the reason for this when we are using External ceph ? 
any inputs would be helpful deploy command: stack at undercloud ~]$ cat deploy_step2.sh openstack overcloud deploy --templates \ -r /home/stack/templates/roles_data.yaml \ -n /home/stack/templates/custom_network_data.yaml \ -e /home/stack/templates/overcloud-baremetal-deployed.yaml \ -e /home/stack/templates/networks-deployed-environment.yaml \ -e /home/stack/templates/vip-deployed-environment.yaml \ -e /home/stack/templates/environment.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml \ -e /home/stack/templates/ironic-config.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/services/ptp.yaml \ -e /home/stack/templates/enable-tls.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \ -e /home/stack/templates/cloudname.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml \ -e /home/stack/templates/my-additional-ceph-settings.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \ -e /home/stack/containers-prepare-parameter.yaml [stack at undercloud ~]$ cat /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml resource_registry: OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml parameter_defaults: # NOTE: These example parameters are required when using CephExternal CephClusterFSID: 'ca3080e3-aa3a-4d1a-b1fd-483459a9ea4c' CephClientKey: 'AQB2hMZi2u13NxAAVjmKopw+kNm6OnZOG7NktQ==' CephExternalMonHost: 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' # the following parameters enable Ceph backends for Cinder, Glance, Gnocchi and Nova NovaEnableRbdBackend: true CinderEnableRbdBackend: true CinderBackupBackend: ceph GlanceBackend: rbd # Uncomment below if enabling legacy telemetry # GnocchiBackend: rbd # If the Ceph pools which host VMs, Volumes and Images do not match these # names OR the client keyring to use is not named 'openstack', edit the # following as needed. NovaRbdPoolName: vms CinderRbdPoolName: volumes CinderBackupRbdPoolName: backups GlanceRbdPoolName: images # Uncomment below if enabling legacy telemetry # GnocchiRbdPoolName: metrics CephClientUserName: openstack # finally we disable the Cinder LVM backend CinderEnableIscsiBackend: false On Fri, Aug 19, 2022 at 10:32 PM John Fulton wrote: > On Fri, Aug 19, 2022 at 3:45 AM Lokendra Rathour < > lokendrarathour at gmail.com> wrote: > >> Hi Fulton, >> Thanks for the inputs and apologies for the delay in response. 
>> to my surprise passing the container prepare in standard worked for me, >> new container-prepare is: >> >> parameter_defaults: >> ContainerImagePrepare: >> - push_destination: true >> set: >> ceph_alertmanager_image: alertmanager >> ceph_alertmanager_namespace: quay.ceph.io/prometheus >> ceph_alertmanager_tag: v0.16.2 >> ceph_grafana_image: grafana >> ceph_grafana_namespace: quay.ceph.io/app-sre >> ceph_grafana_tag: 6.7.4 >> ceph_image: daemon >> ceph_namespace: quay.io/ceph >> ceph_node_exporter_image: node-exporter >> ceph_node_exporter_namespace: quay.ceph.io/prometheus >> ceph_node_exporter_tag: v0.17.0 >> ceph_prometheus_image: prometheus >> ceph_prometheus_namespace: quay.ceph.io/prometheus >> ceph_prometheus_tag: v2.7.2 >> ceph_tag: v6.0.7-stable-6.0-pacific-centos-stream8 >> name_prefix: openstack- >> name_suffix: '' >> namespace: myserver.com:5000/tripleowallaby >> neutron_driver: ovn >> rhel_containers: false >> tag: current-tripleo >> tag_from_label: rdo_version >> >> But if we see or look at these containers I do not see any such >> containers available. we have tried looking at Undercloud and overcloud. >> > > The undercloud can download continers from the sources above and then act > as a container registry. It's described here: > > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/container_image_prepare.html > > >> Also, the deployment is done when we are passing this config. >> Thanks once again. >> >> Also, we need to understand some use cases of using the storage from this >> external ceph, which can work as the mount for the VM as direct or Shared >> storage. Any idea or available document which tells more about how to >> consume external Ceph in the existing triple Overcloud? >> > > Ceph can provide OpenStack Block, Object and File storage and TripleO > supports a variety of integration options for them. > > TripleO can deploy Ceph as part of the OpenStack overcloud: > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html > > TripleO can also deploy an OpenStack overcloud which uses an existing > external ceph cluster: > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/ceph_external.html > > At the end of both of these documents you can expect Glance, Nova, and > Cinder to use Ceph block storage (RBD). > > You can also have OpenStack use Ceph object storage (RGW). When RGW is > used, a command like "openstack container create foo" will create an object > storage container (not to be confused with podman/docker) on CephRGW as if > your overcloud were running OpenStack Swift. If you have TripleO deploy > Ceph as part of the OpenStack overcloud, RGW will be deployed and > configured for OpenStack object storage by default (in Wallaby+). > > The OpenStack Manila service can use CephFS as one of its backends. > TripleO can deploy that too as described here: > > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_manila.html > > John > > >> Do share in case you know any, please. >> >> Thanks once again for the support, it was really helpful >> >> >> On Thu, Aug 11, 2022 at 9:59 PM John Fulton wrote: >> >>> The ceph container should no longer be needed for external ceph >>> configuration (since the move from ceph-ansible to cephadm) but if removing >>> the ceph env files makes the error go away, then try adding it back and >>> then following these steps to prepare the ceph container on your undercloud >>> before deploying. 
>>> >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html#container-options >>> >>> On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour < >>> lokendrarathour at gmail.com> wrote: >>> >>>> Hi Thanks, >>>> for the inputs, we could see the miss, >>>> now we have added the required miss : >>>> "TripleO resource >>>> OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml" >>>> >>>> Now with this setting if we deploy the setup in wallaby, we are >>>> getting error as: >>>> >>>> >>>> PLAY [External deployment step 1] >>>> ********************************************** >>>> 2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | >>>> TASK | External deployment step 1 >>>> 2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | >>>> OK | External deployment step 1 | undercloud -> localhost | result={ >>>> "changed": false, >>>> "msg": "Use --start-at-task 'External deployment step 1' to resume >>>> from this task" >>>> } >>>> [WARNING]: ('undercloud -> localhost', >>>> '525400d4-7124-4a42-664c-0000000000a8') >>>> missing from stats >>>> 2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | >>>> TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s >>>> 2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | >>>> INCLUDED | >>>> /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml >>>> | undercloud >>>> 2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | >>>> TASK | Set some tripleo-ansible facts >>>> 2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | >>>> OK | Set some tripleo-ansible facts | undercloud >>>> 2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | >>>> TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | >>>> 0.03s >>>> 2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | >>>> TASK | Container image prepare >>>> 2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | >>>> FATAL | Container image prepare | *undercloud | error={"changed": >>>> false, "error": "None: Max retries exceeded with url: /v2/ (Caused by >>>> None)", "msg": "Error running container image prepare: None: Max retries >>>> exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}* >>>> 2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | >>>> TIMING | tripleo_container_image_prepare : Container image prepare | >>>> undercloud | 0:06:13.385607 | 72.12s >>>> >>>> This gets failed at step 1, As this is wallaby and based on the >>>> document (Use an external Ceph cluster with the Overcloud ? TripleO >>>> 3.0.0 documentation (openstack.org) >>>> ) >>>> we should only pass this external-ceph.yaml for the external ceph >>>> intergration. >>>> But it is not happening. >>>> >>>> >>>> Few things to note: >>>> 1. 
Container Prepare: >>>> >>>> (undercloud) [stack at undercloud ~]$ cat >>>> containers-prepare-parameter.yaml >>>> # Generated with the following on 2022-06-28T18:56:38.642315 >>>> # >>>> # openstack tripleo container image prepare default >>>> --local-push-destination --output-env-file >>>> /home/stack/containers-prepare-parameter.yaml >>>> # >>>> >>>> >>>> parameter_defaults: >>>> ContainerImagePrepare: >>>> - push_destination: true >>>> set: >>>> name_prefix: openstack- >>>> name_suffix: '' >>>> namespace: myserver.com:5000/tripleowallaby >>>> neutron_driver: ovn >>>> rhel_containers: false >>>> tag: current-tripleo >>>> tag_from_label: rdo_version >>>> (undercloud) [stack at undercloud ~]$ >>>> >>>> 2. this is SSL based deployment. >>>> >>>> Any idea for the error, the issue is seen only once we have the >>>> external ceph integration enabled. >>>> >>>> Best Regards, >>>> Lokendra >>>> >>>> >>>> >>>> >>>> On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano >>>> wrote: >>>> >>>>> Hi, >>>>> ceph is supposed to be configured by this tripleo-ansible role [1], >>>>> which is triggered by tht on external_deploy_steps [2]. >>>>> In theory adding [3] should just work, assuming you customize the ceph >>>>> cluster mon ip addresses, fsid and a few other related variables. >>>>> From your previous email I suspect in your external-ceph.yaml you >>>>> missed the TripleO resource OS::TripleO::Services::CephExternal: >>>>> ../deployment/cephadm/ceph-client.yaml >>>>> (see [3]). >>>>> >>>>> Thanks, >>>>> Francesco >>>>> >>>>> >>>>> [1] >>>>> https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client >>>>> [2] >>>>> https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cephadm/ceph-client.yaml#L93 >>>>> [3] >>>>> https://github.com/openstack/tripleo-heat-templates/blob/master/environments/external-ceph.yaml >>>>> >>>>> On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour < >>>>> lokendrarathour at gmail.com> wrote: >>>>> >>>>>> Hi Team, >>>>>> I was trying to integrate External Ceph with Triple0 Wallaby, and at >>>>>> the end of deployment in step4 getting the below error: >>>>>> >>>>>> 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 >>>>>> 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | >>>>>> Create containers from >>>>>> /var/lib/tripleo-config/container-startup-config/step_4 >>>>>> 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 >>>>>> 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | >>>>>> /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | >>>>>> overcloud-controller-2 >>>>>> 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 >>>>>> 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | >>>>>> Create containers managed by Podman for >>>>>> /var/lib/tripleo-config/container-startup-config/step_4 >>>>>> 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 >>>>>> 18:37:24.530812 | | WARNING | >>>>>> ERROR: Can't run container nova_libvirt_init_secret >>>>>> stderr: >>>>>> 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 >>>>>> 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | >>>>>> Create containers managed by Podman for >>>>>> /var/lib/tripleo-config/container-startup-config/step_4 | >>>>>> overcloud-novacompute-0 | error={"changed": false, "msg": "Failed >>>>>> containers: nova_libvirt_init_secret"} >>>>>> 2022-08-03 18:37:44,282 p=507732 u >>>>>> >>>>>> >>>>>> *external-ceph.conf:* >>>>>> >>>>>> 
parameter_defaults: >>>>>> # Enable use of RBD backend in nova-compute >>>>>> NovaEnableRbdBackend: True >>>>>> # Enable use of RBD backend in cinder-volume >>>>>> CinderEnableRbdBackend: True >>>>>> # Backend to use for cinder-backup >>>>>> CinderBackupBackend: ceph >>>>>> # Backend to use for glance >>>>>> GlanceBackend: rbd >>>>>> # Name of the Ceph pool hosting Nova ephemeral images >>>>>> NovaRbdPoolName: vms >>>>>> # Name of the Ceph pool hosting Cinder volumes >>>>>> CinderRbdPoolName: volumes >>>>>> # Name of the Ceph pool hosting Cinder backups >>>>>> CinderBackupRbdPoolName: backups >>>>>> # Name of the Ceph pool hosting Glance images >>>>>> GlanceRbdPoolName: images >>>>>> # Name of the user to authenticate with the external Ceph cluster >>>>>> CephClientUserName: admin >>>>>> # The cluster FSID >>>>>> CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' >>>>>> # The CephX user auth key >>>>>> CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' >>>>>> # The list of Ceph monitors >>>>>> CephExternalMonHost: >>>>>> 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' >>>>>> ~ >>>>>> >>>>>> >>>>>> Have tried checking and validating the ceph client details and they >>>>>> seem to be correct, further digging the container log I could see something >>>>>> like this : >>>>>> >>>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>>> nova_libvirt_init_secret.log >>>>>> tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such >>>>>> file or directory >>>>>> tail: no files remaining >>>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>>> stdouts/nova_libvirt_init_secret.log >>>>>> 2022-08-04T11:48:47.689898197+05:30 stdout F >>>>>> ------------------------------------------------ >>>>>> 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh >>>>>> secrets for: ceph:admin >>>>>> 2022-08-04T11:48:47.690590594+05:30 stdout F Error: >>>>>> /etc/ceph/ceph.conf was not found >>>>>> 2022-08-04T11:48:47.690625088+05:30 stdout F Path to >>>>>> nova_libvirt_init_secret was ceph:admin >>>>>> 2022-08-04T16:20:29.643785538+05:30 stdout F >>>>>> ------------------------------------------------ >>>>>> 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh >>>>>> secrets for: ceph:admin >>>>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Error: >>>>>> /etc/ceph/ceph.conf was not found >>>>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Path to >>>>>> nova_libvirt_init_secret was ceph:admin >>>>>> ^C >>>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>>> stdouts/nova_compute_init_log.log >>>>>> >>>>>> -- >>>>>> ~ Lokendra >>>>>> skype: lokendrarathour >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Francesco Pantano >>>>> GPG KEY: F41BD75C >>>>> >>>> >>>> >>>> -- >>>> ~ Lokendra >>>> skype: lokendrarathour >>>> >>>> >>>> >> >> -- >> ~ Lokendra >> skype: lokendrarathour >> >> >> -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Thu Aug 25 13:32:09 2022 From: johfulto at redhat.com (John Fulton) Date: Thu, 25 Aug 2022 09:32:09 -0400 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed In-Reply-To: References: Message-ID: On Thu, Aug 25, 2022 at 9:04 AM Lokendra Rathour wrote: > Hi John, > Thanks for the inputs. Now I see something strange. > Deployment with external ceph is unstable, > I assume you're using Wallaby. There's a downstream job testing external ceph daily. The external ceph feature of TripleO in Wallaby is stable. 
I think you have something else going on that conflates with your use of external ceph. > it got deployed once and we saw an error of VM not getting created because > of some reasons, we were debugging and found that we found some NTP related > observation, which we fixed and tried redeploying. > Not it again got failed at step 4: > External Ceph is already configured before step 4. You can inspect your system after this failure to see that this role: https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client has done its job of distributing cephx keys and a ceph.conf file into this path: https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/cephadm/ceph-client.yaml#L61 That should be all that doing a "-e /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml" results in. Maybe there's something else in my-additional-ceph-settings.yaml that shouldn't be there that's causing your overcloud to try to create an endpoint? I think that unlikely but I'm trying to come up with an explanation for the correlation you're reporting. 2022-08-25 17:34:29.036371 | 5254004d-021e-d4db-067d-000000007b1a | TASK > | Create identity internal endpoint > 2022-08-25 17:34:31.176105 | 5254004d-021e-d4db-067d-000000007b1a | FATAL > | Create identity internal endpoint | undercloud | error={"changed": false, > "extra_data": {"data": null, "details": "The request you have made requires > authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The > request you have made requires authentication.\",\"title\":\"Unauthorized\"} > }\n"}, "msg": "Failed to list services: Client Error for url: > https://overcloud-public.mydomain.com:13000/v3/services > , The request you > have made requires authentication."} > > The above is happening from this role: https://github.com/openstack/tripleo-ansible/blob/e9cc12d4ce0b1c9e96b58f6102e8a3906ed9a1d3/tripleo_ansible/roles/tripleo_keystone_resources/tasks/admin.yml#L81-L87 To revalidate the case, I tried a fresh setup and saw that deployment again > failed at step 4. > and when we remove external ceph from the deployment command, we see that > deployment is happening 100%. > I see authorization errors, which I used to get earlier as well, but > because of DNS we were able to resolve this. > what could be the reason for this when we are using External ceph ? > I really don't think this is related to external ceph configuration. Correlation does not always mean causality. 
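A quick way to separate the two questions (did the external ceph client config land, and is keystone itself reachable with the overcloud admin credentials) could be something like the following, run from the undercloud; the user and paths assume the usual TripleO layout:

```shell
# On an overcloud node: did the tripleo_ceph_client role drop the files?
ssh heat-admin@<controller-ip> 'ls -l /etc/ceph/'
#   expect ceph.conf plus a keyring named after CephClientUserName,
#   e.g. ceph.client.openstack.keyring

# From the undercloud: make the same keystone call the failing task makes,
# using the overcloud admin credentials:
source ~/overcloudrc
openstack service list
```

If `openstack service list` fails with the same 401 while the ceph files are in place, the problem is the overcloud admin credentials or endpoint (for example a stale overcloudrc or a password mismatch), not the ceph integration.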
any inputs would be helpful > > deploy command: > > stack at undercloud ~]$ cat deploy_step2.sh > openstack overcloud deploy --templates \ > -r /home/stack/templates/roles_data.yaml \ > -n /home/stack/templates/custom_network_data.yaml \ > -e /home/stack/templates/overcloud-baremetal-deployed.yaml \ > -e /home/stack/templates/networks-deployed-environment.yaml \ > -e /home/stack/templates/vip-deployed-environment.yaml \ > -e /home/stack/templates/environment.yaml \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml > \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml > \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml > \ > -e /home/stack/templates/ironic-config.yaml \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/services/ptp.yaml \ > -e /home/stack/templates/enable-tls.yaml \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml > \ > -e /home/stack/templates/cloudname.yaml \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml > \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml > \ > -e /home/stack/templates/my-additional-ceph-settings.yaml \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \ > -e /home/stack/containers-prepare-parameter.yaml > > Also, please re-arrange the order of your templates. Any path with /usr/share should be first. Then any path in /home should then follow. openstack overcloud deploy --templates \ -r /home/stack/templates/roles_data.yaml \ -n /home/stack/templates/custom_network_data.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml \ -e /home/stack/templates/overcloud-baremetal-deployed.yaml \ You override the values in the templates tripleo ships (in /usr/share) with your own values for your env (in /home/stack). If you include /usr/share after, then you could override your own custom values. Just a best practice to rule out other issues. At this point I think you should find out what command is being executed when this task runs: https://github.com/openstack/tripleo-ansible/blob/e9cc12d4ce0b1c9e96b58f6102e8a3906ed9a1d3/tripleo_ansible/roles/tripleo_keystone_resources/tasks/admin.yml#L81-L87 Find out the values. Then run that command manually on the CLI of the system where it is failing. At that point you'll have decoupled what the deployment tool is doing vs the failing command on your system and share that on the list. 
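Concretely, the shipped environment file quoted below can stay untouched; site-specific values go into a small override file of your own that is passed with -e after it on the command line. Something like this (the file name and values are only an example for your environment, reusing the parameters already shown in this thread):

```yaml
# /home/stack/templates/ceph-overrides.yaml  (example name)
# Passed with -e *after* environments/external-ceph.yaml so these values win.
parameter_defaults:
  CephClusterFSID: 'ca3080e3-aa3a-4d1a-b1fd-483459a9ea4c'
  CephClientKey: '<client key from the external cluster>'
  CephExternalMonHost: '<mon1>,<mon2>,<mon3>'
  CephClientUserName: openstack
```

That keeps the /usr/share environments first on the command line and all site-specific values in /home/stack, which also makes it easy to see exactly what differs from the defaults.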
> [stack at undercloud ~]$ cat > /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml > resource_registry: > OS::TripleO::Services::CephExternal: > ../deployment/cephadm/ceph-client.yaml > > parameter_defaults: > # NOTE: These example parameters are required when using CephExternal > CephClusterFSID: 'ca3080e3-aa3a-4d1a-b1fd-483459a9ea4c' > CephClientKey: 'AQB2hMZi2u13NxAAVjmKopw+kNm6OnZOG7NktQ==' > CephExternalMonHost: > 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' > > # the following parameters enable Ceph backends for Cinder, Glance, > Gnocchi and Nova > NovaEnableRbdBackend: true > CinderEnableRbdBackend: true > CinderBackupBackend: ceph > GlanceBackend: rbd > # Uncomment below if enabling legacy telemetry > # GnocchiBackend: rbd > # If the Ceph pools which host VMs, Volumes and Images do not match these > # names OR the client keyring to use is not named 'openstack', edit the > # following as needed. > NovaRbdPoolName: vms > CinderRbdPoolName: volumes > CinderBackupRbdPoolName: backups > GlanceRbdPoolName: images > # Uncomment below if enabling legacy telemetry > # GnocchiRbdPoolName: metrics > CephClientUserName: openstack > > # finally we disable the Cinder LVM backend > CinderEnableIscsiBackend: false > I recommend instead that you not modify /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml and instead include it and then after it that you override your own values. John > > > On Fri, Aug 19, 2022 at 10:32 PM John Fulton wrote: > >> On Fri, Aug 19, 2022 at 3:45 AM Lokendra Rathour < >> lokendrarathour at gmail.com> wrote: >> >>> Hi Fulton, >>> Thanks for the inputs and apologies for the delay in response. >>> to my surprise passing the container prepare in standard worked for me, >>> new container-prepare is: >>> >>> parameter_defaults: >>> ContainerImagePrepare: >>> - push_destination: true >>> set: >>> ceph_alertmanager_image: alertmanager >>> ceph_alertmanager_namespace: quay.ceph.io/prometheus >>> ceph_alertmanager_tag: v0.16.2 >>> ceph_grafana_image: grafana >>> ceph_grafana_namespace: quay.ceph.io/app-sre >>> ceph_grafana_tag: 6.7.4 >>> ceph_image: daemon >>> ceph_namespace: quay.io/ceph >>> ceph_node_exporter_image: node-exporter >>> ceph_node_exporter_namespace: quay.ceph.io/prometheus >>> ceph_node_exporter_tag: v0.17.0 >>> ceph_prometheus_image: prometheus >>> ceph_prometheus_namespace: quay.ceph.io/prometheus >>> ceph_prometheus_tag: v2.7.2 >>> ceph_tag: v6.0.7-stable-6.0-pacific-centos-stream8 >>> name_prefix: openstack- >>> name_suffix: '' >>> namespace: myserver.com:5000/tripleowallaby >>> neutron_driver: ovn >>> rhel_containers: false >>> tag: current-tripleo >>> tag_from_label: rdo_version >>> >>> But if we see or look at these containers I do not see any such >>> containers available. we have tried looking at Undercloud and overcloud. >>> >> >> The undercloud can download continers from the sources above and then act >> as a container registry. It's described here: >> >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/container_image_prepare.html >> >> >>> Also, the deployment is done when we are passing this config. >>> Thanks once again. >>> >>> Also, we need to understand some use cases of using the storage from >>> this external ceph, which can work as the mount for the VM as direct or >>> Shared storage. Any idea or available document which tells more about how >>> to consume external Ceph in the existing triple Overcloud? 
>>> >> >> Ceph can provide OpenStack Block, Object and File storage and TripleO >> supports a variety of integration options for them. >> >> TripleO can deploy Ceph as part of the OpenStack overcloud: >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html >> >> TripleO can also deploy an OpenStack overcloud which uses an existing >> external ceph cluster: >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/ceph_external.html >> >> At the end of both of these documents you can expect Glance, Nova, and >> Cinder to use Ceph block storage (RBD). >> >> You can also have OpenStack use Ceph object storage (RGW). When RGW is >> used, a command like "openstack container create foo" will create an object >> storage container (not to be confused with podman/docker) on CephRGW as if >> your overcloud were running OpenStack Swift. If you have TripleO deploy >> Ceph as part of the OpenStack overcloud, RGW will be deployed and >> configured for OpenStack object storage by default (in Wallaby+). >> >> The OpenStack Manila service can use CephFS as one of its backends. >> TripleO can deploy that too as described here: >> >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_manila.html >> >> John >> >> >>> Do share in case you know any, please. >>> >>> Thanks once again for the support, it was really helpful >>> >>> >>> On Thu, Aug 11, 2022 at 9:59 PM John Fulton wrote: >>> >>>> The ceph container should no longer be needed for external ceph >>>> configuration (since the move from ceph-ansible to cephadm) but if removing >>>> the ceph env files makes the error go away, then try adding it back and >>>> then following these steps to prepare the ceph container on your undercloud >>>> before deploying. 
>>>> >>>> >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html#container-options >>>> >>>> On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour < >>>> lokendrarathour at gmail.com> wrote: >>>> >>>>> Hi Thanks, >>>>> for the inputs, we could see the miss, >>>>> now we have added the required miss : >>>>> "TripleO resource >>>>> OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml" >>>>> >>>>> Now with this setting if we deploy the setup in wallaby, we are >>>>> getting error as: >>>>> >>>>> >>>>> PLAY [External deployment step 1] >>>>> ********************************************** >>>>> 2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | >>>>> TASK | External deployment step 1 >>>>> 2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | >>>>> OK | External deployment step 1 | undercloud -> localhost | result={ >>>>> "changed": false, >>>>> "msg": "Use --start-at-task 'External deployment step 1' to resume >>>>> from this task" >>>>> } >>>>> [WARNING]: ('undercloud -> localhost', >>>>> '525400d4-7124-4a42-664c-0000000000a8') >>>>> missing from stats >>>>> 2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | >>>>> TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s >>>>> 2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | >>>>> INCLUDED | >>>>> /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml >>>>> | undercloud >>>>> 2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | >>>>> TASK | Set some tripleo-ansible facts >>>>> 2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | >>>>> OK | Set some tripleo-ansible facts | undercloud >>>>> 2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | >>>>> TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | >>>>> 0.03s >>>>> 2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | >>>>> TASK | Container image prepare >>>>> 2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | >>>>> FATAL | Container image prepare | *undercloud | error={"changed": >>>>> false, "error": "None: Max retries exceeded with url: /v2/ (Caused by >>>>> None)", "msg": "Error running container image prepare: None: Max retries >>>>> exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}* >>>>> 2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | >>>>> TIMING | tripleo_container_image_prepare : Container image prepare | >>>>> undercloud | 0:06:13.385607 | 72.12s >>>>> >>>>> This gets failed at step 1, As this is wallaby and based on the >>>>> document (Use an external Ceph cluster with the Overcloud ? TripleO >>>>> 3.0.0 documentation (openstack.org) >>>>> ) >>>>> we should only pass this external-ceph.yaml for the external ceph >>>>> intergration. >>>>> But it is not happening. >>>>> >>>>> >>>>> Few things to note: >>>>> 1. 
Container Prepare: >>>>> >>>>> (undercloud) [stack at undercloud ~]$ cat >>>>> containers-prepare-parameter.yaml >>>>> # Generated with the following on 2022-06-28T18:56:38.642315 >>>>> # >>>>> # openstack tripleo container image prepare default >>>>> --local-push-destination --output-env-file >>>>> /home/stack/containers-prepare-parameter.yaml >>>>> # >>>>> >>>>> >>>>> parameter_defaults: >>>>> ContainerImagePrepare: >>>>> - push_destination: true >>>>> set: >>>>> name_prefix: openstack- >>>>> name_suffix: '' >>>>> namespace: myserver.com:5000/tripleowallaby >>>>> neutron_driver: ovn >>>>> rhel_containers: false >>>>> tag: current-tripleo >>>>> tag_from_label: rdo_version >>>>> (undercloud) [stack at undercloud ~]$ >>>>> >>>>> 2. this is SSL based deployment. >>>>> >>>>> Any idea for the error, the issue is seen only once we have the >>>>> external ceph integration enabled. >>>>> >>>>> Best Regards, >>>>> Lokendra >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> ceph is supposed to be configured by this tripleo-ansible role [1], >>>>>> which is triggered by tht on external_deploy_steps [2]. >>>>>> In theory adding [3] should just work, assuming you customize the >>>>>> ceph cluster mon ip addresses, fsid and a few other related variables. >>>>>> From your previous email I suspect in your external-ceph.yaml you >>>>>> missed the TripleO resource OS::TripleO::Services::CephExternal: >>>>>> ../deployment/cephadm/ceph-client.yaml >>>>>> (see [3]). >>>>>> >>>>>> Thanks, >>>>>> Francesco >>>>>> >>>>>> >>>>>> [1] >>>>>> https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client >>>>>> [2] >>>>>> https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cephadm/ceph-client.yaml#L93 >>>>>> [3] >>>>>> https://github.com/openstack/tripleo-heat-templates/blob/master/environments/external-ceph.yaml >>>>>> >>>>>> On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour < >>>>>> lokendrarathour at gmail.com> wrote: >>>>>> >>>>>>> Hi Team, >>>>>>> I was trying to integrate External Ceph with Triple0 Wallaby, and at >>>>>>> the end of deployment in step4 getting the below error: >>>>>>> >>>>>>> 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 >>>>>>> 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | >>>>>>> Create containers from >>>>>>> /var/lib/tripleo-config/container-startup-config/step_4 >>>>>>> 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 >>>>>>> 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | >>>>>>> /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | >>>>>>> overcloud-controller-2 >>>>>>> 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 >>>>>>> 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | >>>>>>> Create containers managed by Podman for >>>>>>> /var/lib/tripleo-config/container-startup-config/step_4 >>>>>>> 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 >>>>>>> 18:37:24.530812 | | WARNING | >>>>>>> ERROR: Can't run container nova_libvirt_init_secret >>>>>>> stderr: >>>>>>> 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 >>>>>>> 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | >>>>>>> Create containers managed by Podman for >>>>>>> /var/lib/tripleo-config/container-startup-config/step_4 | >>>>>>> overcloud-novacompute-0 | error={"changed": false, "msg": "Failed >>>>>>> containers: nova_libvirt_init_secret"} >>>>>>> 
2022-08-03 18:37:44,282 p=507732 u >>>>>>> >>>>>>> >>>>>>> *external-ceph.conf:* >>>>>>> >>>>>>> parameter_defaults: >>>>>>> # Enable use of RBD backend in nova-compute >>>>>>> NovaEnableRbdBackend: True >>>>>>> # Enable use of RBD backend in cinder-volume >>>>>>> CinderEnableRbdBackend: True >>>>>>> # Backend to use for cinder-backup >>>>>>> CinderBackupBackend: ceph >>>>>>> # Backend to use for glance >>>>>>> GlanceBackend: rbd >>>>>>> # Name of the Ceph pool hosting Nova ephemeral images >>>>>>> NovaRbdPoolName: vms >>>>>>> # Name of the Ceph pool hosting Cinder volumes >>>>>>> CinderRbdPoolName: volumes >>>>>>> # Name of the Ceph pool hosting Cinder backups >>>>>>> CinderBackupRbdPoolName: backups >>>>>>> # Name of the Ceph pool hosting Glance images >>>>>>> GlanceRbdPoolName: images >>>>>>> # Name of the user to authenticate with the external Ceph cluster >>>>>>> CephClientUserName: admin >>>>>>> # The cluster FSID >>>>>>> CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' >>>>>>> # The CephX user auth key >>>>>>> CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' >>>>>>> # The list of Ceph monitors >>>>>>> CephExternalMonHost: >>>>>>> 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' >>>>>>> ~ >>>>>>> >>>>>>> >>>>>>> Have tried checking and validating the ceph client details and they >>>>>>> seem to be correct, further digging the container log I could see something >>>>>>> like this : >>>>>>> >>>>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>>>> nova_libvirt_init_secret.log >>>>>>> tail: cannot open 'nova_libvirt_init_secret.log' for reading: No >>>>>>> such file or directory >>>>>>> tail: no files remaining >>>>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>>>> stdouts/nova_libvirt_init_secret.log >>>>>>> 2022-08-04T11:48:47.689898197+05:30 stdout F >>>>>>> ------------------------------------------------ >>>>>>> 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh >>>>>>> secrets for: ceph:admin >>>>>>> 2022-08-04T11:48:47.690590594+05:30 stdout F Error: >>>>>>> /etc/ceph/ceph.conf was not found >>>>>>> 2022-08-04T11:48:47.690625088+05:30 stdout F Path to >>>>>>> nova_libvirt_init_secret was ceph:admin >>>>>>> 2022-08-04T16:20:29.643785538+05:30 stdout F >>>>>>> ------------------------------------------------ >>>>>>> 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh >>>>>>> secrets for: ceph:admin >>>>>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Error: >>>>>>> /etc/ceph/ceph.conf was not found >>>>>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Path to >>>>>>> nova_libvirt_init_secret was ceph:admin >>>>>>> ^C >>>>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>>>> stdouts/nova_compute_init_log.log >>>>>>> >>>>>>> -- >>>>>>> ~ Lokendra >>>>>>> skype: lokendrarathour >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Francesco Pantano >>>>>> GPG KEY: F41BD75C >>>>>> >>>>> >>>>> >>>>> -- >>>>> ~ Lokendra >>>>> skype: lokendrarathour >>>>> >>>>> >>>>> >>> >>> -- >>> ~ Lokendra >>> skype: lokendrarathour >>> >>> >>> > > -- > ~ Lokendra > skype: lokendrarathour > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lokendrarathour at gmail.com Thu Aug 25 14:49:14 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Thu, 25 Aug 2022 20:19:14 +0530 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed In-Reply-To: References: Message-ID: Hi John, thanks for the quick response. 
"/home/stack/templates/my-additional-ceph-settings.yaml " this file is adding backward compatible state for my External Ceph (Octopus) [stack at undercloud ~]$ cat templates/my-additional-ceph-settings.yaml parameter_defaults: ExtraConfig: ceph::profile::params::rbd_default_features: '1' [stack at undercloud ~]$ I also agree that Ceph has nothing to do with this error, but somehow we use to get this error earlier when we were using SSL + DNS I tried rerunning the command that it should run to create the required endpoints and it is running. Then I reexecuted the steps in Debug mode: pending results.... The full traceback is: File "/tmp/ansible_openstack.cloud.endpoint_payload_qhlqb_qw/ansible_openstack.cloud.endpoint_payload.zip/ansible_collections/open stack/cloud/plugins/module_utils/openstack.py", line 407, in __call__ results = self.run() File "/tmp/ansible_openstack.cloud.endpoint_payload_qhlqb_qw/ansible_openstack.cloud.endpoint_payload.zip/ansible_collections/open stack/cloud/plugins/modules/endpoint.py", line 150, in run File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", line 537, in get_service return _utils._get_entity(self, 'service', name_or_id, filters) File "/usr/lib/python3.6/site-packages/openstack/cloud/_utils.py", line 197, in _get_entity entities = search(name_or_id, filters, **kwargs) File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", line 517, in search_services services = self.list_services() File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", line 501, in list_services error_message="Failed to list services") File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 395, in get return self.request(url, 'GET', **kwargs) File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 668, in request return _json_response(response, error_message=error_message) File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 646, in _json_response exceptions.raise_from_response(response, error_message=error_message) File "/usr/lib/python3.6/site-packages/openstack/exceptions.py", line 238, in raise_from_response http_status=http_status, request_id=request_id 2022-08-25 19:38:03.201899 | 5254004d-021e-7578-cd65-000000007ad6 | FATAL | Create identity internal endpoint | undercloud | er ror={ "changed": false, "extra_data": { "data": null, "details": "The request you have made requires authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The request you have made requires authentication.\",\"title\":\"Unautho rized\"}}\n" }, "invocation": { "module_args": { "api_timeout": null, "auth": null, "auth_type": null, "availability_zone": null, "ca_cert": null, "client_cert": null, "client_key": null, "enabled": true, "endpoint_interface": "internal", "interface": "public", "region": "regionOne", "region_name": null, "service": "keystone", "state": "present", "timeout": 180, "url": "http://[fd00:fd00:fd00:2000::368]:5000", "validate_certs": null, "wait": true } }, "msg": "Failed to list services: Client Error for url: https://overcloud-public.myhsc.com:13000/v3/services, The request you hav e made requires authentication." 
} Checking further in the endpoint list I see: DeprecationWarning +----------------------------------+-----------+--------------+--------------+---------+-----------+------------------------------------------+ | ID | Region | Service Name | Service Type | Enabled | Interface | URL | +----------------------------------+-----------+--------------+--------------+---------+-----------+------------------------------------------+ | 11c9e71cf2e3482c9af47afcdab54472 | regionOne | keystone | identity | True | internal | http://[fd00:fd00:fd00:2000::368]:5000 | | 34fdd910a4e641e8897a7360b504bdba | regionOne | keystone | identity | True | public | https://overcloud-public.myhsc.com:13000 | | 770eeebb8e544a93a0215158c6c9b811 | regionOne | keystone | identity | True | admin | http://30.30.30.142:35357 | +----------------------------------+-----------+----------- As you can see this is the internal point it creates and then also it stated the error for the internal endpoints reported above. trying to debug more around it, do let me know please in case something specific you see here. thanks once again. Lokendra On Thu, Aug 25, 2022 at 7:02 PM John Fulton wrote: > On Thu, Aug 25, 2022 at 9:04 AM Lokendra Rathour < > lokendrarathour at gmail.com> wrote: > >> Hi John, >> Thanks for the inputs. Now I see something strange. >> Deployment with external ceph is unstable, >> > > I assume you're using Wallaby. > > There's a downstream job testing external ceph daily. The external ceph > feature of TripleO in Wallaby is stable. I think you have something else > going on that conflates with your use of external ceph. > > >> it got deployed once and we saw an error of VM not getting created >> because of some reasons, we were debugging and found that we found some NTP >> related observation, which we fixed and tried redeploying. >> Not it again got failed at step 4: >> > > External Ceph is already configured before step 4. You can inspect your > system after this failure to see that this role: > > > https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client > > has done its job of distributing cephx keys and a ceph.conf file into this > path: > > > https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/cephadm/ceph-client.yaml#L61 > > That should be all that doing a "-e > /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml" > results in. Maybe there's something else in > my-additional-ceph-settings.yaml that shouldn't be there that's causing > your overcloud to try to create an endpoint? I think that unlikely but I'm > trying to come up with an explanation for the correlation you're reporting. 
> > 2022-08-25 17:34:29.036371 | 5254004d-021e-d4db-067d-000000007b1a | TASK >> | Create identity internal endpoint >> 2022-08-25 17:34:31.176105 | 5254004d-021e-d4db-067d-000000007b1a | FATAL >> | Create identity internal endpoint | undercloud | error={"changed": false, >> "extra_data": {"data": null, "details": "The request you have made requires >> authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The >> request you have made requires authentication.\",\"title\":\"Unauthorized\"} >> }\n"}, "msg": "Failed to list services: Client Error for url: >> https://overcloud-public.mydomain.com:13000/v3/services >> , The request you >> have made requires authentication."} >> >> > The above is happening from this role: > > > https://github.com/openstack/tripleo-ansible/blob/e9cc12d4ce0b1c9e96b58f6102e8a3906ed9a1d3/tripleo_ansible/roles/tripleo_keystone_resources/tasks/admin.yml#L81-L87 > > > To revalidate the case, I tried a fresh setup and saw that deployment >> again failed at step 4. >> and when we remove external ceph from the deployment command, we see that >> deployment is happening 100%. >> I see authorization errors, which I used to get earlier as well, but >> because of DNS we were able to resolve this. >> what could be the reason for this when we are using External ceph ? >> > > I really don't think this is related to external ceph configuration. > Correlation does not always mean causality. > > any inputs would be helpful >> >> deploy command: >> >> stack at undercloud ~]$ cat deploy_step2.sh >> openstack overcloud deploy --templates \ >> -r /home/stack/templates/roles_data.yaml \ >> -n /home/stack/templates/custom_network_data.yaml \ >> -e /home/stack/templates/overcloud-baremetal-deployed.yaml \ >> -e /home/stack/templates/networks-deployed-environment.yaml \ >> -e /home/stack/templates/vip-deployed-environment.yaml \ >> -e /home/stack/templates/environment.yaml \ >> -e >> /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml >> \ >> -e >> /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml >> \ >> -e >> /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml >> \ >> -e /home/stack/templates/ironic-config.yaml \ >> -e >> /usr/share/openstack-tripleo-heat-templates/environments/services/ptp.yaml \ >> -e /home/stack/templates/enable-tls.yaml \ >> -e >> /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml >> \ >> -e /home/stack/templates/cloudname.yaml \ >> -e >> /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml >> \ >> -e >> /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml >> \ >> -e /home/stack/templates/my-additional-ceph-settings.yaml \ >> -e >> /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \ >> -e >> /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \ >> -e /home/stack/containers-prepare-parameter.yaml >> >> > Also, please re-arrange the order of your templates. > > Any path with /usr/share should be first. Then any path in /home should > then follow. 
> > openstack overcloud deploy --templates \ > -r /home/stack/templates/roles_data.yaml \ > -n /home/stack/templates/custom_network_data.yaml \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml > \ > > -e /home/stack/templates/overcloud-baremetal-deployed.yaml \ > > > You override the values in the templates tripleo ships (in /usr/share) > with your own values for your env (in /home/stack). > > If you include /usr/share after, then you could override your own custom > values. Just a best practice to rule out other issues. > > At this point I think you should find out what command is being executed > when this task runs: > > > https://github.com/openstack/tripleo-ansible/blob/e9cc12d4ce0b1c9e96b58f6102e8a3906ed9a1d3/tripleo_ansible/roles/tripleo_keystone_resources/tasks/admin.yml#L81-L87 > > Find out the values. Then run that command manually on the CLI of the > system where it is failing. At that point you'll have decoupled what the > deployment tool is doing vs the failing command on your system and share > that on the list. > > >> [stack at undercloud ~]$ cat >> /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml >> resource_registry: >> OS::TripleO::Services::CephExternal: >> ../deployment/cephadm/ceph-client.yaml >> >> parameter_defaults: >> # NOTE: These example parameters are required when using CephExternal >> CephClusterFSID: 'ca3080e3-aa3a-4d1a-b1fd-483459a9ea4c' >> CephClientKey: 'AQB2hMZi2u13NxAAVjmKopw+kNm6OnZOG7NktQ==' >> CephExternalMonHost: >> 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' >> >> # the following parameters enable Ceph backends for Cinder, Glance, >> Gnocchi and Nova >> NovaEnableRbdBackend: true >> CinderEnableRbdBackend: true >> CinderBackupBackend: ceph >> GlanceBackend: rbd >> # Uncomment below if enabling legacy telemetry >> # GnocchiBackend: rbd >> # If the Ceph pools which host VMs, Volumes and Images do not match >> these >> # names OR the client keyring to use is not named 'openstack', edit the >> # following as needed. >> NovaRbdPoolName: vms >> CinderRbdPoolName: volumes >> CinderBackupRbdPoolName: backups >> GlanceRbdPoolName: images >> # Uncomment below if enabling legacy telemetry >> # GnocchiRbdPoolName: metrics >> CephClientUserName: openstack >> >> # finally we disable the Cinder LVM backend >> CinderEnableIscsiBackend: false >> > > I recommend instead that you not modify > /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml > and instead include it and then after it that you override your own values. > > John > > >> >> >> On Fri, Aug 19, 2022 at 10:32 PM John Fulton wrote: >> >>> On Fri, Aug 19, 2022 at 3:45 AM Lokendra Rathour < >>> lokendrarathour at gmail.com> wrote: >>> >>>> Hi Fulton, >>>> Thanks for the inputs and apologies for the delay in response. 
>>>> to my surprise passing the container prepare in standard worked for me, >>>> new container-prepare is: >>>> >>>> parameter_defaults: >>>> ContainerImagePrepare: >>>> - push_destination: true >>>> set: >>>> ceph_alertmanager_image: alertmanager >>>> ceph_alertmanager_namespace: quay.ceph.io/prometheus >>>> ceph_alertmanager_tag: v0.16.2 >>>> ceph_grafana_image: grafana >>>> ceph_grafana_namespace: quay.ceph.io/app-sre >>>> ceph_grafana_tag: 6.7.4 >>>> ceph_image: daemon >>>> ceph_namespace: quay.io/ceph >>>> ceph_node_exporter_image: node-exporter >>>> ceph_node_exporter_namespace: quay.ceph.io/prometheus >>>> ceph_node_exporter_tag: v0.17.0 >>>> ceph_prometheus_image: prometheus >>>> ceph_prometheus_namespace: quay.ceph.io/prometheus >>>> ceph_prometheus_tag: v2.7.2 >>>> ceph_tag: v6.0.7-stable-6.0-pacific-centos-stream8 >>>> name_prefix: openstack- >>>> name_suffix: '' >>>> namespace: myserver.com:5000/tripleowallaby >>>> neutron_driver: ovn >>>> rhel_containers: false >>>> tag: current-tripleo >>>> tag_from_label: rdo_version >>>> >>>> But if we see or look at these containers I do not see any such >>>> containers available. we have tried looking at Undercloud and overcloud. >>>> >>> >>> The undercloud can download continers from the sources above and then >>> act as a container registry. It's described here: >>> >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/container_image_prepare.html >>> >>> >>>> Also, the deployment is done when we are passing this config. >>>> Thanks once again. >>>> >>>> Also, we need to understand some use cases of using the storage from >>>> this external ceph, which can work as the mount for the VM as direct or >>>> Shared storage. Any idea or available document which tells more about how >>>> to consume external Ceph in the existing triple Overcloud? >>>> >>> >>> Ceph can provide OpenStack Block, Object and File storage and TripleO >>> supports a variety of integration options for them. >>> >>> TripleO can deploy Ceph as part of the OpenStack overcloud: >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html >>> >>> TripleO can also deploy an OpenStack overcloud which uses an existing >>> external ceph cluster: >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/ceph_external.html >>> >>> At the end of both of these documents you can expect Glance, Nova, and >>> Cinder to use Ceph block storage (RBD). >>> >>> You can also have OpenStack use Ceph object storage (RGW). When RGW is >>> used, a command like "openstack container create foo" will create an object >>> storage container (not to be confused with podman/docker) on CephRGW as if >>> your overcloud were running OpenStack Swift. If you have TripleO deploy >>> Ceph as part of the OpenStack overcloud, RGW will be deployed and >>> configured for OpenStack object storage by default (in Wallaby+). >>> >>> The OpenStack Manila service can use CephFS as one of its backends. >>> TripleO can deploy that too as described here: >>> >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_manila.html >>> >>> John >>> >>> >>>> Do share in case you know any, please. 
>>>> >>>> Thanks once again for the support, it was really helpful >>>> >>>> >>>> On Thu, Aug 11, 2022 at 9:59 PM John Fulton >>>> wrote: >>>> >>>>> The ceph container should no longer be needed for external ceph >>>>> configuration (since the move from ceph-ansible to cephadm) but if removing >>>>> the ceph env files makes the error go away, then try adding it back and >>>>> then following these steps to prepare the ceph container on your undercloud >>>>> before deploying. >>>>> >>>>> >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html#container-options >>>>> >>>>> On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour < >>>>> lokendrarathour at gmail.com> wrote: >>>>> >>>>>> Hi Thanks, >>>>>> for the inputs, we could see the miss, >>>>>> now we have added the required miss : >>>>>> "TripleO resource >>>>>> OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml" >>>>>> >>>>>> Now with this setting if we deploy the setup in wallaby, we are >>>>>> getting error as: >>>>>> >>>>>> >>>>>> PLAY [External deployment step 1] >>>>>> ********************************************** >>>>>> 2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | >>>>>> TASK | External deployment step 1 >>>>>> 2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | >>>>>> OK | External deployment step 1 | undercloud -> localhost | result={ >>>>>> "changed": false, >>>>>> "msg": "Use --start-at-task 'External deployment step 1' to >>>>>> resume from this task" >>>>>> } >>>>>> [WARNING]: ('undercloud -> localhost', >>>>>> '525400d4-7124-4a42-664c-0000000000a8') >>>>>> missing from stats >>>>>> 2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | >>>>>> TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s >>>>>> 2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | >>>>>> INCLUDED | >>>>>> /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml >>>>>> | undercloud >>>>>> 2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | >>>>>> TASK | Set some tripleo-ansible facts >>>>>> 2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | >>>>>> OK | Set some tripleo-ansible facts | undercloud >>>>>> 2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | >>>>>> TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | >>>>>> 0.03s >>>>>> 2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | >>>>>> TASK | Container image prepare >>>>>> 2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | >>>>>> FATAL | Container image prepare | *undercloud | error={"changed": >>>>>> false, "error": "None: Max retries exceeded with url: /v2/ (Caused by >>>>>> None)", "msg": "Error running container image prepare: None: Max retries >>>>>> exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}* >>>>>> 2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | >>>>>> TIMING | tripleo_container_image_prepare : Container image prepare | >>>>>> undercloud | 0:06:13.385607 | 72.12s >>>>>> >>>>>> This gets failed at step 1, As this is wallaby and based on the >>>>>> document (Use an external Ceph cluster with the Overcloud ? TripleO >>>>>> 3.0.0 documentation (openstack.org) >>>>>> ) >>>>>> we should only pass this external-ceph.yaml for the external ceph >>>>>> intergration. >>>>>> But it is not happening. >>>>>> >>>>>> >>>>>> Few things to note: >>>>>> 1. 
Container Prepare: >>>>>> >>>>>> (undercloud) [stack at undercloud ~]$ cat >>>>>> containers-prepare-parameter.yaml >>>>>> # Generated with the following on 2022-06-28T18:56:38.642315 >>>>>> # >>>>>> # openstack tripleo container image prepare default >>>>>> --local-push-destination --output-env-file >>>>>> /home/stack/containers-prepare-parameter.yaml >>>>>> # >>>>>> >>>>>> >>>>>> parameter_defaults: >>>>>> ContainerImagePrepare: >>>>>> - push_destination: true >>>>>> set: >>>>>> name_prefix: openstack- >>>>>> name_suffix: '' >>>>>> namespace: myserver.com:5000/tripleowallaby >>>>>> neutron_driver: ovn >>>>>> rhel_containers: false >>>>>> tag: current-tripleo >>>>>> tag_from_label: rdo_version >>>>>> (undercloud) [stack at undercloud ~]$ >>>>>> >>>>>> 2. this is SSL based deployment. >>>>>> >>>>>> Any idea for the error, the issue is seen only once we have the >>>>>> external ceph integration enabled. >>>>>> >>>>>> Best Regards, >>>>>> Lokendra >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> ceph is supposed to be configured by this tripleo-ansible role [1], >>>>>>> which is triggered by tht on external_deploy_steps [2]. >>>>>>> In theory adding [3] should just work, assuming you customize the >>>>>>> ceph cluster mon ip addresses, fsid and a few other related variables. >>>>>>> From your previous email I suspect in your external-ceph.yaml you >>>>>>> missed the TripleO resource OS::TripleO::Services::CephExternal: >>>>>>> ../deployment/cephadm/ceph-client.yaml >>>>>>> (see [3]). >>>>>>> >>>>>>> Thanks, >>>>>>> Francesco >>>>>>> >>>>>>> >>>>>>> [1] >>>>>>> https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client >>>>>>> [2] >>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cephadm/ceph-client.yaml#L93 >>>>>>> [3] >>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/master/environments/external-ceph.yaml >>>>>>> >>>>>>> On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour < >>>>>>> lokendrarathour at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Team, >>>>>>>> I was trying to integrate External Ceph with Triple0 Wallaby, and >>>>>>>> at the end of deployment in step4 getting the below error: >>>>>>>> >>>>>>>> 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 >>>>>>>> 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | >>>>>>>> Create containers from >>>>>>>> /var/lib/tripleo-config/container-startup-config/step_4 >>>>>>>> 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 >>>>>>>> 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | >>>>>>>> /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | >>>>>>>> overcloud-controller-2 >>>>>>>> 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 >>>>>>>> 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | >>>>>>>> Create containers managed by Podman for >>>>>>>> /var/lib/tripleo-config/container-startup-config/step_4 >>>>>>>> 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 >>>>>>>> 18:37:24.530812 | | WARNING | >>>>>>>> ERROR: Can't run container nova_libvirt_init_secret >>>>>>>> stderr: >>>>>>>> 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 >>>>>>>> 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | >>>>>>>> Create containers managed by Podman for >>>>>>>> /var/lib/tripleo-config/container-startup-config/step_4 | >>>>>>>> overcloud-novacompute-0 | 
error={"changed": false, "msg": "Failed >>>>>>>> containers: nova_libvirt_init_secret"} >>>>>>>> 2022-08-03 18:37:44,282 p=507732 u >>>>>>>> >>>>>>>> >>>>>>>> *external-ceph.conf:* >>>>>>>> >>>>>>>> parameter_defaults: >>>>>>>> # Enable use of RBD backend in nova-compute >>>>>>>> NovaEnableRbdBackend: True >>>>>>>> # Enable use of RBD backend in cinder-volume >>>>>>>> CinderEnableRbdBackend: True >>>>>>>> # Backend to use for cinder-backup >>>>>>>> CinderBackupBackend: ceph >>>>>>>> # Backend to use for glance >>>>>>>> GlanceBackend: rbd >>>>>>>> # Name of the Ceph pool hosting Nova ephemeral images >>>>>>>> NovaRbdPoolName: vms >>>>>>>> # Name of the Ceph pool hosting Cinder volumes >>>>>>>> CinderRbdPoolName: volumes >>>>>>>> # Name of the Ceph pool hosting Cinder backups >>>>>>>> CinderBackupRbdPoolName: backups >>>>>>>> # Name of the Ceph pool hosting Glance images >>>>>>>> GlanceRbdPoolName: images >>>>>>>> # Name of the user to authenticate with the external Ceph cluster >>>>>>>> CephClientUserName: admin >>>>>>>> # The cluster FSID >>>>>>>> CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' >>>>>>>> # The CephX user auth key >>>>>>>> CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' >>>>>>>> # The list of Ceph monitors >>>>>>>> CephExternalMonHost: >>>>>>>> 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' >>>>>>>> ~ >>>>>>>> >>>>>>>> >>>>>>>> Have tried checking and validating the ceph client details and they >>>>>>>> seem to be correct, further digging the container log I could see something >>>>>>>> like this : >>>>>>>> >>>>>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>>>>> nova_libvirt_init_secret.log >>>>>>>> tail: cannot open 'nova_libvirt_init_secret.log' for reading: No >>>>>>>> such file or directory >>>>>>>> tail: no files remaining >>>>>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>>>>> stdouts/nova_libvirt_init_secret.log >>>>>>>> 2022-08-04T11:48:47.689898197+05:30 stdout F >>>>>>>> ------------------------------------------------ >>>>>>>> 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh >>>>>>>> secrets for: ceph:admin >>>>>>>> 2022-08-04T11:48:47.690590594+05:30 stdout F Error: >>>>>>>> /etc/ceph/ceph.conf was not found >>>>>>>> 2022-08-04T11:48:47.690625088+05:30 stdout F Path to >>>>>>>> nova_libvirt_init_secret was ceph:admin >>>>>>>> 2022-08-04T16:20:29.643785538+05:30 stdout F >>>>>>>> ------------------------------------------------ >>>>>>>> 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh >>>>>>>> secrets for: ceph:admin >>>>>>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Error: >>>>>>>> /etc/ceph/ceph.conf was not found >>>>>>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Path to >>>>>>>> nova_libvirt_init_secret was ceph:admin >>>>>>>> ^C >>>>>>>> [root at overcloud-novacompute-0 containers]# tail -f >>>>>>>> stdouts/nova_compute_init_log.log >>>>>>>> >>>>>>>> -- >>>>>>>> ~ Lokendra >>>>>>>> skype: lokendrarathour >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Francesco Pantano >>>>>>> GPG KEY: F41BD75C >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ~ Lokendra >>>>>> skype: lokendrarathour >>>>>> >>>>>> >>>>>> >>>> >>>> -- >>>> ~ Lokendra >>>> skype: lokendrarathour >>>> >>>> >>>> >> >> -- >> ~ Lokendra >> skype: lokendrarathour >> >> >> -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From m73hdi at gmail.com Thu Aug 25 16:51:08 2022 From: m73hdi at gmail.com (mahdi n) Date: Thu, 25 Aug 2022 21:21:08 +0430 Subject: question about skyline apiserver and console Message-ID: Hello, I want to install the Skyline UI but I don't know how to install it. Do I first install the skyline-apiserver and then the console, or do I only install the skyline-console UI? Where must I set the keystone URL? Skyline has no docs for development. Please help. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From ricolin at ricolky.com Thu Aug 25 17:48:38 2022 From: ricolin at ricolky.com (Rico Lin) Date: Fri, 26 Aug 2022 01:48:38 +0800 Subject: [election][heat][ptl] PTL non-candidacy Message-ID: Hi all, As the PTL election will soon start, I would like to share my statement on not planning to run another term of Heat PTL. And instead, I encourage anyone (core reviewer or not) who is interested to put their name on. I will definitely still stay around and help with reviews and patches. *Rico Lin* -------------- next part -------------- An HTML attachment was scrubbed... URL:
From damian at dabrowski.cloud Thu Aug 25 18:23:31 2022 From: damian at dabrowski.cloud (Damian Dabrowski) Date: Thu, 25 Aug 2022 20:23:31 +0200 Subject: Regarding Designate Installation in ansible Playbook In-Reply-To: References: Message-ID: hi, creating conf.d/designate.yml file will not trigger container(s) creation. You need to define where designate containers should be created by setting `dnsaas_hosts` variable in openstack_user_config.yml. On Thu, Aug 25, 2022 at 6:55 PM Adivya Singh wrote: > hi Team, > > I have made under conf.d a designate. yml file, and rerunning the setup of > Openstack-setuphost but still it is not installing a Designate Container in > OpenStack. > > Is there any Reason for this, Something i am doing right > > Regards > Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From haleyb.dev at gmail.com Thu Aug 25 19:07:36 2022 From: haleyb.dev at gmail.com (Brian Haley) Date: Thu, 25 Aug 2022 15:07:36 -0400 Subject: [neutron][elections] PTL non-candidacy for Antelope cycle In-Reply-To: References: Message-ID: <76f67794-d7f1-610e-4a23-c0860bae7552@gmail.com> Thanks for all your hard work as PTL Lajos! -Brian On 8/25/22 6:04 AM, Lajos Katona wrote: > Hi, > It was a great pleasure and honor to be Neutron PTL for 2 cycles, > thanks everybody for the help and support during this time > (actually not just for these cycles but for all). > > It's not just smoke and ruins around networking after my PTLship, > so after all I would say it was a success :-) > > My main focus was to keep the Neutron team as encouraging > and inclusive as possible and work on cooperation with new contributor > groups > and even with other projects of whom we are consuming in Openstack. > > It is time to change and allow new ideas and energies to form Networking. > > I remain and I hope I can help the community in the next cycles also. > > Cheers > Lajos Katona >
From amy at demarco.com Thu Aug 25 23:15:25 2022 From: amy at demarco.com (Amy Marrich) Date: Thu, 25 Aug 2022 18:15:25 -0500 Subject: [election][heat][ptl] PTL non-candidacy In-Reply-To: References: Message-ID: Rico thank you for all your hard work on Heat over the years. I can still remember meeting you the first time in Austin in the Heat room.
Amy On Thu, Aug 25, 2022 at 12:57 PM Rico Lin wrote: > Hi all, > > As the PTL election will soon start, I would like to share my statement on > not planning to run another term of Heat PTL. And instead, I encourage > anyone (core reviewer or not) who is interested to put their name on. > I will definitely still stay around and help with reviews and patches. > > *Rico Lin* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ildiko.vancsa at gmail.com Thu Aug 25 23:22:47 2022 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Thu, 25 Aug 2022 16:22:47 -0700 Subject: [ptg][edge]OpenStack at the Edge discussions Message-ID: <9A553237-F0C1-4E16-BBBB-0D501D4E2EA5@gmail.com> Hi, I?m reaching out to you in preparation to the upcoming PTG. The Edge Computing Group already started to prepare for the event and OpenStack came up during our conversations to have cross-project discussions with. The number of edge computing use cases in production is growing and OpenStack is a key infrastructure building block. Some edge related discussions regarding the project came up at previous PTGs and it would be great to continue and extend them. We covered areas in networking and looked into storage a little bit, and the question of finding a central place to store edge related documentation also came up. To identify new topics and follow up on ones we already started to discuss I added an OpenStack section to the Edge WG?s planning etherpad: https://etherpad.opendev.org/p/ecg-ptg-october-2022 If you are working on related features, have requirements or case studies to share, please add them to the above etherpad, so we can discuss them at the PTG and identify areas we can work together on to solve challenges and share information about exiting solutions. Please let me know if you have any questions or comments. Thanks, Ildik? From lokendrarathour at gmail.com Fri Aug 26 06:25:18 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Fri, 26 Aug 2022 11:55:18 +0530 Subject: [Wallaby] Deployment getting failed Randomly Message-ID: Hi Team, we were trying to deploy OpenStack wallaby, and we see that 4 out of 5 runs deployment is getting failed for mentioned below reasons: *Error:* 2022-08-25 17:34:29.036371 | 5254004d-021e-d4db-067d-000000007b1a | TASK | Create identity internal endpoint 2022-08-25 17:34:31.176105 | 5254004d-021e-d4db-067d-000000007b1a | FATAL | Create identity internal endpoint | undercloud | error={"changed": false, "extra_data": {"data": null, "details": "The request you have made requires authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The request you have made requires authentication.\",\"title\":\"Unauthorized\"} }\n"}, "msg": "Failed to list services: Client Error for url: overcloud-public.myhsc.com :13000/v3/services , The request you have made requires authentication."} Debug logs: pending results.... 
The full traceback is:
  File "/tmp/ansible_openstack.cloud.endpoint_payload_qhlqb_qw/ansible_openstack.cloud.endpoint_payload.zip/ansible_collections/openstack/cloud/plugins/module_utils/openstack.py", line 407, in __call__
    results = self.run()
  File "/tmp/ansible_openstack.cloud.endpoint_payload_qhlqb_qw/ansible_openstack.cloud.endpoint_payload.zip/ansible_collections/openstack/cloud/plugins/modules/endpoint.py", line 150, in run
  File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", line 537, in get_service
    return _utils._get_entity(self, 'service', name_or_id, filters)
  File "/usr/lib/python3.6/site-packages/openstack/cloud/_utils.py", line 197, in _get_entity
    entities = search(name_or_id, filters, **kwargs)
  File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", line 517, in search_services
    services = self.list_services()
  File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", line 501, in list_services
    error_message="Failed to list services")
  File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 395, in get
    return self.request(url, 'GET', **kwargs)
  File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 668, in request
    return _json_response(response, error_message=error_message)
  File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 646, in _json_response
    exceptions.raise_from_response(response, error_message=error_message)
  File "/usr/lib/python3.6/site-packages/openstack/exceptions.py", line 238, in raise_from_response
    http_status=http_status, request_id=request_id
2022-08-25 19:38:03.201899 | 5254004d-021e-7578-cd65-000000007ad6 | FATAL | Create identity internal endpoint | undercloud | error={
    "changed": false,
    "extra_data": {
        "data": null,
        "details": "The request you have made requires authentication.",
        "response": "{\"error\":{\"code\":401,\"message\":\"The request you have made requires authentication.\",\"title\":\"Unauthorized\"}}\n"
    },
    "invocation": {
        "module_args": {
            "api_timeout": null,
            "auth": null,
            "auth_type": null,
            "availability_zone": null,
            "ca_cert": null,
            "client_cert": null,
            "client_key": null,
            "enabled": true,
            "endpoint_interface": "internal",
            "interface": "public",
            "region": "regionOne",
            "region_name": null,
            "service": "keystone",
            "state": "present",
            "timeout": 180,
            "url": "http://[fd00:fd00:fd00:2000::368]:5000",
            "validate_certs": null,
            "wait": true
        }
    },
    "msg": "Failed to list services: Client Error for url: https://overcloud-public.myhsc.com:13000/v3/services, The request you have made requires authentication."
} DeployCommand that was used: openstack overcloud deploy --templates \ -r /home/stack/templates/roles_data.yaml \ -n /home/stack/templates/custom_network_data.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/services/ptp.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \ -e /home/stack/templates/overcloud-baremetal-deployed.yaml \ -e /home/stack/templates/networks-deployed-environment.yaml \ -e /home/stack/templates/vip-deployed-environment.yaml \ -e /home/stack/templates/environment.yaml \ -e /home/stack/templates/ironic-config.yaml \ -e /home/stack/templates/enable-tls.yaml \ -e /home/stack/templates/cloudname.yaml \ -e /home/stack/templates/my-additional-ceph-settings.yaml \ -e /home/stack/containers-prepare-parameter.yaml [Issue] we are seeing the deployment is working perfectly once and the same setting is not working perfectly in the second run. the failure rate is high. what can be the reasons behind this? *Note:* Before this, we were deploying with DNS and SSL and it was perfectly working fine in multiple reruns. But after SSL we have seen this random failure. -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... URL: From bxzhu_5355 at 163.com Fri Aug 26 06:50:16 2022 From: bxzhu_5355 at 163.com (Boxiang Zhu) Date: Fri, 26 Aug 2022 14:50:16 +0800 (CST) Subject: [heat] ERROR: You are not authorized to use stacks:global_index. Message-ID: <21523a9b.2c59.182d8ea66d2.Coremail.bxzhu_5355@163.com> Hi, I deployed the openstack with kolla-ansible. And the openstack_release of globals.yml is master. The version of openstackclient and heatclient is 5.8.0 and 3.0.0. I run command "source /etc/kolla/admin-openrc.sh" to export env of openstack. OS_PROJECT_DOMAIN_NAME=Default OS_USER_DOMAIN_NAME=Default OS_PROJECT_NAME=admin OS_TENANT_NAME=admin OS_USERNAME=admin OS_PASSWORD=xxxxxxxxx OS_AUTH_URL=http://192.168.100.10:5000 OS_INTERFACE=internal OS_ENDPOINT_TYPE=internalURL OS_MANILA_ENDPOINT_TYPE=internalURL OS_IDENTITY_API_VERSION=3 OS_REGION_NAME=RegionOne OS_AUTH_PLUGIN=password Then I try to list all stacks with command "openstack stack list --all-projects". But I got the error messages as followed: ERROR: You are not authorized to use stacks:global_index. I see the policy is "role:reader and system_scope:all". I think the user admin has role reader and also with system_scope:all. ? 
openstack role assignment list +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+--------+-----------+ | Role | User | Group | Project | Domain | System | Inherited | +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+--------+-----------+ | cd572da356fb4f7ca53c280802299eb0 | fccbdf34d33a407db1b53bed048d1187 | | 840500fb441a442fbcbca30d3a773b2c | | | False | | cd572da356fb4f7ca53c280802299eb0 | 70d3715e7e2246c08c901d0e96038443 | | | 0a6274ff7f994e8cb6f40e13b0d39ca2 | | False | | cd572da356fb4f7ca53c280802299eb0 | 5c100e870cbd4744af6e546fc9215a37 | | | | all | False | +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+--------+-----------+ ? openstack user show admin +---------------------+----------------------------------+ | Field | Value | +---------------------+----------------------------------+ | domain_id | default | | enabled | True | | id | 5c100e870cbd4744af6e546fc9215a37 | | name | admin | | options | {} | | password_expires_at | None | +---------------------+----------------------------------+ How can I get all the stacks for all projects? Thanks, Best Regards, Boxiang Zhu -------------- next part -------------- An HTML attachment was scrubbed... URL: From gryf73 at gmail.com Fri Aug 26 10:13:51 2022 From: gryf73 at gmail.com (Roman Dobosz) Date: Fri, 26 Aug 2022 12:13:51 +0200 Subject: [kuryr][elections] PTL candidacy for Antelope cycle Message-ID: <20220826121351.a5e36726c3e9dc4ee7cd6990@gmail.com> Hello, I would like to announce my candidacy to be the Kuryr PTL for the next Antelope cycle. I've been Kuryr contributor since the Ussuri release and became core reviewers member at the end of Victoria cycle. In the upcoming Antelope cycle, I'd like to focus on Kuryr stability and compatibility with upcoming Kubernetes release. Another thing which I'd like to move forward is to eliminate Kuryr services restarts. Thanks, Roman Dobosz (gryf) From rdhasman at redhat.com Fri Aug 26 10:31:37 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Fri, 26 Aug 2022 16:01:37 +0530 Subject: [Glance] PTL non-candidacy In-Reply-To: <4f65fec5-9c7d-6594-c543-6af3c36d56b2@gmail.com> References: <4f65fec5-9c7d-6594-c543-6af3c36d56b2@gmail.com> Message-ID: Thanks Abhishek for all your contributions and always helping me out with glance and glance store work. On Thu, Aug 25, 2022 at 9:10 PM Jay Bryant wrote: > Abhishek, > > Thanks for leading Glance all these cycles. Well done! > > Jay > > > On 8/25/2022 2:30 AM, Abhishek Kekane wrote: > > Hi all, > > I'm writing this email to let you know that I'm not going to run for > Glance PTL for the Antelope dev cycle, I think it's time for some new ideas > and new approaches so it's a good idea to hand over the hat of PTL to a new > member of the team. > > I've been serving Glance PTL since Ussuri, tried my best to keep Glance > stable, lots of changes have been made since then, my initial focus was to > improve glance-tempest coverage which we managed to do in past cycles. > > I would like to thank all the members of the Glance team and others for > supporting me during this period. My plan is to stick around, attend to my > duties as a glance core contributor, and support my successor in whatever > way I can to make for a smooth transition. 
> > Thank you once again! > > Cheers, > Abhishek Kekane > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.kavanagh at canonical.com Fri Aug 26 11:12:05 2022 From: alex.kavanagh at canonical.com (Alex Kavanagh) Date: Fri, 26 Aug 2022 12:12:05 +0100 Subject: [charms][elections] PTL candidacy for Antelope cycle Message-ID: Hi All I would like to put myself forward for another cycle for the Charms PTL for the Antelope cycle. I've been a contributor to OpenStack and Charms since 2016 and a core member since mid 2016. I've also been the PTL for the last 2 cycles. Why do I want to be PTL again? The team has again achieved a huge amount, testing and supporting the charms and packages across multiple Ubuntu LTS releases (bionic, focal and now, jammy) with Charms and OpenStack support back to queens. We've (almost) transitioned to the new Charmhub ( charmhub.io) and will continue to support Queens through to the upcoming Zed release, and onto Antelope. I'd like to continue to support the team as we march forward with the responsibilities of the PTL to smooth the processes. Thanks Alex (tinwood) -- Alex Kavanagh - Software Engineer OpenStack Engineering - Data Centre Development - Canonical Ltd -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Aug 26 13:16:39 2022 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 26 Aug 2022 09:16:39 -0400 Subject: Questions about High Availability setup In-Reply-To: <1488278267.675577.1661414307257.JavaMail.root@mailwas2> References: <1488278267.675577.1661414307257.JavaMail.root@mailwas2> Message-ID: Hi, 3 nodes requirements come from MySQL galera and RabbitMQ clustering because of quorum requirements ( it should be in odd numbers 1, 3, 5 etc..). Rest of components works without clustering and they live behind HAProxy LB for load sharing and redundancy. Someone else can add more details here if I missed something. On Thu, Aug 25, 2022 at 4:05 AM ??? wrote: > Hello > > > I have two questions about deploying openstack in high available setup > > Specifically, HA setup for controller nodes > > > 1. Are openstack services (being deployed on controller nodes) stateless? > > > Aside from non-openstack packages(galera/mysql, zeromq, ...) for > infrastructure, are openstack services stateless? > > For example, can I achieve high availability by deploying two nova-api > services to two separate controller nodes > > by load balacing API calls to them through HAproxy? > > Is this(load balancer) the way how openstack achieves high availability? > > > > 2. Why minimum 3 controller nodes for HA? > > > Is this solely due to etcd? > > > Thanks! > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m73hdi at gmail.com Fri Aug 26 14:44:14 2022 From: m73hdi at gmail.com (mahdi n) Date: Fri, 26 Aug 2022 19:14:14 +0430 Subject: Fwd: question about skyline apiserver and console In-Reply-To: References: Message-ID: *Hello, I want to install ui skyline but I don't know how to install* *Does the first install the skyline api server next console?* *or * *only install skyline console ui?* *where must I set the keystone url ?* *skyline have not docs fir develop* *please help* -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From senrique at redhat.com Fri Aug 26 16:14:34 2022 From: senrique at redhat.com (Sofia Enriquez) Date: Fri, 26 Aug 2022 13:14:34 -0300 Subject: [Nova] Can't Attach An Encrypted NFS (luksv1) Volume Message-ID: Hello Nova Team, eharney and I are working on NFS encrypted volume support [1] and [2]. I'm trying to attach an encrypted volume to a nova instance. However, this fails with "libvirt.libvirtError: internal error: unable to execute QEMU command 'blockdev-add': Image is not in qcow2 format" error. I'm using a Devstack environment (nova/cinder master branch). Last week I opened a bug report [3] and I'd like to raise some attention to it. I've proposed a WIP patch [4] with a possible solution that I'd like to get some feedback on or know if should follow a different approach. The problem is that is_luks() returns false for the NFS case, because NFS is using LUKS inside of qcow2 and not regular LUKS. Let me know what you think Thank you Sofia [1] https://review.opendev.org/c/openstack/cinder/+/597148 [2] https://review.opendev.org/c/openstack/cinder/+/749155 [3] https://bugs.launchpad.net/nova/+bug/1987311 [4] https://review.opendev.org/c/openstack/nova/+/854030 -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Aug 26 17:22:36 2022 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 26 Aug 2022 13:22:36 -0400 Subject: question about skyline apiserver and console In-Reply-To: References: Message-ID: Hi Mahdi, I have created blog on it, not sure if that is what you looking for https://satishdotpatel.github.io/openstack-skyline-dashborad/ On Fri, Aug 26, 2022 at 11:52 AM mahdi n wrote: > > > *Hello, I want to install ui skyline but I don't know how to install* > > *Does the first install the skyline api server next console?* > > *or * > *only install skyline console ui?* > > *where must I set the keystone url ?* > > *skyline have not docs fir develop* > *please help* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Fri Aug 26 17:36:14 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Fri, 26 Aug 2022 19:36:14 +0200 Subject: [release] Release countdown for week R-5, Aug 29 - Sep 02 Message-ID: Development Focus ----------------- We are getting close to the end of the Zed cycle! Next week on September 1st, 2022 is the Zed-3 milestone, also known as feature freeze. It's time to wrap up feature work in the services and their client libraries, and defer features that won't make it to the 2023.1 Antelope cycle. General Information ------------------- This coming week is the deadline for client libraries: their last feature release needs to happen before "Client library freeze" on September 1st, 2022. Only bugfix releases will be allowed beyond this point. When requesting those library releases, you can also include the stable/zed branching request with the review. As an example, see the "branches" section here: https://opendev.org/openstack/releases/src/branch/master/deliverables/pike/os-brick.yaml#n2 September 1st, 2022 is also the deadline for feature work in all OpenStack deliverables following the cycle-with-rc model. To help those projects produce a first release candidate in time, only bugfixes should be allowed in the master branch beyond this point. 
Any feature work past that deadline has to be raised as a Feature Freeze Exception (FFE) and approved by the team PTL. Finally, feature freeze is also the deadline for submitting a first version of your cycle-highlights. Cycle highlights are the raw data that helps shape what is communicated in press releases and other release activity at the end of the cycle, avoiding direct contacts from marketing folks. See https://docs.openstack.org/project-team-guide/release-management.html#cycle-highlights for more details. Upcoming Deadlines & Dates -------------------------- Zed-3 milestone (feature freeze): September 1st, 2022 (R-5 week) RC1 deadline: September 15th, 2022 (R-3 week) Final RC deadline: September 29th, 2022 (R-1 week) Final Zed release: October 5th, 2022 Next PTG: October 17-21, 2022 (virtual PTG!!!) El?d Ill?s irc: elodilles From eblock at nde.ag Fri Aug 26 17:38:45 2022 From: eblock at nde.ag (Eugen Block) Date: Fri, 26 Aug 2022 17:38:45 +0000 Subject: Questions about High Availability setup In-Reply-To: References: <1488278267.675577.1661414307257.JavaMail.root@mailwas2> Message-ID: <20220826173845.Horde.Bpt79QZiwkbBoa49JYk2oLQ@webmail.nde.ag> Hi, just in addition to the previous response, cinder-volume is a stateful service and there should be only one instance running. We configured it to be bound to the virtual IP controlled by pacemaker, pacemaker also controls all stateless services in our environment although it wouldn't be necessary. But that way we have all resources at one place and don't need to distinguish. Zitat von Satish Patel : > Hi, > > 3 nodes requirements come from MySQL galera and RabbitMQ clustering because > of quorum requirements ( it should be in odd numbers 1, 3, 5 etc..). Rest > of components works without clustering and they live behind HAProxy LB for > load sharing and redundancy. > > Someone else can add more details here if I missed something. > > On Thu, Aug 25, 2022 at 4:05 AM ??? wrote: > >> Hello >> >> >> I have two questions about deploying openstack in high available setup >> >> Specifically, HA setup for controller nodes >> >> >> 1. Are openstack services (being deployed on controller nodes) stateless? >> >> >> Aside from non-openstack packages(galera/mysql, zeromq, ...) for >> infrastructure, are openstack services stateless? >> >> For example, can I achieve high availability by deploying two nova-api >> services to two separate controller nodes >> >> by load balacing API calls to them through HAproxy? >> >> Is this(load balancer) the way how openstack achieves high availability? >> >> >> >> 2. Why minimum 3 controller nodes for HA? >> >> >> Is this solely due to etcd? >> >> >> Thanks! >> From ozzzo at yahoo.com Fri Aug 26 19:35:24 2022 From: ozzzo at yahoo.com (Albert Braden) Date: Fri, 26 Aug 2022 19:35:24 +0000 (UTC) Subject: [kolla] [nova] Rogue AggregateMultiTenancyIsolation filter References: <705832880.401721.1661542524485.ref@mail.yahoo.com> Message-ID: <705832880.401721.1661542524485@mail.yahoo.com> We're running kolla train, and we use the AggregateMultiTenancyIsolation for some aggregates by setting filter_tenant_id. Today customers reported build failures when they try to build VMs in a non-filtered region. 
I am able to duplicate the issue: os server create --image --flavor medium --network private --availability-zone open alberttest1 | 5dd44105-2045-4d53-be43-5f521ddb420b | alberttest1 | ERROR | | | medium | 2022-08-26 18:39:38.977 30 INFO nova.filters [req-342d065a-cd47-4edf-bc4b-3f84b34ab97c 25b53bdb96fb5f9f6e7331d7e03eee0a12c45746a9e8b978858b2140a5275a09 fdcf1553db504c8f82a2b54851a4c262 - 8793b235debf49e6aba6bd1e2bf65360 8793b235debf49e6aba6bd1e2bf65360] Filtering removed all hosts for the request with instance ID '5dd44105-2045-4d53-be43-5f521ddb420b'. Filter results: ['ComputeFilter: (start: 50, end: 50)', 'RetryFilter: (start: 50, end: 50)', 'AggregateNumInstancesFilter: (start: 50, end: 50)', 'AvailabilityZoneFilter: (start: 50, end: 6)', 'AggregateInstanceExtraSpecsFilter: (start: 6, end: 6)', 'ImagePropertiesFilter: (start: 6, end: 6)', 'ServerGroupAntiAffinityFilter: (start: 6, end: 6)', 'ServerGroupAffinityFilter: (start: 6, end: 6)', 'AggregateMultiTenancyIsolation: (start: 6, end: 0)'] Region "open" does not have any properties specified, so the AggregateMultiTenancyIsolation filter should not be active. qde3:admin]$ os aggregate show open|grep properties | properties | This is what we would see if it had the filter active: :qde3:admin]$ os aggregate show closed|grep properties | properties | filter_tenant_id='1c41e088b35f4b438023d081a6f70292,3e9727aaf03e4459a176c28dbdb3965e,f9b4b7dc8c614bb09d66657afc3b21cd,121a5da3dd0b489986908bee7eea61ae,d580ccc4b07e478a9efc2d71acf04cc1,107e14eeda01400988e58f5aac8b2772', closed='true' What could be causing this filter to remove hosts when we haven't set filter_tenant_id for that aggregate? From nate.johnston at redhat.com Fri Aug 26 20:00:44 2022 From: nate.johnston at redhat.com (Nate Johnston) Date: Fri, 26 Aug 2022 16:00:44 -0400 Subject: [neutron][elections] PTL non-candidacy for Antelope cycle In-Reply-To: References: Message-ID: <20220826200044.4t5e7mccd27hvcyo@toolbox> Lajos, You have been a great steward and leader of the Neutron community. Thanks and congratulations for a job well done! Nate On Thu, Aug 25, 2022 at 12:04:43PM +0200, Lajos Katona wrote: > Hi, > It was a great pleasure and honor to be Neutron PTL for 2 cycles, > thanks everybody for the help and support during this time > (actually not just for these cycles but for all). > > It's not just smoke and ruins around networking after my PTLship, > so after all I would say it was a success :-) > > My main focus was to keep the Neutron team as encouraging > and inclusive as possible and work on cooperation with new contributor > groups > and even with other projects of whom we are consuming in Openstack. > > It is time to change and allow new ideas and energies to form Networking. > > I remain and I hope I can help the community in the next cycles also. > > Cheers > Lajos Katona From gmann at ghanshyammann.com Fri Aug 26 20:49:45 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 27 Aug 2022 02:19:45 +0530 Subject: [all][tc] What's happening in Technical Committee: summary 2022 Aug 26: Reading: 5 min Message-ID: <182dbeaf7a6.cd0246b3208726.3073943015688881286@ghanshyammann.com> Hello Everyone, Here is this week's summary of the Technical Committee activities. 1. TC Meetings: ============ * We had this week's meeting on Aug 25. Most of the meeting discussions are summarized in this email. 
Meeting full logs are available @https://meetings.opendev.org/meetings/tc/2022/tc.2022-08-25-15.00.log.html
* Next TC weekly meeting will be on Sept 1 Thursday at 15:00 UTC; feel free to add topics to the agenda[1] by Aug 31.

2. What we completed this week:
=========================
* Retired openstack-helm-addons[2]

3. Activities In progress:
==================
TC Tracker for Zed cycle
------------------------------
* The Zed tracker etherpad includes the TC working items[3]; two are completed and the other items are in progress.

Open Reviews
-----------------
* Six open reviews for ongoing activities[4].

2023.1 cycle Technical Election (TC + PTL) planning
-------------------------------------------------------------
Nominations for the technical election (PTL and Technical Committee) are open now[5]. We are 1 week delayed with the election, so as per the TC charter we are adding this delay as an exception[6]. Please add your nomination before Aug 31 and also ping anyone you know who is interested in the PTL or TC position.

Define 2023.1 cycle testing runtime
-------------------------------------------
I have proposed the testing runtime for the 2023.1 cycle[7]; please review it and provide feedback if you have any.

2023.1 cycle TC PTG planning
------------------------------------
I have started the PTG planning for the TC sessions.
* Similar to the last couple of PTGs, we are planning "TC + Leader interaction" sessions[8]; please vote for your available day/time in the poll[9].
* I have created the etherpad[10] to collect topics for the TC slots and a poll to select the TC slot timing[11].
* As you know, we are planning 'Operator Hours' in this PTG, where developers and operators can meet for an hour for discussion and feedback. There is a poll to find out whether operators want to join the developer PTG slots or are interested in having their own space and letting developers jump in there for discussion[12].

2021 User Survey TC Question Analysis
-----------------------------------------------
No update on this. The survey summary is up for review[13]. Feel free to check it and provide feedback.

Zed cycle Leaderless projects
----------------------------------
Dale Smith volunteered to be PTL for the Adjutant project[14].

Fixing Zuul config error
----------------------------
Requesting projects with Zuul config errors to look into those and fix them, which should not take much time[15][16].

Project updates
-------------------
* Switch requirements to distributed leadership[17]

4. How to contact the TC:
====================
If you would like to discuss or give feedback to the TC, you can reach out to us in multiple ways:
1. Email: you can send the email with the tag [tc] on the openstack-discuss ML[18].
2. Weekly meeting: the Technical Committee conducts a weekly meeting every Thursday at 15 UTC[19].
3. Ping us using the 'tc-members' nickname on the #openstack-tc IRC channel.
[1] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
[2] https://review.opendev.org/c/openstack/governance/+/851859
[3] https://etherpad.opendev.org/p/tc-zed-tracker
[4] https://review.opendev.org/q/projects:openstack/governance+status:open
[5] https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030094.html
[6] https://review.opendev.org/c/openstack/governance/+/854624
[7] https://review.opendev.org/c/openstack/governance/+/854375
[8] https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030108.html
[9] https://framadate.org/zsOqRxfVcmtjaPBC
[10] https://etherpad.opendev.org/p/tc-2023-1-ptg
[11] https://framadate.org/yi8LNQaph5wrirks
[12] https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030114.html
[13] https://review.opendev.org/c/openstack/governance/+/836888
[14] https://review.opendev.org/c/openstack/governance/+/849606
[15] https://etherpad.opendev.org/p/zuul-config-error-openstack
[16] http://lists.openstack.org/pipermail/openstack-discuss/2022-May/028603.html
[17] https://review.opendev.org/c/openstack/governance/+/854685
[18] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss
[19] http://eavesdrop.openstack.org/#Technical_Committee_Meeting

-gmann

From bxzhu_5355 at 163.com Fri Aug 26 23:13:16 2022
From: bxzhu_5355 at 163.com (=?utf-8?B?5pyx5Y2a56Wl?=)
Date: Sat, 27 Aug 2022 07:13:16 +0800
Subject: question about skyline apiserver and console
In-Reply-To: References: Message-ID:

Hi,

For skyline-apiserver, we have now moved everything to the website[1]. You can install skyline-apiserver either from source or from a Docker image. For skyline-console, you can find the relevant information here[2] and here[3]. We are working on improving the documentation on the website[2].

[1] https://docs.openstack.org/skyline-apiserver/latest/
[2] https://docs.openstack.org/skyline-console/latest/
[3] https://opendev.org/openstack/skyline-console

> On 26 Aug 2022, at 01:24, mahdi n wrote:
>
> Hello, I want to install ui skyline but I don't know how to install
>
> Does the first install the skyline api server next console?
> or
> only install skyline console ui?
>
> where must I set the keystone url ?
>
> skyline have not docs fir develop
> please help

From lokendrarathour at gmail.com Sat Aug 27 02:53:33 2022
From: lokendrarathour at gmail.com (Lokendra Rathour)
Date: Sat, 27 Aug 2022 08:23:33 +0530
Subject: [Wallaby] Deployment getting failed Randomly
In-Reply-To: References: Message-ID:

Hi everyone,
Thanks once again for the support. The issue got resolved.

How did it get fixed?
The NTP time was not in sync; I noticed that recently NTP was not getting configured properly on the controller and compute nodes. After enabling the time sync and validating it, we redeployed and it worked fine.

Thanks,
Closing this query thread.
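A note for anyone hitting the same symptom: a quick way to confirm that time really is in sync on the overcloud nodes (this assumes chrony, which a default TripleO deployment configures; adjust for whatever time service you actually run) is to check on each controller and compute node:

  sudo systemctl status chronyd
  sudo chronyc tracking
  sudo chronyc sources -v

If "Leap status" reports "Not synchronised", or none of the sources are reachable, Keystone token validation can fail with 401 "The request you have made requires authentication" errors like the ones in the quoted deployment log below, because token validity is checked against the local clock.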
On Fri, 26 Aug 2022, 11:55 Lokendra Rathour, wrote: > > Hi Team, > we were trying to deploy OpenStack wallaby, and we see that 4 out of 5 > runs deployment is getting failed for mentioned below reasons: > *Error:* > > 2022-08-25 17:34:29.036371 | 5254004d-021e-d4db-067d-000000007b1a | TASK > | Create identity internal endpoint > 2022-08-25 17:34:31.176105 | 5254004d-021e-d4db-067d-000000007b1a | FATAL > | Create identity internal endpoint | undercloud | error={"changed": false, > "extra_data": {"data": null, "details": "The request you have made requires > authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The > request you have made requires authentication.\",\"title\":\"Unauthorized\"} > }\n"}, "msg": "Failed to list services: Client Error for url: > overcloud-public.myhsc.com > :13000/v3/services > , The request you > have made requires authentication."} > > > Debug logs: > > pending results.... > The full traceback is: > File > "/tmp/ansible_openstack.cloud.endpoint_payload_qhlqb_qw/ansible_openstack.cloud.endpoint_payload.zip/ansible_collections/open > stack/cloud/plugins/module_utils/openstack.py", line 407, in > __call__ > results = self.run() > File > "/tmp/ansible_openstack.cloud.endpoint_payload_qhlqb_qw/ansible_openstack.cloud.endpoint_payload.zip/ansible_collections/open > stack/cloud/plugins/modules/endpoint.py", line 150, in run > File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", > line 537, in get_service > return _utils._get_entity(self, 'service', name_or_id, filters) > File "/usr/lib/python3.6/site-packages/openstack/cloud/_utils.py", line > 197, in _get_entity > entities = search(name_or_id, filters, **kwargs) > File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", > line 517, in search_services > services = self.list_services() > File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", > line 501, in list_services > error_message="Failed to list services") > File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line > 395, in get > return self.request(url, 'GET', **kwargs) > File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 668, in > request > return _json_response(response, error_message=error_message) > File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 646, in > _json_response > exceptions.raise_from_response(response, error_message=error_message) > File "/usr/lib/python3.6/site-packages/openstack/exceptions.py", line > 238, in raise_from_response > http_status=http_status, request_id=request_id > 2022-08-25 19:38:03.201899 | 5254004d-021e-7578-cd65-000000007ad6 | > FATAL | Create identity internal endpoint | undercloud | er ror={ > "changed": false, > "extra_data": { > "data": null, > "details": "The request you have made requires authentication.", > "response": "{\"error\":{\"code\":401,\"message\":\"The request > you have made requires authentication.\",\"title\":\"Unautho > rized\"}}\n" > }, > "invocation": { > "module_args": { > "api_timeout": null, > "auth": null, > "auth_type": null, > "availability_zone": null, > "ca_cert": null, > "client_cert": null, > "client_key": null, > "enabled": true, > "endpoint_interface": "internal", > "interface": "public", > "region": "regionOne", > "region_name": null, > "service": "keystone", > "state": "present", > "timeout": 180, > "url": "http://[fd00:fd00:fd00:2000::368]:5000", > "validate_certs": null, > "wait": true > } > }, > "msg": "Failed to list services: Client Error for url: > 
https://overcloud-public.myhsc.com:13000/v3/services, The request you > have made requires authentication." > } > > DeployCommand that was used: > > openstack overcloud deploy --templates \ > -r /home/stack/templates/roles_data.yaml \ > -n /home/stack/templates/custom_network_data.yaml \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml > \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml > \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml > \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/services/ptp.yaml \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml > \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml > \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml > \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \ > -e > /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \ > -e /home/stack/templates/overcloud-baremetal-deployed.yaml \ > -e /home/stack/templates/networks-deployed-environment.yaml \ > -e /home/stack/templates/vip-deployed-environment.yaml \ > -e /home/stack/templates/environment.yaml \ > -e /home/stack/templates/ironic-config.yaml \ > -e /home/stack/templates/enable-tls.yaml \ > -e /home/stack/templates/cloudname.yaml \ > -e /home/stack/templates/my-additional-ceph-settings.yaml \ > -e /home/stack/containers-prepare-parameter.yaml > > [Issue] > we are seeing the deployment is working perfectly once and the same > setting is not working perfectly in the second run. > the failure rate is high. > > what can be the reasons behind this? > > *Note:* > Before this, we were deploying with DNS and SSL and it was perfectly > working fine in multiple reruns. > But after SSL we have seen this random failure. > > > -- > ~ Lokendra > skype: lokendrarathour > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lokendrarathour at gmail.com Sat Aug 27 02:57:02 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Sat, 27 Aug 2022 08:27:02 +0530 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed In-Reply-To: References: Message-ID: Hi John, It got resolved, reason was NTP. The NTP time was not in sync., i noticed thay recently the NTP is not getting configured properly on the controller and xompute nodes. After enabling thr time sync and validation we redeployed and it worked fine. I have another querry w.r.t to storage integration with a tripleo. We have noticed that only passing the external-ceph.yaml is not doing the deployment, we also need to pass ceph parameters in container-prepare. We did see some containers getting downloaded as well but after the deployment is done we do not see them anywhere. What can be the reason for such containers if not used ? 
Any point would help me further ensure 100% offline tripleO On Thu, 4 Aug 2022, 17:07 Lokendra Rathour, wrote: > Hi Team, > I was trying to integrate External Ceph with Triple0 Wallaby, and at the > end of deployment in step4 getting the below error: > > 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 > 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | > Create containers from > /var/lib/tripleo-config/container-startup-config/step_4 > 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 > 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | > /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | > overcloud-controller-2 > 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 > 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | > Create containers managed by Podman for > /var/lib/tripleo-config/container-startup-config/step_4 > 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 > 18:37:24.530812 | | WARNING | > ERROR: Can't run container nova_libvirt_init_secret > stderr: > 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 > 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | > Create containers managed by Podman for > /var/lib/tripleo-config/container-startup-config/step_4 | > overcloud-novacompute-0 | error={"changed": false, "msg": "Failed > containers: nova_libvirt_init_secret"} > 2022-08-03 18:37:44,282 p=507732 u > > > *external-ceph.conf:* > > parameter_defaults: > # Enable use of RBD backend in nova-compute > NovaEnableRbdBackend: True > # Enable use of RBD backend in cinder-volume > CinderEnableRbdBackend: True > # Backend to use for cinder-backup > CinderBackupBackend: ceph > # Backend to use for glance > GlanceBackend: rbd > # Name of the Ceph pool hosting Nova ephemeral images > NovaRbdPoolName: vms > # Name of the Ceph pool hosting Cinder volumes > CinderRbdPoolName: volumes > # Name of the Ceph pool hosting Cinder backups > CinderBackupRbdPoolName: backups > # Name of the Ceph pool hosting Glance images > GlanceRbdPoolName: images > # Name of the user to authenticate with the external Ceph cluster > CephClientUserName: admin > # The cluster FSID > CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' > # The CephX user auth key > CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' > # The list of Ceph monitors > CephExternalMonHost: > 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' > ~ > > > Have tried checking and validating the ceph client details and they seem > to be correct, further digging the container log I could see something like > this : > > [root at overcloud-novacompute-0 containers]# tail -f > nova_libvirt_init_secret.log > tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such file > or directory > tail: no files remaining > [root at overcloud-novacompute-0 containers]# tail -f > stdouts/nova_libvirt_init_secret.log > 2022-08-04T11:48:47.689898197+05:30 stdout F > ------------------------------------------------ > 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh secrets > for: ceph:admin > 2022-08-04T11:48:47.690590594+05:30 stdout F Error: /etc/ceph/ceph.conf > was not found > 2022-08-04T11:48:47.690625088+05:30 stdout F Path to > nova_libvirt_init_secret was ceph:admin > 2022-08-04T16:20:29.643785538+05:30 stdout F > ------------------------------------------------ > 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh secrets > for: ceph:admin > 
2022-08-04T16:20:29.644785532+05:30 stdout F Error: /etc/ceph/ceph.conf > was not found > 2022-08-04T16:20:29.644785532+05:30 stdout F Path to > nova_libvirt_init_secret was ceph:admin > ^C > [root at overcloud-novacompute-0 containers]# tail -f > stdouts/nova_compute_init_log.log >
>
> --
> ~ Lokendra
> skype: lokendrarathour
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From john.vanommen at gmail.com Sat Aug 27 20:02:57 2022
From: john.vanommen at gmail.com (John van Ommen)
Date: Sat, 27 Aug 2022 13:02:57 -0700
Subject: What is the Traffic Flow between Horizon and Swift?
Message-ID:

When a user is using Horizon, and they're examining the contents of their containers in Swift, what does the traffic flow look like?

I am unable to browse the contents of my Swift containers from Horizon. When I click on the contents of the container I get: "Unable to get the objects in the container"

And when I click on the "Services" tab in Horizon, I get: "Unable to get Nova services list" and "Unable to get Cinder services list"

John van Ommen
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From naveen.j at simnovus.com Sun Aug 28 00:58:48 2022
From: naveen.j at simnovus.com (Naveen j)
Date: Sun, 28 Aug 2022 06:28:48 +0530
Subject: Lagging of vm
Message-ID:

Hi Team,
When I create a VM, my VMs lag heavily whenever I run any application in Ubuntu.
Regards
Naveen j
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hanguangyu2 at gmail.com Sun Aug 28 09:18:00 2022
From: hanguangyu2 at gmail.com (=?UTF-8?B?6Z+p5YWJ5a6H?=)
Date: Sun, 28 Aug 2022 17:18:00 +0800
Subject: [nova] Could I boot instance from volume snapshot in cmd?
Message-ID:

Hello,

I want to boot an instance from a volume snapshot on the command line, but I never found it in `openstack server create --help` or https://docs.openstack.org/nova/latest/user/launch-instances.html.

Can I do it from the command line? I saw that this can be done in Horizon.

Thanks,
Han Guangyu

From tkajinam at redhat.com Sun Aug 28 13:52:22 2022
From: tkajinam at redhat.com (Takashi Kajinami)
Date: Sun, 28 Aug 2022 22:52:22 +0900
Subject: [heat] ERROR: You are not authorized to use stacks:global_index.
In-Reply-To: <21523a9b.2c59.182d8ea66d2.Coremail.bxzhu_5355@163.com>
References: <21523a9b.2c59.182d8ea66d2.Coremail.bxzhu_5355@163.com>
Message-ID:

Hello,

Your admin-openrc.sh includes OS_PROJECT_NAME and OS_TENANT_NAME. This means you are using project scope instead of system scope.
If you want to use system scope access, you should remove these two variables and set OS_SYSTEM_SCOPE=all instead.

> I see the policy is "role:reader and system_scope:all". I think the user admin has role reader
> and also with system_scope:all.
Policy rule enforcement is applied based on the scope used in API access. In your case you use a project scope token to access the Heat API, so the system scope role assignment is NOT populated.

Also, unfortunately the Heat API does not allow the CLI to use system scope because of the project_id/tenant_id template in its endpoint url, which can't be resolved when system scope is used.
If you want to use system scope to access the Heat API then you will likely need to implement your own tool or use a raw HTTP client such as curl.

Thank you,
Takashi

On Fri, Aug 26, 2022 at 4:08 PM Boxiang Zhu wrote: > > Hi, > > I deployed the openstack with kolla-ansible. And the openstack_release of > globals.yml is master. > The version of openstackclient and heatclient is 5.8.0 and 3.0.0.
> > I run command "source /etc/kolla/admin-openrc.sh" to export env of > openstack. > OS_PROJECT_DOMAIN_NAME=Default > OS_USER_DOMAIN_NAME=Default > OS_PROJECT_NAME=admin > OS_TENANT_NAME=admin > OS_USERNAME=admin > OS_PASSWORD=xxxxxxxxx > OS_AUTH_URL=http://192.168.100.10:5000 > OS_INTERFACE=internal > OS_ENDPOINT_TYPE=internalURL > OS_MANILA_ENDPOINT_TYPE=internalURL > OS_IDENTITY_API_VERSION=3 > OS_REGION_NAME=RegionOne > OS_AUTH_PLUGIN=password > > Then I try to list all stacks with command "openstack stack list > --all-projects". But I got the error > messages as followed: > *ERROR: You are not authorized to use stacks:global_index.* > > I see the policy is "role:reader and system_scope:all". I think the user > admin has role reader > and also with system_scope:all. > ? openstack role assignment list > > +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+--------+-----------+ > | Role | User | > Group | Project | Domain > | System | Inherited | > > +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+--------+-----------+ > | cd572da356fb4f7ca53c280802299eb0 | fccbdf34d33a407db1b53bed048d1187 | > | 840500fb441a442fbcbca30d3a773b2c | | > | False | > | cd572da356fb4f7ca53c280802299eb0 | 70d3715e7e2246c08c901d0e96038443 | > | | 0a6274ff7f994e8cb6f40e13b0d39ca2 | > | False | > | cd572da356fb4f7ca53c280802299eb0 | *5c100e870cbd4744af6e546fc9215a37* > | | | > | *all *| False | > > +----------------------------------+----------------------------------+-------+----------------------------------+----------------------------------+--------+-----------+ > ? openstack user show admin > +---------------------+----------------------------------+ > | Field | Value | > +---------------------+----------------------------------+ > | domain_id | default | > | enabled | True | > | id | *5c100e870cbd4744af6e546fc9215a37* | > | name | admin | > | options | {} | > | password_expires_at | None | > +---------------------+----------------------------------+ > > How can I get all the stacks for all projects? > > Thanks, > Best Regards, > > Boxiang Zhu > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.kavanagh at canonical.com Sun Aug 28 14:16:13 2022 From: alex.kavanagh at canonical.com (Alex Kavanagh) Date: Sun, 28 Aug 2022 15:16:13 +0100 Subject: [charms] Nominate Luciano Giudice for charms-ceph core In-Reply-To: <5719a7c8-7739-4acd-2d3a-422615d23af1@canonical.com> References: <5719a7c8-7739-4acd-2d3a-422615d23af1@canonical.com> Message-ID: On Tue, 9 Aug 2022 at 14:39, Chris MacNaughton < chris.macnaughton at canonical.com> wrote: > Hello all, > > I'd like to propose Luciano as a new Ceph charms core team member. He > has contributed quality changes over the last year, and has been > providing quality reviews for the Ceph charms. > > patches: > https://review.opendev.org/q/owner:luciano.logiudice%2540canonical.com > reviews: > https://review.opendev.org/q/reviewedby:luciano.logiudice%2540canonical.com > > I hope you will join me in supporting Luciano. > I think Luciano will make a great member of the Ceph charms core team and thus it's a +1 from me. Cheers Alex. > Chris MacNaughton > > -- Alex Kavanagh - Software Engineer OpenStack Engineering - Data Centre Development - Canonical Ltd -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tkajinam at redhat.com Sun Aug 28 15:05:37 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Mon, 29 Aug 2022 00:05:37 +0900 Subject: [infra][puppet] Old mirror contents in apt-puppetlabs Message-ID: Hello Infra team, I noticed the contents in the apt-puppetlabs directory in our CI mirror are old. The mirror repository provides puppet 6.23 while the upstream repository provides newer versions such as 6.28. Recently we bumped puppetlabs-mysql in our CI to 13.0.0 which requires puppet >= 6.24.0 and our Ubuntu jobs are failing now at a quite early stage because of the old puppet package. May someone please look into this ? I've checked mirror.iad3.inmotion.opendev.org and mirror.bhs1.ovh.opendev.org but it seems the contents in the directory have not been synced since July, 2021. Thank you, Takashi -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Sun Aug 28 16:14:19 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Mon, 29 Aug 2022 01:14:19 +0900 Subject: [xena][placement] Xena placement upgrade leads to 500 on ubuntu focal In-Reply-To: References: Message-ID: We've been facing this error in the ubuntu jobs in Puppet OpenStack project and it seems the issue is caused by the policy.yaml provided by the packages. I've reported a bug against their packaging bug tracker. I have zero knowledge about Ubuntu packaging but hopefully someone from the package maintainers can look into it. https://bugs.launchpad.net/ubuntu/+source/placement/+bug/1987984 You might want to try clearing the policy.yaml file and see whether that solves your problem. On Sat, Aug 20, 2022 at 1:08 AM CHANU ROMAIN wrote: > Hello, > > I just did upgrade my Placement to Xena on Ubuntu Focal (20.04). When I > tried to start the process I got this error and all HTTP requests receive > an HTTP 500 error: > > > 2022-08-19 15:05:43.960573 2022-08-19 15:05:43.960 43 INFO > placement.requestlog [req-f4c4d4f1-5d59-49d3-aa3e-1e8a09fe02fe > 3ec54dee59424109913d4628ae8dac4c 19e62bc767484849a2763937883a256e - default > default] 192.168.236.5 "GET > /resource_providers/791c09ed-57f3-4bfc-9278-4af6c5c137d8/allocations" > status: 500 len: 244 microversion: 1.0\x1b[00m > 2022-08-19 15:05:44.094951 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap [req-527dce52-c207-43ee-80f2-016a6f031cf5 > 3ec54dee59424109913d4628ae8dac4c 19e62bc767484849a2763937883a256e - default > default] Placement API unexpected error: unsupported callable: TypeError: > unsupported callable > 2022-08-19 15:05:44.094973 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap Traceback (most recent call last): > 2022-08-19 15:05:44.094977 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 1135, in > getfullargspec > 2022-08-19 15:05:44.094980 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap sig = _signature_from_callable(func, > 2022-08-19 15:05:44.094998 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2233, in > _signature_from_callable > 2022-08-19 15:05:44.095001 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap sig = _signature_from_callable( > 2022-08-19 15:05:44.095004 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2304, in > _signature_from_callable > 2022-08-19 15:05:44.095007 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return _signature_from_function(sigcls, obj, > 2022-08-19 15:05:44.095010 2022-08-19 
15:05:44.077 46 ERROR > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2168, in > _signature_from_function > 2022-08-19 15:05:44.095013 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap parameters.append(Parameter(name, > annotation=annotation, > 2022-08-19 15:05:44.095015 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2491, in > __init__ > 2022-08-19 15:05:44.095018 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap self._kind = _ParameterKind(kind) > 2022-08-19 15:05:44.095021 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap RecursionError: maximum recursion depth exceeded > 2022-08-19 15:05:44.095024 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap > 2022-08-19 15:05:44.095026 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap The above exception was the direct cause of the > following exception: > 2022-08-19 15:05:44.095029 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap > 2022-08-19 15:05:44.095032 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap Traceback (most recent call last): > 2022-08-19 15:05:44.095035 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/placement/fault_wrap.py", line 39, in > __call__ > 2022-08-19 15:05:44.095038 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return self.application(environ, start_response) > 2022-08-19 15:05:44.095040 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File "/usr/lib/python3/dist-packages/webob/dec.py", > line 129, in __call__ > 2022-08-19 15:05:44.095043 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap resp = self.call_func(req, *args, **kw) > 2022-08-19 15:05:44.095046 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File "/usr/lib/python3/dist-packages/webob/dec.py", > line 193, in call_func > 2022-08-19 15:05:44.095049 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return self.func(req, *args, **kwargs) > 2022-08-19 15:05:44.095052 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/microversion_parse/middleware.py", line 80, > in __call__ > 2022-08-19 15:05:44.095055 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap response = req.get_response(self.application) > 2022-08-19 15:05:44.095057 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/webob/request.py", line 1313, in send > 2022-08-19 15:05:44.095060 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap status, headers, app_iter = self.call_application( > 2022-08-19 15:05:44.095063 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/webob/request.py", line 1278, in > call_application > 2022-08-19 15:05:44.095065 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap app_iter = application(self.environ, start_response) > 2022-08-19 15:05:44.095068 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/placement/handler.py", line 215, in __call__ > 2022-08-19 15:05:44.095071 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return dispatch(environ, start_response, self._map) > 2022-08-19 15:05:44.095074 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/placement/handler.py", line 149, in dispatch > 2022-08-19 15:05:44.095077 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return handler(environ, start_response) > 2022-08-19 15:05:44.095083 
2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File "/usr/lib/python3/dist-packages/webob/dec.py", > line 129, in __call__ > 2022-08-19 15:05:44.095086 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap resp = self.call_func(req, *args, **kw) > 2022-08-19 15:05:44.095089 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/placement/wsgi_wrapper.py", line 29, in > call_func > 2022-08-19 15:05:44.095092 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap super(PlacementWsgify, self).call_func(req, *args, > **kwargs) > 2022-08-19 15:05:44.095094 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File "/usr/lib/python3/dist-packages/webob/dec.py", > line 193, in call_func > 2022-08-19 15:05:44.095097 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return self.func(req, *args, **kwargs) > 2022-08-19 15:05:44.095100 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/placement/util.py", line 64, in > decorated_function > 2022-08-19 15:05:44.095103 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return f(req) > 2022-08-19 15:05:44.095106 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/placement/handlers/allocation.py", line > 299, in list_for_resource_provider > 2022-08-19 15:05:44.098861 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098864 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098867 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in > __call__ > 2022-08-19 15:05:44.098870 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): > 2022-08-19 15:05:44.098873 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098876 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098893 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in > __call__ > 2022-08-19 15:05:44.098896 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): > 2022-08-19 15:05:44.098899 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098905 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098907 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 255, in > __call__ > 2022-08-19 15:05:44.098910 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return _check( > 2022-08-19 15:05:44.098913 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098916 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098919 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 255, in > __call__ > 2022-08-19 15:05:44.098922 
2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return _check( > 2022-08-19 15:05:44.098925 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098928 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098930 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in > __call__ > 2022-08-19 15:05:44.098933 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): > 2022-08-19 15:05:44.098936 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098939 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098941 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in > __call__ > 2022-08-19 15:05:44.098944 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): > 2022-08-19 15:05:44.098947 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098950 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098952 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 255, in > __call__ > 2022-08-19 15:05:44.098955 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return _check( > 2022-08-19 15:05:44.098958 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098960 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098963 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 255, in > __call__ > 2022-08-19 15:05:44.098966 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return _check( > 2022-08-19 15:05:44.098968 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098971 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098974 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in > __call__ > 2022-08-19 15:05:44.098977 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): > 2022-08-19 15:05:44.098980 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098982 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098985 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in > __call__ > 2022-08-19 15:05:44.098991 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): > 2022-08-19 
15:05:44.098993 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check > 2022-08-19 15:05:44.098996 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return rule(*rule_args) > 2022-08-19 15:05:44.098999 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 255, in > __call__ > 2022-08-19 15:05:44.099002 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap return _check( > 2022-08-19 15:05:44.099004 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File > "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 75, in _check > 2022-08-19 15:05:44.099007 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap argspec = inspect.getfullargspec(rule.__call__) > 2022-08-19 15:05:44.099010 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 1144, in > getfullargspec > 2022-08-19 15:05:44.099013 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap raise TypeError('unsupported callable') from ex > 2022-08-19 15:05:44.099016 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap TypeError: unsupported callable > 2022-08-19 15:05:44.099018 2022-08-19 15:05:44.077 46 ERROR > placement.fault_wrap \x1b[00m > > Placement-api is deployed in a container, so I got a fresh policy.yaml > file. > Did someone already face this? Do you have any idea how to fix this? > > Best regards, > Romain > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Sun Aug 28 16:17:55 2022 From: cboylan at sapwetik.org (Clark Boylan) Date: Sun, 28 Aug 2022 09:17:55 -0700 Subject: [infra][puppet] Old mirror contents in apt-puppetlabs In-Reply-To: References: Message-ID: <4647f88d-1737-4832-8cd5-b38a05937b40@www.fastmail.com> On Sun, Aug 28, 2022, at 8:05 AM, Takashi Kajinami wrote: > Hello Infra team, > > > I noticed the contents in the apt-puppetlabs directory in our CI mirror > are old. > The mirror repository provides puppet 6.23 while the upstream > repository provides > newer versions such as 6.28. > > Recently we bumped puppetlabs-mysql in our CI to 13.0.0 which requires > puppet >= 6.24.0 > and our Ubuntu jobs are failing now at a quite early stage because of > the old puppet package. > > May someone please look into this ? I've checked > mirror.iad3.inmotion.opendev.org and > mirror.bhs1.ovh.opendev.org but it seems the contents in the directory > have not been synced > since July, 2021. Regardless of the mirror server the content is served from a shared AFS filesystem. This means checking one is as good as any other. Logs for reprepro are also stored on AFS and served by the mirror servers: https://mirror.ord.rax.opendev.org/logs/reprepro/apt-puppetlabs.log. The logs show that there is a bad component and an expired key. If you track down what a correct component list and valid key are you can update our reprepro role [0][1][2] to fix this. 
[0] https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/reprepro [1] https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/reprepro/files/apt-puppetlabs/config/updates [2] https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/reprepro/tasks/puppetlabs.yaml > > Thank you, > Takashi From fungi at yuggoth.org Sun Aug 28 16:34:46 2022 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 28 Aug 2022 16:34:46 +0000 Subject: [nova] Could I boot instance from volume snapshot in cmd? In-Reply-To: References: Message-ID: <20220828163446.saiwzcygn426ipms@yuggoth.org> On 2022-08-28 17:18:00 +0800 (+0800), ??? wrote: > I want boot instance from volume snapshot in cmd, but I never found it > in `openstack server create --help` and > https://docs.openstack.org/nova/latest/user/launch-instances.html. > > Could I do it in cmd? I saw that this can be done on horizon I'm not clear exactly what you're looking for, but the `openstack server create` subcommand has several related options. If you want to boot from a snapshot image use --snapshot, for booting from a server image with a new volume backing use --boot-from-volume, and to boot from an existing volume which contains a server snapshot use --volume: https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-volume I think you may be asking about the --volume option? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From vincet at iastate.edu Sun Aug 28 15:03:13 2022 From: vincet at iastate.edu (Lee, Vincent Z) Date: Sun, 28 Aug 2022 15:03:13 +0000 Subject: containers ad instances are not working Message-ID: [https://res.cdn.office.net/assets/mail/file-icon/png/txt_16x16.png]nova-compute 1.log Hi all, I am quite new to Openstack and faced some issues with creating instances and displaying created containers on my Openstack dashboard. I am currently working on multinode Openstack. I am hoping to get some helps and feedbacks about this. I will briefly go through the problems I encountered. When I created an instances on my dashboard, it directly went into error state. I have attached a screenshot as shown below. [cid:6d3cb41a-9176-4209-8e12-d669e05ecf5b] This is my list of hosts. [cid:11a45ab0-b7f8-4c79-a619-131d01121687] When I created a container using both cli and dashboard, two containers were running. However, when I tried to look into my dashboard, those containers were not shown. These are the created containers. [cid:1b05b862-8e1d-4833-a3e0-57b921fd6351] However, when i try to open my dashboard and look for them, they just don't appear on it. So I am not sure what is causing this. [cid:0eff5e44-5690-4590-9682-43410e2d52f1] The version of my Openstack deployment is 21.2.4. I deployed my Openstack using the master branch (latest). I have followed the guidelines provided online (https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html) which uses kolla-ansible. In my /var/log/kolla/zun/ directory, I have only a total of six different log files. [cid:4b28ae60-c42a-4cd4-a063-49431d8b5334] I have attached a sample of my globals.yml, multinode, zun-api and compute-api log file in this email. The below image is a sample of how I setup my openstack as shown below. [cid:8cc989cd-b334-4647-8084-048b499e229e] The below image is the reference of my local network. 
My controller
[controller screenshots attached]

My compute host
[compute host screenshots attached]

Hope to hear from everyone soon.

Best regards,
Vincent
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
[13 screenshot attachments (image.png) and the globals.yml, multinode and zun-api.log attachments were scrubbed]

From fungi at yuggoth.org Sun Aug 28 17:03:19 2022
From: fungi at yuggoth.org (Jeremy Stanley)
Date: Sun, 28 Aug 2022 17:03:19 +0000
Subject: containers ad instances are not working
In-Reply-To: References: Message-ID: <20220828170319.6oos5bqj5fqarip5@yuggoth.org>

On 2022-08-28 15:03:13 +0000 (+0000), Lee, Vincent Z wrote:
> I am quite new to Openstack and faced some issues with creating
> instances and displaying created containers on my Openstack
> dashboard. I am currently working on multinode Openstack. I am
> hoping to get some helps and feedbacks about this. I will briefly
> go through the problems I encountered.
[...]
It looks like this is probably the same set of questions you posted last week. Please see the earlier replies to that post which included suggestions and requests for additional details: https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030118.html https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030119.html -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From tkajinam at redhat.com Sun Aug 28 17:30:51 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Mon, 29 Aug 2022 02:30:51 +0900 Subject: [infra][puppet] Old mirror contents in apt-puppetlabs In-Reply-To: <4647f88d-1737-4832-8cd5-b38a05937b40@www.fastmail.com> References: <4647f88d-1737-4832-8cd5-b38a05937b40@www.fastmail.com> Message-ID: Thanks for the pointers ! It seems the system-config repository contains the old gpg key which expired in August 2021[1]. I've pushed the change[2] to replace the expired key by the new key. [1] https://puppet.com/blog/updated-puppet-gpg-signing-key-2020-edition/ [2] https://review.opendev.org/c/opendev/system-config/+/854923/ On Mon, Aug 29, 2022 at 1:37 AM Clark Boylan wrote: > On Sun, Aug 28, 2022, at 8:05 AM, Takashi Kajinami wrote: > > Hello Infra team, > > > > > > I noticed the contents in the apt-puppetlabs directory in our CI mirror > > are old. > > The mirror repository provides puppet 6.23 while the upstream > > repository provides > > newer versions such as 6.28. > > > > Recently we bumped puppetlabs-mysql in our CI to 13.0.0 which requires > > puppet >= 6.24.0 > > and our Ubuntu jobs are failing now at a quite early stage because of > > the old puppet package. > > > > May someone please look into this ? I've checked > > mirror.iad3.inmotion.opendev.org and > > mirror.bhs1.ovh.opendev.org but it seems the contents in the directory > > have not been synced > > since July, 2021. > > Regardless of the mirror server the content is served from a shared AFS > filesystem. This means checking one is as good as any other. > > Logs for reprepro are also stored on AFS and served by the mirror servers: > https://mirror.ord.rax.opendev.org/logs/reprepro/apt-puppetlabs.log. The > logs show that there is a bad component and an expired key. If you track > down what a correct component list and valid key are you can update our > reprepro role [0][1][2] to fix this. > > [0] > https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/reprepro > [1] > https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/reprepro/files/apt-puppetlabs/config/updates > [2] > https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/reprepro/tasks/puppetlabs.yaml > > > > > Thank you, > > Takashi > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincet at iastate.edu Sun Aug 28 18:10:05 2022 From: vincet at iastate.edu (test) Date: Sun, 28 Aug 2022 13:10:05 -0500 Subject: container and instances are not working on my dashboard Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0CC204F9ED8E49AD84010BE572D6E9CE.png Type: image/png Size: 63434 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 0F7063AC37BF40DA8B5301326868D997.png Type: image/png Size: 271147 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1CF05EC12CA848228B243B9B3085B1D6.png Type: image/png Size: 26178 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 251D5EBCB3A942F19F9D3FBA87B7A8D4.png Type: image/png Size: 36726 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 35CC35660AC54A2BBCCFDBC5D94CB452.png Type: image/png Size: 19935 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 62CACB6773724BB58D55AFFF14A479EC.png Type: image/png Size: 210 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 77755569B4FD493382DE33F40835A03C.png Type: image/png Size: 94703 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 8B83F7C11D404F168487A79D15252EBF.png Type: image/png Size: 38740 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 8DE202E560884D938BE37D334AD4163B.png Type: image/png Size: 21512 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: C7175FD2EB21447C8D85A98CFE959A64.png Type: image/png Size: 28231 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: D2F212DFD79247C8802E8718B5A8D3C9.png Type: image/png Size: 60148 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: DDD344EDB2F44E6CB5620FB1843C8463.png Type: image/png Size: 19836 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: F5C92D1B11B14A53A6553D67EA7CBB33.png Type: image/png Size: 45714 bytes Desc: not available URL: From fv at spots.edu Mon Aug 29 06:23:34 2022 From: fv at spots.edu (fv at spots.edu) Date: Sun, 28 Aug 2022 23:23:34 -0700 Subject: [openstack-ansible] Converged compute ans ceph storage Message-ID: Hello everyone! Is it possible with OpenStack-Ansible to deploy converged nova compute and ceph storage on a single node? Thank you! From alsotoes at gmail.com Mon Aug 29 06:52:47 2022 From: alsotoes at gmail.com (Alvaro Soto) Date: Mon, 29 Aug 2022 01:52:47 -0500 Subject: [openstack-ansible] Converged compute ans ceph storage In-Reply-To: References: Message-ID: Maybe this will help you. https://abayard.com/openstack-kolla-deploy-external-ceph-ansible/ On Mon, Aug 29, 2022 at 1:39 AM wrote: > Hello everyone! > > Is it possible with OpenStack-Ansible to deploy converged nova compute > and ceph storage on a single node? > > Thank you! > > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bcafarel at redhat.com Mon Aug 29 07:26:29 2022 From: bcafarel at redhat.com (Bernard Cafarelli) Date: Mon, 29 Aug 2022 09:26:29 +0200 Subject: [neutron][elections] PTL non-candidacy for Antelope cycle In-Reply-To: References: Message-ID: On Thu, 25 Aug 2022 at 12:09, Lajos Katona wrote: > Hi, > It was a great pleasure and honor to be Neutron PTL for 2 cycles, > thanks everybody for the help and support during this time > (actually not just for these cycles but for all). > > It's not just smoke and ruins around networking after my PTLship, > so after all I would say it was a success :-) > I think all previous replies show it was, thanks for your PTL cycles! And all other contributions of course > > My main focus was to keep the Neutron team as encouraging > and inclusive as possible and work on cooperation with new contributor > groups > and even with other projects of whom we are consuming in Openstack. > > It is time to change and allow new ideas and energies to form Networking. > > I remain and I hope I can help the community in the next cycles also. > Counting on that! > > Cheers > Lajos Katona > > -- Bernard Cafarelli -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Mon Aug 29 08:33:49 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 29 Aug 2022 10:33:49 +0200 Subject: [kolla] [nova] Rogue AggregateMultiTenancyIsolation filter In-Reply-To: <705832880.401721.1661542524485@mail.yahoo.com> References: <705832880.401721.1661542524485.ref@mail.yahoo.com> <705832880.401721.1661542524485@mail.yahoo.com> Message-ID: Hello, First, a point about terminology: you use the term "region", but I think you meant an availability zone. Regions are something entirely different in OpenStack. I would suggest the following: - first, enable debug logging in Nova. You can do so in Kolla by setting nova_logging_debug to true and reconfiguring nova. As you can see in the code, there are several log statements at the debug level which would help understand why candidate hosts are rejected by this filter: https://opendev.org/openstack/nova/src/branch/stable/train/nova/scheduler/filters/aggregate_multitenancy_isolation.py - second, maybe check if you have other aggregates that are set to the "open" AZ and would have the filter_tenant_id property on them? On Fri, 26 Aug 2022 at 21:51, Albert Braden wrote: > We're running kolla train, and we use the AggregateMultiTenancyIsolation > for some aggregates by setting filter_tenant_id. Today customers reported > build failures when they try to build VMs in a non-filtered region. I am > able to duplicate the issue: > > os server create --image --flavor medium --network private > --availability-zone open alberttest1 > > | 5dd44105-2045-4d53-be43-5f521ddb420b | alberttest1 | ERROR | | > | medium | > > 2022-08-26 18:39:38.977 30 INFO nova.filters > [req-342d065a-cd47-4edf-bc4b-3f84b34ab97c > 25b53bdb96fb5f9f6e7331d7e03eee0a12c45746a9e8b978858b2140a5275a09 > fdcf1553db504c8f82a2b54851a4c262 - 8793b235debf49e6aba6bd1e2bf65360 > 8793b235debf49e6aba6bd1e2bf65360] Filtering removed all hosts for the > request with instance ID '5dd44105-2045-4d53-be43-5f521ddb420b'. 
Filter > results: ['ComputeFilter: (start: 50, end: 50)', 'RetryFilter: (start: 50, > end: 50)', 'AggregateNumInstancesFilter: (start: 50, end: 50)', > 'AvailabilityZoneFilter: (start: 50, end: 6)', > 'AggregateInstanceExtraSpecsFilter: (start: 6, end: 6)', > 'ImagePropertiesFilter: (start: 6, end: 6)', > 'ServerGroupAntiAffinityFilter: (start: 6, end: 6)', > 'ServerGroupAffinityFilter: (start: 6, end: 6)', > 'AggregateMultiTenancyIsolation: (start: 6, end: 0)'] > > Region "open" does not have any properties specified, so the > AggregateMultiTenancyIsolation filter should not be active. > > qde3:admin]$ os aggregate show open|grep properties > | properties | > > This is what we would see if it had the filter active: > > :qde3:admin]$ os aggregate show closed|grep properties > | properties | > filter_tenant_id='1c41e088b35f4b438023d081a6f70292,3e9727aaf03e4459a176c28dbdb3965e,f9b4b7dc8c614bb09d66657afc3b21cd,121a5da3dd0b489986908bee7eea61ae,d580ccc4b07e478a9efc2d71acf04cc1,107e14eeda01400988e58f5aac8b2772', > closed='true' > > What could be causing this filter to remove hosts when we haven't set > filter_tenant_id for that aggregate? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.rohmann at inovex.de Mon Aug 29 08:47:17 2022 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Mon, 29 Aug 2022 10:47:17 +0200 Subject: [neutron] Switching the ML2 driver in-place from linuxbridge to OVN for an existing Cloud In-Reply-To: <2446920.D5JjJbiaP6@p1> References: <2446920.D5JjJbiaP6@p1> Message-ID: <4318fbe5-f0f7-34eb-f852-15a6fb6810a6@inovex.de> Thanks Slawek for your quick response! On 23/08/2022 07:47, Slawek Kaplonski wrote: >> 1) Are the data models of the user managed resources abstract (enough) >> from the ML2 used? >> So would the composition of a router, a network, some subnets, a few >> security group and a few instances in a project just result in a >> different instantiation of packet handling components, >> but be otherwise transparent to the user? > Yes, data models are the same so all networks, routers, subnets will be the same but implemented differently by different backend. > The only significant difference may be network types as OVN works mostly with Geneve tunnel networks and with LB backend You are using VXLAN IIUC your email. That is reassuring. Yes we currently use VXLAN. But even with the same type of tunneling, I suppose the networks and their IDs will not align to form a proper layer 2 domain, not even talking about all the other services like DHCP or metadata. See next question about my idea to at least have some gradual switchover. >> 2) What could be possible migration strategies? >> >> [...] Or project by project by changing the network agents over >> to nodes already running OVN? > Even if You will keep vxlan networks with OVN backend (support is kind of limited really) You will not be able to have tunnels established between nodes with different backends so there will be no connectivity between VMs on hosts with different backends. I was more thinking to move all of a projects resources to network nodes (and hypervisors) which already run OVN. So split the cloud in two classes of machines, one set unchanged running Linuxbridge and the other in OVN mode. To migrate "a project" all agents of that projects routers and networks will be changed over to agents running on OVN-powered nodes.... So this would be a hard cut-over, but limited to a single project. 
In alternative to replacing all of the network agents on all nodes and for all projects at the same time. Wouldn't that work? - in theory - or am I missing something obvious here? >> Has anybody ever done something similar or heard about this being done >> anywhere? > I don't know about anyone who did that but if there is someone, I would be happy to hear about how it was done and how it went :) We will certainly share our story - if we live to talk about it ;-) Thanks again, With kind regards Christian -------------- next part -------------- An HTML attachment was scrubbed... URL: From romain.chanu at univ-lyon1.fr Mon Aug 29 08:50:53 2022 From: romain.chanu at univ-lyon1.fr (CHANU ROMAIN) Date: Mon, 29 Aug 2022 08:50:53 +0000 Subject: [xena][placement] Xena placement upgrade leads to 500 on ubuntu focal In-Reply-To: References: Message-ID: Hello, Thank you for your answer. Yes I found your ticket on heat's story. Comment out all lines did fix the issue. Best regards, Romain On Mon, 2022-08-29 at 01:14 +0900, Takashi Kajinami wrote: > We've been facing this error in the ubuntu jobs in Puppet OpenStack > project > and it seems the issue is caused by the policy.yaml provided by the > packages. > > I've reported a bug against their packaging bug tracker. I have zero > knowledge > about Ubuntu packaging but hopefully someone from the package > maintainers > can look into it. > ?https://bugs.launchpad.net/ubuntu/+source/placement/+bug/1987984 > > You might want to try clearing the policy.yaml file and see whether > that solves > your problem. > > > On Sat, Aug 20, 2022 at 1:08 AM CHANU ROMAIN > wrote: > > Hello, > > > > I just did upgrade my Placement to Xena on Ubuntu Focal (20.04). > > When I tried to start the process I got this error and all HTTP > > requests receive an HTTP 500 error: > > > > > > 2022-08-19 15:05:43.960573 2022-08-19 15:05:43.960 43 INFO > > placement.requestlog [req-f4c4d4f1-5d59-49d3-aa3e-1e8a09fe02fe > > 3ec54dee59424109913d4628ae8dac4c 19e62bc767484849a2763937883a256e - > > default default] 192.168.236.5 "GET /resource_providers/791c09ed- > > 57f3-4bfc-9278-4af6c5c137d8/allocations" status: 500 len: 244 > > microversion: 1.0\x1b[00m > > 2022-08-19 15:05:44.094951 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap [req-527dce52-c207-43ee-80f2-016a6f031cf5 > > 3ec54dee59424109913d4628ae8dac4c 19e62bc767484849a2763937883a256e - > > default default] Placement API unexpected error: unsupported > > callable: TypeError: unsupported callable > > 2022-08-19 15:05:44.094973 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap Traceback (most recent call last): > > 2022-08-19 15:05:44.094977 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line > > 1135, in getfullargspec > > 2022-08-19 15:05:44.094980 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap sig = _signature_from_callable(func, > > 2022-08-19 15:05:44.094998 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line > > 2233, in _signature_from_callable > > 2022-08-19 15:05:44.095001 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap sig = _signature_from_callable( > > 2022-08-19 15:05:44.095004 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line > > 2304, in _signature_from_callable > > 2022-08-19 15:05:44.095007 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return _signature_from_function(sigcls, obj, > > 2022-08-19 15:05:44.095010 2022-08-19 
15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line > > 2168, in _signature_from_function > > 2022-08-19 15:05:44.095013 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap parameters.append(Parameter(name, > > annotation=annotation, > > 2022-08-19 15:05:44.095015 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line > > 2491, in __init__ > > 2022-08-19 15:05:44.095018 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap self._kind = _ParameterKind(kind) > > 2022-08-19 15:05:44.095021 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap RecursionError: maximum recursion depth > > exceeded > > 2022-08-19 15:05:44.095024 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap > > 2022-08-19 15:05:44.095026 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap The above exception was the direct cause of > > the following exception: > > 2022-08-19 15:05:44.095029 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap > > 2022-08-19 15:05:44.095032 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap Traceback (most recent call last): > > 2022-08-19 15:05:44.095035 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/placement/fault_wrap.py", line 39, in __call__ > > 2022-08-19 15:05:44.095038 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return self.application(environ, > > start_response) > > 2022-08-19 15:05:44.095040 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/webob/dec.py", line 129, in __call__ > > 2022-08-19 15:05:44.095043 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap resp = self.call_func(req, *args, **kw) > > 2022-08-19 15:05:44.095046 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/webob/dec.py", line 193, in call_func > > 2022-08-19 15:05:44.095049 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return self.func(req, *args, **kwargs) > > 2022-08-19 15:05:44.095052 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/microversion_parse/middleware.py", line 80, in __call__ > > 2022-08-19 15:05:44.095055 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap response = req.get_response(self.application) > > 2022-08-19 15:05:44.095057 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/webob/request.py", line 1313, in send > > 2022-08-19 15:05:44.095060 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap status, headers, app_iter = > > self.call_application( > > 2022-08-19 15:05:44.095063 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/webob/request.py", line 1278, in call_application > > 2022-08-19 15:05:44.095065 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap app_iter = application(self.environ, > > start_response) > > 2022-08-19 15:05:44.095068 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/placement/handler.py", line 215, in __call__ > > 2022-08-19 15:05:44.095071 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return dispatch(environ, start_response, > > self._map) > > 2022-08-19 15:05:44.095074 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/placement/handler.py", line 149, in dispatch > > 2022-08-19 
15:05:44.095077 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return handler(environ, start_response) > > 2022-08-19 15:05:44.095083 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/webob/dec.py", line 129, in __call__ > > 2022-08-19 15:05:44.095086 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap resp = self.call_func(req, *args, **kw) > > 2022-08-19 15:05:44.095089 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/placement/wsgi_wrapper.py", line 29, in call_func > > 2022-08-19 15:05:44.095092 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap super(PlacementWsgify, self).call_func(req, > > *args, **kwargs) > > 2022-08-19 15:05:44.095094 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/webob/dec.py", line 193, in call_func > > 2022-08-19 15:05:44.095097 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return self.func(req, *args, **kwargs) > > 2022-08-19 15:05:44.095100 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/placement/util.py", line 64, in decorated_function > > 2022-08-19 15:05:44.095103 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return f(req) > > 2022-08-19 15:05:44.095106 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/placement/handlers/allocation.py", line 299, in > > list_for_resource_provider > > 2022-08-19 15:05:44.098861 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098864 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098867 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 211, in __call__ > > 2022-08-19 15:05:44.098870 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap if _check(rule, target, cred, enforcer, > > current_rule): > > 2022-08-19 15:05:44.098873 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098876 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098893 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 211, in __call__ > > 2022-08-19 15:05:44.098896 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap if _check(rule, target, cred, enforcer, > > current_rule): > > 2022-08-19 15:05:44.098899 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098905 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098907 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 255, in __call__ > > 2022-08-19 15:05:44.098910 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return _check( > > 2022-08-19 15:05:44.098913 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098916 2022-08-19 
15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098919 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 255, in __call__ > > 2022-08-19 15:05:44.098922 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return _check( > > 2022-08-19 15:05:44.098925 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098928 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098930 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 211, in __call__ > > 2022-08-19 15:05:44.098933 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap if _check(rule, target, cred, enforcer, > > current_rule): > > 2022-08-19 15:05:44.098936 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098939 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098941 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 211, in __call__ > > 2022-08-19 15:05:44.098944 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap if _check(rule, target, cred, enforcer, > > current_rule): > > 2022-08-19 15:05:44.098947 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098950 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098952 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 255, in __call__ > > 2022-08-19 15:05:44.098955 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return _check( > > 2022-08-19 15:05:44.098958 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098960 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098963 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 255, in __call__ > > 2022-08-19 15:05:44.098966 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return _check( > > 2022-08-19 15:05:44.098968 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098971 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098974 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 211, in __call__ > > 2022-08-19 15:05:44.098977 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap if _check(rule, target, cred, enforcer, > > current_rule): > > 2022-08-19 15:05:44.098980 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098982 2022-08-19 
15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098985 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 211, in __call__ > > 2022-08-19 15:05:44.098991 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap if _check(rule, target, cred, enforcer, > > current_rule): > > 2022-08-19 15:05:44.098993 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 80, in _check > > 2022-08-19 15:05:44.098996 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return rule(*rule_args) > > 2022-08-19 15:05:44.098999 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 255, in __call__ > > 2022-08-19 15:05:44.099002 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap return _check( > > 2022-08-19 15:05:44.099004 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3/dist- > > packages/oslo_policy/_checks.py", line 75, in _check > > 2022-08-19 15:05:44.099007 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap argspec = > > inspect.getfullargspec(rule.__call__) > > 2022-08-19 15:05:44.099010 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line > > 1144, in getfullargspec > > 2022-08-19 15:05:44.099013 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap raise TypeError('unsupported callable') from > > ex > > 2022-08-19 15:05:44.099016 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap TypeError: unsupported callable > > 2022-08-19 15:05:44.099018 2022-08-19 15:05:44.077 46 ERROR > > placement.fault_wrap \x1b[00m > > > > Placement-api is deployed in a container, so I got a fresh > > policy.yaml file. > > Did someone already face this? Do you have any idea how to fix > > this? > > > > Best regards, > > Romain -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4513 bytes Desc: not available URL: From zigo at debian.org Mon Aug 29 09:06:58 2022 From: zigo at debian.org (Thomas Goirand) Date: Mon, 29 Aug 2022 11:06:58 +0200 Subject: [requirement][all] Nose being removed from Debian and Ubuntu: let's remove it from our global requirements Message-ID: <797beb6d-b10e-315c-3288-179fedf1f7a5@debian.org> Hi, It's no surprise for anyone that's been following it: Nose was deprecated years ago, and now, we're trying to remove it from Debian. Therefore, 49 bugs were filled against the Debian OpenStack package. It'd be nice if upstream OpenStack was also following this move, and removing nose from global-requirements. I'm not sure if it'd be a big deal if that wasn't the case, as I could use a different test runner in Debian, but I very much would prefer if it was the case. Probably Ubuntu will follow the same path, since Nose isn't in main (it's in Universe, meaning Debian maintained...). Your thoughts? Cheers, Thomas Goirand (zigo) From park0kyung0won at dgist.ac.kr Mon Aug 29 09:14:27 2022 From: park0kyung0won at dgist.ac.kr (=?UTF-8?B?67CV6rK97JuQ?=) Date: Mon, 29 Aug 2022 18:14:27 +0900 (KST) Subject: Openstack OVN (Open Virtual Network) HA deployment - running OVN in active/passive mode? 
Message-ID: <986740040.720983.1661764467358.JavaMail.root@mailwas2> An HTML attachment was scrubbed... URL: From geguileo at redhat.com Mon Aug 29 09:17:18 2022 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 29 Aug 2022 11:17:18 +0200 Subject: Questions about High Availability setup In-Reply-To: <20220826173845.Horde.Bpt79QZiwkbBoa49JYk2oLQ@webmail.nde.ag> References: <1488278267.675577.1661414307257.JavaMail.root@mailwas2> <20220826173845.Horde.Bpt79QZiwkbBoa49JYk2oLQ@webmail.nde.ag> Message-ID: <20220829091718.6ruhbf2aqqpvdycp@localhost> On 26/08, Eugen Block wrote: > Hi, > > just in addition to the previous response, cinder-volume is a stateful > service and there should be only one instance running. We configured it to > be bound to the virtual IP controlled by pacemaker, pacemaker also controls > all stateless services in our environment although it wouldn't be necessary. > But that way we have all resources at one place and don't need to > distinguish. > Hi, Small clarification, cinder-volume is not actually stateful. It is true that historically the cinder-volume service only supported High Availability in Active-Passive mode and required the configuration to set the "host" or "backend_host" configuration option to the same value in all the different controller nodes, so on failover the newly started service would consider existing resources as its own. Currently cinder-volume does support Active-Active HA, though not all drivers support this configuration. Besides using a driver that supports A/A, a DLM is also required and needs to be configured in Cinder, finally the "host" and "backend_host" options must not be configured and the "cluster" configuration should be configured instead. Cheers, Gorka. > > Zitat von Satish Patel : > > > Hi, > > > > 3 nodes requirements come from MySQL galera and RabbitMQ clustering because > > of quorum requirements ( it should be in odd numbers 1, 3, 5 etc..). Rest > > of components works without clustering and they live behind HAProxy LB for > > load sharing and redundancy. > > > > Someone else can add more details here if I missed something. > > > > On Thu, Aug 25, 2022 at 4:05 AM ??? wrote: > > > > > Hello > > > > > > > > > I have two questions about deploying openstack in high available setup > > > > > > Specifically, HA setup for controller nodes > > > > > > > > > 1. Are openstack services (being deployed on controller nodes) stateless? > > > > > > > > > Aside from non-openstack packages(galera/mysql, zeromq, ...) for > > > infrastructure, are openstack services stateless? > > > > > > For example, can I achieve high availability by deploying two nova-api > > > services to two separate controller nodes > > > > > > by load balacing API calls to them through HAproxy? > > > > > > Is this(load balancer) the way how openstack achieves high availability? > > > > > > > > > > > > 2. Why minimum 3 controller nodes for HA? > > > > > > > > > Is this solely due to etcd? > > > > > > > > > Thanks! > > > > > > > From skaplons at redhat.com Mon Aug 29 09:31:55 2022 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 29 Aug 2022 11:31:55 +0200 Subject: [neutron] Bug deputy report - week of 22 Aug Message-ID: <5858624.lOV4Wx5bFT@p1> Hi, I was Neutron's bug deputy last week. 
Here is a summary of what happened during this time:

## Critical

https://bugs.launchpad.net/neutron/+bug/1987308 - [ovn-octavia-provider] ovn compilation is failing in "ovn-octavia-provider-tempest-master " job
  - assigned to Fernando, patch proposed https://review.opendev.org/c/openstack/ovn-octavia-provider/+/854008

## High

https://bugs.launchpad.net/neutron/+bug/1987530 - Duplicate external_ip in NAT table lead to loss of N/S connectivity
  - needs attention

## Medium

https://bugs.launchpad.net/neutron/+bug/1987396 - masquerading behavior changed between queens and train
  - assigned to David Hill, patch proposed already: https://review.opendev.org/c/openstack/neutron/+/854041
https://bugs.launchpad.net/neutron/+bug/1987780 - Nova notifications with error response are not retried
  - assigned to Szymon Wróblewski
https://bugs.launchpad.net/neutron/+bug/1987666 - Race condition when adding two subnet with same cidr to router
  - currently unassigned

## Low

https://bugs.launchpad.net/neutron/+bug/1987281 - Create a method to add multiple IP addresses in one call
  - assigned to Rodolfo
https://bugs.launchpad.net/neutron/+bug/1988026 - Neutron should not create security group with project==None
  - needs assignment; it seems to me like a low-hanging-fruit bug to fix

## Wishlist (RFEs)

https://bugs.launchpad.net/neutron/+bug/1987378 - [RFE] Add DSCP mark 44
  - assigned to Rodolfo, marked by him as an RFE, but I don't really think we need to discuss and approve it in the drivers meeting. We discussed it during the last drivers meeting and it's already triaged.

## Incomplete

https://bugs.launchpad.net/neutron/+bug/1987377 - neutron-metadata-agent the memory usage is increasing
  - Lajos and Oleg already have some comments on that one

## Needs attention

https://bugs.launchpad.net/cloud-init/+bug/1899487 - cloud-init hard codes MTU configuration at initial deploy time
  - bug which was reported against cloud-init first, but now, after a long discussion, it's on Neutron and we need to check it

--
Slawek Kaplonski
Principal Software Engineer
Red Hat
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: This is a digitally signed message part.
URL:

From smooney at redhat.com Mon Aug 29 09:40:23 2022
From: smooney at redhat.com (Sean Mooney)
Date: Mon, 29 Aug 2022 10:40:23 +0100
Subject: Lagging of vm
In-Reply-To:
References:
Message-ID: <3727b6b45e692c1c6912513b0abb47578a21df15.camel@redhat.com>

On Sun, 2022-08-28 at 06:28 +0530, Naveen j wrote:
> Hi Team.
>
> When i create a vm, my vms are lagging so much when I run any application
> in ubuntu

This is probably caused by one of three problems:
1. you are using qemu, not kvm
2. you created a floating vm (without cpu pinning) and you are using the isolcpus kernel command line option on the host
3. you have bad disk io performance (locally, or over the network if you are using network storage)

We really don't have anything to go on, as we don't know what OS you are using beyond Ubuntu, or how you deployed OpenStack and which version.

But those are the 3 things that normally result in poor vm performance: using qemu, cpu contention, or low disk io.
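A quick way to rule the first case in or out, assuming a libvirt based compute
node (the paths below are the distro defaults and may differ in a containerised
deployment such as kolla, so treat this as a sketch rather than exact steps):

  # does the host cpu expose hardware virtualisation at all?
  egrep -c '(vmx|svm)' /proc/cpuinfo    # 0 means no VT-x/AMD-V, so only qemu emulation is possible
  lsmod | grep kvm                      # kvm_intel or kvm_amd should be loaded

  # what did nova-compute ask libvirt to use?
  grep virt_type /etc/nova/nova.conf    # should be "virt_type = kvm", not "qemu"

  # what is a running guest actually using? (domain name is illustrative)
  virsh dumpxml instance-00000001 | grep "<domain type"   # type='kvm' vs type='qemu'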
> > Regards > Naveen j From smooney at redhat.com Mon Aug 29 09:44:08 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 29 Aug 2022 10:44:08 +0100 Subject: Questions about High Availability setup In-Reply-To: <20220829091718.6ruhbf2aqqpvdycp@localhost> References: <1488278267.675577.1661414307257.JavaMail.root@mailwas2> <20220826173845.Horde.Bpt79QZiwkbBoa49JYk2oLQ@webmail.nde.ag> <20220829091718.6ruhbf2aqqpvdycp@localhost> Message-ID: <74dbec99e1d76b1baccfbd7e71f35b3518af192f.camel@redhat.com> On Mon, 2022-08-29 at 11:17 +0200, Gorka Eguileor wrote: > On 26/08, Eugen Block wrote: > > Hi, > > > > just in addition to the previous response, cinder-volume is a stateful > > service and there should be only one instance running. We configured it to > > be bound to the virtual IP controlled by pacemaker, pacemaker also controls > > all stateless services in our environment although it wouldn't be necessary. > > But that way we have all resources at one place and don't need to > > distinguish. > > > > Hi, > > Small clarification, cinder-volume is not actually stateful. > > It is true that historically the cinder-volume service only supported > High Availability in Active-Passive mode and required the configuration > to set the "host" or "backend_host" configuration option to the same > value in all the different controller nodes, so on failover the newly > started service would consider existing resources as its own. > > Currently cinder-volume does support Active-Active HA, though not all > drivers support this configuration. Besides using a driver that > supports A/A, a DLM is also required and needs to be configured in just wanted to point out that DLM stands for distibuted lock manager incases people are not familar with that TLA. ceph is i think the canonical example of a driver that support Active-Active ha but im sure there are others. > Cinder, finally the "host" and "backend_host" options must not be > configured and the "cluster" configuration should be configured instead. > > Cheers, > Gorka. > > > > > > Zitat von Satish Patel : > > > > > Hi, > > > > > > 3 nodes requirements come from MySQL galera and RabbitMQ clustering because > > > of quorum requirements ( it should be in odd numbers 1, 3, 5 etc..). Rest > > > of components works without clustering and they live behind HAProxy LB for > > > load sharing and redundancy. > > > > > > Someone else can add more details here if I missed something. > > > > > > On Thu, Aug 25, 2022 at 4:05 AM ??? wrote: > > > > > > > Hello > > > > > > > > > > > > I have two questions about deploying openstack in high available setup > > > > > > > > Specifically, HA setup for controller nodes > > > > > > > > > > > > 1. Are openstack services (being deployed on controller nodes) stateless? > > > > > > > > > > > > Aside from non-openstack packages(galera/mysql, zeromq, ...) for > > > > infrastructure, are openstack services stateless? > > > > > > > > For example, can I achieve high availability by deploying two nova-api > > > > services to two separate controller nodes > > > > > > > > by load balacing API calls to them through HAproxy? > > > > > > > > Is this(load balancer) the way how openstack achieves high availability? > > > > > > > > > > > > > > > > 2. Why minimum 3 controller nodes for HA? > > > > > > > > > > > > Is this solely due to etcd? > > > > > > > > > > > > Thanks! 
> > > > > > > > > > > > > > From eblock at nde.ag Mon Aug 29 09:53:10 2022 From: eblock at nde.ag (Eugen Block) Date: Mon, 29 Aug 2022 09:53:10 +0000 Subject: Questions about High Availability setup In-Reply-To: <20220829091718.6ruhbf2aqqpvdycp@localhost> References: <1488278267.675577.1661414307257.JavaMail.root@mailwas2> <20220826173845.Horde.Bpt79QZiwkbBoa49JYk2oLQ@webmail.nde.ag> <20220829091718.6ruhbf2aqqpvdycp@localhost> Message-ID: <20220829095310.Horde.e_D3OiOJ_wHIScqg_0SQwJL@webmail.nde.ag> Hi, > Currently cinder-volume does support Active-Active HA, though not all > drivers support this configuration. Besides using a driver that > supports A/A, a DLM is also required and needs to be configured in > Cinder, finally the "host" and "backend_host" options must not be > configured and the "cluster" configuration should be configured instead. this is interesting information, I was not aware of that. Does the rbd driver support A/A? I'm still dealing with some older cloud deployments and didn't have the time to look into newer features yet, but it would be great! We do currently use the "host" option to let haproxy redirect the cinder-volume requests. I'm definitely gonna need to look into that. Thanks for pointing that out! Thanks, Eugen Zitat von Gorka Eguileor : > On 26/08, Eugen Block wrote: >> Hi, >> >> just in addition to the previous response, cinder-volume is a stateful >> service and there should be only one instance running. We configured it to >> be bound to the virtual IP controlled by pacemaker, pacemaker also controls >> all stateless services in our environment although it wouldn't be necessary. >> But that way we have all resources at one place and don't need to >> distinguish. >> > > Hi, > > Small clarification, cinder-volume is not actually stateful. > > It is true that historically the cinder-volume service only supported > High Availability in Active-Passive mode and required the configuration > to set the "host" or "backend_host" configuration option to the same > value in all the different controller nodes, so on failover the newly > started service would consider existing resources as its own. > > Currently cinder-volume does support Active-Active HA, though not all > drivers support this configuration. Besides using a driver that > supports A/A, a DLM is also required and needs to be configured in > Cinder, finally the "host" and "backend_host" options must not be > configured and the "cluster" configuration should be configured instead. > > Cheers, > Gorka. > > >> >> Zitat von Satish Patel : >> >> > Hi, >> > >> > 3 nodes requirements come from MySQL galera and RabbitMQ >> clustering because >> > of quorum requirements ( it should be in odd numbers 1, 3, 5 etc..). Rest >> > of components works without clustering and they live behind HAProxy LB for >> > load sharing and redundancy. >> > >> > Someone else can add more details here if I missed something. >> > >> > On Thu, Aug 25, 2022 at 4:05 AM ??? wrote: >> > >> > > Hello >> > > >> > > >> > > I have two questions about deploying openstack in high available setup >> > > >> > > Specifically, HA setup for controller nodes >> > > >> > > >> > > 1. Are openstack services (being deployed on controller nodes) >> stateless? >> > > >> > > >> > > Aside from non-openstack packages(galera/mysql, zeromq, ...) for >> > > infrastructure, are openstack services stateless? 
>> > > >> > > For example, can I achieve high availability by deploying two nova-api >> > > services to two separate controller nodes >> > > >> > > by load balacing API calls to them through HAproxy? >> > > >> > > Is this(load balancer) the way how openstack achieves high availability? >> > > >> > > >> > > >> > > 2. Why minimum 3 controller nodes for HA? >> > > >> > > >> > > Is this solely due to etcd? >> > > >> > > >> > > Thanks! >> > > >> >> >> >> From thierry at openstack.org Mon Aug 29 10:07:02 2022 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 29 Aug 2022 12:07:02 +0200 Subject: [largescale-sig] Next meeting: August 31st, 15utc Message-ID: <2eab599e-33b4-995b-7ed4-2df73ae4abc0@openstack.org> Hi everyone, The Large Scale SIG is back from its extended vacation! We will be meeting this Wednesday in #openstack-operators on OFTC IRC, at 15UTC. You can check how that time translates locally at: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20220831T15 Feel free to add topics to the agenda: https://etherpad.openstack.org/p/large-scale-sig-meeting Regards, -- Thierry Carrez From smooney at redhat.com Mon Aug 29 10:13:38 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 29 Aug 2022 11:13:38 +0100 Subject: [requirement][all] Nose being removed from Debian and Ubuntu: let's remove it from our global requirements In-Reply-To: <797beb6d-b10e-315c-3288-179fedf1f7a5@debian.org> References: <797beb6d-b10e-315c-3288-179fedf1f7a5@debian.org> Message-ID: <19d779c0a89c18c9b12c1aad4295a49f93d53e19.camel@redhat.com> On Mon, 2022-08-29 at 11:06 +0200, Thomas Goirand wrote: > Hi, > > It's no surprise for anyone that's been following it: Nose was > deprecated years ago, and now, we're trying to remove it from Debian. > Therefore, 49 bugs were filled against the Debian OpenStack package. > > It'd be nice if upstream OpenStack was also following this move, and > removing nose from global-requirements. I'm not sure if it'd be a big > deal if that wasn't the case, as I could use a different test runner in > Debian, but I very much would prefer if it was the case. nose was ment to be replaced in all project by stestr many years ago a quick search shows that of the big project swift and trove are porbly the only ones that still install it https://codesearch.opendev.org/?q=nose&i=nope&literal=nope&files=.*requirements.*&excludeFiles=&repos= looking at swift i also see stestr but they are still using nosetest in there tox.ini https://opendev.org/openstack/swift/src/branch/master/tox.ini#L20 that shoudl be trivial to swap over for any of the project but would obviouly have to be done and possibly backported depenidng on what openstack verions require it. i also noticed that many of the charms reference it so it might be more impactful for the charm based installer. > > Probably Ubuntu will follow the same path, since Nose isn't in main > (it's in Universe, meaning Debian maintained...). > > Your thoughts? 
> > Cheers, > > Thomas Goirand (zigo) > From smooney at redhat.com Mon Aug 29 10:45:22 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 29 Aug 2022 11:45:22 +0100 Subject: Questions about High Availability setup In-Reply-To: <20220829095310.Horde.e_D3OiOJ_wHIScqg_0SQwJL@webmail.nde.ag> References: <1488278267.675577.1661414307257.JavaMail.root@mailwas2> <20220826173845.Horde.Bpt79QZiwkbBoa49JYk2oLQ@webmail.nde.ag> <20220829091718.6ruhbf2aqqpvdycp@localhost> <20220829095310.Horde.e_D3OiOJ_wHIScqg_0SQwJL@webmail.nde.ag> Message-ID: <3614c697b013d614f205076e4c439ef6d1e2f4a6.camel@redhat.com> On Mon, 2022-08-29 at 09:53 +0000, Eugen Block wrote: > Hi, > > > Currently cinder-volume does support Active-Active HA, though not all > > drivers support this configuration. Besides using a driver that > > supports A/A, a DLM is also required and needs to be configured in > > Cinder, finally the "host" and "backend_host" options must not be > > configured and the "cluster" configuration should be configured instead. > > this is interesting information, I was not aware of that. Does the rbd > driver support A/A? I'm still dealing with some older cloud > deployments and didn't have the time to look into newer features yet, > but it would be great! We do currently use the "host" option to let > haproxy redirect the cinder-volume requests. I'm definitely gonna need > to look into that. Thanks for pointing that out! you can find the list of drivers that support it here https://docs.openstack.org/cinder/latest/reference/support-matrix.html#operation_active_active_ha ceph work in both rbd and iscsi mode. > > Thanks, > Eugen > > Zitat von Gorka Eguileor : > > > On 26/08, Eugen Block wrote: > > > Hi, > > > > > > just in addition to the previous response, cinder-volume is a stateful > > > service and there should be only one instance running. We configured it to > > > be bound to the virtual IP controlled by pacemaker, pacemaker also controls > > > all stateless services in our environment although it wouldn't be necessary. > > > But that way we have all resources at one place and don't need to > > > distinguish. > > > > > > > Hi, > > > > Small clarification, cinder-volume is not actually stateful. > > > > It is true that historically the cinder-volume service only supported > > High Availability in Active-Passive mode and required the configuration > > to set the "host" or "backend_host" configuration option to the same > > value in all the different controller nodes, so on failover the newly > > started service would consider existing resources as its own. > > > > Currently cinder-volume does support Active-Active HA, though not all > > drivers support this configuration. Besides using a driver that > > supports A/A, a DLM is also required and needs to be configured in > > Cinder, finally the "host" and "backend_host" options must not be > > configured and the "cluster" configuration should be configured instead. > > > > Cheers, > > Gorka. > > > > > > > > > > Zitat von Satish Patel : > > > > > > > Hi, > > > > > > > > 3 nodes requirements come from MySQL galera and RabbitMQ > > > clustering because > > > > of quorum requirements ( it should be in odd numbers 1, 3, 5 etc..). Rest > > > > of components works without clustering and they live behind HAProxy LB for > > > > load sharing and redundancy. > > > > > > > > Someone else can add more details here if I missed something. > > > > > > > > On Thu, Aug 25, 2022 at 4:05 AM ??? 
wrote: > > > > > > > > > Hello > > > > > > > > > > > > > > > I have two questions about deploying openstack in high available setup > > > > > > > > > > Specifically, HA setup for controller nodes > > > > > > > > > > > > > > > 1. Are openstack services (being deployed on controller nodes) > > > stateless? > > > > > > > > > > > > > > > Aside from non-openstack packages(galera/mysql, zeromq, ...) for > > > > > infrastructure, are openstack services stateless? > > > > > > > > > > For example, can I achieve high availability by deploying two nova-api > > > > > services to two separate controller nodes > > > > > > > > > > by load balacing API calls to them through HAproxy? > > > > > > > > > > Is this(load balancer) the way how openstack achieves high availability? > > > > > > > > > > > > > > > > > > > > 2. Why minimum 3 controller nodes for HA? > > > > > > > > > > > > > > > Is this solely due to etcd? > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > > From naveen.j at simnovus.com Mon Aug 29 12:03:31 2022 From: naveen.j at simnovus.com (Naveen j) Date: Mon, 29 Aug 2022 17:33:31 +0530 Subject: Lagging of vm In-Reply-To: <3727b6b45e692c1c6912513b0abb47578a21df15.camel@redhat.com> References: <3727b6b45e692c1c6912513b0abb47578a21df15.camel@redhat.com> Message-ID: Hi Sean Thanks for your reply the document I followed were attached below.Please go through and let me know if any prefix can be done. https://computingforgeeks.com/install-openstack-victoria-on-centos/ Regards Naveen j On Mon, 29 Aug, 2022, 3:10 pm Sean Mooney, wrote: > On Sun, 2022-08-28 at 06:28 +0530, Naveen j wrote: > > Hi Team. > > > > When i create a vm, my vms are lagging so much when I run any application > > in ubuntu > > this proably is cause by one of three proablems > 1 you are useing qemu not kvm > 2 you created a flaoting vm (without cpu pinning) and you are using the > isolcpus kernel command line option on the host > 3 you have and disk io performance (locally or over the network if you are > using network storage) > > we really dont have anything to go on as we dont know what os you are > using beyond ubunut or how you deployed openstack and which version. > > but those are they 3 things that normaly result in poor vm performance. > using qemu, cpu contention or low disk io. > > > > Regards > > Naveen j > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Aug 29 13:07:46 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 29 Aug 2022 18:37:46 +0530 Subject: [nova] Changes for out-of-tree drivers Message-ID: Hi, I'm working on a nova feature which adds support for rebuilding volume backed instances which modifies the arguments for 'rebuild' method. It adds a 'reimage_boot_volume' parameter which signifies if the user has requested a rebuild operation for a volume backed instance. Currently it is only implemented by the ironic driver in the in-tree drivers. If you're a maintainer of an out of tree driver then you will need to account for it. Thanks and regards Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Aug 29 13:09:11 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 29 Aug 2022 18:39:11 +0530 Subject: [nova] Changes for out-of-tree drivers In-Reply-To: References: Message-ID: Forgot to mention the link for the patch[1]. 
[1] https://review.opendev.org/c/openstack/nova/+/820368 On Mon, Aug 29, 2022 at 6:37 PM Rajat Dhasmana wrote: > Hi, > > I'm working on a nova feature which adds support for rebuilding volume > backed instances which modifies the arguments for 'rebuild' method. It adds > a 'reimage_boot_volume' parameter which signifies if the user has requested > a rebuild operation for a volume backed instance. > Currently it is only implemented by the ironic driver in the in-tree > drivers. > If you're a maintainer of an out of tree driver then you will need to > account for it. > > Thanks and regards > Rajat Dhasmana > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Mon Aug 29 13:19:54 2022 From: johfulto at redhat.com (John Fulton) Date: Mon, 29 Aug 2022 09:19:54 -0400 Subject: [Triple0] [Wallaby] External Ceph Integration getting failed In-Reply-To: References: Message-ID: On Fri, Aug 26, 2022 at 11:03 PM Lokendra Rathour wrote: > Hi John, > It got resolved, reason was NTP. > The NTP time was not in sync., i noticed thay recently the NTP is not > getting configured properly on the controller and xompute nodes. > After enabling thr time sync and validation we redeployed and it worked > fine. > > I have another querry w.r.t to storage integration with a tripleo. > > We have noticed that only passing the external-ceph.yaml is not doing the > deployment, we also need to pass ceph parameters in container-prepare. > We did see some containers getting downloaded as well but after the > deployment is done we do not see them anywhere. > What can be the reason for such containers if not used ? > Any point would help me further ensure 100% offline tripleO > You'll need the ceph container if: 1. If you're using NFS Ganesha with external ceph 2. If you're using ceph-ansible with external ceph You should be using Wallaby however as per [1]. If you're only using RBD you shouldn't need the ceph container. This role should set up your ceph conf and key files. https://github.com/openstack/tripleo-ansible/tree/stable/wallaby/tripleo_ansible/roles/tripleo_ceph_client For offline tripleo, you need overcloud containers (regardless of if the ceph container is one of them). The solution to that problem is to use the undercloud as a container registry as per [2]. 
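In practice that boils down to a ContainerImagePrepare entry with
push_destination enabled, passed to the deploy command. A rough sketch
(the namespace and tag values are illustrative, take the real ones from
the doc in [2]):

  parameter_defaults:
    ContainerImagePrepare:
    - push_destination: true
      set:
        namespace: quay.io/tripleowallaby
        name_prefix: openstack-
        tag: current-tripleo

  openstack overcloud deploy ... -e containers-prepare-parameter.yaml

With push_destination set to true the images are mirrored into the registry
running on the undercloud during deployment, so the overcloud nodes never
need to reach the internet.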
John [1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/ceph_external.html [2] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/container_image_prepare.html > > On Thu, 4 Aug 2022, 17:07 Lokendra Rathour, > wrote: > >> Hi Team, >> I was trying to integrate External Ceph with Triple0 Wallaby, and at the >> end of deployment in step4 getting the below error: >> >> 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 >> 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | >> Create containers from >> /var/lib/tripleo-config/container-startup-config/step_4 >> 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 >> 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | >> /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | >> overcloud-controller-2 >> 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 >> 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | >> Create containers managed by Podman for >> /var/lib/tripleo-config/container-startup-config/step_4 >> 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 >> 18:37:24.530812 | | WARNING | >> ERROR: Can't run container nova_libvirt_init_secret >> stderr: >> 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 >> 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | >> Create containers managed by Podman for >> /var/lib/tripleo-config/container-startup-config/step_4 | >> overcloud-novacompute-0 | error={"changed": false, "msg": "Failed >> containers: nova_libvirt_init_secret"} >> 2022-08-03 18:37:44,282 p=507732 u >> >> >> *external-ceph.conf:* >> >> parameter_defaults: >> # Enable use of RBD backend in nova-compute >> NovaEnableRbdBackend: True >> # Enable use of RBD backend in cinder-volume >> CinderEnableRbdBackend: True >> # Backend to use for cinder-backup >> CinderBackupBackend: ceph >> # Backend to use for glance >> GlanceBackend: rbd >> # Name of the Ceph pool hosting Nova ephemeral images >> NovaRbdPoolName: vms >> # Name of the Ceph pool hosting Cinder volumes >> CinderRbdPoolName: volumes >> # Name of the Ceph pool hosting Cinder backups >> CinderBackupRbdPoolName: backups >> # Name of the Ceph pool hosting Glance images >> GlanceRbdPoolName: images >> # Name of the user to authenticate with the external Ceph cluster >> CephClientUserName: admin >> # The cluster FSID >> CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' >> # The CephX user auth key >> CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' >> # The list of Ceph monitors >> CephExternalMonHost: >> 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' >> ~ >> >> >> Have tried checking and validating the ceph client details and they seem >> to be correct, further digging the container log I could see something like >> this : >> >> [root at overcloud-novacompute-0 containers]# tail -f >> nova_libvirt_init_secret.log >> tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such >> file or directory >> tail: no files remaining >> [root at overcloud-novacompute-0 containers]# tail -f >> stdouts/nova_libvirt_init_secret.log >> 2022-08-04T11:48:47.689898197+05:30 stdout F >> ------------------------------------------------ >> 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh secrets >> for: ceph:admin >> 2022-08-04T11:48:47.690590594+05:30 stdout F Error: /etc/ceph/ceph.conf >> was not found >> 2022-08-04T11:48:47.690625088+05:30 stdout F Path to >> 
nova_libvirt_init_secret was ceph:admin >> 2022-08-04T16:20:29.643785538+05:30 stdout F >> ------------------------------------------------ >> 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh secrets >> for: ceph:admin >> 2022-08-04T16:20:29.644785532+05:30 stdout F Error: /etc/ceph/ceph.conf >> was not found >> 2022-08-04T16:20:29.644785532+05:30 stdout F Path to >> nova_libvirt_init_secret was ceph:admin >> ^C >> [root at overcloud-novacompute-0 containers]# tail -f >> stdouts/nova_compute_init_log.log >> >> -- >> ~ Lokendra >> skype: lokendrarathour >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at binero.com Mon Aug 29 13:46:09 2022 From: tobias.urdin at binero.com (Tobias Urdin) Date: Mon, 29 Aug 2022 13:46:09 +0000 Subject: [all] [oslo.messaging] Interest in collaboration on a NATS driver Message-ID: <52AA12A0-AE67-4EF7-B924-DE1F2873B909@binero.com> Hello everyone, Before continuing on: yes, this is kind of a massive effort, but it doesn't have to be, and it would be very cool to get a replacement for RabbitMQ as I'm probably not the only one not satisfied with it. I've proposed a very bare POC in [1] but it's a long way from being finished, although at least some basic devstack is passing. NATS [2] is a cloud-native scalable messaging system that supports the one-to-many and pub-sub methods that we can use to implement it as an oslo.messaging driver. This would make OpenStack easier to deploy in a highly available fashion, reduce outages related to RabbitMQ, free up the memory and CPU used by RabbitMQ (it's insane when using clustering) and embrace a more cloud-native approach for our software that runs the cloud; alternatives are also welcome :) The POC has a lot of things that could be improved, for example: - Do retries and acknowledgements in the library (since NATS does NOT persist messages like RabbitMQ could) - Handle reconnects or interruptions (for example resubscribe to topics etc) - Timeouts need to be implemented and handled - Investigate maximum message payload size - Find or maintain a NATS python library that doesn't use async like the official one does - Add a lot of testing - Cleanup everything noted as TODO in the POC code Now I couldn't possibly pull this off myself without some collaboration with all of you; even though I'm very motivated to just dig in and do this for the rest of the year and migrate our test fleet there, I unfortunately (like everyone else) am juggling a lot of balls at the same time. If anybody, or any company, out there would be interested in collaborating in a project to bring this support and maintain it, feel free to reach out. I'm hoping somebody will bite, but at least I've put it out there for all of you.
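To make the request/reply mapping a bit more concrete, here is a minimal sketch of the bare client pattern an RPC call could sit on top of, using the official asyncio client (nats-py) against a local nats-server; the subject name and payloads are made up for illustration and this is not code from the POC:

    import asyncio
    import nats

    async def main():
        # Connect to a local nats-server (default port 4222).
        nc = await nats.connect("nats://127.0.0.1:4222")

        # "Server" side: subscribe to a subject and answer requests.
        async def handler(msg):
            # msg.data carries the serialized RPC payload.
            await msg.respond(b'{"result": "pong"}')

        await nc.subscribe("rpc.compute.host1", cb=handler)

        # "Client" side: a blocking RPC call maps onto request/reply with
        # a timeout; note there is no broker-side persistence involved.
        reply = await nc.request("rpc.compute.host1", b'{"method": "ping"}', timeout=2)
        print(reply.data)

        await nc.drain()

    asyncio.run(main())

Casts and notifications would presumably map onto plain publish/subscribe on topic-like subjects in the same way.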
Best regards Tobias [1] https://review.opendev.org/c/openstack/oslo.messaging/+/848338 [2] https://nats.io From smooney at redhat.com Mon Aug 29 14:12:15 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 29 Aug 2022 15:12:15 +0100 Subject: =?ISO-8859-1?Q?=5Ball=5D=A0=5Boslo=2Emessaging=5D?= Interest in collaboration on a NATS driver In-Reply-To: <52AA12A0-AE67-4EF7-B924-DE1F2873B909@binero.com> References: <52AA12A0-AE67-4EF7-B924-DE1F2873B909@binero.com> Message-ID: On Mon, 2022-08-29 at 13:46 +0000, Tobias Urdin wrote: > Hello everyone, > > Before continuing on, yes this kind of is a massive effort but it doesn?t have to be, it would be very cool to get a replacement for RabbitMQ > as I?m probably not the only one not satisfied with it. I've proposed a very bare POC in [1] but it's long way from being finished, but atleast some basic devstack is passing. +1 for actully doing this i looked at adding nats support brifly in the past but never found the time to actully try doing it. i thnk it is a very interesting alternitive to consider moving forward. > > NATS [2] is a cloud-native scalable messaging system that supports the one-to-many and pub-sub methods that we can use to implement it as a oslo.messaging driver. > > This would make OpenStack easier to deploy in a highly available fashion, reduce outages related to RabbitMQ, free up memory and CPU usage by RabbitMQ (it?s insane when > using clustering) and embrace a more cloud-native approach for our software that runs the cloud, alternatives is also welcome :) > > The POC has a lot of things that could be improved for example: > ? Do retries and acknowledgements in the library (since NATS does NOT persist messages like RabbitMQ could) > ? Handle reconnects or interruptions (for example resubscribe to topics etc) > ? Timeouts need to be implemented and handled > ? Investigate maximum message payload size > ? Find or maintain a NATS python library that doesn't use async like the official one does > ? Add a lot of testing > ? Cleanup everything noted as TODO in the POC code > > Now I couldn?t possibly pull this off myself without some collaboration with all of you, even though I?m very motivated to just dig in and do > this for the rest of the year and migrate our test fleet there I unfortunately (like everyone else) is juggeling a lot of balls at the same time. > > If anybody, or any company, out there would be interested in collaborating in a project to bring this support and maintain it feel free to > reach out. I?m hoping somebody will bite but atleast I?ve put it out there for all of you. > > Best regards > Tobias > > [1] https://review.opendev.org/c/openstack/oslo.messaging/+/848338 > [2] https://nats.io From tobias.urdin at binero.com Mon Aug 29 14:59:20 2022 From: tobias.urdin at binero.com (Tobias Urdin) Date: Mon, 29 Aug 2022 14:59:20 +0000 Subject: [xena][placement] Xena placement upgrade leads to 500 on ubuntu focal In-Reply-To: References: Message-ID: Hello, Perhaps you can reachout to Corey (coreyb) or James (james-page) from Canonical on IRC, they are always super helpful! Best regards Tobias On 29 Aug 2022, at 10:50, CHANU ROMAIN > wrote: Hello, Thank you for your answer. Yes I found your ticket on heat's story. Comment out all lines did fix the issue. Best regards, Romain On Mon, 2022-08-29 at 01:14 +0900, Takashi Kajinami wrote: We've been facing this error in the ubuntu jobs in Puppet OpenStack project and it seems the issue is caused by the policy.yaml provided by the packages. 
I've reported a bug against their packaging bug tracker. I have zero knowledge about Ubuntu packaging but hopefully someone from the package maintainers can look into it. https://bugs.launchpad.net/ubuntu/+source/placement/+bug/1987984 You might want to try clearing the policy.yaml file and see whether that solves your problem. On Sat, Aug 20, 2022 at 1:08 AM CHANU ROMAIN > wrote: Hello, I just did upgrade my Placement to Xena on Ubuntu Focal (20.04). When I tried to start the process I got this error and all HTTP requests receive an HTTP 500 error: 2022-08-19 15:05:43.960573 2022-08-19 15:05:43.960 43 INFO placement.requestlog [req-f4c4d4f1-5d59-49d3-aa3e-1e8a09fe02fe 3ec54dee59424109913d4628ae8dac4c 19e62bc767484849a2763937883a256e - default default] 192.168.236.5 "GET /resource_providers/791c09ed-57f3-4bfc-9278-4af6c5c137d8/allocations" status: 500 len: 244 microversion: 1.0\x1b[00m 2022-08-19 15:05:44.094951 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap [req-527dce52-c207-43ee-80f2-016a6f031cf5 3ec54dee59424109913d4628ae8dac4c 19e62bc767484849a2763937883a256e - default default] Placement API unexpected error: unsupported callable: TypeError: unsupported callable 2022-08-19 15:05:44.094973 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap Traceback (most recent call last): 2022-08-19 15:05:44.094977 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 1135, in getfullargspec 2022-08-19 15:05:44.094980 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap sig = _signature_from_callable(func, 2022-08-19 15:05:44.094998 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2233, in _signature_from_callable 2022-08-19 15:05:44.095001 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap sig = _signature_from_callable( 2022-08-19 15:05:44.095004 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2304, in _signature_from_callable 2022-08-19 15:05:44.095007 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _signature_from_function(sigcls, obj, 2022-08-19 15:05:44.095010 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2168, in _signature_from_function 2022-08-19 15:05:44.095013 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap parameters.append(Parameter(name, annotation=annotation, 2022-08-19 15:05:44.095015 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 2491, in __init__ 2022-08-19 15:05:44.095018 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap self._kind = _ParameterKind(kind) 2022-08-19 15:05:44.095021 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap RecursionError: maximum recursion depth exceeded 2022-08-19 15:05:44.095024 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap 2022-08-19 15:05:44.095026 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap The above exception was the direct cause of the following exception: 2022-08-19 15:05:44.095029 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap 2022-08-19 15:05:44.095032 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap Traceback (most recent call last): 2022-08-19 15:05:44.095035 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/placement/fault_wrap.py", line 39, in __call__ 2022-08-19 15:05:44.095038 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return self.application(environ, start_response) 2022-08-19 
15:05:44.095040 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/webob/dec.py", line 129, in __call__ 2022-08-19 15:05:44.095043 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap resp = self.call_func(req, *args, **kw) 2022-08-19 15:05:44.095046 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/webob/dec.py", line 193, in call_func 2022-08-19 15:05:44.095049 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return self.func(req, *args, **kwargs) 2022-08-19 15:05:44.095052 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/microversion_parse/middleware.py", line 80, in __call__ 2022-08-19 15:05:44.095055 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap response = req.get_response(self.application) 2022-08-19 15:05:44.095057 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/webob/request.py", line 1313, in send 2022-08-19 15:05:44.095060 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap status, headers, app_iter = self.call_application( 2022-08-19 15:05:44.095063 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/webob/request.py", line 1278, in call_application 2022-08-19 15:05:44.095065 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap app_iter = application(self.environ, start_response) 2022-08-19 15:05:44.095068 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/placement/handler.py", line 215, in __call__ 2022-08-19 15:05:44.095071 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return dispatch(environ, start_response, self._map) 2022-08-19 15:05:44.095074 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/placement/handler.py", line 149, in dispatch 2022-08-19 15:05:44.095077 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return handler(environ, start_response) 2022-08-19 15:05:44.095083 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/webob/dec.py", line 129, in __call__ 2022-08-19 15:05:44.095086 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap resp = self.call_func(req, *args, **kw) 2022-08-19 15:05:44.095089 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/placement/wsgi_wrapper.py", line 29, in call_func 2022-08-19 15:05:44.095092 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap super(PlacementWsgify, self).call_func(req, *args, **kwargs) 2022-08-19 15:05:44.095094 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/webob/dec.py", line 193, in call_func 2022-08-19 15:05:44.095097 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return self.func(req, *args, **kwargs) 2022-08-19 15:05:44.095100 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/placement/util.py", line 64, in decorated_function 2022-08-19 15:05:44.095103 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return f(req) 2022-08-19 15:05:44.095106 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/placement/handlers/allocation.py", line 299, in list_for_resource_provider 2022-08-19 15:05:44.098861 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098864 2022-08-19 15:05:44.077 46 ERROR 
placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098867 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098870 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098873 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098876 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098893 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098896 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098899 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098905 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098907 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 255, in __call__ 2022-08-19 15:05:44.098910 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _check( 2022-08-19 15:05:44.098913 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098916 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098919 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 255, in __call__ 2022-08-19 15:05:44.098922 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _check( 2022-08-19 15:05:44.098925 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098928 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098930 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098933 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098936 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098939 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098941 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098944 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098947 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098950 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098952 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 255, in __call__ 2022-08-19 
15:05:44.098955 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _check( 2022-08-19 15:05:44.098958 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098960 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098963 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 255, in __call__ 2022-08-19 15:05:44.098966 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _check( 2022-08-19 15:05:44.098968 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098971 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098974 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098977 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098980 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098982 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098985 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 211, in __call__ 2022-08-19 15:05:44.098991 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap if _check(rule, target, cred, enforcer, current_rule): 2022-08-19 15:05:44.098993 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 80, in _check 2022-08-19 15:05:44.098996 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return rule(*rule_args) 2022-08-19 15:05:44.098999 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 255, in __call__ 2022-08-19 15:05:44.099002 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap return _check( 2022-08-19 15:05:44.099004 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3/dist-packages/oslo_policy/_checks.py", line 75, in _check 2022-08-19 15:05:44.099007 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap argspec = inspect.getfullargspec(rule.__call__) 2022-08-19 15:05:44.099010 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap File "/usr/lib/python3.8/inspect.py", line 1144, in getfullargspec 2022-08-19 15:05:44.099013 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap raise TypeError('unsupported callable') from ex 2022-08-19 15:05:44.099016 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap TypeError: unsupported callable 2022-08-19 15:05:44.099018 2022-08-19 15:05:44.077 46 ERROR placement.fault_wrap \x1b[00m Placement-api is deployed in a container, so I got a fresh policy.yaml file. Did someone already face this? Do you have any idea how to fix this? Best regards, Romain -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jean-francois.taltavull at elca.ch Mon Aug 29 15:46:52 2022 From: jean-francois.taltavull at elca.ch (=?iso-8859-1?Q?Taltavull_Jean-Fran=E7ois?=) Date: Mon, 29 Aug 2022 15:46:52 +0000 Subject: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number Message-ID: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> Hi All, In our OpenStack deployment, API endpoints are defined by using URLs instead of port numbers and HAProxy forwards requests to the right bakend after having ACLed the URL. In the case of our object-store service, based on RadosGW, the internal API endpoint is "https:///object-store/swift/v1/AUTH_" When Ceilometer RadosGW pollster tries to connect to the RadosGW admin API with the object-store internal endpoint, the URL becomes https:///admin, as shown by HAProxy logs. This URL does not match any API endpoint from HAProxy point of view. The line of code that rewrites the URL is this one: https://opendev.org/openstack/ceilometer/src/branch/stable/wallaby/ceilometer/objectstore/rgw.py#L81 What would you think of adding a mechanism based on new Ceilometer configuration option(s) to control the URL rewriting ? Our deployment characteristics: - OpenStack release: Wallaby - Ceph and RadosGW version: 15.2.16 - deployment tool: OSA 23.2.1 and ceph-ansible Best regards, Jean-Francois From lokendrarathour at gmail.com Mon Aug 29 05:48:51 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Mon, 29 Aug 2022 11:18:51 +0530 Subject: [TripleO Wallaby] - Multi-attach Volume showing different files Message-ID: Hi Team, On TripleO Wallaby deployment, I have tried creating multi-attach type volume, which if I am attaching this volume to two VM, it is getting attached. After mounting the same in each VM, if I am creating the folders in one VM at the mount path, I am not seeing the same on the other VM at the mount path [image: image.png] Ideally if the backend volume is same and if I am understanding the idea correctly, then the content should be same in both the location, as the back is same. Checking at horizon I also see that Volume is showing attached to both the VMs. [image: image.png] Document followed to create this: https://docs.openstack.org/cinder/latest/admin/volume-multiattach.html Please advice -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 80246 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 64122 bytes Desc: not available URL: From lokendrarathour at gmail.com Mon Aug 29 09:42:41 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Mon, 29 Aug 2022 15:12:41 +0530 Subject: [TripleO Wallaby] - Multi-attach Volume showing different files In-Reply-To: References: Message-ID: Hi Team, It worked if I redo the VM creation with creation one more volume of multi-attach type. Thanks once again for the same. we can mark this thread as closed. On Mon, Aug 29, 2022 at 11:18 AM Lokendra Rathour wrote: > Hi Team, > On TripleO Wallaby deployment, I have tried creating multi-attach type > volume, which if I am attaching this volume to two VM, it is getting > attached. 
> After mounting the same in each VM, if I am creating the folders in one VM > at the mount path, > I am not seeing the same on the other VM at the mount path > [image: image.png] > > Ideally if the backend volume is same and if I am understanding the idea > correctly, then the content should be same in both the location, as the > back is same. > Checking at horizon I also see that Volume is showing attached to both the > VMs. > > > [image: image.png] > > Document followed to create this: > https://docs.openstack.org/cinder/latest/admin/volume-multiattach.html > > Please advice > > -- > ~ Lokendra > skype: lokendrarathour > > > -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 80246 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 64122 bytes Desc: not available URL: From rafaelweingartner at gmail.com Mon Aug 29 15:54:05 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Mon, 29 Aug 2022 12:54:05 -0300 Subject: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number In-Reply-To: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> References: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> Message-ID: You could use a different approach. You can use Dynamic pollster [1], and create your own mechanism to collect data, without needing to change Ceilometer code. Basically all hard-coded pollsters can be converted to a dynamic pollster that is defined in YML. [1] https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html#the-dynamic-pollsters-system-configuration-for-non-openstack-apis On Mon, Aug 29, 2022 at 12:51 PM Taltavull Jean-Fran?ois < jean-francois.taltavull at elca.ch> wrote: > Hi All, > > In our OpenStack deployment, API endpoints are defined by using URLs > instead of port numbers and HAProxy forwards requests to the right bakend > after having ACLed the URL. > > In the case of our object-store service, based on RadosGW, the internal > API endpoint is "https:///object-store/swift/v1/AUTH_" > > When Ceilometer RadosGW pollster tries to connect to the RadosGW admin API > with the object-store internal endpoint, the URL becomes https:///admin, > as shown by HAProxy logs. This URL does not match any API endpoint from > HAProxy point of view. The line of code that rewrites the URL is this one: > https://opendev.org/openstack/ceilometer/src/branch/stable/wallaby/ceilometer/objectstore/rgw.py#L81 > > What would you think of adding a mechanism based on new Ceilometer > configuration option(s) to control the URL rewriting ? > > Our deployment characteristics: > - OpenStack release: Wallaby > - Ceph and RadosGW version: 15.2.16 > - deployment tool: OSA 23.2.1 and ceph-ansible > > > Best regards, > Jean-Francois > > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From abishop at redhat.com Mon Aug 29 16:08:31 2022 From: abishop at redhat.com (Alan Bishop) Date: Mon, 29 Aug 2022 09:08:31 -0700 Subject: [TripleO Wallaby] - Multi-attach Volume showing different files In-Reply-To: References: Message-ID: On Mon, Aug 29, 2022 at 8:54 AM Lokendra Rathour wrote: > Hi Team, > It worked if I redo the VM creation with creation one more volume of > multi-attach type. 
> Thanks once again for the same. > > we can mark this thread as closed. > > > On Mon, Aug 29, 2022 at 11:18 AM Lokendra Rathour < > lokendrarathour at gmail.com> wrote: > >> Hi Team, >> On TripleO Wallaby deployment, I have tried creating multi-attach type >> volume, which if I am attaching this volume to two VM, it is getting >> attached. >> After mounting the same in each VM, if I am creating the folders in one >> VM at the mount path, >> I am not seeing the same on the other VM at the mount path >> [image: image.png] >> >> Ideally if the backend volume is same and if I am understanding the idea >> correctly, then the content should be same in both the location, as the >> back is same. >> > Well, that's not quite how multiattach volumes are intended to work. The feature relies on your application and/or filesystem to support multiattach/cluster aware volumes. You can't just use a regular filesystem (ext4, etc.) and expect changes to the volume that are made from one attached vm to propagate to other vms attached to the same volume. Alan > Checking at horizon I also see that Volume is showing attached to both the >> VMs. >> >> >> [image: image.png] >> >> Document followed to create this: >> https://docs.openstack.org/cinder/latest/admin/volume-multiattach.html >> >> Please advice >> >> -- >> ~ Lokendra >> skype: lokendrarathour >> >> >> > > -- > ~ Lokendra > skype: lokendrarathour > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 80246 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 64122 bytes Desc: not available URL: From radoslaw.piliszek at gmail.com Mon Aug 29 16:33:15 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 29 Aug 2022 18:33:15 +0200 Subject: [all] [oslo.messaging] Interest in collaboration on a NATS driver In-Reply-To: <52AA12A0-AE67-4EF7-B924-DE1F2873B909@binero.com> References: <52AA12A0-AE67-4EF7-B924-DE1F2873B909@binero.com> Message-ID: Hi Tobias, Good to see RMQ alternatives appearing. A couple of questions from me. On Mon, 29 Aug 2022 at 15:47, Tobias Urdin wrote: > ? Do retries and acknowledgements in the library (since NATS does NOT persist messages like RabbitMQ could) What do you mean? Is NATS only a router? (I have not used this technology yet.) > ? Find or maintain a NATS python library that doesn't use async like the official one does Why is async a bad thing? For messaging it's the right thing. Finally, have you considered just trying out ZeroMQ? I mean, NATS is probably an overkill for OpenStack services since the majority of them stay static on the hosts they control (think nova-compute, neutron agents - and these are also the pain points that operators want to ease). NATS seems to me to cater for a different use case. I might be wrong because I have read only the front page but that is the feeling I have. 
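For reference, the bare request/reply pattern that synchronous RPC maps onto in ZeroMQ looks roughly like the sketch below (pyzmq, with a made-up endpoint and payload; this is only the socket pattern, not the old oslo.messaging zmq driver):

    import zmq

    ctx = zmq.Context()

    # Server side: a REP socket answers one request at a time.
    server = ctx.socket(zmq.REP)
    server.bind("tcp://127.0.0.1:5555")

    # Client side: a REQ socket enforces the send/recv lockstep
    # that a synchronous RPC call needs.
    client = ctx.socket(zmq.REQ)
    client.connect("tcp://127.0.0.1:5555")

    client.send_json({"method": "ping"})
    request = server.recv_json()   # in real life this runs in another process
    server.send_json({"result": "pong"})
    print(client.recv_json())

Note there is no broker in that picture; if memory serves, that is also part of why the old zmq driver needed an external matchmaker (e.g. Redis) for discovery and fan-out.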
Cheers, Radek -yoctozepto From jean-francois.taltavull at elca.ch Mon Aug 29 16:40:44 2022 From: jean-francois.taltavull at elca.ch (=?utf-8?B?VGFsdGF2dWxsIEplYW4tRnJhbsOnb2lz?=) Date: Mon, 29 Aug 2022 16:40:44 +0000 Subject: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number In-Reply-To: References: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> Message-ID: <1b17c23f8982480db73cf50d04d51af7@elca.ch> Thanks a lot for your quick answer, Rafael ! I will explore this approach. Jean-Francois From: Rafael Weing?rtner Sent: lundi, 29 ao?t 2022 17:54 To: Taltavull Jean-Fran?ois Cc: openstack-discuss Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. You could use a different approach. You can use Dynamic pollster [1], and create your own mechanism to collect data, without needing to change Ceilometer code. Basically all hard-coded pollsters can be converted to a dynamic pollster that is defined in YML. [1] https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html#the-dynamic-pollsters-system-configuration-for-non-openstack-apis On Mon, Aug 29, 2022 at 12:51 PM Taltavull Jean-Fran?ois > wrote: Hi All, In our OpenStack deployment, API endpoints are defined by using URLs instead of port numbers and HAProxy forwards requests to the right bakend after having ACLed the URL. In the case of our object-store service, based on RadosGW, the internal API endpoint is "https:///object-store/swift/v1/AUTH_" When Ceilometer RadosGW pollster tries to connect to the RadosGW admin API with the object-store internal endpoint, the URL becomes https:///admin, as shown by HAProxy logs. This URL does not match any API endpoint from HAProxy point of view. The line of code that rewrites the URL is this one: https://opendev.org/openstack/ceilometer/src/branch/stable/wallaby/ceilometer/objectstore/rgw.py#L81 What would you think of adding a mechanism based on new Ceilometer configuration option(s) to control the URL rewriting ? Our deployment characteristics: - OpenStack release: Wallaby - Ceph and RadosGW version: 15.2.16 - deployment tool: OSA 23.2.1 and ceph-ansible Best regards, Jean-Francois -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From yasufum.o at gmail.com Mon Aug 29 17:49:50 2022 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Tue, 30 Aug 2022 02:49:50 +0900 Subject: [tacker][elections] PTL candidacy for Antelope cycle Message-ID: Hi, I'd like to propose my candidacy for Tacker PTL for Antelope cycle. I have contributed to Tacker as a PTL from Victoria for providing the latest features of ETSI-NVF standard features to enable operators to deploy their services. In the latest cycles, we have developed several supports for deploying CNFs, multi-version APIs for mixed environment of legacy and cutting edge-products provided by several vendors and more. We also have had several collaborative sessions with ETSI NFV for accelerating each activity of standardization and deployment of required features. In the next release, I would like to focus our target more on to develop features for reliability and avairability such as monitoring physical resources in addition to VMs and containers. 
Best regards, Yasufumi (yasufum) From james.denton at rackspace.com Mon Aug 29 17:54:01 2022 From: james.denton at rackspace.com (James Denton) Date: Mon, 29 Aug 2022 17:54:01 +0000 Subject: [neutron] Switching the ML2 driver in-place from linuxbridge to OVN for an existing Cloud In-Reply-To: <4318fbe5-f0f7-34eb-f852-15a6fb6810a6@inovex.de> References: <2446920.D5JjJbiaP6@p1> <4318fbe5-f0f7-34eb-f852-15a6fb6810a6@inovex.de> Message-ID: Hi Christian, In my experience, it is possible to perform in-place migration from ML2/LXB -> ML2/OVN, albeit with a shutdown or hard reboot of the instance(s) to complete the VIF plugging and some other needed operations. I have a very rough outline of required steps if you?re interested, but they?re geared towards an openstack-ansible based deployment. I?ll try to put a writeup together in the next week or two demonstrating the process in a multi-node environment; the only one I have done recently was an all-in-one. James Denton Rackspace Private Cloud From: Christian Rohmann Date: Monday, August 29, 2022 at 4:10 AM To: Slawek Kaplonski , openstack-discuss at lists.openstack.org Subject: Re: [neutron] Switching the ML2 driver in-place from linuxbridge to OVN for an existing Cloud CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Thanks Slawek for your quick response! On 23/08/2022 07:47, Slawek Kaplonski wrote: 1) Are the data models of the user managed resources abstract (enough) from the ML2 used? So would the composition of a router, a network, some subnets, a few security group and a few instances in a project just result in a different instantiation of packet handling components, but be otherwise transparent to the user? Yes, data models are the same so all networks, routers, subnets will be the same but implemented differently by different backend. The only significant difference may be network types as OVN works mostly with Geneve tunnel networks and with LB backend You are using VXLAN IIUC your email. That is reassuring. Yes we currently use VXLAN. But even with the same type of tunneling, I suppose the networks and their IDs will not align to form a proper layer 2 domain, not even talking about all the other services like DHCP or metadata. See next question about my idea to at least have some gradual switchover. 2) What could be possible migration strategies? [...] Or project by project by changing the network agents over to nodes already running OVN? Even if You will keep vxlan networks with OVN backend (support is kind of limited really) You will not be able to have tunnels established between nodes with different backends so there will be no connectivity between VMs on hosts with different backends. I was more thinking to move all of a projects resources to network nodes (and hypervisors) which already run OVN. So split the cloud in two classes of machines, one set unchanged running Linuxbridge and the other in OVN mode. To migrate "a project" all agents of that projects routers and networks will be changed over to agents running on OVN-powered nodes.... So this would be a hard cut-over, but limited to a single project. In alternative to replacing all of the network agents on all nodes and for all projects at the same time. Wouldn't that work - in theory - or am I missing something obvious here? Has anybody ever done something similar or heard about this being done anywhere? 
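As a practical aside, one way to sanity-check such a per-host or per-project cutover is to look at which network agents each node reports before and after the switch, i.e. the same information as "openstack network agent list"; a rough sketch with the openstacksdk, assuming a clouds.yaml entry named "mycloud":

    import openstack

    conn = openstack.connect(cloud="mycloud")

    # Group network agents by host so it is easy to see which backend
    # (Linux bridge / DHCP / L3 agents vs. OVN controllers) each node
    # is currently running.
    agents_by_host = {}
    for agent in conn.network.agents():
        agents_by_host.setdefault(agent.host, []).append(agent.agent_type)

    for host, types in sorted(agents_by_host.items()):
        print(host, "->", ", ".join(sorted(types)))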
I don't know about anyone who did that but if there is someone, I would be happy to hear about how it was done and how it went :) We will certainly share our story - if we live to talk about it ;-) Thanks again, With kind regards Christian -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Mon Aug 29 17:56:50 2022 From: amy at demarco.com (Amy Marrich) Date: Mon, 29 Aug 2022 12:56:50 -0500 Subject: [all][elections][ptl][tc] Combined PTL/TC antelope cycle Election Nominations Last Days Message-ID: A quick reminder that we are in the last hours for declaring PTL and TC candidacies. Nominations are open until Aug 31, 2022 23:45 UTC. Should we find that many of the community members were off on vacation this date may get extended pushing the elections out. If you want to stand for election, don't delay, follow the instructions at [1] to make sure the community knows your intentions. Make sure your nomination has been submitted to the openstack/election repository and approved by election officials. Election statistics[2]: Nominations started @ 2022-08-24 23:45:00 UTC Nominations end @ 2022-08-31 23:45:00 UTC Nominations duration : 7 days, 0:00:00 Nominations remaining : 2 days, 5:50:21 Nominations progress : 67.95% --------------------------------------------------- Projects[1] : 52 Projects with candidates : 5 ( 9.62%) Projects with election : 0 ( 0.00%) --------------------------------------------------- Need election : 0 () Need appointment : 47 (Adjutant Barbican Blazar Cinder Cloudkitty Cyborg Designate Freezer Glance Heat Horizon Ironic Keystone Manila Masakari Mistral Monasca Murano Neutron Nova Octavia OpenStackAnsible OpenStackSDK OpenStack_Charms OpenStack_Helm Openstack_Chef Oslo Puppet_OpenStack Quality_Assurance Rally Release_Management Requirements Sahara Senlin Skyline Solum Storlets Swift Tacker Telemetry Tripleo Trove Venus Vitrage Watcher Zaqar Zun) =================================================== Stats gathered @ 2022-08-29 17:54:39 UTC This means that with approximately 2 days left, 47 projects will be deemed leaderless. In this case the TC will oversee PTL selection as described by [3]. Thank you, [1] https://governance.openstack.org/election/#how-to-submit-a-candidacy [2] Any open reviews at https://review.openstack.org/#/q/is:open+project:openstack/election have not been factored into these stats. [3] https://governance.openstack.org/resolutions/20141128-elections-process-for-leaderless-programs.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Aug 29 18:03:36 2022 From: smooney at redhat.com (Sean Mooney) Date: Mon, 29 Aug 2022 19:03:36 +0100 Subject: [all] [oslo.messaging] Interest in collaboration on a NATS driver In-Reply-To: References: <52AA12A0-AE67-4EF7-B924-DE1F2873B909@binero.com> Message-ID: <6997e750b0ca41a4ac610575dda0531a85581f6b.camel@redhat.com> On Mon, 2022-08-29 at 18:33 +0200, Rados?aw Piliszek wrote: > Hi Tobias, > > Good to see RMQ alternatives appearing. A couple of questions from me. > > On Mon, 29 Aug 2022 at 15:47, Tobias Urdin wrote: > > ? Do retries and acknowledgements in the library (since NATS does NOT persist messages like RabbitMQ could) > > What do you mean? Is NATS only a router? (I have not used this technology yet.) no but if you want distibute persiten its part of the option stream api https://docs.nats.io/nats-concepts/jetstream they descirbe when to use core nats or jetsream here. 
https://docs.nats.io/using-nats/developer/develop_jetstream#when-to-use-streaming https://docs.nats.io/using-nats/developer/develop_jetstream#when-to-use-core-nats i think the poc is just using core nats currenlty. > > > ? Find or maintain a NATS python library that doesn't use async like the official one does > > Why is async a bad thing? For messaging it's the right thing. > > Finally, have you considered just trying out ZeroMQ? ZeroMQ used to be supported in the past but then it was remvoed if i understand correctly it only supprot notificaiton or RPC but not both i dont recall which but perhapse im miss rememebrign on that point. > I mean, NATS is probably an overkill for OpenStack services since the > majority of them stay static on the hosts they control (think > nova-compute, neutron agents - and these are also the pain points that > operators want to ease). its not any more overkill then rabbitmq is i also dont know waht you mean when you say "majority of them stay static on the hosts they control" NATS is intended a s a cloud native horrizontally scaleable message bus. which is exactly what openstack need IMO. > NATS seems to me to cater for a different use case. > I might be wrong because I have read only the front page but that is > the feeling I have. > > Cheers, > Radek > -yoctozepto > From radoslaw.piliszek at gmail.com Mon Aug 29 18:20:15 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 29 Aug 2022 20:20:15 +0200 Subject: [all] [oslo.messaging] Interest in collaboration on a NATS driver In-Reply-To: <6997e750b0ca41a4ac610575dda0531a85581f6b.camel@redhat.com> References: <52AA12A0-AE67-4EF7-B924-DE1F2873B909@binero.com> <6997e750b0ca41a4ac610575dda0531a85581f6b.camel@redhat.com> Message-ID: On Mon, 29 Aug 2022 at 20:03, Sean Mooney wrote: > > Finally, have you considered just trying out ZeroMQ? > ZeroMQ used to be supported in the past but then it was remvoed > if i understand correctly it only supprot notificaiton or RPC but not both > i dont recall which but perhapse im miss rememebrign on that point. I believe it would be better suited for RPC than notifications, at least in the simplest form. > > I mean, NATS is probably an overkill for OpenStack services since the > > majority of them stay static on the hosts they control (think > > nova-compute, neutron agents - and these are also the pain points that > > operators want to ease). > its not any more overkill then rabbitmq is True that. Probably. > i also dont know waht you mean when you say > "majority of them stay static on the hosts they control" > > NATS is intended a s a cloud native horrizontally scaleable message bus. > which is exactly what openstack need IMO. NATS seems to be tweaked for "come and go" situations which is an exception in the OpenStack world, not the rule (at least in my view). I mean, one normally expects to have a preset number of hypervisors and not them coming and going (which, I agree, is a nice vision, could be a proper NATS driver, with more awareness in the client projects I believe, would be an enabler for more dynamic clouds). Cheers, Radek -yoctozepto From tkajinam at redhat.com Tue Aug 30 01:33:21 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Tue, 30 Aug 2022 10:33:21 +0900 Subject: [infra][puppet] Old mirror contents in apt-puppetlabs In-Reply-To: References: <4647f88d-1737-4832-8cd5-b38a05937b40@www.fastmail.com> Message-ID: Thank you, Infra team, for your quick feedback in the review. 
The issue has been resolved and now puppet 6.28 is available in the mirror. I confirmed the new package is installed and the compatibility issue no longer occurs in CI. On Mon, Aug 29, 2022 at 2:30 AM Takashi Kajinami wrote: > Thanks for the pointers ! > > It seems the system-config repository contains the old gpg key which > expired in August 2021[1]. > > I've pushed the change[2] to replace the expired key by the new key. > > [1] https://puppet.com/blog/updated-puppet-gpg-signing-key-2020-edition/ > [2] https://review.opendev.org/c/opendev/system-config/+/854923/ > > > > On Mon, Aug 29, 2022 at 1:37 AM Clark Boylan wrote: > >> On Sun, Aug 28, 2022, at 8:05 AM, Takashi Kajinami wrote: >> > Hello Infra team, >> > >> > >> > I noticed the contents in the apt-puppetlabs directory in our CI mirror >> > are old. >> > The mirror repository provides puppet 6.23 while the upstream >> > repository provides >> > newer versions such as 6.28. >> > >> > Recently we bumped puppetlabs-mysql in our CI to 13.0.0 which requires >> > puppet >= 6.24.0 >> > and our Ubuntu jobs are failing now at a quite early stage because of >> > the old puppet package. >> > >> > May someone please look into this ? I've checked >> > mirror.iad3.inmotion.opendev.org and >> > mirror.bhs1.ovh.opendev.org but it seems the contents in the directory >> > have not been synced >> > since July, 2021. >> >> Regardless of the mirror server the content is served from a shared AFS >> filesystem. This means checking one is as good as any other. >> >> Logs for reprepro are also stored on AFS and served by the mirror >> servers: >> https://mirror.ord.rax.opendev.org/logs/reprepro/apt-puppetlabs.log. The >> logs show that there is a bad component and an expired key. If you track >> down what a correct component list and valid key are you can update our >> reprepro role [0][1][2] to fix this. >> >> [0] >> https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/reprepro >> [1] >> https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/reprepro/files/apt-puppetlabs/config/updates >> [2] >> https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/reprepro/tasks/puppetlabs.yaml >> >> > >> > Thank you, >> > Takashi >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmilan2006 at gmail.com Tue Aug 30 04:57:25 2022 From: mmilan2006 at gmail.com (Vaibhav) Date: Tue, 30 Aug 2022 10:27:25 +0530 Subject: [manila] LVM shares are unmounted on reboot Message-ID: Dear All, One of the host running Manila-share service has rebooted. After the reboot the LVMs are not mounted on the Manila mount points. I use LVM Driver with DHSS=False. default_share_type = default_share_type share_name_template = manila-%s rootwrap_config = /etc/manila/rootwrap.conf api_paste_config = /etc/manila/api-paste.ini auth_strategy = keystone my_ip=192.168.82.2 enabled_share_backends = lvm enabled_share_protocols = NFS state_path=/var/lib/manila [lvm] share_backend_name = LVM share_driver = manila.share.drivers.lvm.LVMShareDriver driver_handles_share_servers = False lvm_share_volume_group = VGZunManila lvm_share_export_ips = 192.168.82.2 lvm_share_export_root = $state_path/mnt After the reboot LVMs are there but they are mounted on their respective share location. Any way to solve this and this to be done automatically? Regards, Vaibhav -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From park0kyung0won at dgist.ac.kr Tue Aug 30 05:54:10 2022 From: park0kyung0won at dgist.ac.kr (=?UTF-8?B?67CV6rK97JuQ?=) Date: Tue, 30 Aug 2022 14:54:10 +0900 (KST) Subject: Openstack OVN (Open Virtual Network) HA deployment - running OVN in active/passive mode? Message-ID: <1159703927.727809.1661838850599.JavaMail.root@mailwas2> An HTML attachment was scrubbed... URL: From bshephar at redhat.com Tue Aug 30 06:19:05 2022 From: bshephar at redhat.com (Brendan Shephard) Date: Tue, 30 Aug 2022 16:19:05 +1000 Subject: [heat][elections] PTL Candidacy for Antelope Cycle Message-ID: Hi all, I first wanted to thank Rico for his ongoing commitment to the project over the last cycles. He has provided lots of guidance and help to the Heat project for a long time and his contribution deserves recognition. I am proposing my candidacy for Heat PTL during the Antelope cycle. I have worked with the Heat project for several years both as a user and more recently over the last 2 years as a contributor. I see great potential in the project for our users and look forward to continuing work in order to support features and functionality of the project. Some of my objectives for the next few cycles are: Remove the dependencies on legacy python-*client libraries and instead shift to the openstacksdk client library. While the legacy libraries have served us well, they are starting to show their limitations and the delta in servicibilty will only increase as each project moves towards leveraging the openstacksdk. So this change, while quite extensive will ensure future compatibility with the other OpenStack project teams. Continue ensuring Heat supports the most up-to-date and recent features provided by each project. To ensure Heat is the default and best choice for our users, we need to ensure we are able to leverage the latest available features from the complimentary OpenStack projects. This is an ongoing challenge to stay up-to-date with the changes each cycle and work towards implementing them in Heat. Thank you all for you consideration, and I look forward to the next cycle and continuing to work with you all. Regards, Brendan From tobias.rydberg at cleura.com Tue Aug 30 06:31:04 2022 From: tobias.rydberg at cleura.com (Tobias Rydberg) Date: Tue, 30 Aug 2022 08:31:04 +0200 Subject: [publiccloud-sig] Bi-weekly meeting reminder Message-ID: <5f0fedcd-ca7f-8013-15fe-7e25382fbb1a@cleura.com> Hi all, Tomorrow we will have our bi-weekly meeting at 0800 UTC in #openstack-operators. Agenda can be fond here [0]. Feel free to suggest topics to the agenda if you like. Hope to chat with you tomorrow! [0] https://etherpad.opendev.org/p/publiccloud-sig-meeting BR, Tobias Rydberg -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3626 bytes Desc: S/MIME Cryptographic Signature URL: From tkajinam at redhat.com Tue Aug 30 06:38:39 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Tue, 30 Aug 2022 15:38:39 +0900 Subject: [election][puppet] PTL Candidacy for Antelope cycle Message-ID: Hello, I'd like to announce my candidacy for the PTL role in Puppet OpenStack, to continue my PTL role for the Antelope cycle. Over the past two cycles, we've successfully improved feature coverage, platform coverage and simplicity of our modules. I'd like to list up a few items which would be our priorities during the next cycle. 
* Add Ubuntu 22.04 support This would be the next major change after we've completed implementation of CentOS 9 Stream support. We already adapted to Ruby 3 as part of C9S support so I'm not aware of any huge challenges at this moment. * Complete migration to Puppet 7 Currently our modules still support both Puppet 6 and 7. However once we complete migration to Ubuntu 22.04, we complete migration from Ruby 2.x to 3.x. As Puppet officially supports Ruby 3.x since 7.7, this means we no longer maintain test coverage with Puppet 6. It's time to consider again complete migration to Puppet 7. * Improve scenario/component coverage by CI During Zed cycle we added OVN and Octavia to the integration jobs. We'll review a few remaining modules like Manila and will continue extending the component coverage. * Review unmaintained/unused modules In Puppet OpenStack projects we maintain number of modules to support multiple OpenStack components. However, some modules have not been really active and attracted no interest. In the past few cycles we have retired several modules but I'd like to continue reviewing our modules to consider retiring inactive ones. * Keep each module up to date and simple It's always important that we add support for the new features/parameters timely so that users can leverage the new capability via our modules. Thank you for your consideration. Thank you, Takashi -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at binero.com Tue Aug 30 09:12:20 2022 From: tobias.urdin at binero.com (Tobias Urdin) Date: Tue, 30 Aug 2022 09:12:20 +0000 Subject: =?utf-8?B?W2FsbF3CoFtvc2xvLm1lc3NhZ2luZ10gSW50ZXJlc3QgaW4gY29sbGFib3Jh?= =?utf-8?Q?tion_on_a_NATS_driver?= Message-ID: Hello, Please note that the list provided was just from my own notes and I didn?t put much emphasis on making it complete or accurate. If you think back to the early days of OpenStack the pain was in OpenStack itself, today it?s a challenge instead to manage OpenStack scale and fit into the things we see, like moving OpenStack into containers (for example managed by Kubernetes). I would like OpenStack design to more embrace the distributed, cloud-native approach that Ceph and Kubernetes brings, and the resiliency of Ceph (and yes, I?m a major Ceph enthusiast) and there I?m seeing messaging and database as potential blockers to continue on that path. I?m not saying that?s the only thing, for example stuff like [1] _really_ matter in real world deployments so working on other OpenStack parts for resilience is also crucial. There is things I?m interested in that would impact the overall design, I can list some of them but I think it might be to broad of a subject for this thread. * Like brought up my Mohammed Naser before, I would like to investigate an effort for containers as an OpenStack deliverable for projects * Investigate cloud-native, highly available and resilient alternatives for messaging and database * Make OpenStack more resilient with above and [1] is a great example on what I mean I?ll respond to your questions with my views inline below. P.S The opionions stated here is my own personal opinions and should not be assumed to be the opinions of any other entity. 
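To make the scope of such a driver concrete: from a service's point of view nothing but the transport_url should have to change. A rough sketch of that, assuming the driver registers a "nats" scheme as an oslo.messaging drivers entry point (the scheme name and hosts below are assumptions, not something the POC guarantees):

    from oslo_config import cfg
    import oslo_messaging

    # Operators would only swap the transport_url in the service config,
    # e.g. transport_url = nats://controller1:4222,controller2:4222/
    transport = oslo_messaging.get_rpc_transport(
        cfg.CONF, url="nats://controller1:4222/")

    target = oslo_messaging.Target(topic="compute", server="host1")

    class TestEndpoint(object):
        def ping(self, ctxt, who):
            return "pong %s" % who

    server = oslo_messaging.get_rpc_server(
        transport, target, [TestEndpoint()], executor="threading")
    server.start()

    client = oslo_messaging.RPCClient(transport, target)
    print(client.call({}, "ping", who="zed"))

    server.stop()
    server.wait()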
Best regards Tobias [1] https://bugs.launchpad.net/neutron/+bug/1987780 Begin forwarded message: From: Rados?aw Piliszek > Subject: Re: [all] [oslo.messaging] Interest in collaboration on a NATS driver Date: 29 August 2022 at 18:33:15 CEST To: Tobias Urdin > Cc: openstack-discuss > Hi Tobias, Good to see RMQ alternatives appearing. A couple of questions from me. On Mon, 29 Aug 2022 at 15:47, Tobias Urdin > wrote: ? Do retries and acknowledgements in the library (since NATS does NOT persist messages like RabbitMQ could) What do you mean? Is NATS only a router? (I have not used this technology yet.) It does not persist messages, if there is no backend to respond, the message will be dropped without any action hence why I want the RPC layer in oslo.messaging (that already does acknowledge calls in the driver) to notify client side that it?s being processed before client side waits for reply. ? Find or maintain a NATS python library that doesn't use async like the official one does Why is async a bad thing? For messaging it's the right thing. This is actually just myself, I would love to just being able to use the official that is async based instead it?s just me that doesn?t understand how that would be implemented. https://github.com/nats-io/nats.py instead of the one in POC https://github.com/Gr1N/nats-python which has a lot of shortcomings and issues, my idea was just to investigate if was even possible to implement in a feasible way. Finally, have you considered just trying out ZeroMQ? Does not exist anymore. I mean, NATS is probably an overkill for OpenStack services since the majority of them stay static on the hosts they control (think nova-compute, neutron agents - and these are also the pain points that operators want to ease). I don?t think it it, or even if it is, why not use a better solution or stable approach than RabbitMQ? This is also the whole point, I don?t want OpenStack to become or be static, I want it to be more dynamic and cloud-native in it?s approach and support viable integrations that takes it there, we cannot live in the past forever, let?s envision and dream of the future as we want it! :) NATS seems to me to cater for a different use case. It actually caters to a lot of use cases. I might be wrong because I have read only the front page but that is the feeling I have. Cheers, Radek -yoctozepto Begin forwarded message: From: Rados?aw Piliszek > Subject: Re: [all] [oslo.messaging] Interest in collaboration on a NATS driver Date: 29 August 2022 at 20:20:15 CEST To: Sean Mooney > Cc: Tobias Urdin >, openstack-discuss > On Mon, 29 Aug 2022 at 20:03, Sean Mooney > wrote: Finally, have you considered just trying out ZeroMQ? ZeroMQ used to be supported in the past but then it was remvoed if i understand correctly it only supprot notificaiton or RPC but not both i dont recall which but perhapse im miss rememebrign on that point. I believe it would be better suited for RPC than notifications, at least in the simplest form. As it?s advertised as scalable and performant I would argue that, why not use it for notifications as well? If anything according to your observations above it?s more suited for that than RPC, even though request-reply (that we can use for RPC) is a strong first-class implementation in NATS as well. I mean, NATS is probably an overkill for OpenStack services since the majority of them stay static on the hosts they control (think nova-compute, neutron agents - and these are also the pain points that operators want to ease). 
its not any more overkill then rabbitmq is True that. Probably. I agree with that, also if you think about it, how many issues related to stability, performance and outages is related to RabbitMQ? It?s quite a few if you ask me. Just the resource utilization and clustering in RabbitMQ makes me feel bad. It?s here that I mean that the cloud-native and scalable implementation would shine, you should be able to rely on it, if sometimes dies so what, things should just continue to work and that?s not my experience with RabbitMQ but it is my experience with Ceph because in the end the design really matters. i also dont know waht you mean when you say "majority of them stay static on the hosts they control" NATS is intended a s a cloud native horrizontally scaleable message bus. which is exactly what openstack need IMO. NATS seems to be tweaked for "come and go" situations which is an exception in the OpenStack world, not the rule (at least in my view). I mean, one normally expects to have a preset number of hypervisors and not them coming and going (which, I agree, is a nice vision, could be a proper NATS driver, with more awareness in the client projects I believe, would be an enabler for more dynamic clouds). It could, but it also doesn?t have to be that. Why not strive for more dynamic? I don?t think anybody would argue that more dynamic is a bad thing even if you were to have a more static approach to your cloud. Cheers, Radek -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Tue Aug 30 09:43:18 2022 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 30 Aug 2022 11:43:18 +0200 Subject: [all] [oslo.messaging] Interest in collaboration on a NATS driver In-Reply-To: References: Message-ID: Hi Tobias, Thank you for the detailed response. My query was to gather more insight on what your views/goals are and the responses do not disappoint. More queries inline below. On Tue, 30 Aug 2022 at 11:15, Tobias Urdin wrote: > I would like OpenStack design to more embrace the distributed, cloud-native approach that Ceph and Kubernetes brings, and the resiliency of Ceph (and yes, I?m a major Ceph enthusiast) > and there I?m seeing messaging and database as potential blockers to continue on that path. We both definitely agree that resiliency needs to be improved. > On Mon, 29 Aug 2022 at 15:47, Tobias Urdin wrote: > > ? Do retries and acknowledgements in the library (since NATS does NOT persist messages like RabbitMQ could) > > > What do you mean? Is NATS only a router? (I have not used this technology yet.) > > > It does not persist messages, if there is no backend to respond, the message will be dropped without any action hence why I > want the RPC layer in oslo.messaging (that already does acknowledge calls in the driver) to notify client side that it?s being processed > before client side waits for reply. Ack, that makes sense. To let the client know whether there is any consumer that accepted that message. That said, bear in mind the consumer might accept and then die. If NATS does not keep track of this message further, then the resilience is handicapped. > ? Find or maintain a NATS python library that doesn't use async like the official one does > > > Why is async a bad thing? For messaging it's the right thing. 
> > > This is actually just myself, I would love to just being able to use the official that is async based instead it?s just > me that doesn?t understand how that would be implemented. > > https://github.com/nats-io/nats.py instead of the one in POC https://github.com/Gr1N/nats-python which has a lot of shortcomings and issues, my > idea was just to investigate if was even possible to implement in a feasible way. Ack, I see. > > Finally, have you considered just trying out ZeroMQ? > > > Does not exist anymore. I think I might have been misunderstood as ZeroMQ still exists. ;-) You probably mean the oslo.messaging backend that it's gone. I meant that *maybe* it would be good to discuss a reimplementation of that which considers the current OpenStack needs. I would also emphasise that I imagine RPC and notification messaging layers to have different needs and likely requiring different approaches. > I mean, NATS is probably an overkill for OpenStack services since the > majority of them stay static on the hosts they control (think > nova-compute, neutron agents - and these are also the pain points that > operators want to ease). > > > I don?t think it it, or even if it is, why not use a better solution or stable approach than RabbitMQ? > > This is also the whole point, I don?t want OpenStack to become or be static, I want it to be more dynamic and > cloud-native in it?s approach and support viable integrations that takes it there, we cannot live in the past forever, let?s envision and dream of the future as we want it! :) Ack, you want it more dynamic and that's ok now that I understand your view. That said, my whole point regarding this boils down to the usual design principles that remind us that there are, more often than not, some tradeoffs that have been made to build some tech - NATS is likely no different: if it promises features A, B, C, D, and we need only A and B, then *maybe* it has some constraints on the A and B we want or we might miss that it lacks feature E or C/D add useless overhead. The point is to have that in mind before going too deep, try to spot and tackle such issues early on. > > Finally, have you considered just trying out ZeroMQ? > > ZeroMQ used to be supported in the past but then it was remvoed > if i understand correctly it only supprot notificaiton or RPC but not both > i dont recall which but perhapse im miss rememebrign on that point. > > > I believe it would be better suited for RPC than notifications, at > least in the simplest form. > > > As it?s advertised as scalable and performant I would argue that, why not use it for notifications as well? If anything according to > your observations above it?s more suited for that than RPC, even though request-reply (that we can use for RPC) is a strong first-class implementation in NATS as well. Well, that was about ZMQ. I mostly meant that synchronous RPC (that happens in OpenStack a lot) adapts very well to what can be achieved with ZeroMQ without a lot of fuss. > I mean, NATS is probably an overkill for OpenStack services since the > majority of them stay static on the hosts they control (think > nova-compute, neutron agents - and these are also the pain points that > operators want to ease). > > its not any more overkill then rabbitmq is > > > True that. Probably. > > > I agree with that, also if you think about it, how many issues related to stability, performance and outages is related to RabbitMQ? It?s quite a few if you ask me. > Just the resource utilization and clustering in RabbitMQ makes me feel bad. 
Here we definitely agree. As we used to discuss this before in this community, we are not sure if this is RabbitMQ's fault of course or if we just don't know how to utilise it properly. ;-) Anyhow, RMQ being in Erlang does not help as it's more like a black box to most of us here I believe (please raise your hands if you can debug an EVM failure). > It?s here that I mean that the cloud-native and scalable implementation would shine, you should be able to rely on it, if sometimes dies so what, things should just > continue to work and that?s not my experience with RabbitMQ but it is my experience with Ceph because in the end the design really matters. "Design really matters" is something that I remind myself and others almost every day. Hence why this discussion is taking place now. :D > > i also dont know waht you mean when you say > "majority of them stay static on the hosts they control" > > NATS is intended a s a cloud native horrizontally scaleable message bus. > which is exactly what openstack need IMO. > > > NATS seems to be tweaked for "come and go" situations which is an > exception in the OpenStack world, not the rule (at least in my view). > I mean, one normally expects to have a preset number of hypervisors > and not them coming and going (which, I agree, is a nice vision, could > be a proper NATS driver, with more awareness in the client projects I > believe, would be an enabler for more dynamic clouds). > > > It could, but it also doesn?t have to be that. Why not strive for more dynamic? I don?t think anybody would argue that more dynamic is a bad thing > even if you were to have a more static approach to your cloud. This has been discussed already above - tradeoffs. One cannot just make up a hypervisor and need to spin up a nova-compute for it. It's a different story for non-resource-bound services that NATS is advertised for. You need more processing power? Sure, you spin another worker and connect it with NATS. That scalability might be coming at a price that we don't need to pay because OpenStack services are never going to scale with this level of dynamism. Finally, don't get me wrong. I love the fact that you are doing what you are doing. I just want to make sure that it goes in the right direction. Cheers, Radek -yoctozepto From tobias.urdin at binero.com Tue Aug 30 10:01:41 2022 From: tobias.urdin at binero.com (Tobias Urdin) Date: Tue, 30 Aug 2022 10:01:41 +0000 Subject: [election][puppet] PTL Candidacy for Antelope cycle In-Reply-To: References: Message-ID: <06ECDC28-F97C-41A4-ADCC-5AF391D494DE@binero.com> Hello Takashi, Thank for all your work and effort into this! I would love to see you run for another cycle. Best regards Tobias > On 30 Aug 2022, at 08:38, Takashi Kajinami wrote: > > Hello, > > > I'd like to announce my candidacy for the PTL role in Puppet OpenStack, to > continue my PTL role for the Antelope cycle. > > Over the past two cycles, we've successfully improved feature coverage, > platform coverage and simplicity of our modules. I'd like to list up a few > items which would be our priorities during the next cycle. > > * Add Ubuntu 22.04 support > This would be the next major change after we've completed implementation of > CentOS 9 Stream support. We already adapted to Ruby 3 as part of C9S support > so I'm not aware of any huge challenges at this moment. > > * Complete migration to Puppet 7 > Currently our modules still support both Puppet 6 and 7. However once we > complete migration to Ubuntu 22.04, we complete migration from Ruby 2.x to 3.x. 
> As Puppet officially supports Ruby 3.x since 7.7, this means we no longer > maintain test coverage with Puppet 6. It's time to consider again complete > migration to Puppet 7. > > * Improve scenario/component coverage by CI > During Zed cycle we added OVN and Octavia to the integration jobs. We'll review > a few remaining modules like Manila and will continue extending the component > coverage. > > * Review unmaintained/unused modules > In Puppet OpenStack projects we maintain number of modules to support multiple > OpenStack components. However, some modules have not been really active and > attracted no interest. In the past few cycles we have retired several modules > but I'd like to continue reviewing our modules to consider retiring inactive ones. > > * Keep each module up to date and simple > It's always important that we add support for the new features/parameters > timely so that users can leverage the new capability via our modules. > > > Thank you for your consideration. > > Thank you, > Takashi From eblock at nde.ag Tue Aug 30 11:04:08 2022 From: eblock at nde.ag (Eugen Block) Date: Tue, 30 Aug 2022 11:04:08 +0000 Subject: Cinder-volume active active setup Message-ID: <20220830110408.Horde.JzrsPeWuptf080t_8QXR1Sh@webmail.nde.ag> Hi, I didn't mean to hijack the other thread so I'll start a new one. There are some pages I found incl. Gorkas article [1], but I don't really understand yet how to configure it. We don't use any of the automated deployments (we created our own) like TripleO etc., is there any guide showing how to setup cinder-volume active/active? I see in my lab environment that python3-tooz is already installed on the control node, but how do I use it? Besides the "cluster" config option in the cinder.conf (is that defined when setting up the DLM?) what else is required? I also found this thread [2] pointing to the source code, but that doesn't really help me at this point. Any pointers to a how-to or deployment guide would be highly appreciated! Thanks, Eugen [1] https://gorka.eguileor.com/a-cinder-road-to-activeactive-ha/ [2] https://www.mail-archive.com/openstack at lists.openstack.org/msg18385.html From jean-francois.taltavull at elca.ch Tue Aug 30 12:11:32 2022 From: jean-francois.taltavull at elca.ch (=?utf-8?B?VGFsdGF2dWxsIEplYW4tRnJhbsOnb2lz?=) Date: Tue, 30 Aug 2022 12:11:32 +0000 Subject: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number In-Reply-To: <1b17c23f8982480db73cf50d04d51af7@elca.ch> References: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> <1b17c23f8982480db73cf50d04d51af7@elca.ch> Message-ID: Hello, I tried to define a Rados GW dynamic pollster and I can see, in Ceilometer logs, that it?s actually loaded. But it looks like it was not triggered, I see no trace of ceilometer connection in Rados GW logs. My definition: - name: "dynamic.radosgw.usage" sample_type: "gauge" unit: "B" value_attribute: "total.size" url_path: http:///object-store/swift/v1/admin/usage module: "awsauth" authentication_object: "S3Auth" authentication_parameters: xxxxxxxxxxxxx,yyyyyyyyyyyyy, user_id_attribute: "admin" project_id_attribute: "admin" resource_id_attribute: "admin" response_entries_key: "summary" Do I have to set an option in ceilometer.conf, or elsewhere, to get my Rados GW dynamic pollster triggered ? 
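My guess is that, on top of the definition above, the meter also has to be enabled in polling.yml, along these lines (the source name and interval are just examples):

    ---
    sources:
        - name: radosgw_pollsters
          interval: 600
          meters:
            - dynamic.radosgw.usage

but I am not sure whether anything else is needed on the ceilometer.conf side.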
-JF From: Taltavull Jean-Fran?ois Sent: lundi, 29 ao?t 2022 18:41 To: 'Rafael Weing?rtner' Cc: openstack-discuss Subject: RE: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number Thanks a lot for your quick answer, Rafael ! I will explore this approach. Jean-Francois From: Rafael Weing?rtner > Sent: lundi, 29 ao?t 2022 17:54 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. You could use a different approach. You can use Dynamic pollster [1], and create your own mechanism to collect data, without needing to change Ceilometer code. Basically all hard-coded pollsters can be converted to a dynamic pollster that is defined in YML. [1] https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html#the-dynamic-pollsters-system-configuration-for-non-openstack-apis On Mon, Aug 29, 2022 at 12:51 PM Taltavull Jean-Fran?ois > wrote: Hi All, In our OpenStack deployment, API endpoints are defined by using URLs instead of port numbers and HAProxy forwards requests to the right bakend after having ACLed the URL. In the case of our object-store service, based on RadosGW, the internal API endpoint is "https:///object-store/swift/v1/AUTH_" When Ceilometer RadosGW pollster tries to connect to the RadosGW admin API with the object-store internal endpoint, the URL becomes https:///admin, as shown by HAProxy logs. This URL does not match any API endpoint from HAProxy point of view. The line of code that rewrites the URL is this one: https://opendev.org/openstack/ceilometer/src/branch/stable/wallaby/ceilometer/objectstore/rgw.py#L81 What would you think of adding a mechanism based on new Ceilometer configuration option(s) to control the URL rewriting ? Our deployment characteristics: - OpenStack release: Wallaby - Ceph and RadosGW version: 15.2.16 - deployment tool: OSA 23.2.1 and ceph-ansible Best regards, Jean-Francois -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Tue Aug 30 12:16:50 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Tue, 30 Aug 2022 09:16:50 -0300 Subject: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number In-Reply-To: References: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> <1b17c23f8982480db73cf50d04d51af7@elca.ch> Message-ID: Yes, you will need to enable the metric/pollster to be processed. That is done via "polling.yml" file. Also, do not forget that you will need to configure Ceilometer to push this new metric. If you use Gnocchi as the backend, you will need to change/update the gnocchi resource YML file. That file maps resources and metrics in the Gnocchi backend. The configuration resides in Ceilometer. You can create/define new resource types and map them to specific metrics. It depends on how you structure your solution. P.S. You do not need to use "authentication_parameters". You can use the barbican integration to avoid setting your credentials in a file. On Tue, Aug 30, 2022 at 9:11 AM Taltavull Jean-Fran?ois < jean-francois.taltavull at elca.ch> wrote: > Hello, > > > > I tried to define a Rados GW dynamic pollster and I can see, in Ceilometer > logs, that it?s actually loaded. 
But it looks like it was not triggered, I > see no trace of ceilometer connection in Rados GW logs. > > > > My definition: > > > > - name: "dynamic.radosgw.usage" > > sample_type: "gauge" > > unit: "B" > > value_attribute: "total.size" > > url_path: http:///object-store/swift/v1/admin/usage > > module: "awsauth" > > authentication_object: "S3Auth" > > authentication_parameters: xxxxxxxxxxxxx,yyyyyyyyyyyyy, > > user_id_attribute: "admin" > > project_id_attribute: "admin" > > resource_id_attribute: "admin" > > response_entries_key: "summary" > > > > Do I have to set an option in ceilometer.conf, or elsewhere, to get my > Rados GW dynamic pollster triggered ? > > > > -JF > > > > *From:* Taltavull Jean-Fran?ois > *Sent:* lundi, 29 ao?t 2022 18:41 > *To:* 'Rafael Weing?rtner' > *Cc:* openstack-discuss > *Subject:* RE: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > Thanks a lot for your quick answer, Rafael ! > > I will explore this approach. > > > > Jean-Francois > > > > *From:* Rafael Weing?rtner > *Sent:* lundi, 29 ao?t 2022 17:54 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > You could use a different approach. You can use Dynamic pollster [1], and > create your own mechanism to collect data, without needing to change > Ceilometer code. Basically all hard-coded pollsters can be converted to a > dynamic pollster that is defined in YML. > > > > [1] > https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html#the-dynamic-pollsters-system-configuration-for-non-openstack-apis > > > > > > On Mon, Aug 29, 2022 at 12:51 PM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > Hi All, > > In our OpenStack deployment, API endpoints are defined by using URLs > instead of port numbers and HAProxy forwards requests to the right bakend > after having ACLed the URL. > > In the case of our object-store service, based on RadosGW, the internal > API endpoint is "https:///object-store/swift/v1/AUTH_" > > When Ceilometer RadosGW pollster tries to connect to the RadosGW admin API > with the object-store internal endpoint, the URL becomes > https:///admin, as shown by HAProxy logs. This URL does not match > any API endpoint from HAProxy point of view. The line of code that rewrites > the URL is this one: > https://opendev.org/openstack/ceilometer/src/branch/stable/wallaby/ceilometer/objectstore/rgw.py#L81 > > What would you think of adding a mechanism based on new Ceilometer > configuration option(s) to control the URL rewriting ? > > Our deployment characteristics: > - OpenStack release: Wallaby > - Ceph and RadosGW version: 15.2.16 > - deployment tool: OSA 23.2.1 and ceph-ansible > > > Best regards, > Jean-Francois > > > > -- > > Rafael Weing?rtner > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Tue Aug 30 12:23:02 2022 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 30 Aug 2022 14:23:02 +0200 Subject: [election][nova][placement] Candidating for the Antelope cycle. Message-ID: Hi Nova and Placement folks, Yet again, I want to help our community for this cycle by working on papers. 
If you accept me for being Antelope Nova/Placement PTL, it would be my last cycle for it after already two of them (unless nobody runs for this). For the moment, we are not yet done with Zed but I'm happy to see that we have 13 open blueprints that are reviewed and where some of them are from contributors that aren't everyday Nova developers. That's the reason why I wanted to help those on-off contributors that have not a large time for upstream work and that's why we added a Gerrit review label value for them (saying "I want to review as a contributor"). We also tried a new way for discussing about open changes creating a new API microversion and even if it hadn't really needed, I think we could use this way next time if needed. So, what would I like for us to discuss during Antelope then ? Well, at least three points : - given Antelope will be the first 'tick' release, I want to make sure we won't have any issues in Nova or Placement for this cycle, and I also would like to prepare for the 'tock' release. - I would like to discuss with our community how to find some opportunities for new contributors that would like to work on Nova. - as we discussed with operators in the Berlin Summit, I would like to find a way for them to discuss with us and see how they could be in our community even if they're not developers. At least, I'd love to see some operators in our meetings and in our PTG sessions. Thanks, -Sylvain (again, not a leader but just a herder.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Tue Aug 30 13:19:00 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Tue, 30 Aug 2022 18:49:00 +0530 Subject: [cinder] This week's meeting will be in video+IRC Message-ID: Hello Argonauts, As we keep last meeting of the month in video + IRC mode, this week's meeting (tomorrow) will be held in video + IRC mode with details as follows: Date: 31st August, 2022 Time: 1400 UTC Meeting link: https://bluejeans.com/556681290 IRC Channel: #openstack-meeting-alt Make sure you're connected to both the bluejeans meeting and IRC since we do roll call and also discuss topics on IRC (if the author is more comfortable in written format). Thanks and regards Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Tue Aug 30 14:07:02 2022 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 30 Aug 2022 17:07:02 +0300 Subject: [TripleO] quay.io namespace local mirror using harbor. Quay.io do not list repos/images Message-ID: Hi all, I am trying to make a local repo/copy to be able to deploy OSP onsite behind firewall. I have found out from harbor, that Quay.io do not return repos/image anonymously also I have created account in quay.io with my RH account. Later reset quay.io password, so quay.io connection verify works. I can get it as a proxy, or exact image, but not full offline installation. Could you help/suggest any tool that could do a local repo? I have also tried foreman, but it only can sync repo in namespace, not whole namespace. Thank you for your advice and comments. -- Ruslanas G?ibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ozzzo at yahoo.com Tue Aug 30 14:18:47 2022 From: ozzzo at yahoo.com (Albert Braden) Date: Tue, 30 Aug 2022 14:18:47 +0000 (UTC) Subject: [kolla] [nova] Rogue AggregateMultiTenancyIsolation filter In-Reply-To: References: <705832880.401721.1661542524485.ref@mail.yahoo.com> <705832880.401721.1661542524485@mail.yahoo.com> Message-ID: <1292776284.1760031.1661869127485@mail.yahoo.com> You're right; I was thinking AZ but my fingers typed region. The problem started Friday afternoon without anything being changed AFAIK. It seems to have fixed itself over the weekend. If it starts happening again. I'll enable debugging. Thanks for your advice! On Monday, August 29, 2022, 04:54:50 AM EDT, Pierre Riteau wrote: Hello, First, a point about terminology: you use the term "region", but I think you meant an availability zone. Regions are something entirely different in OpenStack. I would suggest the following: - first, enable debug logging in Nova. You can do so in Kolla by setting?nova_logging_debug to true and reconfiguring nova. As you can see in the code, there are several log statements at the debug level which would help understand why candidate hosts are rejected by this filter:?https://opendev.org/openstack/nova/src/branch/stable/train/nova/scheduler/filters/aggregate_multitenancy_isolation.py- second, maybe check if you have other aggregates that are set to the "open" AZ and would have the filter_tenant_id property on them? On Fri, 26 Aug 2022 at 21:51, Albert Braden wrote: We're running kolla train, and we use the AggregateMultiTenancyIsolation for some aggregates by setting filter_tenant_id. Today customers reported build failures when they try to build VMs in a non-filtered region. I am able to duplicate the issue: os server create --image --flavor medium --network private --availability-zone open alberttest1 | 5dd44105-2045-4d53-be43-5f521ddb420b | alberttest1 | ERROR? |? ? ? ? ? | | medium | 2022-08-26 18:39:38.977 30 INFO nova.filters [req-342d065a-cd47-4edf-bc4b-3f84b34ab97c 25b53bdb96fb5f9f6e7331d7e03eee0a12c45746a9e8b978858b2140a5275a09 fdcf1553db504c8f82a2b54851a4c262 - 8793b235debf49e6aba6bd1e2bf65360 8793b235debf49e6aba6bd1e2bf65360] Filtering removed all hosts for the request with instance ID '5dd44105-2045-4d53-be43-5f521ddb420b'. Filter results: ['ComputeFilter: (start: 50, end: 50)', 'RetryFilter: (start: 50, end: 50)', 'AggregateNumInstancesFilter: (start: 50, end: 50)', 'AvailabilityZoneFilter: (start: 50, end: 6)', 'AggregateInstanceExtraSpecsFilter: (start: 6, end: 6)', 'ImagePropertiesFilter: (start: 6, end: 6)', 'ServerGroupAntiAffinityFilter: (start: 6, end: 6)', 'ServerGroupAffinityFilter: (start: 6, end: 6)', 'AggregateMultiTenancyIsolation: (start: 6, end: 0)'] Region "open" does not have any properties specified, so the AggregateMultiTenancyIsolation filter should not be active. qde3:admin]$ os aggregate show open|grep properties | properties? ? ? ? |? ? ? ? ? ? ? ? ? This is what we would see if it had the filter active: :qde3:admin]$ os aggregate show closed|grep properties | properties? ? ? ? | filter_tenant_id='1c41e088b35f4b438023d081a6f70292,3e9727aaf03e4459a176c28dbdb3965e,f9b4b7dc8c614bb09d66657afc3b21cd,121a5da3dd0b489986908bee7eea61ae,d580ccc4b07e478a9efc2d71acf04cc1,107e14eeda01400988e58f5aac8b2772', closed='true'? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? What could be causing this filter to remove hosts when we haven't set filter_tenant_id for that aggregate? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ramishra at redhat.com Tue Aug 30 14:29:30 2022 From: ramishra at redhat.com (Rabi Mishra) Date: Tue, 30 Aug 2022 19:59:30 +0530 Subject: [election][tripleo] PTL Candidacy for Antelope cycle Message-ID: Hi All, I would like to nominate myself for Antelope cycle TripleO PTL. As some of you would know, I have been part of the OpenStack community for a long time and have been a core contributor for TripleO and Heat. I have also served as Heat PTL for Ocata cycle. James has done a great job as PTL for the last few cycles. I would like to take the opportunity to thank him for all his effort and we all would agree that he needs a well deserved break. I am looking forward to take the opportunity to help the community to achieve some of the already planned goals and in-progress workstreams like standalone roles, multi-rhel and other challenges that come along the way. Also, as before, our focus would continue to be on review prioritization, in progress work streams and collaboration on common priorities. Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Tue Aug 30 14:29:35 2022 From: johfulto at redhat.com (John Fulton) Date: Tue, 30 Aug 2022 10:29:35 -0400 Subject: [TripleO] quay.io namespace local mirror using harbor. Quay.io do not list repos/images In-Reply-To: References: Message-ID: On Tue, Aug 30, 2022 at 10:08 AM Ruslanas G?ibovskis wrote: > > Hi all, > > I am trying to make a local repo/copy to be able to deploy OSP onsite behind firewall. > > I have found out from harbor, that Quay.io do not return repos/image anonymously also I have created account in quay.io with my RH account. Later reset quay.io password, so quay.io connection verify works. > > I can get it as a proxy, or exact image, but not full offline installation. > > Could you help/suggest any tool that could do a local repo? > > I have also tried foreman, but it only can sync repo in namespace, not whole namespace. > > Thank you for your advice and comments. Can you use your undercloud as a registry for your overcloud as described here? https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/container_image_prepare.html > > -- > Ruslanas G?ibovskis > +370 6030 7030 From johfulto at redhat.com Tue Aug 30 14:35:32 2022 From: johfulto at redhat.com (John Fulton) Date: Tue, 30 Aug 2022 10:35:32 -0400 Subject: [election][tripleo] PTL Candidacy for Antelope cycle In-Reply-To: References: Message-ID: On Tue, Aug 30, 2022 at 10:32 AM Rabi Mishra wrote: > > Hi All, > > I would like to nominate myself for Antelope cycle TripleO PTL. > > As some of you would know, I have been part of the OpenStack community > for a long time and have been a core contributor for TripleO and Heat. > I have also served as Heat PTL for Ocata cycle. > > James has done a great job as PTL for the last few cycles. I would like > to take the opportunity to thank him for all his effort and we all would > agree that he needs a well deserved break. > > I am looking forward to take the opportunity to help the community to > achieve some of the already planned goals and in-progress workstreams > like standalone roles, multi-rhel and other challenges that come along > the way. > > Also, as before, our focus would continue to be on review prioritization, > in progress work streams and collaboration on common priorities. +1000 thanks Rabi! 
> > Regards, > Rabi Mishra > From ralonsoh at redhat.com Tue Aug 30 14:37:33 2022 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Tue, 30 Aug 2022 16:37:33 +0200 Subject: [election][Neutron] PTL candidacy for Antelope cycle Message-ID: Hello all: I would like to announce my candidacy for the role of PTL of Neutron for the Antelope cycle. Let me introduce myself first. I started contributing to OpenStack in Liberty release (2015). I've contributed to several projects, mainly Neutron, neutron-lib, os-ken and os-vif, in addition to Nova and devstack. During the last two years I've been focused on the ML2/OVN integration, QoS and Placement related features and helping on the Neutron CI improvement and stabilization. For this new release, those are the main goals I would like to focus on: * Take care and focus on the approved and merged Neutron specs [1]. Those RFEs should be actively attended by the community, from the spec proposal to the code review. In order to increase the attention of the community on these new RFEs, new ways of tracking them should be proposed (a topic that should be discussed during the PTG). For example, having a core reviewer ?godfather? for each RFE. * Continue with the improvement of CI stability. The job done during the last 2 or 3 releases has been impressive, probably the hardest and the most continued effort on the CI in the Neutron community ever. We *must* continue with this effort and the current processes to track the healthiness of the CI. * Start working on the smart NIC / hardware offload testing. The number of backends (ML2/OVS, ML2/OVN, ML2/SR-IOV) make non-viable to test any possible combination. But at least, depending on the available hardware, we would be able to test the stability of those backends with the newest hardware offload NICs. Note: that will imply an external CI support. * Work with users and operators, providing an active channel with them. The goal is to attract customers to be actively involved in the community, participating in the Neutron meetings (team meeting, CI meeting, drivers meeting) or even creating a specific meeting with them in order to capture new needs or issues. * Live migration improvement, specially in ML2/OVN. This feature is still being tested and has not proven to be very stable. There are several core OVN and Neutron efforts right now but we still need to make this feature stable enough to be delivered to customers. A part from those main goals, we should always keep an eye on: * The SQLAlchemy 2.0 migration and any possible issue detected. * The Neutron's stadium ecosystem alive and healthy: this is a permanent goal in any release. The Neutron ecosystem is wide and diverse and it is maintained with few resources. We should focus on those active projects and communities that support their respective repositories. * Still closing the ML2/OVS - ML2/OVN feature gap, that is smaller every cycle. Thank you in advance. Rodolfo Alonso Hernandez (ralonsoh, ralonsoh at redhat.com) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bdobreli at redhat.com Tue Aug 30 14:46:11 2022 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Tue, 30 Aug 2022 16:46:11 +0200 Subject: [election][tripleo] PTL Candidacy for Antelope cycle In-Reply-To: References: Message-ID: <00cae13a-fc4a-c419-f770-c3bef50fa61f@redhat.com> On 8/30/22 16:35, John Fulton wrote: > On Tue, Aug 30, 2022 at 10:32 AM Rabi Mishra wrote: >> >> Hi All, >> >> I would like to nominate myself for Antelope cycle TripleO PTL. >> >> As some of you would know, I have been part of the OpenStack community >> for a long time and have been a core contributor for TripleO and Heat. >> I have also served as Heat PTL for Ocata cycle. >> >> James has done a great job as PTL for the last few cycles. I would like >> to take the opportunity to thank him for all his effort and we all would >> agree that he needs a well deserved break. >> >> I am looking forward to take the opportunity to help the community to >> achieve some of the already planned goals and in-progress workstreams >> like standalone roles, multi-rhel and other challenges that come along >> the way. >> >> Also, as before, our focus would continue to be on review prioritization, >> in progress work streams and collaboration on common priorities. > > +1000 thanks Rabi! +1, thank you! > >> >> Regards, >> Rabi Mishra >> > > -- Best regards, Bogdan Dobrelya, Irc #bogdando From chkumar at redhat.com Tue Aug 30 14:54:35 2022 From: chkumar at redhat.com (Chandan Kumar) Date: Tue, 30 Aug 2022 20:24:35 +0530 Subject: [election][tripleo] PTL Candidacy for Antelope cycle In-Reply-To: References: Message-ID: On Tue, Aug 30, 2022 at 8:07 PM Rabi Mishra wrote: > > Hi All, > > I would like to nominate myself for Antelope cycle TripleO PTL. > > As some of you would know, I have been part of the OpenStack community > for a long time and have been a core contributor for TripleO and Heat. > I have also served as Heat PTL for Ocata cycle. > > James has done a great job as PTL for the last few cycles. I would like > to take the opportunity to thank him for all his effort and we all would > agree that he needs a well deserved break. > > I am looking forward to take the opportunity to help the community to > achieve some of the already planned goals and in-progress workstreams > like standalone roles, multi-rhel and other challenges that come along > the way. > > Also, as before, our focus would continue to be on review prioritization, > in progress work streams and collaboration on common priorities. Thank you James for amazing work as a PTL for previous cycles. Thank you Rabi for stepping up for Tripleo PTL. With Regards, Chandan Kumar From tkajinam at redhat.com Mon Aug 29 16:22:46 2022 From: tkajinam at redhat.com (Takashi Kajinami) Date: Tue, 30 Aug 2022 01:22:46 +0900 Subject: [TripleO Wallaby] - Multi-attach Volume showing different files In-Reply-To: References: Message-ID: I'm afraid what you are trying is completely wrong. In case you need a shared file system then you should use a different technology like Manila. Multiattach in cinder allows multiple VMS to access the same block device data but it does NEVER provide any mechanism to guarantee consistency at file system level. Popular filesystems such as xfs never protect concurrent IO from multiple machines and If you mount the filesystem on the shared disk by multiple vms and write on it concurrently then you'd end up with a corrupted filesystem. 
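If what you actually need is a filesystem visible from both VMs, a Manila share is the usual answer. As a rough sketch only (it assumes Manila is already deployed, uses NFS, and the size and names are just examples):

    manila create NFS 10 --name shared-data
    manila access-allow shared-data ip 192.168.0.0/24 --access-level rw
    manila share-export-location-list shared-data
    # then mount the exported path from inside both VMs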
Usually when you have multiattach-ed block devices then you need to implement a mechanism to prevent concurrent access (eg. Pacemaker) On Tue, Aug 30, 2022 at 1:06 AM Lokendra Rathour wrote: > Hi Team, > It worked if I redo the VM creation with creation one more volume of > multi-attach type. > Thanks once again for the same. > > we can mark this thread as closed. > > > On Mon, Aug 29, 2022 at 11:18 AM Lokendra Rathour < > lokendrarathour at gmail.com> wrote: > >> Hi Team, >> On TripleO Wallaby deployment, I have tried creating multi-attach type >> volume, which if I am attaching this volume to two VM, it is getting >> attached. >> After mounting the same in each VM, if I am creating the folders in one >> VM at the mount path, >> I am not seeing the same on the other VM at the mount path >> [image: image.png] >> >> Ideally if the backend volume is same and if I am understanding the idea >> correctly, then the content should be same in both the location, as the >> back is same. >> Checking at horizon I also see that Volume is showing attached to both >> the VMs. >> >> >> [image: image.png] >> >> Document followed to create this: >> https://docs.openstack.org/cinder/latest/admin/volume-multiattach.html >> >> Please advice >> >> -- >> ~ Lokendra >> skype: lokendrarathour >> >> >> > > -- > ~ Lokendra > skype: lokendrarathour > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 80246 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 64122 bytes Desc: not available URL: From fsb4000 at yandex.ru Mon Aug 29 16:35:32 2022 From: fsb4000 at yandex.ru (Igor Zhukov) Date: Mon, 29 Aug 2022 23:35:32 +0700 Subject: [Neutron] How to add Fake ML2 extension to Neutron? In-Reply-To: References: <5523221661172836@myt6-bbc622793f1b.qloud-c.yandex.net> <2183551661263463@myt5-b646bde4b8f3.qloud-c.yandex.net> <4531261661388533@vla5-81f3f2eec11f.qloud-c.yandex.net> Message-ID: <12202221661790932@myt6-c5d5e03858f2.qloud-c.yandex.net> Hi. I think I finished it. I did similar to https://github.com/salv-orlando/hdn/commit/8a82edcd1abb5d9f381b166b539efdcd82cceb03 but I also created EXPAND_HEAD and CONTRACT_HEAD But I got errors with neutron-db-manage revision command and I just added myself `op.create_table...` to `liberty_exp_placeholder.py` to `def upgrade():` and now my extension driver works and new neutron maria db table also is created when I run `neutron-db-manage upgrade heads` I need more experience with python itself and neutron in particular, but I'm glad that I was able to add new attributes to networks in neutron. Thank you for your time! > Hi,1.) The migration files are responsible to create the schema during deployment, and there is a helper utility for it neutron-db-manage (see [1], actually similar tools exists for all openstack projects at least I know many) > With this tool you can generate an empty migration template (see the neutron-db-manage revision command for help) > The Neutron migration scripts can be found here: https://opendev.org/openstack/neutron/src/branch/master/neutron/db/migration/alembic_migrations/versions > Of course there fill be a similar tree for all Networking projects. 
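For anyone landing on the same question, a rough sketch of what the body of such an expand revision can look like (the table and column names here are hypothetical; the revision file would normally be generated first with something like `neutron-db-manage revision -m "add network extensions" --expand` and then applied with `neutron-db-manage upgrade heads`):

    from alembic import op
    import sqlalchemy as sa

    def upgrade():
        # hypothetical table holding the extra attribute per network
        op.create_table(
            'vpc_network_extensions',
            sa.Column('network_id', sa.String(36),
                      sa.ForeignKey('networks.id', ondelete='CASCADE'),
                      primary_key=True),
            sa.Column('new_attribute', sa.String(255), nullable=True),
        )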
> (NOTE: Since perhaps Ocata we have only expand scripts to make upgrade easier for example) > Devstack and other deployment tools upgrade the db, but with neutron-db-manage you can do it manually with neutron-db-manage upgrade heads, you can check in the db after it if the schema is as you expected. > > 2.) For Neutron the schema again written under models, here: https://opendev.org/openstack/neutron/src/branch/master/neutron/db/models, it looks like a duplication as it is nearly the same as the migration script, but this code will be used by Neutron itself not only during the deployment or upgrade. > For some stadium projects it is possible as I remember the *_db file contains the schema description and the code that actually uses the db to fetch or store values in it, like bgpvpn (See [2]). > In Neutron (and in many other Openstack projects, if not all) we have another layer over the the which is the OVO (Oslo Versioned Objects), and that is used in most places and that hides the actual accessing of the db with high level python classes (see: https://opendev.org/openstack/neutron/src/branch/master/neutron/objects ) > > [1]: https://docs.openstack.org/neutron/latest/contributor/alembic_migrations.html > [2]: https://opendev.org/openstack/networking-bgpvpn/src/branch/master/networking_bgpvpn/neutron/db/bgpvpn_db.py > > Igor Zhukov ezt ?rta (id?pont: 2022. aug. 25., Cs, 2:48): > >> Hi Lajos. >> >> Thank you. >> >> I have a progress. I think my fake extension works. >> >> I added >> >> ``` >> >> extensions.register_custom_supported_check( >> >> "vpc_extension", lambda: True, plugin_agnostic=False >> >> ) >> >> ``` >> >> to >> >> ``` >> >> class Vpc(api_extensions.ExtensionDescriptor): >> >> extensions.register_custom_supported_check( >> >> "vpc_extension", lambda: True, plugin_agnostic=False >> >> ) >> >> ... >> >> ``` >> >> and I use ml2 extension driver without any new plugin. https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L44 >> >> I tested it with python neutronclientapi. So I can change my new attribute (neutron.update_network(id, {'network': {'new_attribute': some string }})) >> >> and I see my changes (neutron.list_networks(name='demo-net')) >> >> I'm close to the end. >> >> Now I'm using modifed `TestExtensionDriver(TestExtensionDriverBase):`. It works but It stores the data locally. >> >> And I want to use class TestDBExtensionDriver(TestExtensionDriverBase): (https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L169) >> >> I tried to use it but I got such errors in neutron-server.log: "Table 'neutron.myextension.networkextensions' doesn't exist" >> >> How can I create a new table? >> >> I saw https://docs.openstack.org/neutron/latest/contributor/alembic_migrations.html and https://github.com/openstack/neutron-vpnaas/tree/master/neutron_vpnaas/db but I still don't understand. >> >> I mean I think some of the neutron_vpnaas/db files are generated. Are neutron_vpnaas/db/migration/alembic_migrations/versions generated? >> >> Which files I should create(their names, I think I can copy from neutron_vpnaas/db/) and what commands to type to create one new table: https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L136-L144 ? 
>> >>> Hi Igor,The line which is interesting for you: "Extension vpc_extension not supported by any of loaded plugins" >> >>> In core Neutron for ml2 there is a list of supported extension aliases: >> >>> https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200-L239 >> >>> >> >>> And there is a similar for l3 also: >> >>> https://opendev.org/openstack/neutron/src/branch/master/neutron/services/l3_router/l3_router_plugin.py#L98-L110 >> >>> >> >>> Or similarly for QoS: >> >>> https://opendev.org/openstack/neutron/src/branch/master/neutron/services/qos/qos_plugin.py#L76-L90 >> >>> >> >>> So you need a plugin that uses the extension. >> >>> >> >>> Good luck :-) >> >>> Lajos Katona (lajoskatona) >> >>> >> >>> Igor Zhukov ezt ?rta (id?pont: 2022. aug. 23., K, 16:04): >> >>> >> >>>> Hi again! >> >>>> >> >>>> Do you know how to debug ML2 extension drivers? >> >>>> >> >>>> I created folder with two python files: vpc/extensions/vpc.py and vpc/plugins/ml2/drivers/vpc.py (also empty __init__.py files) >> >>>> >> >>>> I added to neuron.conf >> >>>> >> >>>> api_extensions_path = /path/to/vpc/extensions >> >>>> >> >>>> and I added to ml2_ini.conf >> >>>> >> >>>> extension_drivers = port_security, vpc.plugins.ml2.drivers.vpc:VpcExtensionDriver >> >>>> >> >>>> and my neutron.server.log has: >> >>>> >> >>>> INFO neutron.plugins.ml2.managers [-] Configured extension driver names: ['port_security', 'vpc_neutron.plugins.ml2.drivers.vpc:VpcExtensionDriver'] >> >>>> >> >>>> WARNING stevedore.named [-] Could not load vpc_neutron.plugins.ml2.drivers.vpc:VpcExtensionDriver >> >>>> >> >>>> .... >> >>>> >> >>>> INFO neutron.api.extensions [req-fd226631-b0cd-4ff8-956b-9470e7f26ebe - - - - -] Extension vpc_extension not supported by any of loaded plugins >> >>>> >> >>>> How can I find why the extension driver could not be loaded? >> >>>> >> >>>>> Hi,The fake_extension is used only in unit tests to test the extension framework, i.e. : >> >>>> >> >>>>> https://opendev.org/openstack/neutron/src/branch/master/neutron/tests/unit/plugins/ml2/drivers/ext_test.py#L37 >> >>>> >> >>>>> >> >>>> >> >>>>> If you would like to write an API extension check neutron-lib/api/definitions/ (and you can find the extensions "counterpart" under neutron/extensions in neutron repository) >> >>>> >> >>>>> >> >>>> >> >>>>> You can also check other Networking projects like networking-bgvpn, neutron-dynamic-routing to have examples of API extensions. >> >>>> >> >>>>> If you have an extension under neutron/extensions and there's somebody who uses it (see [1]) you will see it is loaded in neutron servers logs (something like this: "Loaded extension: address-group") and you can find it in the output of openstack extension list --network >> >>>> >> >>>>> >> >>>> >> >>>>> [1]: https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L200 >> >>>> >> >>>>> >> >>>> >> >>>>> Best wishes >> >>>> >> >>>>> Lajos Katona >> >>>> >> >>>>> >> >>>> >> >>>>> Igor Zhukov ezt ?rta (id?pont: 2022. aug. 22., H, 19:41): >> >>>> >> >>>>> >> >>>> >> >>>>>> Hi all! >> >>>> >> >>>>>> >> >>>> >> >>>>>> Sorry for a complete noob question but I can't figure it out ? >> >>>> >> >>>>>> >> >>>> >> >>>>>> So if I want to add Fake ML2 extension what should I do? 
>> >>>> >> >>>>>> >> >>>> >> >>>>>> I have neutron server installed and I have the file: https://github.com/openstack/neutron/blob/master/neutron/tests/unit/plugins/ml2/extensions/fake_extension.py >> >>>> >> >>>>>> >> >>>> >> >>>>>> How to configure neutron server, where should I put the file, should I create another files? How can I test that it works? From lokendrarathour at gmail.com Mon Aug 29 16:46:25 2022 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Mon, 29 Aug 2022 22:16:25 +0530 Subject: [TripleO Wallaby] - Multi-attach Volume showing different files In-Reply-To: References: Message-ID: Thank you for the clarification. Will check the approach suggested by you. -Lokendra On Mon, 29 Aug 2022, 21:53 Takashi Kajinami, wrote: > I'm afraid what you are trying is completely wrong. > In case you need a shared file system then you should use a different > technology like Manila. > > Multiattach in cinder allows multiple VMS to access the same block device > data but it does > NEVER provide any mechanism to guarantee consistency at file system level. > Popular filesystems > such as xfs never protect concurrent IO from multiple machines and If you > mount the filesystem > on the shared disk by multiple vms and write on it concurrently then you'd > end up with a corrupted filesystem. > > Usually when you have multiattach-ed block devices then you need to > implement a mechanism > to prevent concurrent access (eg. Pacemaker) > > > > On Tue, Aug 30, 2022 at 1:06 AM Lokendra Rathour < > lokendrarathour at gmail.com> wrote: > >> Hi Team, >> It worked if I redo the VM creation with creation one more volume of >> multi-attach type. >> Thanks once again for the same. >> >> we can mark this thread as closed. >> >> >> On Mon, Aug 29, 2022 at 11:18 AM Lokendra Rathour < >> lokendrarathour at gmail.com> wrote: >> >>> Hi Team, >>> On TripleO Wallaby deployment, I have tried creating multi-attach type >>> volume, which if I am attaching this volume to two VM, it is getting >>> attached. >>> After mounting the same in each VM, if I am creating the folders in one >>> VM at the mount path, >>> I am not seeing the same on the other VM at the mount path >>> [image: image.png] >>> >>> Ideally if the backend volume is same and if I am understanding the idea >>> correctly, then the content should be same in both the location, as the >>> back is same. >>> Checking at horizon I also see that Volume is showing attached to both >>> the VMs. >>> >>> >>> [image: image.png] >>> >>> Document followed to create this: >>> https://docs.openstack.org/cinder/latest/admin/volume-multiattach.html >>> >>> Please advice >>> >>> -- >>> ~ Lokendra >>> skype: lokendrarathour >>> >>> >>> >> >> -- >> ~ Lokendra >> skype: lokendrarathour >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 80246 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 64122 bytes Desc: not available URL: From m73hdi at gmail.com Mon Aug 29 20:04:55 2022 From: m73hdi at gmail.com (mahdi n) Date: Tue, 30 Aug 2022 00:34:55 +0430 Subject: Skyline apiserver can't connect to keystone and given error endpoint not found Message-ID: Skyline apiserver can't connect to keystone and given endpoint not found I install skyline api server and console But Apiserver get error : endpoint not found I attached pictures of this problem Please help me -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IMG_20220830_001553.jpg Type: image/jpeg Size: 2879092 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IMG_20220830_001654.jpg Type: image/jpeg Size: 4784547 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IMG_20220830_001318.jpg Type: image/jpeg Size: 6066283 bytes Desc: not available URL: From m73hdi at gmail.com Mon Aug 29 20:10:32 2022 From: m73hdi at gmail.com (mahdi n) Date: Tue, 30 Aug 2022 00:40:32 +0430 Subject: Fwd: Skyline apiserver can't connect to keystone and given error endpoint not found In-Reply-To: References: Message-ID: Skyline apiserver can't connect to keystone and given endpoint not found I install skyline api server and console But Apiserver get error : endpoint not found I attached pictures of this problem Please help me -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IMG_20220830_001553.jpg Type: image/jpeg Size: 2879092 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IMG_20220830_001654.jpg Type: image/jpeg Size: 4784547 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IMG_20220830_001318.jpg Type: image/jpeg Size: 6066283 bytes Desc: not available URL: From challengingway at hotmail.com Tue Aug 30 09:57:10 2022 From: challengingway at hotmail.com (=?utf-8?B?6ZmIIOiWhw==?=) Date: Tue, 30 Aug 2022 09:57:10 +0000 Subject: help: set ironic inspect autodiscovery failed Message-ID: Hi All, I'm trying the ironic inspect autodiscovery but failed. What I did: ?????Set the inspector.conf 1. [discovery] enroll_node_driver = ipmi [processing] permit_active_introspection = true processing_hooks = $default_processing_hooks,extra_hardware,lldp_basic Configure the python agent: inspection_callback_url: There's no node automatically added. I enter the host and read the log of python agent, no inspect related log. Could anybody give any advice about how to triage or setup correctly? Thanks & Regards, Wei -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at binero.com Tue Aug 30 10:15:28 2022 From: tobias.urdin at binero.com (Tobias Urdin) Date: Tue, 30 Aug 2022 10:15:28 +0000 Subject: [all] [oslo.messaging] Interest in collaboration on a NATS driver In-Reply-To: References: Message-ID: <2C5866C3-DA33-4AD6-AB17-FFB6D7C842B7@binero.com> Hello, Quick reply, I agree with you on all points. Hopefully this design and collaboration discussion and go on and reach a somewhat consensus on a path forward. 
I should also be clear to points out that I?m with you, design matters, and there is a lot of things with this design that has not been discussed or even scratched yet, but it?s compelling and I?m here trying to get the ball rolling :) Best regards Tobias On 30 Aug 2022, at 11:43, Rados?aw Piliszek > wrote: Hi Tobias, Thank you for the detailed response. My query was to gather more insight on what your views/goals are and the responses do not disappoint. More queries inline below. On Tue, 30 Aug 2022 at 11:15, Tobias Urdin > wrote: I would like OpenStack design to more embrace the distributed, cloud-native approach that Ceph and Kubernetes brings, and the resiliency of Ceph (and yes, I?m a major Ceph enthusiast) and there I?m seeing messaging and database as potential blockers to continue on that path. We both definitely agree that resiliency needs to be improved. On Mon, 29 Aug 2022 at 15:47, Tobias Urdin > wrote: ? Do retries and acknowledgements in the library (since NATS does NOT persist messages like RabbitMQ could) What do you mean? Is NATS only a router? (I have not used this technology yet.) It does not persist messages, if there is no backend to respond, the message will be dropped without any action hence why I want the RPC layer in oslo.messaging (that already does acknowledge calls in the driver) to notify client side that it?s being processed before client side waits for reply. Ack, that makes sense. To let the client know whether there is any consumer that accepted that message. That said, bear in mind the consumer might accept and then die. If NATS does not keep track of this message further, then the resilience is handicapped. ? Find or maintain a NATS python library that doesn't use async like the official one does Why is async a bad thing? For messaging it's the right thing. This is actually just myself, I would love to just being able to use the official that is async based instead it?s just me that doesn?t understand how that would be implemented. https://github.com/nats-io/nats.py instead of the one in POC https://github.com/Gr1N/nats-python which has a lot of shortcomings and issues, my idea was just to investigate if was even possible to implement in a feasible way. Ack, I see. Finally, have you considered just trying out ZeroMQ? Does not exist anymore. I think I might have been misunderstood as ZeroMQ still exists. ;-) You probably mean the oslo.messaging backend that it's gone. I meant that *maybe* it would be good to discuss a reimplementation of that which considers the current OpenStack needs. I would also emphasise that I imagine RPC and notification messaging layers to have different needs and likely requiring different approaches. I mean, NATS is probably an overkill for OpenStack services since the majority of them stay static on the hosts they control (think nova-compute, neutron agents - and these are also the pain points that operators want to ease). I don?t think it it, or even if it is, why not use a better solution or stable approach than RabbitMQ? This is also the whole point, I don?t want OpenStack to become or be static, I want it to be more dynamic and cloud-native in it?s approach and support viable integrations that takes it there, we cannot live in the past forever, let?s envision and dream of the future as we want it! :) Ack, you want it more dynamic and that's ok now that I understand your view. 
That said, my whole point regarding this boils down to the usual design principles that remind us that there are, more often than not, some tradeoffs that have been made to build some tech - NATS is likely no different: if it promises features A, B, C, D, and we need only A and B, then *maybe* it has some constraints on the A and B we want, or we might miss that it lacks feature E, or that C/D add useless overhead. The point is to have that in mind before going too deep, and to try to spot and tackle such issues early on.

Finally, have you considered just trying out ZeroMQ?

ZeroMQ used to be supported in the past but then it was removed; if I understand correctly it only supported notifications or RPC but not both, I don't recall which, but perhaps I am misremembering on that point.

I believe it would be better suited for RPC than notifications, at least in the simplest form.

As it's advertised as scalable and performant, I would argue: why not use it for notifications as well? If anything, according to your observations above it's more suited for that than RPC, even though request-reply (which we can use for RPC) is a strong first-class implementation in NATS as well.

Well, that was about ZMQ. I mostly meant that synchronous RPC (which happens in OpenStack a lot) adapts very well to what can be achieved with ZeroMQ without a lot of fuss.

I mean, NATS is probably an overkill for OpenStack services since the majority of them stay static on the hosts they control (think nova-compute, neutron agents - and these are also the pain points that operators want to ease).

It's not any more overkill than RabbitMQ is.

True that. Probably.

I agree with that; also, if you think about it, how many issues related to stability, performance and outages are related to RabbitMQ? It's quite a few if you ask me. Just the resource utilization and clustering in RabbitMQ makes me feel bad.

Here we definitely agree. As we used to discuss this before in this community, we are not sure if this is RabbitMQ's fault of course or if we just don't know how to utilise it properly. ;-) Anyhow, RMQ being in Erlang does not help as it's more like a black box to most of us here I believe (please raise your hands if you can debug an EVM failure).

It's here that I mean that the cloud-native and scalable implementation would shine: you should be able to rely on it, and if something dies sometimes, so what, things should just continue to work. That's not my experience with RabbitMQ, but it is my experience with Ceph, because in the end the design really matters.

"Design really matters" is something that I remind myself and others almost every day. Hence why this discussion is taking place now. :D

I also don't know what you mean when you say "majority of them stay static on the hosts they control". NATS is intended as a cloud-native, horizontally scalable message bus, which is exactly what OpenStack needs IMO.

NATS seems to be tweaked for "come and go" situations, which is an exception in the OpenStack world, not the rule (at least in my view). I mean, one normally expects to have a preset number of hypervisors and not them coming and going (which, I agree, is a nice vision; a proper NATS driver, with more awareness in the client projects I believe, could be an enabler for more dynamic clouds).

It could, but it also doesn't have to be that. Why not strive for more dynamic? I don't think anybody would argue that more dynamic is a bad thing, even if you were to have a more static approach to your cloud.

This has been discussed already above - tradeoffs.
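To make the ZeroMQ point more concrete, a minimal synchronous REQ/REP sketch with pyzmq is shown below; it only illustrates how naturally blocking RPC maps onto that pattern and is not a driver proposal, and the endpoint address is a placeholder:

    # Minimal pyzmq REQ/REP illustration; the endpoint is a placeholder and
    # error handling/retries are omitted on purpose.
    import zmq

    def rpc_server(endpoint: str = "tcp://127.0.0.1:5555") -> None:
        sock = zmq.Context.instance().socket(zmq.REP)
        sock.bind(endpoint)
        while True:
            request = sock.recv()          # block until a request arrives
            sock.send(b"pong: " + request)

    def rpc_client(endpoint: str = "tcp://127.0.0.1:5555") -> bytes:
        sock = zmq.Context.instance().socket(zmq.REQ)
        sock.connect(endpoint)
        sock.send(b"ping")                 # one request...
        return sock.recv()                 # ...one blocking reply

The strict send/recv alternation of REQ/REP is trivial for simple synchronous RPC, while fan-out notifications would need a different socket pattern (PUB/SUB), which illustrates the point above that RPC and notifications may well need different approaches.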
One cannot just make up a hypervisor and need to spin up a nova-compute for it. It's a different story for non-resource-bound services that NATS is advertised for. You need more processing power? Sure, you spin another worker and connect it with NATS. That scalability might be coming at a price that we don't need to pay because OpenStack services are never going to scale with this level of dynamism. Finally, don't get me wrong. I love the fact that you are doing what you are doing. I just want to make sure that it goes in the right direction. Cheers, Radek -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Tue Aug 30 15:18:21 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Tue, 30 Aug 2022 17:18:21 +0200 Subject: [PTL][release] Zed Cycle Highlights Message-ID: Hi, This is a reminder, that *this week* is the week for Cycle highlights [1][2]! They need to be added to deliverable yamls so that they can be included in release marketing preparations. (See the details about how to add them at the project team guide [3].) [1]https://releases.openstack.org/zed/schedule.html [2]https://releases.openstack.org/zed/schedule.html#z-cycle-highlights [3] https://docs.openstack.org/project-team-guide/release-management.html#cycle-highlights Thanks, El?d Ill?s irc: elodilles -------------- next part -------------- An HTML attachment was scrubbed... URL: From ykarel at redhat.com Tue Aug 30 15:59:39 2022 From: ykarel at redhat.com (Yatin Karel) Date: Tue, 30 Aug 2022 21:29:39 +0530 Subject: [election][tripleo] PTL Candidacy for Antelope cycle In-Reply-To: References: Message-ID: On Tue, Aug 30, 2022 at 8:07 PM Rabi Mishra wrote: > Hi All, > > I would like to nominate myself for Antelope cycle TripleO PTL. > > As some of you would know, I have been part of the OpenStack community > for a long time and have been a core contributor for TripleO and Heat. > I have also served as Heat PTL for Ocata cycle. > > James has done a great job as PTL for the last few cycles. I would like > to take the opportunity to thank him for all his effort and we all would > agree that he needs a well deserved break. > > I am looking forward to take the opportunity to help the community to > achieve some of the already planned goals and in-progress workstreams > like standalone roles, multi-rhel and other challenges that come along > the way. > > Also, as before, our focus would continue to be on review prioritization, > in progress work streams and collaboration on common priorities. > > +2 Thanks Rabi for stepping up !! > Regards, > Rabi Mishra > > Regards Yatin Karel -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Tue Aug 30 16:28:21 2022 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 30 Aug 2022 11:28:21 -0500 Subject: PTG October 2022 Teams List Message-ID: Hello Everyone! The October 2022 Project Teams List is official! 
Projects + Teams:
- Computing Force Network (CFN) Working Group
- Diversity & Inclusion Working Group
- Environmental Sustainability Working Group
- Kata Containers
- Magma
- OpenInfra Edge Computing Group
- OpenStack Teams
  - Ansible OpenStack Modules
  - Barbican
  - Blazar
  - Cinder
  - CloudKitty
  - Designate
  - First Contact SIG
  - Glance
  - Heat
  - Horizon
  - Interop WG
  - Ironic
  - Keystone
  - Kolla
  - Kuryr
  - Manila
  - Neutron
  - Nova/Placement
  - Octavia
  - OpenStack Charms
  - OpenStack Helm
  - OpenStack Operators
  - OpenStack SDK/CLI
  - OpenStack Security SIG
  - OpenStack Technical Committee
  - Openstack-Ansible
  - OpenStack-Helm
  - QA
  - Release Management
  - Swift
  - Tacker
  - Telemetry
  - TripleO
- StarlingX

If your team was planning to meet and isn't in this list, please contact ptg at openinfra.dev IMMEDIATELY. Soon I will be contacting moderators to sign up for time via the PTGBot[1] once I have it configured for the teams that are signed up to attend the event. Otherwise, please don't forget to register[2] - it's FREE! As usual, feel free to let us know if you have any questions.
Thanks! -Kendall (diablo_rojo)
[1] PTGBot Docs: https://github.com/openstack/ptgbot#open-infrastructure-ptg-bot
[2] PTG Registration: https://openinfra-ptg.eventbrite.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From elod.illes at est.tech Tue Aug 30 17:53:44 2022
From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=)
Date: Tue, 30 Aug 2022 19:53:44 +0200
Subject: [election][Release_Management] Proposal to change to Distributed PTL model
Message-ID: <9a83562b-4843-2282-4b57-7b7e1ff5bc85@est.tech>
Hi, This is my 2nd cycle (Yoga + Zed) as PTL of the Release Management team and this is usually the time when the PTL role is handed over.
We had a discussion in our team and decided that now that OpenStack moves from Zed to Antelope and the cycles starts again from the beginning of the alphabet + the new naming is going to be introduced (2023.1 Antelope) this is a good time to change the leadership model of the team to a group leadership model (or DPTL; distributed project team leadership). > > @TC: so this mail is to inform the TC that we want to move to this direction & let us know if there is something we need to sort out for the model change. > > Thanks, > > El?d Ill?s > irc: elodilles > From rdhasman at redhat.com Tue Aug 30 19:22:46 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Wed, 31 Aug 2022 00:52:46 +0530 Subject: [election][cinder] PTL Candidacy for Antelope Message-ID: Hello everyone, I would like to announce my candidacy for PTL of the Cinder project for the Antelope cycle. I served as the PTL for Cinder during the Zed development cycle and targeted some of the areas which were lacking attention and worked on improving them like: 1. Review Quality Proposing the Efficient Review Guidelines[1] to help new contributors improve the quality of reviews. 2. Security Vulnerabilities Triaged the open security issues during the first mid cycle[2]. There are still areas which could be improved upon like 3rd Party Compliance checks and Cinder driver documentation which are good candidates of work items for the next development cycle and I look forward to actively work on them. Apart from that, there were deadlines which could have been better handled which I would like to incorporate in my planning skills so we get things merged in time and we won't have to extend deadlines frequently. Overall, I feel it was a good learning experience being PTL for the first time and I'm thankful for all the support I've had from the team, not to mention the advice and help from Brian while making critical decisions, and Hopefully I will take the learnings to be a better PTL this time. [1] https://docs.openstack.org/cinder/latest/contributor/gerrit.html#efficient-review-guidelines [2] https://etherpad.opendev.org/p/cinder-zed-midcycles#L136 Rajat Dhasmana (whoami-rajat) -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Tue Aug 30 19:29:51 2022 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Tue, 30 Aug 2022 11:29:51 -0800 Subject: [election][puppet] PTL Candidacy for Antelope cycle In-Reply-To: <06ECDC28-F97C-41A4-ADCC-5AF391D494DE@binero.com> References: <06ECDC28-F97C-41A4-ADCC-5AF391D494DE@binero.com> Message-ID: On Tue, Aug 30, 2022 at 2:09 AM Tobias Urdin wrote: > Hello Takashi, > > Thank for all your work and effort into this! I would love to see you run > for > another cycle. > +1000 Thanks Takashi! > > Best regards > Tobias > > > On 30 Aug 2022, at 08:38, Takashi Kajinami wrote: > > > > Hello, > > > > > > I'd like to announce my candidacy for the PTL role in Puppet OpenStack, > to > > continue my PTL role for the Antelope cycle. > > > > Over the past two cycles, we've successfully improved feature coverage, > > platform coverage and simplicity of our modules. I'd like to list up a > few > > items which would be our priorities during the next cycle. > > > > * Add Ubuntu 22.04 support > > This would be the next major change after we've completed implementation > of > > CentOS 9 Stream support. We already adapted to Ruby 3 as part of C9S > support > > so I'm not aware of any huge challenges at this moment. 
> > > > * Complete migration to Puppet 7 > > Currently our modules still support both Puppet 6 and 7. However once we > > complete migration to Ubuntu 22.04, we complete migration from Ruby 2.x > to 3.x. > > As Puppet officially supports Ruby 3.x since 7.7, this means we no longer > > maintain test coverage with Puppet 6. It's time to consider again > complete > > migration to Puppet 7. > > > > * Improve scenario/component coverage by CI > > During Zed cycle we added OVN and Octavia to the integration jobs. We'll > review > > a few remaining modules like Manila and will continue extending the > component > > coverage. > > > > * Review unmaintained/unused modules > > In Puppet OpenStack projects we maintain number of modules to support > multiple > > OpenStack components. However, some modules have not been really active > and > > attracted no interest. In the past few cycles we have retired several > modules > > but I'd like to continue reviewing our modules to consider retiring > inactive ones. > > > > * Keep each module up to date and simple > > It's always important that we add support for the new features/parameters > > timely so that users can leverage the new capability via our modules. > > > > > > Thank you for your consideration. > > > > Thank you, > > Takashi > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Tue Aug 30 20:00:38 2022 From: pierre at stackhpc.com (Pierre Riteau) Date: Tue, 30 Aug 2022 22:00:38 +0200 Subject: [blazar][election] PTL candidacy for Antelope cycle Message-ID: Hi, I would like to self-nominate for the role of PTL of Blazar for the Antelope release cycle. I have been PTL since the Stein cycle and I am willing to continue in this role. Though we had a quiet Zed cycle, we still have various improvements in the pipeline and I want to help the community deliver them to make Blazar more useful. Thank you for your support, Pierre Riteau (priteau) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From satish.txt at gmail.com Tue Aug 30 20:14:37 2022 From: satish.txt at gmail.com (Satish Patel) Date: Tue, 30 Aug 2022 16:14:37 -0400 Subject: [ovn-bgp-agent][neutron] - expose_tenant_networks bug In-Reply-To: References: <693D46D4-3DD7-4B93-BC90-571FEC2B6F4C@gmail.com> Message-ID: Hi Luis, I have redeploy my lab and i have following components rack-1-host-1 - controller rack-1-host-2 - compute1 rack-2-host-1 - compute2 # I am running ovn-bgp-agent on only two compute nodes compute1 and compute2 [DEFAULT] debug=False expose_tenant_networks=True driver=ovn_bgp_driver reconcile_interval=120 ovsdb_connection=unix:/var/run/openvswitch/db.sock ### without any VM at present i can see only router gateway IP on rack1-host-2 vagrant at rack-1-host-2:~$ ip a show ovn 37: ovn: mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000 link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff inet 172.16.1.144/32 scope global ovn valid_lft forever preferred_lft forever inet6 fe80::8f7:6eff:fee0:1969/64 scope link valid_lft forever preferred_lft forever vagrant at rack-2-host-1:~$ ip a show ovn 15: ovn: mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000 link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff inet6 fe80::5461:6bff:fe29:ac29/64 scope link valid_lft forever preferred_lft forever ### Lets create vm1 which is endup on rack1-host-2 but it didn't expose vm1 ip (tenant ip) same with rack-2-host-1 vagrant at rack-1-host-2:~$ ip a show ovn 37: ovn: mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000 link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff inet 172.16.1.144/32 scope global ovn valid_lft forever preferred_lft forever inet6 fe80::8f7:6eff:fee0:1969/64 scope link valid_lft forever preferred_lft forever vagrant at rack-2-host-1:~$ ip a show ovn 15: ovn: mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000 link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff inet6 fe80::5461:6bff:fe29:ac29/64 scope link valid_lft forever preferred_lft forever ### Lets attach a floating ip to vm1 and see. now i can see 10.0.0.17 vm1 ip got expose on rack-1-host-2 same time nothing on rack-2-host-1 ( ofc because no vm running on it) vagrant at rack-1-host-2:~$ ip a show ovn 37: ovn: mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000 link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff inet 172.16.1.144/32 scope global ovn valid_lft forever preferred_lft forever inet 10.0.0.17/32 scope global ovn valid_lft forever preferred_lft forever inet 172.16.1.148/32 scope global ovn valid_lft forever preferred_lft forever inet6 fe80::8f7:6eff:fee0:1969/64 scope link valid_lft forever preferred_lft forever vagrant at rack-2-host-1:~$ ip a show ovn 15: ovn: mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000 link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff inet6 fe80::5461:6bff:fe29:ac29/64 scope link valid_lft forever preferred_lft forever #### Lets spin up vm2 which should end up on other compute node which is rack-2-host-1 ( no change yet.. vm2 ip wasn't exposed anywhere yet. 
) vagrant at rack-1-host-2:~$ ip a show ovn 37: ovn: mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000 link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff inet 172.16.1.144/32 scope global ovn valid_lft forever preferred_lft forever inet 10.0.0.17/32 scope global ovn valid_lft forever preferred_lft forever inet 172.16.1.148/32 scope global ovn valid_lft forever preferred_lft forever inet6 fe80::8f7:6eff:fee0:1969/64 scope link valid_lft forever preferred_lft forever vagrant at rack-2-host-1:~$ ip a show ovn 15: ovn: mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000 link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff inet6 fe80::5461:6bff:fe29:ac29/64 scope link valid_lft forever preferred_lft forever #### Lets again attach floating ip to vm2 ( so far nothing changed, technically it should expose IP on rack-1-host-2 ) vagrant at rack-1-host-2:~$ ip a show ovn 37: ovn: mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000 link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff inet 172.16.1.144/32 scope global ovn valid_lft forever preferred_lft forever inet 10.0.0.17/32 scope global ovn valid_lft forever preferred_lft forever inet 172.16.1.148/32 scope global ovn valid_lft forever preferred_lft forever inet6 fe80::8f7:6eff:fee0:1969/64 scope link valid_lft forever preferred_lft forever vagrant at rack-2-host-1:~$ ip a show ovn 15: ovn: mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000 link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff inet 172.16.1.143/32 scope global ovn valid_lft forever preferred_lft forever inet6 fe80::5461:6bff:fe29:ac29/64 scope link valid_lft forever preferred_lft forever Here is the logs - https://paste.opendev.org/show/bRThivJE4wvEN92DXJUo/ On Thu, Aug 25, 2022 at 6:25 AM Luis Tomas Bolivar wrote: > > > On Thu, Aug 25, 2022 at 11:31 AM Satish Patel > wrote: > >> Hi Luis, >> >> Very interesting, you are saying it will only expose tenant ip on gateway >> port node? Even we have DVR setup in cluster correct? >> > > Almost. The path is the same as in a DVR setup without BGP (with the > difference you can reach the internal IP). In a DVR setup, when the VM is > in a tenant network, without a FIP, the traffic goes out through the cr-lrp > (ovn router gateway port), i.e., the node hosting that port which is > connecting the router where the subnet where the VM is to the provider > network. > > Note this is a limitation due to how ovn is used in openstack neutron, > where traffic needs to be injected into OVN overlay in the node holding the > cr-lrp. We are investigating possible ways to overcome this limitation and > expose the IP right away in the node hosting the VM. > > >> Does gateway node going to expose ip for all other compute nodes? >> > >> What if I have multiple gateway node? >> > > No, each router connected to the provider network will have its own ovn > router gateway port, and that can be allocated in any node which has > "enable-chassis-as-gw". What is true is that all VMs in a tenant networks > connected to the same router, will be exposed in the same location . > > >> Did you configure that flag on all node or just gateway node? >> > > I usually deploy with 3 controllers which are also my "networker" nodes, > so those are the ones having the enable-chassis-as-gw flag. > > >> >> Sent from my iPhone >> >> On Aug 25, 2022, at 4:14 AM, Luis Tomas Bolivar >> wrote: >> >> ? 
>> I tested it locally and it is exposing the IP properly in the node where >> the ovn router gateway port is allocated. Could you double check if that is >> the case in your setup too? >> >> On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar >> wrote: >> >>> >>> >>> On Tue, Aug 23, 2022 at 6:04 PM Satish Patel >>> wrote: >>> >>>> Folks, >>>> >>>> I am setting up ovn-bgp-agent lab in "BGP mode" and i found everything >>>> working great except expose tenant network >>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ >>>> >>>> Lab Summary: >>>> >>>> 1 controller node >>>> 3 compute node >>>> >>>> ovn-bgp-agent running on all compute node because i am using >>>> "enable_distributed_floating_ip=True" >>>> >>> >>>> ovn-bgp-agent config: >>>> >>>> [DEFAULT] >>>> debug=False >>>> expose_tenant_networks=True >>>> driver=ovn_bgp_driver >>>> reconcile_interval=120 >>>> ovsdb_connection=unix:/var/run/openvswitch/db.sock >>>> >>>> I am not seeing my vm on tenant ip getting exposed but when i attach >>>> FIP which gets exposed in loopback address. here is the full trace of debug >>>> logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/ >>>> >>> >>> It is not exposed in any node, right? Note when expose_tenant_network is >>> enabled, the traffic to the tenant VM is exposed in the node holding the >>> cr-lrp (ovn router gateway port) for the router connecting the tenant >>> network to the provider one. >>> >>> The FIP will be exposed in the node where the VM is. >>> >>> On the other hand, the error you see there should not happen, so I'll >>> investigate why that is and also double check if the expose_tenant_network >>> flag is broken somehow. >>> >> >>> Thanks! >>> >>> >>> -- >>> LUIS TOM?S BOL?VAR >>> Principal Software Engineer >>> Red Hat >>> Madrid, Spain >>> ltomasbo at redhat.com >>> >>> >> >> >> -- >> LUIS TOM?S BOL?VAR >> Principal Software Engineer >> Red Hat >> Madrid, Spain >> ltomasbo at redhat.com >> >> >> > > -- > LUIS TOM?S BOL?VAR > Principal Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Tue Aug 30 20:50:44 2022 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 30 Aug 2022 16:50:44 -0400 Subject: [cinder] propose to EOL cinderlib train and ussuri In-Reply-To: <0df831c8-bf55-4703-9dac-1fdf2dc5a6ae@gmail.com> References: <0df831c8-bf55-4703-9dac-1fdf2dc5a6ae@gmail.com> Message-ID: It's been two weeks and the silence has been deafening, so I'd like to ask the release team to merge the EOL patches at their earliest convenience, and then delete the train and ussuri cinderlib branches. cheers, brian On 8/17/22 3:38 PM, Brian Rosmaita wrote: > At last week's cinder project midcycle [0], the team discussed recent > fixes to keep the cinderlib CI functional in the oldest stable branches. > ?At this point, stable/train is running the bare minimum of CI to keep > the branch open, and stable/ussuri is running only one functional job in > excess of the bare minimum.? The only changes merged into either branch > since each was tagged -em have been non-functional, so we believe that > there is no demand for keeping the branches open. 
> > Thus, the following patches to EOL cinderlib train and ussuri have been > posted: > - https://review.opendev.org/c/openstack/releases/+/853534 > - https://review.opendev.org/c/openstack/releases/+/853535 > > This email serves as notice of the intent of the cinder project to EOL > the cinderlib train and ussuri branches.? If you have comments or > concerns, please reply to this email or leave a comment on the > appropriate patch. > > > [0] https://etherpad.opendev.org/p/cinder-zed-midcycles From ces.eduardo98 at gmail.com Tue Aug 30 21:04:05 2022 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Tue, 30 Aug 2022 18:04:05 -0300 Subject: [election][manila] PTL Candidacy for Antelope cycle Message-ID: Greetings, Zorillas and interested stackers, I would like to announce my candidacy to be the Manila PTL during the Antelope cycle. I have been the PTL for Manila since the Zed cycle, and have been contributing to OpenStack since the Stein release. It has been an awesome experience. Over the Zed cycle my focus was to continue mentoring and adding new core reviewers to the Manila repositories; Pursuing feature parity and complete support for manila in OSC (also increasing our functional tests coverage), tackling the tech debt; enhancing the documentation to help third party drivers to set up their CI systems; promoting events to gather the community members and getting bugs fixed and implementations moving faster. I am happy with the progress we made in those areas, but I still think there is room for improvement. During the Antelope cycle, I would like to focus on: - Continue our efforts to mentor contributors and increase the amount of active reviewers through engaging them in the community, teaching the OpenStack way and making it collaboratively as we did in the past cycles, promoting hackathons, bug squashes and collaborative review sessions. - Getting more features already available in the Manila core to Manila UI and get more attention to our changes to OpenStack SDK; - Continue pushing Manila to cover the tech debt areas we identified over the past cycles; - Enforcing maintainers to collaboratively cover the lack of documentation we have on third party CI setups for Manila, helping potential new vendors to quickly setup up their CI systems; Thank you for your consideration! Carlos da Silva IRC: carloss -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Wed Aug 31 00:14:57 2022 From: johnsomor at gmail.com (Michael Johnson) Date: Tue, 30 Aug 2022 17:14:57 -0700 Subject: [election][designate] PTL Candidacy for Antelope cycle Message-ID: Hello OpenStack community, I would like to announce my candidacy for PTL of Designate for the Antelope cycle. I would like to continue to support the Designate team for the Antelope release. As predicted, we have continued to focus on cleaning up technical debt and improving the test coverage. For a small team, we have accomplished a lot in the Zed release, though we could use more reviewers to help with the backlog of patches in need of review and comments. I expect the code cleanup will continue, but I hope we can also focus more on some of the proposed features that are in flight. 
Thank you for your support and your consideration for Antelope, Michael Johnson (johnsom) From ltomasbo at redhat.com Wed Aug 31 07:12:31 2022 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Wed, 31 Aug 2022 09:12:31 +0200 Subject: [ovn-bgp-agent][neutron] - expose_tenant_networks bug In-Reply-To: References: <693D46D4-3DD7-4B93-BC90-571FEC2B6F4C@gmail.com> Message-ID: See below On Tue, Aug 30, 2022 at 10:14 PM Satish Patel wrote: > Hi Luis, > > I have redeploy my lab and i have following components > > rack-1-host-1 - controller > rack-1-host-2 - compute1 > rack-2-host-1 - compute2 > > > # I am running ovn-bgp-agent on only two compute nodes compute1 and > compute2 > [DEFAULT] > debug=False > expose_tenant_networks=True > driver=ovn_bgp_driver > reconcile_interval=120 > ovsdb_connection=unix:/var/run/openvswitch/db.sock > > ### without any VM at present i can see only router gateway IP on > rack1-host-2 > Yep, this is what is expected at this point. > > vagrant at rack-1-host-2:~$ ip a show ovn > 37: ovn: mtu 1500 qdisc noqueue master > ovn-bgp-vrf state UNKNOWN group default qlen 1000 > link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff > inet 172.16.1.144/32 scope global ovn > valid_lft forever preferred_lft forever > inet6 fe80::8f7:6eff:fee0:1969/64 scope link > valid_lft forever preferred_lft forever > > > vagrant at rack-2-host-1:~$ ip a show ovn > 15: ovn: mtu 1500 qdisc noqueue master > ovn-bgp-vrf state UNKNOWN group default qlen 1000 > link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff > inet6 fe80::5461:6bff:fe29:ac29/64 scope link > valid_lft forever preferred_lft forever > > > ### Lets create vm1 which is endup on rack1-host-2 but it didn't expose > vm1 ip (tenant ip) same with rack-2-host-1 > > vagrant at rack-1-host-2:~$ ip a show ovn > 37: ovn: mtu 1500 qdisc noqueue master > ovn-bgp-vrf state UNKNOWN group default qlen 1000 > link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff > inet 172.16.1.144/32 scope global ovn > valid_lft forever preferred_lft forever > inet6 fe80::8f7:6eff:fee0:1969/64 scope link > valid_lft forever preferred_lft forever > It should be exposed here, what about the output of "ip rule" and "ip route show table br-ex"? > > vagrant at rack-2-host-1:~$ ip a show ovn > 15: ovn: mtu 1500 qdisc noqueue master > ovn-bgp-vrf state UNKNOWN group default qlen 1000 > link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff > inet6 fe80::5461:6bff:fe29:ac29/64 scope link > valid_lft forever preferred_lft forever > > > ### Lets attach a floating ip to vm1 and see. now i can see 10.0.0.17 vm1 > ip got expose on rack-1-host-2 same time nothing on rack-2-host-1 ( ofc > because no vm running on it) > > vagrant at rack-1-host-2:~$ ip a show ovn > 37: ovn: mtu 1500 qdisc noqueue master > ovn-bgp-vrf state UNKNOWN group default qlen 1000 > link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff > inet 172.16.1.144/32 scope global ovn > valid_lft forever preferred_lft forever > inet 10.0.0.17/32 scope global ovn > valid_lft forever preferred_lft forever > inet 172.16.1.148/32 scope global ovn > valid_lft forever preferred_lft forever > inet6 fe80::8f7:6eff:fee0:1969/64 scope link > valid_lft forever preferred_lft forever > There is also a resync action happening every 120 seconds... Perhaps for some reason the initial addition of 10.0.0.17 failed and then the sync discovered it and added it (and it matched with the time you added the FIP more or less). But events are managed one by one and those 2 are different, so adding the FIP is not adding the internal IP. 
It was probably a sync action. > > vagrant at rack-2-host-1:~$ ip a show ovn > 15: ovn: mtu 1500 qdisc noqueue master > ovn-bgp-vrf state UNKNOWN group default qlen 1000 > link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff > inet6 fe80::5461:6bff:fe29:ac29/64 scope link > valid_lft forever preferred_lft forever > > > #### Lets spin up vm2 which should end up on other compute node which is > rack-2-host-1 ( no change yet.. vm2 ip wasn't exposed anywhere yet. ) > > vagrant at rack-1-host-2:~$ ip a show ovn > 37: ovn: mtu 1500 qdisc noqueue master > ovn-bgp-vrf state UNKNOWN group default qlen 1000 > link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff > inet 172.16.1.144/32 scope global ovn > valid_lft forever preferred_lft forever > inet 10.0.0.17/32 scope global ovn > valid_lft forever preferred_lft forever > inet 172.16.1.148/32 scope global ovn > valid_lft forever preferred_lft forever > inet6 fe80::8f7:6eff:fee0:1969/64 scope link > valid_lft forever preferred_lft forever > > > vagrant at rack-2-host-1:~$ ip a show ovn > 15: ovn: mtu 1500 qdisc noqueue master > ovn-bgp-vrf state UNKNOWN group default qlen 1000 > link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff > inet6 fe80::5461:6bff:fe29:ac29/64 scope link > valid_lft forever preferred_lft forever > > > #### Lets again attach floating ip to vm2 ( so far nothing changed, > technically it should expose IP on rack-1-host-2 ) > > vagrant at rack-1-host-2:~$ ip a show ovn > 37: ovn: mtu 1500 qdisc noqueue master > ovn-bgp-vrf state UNKNOWN group default qlen 1000 > link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff > inet 172.16.1.144/32 scope global ovn > valid_lft forever preferred_lft forever > inet 10.0.0.17/32 scope global ovn > valid_lft forever preferred_lft forever > inet 172.16.1.148/32 scope global ovn > valid_lft forever preferred_lft forever > inet6 fe80::8f7:6eff:fee0:1969/64 scope link > valid_lft forever preferred_lft forever > > The IP of the second VM should be exposed here ^, in rack-1-host-2, while > the FIP in the other compute (rack-2-host-1) > > vagrant at rack-2-host-1:~$ ip a show ovn > 15: ovn: mtu 1500 qdisc noqueue master > ovn-bgp-vrf state UNKNOWN group default qlen 1000 > link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff > inet 172.16.1.143/32 scope global ovn > valid_lft forever preferred_lft forever > inet6 fe80::5461:6bff:fe29:ac29/64 scope link > valid_lft forever preferred_lft forever > > > Here is the logs - https://paste.opendev.org/show/bRThivJE4wvEN92DXJUo/ > What node these logs belong to? rack-1-host-2? And are you running with the latest code? Looks the problem is on the sync function when trying to ensure the routing table entry for br-ex. It prints this: 2022-08-30 20:12:54.541 8318 DEBUG ovn_bgp_agent.utils.linux_net [-] Found routing table for br-ex with: ['200', 'br-ex'] So definitely ovn_routing_tables should be initialized with {'br-ex': 200}, so I don't really get where the KeyError comes from... Unless it is not accessing the dict, but accessing the ndb.routes... perhaps with the pyroute2 version you have, the family parameter is needed there. Let me send a patch that you can try with > On Thu, Aug 25, 2022 at 6:25 AM Luis Tomas Bolivar > wrote: > >> >> >> On Thu, Aug 25, 2022 at 11:31 AM Satish Patel >> wrote: >> >>> Hi Luis, >>> >>> Very interesting, you are saying it will only expose tenant ip on >>> gateway port node? Even we have DVR setup in cluster correct? >>> >> >> Almost. The path is the same as in a DVR setup without BGP (with the >> difference you can reach the internal IP). 
In a DVR setup, when the VM is >> in a tenant network, without a FIP, the traffic goes out through the cr-lrp >> (ovn router gateway port), i.e., the node hosting that port which is >> connecting the router where the subnet where the VM is to the provider >> network. >> >> Note this is a limitation due to how ovn is used in openstack neutron, >> where traffic needs to be injected into OVN overlay in the node holding the >> cr-lrp. We are investigating possible ways to overcome this limitation and >> expose the IP right away in the node hosting the VM. >> >> >>> Does gateway node going to expose ip for all other compute nodes? >>> >> >>> What if I have multiple gateway node? >>> >> >> No, each router connected to the provider network will have its own ovn >> router gateway port, and that can be allocated in any node which has >> "enable-chassis-as-gw". What is true is that all VMs in a tenant networks >> connected to the same router, will be exposed in the same location . >> >> >>> Did you configure that flag on all node or just gateway node? >>> >> >> I usually deploy with 3 controllers which are also my "networker" nodes, >> so those are the ones having the enable-chassis-as-gw flag. >> >> >>> >>> Sent from my iPhone >>> >>> On Aug 25, 2022, at 4:14 AM, Luis Tomas Bolivar >>> wrote: >>> >>> ? >>> I tested it locally and it is exposing the IP properly in the node where >>> the ovn router gateway port is allocated. Could you double check if that is >>> the case in your setup too? >>> >>> On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar >>> wrote: >>> >>>> >>>> >>>> On Tue, Aug 23, 2022 at 6:04 PM Satish Patel >>>> wrote: >>>> >>>>> Folks, >>>>> >>>>> I am setting up ovn-bgp-agent lab in "BGP mode" and i found everything >>>>> working great except expose tenant network >>>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ >>>>> >>>>> >>>>> Lab Summary: >>>>> >>>>> 1 controller node >>>>> 3 compute node >>>>> >>>>> ovn-bgp-agent running on all compute node because i am using >>>>> "enable_distributed_floating_ip=True" >>>>> >>>> >>>>> ovn-bgp-agent config: >>>>> >>>>> [DEFAULT] >>>>> debug=False >>>>> expose_tenant_networks=True >>>>> driver=ovn_bgp_driver >>>>> reconcile_interval=120 >>>>> ovsdb_connection=unix:/var/run/openvswitch/db.sock >>>>> >>>>> I am not seeing my vm on tenant ip getting exposed but when i attach >>>>> FIP which gets exposed in loopback address. here is the full trace of debug >>>>> logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/ >>>>> >>>> >>>> It is not exposed in any node, right? Note when expose_tenant_network >>>> is enabled, the traffic to the tenant VM is exposed in the node holding the >>>> cr-lrp (ovn router gateway port) for the router connecting the tenant >>>> network to the provider one. >>>> >>>> The FIP will be exposed in the node where the VM is. >>>> >>>> On the other hand, the error you see there should not happen, so I'll >>>> investigate why that is and also double check if the expose_tenant_network >>>> flag is broken somehow. >>>> >>> >>>> Thanks! 
>>>> >>>> >>>> -- >>>> LUIS TOM?S BOL?VAR >>>> Principal Software Engineer >>>> Red Hat >>>> Madrid, Spain >>>> ltomasbo at redhat.com >>>> >>>> >>> >>> >>> -- >>> LUIS TOM?S BOL?VAR >>> Principal Software Engineer >>> Red Hat >>> Madrid, Spain >>> ltomasbo at redhat.com >>> >>> >>> >> >> -- >> LUIS TOM?S BOL?VAR >> Principal Software Engineer >> Red Hat >> Madrid, Spain >> ltomasbo at redhat.com >> >> > -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ltomasbo at redhat.com Wed Aug 31 07:51:22 2022 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Wed, 31 Aug 2022 09:51:22 +0200 Subject: [ovn-bgp-agent][neutron] - expose_tenant_networks bug In-Reply-To: References: <693D46D4-3DD7-4B93-BC90-571FEC2B6F4C@gmail.com> Message-ID: On Wed, Aug 31, 2022 at 9:12 AM Luis Tomas Bolivar wrote: > See below > > > On Tue, Aug 30, 2022 at 10:14 PM Satish Patel > wrote: > >> Hi Luis, >> >> I have redeploy my lab and i have following components >> >> rack-1-host-1 - controller >> rack-1-host-2 - compute1 >> rack-2-host-1 - compute2 >> >> >> # I am running ovn-bgp-agent on only two compute nodes compute1 and >> compute2 >> [DEFAULT] >> debug=False >> expose_tenant_networks=True >> driver=ovn_bgp_driver >> reconcile_interval=120 >> ovsdb_connection=unix:/var/run/openvswitch/db.sock >> >> ### without any VM at present i can see only router gateway IP on >> rack1-host-2 >> > > Yep, this is what is expected at this point. > > >> >> vagrant at rack-1-host-2:~$ ip a show ovn >> 37: ovn: mtu 1500 qdisc noqueue master >> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >> link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff >> inet 172.16.1.144/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet6 fe80::8f7:6eff:fee0:1969/64 scope link >> valid_lft forever preferred_lft forever >> >> >> vagrant at rack-2-host-1:~$ ip a show ovn >> 15: ovn: mtu 1500 qdisc noqueue master >> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >> link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff >> inet6 fe80::5461:6bff:fe29:ac29/64 scope link >> valid_lft forever preferred_lft forever >> >> >> ### Lets create vm1 which is endup on rack1-host-2 but it didn't expose >> vm1 ip (tenant ip) same with rack-2-host-1 >> >> vagrant at rack-1-host-2:~$ ip a show ovn >> 37: ovn: mtu 1500 qdisc noqueue master >> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >> link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff >> inet 172.16.1.144/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet6 fe80::8f7:6eff:fee0:1969/64 scope link >> valid_lft forever preferred_lft forever >> > > It should be exposed here, what about the output of "ip rule" and "ip > route show table br-ex"? > > >> >> vagrant at rack-2-host-1:~$ ip a show ovn >> 15: ovn: mtu 1500 qdisc noqueue master >> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >> link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff >> inet6 fe80::5461:6bff:fe29:ac29/64 scope link >> valid_lft forever preferred_lft forever >> >> >> ### Lets attach a floating ip to vm1 and see. 
now i can see 10.0.0.17 vm1 >> ip got expose on rack-1-host-2 same time nothing on rack-2-host-1 ( ofc >> because no vm running on it) >> >> vagrant at rack-1-host-2:~$ ip a show ovn >> 37: ovn: mtu 1500 qdisc noqueue master >> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >> link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff >> inet 172.16.1.144/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet 10.0.0.17/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet 172.16.1.148/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet6 fe80::8f7:6eff:fee0:1969/64 scope link >> valid_lft forever preferred_lft forever >> > > There is also a resync action happening every 120 seconds... Perhaps for > some reason the initial addition of 10.0.0.17 failed and then the sync > discovered it and added it (and it matched with the time you added the FIP > more or less). > > But events are managed one by one and those 2 are different, so adding the > FIP is not adding the internal IP. It was probably a sync action. > > >> >> vagrant at rack-2-host-1:~$ ip a show ovn >> 15: ovn: mtu 1500 qdisc noqueue master >> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >> link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff >> inet6 fe80::5461:6bff:fe29:ac29/64 scope link >> valid_lft forever preferred_lft forever >> >> >> #### Lets spin up vm2 which should end up on other compute node which is >> rack-2-host-1 ( no change yet.. vm2 ip wasn't exposed anywhere yet. ) >> >> vagrant at rack-1-host-2:~$ ip a show ovn >> 37: ovn: mtu 1500 qdisc noqueue master >> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >> link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff >> inet 172.16.1.144/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet 10.0.0.17/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet 172.16.1.148/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet6 fe80::8f7:6eff:fee0:1969/64 scope link >> valid_lft forever preferred_lft forever >> >> >> vagrant at rack-2-host-1:~$ ip a show ovn >> 15: ovn: mtu 1500 qdisc noqueue master >> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >> link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff >> inet6 fe80::5461:6bff:fe29:ac29/64 scope link >> valid_lft forever preferred_lft forever >> >> >> #### Lets again attach floating ip to vm2 ( so far nothing changed, >> technically it should expose IP on rack-1-host-2 ) >> >> vagrant at rack-1-host-2:~$ ip a show ovn >> 37: ovn: mtu 1500 qdisc noqueue master >> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >> link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff >> inet 172.16.1.144/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet 10.0.0.17/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet 172.16.1.148/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet6 fe80::8f7:6eff:fee0:1969/64 scope link >> valid_lft forever preferred_lft forever >> >> The IP of the second VM should be exposed here ^, in rack-1-host-2, while >> the FIP in the other compute (rack-2-host-1) >> > > >> vagrant at rack-2-host-1:~$ ip a show ovn >> 15: ovn: mtu 1500 qdisc noqueue master >> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >> link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff >> inet 172.16.1.143/32 scope global ovn >> valid_lft forever preferred_lft forever >> inet6 fe80::5461:6bff:fe29:ac29/64 scope link >> valid_lft forever preferred_lft forever >> >> >> Here is the logs - 
https://paste.opendev.org/show/bRThivJE4wvEN92DXJUo/ >> > > What node these logs belong to? rack-1-host-2? > > And are you running with the latest code? Looks the problem is on the sync > function when trying to ensure the routing table entry for br-ex. It prints > this: > > 2022-08-30 20:12:54.541 8318 DEBUG ovn_bgp_agent.utils.linux_net [-] Found routing table for br-ex with: ['200', 'br-ex'] > > So definitely ovn_routing_tables should be initialized with {'br-ex': > 200}, so I don't really get where the KeyError comes from... > > Unless it is not accessing the dict, but accessing the ndb.routes... > perhaps with the pyroute2 version you have, the family parameter is needed > there. Let me send a patch that you can try with > This is the patch https://review.opendev.org/c/x/ovn-bgp-agent/+/855062. Give it a try and let me know if the error you are seeing in the logs goes away with it > >> On Thu, Aug 25, 2022 at 6:25 AM Luis Tomas Bolivar >> wrote: >> >>> >>> >>> On Thu, Aug 25, 2022 at 11:31 AM Satish Patel >>> wrote: >>> >>>> Hi Luis, >>>> >>>> Very interesting, you are saying it will only expose tenant ip on >>>> gateway port node? Even we have DVR setup in cluster correct? >>>> >>> >>> Almost. The path is the same as in a DVR setup without BGP (with the >>> difference you can reach the internal IP). In a DVR setup, when the VM is >>> in a tenant network, without a FIP, the traffic goes out through the cr-lrp >>> (ovn router gateway port), i.e., the node hosting that port which is >>> connecting the router where the subnet where the VM is to the provider >>> network. >>> >>> Note this is a limitation due to how ovn is used in openstack neutron, >>> where traffic needs to be injected into OVN overlay in the node holding the >>> cr-lrp. We are investigating possible ways to overcome this limitation and >>> expose the IP right away in the node hosting the VM. >>> >>> >>>> Does gateway node going to expose ip for all other compute nodes? >>>> >>> >>>> What if I have multiple gateway node? >>>> >>> >>> No, each router connected to the provider network will have its own ovn >>> router gateway port, and that can be allocated in any node which has >>> "enable-chassis-as-gw". What is true is that all VMs in a tenant networks >>> connected to the same router, will be exposed in the same location . >>> >>> >>>> Did you configure that flag on all node or just gateway node? >>>> >>> >>> I usually deploy with 3 controllers which are also my "networker" nodes, >>> so those are the ones having the enable-chassis-as-gw flag. >>> >>> >>>> >>>> Sent from my iPhone >>>> >>>> On Aug 25, 2022, at 4:14 AM, Luis Tomas Bolivar >>>> wrote: >>>> >>>> ? >>>> I tested it locally and it is exposing the IP properly in the node >>>> where the ovn router gateway port is allocated. Could you double check if >>>> that is the case in your setup too? 
>>>> >>>> On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar >>>> wrote: >>>> >>>>> >>>>> >>>>> On Tue, Aug 23, 2022 at 6:04 PM Satish Patel >>>>> wrote: >>>>> >>>>>> Folks, >>>>>> >>>>>> I am setting up ovn-bgp-agent lab in "BGP mode" and i found >>>>>> everything working great except expose tenant network >>>>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ >>>>>> >>>>>> >>>>>> Lab Summary: >>>>>> >>>>>> 1 controller node >>>>>> 3 compute node >>>>>> >>>>>> ovn-bgp-agent running on all compute node because i am using >>>>>> "enable_distributed_floating_ip=True" >>>>>> >>>>> >>>>>> ovn-bgp-agent config: >>>>>> >>>>>> [DEFAULT] >>>>>> debug=False >>>>>> expose_tenant_networks=True >>>>>> driver=ovn_bgp_driver >>>>>> reconcile_interval=120 >>>>>> ovsdb_connection=unix:/var/run/openvswitch/db.sock >>>>>> >>>>>> I am not seeing my vm on tenant ip getting exposed but when i attach >>>>>> FIP which gets exposed in loopback address. here is the full trace of debug >>>>>> logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/ >>>>>> >>>>> >>>>> It is not exposed in any node, right? Note when expose_tenant_network >>>>> is enabled, the traffic to the tenant VM is exposed in the node holding the >>>>> cr-lrp (ovn router gateway port) for the router connecting the tenant >>>>> network to the provider one. >>>>> >>>>> The FIP will be exposed in the node where the VM is. >>>>> >>>>> On the other hand, the error you see there should not happen, so I'll >>>>> investigate why that is and also double check if the expose_tenant_network >>>>> flag is broken somehow. >>>>> >>>> >>>>> Thanks! >>>>> >>>>> >>>>> -- >>>>> LUIS TOM?S BOL?VAR >>>>> Principal Software Engineer >>>>> Red Hat >>>>> Madrid, Spain >>>>> ltomasbo at redhat.com >>>>> >>>>> >>>> >>>> >>>> -- >>>> LUIS TOM?S BOL?VAR >>>> Principal Software Engineer >>>> Red Hat >>>> Madrid, Spain >>>> ltomasbo at redhat.com >>>> >>>> >>>> >>> >>> -- >>> LUIS TOM?S BOL?VAR >>> Principal Software Engineer >>> Red Hat >>> Madrid, Spain >>> ltomasbo at redhat.com >>> >>> >> > > -- > LUIS TOM?S BOL?VAR > Principal Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > -- LUIS TOM?S BOL?VAR Principal Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.rosser at rd.bbc.co.uk Wed Aug 31 08:28:39 2022 From: jonathan.rosser at rd.bbc.co.uk (Jonathan Rosser) Date: Wed, 31 Aug 2022 09:28:39 +0100 Subject: [openstack-ansible][ceph][yoga] wait for all osd to be up In-Reply-To: <6873C352-1364-45AD-9346-E87F1DAF177D@spots.edu> References: <7275FB31-C4F3-4356-8C66-87F3757637BF@spots.edu> <6873C352-1364-45AD-9346-E87F1DAF177D@spots.edu> Message-ID: <4de6c9c2-7f22-5612-2b5b-d8ec5f1147f9@rd.bbc.co.uk> For deploying ceph, Openstack-Ansible is just a thin wrapper around ceph-ansible (see https://docs.ceph.com/projects/ceph-ansible/en/latest/index.html). You have to define the variables that ceph-ansible requires. We have a test scenario for Openstack-Ansible + Ceph, which uses the following variables https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/templates/user_variables_ceph.yml.j2. Most of those are used in the ceph-ansible roles, not Openstack-Ansible directly. 
For the purposes of that test case LVM loopback devices are set up and a suitable ceph.conf is written out here https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/prepare_ceph.yml If you wish to have Openstack-Ansible call the ceph-ansible roles for you to deploy ceph then you must take the time to understand ceph-ansible sufficiently to set the variables it requires to deploy correctly in your situation. Openstack-Ansible does not manage this for you. It is also possible to independently deploy ceph using whatever means you like outside of openstack-ansible, and pass a very small amount of data to provide an integration between the two. Those options are described briefly here https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html and https://docs.openstack.org/openstack-ansible-ceph_client/latest/configure-ceph.html Jonathan. On 23/08/2022 00:53, Father Vlasie wrote: > I have done a bit more searching?the error is related to the _reporting_ on the OSDs. I tried to get some info from journalctl while the infrasrtucture playbook was running and all I could see was this: > > Aug 22 22:11:31 compute3 python3[57496]: ansible-ceph_volume Invoked with cluster=ceph action=list objectstore=bluestore dmcrypt=False batch_devices=[] osds_per_device=1 journal_size=5120 journal_devices=[] block_db_size=-1 block_db_devices=[] wal_devices=[] report=False destroy=True data=None data_vg=None journal=None journal_vg=None db=None db_vg=None wal=None wal_vg=None crush_device_class=None osd_fsid=None osd_id=None > Aug 22 22:12:01 compute3 audit[57503]: USER_ACCT pid=57503 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:accounting grantors=pam_permit acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success' > Aug 22 22:12:01 compute3 audit[57503]: CRED_ACQ pid=57503 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_permit,pam_cap acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success' > Aug 22 22:12:01 compute3 audit[57503]: SYSCALL arch=c000003e syscall=1 success=yes exit=1 a0=7 a1=7ffe656d1100 a2=1 a3=7fe9c3d53371 items=0 ppid=1725 pid=57503 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=1445 comm="cron" exe="/usr/sbin/cron" key=(null) > Aug 22 22:12:01 compute3 audit: PROCTITLE proctitle=2F7573722F7362696E2F43524F4E002D66 > Aug 22 22:12:01 compute3 CRON[57503]: pam_unix(cron:session): session opened for user root by (uid=0) > > The only thing that stands out to me is that there are no devices listed but in all of the openstack-ansible ceph documentation devices are never mentioned so I assume they are being detected automatically, is that right? > > Thank you, > > FV > >> On Aug 22, 2022, at 1:08 PM, Father Vlasie wrote: >> >> >> Hello everyone, >> >> I am running setup-infrastucture.yml. I have followed the ceph production example here: https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html >> >> I have set things up so the compute and storage nodes are the same machine (hyperconverged). And the storage devices are devoid of any volumes or partitions. >> >> I see the following error: >> >> ------ >> >> FAILED - RETRYING: [compute3 -> infra1_ceph-mon_container-0d679d8d]: wait for all osd to be up (1 retries left). >> fatal: [compute3 -> infra1_ceph-mon_container-0d679d8d(192.168.3.145)]: FAILED! 
=> {"attempts": 60, "changed": false, "cmd": ["ceph", "--cluster", "ceph", "osd", "stat", "-f", "json"], "delta": "0:00:00.223291", "end": "2022-08-22 19:36:29.473358", "msg": "", "rc": 0, "start": "2022-08-22 19:36:29.250067", "stderr": "", "stderr_lines": [], "stdout": "\n{\"epoch\":6,\"num_osds\":0,\"num_up_osds\":0,\"osd_up_since\":0,\"num_in_osds\":0,\"osd_in_since\":0,\"num_remapped_pgs\":0}", "stdout_lines": ["", "{\"epoch\":6,\"num_osds\":0,\"num_up_osds\":0,\"osd_up_since\":0,\"num_in_osds\":0,\"osd_in_since\":0,\"num_remapped_pgs\":0}?]} >> >> ------ >> >> I am not sure where to look to find more information. Any help would be much appreciated! >> >> Thank you, >> >> FV > > From jonathan.rosser at rd.bbc.co.uk Wed Aug 31 09:33:08 2022 From: jonathan.rosser at rd.bbc.co.uk (Jonathan Rosser) Date: Wed, 31 Aug 2022 10:33:08 +0100 Subject: [openstack-ansible] Converged compute ans ceph storage In-Reply-To: References: Message-ID: <6612497f-37a0-3b2d-cfbd-68b83bb4bfa5@rd.bbc.co.uk> Yes it is possible. You have two choices, either to deploy ceph yourself and integrate it with openstack-ansible by providing a reference to the ceph monitors, or you can have openstack-ansible act as a thin wrapper around ceph-ansible to deploy ceph at the same time as openstack. Having said this - I would not recommend doing a converged deployment like this other than for test purposes. It's slightly unclear if you want an entire deployment with ceph in a single node, if that is the case then you can look at the all-in-one deployment here https://docs.openstack.org/openstack-ansible/latest/user/aio/quickstart.html, configured with the environment variable SCENARIO=aio_lxc_ceph. This is strictly a test environment (exactly what we use for our CI tests) and not a production deployment. Jonathan. On 29/08/2022 07:23, fv at spots.edu wrote: > Hello everyone! > > Is it possible with OpenStack-Ansible to deploy converged nova compute > and ceph storage on a single node? > > Thank you! > > From fpantano at redhat.com Wed Aug 31 11:01:29 2022 From: fpantano at redhat.com (Francesco Pantano) Date: Wed, 31 Aug 2022 13:01:29 +0200 Subject: [election][tripleo] PTL Candidacy for Antelope cycle In-Reply-To: References: Message-ID: On Tue, Aug 30, 2022 at 4:39 PM Rabi Mishra wrote: > Hi All, > > I would like to nominate myself for Antelope cycle TripleO PTL. > > As some of you would know, I have been part of the OpenStack community > for a long time and have been a core contributor for TripleO and Heat. > I have also served as Heat PTL for Ocata cycle. > > James has done a great job as PTL for the last few cycles. I would like > to take the opportunity to thank him for all his effort and we all would > agree that he needs a well deserved break. > > I am looking forward to take the opportunity to help the community to > achieve some of the already planned goals and in-progress workstreams > like standalone roles, multi-rhel and other challenges that come along > the way. > > Also, as before, our focus would continue to be on review prioritization, > in progress work streams and collaboration on common priorities. > +1 thanks Rabi! > > Regards, > Rabi Mishra > > -- Francesco Pantano GPG KEY: F41BD75C -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From viroel at gmail.com Wed Aug 31 11:26:05 2022 From: viroel at gmail.com (Douglas Viroel) Date: Wed, 31 Aug 2022 08:26:05 -0300 Subject: [election][manila] PTL Candidacy for Antelope cycle In-Reply-To: References: Message-ID: Thanks for your work in Zed cycle, Carlos. It will be great to have you as PTL for Antelope too. carloss++ On Tue, Aug 30, 2022 at 6:05 PM Carlos Silva wrote: > Greetings, Zorillas and interested stackers, > > I would like to announce my candidacy to be the Manila PTL during the > Antelope > cycle. > > I have been the PTL for Manila since the Zed cycle, and have been > contributing > to OpenStack since the Stein release. It has been an awesome experience. > > Over the Zed cycle my focus was to continue mentoring and adding new core > reviewers to the Manila repositories; Pursuing feature parity and complete > support for manila in OSC (also increasing our functional tests coverage), > tackling the tech debt; enhancing the documentation to help third party > drivers > to set up their CI systems; promoting events to gather the community > members > and getting bugs fixed and implementations moving faster. > > I am happy with the progress we made in those areas, but I still think > there is > room for improvement. During the Antelope cycle, I would like to focus on: > > - Continue our efforts to mentor contributors and increase the amount of > active > reviewers through engaging them in the community, teaching the OpenStack > way > and making it collaboratively as we did in the past cycles, promoting > hackathons, > bug squashes and collaborative review sessions. > > - Getting more features already available in the Manila core to Manila UI > and get > more attention to our changes to OpenStack SDK; > > - Continue pushing Manila to cover the tech debt areas we identified over > the > past cycles; > > - Enforcing maintainers to collaboratively cover the lack of documentation > we > have on third party CI setups for Manila, helping potential new vendors > to > quickly setup up their CI systems; > > Thank you for your consideration! > > Carlos da Silva > IRC: carloss > -- Douglas Viroel - dviroel -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Wed Aug 31 13:32:16 2022 From: amy at demarco.com (Amy Marrich) Date: Wed, 31 Aug 2022 08:32:16 -0500 Subject: [all][elections][ptl][tc] Combined PTL/TC antelope cycle Election Nominations Extended 7 days Message-ID: We have extended the nomination period 7 days and will overlap the TC campaigning period to accommodate the extension. Thank you everyone who has nominated themselves and for cutting down the list of leaderless projects to 24. Remember there are also 4 open seats for the TC this election cycle which the automated email does not include in this reminder. Thanks, Amy ---------------------------------------------------------------------------------------------- A quick reminder that we are in the last hours for declaring PTL and TC candidacies. Nominations are open until Sep 07, 2022 23:45 UTC. If you want to stand for election, don't delay, follow the instructions at [1] to make sure the community knows your intentions. Make sure your nomination has been submitted to the openstack/election repository and approved by election officials. 
Election statistics[2]:
  Nominations started   @ 2022-08-24 23:45:00 UTC
  Nominations end       @ 2022-09-07 23:45:00 UTC
  Nominations duration  : 14 days, 0:00:00
  Nominations remaining : 7 days, 10:19:33
  Nominations progress  : 46.93%
---------------------------------------------------
  Projects[1]               : 52
  Projects with candidates  : 28 ( 53.85%)
  Projects with election    :  0 (  0.00%)
---------------------------------------------------
  Need election     :  0 ()
  Need appointment  : 24 (Adjutant Barbican Cloudkitty Keystone Masakari
Mistral Octavia OpenStackAnsible OpenStackSDK OpenStack_Charms Openstack_Chef
Oslo Quality_Assurance Rally Release_Management Requirements Sahara Senlin
Skyline Swift Trove Venus Zaqar Zun)
===================================================
Stats gathered @ 2022-08-31 13:25:27 UTC

This means that with approximately 7 days left, 24 projects will be deemed
leaderless. In this case the TC will oversee PTL selection as described by [3].

Thank you,

[1] https://governance.openstack.org/election/#how-to-submit-a-candidacy
[2] Any open reviews at
https://review.openstack.org/#/q/is:open+project:openstack/election have not
been factored into these stats.
[3] https://governance.openstack.org/resolutions/20141128-elections-process-for-leaderless-programs.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From gthiemonge at redhat.com  Wed Aug 31 13:35:55 2022
From: gthiemonge at redhat.com (Gregory Thiemonge)
Date: Wed, 31 Aug 2022 15:35:55 +0200
Subject: [election][Octavia] PTL candidacy for Antelope cycle
Message-ID: 

Hi,

I would like to announce my candidacy for Octavia PTL for the Antelope
release.

I have been PTL since the Xena cycle and I want to continue to contribute to
this role.

During the Antelope cycle we will focus on reviewing/merging some of the
opened patches we still have in our backlog (we are a small team but we have
been growing, we expect that we will be more efficient in merging patches).

For the A cycle we also need to work on the migration to sqlalchemy 2, it
will be a huge challenge for our team.

Thanks for your consideration,
Gregory Thiemonge (gthiemonge)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From senrique at redhat.com  Wed Aug 31 13:40:21 2022
From: senrique at redhat.com (Sofia Enriquez)
Date: Wed, 31 Aug 2022 10:40:21 -0300
Subject: [cinder] Bug deputy report for week of 08-31-2022
Message-ID: 

This is a bug report from 08-24-2022 to 08-31-2022.
Agenda: https://etherpad.opendev.org/p/cinder-bug-squad-meeting
-----------------------------------------------------------------------------------------
Low
- https://bugs.launchpad.net/cinder/+bug/1987539 "DeprecationWarning: Flags
not at the start of the expression." Fix proposed to master.

Cheers,
Sofia

--
Sofía Enriquez
she/her
Software Engineer
Red Hat PnT
IRC: @enriquetaso
@RedHat  Red Hat  Red Hat
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From rafaelweingartner at gmail.com  Wed Aug 31 13:54:24 2022
From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=)
Date: Wed, 31 Aug 2022 10:54:24 -0300
Subject: [election][cloudkitty] PTL candidacy for Antelope
Message-ID: 

Hi all,

I would like to self-nominate for the role of PTL of CloudKitty for the
Antelope release cycle. I have been PTL since the Victoria cycle and I am
willing to continue in this role.

I will keep working with existing contributors to maintain CloudKitty for
their use cases and encourage the addition of new functionalities.
-- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.page at canonical.com Wed Aug 31 15:00:11 2022 From: james.page at canonical.com (James Page) Date: Wed, 31 Aug 2022 16:00:11 +0100 Subject: [charms] Nominate Luciano Giudice for charms-ceph core In-Reply-To: References: <5719a7c8-7739-4acd-2d3a-422615d23af1@canonical.com> Message-ID: Hi Chris On Sun, Aug 28, 2022 at 3:28 PM Alex Kavanagh wrote: > > > On Tue, 9 Aug 2022 at 14:39, Chris MacNaughton < > chris.macnaughton at canonical.com> wrote: > >> Hello all, >> >> I'd like to propose Luciano as a new Ceph charms core team member. He >> has contributed quality changes over the last year, and has been >> providing quality reviews for the Ceph charms. >> >> patches: >> https://review.opendev.org/q/owner:luciano.logiudice%2540canonical.com >> reviews: >> >> https://review.opendev.org/q/reviewedby:luciano.logiudice%2540canonical.com >> >> I hope you will join me in supporting Luciano. >> > > I think Luciano will make a great member of the Ceph charms core team and > thus it's a +1 from me. > +1 from me as well > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From frode.nordahl at canonical.com Wed Aug 31 15:02:49 2022 From: frode.nordahl at canonical.com (Frode Nordahl) Date: Wed, 31 Aug 2022 17:02:49 +0200 Subject: [charms] Nominate Luciano Giudice for charms-ceph core In-Reply-To: References: <5719a7c8-7739-4acd-2d3a-422615d23af1@canonical.com> Message-ID: On Wed, Aug 31, 2022 at 5:00 PM James Page wrote: > > Hi Chris > > On Sun, Aug 28, 2022 at 3:28 PM Alex Kavanagh wrote: >> >> >> >> On Tue, 9 Aug 2022 at 14:39, Chris MacNaughton wrote: >>> >>> Hello all, >>> >>> I'd like to propose Luciano as a new Ceph charms core team member. He >>> has contributed quality changes over the last year, and has been >>> providing quality reviews for the Ceph charms. >>> >>> patches: >>> https://review.opendev.org/q/owner:luciano.logiudice%2540canonical.com >>> reviews: >>> https://review.opendev.org/q/reviewedby:luciano.logiudice%2540canonical.com >>> >>> I hope you will join me in supporting Luciano. >> >> >> I think Luciano will make a great member of the Ceph charms core team and thus it's a +1 from me. > > > +1 from me as well Luciano has my vote of confidence too, +1 -- Frode Nordahl From eblock at nde.ag Wed Aug 31 15:25:21 2022 From: eblock at nde.ag (Eugen Block) Date: Wed, 31 Aug 2022 15:25:21 +0000 Subject: Cinder-volume active active setup In-Reply-To: <20220830110408.Horde.JzrsPeWuptf080t_8QXR1Sh@webmail.nde.ag> Message-ID: <20220831152521.Horde.J3_STEHgChGGhiQ5On3wH_1@webmail.nde.ag> I think I found my answers. Currently I only have a single-control node in my test lab but I'll redeploy it with three control nodes and test it with zookeeper. With a single control node the zookeeper and cinder cluster config seem to work. Zitat von Eugen Block : > Hi, > > I didn't mean to hijack the other thread so I'll start a new one. > There are some pages I found incl. Gorkas article [1], but I don't > really understand yet how to configure it. > > We don't use any of the automated deployments (we created our own) > like TripleO etc., is there any guide showing how to setup > cinder-volume active/active? I see in my lab environment that > python3-tooz is already installed on the control node, but how do I > use it? Besides the "cluster" config option in the cinder.conf (is > that defined when setting up the DLM?) what else is required? 
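(For reference, the moving parts are usually just a shared cluster name plus
a distributed tooz lock backend; a rough sketch of the relevant cinder.conf
bits, where the cluster name and the ZooKeeper address are only placeholders,
could look like:

[DEFAULT]
# identical on every cinder-volume host that should join the A/A cluster
cluster = mycluster

[coordination]
# tooz DLM backend; a distributed one such as ZooKeeper or etcd is needed for A/A
backend_url = zookeeper://192.0.2.11:2181

With that in place, each cinder-volume service started with the same cluster
value joins the cluster and tooz takes care of the distributed locking.)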
I also > found this thread [2] pointing to the source code, but that doesn't > really help me at this point. Any pointers to a how-to or deployment > guide would be highly appreciated! > > Thanks, > Eugen > > [1] https://gorka.eguileor.com/a-cinder-road-to-activeactive-ha/ > [2] https://www.mail-archive.com/openstack at lists.openstack.org/msg18385.html From allison at openinfra.dev Wed Aug 31 15:39:05 2022 From: allison at openinfra.dev (Allison Price) Date: Wed, 31 Aug 2022 08:39:05 -0700 Subject: Take the OpenStack User Survey! Deadline Today! Message-ID: <2A1209A2-8435-4F2F-BDCB-DF3A08A83A8F@openinfra.dev> Hi everyone, I wanted to remind you that today is the last day to complete your 2022 OpenStack User Survey. If you (or a customer) is operating an OpenStack environment, please take time to complete the deployment survey. If you have completed it before, you will just need to update your information, but your previous responses will auto-populate. Complete here: https://www.openstack.org/usersurvey Please let me know if you have any issues. Cheers, Allison -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Aug 31 15:39:14 2022 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 31 Aug 2022 17:39:14 +0200 Subject: [largescale-sig] Next meeting: August 31st, 15utc In-Reply-To: <2eab599e-33b4-995b-7ed4-2df73ae4abc0@openstack.org> References: <2eab599e-33b4-995b-7ed4-2df73ae4abc0@openstack.org> Message-ID: Hi everyone, Here is the summary of our SIG meeting today. We discussed our Sept 29 OpenInfra Live episode (featuring a Deep Dive on Schwarz Gruppe), as well as the completion of the transition of our documentation to docs.openstack.org, and a couple of questions on RabbitMQ recommendations and policies. You can read the meeting logs at: https://meetings.opendev.org/meetings/large_scale_sig/2022/large_scale_sig.2022-08-31-15.01.html Our next IRC meeting will be September 14, at 1500utc on #openstack-operators on OFTC. Regards, -- Thierry Carrez (ttx) From jean-francois.taltavull at elca.ch Wed Aug 31 16:54:43 2022 From: jean-francois.taltavull at elca.ch (=?utf-8?B?VGFsdGF2dWxsIEplYW4tRnJhbsOnb2lz?=) Date: Wed, 31 Aug 2022 16:54:43 +0000 Subject: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number In-Reply-To: References: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> <1b17c23f8982480db73cf50d04d51af7@elca.ch> Message-ID: <86f048d7931c4cc482f6785437c9b5ea@elca.ch> Thanks to your help, I am close to the goal. Dynamic pollster is loaded and triggered. But I get a ?Status[403] and reason [Forbidden]? in ceilometer logs while requesting admin/usage. I?m not sure to understand well the auth mechanism. Are we talking about keystone credentials, ec2 credentials, Rados GW user ?... For now, in testing phase, I use ?authentication_parameters?, not barbican. -JF From: Rafael Weing?rtner Sent: mardi, 30 ao?t 2022 14:17 To: Taltavull Jean-Fran?ois Cc: openstack-discuss Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. Yes, you will need to enable the metric/pollster to be processed. That is done via "polling.yml" file. Also, do not forget that you will need to configure Ceilometer to push this new metric. If you use Gnocchi as the backend, you will need to change/update the gnocchi resource YML file. 
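As a rough illustration (the pollster name follows this thread's example,
ceph_account is the resource type ceilometer's default file uses for radosgw
meters, the interval is arbitrary, and the exact syntax varies a little
between releases), the two definitions end up with entries along these lines:

# polling.yml
sources:
    - name: radosgw_pollsters
      interval: 600
      meters:
          - dynamic.radosgw.usage

# gnocchi_resources.yaml
resources:
    - resource_type: ceph_account
      metrics:
          dynamic.radosgw.usage: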
That file maps resources and metrics in the Gnocchi backend. The configuration resides in Ceilometer. You can create/define new resource types and map them to specific metrics. It depends on how you structure your solution. P.S. You do not need to use "authentication_parameters". You can use the barbican integration to avoid setting your credentials in a file. On Tue, Aug 30, 2022 at 9:11 AM Taltavull Jean-Fran?ois > wrote: Hello, I tried to define a Rados GW dynamic pollster and I can see, in Ceilometer logs, that it?s actually loaded. But it looks like it was not triggered, I see no trace of ceilometer connection in Rados GW logs. My definition: - name: "dynamic.radosgw.usage" sample_type: "gauge" unit: "B" value_attribute: "total.size" url_path: http:///object-store/swift/v1/admin/usage module: "awsauth" authentication_object: "S3Auth" authentication_parameters: xxxxxxxxxxxxx,yyyyyyyyyyyyy, user_id_attribute: "admin" project_id_attribute: "admin" resource_id_attribute: "admin" response_entries_key: "summary" Do I have to set an option in ceilometer.conf, or elsewhere, to get my Rados GW dynamic pollster triggered ? -JF From: Taltavull Jean-Fran?ois Sent: lundi, 29 ao?t 2022 18:41 To: 'Rafael Weing?rtner' > Cc: openstack-discuss > Subject: RE: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number Thanks a lot for your quick answer, Rafael ! I will explore this approach. Jean-Francois From: Rafael Weing?rtner > Sent: lundi, 29 ao?t 2022 17:54 To: Taltavull Jean-Fran?ois > Cc: openstack-discuss > Subject: Re: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number EXTERNAL MESSAGE - This email comes from outside ELCA companies. You could use a different approach. You can use Dynamic pollster [1], and create your own mechanism to collect data, without needing to change Ceilometer code. Basically all hard-coded pollsters can be converted to a dynamic pollster that is defined in YML. [1] https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html#the-dynamic-pollsters-system-configuration-for-non-openstack-apis On Mon, Aug 29, 2022 at 12:51 PM Taltavull Jean-Fran?ois > wrote: Hi All, In our OpenStack deployment, API endpoints are defined by using URLs instead of port numbers and HAProxy forwards requests to the right bakend after having ACLed the URL. In the case of our object-store service, based on RadosGW, the internal API endpoint is "https:///object-store/swift/v1/AUTH_" When Ceilometer RadosGW pollster tries to connect to the RadosGW admin API with the object-store internal endpoint, the URL becomes https:///admin, as shown by HAProxy logs. This URL does not match any API endpoint from HAProxy point of view. The line of code that rewrites the URL is this one: https://opendev.org/openstack/ceilometer/src/branch/stable/wallaby/ceilometer/objectstore/rgw.py#L81 What would you think of adding a mechanism based on new Ceilometer configuration option(s) to control the URL rewriting ? Our deployment characteristics: - OpenStack release: Wallaby - Ceph and RadosGW version: 15.2.16 - deployment tool: OSA 23.2.1 and ceph-ansible Best regards, Jean-Francois -- Rafael Weing?rtner -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From park0kyung0won at dgist.ac.kr Wed Aug 31 17:25:28 2022 From: park0kyung0won at dgist.ac.kr (=?UTF-8?B?67CV6rK97JuQ?=) Date: Thu, 1 Sep 2022 02:25:28 +0900 (KST) Subject: [OVN] Error while creating a new instance due to port binding failure Message-ID: <1624663936.582567.1661966730487.JavaMail.root@mailwas1> An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Wed Aug 31 18:05:13 2022 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Wed, 31 Aug 2022 20:05:13 +0200 Subject: [election][openstack-ansible] PTL candidacy for Antelope cycle Message-ID: Hello. I want to self-nominate for the role of OpenStack-Ansible PTL for the Antelope release cycle. As always, my goal will be to constantly improve existing code, make deployments and further operations as reliable as possible. For the previous cycle we have reached some of previously defined goals, like adding a `service` role, finally implementing service_token_roles and support of CentOS 9 Stream, but there is still room for improvement for role testing. With the return of live events I will do my best to ensure project presence on them and get in touch with operators to gain feedback for project future improvement. -- Kind regards, Dmitriy Rabotyagov From elod.illes at est.tech Wed Aug 31 18:53:08 2022 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Wed, 31 Aug 2022 20:53:08 +0200 Subject: [QA][ironic] cycle-with-intermediary deliverables without any release Message-ID: Hi, Quick reminder that we'll need a release very soon for a number of deliverables following a cycle-with-intermediary release model but which have not done *any* release yet in the Zed cycle: [ironic*] bifrost ironic-prometheus-exporter ironic-python-agent-builder ironic-ui networking-baremetal networking-generic-switch [Quality Assurance] patrole Those should be released ASAP, and in all cases before September 15th, so that we have a release to include in the final Zed release. (* actually, we DID discuss this already with Ironic release liaisons [1], so this is just a formal reminder for them) [1] https://meetings.opendev.org/irclogs/%23openstack-release/%23openstack-release.2022-08-17.log.html#t2022-08-17T12:20:54 El?d Ill?s irc: elodilles From rafaelweingartner at gmail.com Wed Aug 31 18:55:18 2022 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Wed, 31 Aug 2022 15:55:18 -0300 Subject: [Ceilometer] Pollster cannot get RadosGW metrics when API endpoints are based on URL instead of port number In-Reply-To: <86f048d7931c4cc482f6785437c9b5ea@elca.ch> References: <2aa77e24a33d48a69032f30b86e9cad8@elca.ch> <1b17c23f8982480db73cf50d04d51af7@elca.ch> <86f048d7931c4cc482f6785437c9b5ea@elca.ch> Message-ID: It is the RGW user that you have. This user must have the role that is needed to access the usage feature in RGW. If I am not mistaken, it required an admin user. On Wed, Aug 31, 2022 at 1:54 PM Taltavull Jean-Fran?ois < jean-francois.taltavull at elca.ch> wrote: > Thanks to your help, I am close to the goal. Dynamic pollster is loaded > and triggered. > > > > But I get a ?Status[403] and reason [Forbidden]? in ceilometer logs while > requesting admin/usage. > > > > I?m not sure to understand well the auth mechanism. Are we talking about > keystone credentials, ec2 credentials, Rados GW user ?... > > > > For now, in testing phase, I use ?authentication_parameters?, not barbican. 
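(A 403 on /admin/usage typically just means the RGW user behind those S3
credentials lacks the usage capability; granting it is usually a one-liner,
for example with a hypothetical "ceilometer" user:

radosgw-admin caps add --uid=ceilometer \
    --caps="usage=read;metadata=read;buckets=read;users=read"

The access/secret key pair passed in authentication_parameters then has to
belong to that same user.)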
> > > > -JF > > > > *From:* Rafael Weing?rtner > *Sent:* mardi, 30 ao?t 2022 14:17 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > Yes, you will need to enable the metric/pollster to be processed. That is > done via "polling.yml" file. Also, do not forget that you will need to > configure Ceilometer to push this new metric. If you use Gnocchi as the > backend, you will need to change/update the gnocchi resource YML file. That > file maps resources and metrics in the Gnocchi backend. The configuration > resides in Ceilometer. You can create/define new resource types and map > them to specific metrics. It depends on how you structure your solution. > > P.S. You do not need to use "authentication_parameters". You can use the > barbican integration to avoid setting your credentials in a file. > > > > On Tue, Aug 30, 2022 at 9:11 AM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > Hello, > > > > I tried to define a Rados GW dynamic pollster and I can see, in Ceilometer > logs, that it?s actually loaded. But it looks like it was not triggered, I > see no trace of ceilometer connection in Rados GW logs. > > > > My definition: > > > > - name: "dynamic.radosgw.usage" > > sample_type: "gauge" > > unit: "B" > > value_attribute: "total.size" > > url_path: http:///object-store/swift/v1/admin/usage > > module: "awsauth" > > authentication_object: "S3Auth" > > authentication_parameters: xxxxxxxxxxxxx,yyyyyyyyyyyyy, > > user_id_attribute: "admin" > > project_id_attribute: "admin" > > resource_id_attribute: "admin" > > response_entries_key: "summary" > > > > Do I have to set an option in ceilometer.conf, or elsewhere, to get my > Rados GW dynamic pollster triggered ? > > > > -JF > > > > *From:* Taltavull Jean-Fran?ois > *Sent:* lundi, 29 ao?t 2022 18:41 > *To:* 'Rafael Weing?rtner' > *Cc:* openstack-discuss > *Subject:* RE: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > Thanks a lot for your quick answer, Rafael ! > > I will explore this approach. > > > > Jean-Francois > > > > *From:* Rafael Weing?rtner > *Sent:* lundi, 29 ao?t 2022 17:54 > *To:* Taltavull Jean-Fran?ois > *Cc:* openstack-discuss > *Subject:* Re: [Ceilometer] Pollster cannot get RadosGW metrics when API > endpoints are based on URL instead of port number > > > > > > *EXTERNAL MESSAGE *- This email comes from *outside ELCA companies*. > > You could use a different approach. You can use Dynamic pollster [1], and > create your own mechanism to collect data, without needing to change > Ceilometer code. Basically all hard-coded pollsters can be converted to a > dynamic pollster that is defined in YML. > > > > [1] > https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html#the-dynamic-pollsters-system-configuration-for-non-openstack-apis > > > > > > On Mon, Aug 29, 2022 at 12:51 PM Taltavull Jean-Fran?ois < > jean-francois.taltavull at elca.ch> wrote: > > Hi All, > > In our OpenStack deployment, API endpoints are defined by using URLs > instead of port numbers and HAProxy forwards requests to the right bakend > after having ACLed the URL. 
> > In the case of our object-store service, based on RadosGW, the internal > API endpoint is "https:///object-store/swift/v1/AUTH_" > > When Ceilometer RadosGW pollster tries to connect to the RadosGW admin API > with the object-store internal endpoint, the URL becomes > https:///admin, as shown by HAProxy logs. This URL does not match > any API endpoint from HAProxy point of view. The line of code that rewrites > the URL is this one: > https://opendev.org/openstack/ceilometer/src/branch/stable/wallaby/ceilometer/objectstore/rgw.py#L81 > > What would you think of adding a mechanism based on new Ceilometer > configuration option(s) to control the URL rewriting ? > > Our deployment characteristics: > - OpenStack release: Wallaby > - Ceph and RadosGW version: 15.2.16 > - deployment tool: OSA 23.2.1 and ceph-ansible > > > Best regards, > Jean-Francois > > > > -- > > Rafael Weing?rtner > > > > -- > > Rafael Weing?rtner > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Wed Aug 31 19:09:20 2022 From: satish.txt at gmail.com (Satish Patel) Date: Wed, 31 Aug 2022 15:09:20 -0400 Subject: [ovn-bgp-agent][neutron] - expose_tenant_networks bug In-Reply-To: References: <693D46D4-3DD7-4B93-BC90-571FEC2B6F4C@gmail.com> Message-ID: Hi Luis, Here are the requested things which you asked for. ### Versions pyroute2 = 0.7.2 openvswitch-switch = 2.17.0-0ubuntu1~cloud0 ovn = 22.03.0-0ubuntu1~cloud0 devstack master branch ### Rack-1-host-2 vagrant at rack-1-host-2:~$ ip rule 0: from all lookup local 1000: from all lookup [l3mdev-table] 32000: from all to 10.0.0.1/26 lookup br-ex 32000: from all to 172.16.1.144 lookup br-ex 32000: from all to 172.16.1.148 lookup br-ex 32766: from all lookup main 32767: from all lookup default vagrant at rack-1-host-2:~$ ip route show table br-ex default dev br-ex scope link 10.0.0.0/26 via 172.16.1.144 dev br-ex 172.16.1.144 dev br-ex scope link 172.16.1.148 dev br-ex scope link ### Rack-2-host-1 vagrant at rack-2-host-1:~$ ip rule 0: from all lookup local 1000: from all lookup [l3mdev-table] 32000: from all to 172.16.1.143 lookup br-ex 32766: from all lookup main 32767: from all lookup default vagrant at rack-2-host-1:~$ ip route show table br-ex default dev br-ex scope link 172.16.1.143 dev br-ex scope link #### I have quickly cloned the latest branch of ovn-bgp-agent and ran and found the following error. Assuming your patch is part of that master branch. rack-1-host-2: https://paste.opendev.org/show/bWbhmbzbi8YHGZsbhUAb/ Notes: This is bug or something else - https://opendev.org/x/ovn-bgp-agent/src/branch/master/ovn_bgp_agent/privileged/vtysh.py#L27 I have to replace the above Line:27 code of vtysh to the following to fix the vtysh error. 
@ovn_bgp_agent.privileged.vtysh_cmd.entrypoint def run_vtysh_config(frr_config_file): vtysh_command = "copy {} running-config".format(frr_config_file) full_args = ['/usr/bin/vtysh', '--vty_socket', constants.FRR_SOCKET_PATH, 'c'] full_args.extend(vtysh_command.split(' ')) On Wed, Aug 31, 2022 at 3:51 AM Luis Tomas Bolivar wrote: > > > On Wed, Aug 31, 2022 at 9:12 AM Luis Tomas Bolivar > wrote: > >> See below >> >> >> On Tue, Aug 30, 2022 at 10:14 PM Satish Patel >> wrote: >> >>> Hi Luis, >>> >>> I have redeploy my lab and i have following components >>> >>> rack-1-host-1 - controller >>> rack-1-host-2 - compute1 >>> rack-2-host-1 - compute2 >>> >>> >>> # I am running ovn-bgp-agent on only two compute nodes compute1 and >>> compute2 >>> [DEFAULT] >>> debug=False >>> expose_tenant_networks=True >>> driver=ovn_bgp_driver >>> reconcile_interval=120 >>> ovsdb_connection=unix:/var/run/openvswitch/db.sock >>> >>> ### without any VM at present i can see only router gateway IP on >>> rack1-host-2 >>> >> >> Yep, this is what is expected at this point. >> >> >>> >>> vagrant at rack-1-host-2:~$ ip a show ovn >>> 37: ovn: mtu 1500 qdisc noqueue master >>> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >>> link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff >>> inet 172.16.1.144/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet6 fe80::8f7:6eff:fee0:1969/64 scope link >>> valid_lft forever preferred_lft forever >>> >>> >>> vagrant at rack-2-host-1:~$ ip a show ovn >>> 15: ovn: mtu 1500 qdisc noqueue master >>> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >>> link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff >>> inet6 fe80::5461:6bff:fe29:ac29/64 scope link >>> valid_lft forever preferred_lft forever >>> >>> >>> ### Lets create vm1 which is endup on rack1-host-2 but it didn't expose >>> vm1 ip (tenant ip) same with rack-2-host-1 >>> >>> vagrant at rack-1-host-2:~$ ip a show ovn >>> 37: ovn: mtu 1500 qdisc noqueue master >>> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >>> link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff >>> inet 172.16.1.144/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet6 fe80::8f7:6eff:fee0:1969/64 scope link >>> valid_lft forever preferred_lft forever >>> >> >> It should be exposed here, what about the output of "ip rule" and "ip >> route show table br-ex"? >> >> >>> >>> vagrant at rack-2-host-1:~$ ip a show ovn >>> 15: ovn: mtu 1500 qdisc noqueue master >>> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >>> link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff >>> inet6 fe80::5461:6bff:fe29:ac29/64 scope link >>> valid_lft forever preferred_lft forever >>> >>> >>> ### Lets attach a floating ip to vm1 and see. now i can see 10.0.0.17 >>> vm1 ip got expose on rack-1-host-2 same time nothing on rack-2-host-1 ( ofc >>> because no vm running on it) >>> >>> vagrant at rack-1-host-2:~$ ip a show ovn >>> 37: ovn: mtu 1500 qdisc noqueue master >>> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >>> link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff >>> inet 172.16.1.144/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet 10.0.0.17/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet 172.16.1.148/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet6 fe80::8f7:6eff:fee0:1969/64 scope link >>> valid_lft forever preferred_lft forever >>> >> >> There is also a resync action happening every 120 seconds... 
Perhaps for >> some reason the initial addition of 10.0.0.17 failed and then the sync >> discovered it and added it (and it matched with the time you added the FIP >> more or less). >> >> But events are managed one by one and those 2 are different, so adding >> the FIP is not adding the internal IP. It was probably a sync action. >> >> >>> >>> vagrant at rack-2-host-1:~$ ip a show ovn >>> 15: ovn: mtu 1500 qdisc noqueue master >>> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >>> link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff >>> inet6 fe80::5461:6bff:fe29:ac29/64 scope link >>> valid_lft forever preferred_lft forever >>> >>> >>> #### Lets spin up vm2 which should end up on other compute node which is >>> rack-2-host-1 ( no change yet.. vm2 ip wasn't exposed anywhere yet. ) >>> >>> vagrant at rack-1-host-2:~$ ip a show ovn >>> 37: ovn: mtu 1500 qdisc noqueue master >>> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >>> link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff >>> inet 172.16.1.144/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet 10.0.0.17/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet 172.16.1.148/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet6 fe80::8f7:6eff:fee0:1969/64 scope link >>> valid_lft forever preferred_lft forever >>> >>> >>> vagrant at rack-2-host-1:~$ ip a show ovn >>> 15: ovn: mtu 1500 qdisc noqueue master >>> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >>> link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff >>> inet6 fe80::5461:6bff:fe29:ac29/64 scope link >>> valid_lft forever preferred_lft forever >>> >>> >>> #### Lets again attach floating ip to vm2 ( so far nothing changed, >>> technically it should expose IP on rack-1-host-2 ) >>> >>> vagrant at rack-1-host-2:~$ ip a show ovn >>> 37: ovn: mtu 1500 qdisc noqueue master >>> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >>> link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff >>> inet 172.16.1.144/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet 10.0.0.17/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet 172.16.1.148/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet6 fe80::8f7:6eff:fee0:1969/64 scope link >>> valid_lft forever preferred_lft forever >>> >>> The IP of the second VM should be exposed here ^, in rack-1-host-2, >>> while the FIP in the other compute (rack-2-host-1) >>> >> >> >>> vagrant at rack-2-host-1:~$ ip a show ovn >>> 15: ovn: mtu 1500 qdisc noqueue master >>> ovn-bgp-vrf state UNKNOWN group default qlen 1000 >>> link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff >>> inet 172.16.1.143/32 scope global ovn >>> valid_lft forever preferred_lft forever >>> inet6 fe80::5461:6bff:fe29:ac29/64 scope link >>> valid_lft forever preferred_lft forever >>> >>> >>> Here is the logs - https://paste.opendev.org/show/bRThivJE4wvEN92DXJUo/ >>> >> >> What node these logs belong to? rack-1-host-2? >> >> And are you running with the latest code? Looks the problem is on the >> sync function when trying to ensure the routing table entry for br-ex. It >> prints this: >> >> 2022-08-30 20:12:54.541 8318 DEBUG ovn_bgp_agent.utils.linux_net [-] Found routing table for br-ex with: ['200', 'br-ex'] >> >> So definitely ovn_routing_tables should be initialized with {'br-ex': >> 200}, so I don't really get where the KeyError comes from... >> >> Unless it is not accessing the dict, but accessing the ndb.routes... 
>> perhaps with the pyroute2 version you have, the family parameter is needed >> there. Let me send a patch that you can try with >> > > This is the patch https://review.opendev.org/c/x/ovn-bgp-agent/+/855062. > Give it a try and let me know if the error you are seeing in the logs goes > away with it > > >> >>> On Thu, Aug 25, 2022 at 6:25 AM Luis Tomas Bolivar >>> wrote: >>> >>>> >>>> >>>> On Thu, Aug 25, 2022 at 11:31 AM Satish Patel >>>> wrote: >>>> >>>>> Hi Luis, >>>>> >>>>> Very interesting, you are saying it will only expose tenant ip on >>>>> gateway port node? Even we have DVR setup in cluster correct? >>>>> >>>> >>>> Almost. The path is the same as in a DVR setup without BGP (with the >>>> difference you can reach the internal IP). In a DVR setup, when the VM is >>>> in a tenant network, without a FIP, the traffic goes out through the cr-lrp >>>> (ovn router gateway port), i.e., the node hosting that port which is >>>> connecting the router where the subnet where the VM is to the provider >>>> network. >>>> >>>> Note this is a limitation due to how ovn is used in openstack neutron, >>>> where traffic needs to be injected into OVN overlay in the node holding the >>>> cr-lrp. We are investigating possible ways to overcome this limitation and >>>> expose the IP right away in the node hosting the VM. >>>> >>>> >>>>> Does gateway node going to expose ip for all other compute nodes? >>>>> >>>> >>>>> What if I have multiple gateway node? >>>>> >>>> >>>> No, each router connected to the provider network will have its own ovn >>>> router gateway port, and that can be allocated in any node which has >>>> "enable-chassis-as-gw". What is true is that all VMs in a tenant networks >>>> connected to the same router, will be exposed in the same location . >>>> >>>> >>>>> Did you configure that flag on all node or just gateway node? >>>>> >>>> >>>> I usually deploy with 3 controllers which are also my "networker" >>>> nodes, so those are the ones having the enable-chassis-as-gw flag. >>>> >>>> >>>>> >>>>> Sent from my iPhone >>>>> >>>>> On Aug 25, 2022, at 4:14 AM, Luis Tomas Bolivar >>>>> wrote: >>>>> >>>>> ? >>>>> I tested it locally and it is exposing the IP properly in the node >>>>> where the ovn router gateway port is allocated. Could you double check if >>>>> that is the case in your setup too? >>>>> >>>>> On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar < >>>>> ltomasbo at redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Tue, Aug 23, 2022 at 6:04 PM Satish Patel >>>>>> wrote: >>>>>> >>>>>>> Folks, >>>>>>> >>>>>>> I am setting up ovn-bgp-agent lab in "BGP mode" and i found >>>>>>> everything working great except expose tenant network >>>>>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/ >>>>>>> >>>>>>> >>>>>>> Lab Summary: >>>>>>> >>>>>>> 1 controller node >>>>>>> 3 compute node >>>>>>> >>>>>>> ovn-bgp-agent running on all compute node because i am using >>>>>>> "enable_distributed_floating_ip=True" >>>>>>> >>>>>> >>>>>>> ovn-bgp-agent config: >>>>>>> >>>>>>> [DEFAULT] >>>>>>> debug=False >>>>>>> expose_tenant_networks=True >>>>>>> driver=ovn_bgp_driver >>>>>>> reconcile_interval=120 >>>>>>> ovsdb_connection=unix:/var/run/openvswitch/db.sock >>>>>>> >>>>>>> I am not seeing my vm on tenant ip getting exposed but when i attach >>>>>>> FIP which gets exposed in loopback address. here is the full trace of debug >>>>>>> logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/ >>>>>>> >>>>>> >>>>>> It is not exposed in any node, right? 
Note when expose_tenant_network >>>>>> is enabled, the traffic to the tenant VM is exposed in the node holding the >>>>>> cr-lrp (ovn router gateway port) for the router connecting the tenant >>>>>> network to the provider one. >>>>>> >>>>>> The FIP will be exposed in the node where the VM is. >>>>>> >>>>>> On the other hand, the error you see there should not happen, so I'll >>>>>> investigate why that is and also double check if the expose_tenant_network >>>>>> flag is broken somehow. >>>>>> >>>>> >>>>>> Thanks! >>>>>> >>>>>> >>>>>> -- >>>>>> LUIS TOM?S BOL?VAR >>>>>> Principal Software Engineer >>>>>> Red Hat >>>>>> Madrid, Spain >>>>>> ltomasbo at redhat.com >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> LUIS TOM?S BOL?VAR >>>>> Principal Software Engineer >>>>> Red Hat >>>>> Madrid, Spain >>>>> ltomasbo at redhat.com >>>>> >>>>> >>>>> >>>> >>>> -- >>>> LUIS TOM?S BOL?VAR >>>> Principal Software Engineer >>>> Red Hat >>>> Madrid, Spain >>>> ltomasbo at redhat.com >>>> >>>> >>> >> >> -- >> LUIS TOM?S BOL?VAR >> Principal Software Engineer >> Red Hat >> Madrid, Spain >> ltomasbo at redhat.com >> >> > > > -- > LUIS TOM?S BOL?VAR > Principal Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Wed Aug 31 19:23:42 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Thu, 1 Sep 2022 00:53:42 +0530 Subject: [cinder] Antelope PTG Planning Message-ID: Hi All, The virtual PTG for Antelope/2023.1 cycle will be held between 17-21 October 2022. The proposed schedule for cinder is as follows: Dates: Tuesday (18th October) to Friday (21st October) 2022 Time: 1300 to 1700 UTC Etherpad: https://etherpad.opendev.org/p/antelope-ptg-cinder-planning Please add topics to the etherpad as early as possible since that will help arrange the topics day wise in the actual PTG etherpad. Thanks and regards Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From fpantano at redhat.com Wed Aug 31 19:27:34 2022 From: fpantano at redhat.com (Francesco Pantano) Date: Wed, 31 Aug 2022 21:27:34 +0200 Subject: [election][manila] PTL Candidacy for Antelope cycle In-Reply-To: References: Message-ID: Thank you Carlos for all the work you did so far, and big +1 here !! On Tue, Aug 30, 2022 at 11:14 PM Carlos Silva wrote: > Greetings, Zorillas and interested stackers, > > I would like to announce my candidacy to be the Manila PTL during the > Antelope > cycle. > > I have been the PTL for Manila since the Zed cycle, and have been > contributing > to OpenStack since the Stein release. It has been an awesome experience. > > Over the Zed cycle my focus was to continue mentoring and adding new core > reviewers to the Manila repositories; Pursuing feature parity and complete > support for manila in OSC (also increasing our functional tests coverage), > tackling the tech debt; enhancing the documentation to help third party > drivers > to set up their CI systems; promoting events to gather the community > members > and getting bugs fixed and implementations moving faster. > > I am happy with the progress we made in those areas, but I still think > there is > room for improvement. 
During the Antelope cycle, I would like to focus on: > > - Continue our efforts to mentor contributors and increase the amount of > active > reviewers through engaging them in the community, teaching the OpenStack > way > and making it collaboratively as we did in the past cycles, promoting > hackathons, > bug squashes and collaborative review sessions. > > - Getting more features already available in the Manila core to Manila UI > and get > more attention to our changes to OpenStack SDK; > > - Continue pushing Manila to cover the tech debt areas we identified over > the > past cycles; > > - Enforcing maintainers to collaboratively cover the lack of documentation > we > have on third party CI setups for Manila, helping potential new vendors > to > quickly setup up their CI systems; > > Thank you for your consideration! > > Carlos da Silva > IRC: carloss > -- Francesco Pantano GPG KEY: F41BD75C -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Wed Aug 31 19:28:27 2022 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Thu, 1 Sep 2022 00:58:27 +0530 Subject: [cinder] Extending Feature Freeze Deadline by 1 week Message-ID: Hi All, As discussed in the cinder meeting today, due to the abundance of features proposed[1] and the shortage of review bandwidth, we are planning to extend the feature freeze by one week. Current feature freeze date (for cinder): 01 September 2022 New feature freeze date (for cinder): 08 September 2022 Kindly let me know if there are any objections to the above proposal else all the features proposed are eligible for the feature freeze exception of 1 week. [1] https://etherpad.opendev.org/p/cinder-zed-features Thanks and regards Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter.matulis at canonical.com Wed Aug 31 19:53:24 2022 From: peter.matulis at canonical.com (Peter Matulis) Date: Wed, 31 Aug 2022 15:53:24 -0400 Subject: [charms] Team Delegation proposal In-Reply-To: References: Message-ID: On Mon, Aug 8, 2022 at 4:25 PM Alex Kavanagh wrote: > Hi Chris > > On Thu, 28 Jul 2022 at 21:46, Chris MacNaughton < > chris.macnaughton at canonical.com> wrote: > >> Hello All, >> >> >> I would like to propose some new ACLs in Gerrit for the openstack-charms >> project: >> >> - openstack-core-charms >> - ceph-charms >> - network-charms >> - stable-maintenance >> >> > > I think the names need to be tweaked slightly: > > - charms-openstack > - charms-ceph > - charms-ovn > - charms-maintenance > We would also need an ACL for the documentation: - charms-docs -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Aug 31 20:07:39 2022 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 01 Sep 2022 01:37:39 +0530 Subject: [all][tc] Technical Committee next weekly meeting on 2022 Sept 1 at 1500 UTC Message-ID: <182f58439aa.e75ea1a5456644.8676429994105243864@ghanshyammann.com> Hello Everyone, Below is the agenda for Tomorrow's TC meeting schedule at 1500 UTC. 
https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting

* Roll call
* Follow up on past action items
* Gate health check
** Bare 'recheck' state
*** https://etherpad.opendev.org/p/recheck-weekly-summary
* Checks on Zed cycle tracker
** https://etherpad.opendev.org/p/tc-zed-tracker
* 2023.1 cycle PTG Planning
** TC + Leaders interaction sessions
*** https://etherpad.opendev.org/p/tc-leaders-interaction-2023-1
** TC PTG etherpad
*** https://etherpad.opendev.org/p/tc-2023-1-ptg
** Schedule 'operator hours' as a separate slot in PTG (avoiding conflicts
among other projects' 'operator hours')
*** https://twitter.com/osopsmeetup/status/1561708283492196353
*** https://doodle.com/meeting/participate/id/bD9kR2yd
* 2023.1 cycle Technical Election
** https://governance.openstack.org/election/
* Open Reviews
** https://review.opendev.org/q/projects:openstack/governance+is:open

-gmann

From ricolin at ricolky.com  Wed Aug 31 20:31:58 2022
From: ricolin at ricolky.com (Rico Lin)
Date: Thu, 1 Sep 2022 04:31:58 +0800
Subject: [heat][elections] PTL Candidacy for Antelope Cycle
In-Reply-To: 
References: 
Message-ID: 

+1
Thanks Brendan :)

*Rico Lin*

On Tue, Aug 30, 2022 at 2:28 PM Brendan Shephard wrote:

> Hi all,
>
> I first wanted to thank Rico for his ongoing commitment to the project
> over the last cycles. He has provided lots of guidance and help to the
> Heat project for a long time and his contribution deserves recognition.
>
> I am proposing my candidacy for Heat PTL during the Antelope cycle.
>
> I have worked with the Heat project for several years, both as a user and
> more recently over the last 2 years as a contributor. I see great
> potential in the project for our users and look forward to continuing
> work in order to support features and functionality of the project.
>
> Some of my objectives for the next few cycles are:
>
> Remove the dependencies on legacy python-*client libraries and instead
> shift to the openstacksdk client library.
>
> While the legacy libraries have served us well, they are starting to show
> their limitations, and the delta in serviceability will only increase as
> each project moves towards leveraging the openstacksdk. So this change,
> while quite extensive, will ensure future compatibility with the other
> OpenStack project teams.
>
> Continue ensuring Heat supports the most up-to-date and recent features
> provided by each project.
>
> To ensure Heat is the default and best choice for our users, we need to
> ensure we are able to leverage the latest available features from the
> complementary OpenStack projects. This is an ongoing challenge to stay
> up-to-date with the changes each cycle and work towards implementing them
> in Heat.
>
> Thank you all for your consideration, and I look forward to the next
> cycle and continuing to work with you all.
>
> Regards,
> Brendan
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
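For anyone curious what that shift away from the legacy clients looks like
in practice, a minimal openstacksdk-based equivalent of a python-heatclient
stack listing is roughly the following (the cloud name is only a placeholder):

import openstack

# openstacksdk picks up clouds.yaml / OS_* environment variables
conn = openstack.connect(cloud='mycloud')

for stack in conn.orchestration.stacks():
    print(stack.name, stack.status)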